In the last tutorials, we already worked a lot with strings and even manipulated some of them. Now it is time to look at the theory behind it. This tutorial is about string manipulations in Python. Formatting strings is easy: all you need is the “format” method, called on a string with a variable number of arguments. Numbers are converted to text automatically, so there is no need to call str() on them yourself.

String manipulations in Python

The notation is very similar to that of other string formatters you may be used to. One really nice thing, though, is that you don’t need to number the placeholders: Python fills the {} fields in the order of the arguments you provide. A simple example:

str01 = "This is my string {} and the value is {}".format("Test", 11)
print(str01)

And the output should look like this:

This is my string Test and the value is 11
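
The placeholders can also be numbered or named, and a format specification after the colon controls how a value is rendered. The following is a minimal sketch of these variants; the variable names and sample values are only illustrative and do not come from the original sample.

```python
# Numbered fields let you reorder or repeat the arguments.
str_indexed = "{1} comes before {0}, and {1} again".format("Test", 11)

# Named fields make longer templates easier to read.
str_named = "This is my string {name} and the value is {value}".format(
    name="Test", value=11
)

# A format spec after the colon controls the rendering, here two decimals.
str_spec = "The value is {:.2f}".format(11)

print(str_indexed)
print(str_named)
print(str_spec)
```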

You can also format the attributes of your own objects. For this, we define a class “Person”:

class Person:
    def __init__(self, name, age):
        # Store the constructor arguments on the instance
        self.name = name
        self.age = age
p = Person("Mario Meir-Huber", 35)
print(p.name)
str02 = "The author \"{}\" is {} years old".format(p.name, p.age)
print(str02)

The output for this should look like this:

Mario Meir-Huber
The author "Mario Meir-Huber" is 35 years old
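
As an alternative, a format field can reach into an object itself, so you only pass the object once. The sketch below repeats the “Person” class to stay self-contained and also shows the same template as an f-string, which assumes Python 3.6 or newer; the variable names str_attr and str_f are made up for this example.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p = Person("Mario Meir-Huber", 35)

# The field name accesses attributes of the keyword argument "p".
str_attr = 'The author "{p.name}" is {p.age} years old'.format(p=p)

# An f-string evaluates the expressions in place (Python 3.6+).
str_f = f'The author "{p.name}" is {p.age} years old'

print(str_attr)
print(str_f)
```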

The difflib in Python

One nice thing in Python is the difflib module. It lets us easily check two lists of strings for differences. One use case is checking my last name for differences: it is one of the most frequent last-name combinations in the German-speaking countries, and it can be spelled in several ways.

To work with difflib, import it and call its context_diff function. It returns the diff line by line, and the lines that changed between the two lists are marked with “!”.

import difflib
arr01 = ["Mario", "Meir", "Huber"]
arr02 = ["Mario", "Meier", "Huber"]
# context_diff yields the diff one line at a time
for line in difflib.context_diff(arr01, arr02):
    print(line)

Below you can see the relevant part of the output (the header and the unchanged context lines are left out). One difference was spotted. You can use this to spot differences in datasets and to create golden records from them.

*** 1,3 ****
! Meir
--- 1,3 ----
! Meier
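
If you only care about the changed entries, you can filter the diff for lines starting with “! ”. This is a minimal sketch: the lineterm="" argument keeps the header lines free of trailing newlines, and the name changed is just an assumption for this example.

```python
import difflib

arr01 = ["Mario", "Meir", "Huber"]
arr02 = ["Mario", "Meier", "Huber"]

# Keep only the lines that context_diff marks as replaced ("! " prefix).
changed = [
    line
    for line in difflib.context_diff(arr01, arr02, lineterm="")
    if line.startswith("! ")
]

print(changed)
```

The first entry is the old spelling and the second one is the new spelling.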

Textwrap in Python

Another nice feature in Python is the textwrap module. It offers some basic functions for “prettifying” text. In the following sample, we use five of them:

  • Indent: adds a prefix, e.g. a tab, to the beginning of each line of the text
  • Wrap: splits the text into a list of lines, each no longer than the maximum width
  • Fill: does the same as Wrap, but returns a single string with the lines joined by newlines
  • Shorten: truncates the text so that it fits the maximum width and marks the cut with the placeholder “[...]”; you might use it for a “read more” teaser
  • Dedent: removes the leading whitespace that all lines of the text have in common; trailing whitespace is kept

The functions are used in simple statements:

from textwrap import dedent, fill, indent, shorten, wrap
print(indent("Mario Meir-Huber", "\t"))
print(wrap("Mario Meir-Huber", width=10))
print(fill("Mario Meir-Huber", width=10))
print(shorten("Mario Meir-Huber Another", width=15))
print(dedent(" Mario Meir-Huber "))

And the output should look like this:

	Mario Meir-Huber
['Mario', 'Meir-Huber']
Mario
Meir-Huber
Mario [...]
Mario Meir-Huber 
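
Dedent is most useful for multi-line strings that are indented in the source code. The following sketch combines it with fill and indent; the sample text and the variable names are made up for this example.

```python
import textwrap

raw = """
    Mario Meir-Huber writes tutorials
    about Python and data science.
"""

# Remove the indentation shared by all lines, then the outer blank lines.
clean = textwrap.dedent(raw).strip()

# Re-wrap the text so that no line is longer than 20 characters.
wrapped = textwrap.fill(clean, width=20)

# Prefix every line, e.g. to quote the text.
quoted = textwrap.indent(clean, "> ")

print(clean)
print(wrapped)
print(quoted)
```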

Today’s tutorial was more of a “housekeeping” exercise, since we had already used string formatting in earlier tutorials. In the next tutorial, I will write about object serialisation with JSON, as this is also very useful.

If you are not yet familiar with Spark, have a look at the Spark Tutorial I created here. I will also create more tutorials on Python and Machine Learning in the future, so make sure to check back often to the Big Data & Data Science tutorial overview. I hope you liked this tutorial. If you have any suggestions on what to improve, please feel free to get in touch with me! If you want to learn more about Python, I also recommend the official page.
