
In our previous tutorial, we had a look at how to (de)serialise objects from and to JSON in Python. Now, let’s have a look at how to dynamically create and extend classes in Python. For this, we use the same mechanism that Python itself uses to build classes: the built-in type function. When called with three arguments, it creates a new class at runtime.

How to use the dynamic type function in Python

To create a class, we pass three parameters to type:

type(CLASS_NAME, INHERITS, PARAMETERS)

These parameters have the following meaning:

  • CLASS_NAME: The name of the new class
  • INHERITS: a tuple of base classes from which the new type should inherit
  • PARAMETERS: a dictionary of attributes and methods added to the class
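
As a minimal sketch of the three parameters (the names Base, shout and Dynamic are made up for this illustration and do not appear elsewhere in this tutorial), the following creates a class with one base class, one attribute and one method:

```python
# Hypothetical base class, used only for this illustration
class Base:
    def greet(self):
        return "Hello from " + type(self).__name__


# A plain function becomes a method once it is placed in the class namespace
def shout(self):
    return self.greet().upper()


# type(CLASS_NAME, INHERITS, PARAMETERS)
Dynamic = type("Dynamic", (Base,), {"level": 1, "shout": shout})

d = Dynamic()
print(Dynamic.__name__)   # name of the new class
print(Dynamic.__bases__)  # tuple of base classes
print(d.level)            # attribute taken from the dictionary
print(d.shout())          # method taken from the dictionary
```

The class created this way behaves like one written with a regular class statement; the call is simply the programmatic form of it.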

In our following example, we want to extend the Person class with a new attribute called “location”. We call our new class “PersonNew” and instruct Python to inherit from “Person”, which we created some tutorials earlier. The base class is passed as a tuple, since Python supports multiple inheritance and a class can have more than one base class; the trailing comma in “(Person,)” makes it a tuple with a single element. Last, we specify the attribute “location” as a key-value pair. Note that type returns a class, not an instance, so “age” and “name” below are set as class attributes. Our sample looks like the following:

pn = type("PersonNew", (Person,), {"location": "Vienna"})
pn.age = 35
pn.name = "Mario"

If you test the code, it will work as expected. All other attributes such as age and name can also be retrieved. Now, let’s make it a bit more complex. We extend our previous sample with the JSON serialisation so that we can dynamically create a class from a JSON string.

Dynamically creating a class in Python from JSON

We therefore create a new function that takes the object to serialise and copies all its values into a dictionary. In addition, we add one more key-value pair, which we call “__class__”, in order to store the name of the class. Getting the class name is a bit more involved, since the string representation looks like “<class '__main__.PersonNew'>”. Therefore, we first split this string at the “.”, take the last entry, split it again at the ' character and take the first part. There are more elegant ways for this, but I want to keep it simple. Once we have the class name, we store it in the dictionary and return the dictionary. The complete sample is here:

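def map_proxy(obj):
    # Named "result" to avoid shadowing the built-in dict type
    result = {}
    
    # Copy every entry of the class namespace into a regular dictionary
    for k in obj.__dict__.keys():
        result.update({k : obj.__dict__.get(k)})
        
    # str(obj) looks like "<class '__main__.PersonNew'>"
    cls_name = str(obj).split(".")[1].split("'")[0]
    result.update({"__class__" : cls_name})
        
    return result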
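
One of the more elegant ways mentioned above could look like the following sketch. It reads the class name from the “__name__” attribute and skips the entries that Python adds by itself (such as “__module__” and “__doc__”). The function name class_to_dict is an assumption made for this illustration, and the Person class is repeated so that the sample runs on its own:

```python
import json


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


PersonNew = type("PersonNew", (Person,), {"location": "Vienna"})
PersonNew.age = 35


def class_to_dict(cls):
    # Keep only the entries that do not start with a double underscore
    data = {k: v for k, v in vars(cls).items() if not k.startswith("__")}
    # Every class knows its own name, so no string splitting is needed
    data["__class__"] = cls.__name__
    return data


print(json.dumps(class_to_dict(PersonNew)))
```

The trade-off is that the result no longer contains “__module__” and “__doc__”, so it differs from the output shown further below.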

We can now use the json.dumps method and call the map_proxy function to return the JSON string:

st_pn = json.dumps(map_proxy(pn))
print(st_pn)

Now, we are ready to dynamically create a new class with the “type” function. We name the class after the class name that was stored above. It can be retrieved via the “__class__” key. We let it inherit from Person and pass the entire dictionary as the class attributes, since it already consists of key-value pairs:

def dyn_create(obj):
    
    return type(obj["__class__"], (Person, ), obj)

We can now also invoke the json.loads method to dynamically create the class:

obj = json.loads(st_pn, object_hook=dyn_create)
print(obj)
print(obj.location)

And the output should be like that:

{"location": "Vienna", "__module__": "__main__", "__doc__": null, "age": 35, "name": "Mario", "__class__": "PersonNew"}
<class '__main__.PersonNew'>
Vienna

As you can see, it is very easy to dynamically create new classes in Python. We could largely improve this code, but I’ve created this tutorial for explanatory reasons rather than usability ;).

In our next tutorial, we will have a look at logging.

Here you can go to the overview of the Python tutorial. If you want to dig deeper into the language, have a look at the official Python documentation.

One important aspect of working with data is serialisation. This means that objects can be persisted to a storage (e.g. the file system, HDFS or S3). With Spark, a lot of file formats are possible. However, in this tutorial we will have a look at how to deal with JSON, a very popular file format that is often used in Spark.

JSON stands for “JavaScript Object Notation” and was originally developed for client-server applications with JavaScript as its main user. It was built to have less overhead than XML.

First, let’s start with copying objects. Python knows two ways: shallow copies (the normal “copy”) and deep copies. A shallow copy creates a new object but only copies references to the objects contained in it. This is relevant when attributes are themselves mutable objects such as lists or instances of other classes: a change to such a nested object is visible in both the original and the copy. In a deep copy, no references are shared; every nested object is copied recursively. This means that you can use the copy independently of the original.

To copy an object, you only need to import copy and call the copy or deepcopy function. The following code shows how this works.

import copy
ps1 = Person("Mario", 35)
pss = copy.copy(ps1)
psd = copy.deepcopy(ps1)
ps1.name = "Meir-Huber"
print(ps1.name)
print(pss.name)
print(psd.name)

And the output should be this:

Meir-Huber
Mario
Mario
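
The sample above only reassigns an attribute, so the shallow and the deep copy behave the same. As a minimal sketch of where they differ, assume a hypothetical class Team (not part of this tutorial) that holds a list:

```python
import copy


# Hypothetical class with a mutable attribute, used only for this illustration
class Team:
    def __init__(self, name, members):
        self.name = name
        self.members = members


original = Team("Data", ["Mario"])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Change the nested list through the original object
original.members.append("Anna")

print(shallow.members)  # the shallow copy shares the list with the original
print(deep.members)     # the deep copy has its own list
```

The shallow copy sees the appended entry because both objects reference the same list, whereas the deep copy is unaffected.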

Now, let’s look at how we can serialise an object with the use of JSON. First, you need to import “json”. An object that you want to serialise needs to be serialisable. Python’s built-in types such as dictionaries, lists, strings and numbers already are. However, when we want to serialise our own object (e.g. the “Person” class that we have created in this tutorial series), we need to provide a custom serialiser. Fortunately, Python gives us access to all instance variables of an object via the “__dict__” dictionary. This means that we don’t have to write our own serialiser and can do this via an easy call to “dumps” of “json”:

import json
js = json.dumps(ps1.__dict__)
print(js)

The above call creates a JSON representation of all attributes of the object:

{"name": "Meir-Huber", "age": 35}

We might want to add more information to the JSON string, e.g. the name of the class that the data was originally stored in. We can do this by passing a custom function to “dumps” via the “default” parameter. “dumps” calls this function for every object it cannot serialise by itself and passes the object as the only parameter. We therefore pass the original object (the Person) and the function we want to execute. We name this function “make_nice”. In the function, we create a dictionary and add the name of the class as first entry. We give this the key “obj_name”. We then merge the dictionary of the object into the new dictionary and return it.

Another parameter added to the “dumps” function is “indent”. It pretty-prints the output by adding line breaks and indents. This is just for improved readability. The function and the call look like this:

def make_nice(obj):
    # Named "result" to avoid shadowing the built-in dict type
    result = {
        "obj_name": obj.__class__.__name__
    }
    
    result.update(obj.__dict__)
    
    return result

js_pretty = json.dumps(ps1, default=make_nice, indent=3)
print(js_pretty)

And the result should now look like the following:

{
   "obj_name": "Person",
   "name": "Meir-Huber",
   "age": 35
}

Now, we know how we can serialise an object to a JSON string. You can now store this string to a file or an object on S3. The only thing that we haven’t discussed yet is how to get back an object from a string. We therefore take the JSON string we created with “dumps” before. Our goal now is to create a Person object from it. This can be done via the “loads” function of the json module. We also define a function to do the conversion and pass it via the “object_hook” parameter. This function has one argument, the decoded JSON object as a dictionary. We access each of the values by key and return the new object.

str_json = "{\"name\": \"Meir-Huber\", \"age\": 35}"
def create(obj):
    
    print(obj)
    
    return Person(obj["name"], obj["age"])
    
obj = json.loads(str_json, object_hook=create)
print(obj)

The output should now look like this.

{'name': 'Meir-Huber', 'age': 35}
<__main__.Person object at 0x7fb84831ddd8>
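
Both directions can be combined into a round trip. The following is a minimal sketch, assuming the “obj_name” key from the “make_nice” function above is used to decide which class to create; the function name restore is made up for this illustration:

```python
import json


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


def make_nice(obj):
    data = {"obj_name": obj.__class__.__name__}
    data.update(obj.__dict__)
    return data


def restore(data):
    # object_hook is called for every decoded JSON object,
    # so only dictionaries tagged as "Person" are converted
    if data.get("obj_name") == "Person":
        return Person(data["name"], data["age"])
    return data


text = json.dumps(Person("Meir-Huber", 35), default=make_nice)
person = json.loads(text, object_hook=restore)

print(text)
print(type(person).__name__, person.name, person.age)
```

Checking the tag first keeps the hook safe for JSON documents that also contain other objects.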

Now we know how to serialise objects to JSON and how to get them back from a string value. In the next tutorial, we will have a look at how to improve this and make it more dynamic, by dynamic class creation in Python.

In the last tutorials, we already worked a lot with strings and even manipulated some of them. Now, it is about time to have a look at the theory behind it. Formatting strings is very easy. The only thing you need is the “format” method, called on a string with a variable number of arguments. Numbers are converted to text automatically, so there is no need to call str() on them yourself.

The notation is very similar to other string formatters you may be used to. One really nice thing is that you don’t need to number the placeholders. Python assumes that the placeholders appear in the same order as the arguments you provide. An easy sample is this:

str01 = "This is my string {} and the value is {}".format("Test", 11)
print(str01)

And the output should look like this:

This is my string Test and the value is 11
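
Beyond the automatic ordering, “format” also accepts explicit positions, named fields and format specifications. The following sketch shows these variants together with f-strings; all variable names are chosen for this illustration only:

```python
# Explicit positions allow reordering and reuse of arguments
by_position = "{1} before {0}, then {1} again".format("b", "a")

# Named fields make longer templates easier to read
by_name = "{name} is {age} years old".format(name="Mario", age=35)

# A format specification after the colon controls the rendering
rounded = "{:.2f}".format(3.14159)   # two decimal places
padded = "{:>5}|".format(11)         # right-aligned in five characters

# f-strings (Python 3.6+) evaluate expressions directly in the string
value = 11
inline = f"The value is {value + 1}"

for line in (by_position, by_name, rounded, padded, inline):
    print(line)
```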

You can also use attributes of objects for this. To show this, we define a class “Person”:

class Person:    
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
p = Person("Mario Meir-Huber", 35)
str02 = "The author \"{}\" is {} years old".format(p.name, p.age)
print(p.name)
print(str02)

The output for this should look like this:

Mario Meir-Huber
The author "Mario Meir-Huber" is 35 years old

One nice thing in Python is difflib. This library enables us to easily check two lists of strings for differences. One use case would be to check my last name for differences. Note that my last name is one of the most frequent last name combinations in the German-speaking countries and thus can be written in different ways.

To work with difflib, simply import it and call the difflib context_diff function. This prints the differences detected with “!”.

import difflib
arr01 = ["Mario", "Meir", "Huber"]
arr02 = ["Mario", "Meier", "Huber"]
for line in difflib.context_diff(arr01, arr02):
    print(line)

Below you can see the output. One difference was spotted. You can easily use this for spotting differences in datasets and creating golden records from it.

*** 
--- 
***************
*** 1,3 ****
  Mario
! Meir
  Huber
--- 1,3 ----
  Mario
! Meier
  Huber
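
For spotting near-duplicates such as different spellings of a name, difflib can also compute a similarity score between two strings. The following is a small sketch using SequenceMatcher; the helper name similarity is an assumption made for this illustration:

```python
import difflib


def similarity(a, b):
    # ratio() returns a value between 0.0 (nothing in common) and 1.0 (identical)
    return difflib.SequenceMatcher(None, a, b).ratio()


close = similarity("Meir", "Meier")
far = similarity("Meir", "Huber")

print(round(close, 2))
print(round(far, 2))
```

A threshold on this score is one possible way to decide whether two records refer to the same name; which threshold fits depends on the data.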

Another nice feature in Python is textwrap. This library has some basic features for text “prettifying”. In the following sample, we use five different functions:

  • Indent: adds a prefix, e.g. a tab, to the beginning of each line of the text
  • Wrap: splits the text into a list of lines, each no longer than the maximum width
  • Fill: does the same as Wrap, but returns a single string with line breaks instead of a list
  • Shorten: truncates the text to fit the specified maximum width. The omitted part is replaced by a placeholder, “ [...]” by default, and you might use it to add a “read more” link
  • Dedent: removes the leading whitespace that all lines of the text have in common. Trailing whitespace is kept

The functions are used in simple statements:

from textwrap import *
print(indent("Mario Meir-Huber", "\t"))
print(wrap("Mario Meir-Huber", width=10))
print(fill("Mario Meir-Huber", width=10))
print(shorten("Mario Meir-Huber Another", width=15))
print(dedent(" Mario Meir-Huber "))

And the output should look like this:

	Mario Meir-Huber
['Mario', 'Meir-Huber']
Mario
Meir-Huber
Mario [...]
Mario Meir-Huber 
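
Dedent is most useful for multi-line text, where it removes only the indentation that all lines share. A minimal sketch, with a sample string made up for this illustration:

```python
import textwrap

# Every line of this block is indented by at least four spaces
block = "    Mario\n      Meir-Huber\n"

# dedent removes the four spaces that all lines have in common
dedented = textwrap.dedent(block)
print(dedented)

# To remove whitespace on both sides of a single string, use strip()
stripped = " Mario Meir-Huber ".strip()
print(repr(stripped))
```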

Today’s tutorial was more of a “housekeeping” since we used it already. In the next tutorial, I will write about object serialisation with JSON, as this is also very useful.

In the last tutorials, we had a look at methods, classes and decorators. Now, let’s have a brief look at asynchronous operations in Python. Most of the time, this is abstracted for us via Spark anyway, but it is nevertheless relevant to have some basic understanding of it. You define a method to be asynchronous by simply adding “async” as keyword ahead of the method definition. This is written like this:

async def FUNCTION_NAME():
    FUNCTION-BLOCK

Another keyword in that context is “await”. Calling a function defined with “async” returns an awaitable. When adding “await”, the calling code is suspended until the awaited function has finished. This means that you might lose the benefit of asynchronous execution, but the result is available right away, which makes handling easier, for example when working with web data. In the following code, we create an async function that sleeps for a random number of seconds (between 1 and 10). We call the function twice with the “await” operator.

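import asyncio
import random
async def func():
    tim = random.randint(1,10)
    await asyncio.sleep(tim)
    print(f"Function finished after {tim} seconds")
    
# Top-level await works in a notebook; in a plain script, put these
# calls into a coroutine and start it with asyncio.run()
await func()
await func()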

In the output, you can see that the second call only started after the first one had finished. The execution happened sequentially, not concurrently.

Function finished after 9 seconds
Function finished after 9 seconds

Python also supports concurrent execution. This is done via Tasks. We use the method “create_task” from the asyncio library to schedule a function so that it runs concurrently with other tasks on the event loop. In order to see how this works, we invoke the function several times and add a print statement at the end of the code.

asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
print("doing something else ...")

This now looks very different from the previous sample. The print statement is the first to show up, and all code paths finish after at most 10 seconds. This is because (A) “create_task” only schedules the function and returns immediately, so the print statement runs before any of the tasks has finished sleeping, and (B) all tasks run concurrently and the maximum sleep interval is 10 seconds.

doing something else ...
Function finished after 1 seconds
Function finished after 1 seconds
Function finished after 3 seconds
Function finished after 4 seconds
Function finished after 5 seconds
Function finished after 7 seconds
Function finished after 7 seconds
Function finished after 7 seconds
Function finished after 8 seconds
Function finished after 10 seconds
Function finished after 10 seconds
Function finished after 10 seconds
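
If you also need the results of the tasks, “asyncio.gather” runs several awaitables concurrently and returns their results in the order in which they were passed. The following sketch is written as a plain script (hence “asyncio.run”) and uses short, fixed delays instead of random ones; the names work and main are assumptions made for this illustration:

```python
import asyncio
import time


async def work(name, delay):
    await asyncio.sleep(delay)
    return f"{name} finished after {delay} seconds"


async def main():
    start = time.perf_counter()
    # All three coroutines sleep at the same time
    results = await asyncio.gather(work("a", 0.2), work("b", 0.1), work("c", 0.3))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
for line in results:
    print(line)
print(f"Total: {elapsed:.2f} seconds")
```

The total time is close to the longest single delay rather than the sum of all delays.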

However, there are also some issues with async operations. You can never say how long it takes a task to execute. It could finish fast or it could also take forever, due to a weak network connection or an overloaded server. Therefore, you might want to specify a timeout, which is the maximum time to wait for an operation. In Python, this is done via the “wait_for” function. It takes the awaitable to execute and the timeout in seconds. In case the call runs into a timeout, a “TimeoutError” is raised. This allows us to surround it with a try-block.

try:
    await asyncio.wait_for(func(), timeout=3.0)
except asyncio.TimeoutError:
    print("Timeout occurred")

Our function sleeps between 1 and 10 seconds, so in most cases it will run into the timeout. The output then looks like this:

Timeout occurred
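
To make the timeout reproducible, the random delay can be replaced by a fixed one. The following sketch is written as a plain script and uses a hypothetical coroutine slow that always takes longer than the timeout:

```python
import asyncio


async def slow():
    # Always sleeps longer than the timeout used below
    await asyncio.sleep(1.0)
    return "done"


async def main():
    try:
        return await asyncio.wait_for(slow(), timeout=0.1)
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising the error
        return "timeout"


outcome = asyncio.run(main())
print(outcome)
```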

Each task that should be executed can also be controlled. Whenever you call the “create_task” function, it returns a Task object. A task can either be done, cancelled or contain an error. In the next sample, we create a new task and wait for its completion. We then check if the task was done or cancelled. You could also check for an error and retrieve the error message from it.

task = asyncio.create_task(func())
print("running task")
await task
# done() is also True for a cancelled task, so check cancelled() first
if task.cancelled():
    print("Task was cancelled")
elif task.done():
    print("Task was done")

In our case, no error should have occurred and thus the output should be the following:

running task
Function finished after 8 seconds
Task was done
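
The other two states can be sketched as follows: one task is cancelled before it finishes and one ends with an error. The sample is written as a plain script, and the names fail, sleeper and failing are made up for this illustration:

```python
import asyncio


async def fail():
    raise ValueError("boom")


async def main():
    # A task that is cancelled before it can finish
    sleeper = asyncio.create_task(asyncio.sleep(10))
    await asyncio.sleep(0)  # give the task a chance to start
    sleeper.cancel()
    try:
        await sleeper
    except asyncio.CancelledError:
        pass

    # A task that ends with an exception
    failing = asyncio.create_task(fail())
    try:
        await failing
    except ValueError:
        pass

    return (
        sleeper.done(),
        sleeper.cancelled(),
        failing.done(),
        failing.cancelled(),
        repr(failing.exception()),
    )


states = asyncio.run(main())
print(states)
```

Both tasks report done(); only cancelled() and exception() tell the two outcomes apart.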

Now we know how to work with async operations in Python. In our next tutorial, we will have a deeper look into how to work with Strings.

Decorators are powerful things in most programming languages. They help us make code more readable and add functionality to a method or class. Decorators are added above the method or class declaration in order to create some behaviour. We differentiate between two kinds of decorators: method decorators and class decorators. In this tutorial, we will have a look at class decorators.

Class decorators

Class decorators are used to add some behaviour to a class. Normally, you would use this when you want to add some kind of behaviour to a class that is outside of its inheritance structure – e.g. by adding something that is too abstract to bring it to the inheritance structure itself.

The definition of that is very similar to the method decorators:

@DECORATORNAME
class CLASSNAME():
    CLASS-BLOCK

The decorator definition is simpler than in the last tutorial’s sample. We create a function that takes a class. Within it, we define the function that we want to “append” to the class. We call this function “fly”; it simply prints “Now flying …” to the console. Since it becomes a method of the class, it takes “self” as its first parameter. To add this function to the class, we call the “setattr” function of Python. We then return the class itself, so that “Plane()” still creates an instance of the decorated class.

def altitude(cls):
    def fly(self):
        print("Now flying ... ")
    # Attach fly as a method of the decorated class
    setattr(cls, "fly", fly)
    return cls

Now, our decorator is ready to be used. We first need to create a class. Therefore, we re-use the sample of the vehicles, but simplify it a bit. We create a class “Vehicle” that has a function “accelerate” and create two sub classes “Car” and “Plane” that both inherit from “Vehicle”. The only difference now is that we add a decorator to the class “Plane”. We want to add the possibility to fly to the Plane.

class Vehicle:
    
    speed = 0
        
    def accelerate(self, speed):
        self.speed = speed
class Car(Vehicle):
    pass
@altitude
class Plane(Vehicle):
    pass

Now, we want to test our output:

c = Car()
p = Plane()
c.accelerate(100)
print(c.speed)
p.fly()

Output:

100
Now flying ... 

Basically, there are a lot of scenarios when you would use class decorators. For instance, you can add functionality to classes that contain data in order to convert this into a more readable table or alike.
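
As a sketch of the table scenario just mentioned, the following class decorator adds a “to_table” method to a data class. The names tabular, to_table and Measurement are hypothetical and only serve this illustration:

```python
def tabular(cls):
    # Method that will be attached to the decorated class
    def to_table(self):
        rows = [f"{key:<8}| {value}" for key, value in vars(self).items()]
        return "\n".join(rows)

    cls.to_table = to_table
    return cls


@tabular
class Measurement:
    def __init__(self, sensor, value):
        self.sensor = sensor
        self.value = value


m = Measurement("temp", 21.5)
print(m.to_table())
```

The decorator does not touch the inheritance structure of Measurement; it only attaches one additional method.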

In our next tutorial, we will look at the await-operator.

Decorators are powerful things in most programming languages. They help us make code more readable and add functionality to a method or class. Decorators are added above the method or class declaration in order to create some behaviour. We differentiate between two kinds of decorators: method decorators and class decorators. In this tutorial, we will have a look at method decorators.

Method decorators

Method decorators are used to perform some kind of behaviour on a method. For instance, you could add a stopwatch to check for performance, configure logging or make some checks on the method itself. All of that is done by “wrapping” the method into a decorator method. This means that the decorated method is executed inside the decorator method. This, for instance, would allow us to surround a method with a try-except block and thus forward all exceptions that occur in a method to a global error handling tool.

The definition of that is very easy:

@DECORATORNAME
def METHODNAME():
    METHOD-BLOCK

Basically, the only thing that you need is the “@” and the decorator name. There are several decorators available, but now we will create our own decorator. We start by creating a performance counter. The goal of that is to measure how long it takes a method to execute. We therefore create the decorator from scratch.

As stated, the decorator takes the function and executes it inside the decorator function. We start by defining our performance counter as a function that takes one argument: the function to wrap. Within this function, we add another function (yes, Python allows nested functions), which is typically called either “wrapper” or “inner”. I call it “inner”. The inner function should be able to pass on arguments; a function call can have 0 to n arguments. To support this, we declare “*args” and “**kwargs”. Both accept a variable number of arguments. The difference is that “*args” collects positional arguments into a tuple, whereas “**kwargs” collects named arguments (e.g. person="Pete") into a dictionary.

In this inner function, we now create the start-variable that is the time once the performance counting should start. After the start-variable, we call the function (any function which we decorate) by passing on all the *args and **kwargs. After that, we measure the time again and do the math. Simple, isn’t it? However, we haven’t decorated anything yet. This is now done by creating a function that sleeps and prints text afterwards. The code for this is shown below.

import time
def perfcounter(func):
    def inner(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)  # This is the invocation of the function!
        print(time.perf_counter() - start)
        return result  # Hand the result of the wrapped function back to the caller
    return inner
    
@perfcounter
def printText(text):
    time.sleep(0.3)
    print(text)
    
printText("Hello Decorator")

Output:

Hello Decorator
0.3019062000021222
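
A common refinement is “functools.wraps”, which copies the name and the docstring of the wrapped function to the inner function. The following sketch shows it with a hypothetical function add that also returns a value:

```python
import functools
import time


def perfcounter(func):
    @functools.wraps(func)  # keep the name and docstring of the wrapped function
    def inner(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(time.perf_counter() - start)
        return result  # pass the return value back to the caller
    return inner


@perfcounter
def add(a, b):
    """Add two numbers."""
    time.sleep(0.05)
    return a + b


total = add(2, 3)
print(total)
print(add.__name__)
```

Without “functools.wraps”, the last line would print “inner” instead of “add”, which makes logs and stack traces harder to read.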

As you can see, we are now capable of adding this perfcounter decorator to any kind of function we like. Normally, it makes sense to add this to functions which take rather long, e.g. in Spark jobs or web requests.

In the next sample, I create a type checker decorator. This type checker should validate that all parameters passed to a function are of a specific type. For example, we want to ensure that all parameters passed to a multiplication function are of type integer and that all parameters passed to a concatenation function are of type string. You could also do this check inline, but it is much easier if you write the check once and simply apply it to the function as a decorator. It also decreases the number of code lines and thus increases the readability of your code. The decorator for that should look like the following:

@typechecker(int)

For integer values and

@typechecker(str)

for string values.

The only difference now is that the decorator itself takes parameters as well, so we need to wrap the function into another function – compared to the previous sample, another level is added. What are the steps necessary?

  1. Create the method to get the parameter: def typechecker(type)
  2. Create the outer function that takes the function and holds the inner function
  3. Create the function block that holds the inner function and a type checker:
    1. We add a function called “isInt(arg)” that checks if the argument passed is of a specific type. We can use “isinstance” to check if an argument is of a specific type – e.g. int or str. If it isn’t of the expected type, we raise an error
    2. We add the inner function with args and kwargs. In this function, we iterate over all args and kwargs passed and check it against the above function (isInt). If all checks succeed, we invoke the wrapped function.
Sounds a bit complex? Don't worry, it isn't that complex at all. Let's have a look at the code:

def typechecker(type):
    def check(func):
        def isInt(arg):
            if not isinstance(arg, type):
                raise TypeError("Only full numbers permitted. Please check")
        def inner(*args, **kwargs):
            for arg in args:
                isInt(arg)
            for kwarg in kwargs:
                # Iterating over kwargs yields the names, so look up the value to check
                isInt(kwargs[kwarg])
            return func(*args, **kwargs)
        return inner
    return check

Now, since we are done with the decorator itself, let’s decorate some functions. We create two functions. The first one multiplies all values passed to the function. The number of values is variable. The second function concatenates all strings passed to the function. We decorate the two functions with the typechecker decorator defined above.

@typechecker(int)
def mulall(*args):
    # Start with 1, the neutral element of multiplication
    res = 1
    for arg in args:
        res *= arg
    return res

@typechecker(str)
def concat(*args):
    res = ""
    for arg in args:
        res += arg
    
    return res

I guess you can now see the benefit of decorators. We can influence the behaviour of a function and create code snippets that are re-usable. But now, let’s call the functions to see if our decorator works as expected. Note: the third invocation should produce an error 🙂

print(mulall(1,2,3))
print(concat("a", "b", "c"))
print(mulall(1,2,"a"))

Output:

6
abc

… and the error message:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-cd2213a0d884> in <module>
     35 print(mulall(1,2,3))
     36 print(concat("a", "b", "c"))
---> 37 print(mulall(1,2,"a"))

<ipython-input-6-cd2213a0d884> in inner(*args, **kwargs)
      7         def inner(*args, **kwargs):
      8             for arg in args:
----> 9                 isInt(arg)
     10 
     11             for kwarg in kwargs:

<ipython-input-6-cd2213a0d884> in isInt(arg)
      3         def isInt(arg):
      4             if not isinstance(arg, type):
----> 5                 raise TypeError("Only full numbers permitted. Please check")
      6 
      7         def inner(*args, **kwargs):

TypeError: Only full numbers permitted. Please check
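
In a program, you would usually catch this error instead of letting it end the run. The following self-contained sketch uses a compact variant of the type checker (with a generic error message, which differs from the one above) and shows that named arguments are checked as well; the function add is a hypothetical example:

```python
def typechecker(expected):
    def check(func):
        def inner(*args, **kwargs):
            # Check positional values and the values of named arguments
            for value in list(args) + list(kwargs.values()):
                if not isinstance(value, expected):
                    raise TypeError(f"Expected {expected.__name__}, got {value!r}")
            return func(*args, **kwargs)
        return inner
    return check


@typechecker(int)
def add(a, b=0):
    return a + b


ok = add(1, b=2)

try:
    add(1, b="a")
    message = None
except TypeError as err:
    message = str(err)

print(ok)
print(message)
```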

I hope you like decorators. In my opinion, they are very helpful and provide great value. In the next tutorial, I will show how class decorators work.

Now that we have substantial knowledge of Python, let’s look at one very important thing with software: error handling. Like most other languages, Python also provides error handling with exceptions. Let’s have a look at how this works.

Error handling with Exceptions

Each process of error handling starts with a try-statement. This is a block where an error might occur. If an error occurs in the try-block, an exception is raised in Python. The interpreter then looks for a surrounding exception handler. It is best to handle exceptions as specifically as possible, since it will prevent errors later in the program. Python has a huge list of pre-defined exceptions, so it is often possible to handle errors without defining your own exceptions. A try-block might also have a finally-block. This is useful if you worked with files or opened some connections. In the finally-block, you can close the connections. Note that the finally-block is executed every time, whether an error occurred or not. The syntax for exceptions is this:

try:
    TRY-BLOCK
except ERRORNAME:
    ERROR-BLOCK
finally:
    FINAL-BLOCK

If you look for exceptions in the Python documentation, note that most built-in exception names end with “Error” rather than “Exception”, even though the base class is called “Exception”. In the following sample, we will create a division by zero error. Therefore, we define a method “divide” which takes two parameters. We surround the division with the try-statement and check for the ZeroDivisionError. Note that the finally-block is executed in all calls to the method.

def divide(val1, val2):
    try:
        return val1 / val2
    except ZeroDivisionError:
        print("Division by zero - return 0 instead")
        return 0
    finally:
        print("Cleanup everything")
        
res = divide(2, 3)
res2 = divide(2, 0)
print("The result for res is: " + str(res) + " and for res2 it is: " + str(res2))

Output:

Cleanup everything
Division by zero - return 0 instead
Cleanup everything
The result for res is: 0.6666666666666666 and for res2 it is: 0
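
A try-statement can have several except-blocks and an optional else-block, which only runs if no error occurred. The following sketch extends the idea of the divide sample; the function name safe_ratio and the returned texts are assumptions made for this illustration:

```python
def safe_ratio(val1, val2):
    try:
        result = int(val1) / int(val2)
    except ValueError:
        # int() could not convert one of the inputs
        return "not a number"
    except ZeroDivisionError:
        return "division by zero"
    else:
        # Only reached when the try-block raised no exception
        return result
    finally:
        # Runs in every case, even though the blocks above return
        print("Cleanup everything")


print(safe_ratio("4", "2"))
print(safe_ratio("x", 2))
print(safe_ratio(1, 0))
```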

So, this was easy, wasn’t it? Now, let’s have a look at how to raise your own exceptions.

Creating own exceptions

Basically, an exception can be “thrown” with the “raise” statement. Python is much more modest with that. C-like languages throw exceptions at you, whereas Python just kindly raises one ;). The statement to raise an exception is written like this:

raise ERRORNAME

In our next sample, we want to raise an error for a car that drives too fast. Therefore, we first need to create our own exception. All exceptions inherit from “Exception”. So, we first create a class that inherits from that. We call the error “TooFastError”. We add no further functionality and just write “pass”, a placeholder statement that does nothing and keeps the class body syntactically valid. We then define a function “accelerate”, which gets exactly one parameter: speed. If speed is higher than 100, we raise our TooFastError. Let’s try it:

class TooFastError(Exception):
    pass
def accelerate(speed):
    if speed > 100:
        raise TooFastError("You can't drive at " + str(speed) + " the overall speed limit is 100!")
    else:
        print("Ok, let's go!")
    
accelerate(20)
accelerate(110)

Output:

Ok, let's go!
TooFastError                              Traceback (most recent call last)
<ipython-input-8-9be2666bf1f4> in <module>
      9 
     10 accelerate(20)
---> 11 accelerate(110)
<ipython-input-8-9be2666bf1f4> in accelerate(speed)
      4 def accelerate(speed):
      5     if speed > 100:
----> 6         raise TooFastError("You can't drive at " + str(speed) + " the overall speed limit is 100!")
      7     else:
      8         print("Ok, let's go!")
TooFastError: You can't drive at 110 the overall speed limit is 100!
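
An exception of your own can also carry data and be caught like any built-in exception. In the following sketch, the attributes speed and limit are additions made for this illustration and are not part of the sample above:

```python
class TooFastError(Exception):
    def __init__(self, speed, limit=100):
        # Pass the message to the base class so that str(err) returns it
        super().__init__(f"You can't drive at {speed}, the overall speed limit is {limit}!")
        self.speed = speed
        self.limit = limit


def accelerate(speed):
    if speed > 100:
        raise TooFastError(speed)
    return "Ok, let's go!"


try:
    accelerate(110)
    excess = 0
except TooFastError as err:
    # The handler can use the data stored on the exception
    excess = err.speed - err.limit
    print(err)

print(excess)
```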

Isn’t it beautiful to raise your own exceptions :)? In our next tutorial we will have a look at decorators in Python and at the Dataclass.

In the last tutorial, we’ve learned about methods in Python. To further increase the re-usability of your code, it is also possible to use classes in Python. With classes, you can bundle data and the methods that work on it into re-usable units.

Classes

With a class, we can group several methods together. Imagine you write an app for selling different kinds of vehicles: a car, a bike or a motorcycle. There are several items in your vehicle that are always the same, e.g. all would have a wheel. With a class, you can sum this up and re-use it in each of them. A class in Python is very similar to classes in C-like languages. However, there are several differences. The most striking one is that we don’t have any public-protected-private modifiers. This applies to the class itself and also to its methods and variables. A class is defined with the “class” keyword. The syntax of it looks like this:

class CLASSNAME:

    SOME_VARIABLES

    def __init__(self):
        INIT-BLOCK

    def METHOD(self):
        METHODS-BLOCK

As you can see, we have several possibilities with classes. First of all, defining a class is easy. You would then specify some variables within the class. The constructor, one thing you often use in C-like languages, is written with “def __init__(”. You can then specify parameters in the parentheses. One mandatory parameter is “self”. It refers to the current instance of the class and makes all attributes and methods of that object accessible. You will also need it in all further method definitions.

Let’s go back to the previous sample with the vehicles. In order to achieve that, we create a class “Vehicle” and add some variables and methods to it. The Class should look like the following. Create a new Notebook and name this notebook “vehicle”.

class Vehicle:
    
    vname = "Not Defined"
    curspeed = 0
    
    def name(self):
        print(self.vname)
        
        
    def accelerate(self, speed):
        self.curspeed += speed

Now comes the tricky part: since we are working in a notebook application and not in a comprehensive IDE, we first need to download the file as “Python” and then re-upload it. If you just renamed the notebook, it would result in a JSON document that can’t be read as Python code. We re-upload the file and name it “vehicle.py”. Make sure that it is in the same folder as your other notebook. We then switch back to the original notebook and use the vehicle we just created:

from vehicle import Vehicle
v = Vehicle()
v.name()

Output:

Not Defined

Note that the output is as expected: we explicitly named the vehicle “Not Defined” – so it isn’t an error. I guess you can imagine what I want to do with the vehicle – I want to use inheritance in the next sample 🙂

Inheritance

Often, it is necessary to inherit from a class. This is helpful when a sub-class is a specialisation of another class and thus shares some functions or behaviour with it. It is therefore easier to extend the behaviour without re-writing the code. To enable inheritance, we add the name of the parent class in parentheses after the name of the new class. If the parent class lives in another file, it also has to be imported. Everything else stays the same.

In the following sample, we create a new class and name it “car”. Basically, we extend the behaviour of our vehicle and add a “start” method to it. This will print a statement that the car has started with a specific number of HP. Create a new file in Jupyter and name it “car”:

from vehicle import Vehicle
class Car(Vehicle):
    def __init__(self, brand, hp):
        self.hp = hp
        self.vname = brand  # set on the instance; a bare "vname" would only be a local variable
        
    def start(self):
        print("Engine of " + self.vname + " started @ " + str(self.hp))

Do the same steps as in the “vehicle” file and go back to your initial Notebook. We instantiate a new Car and give it a name and some HP:

from car import Car
c = Car("Mercedes", 150)
print(c.hp)
c.name()

Output:

150
Mercedes

The attribute “hp” belongs to the “Car” class, whereas the method “name” is inherited from the “Vehicle” class. Let’s now use the “start” method:

c.start()

Output:

Engine of Mercedes started @ 150

Here, too, we use a method of our “Car” class.
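If you want to try inheritance without the download and re-upload steps, the following sketch puts both classes into a single cell. It assumes that the brand is stored on the instance via “self.vname”; the calls to “accelerate” and “isinstance” are added for illustration only.

```python
class Vehicle:
    vname = "Not Defined"
    curspeed = 0

    def name(self):
        print(self.vname)

    def accelerate(self, speed):
        self.curspeed += speed


class Car(Vehicle):
    def __init__(self, brand, hp):
        self.hp = hp
        self.vname = brand  # overrides the class-level default for this instance

    def start(self):
        print("Engine of " + self.vname + " started @ " + str(self.hp))


c = Car("Mercedes", 150)
c.accelerate(30)               # method inherited from Vehicle
c.accelerate(20)
print(c.curspeed)              # 50
print(isinstance(c, Vehicle))  # True: a Car is also a Vehicle
```

The class-level default “Vehicle.curspeed” stays at 0; only the instance “c” carries the accelerated value.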

Since now the code got more complex, it is necessary to think about how to work with Errors – in our next tutorial, we will look at error handling.

In the last tutorial, we’ve learned about the different control structures in Python. Now that we know control structures and basics of Python programming, let’s have a look into how to encapsulate code into functions. Therefore, we will have a look at methods and functions in Python.

Methods

Due to the overall differences from C-like languages, the method definition is also slightly different. However, it is very similar to the layout of the control structures. Let’s have a look:

def functionname(args):
    FUNCTION BLOCK
    return value

A function in Python is always defined with “def”. After that, a function name is provided. The parameters follow in parentheses. Because Python is dynamically typed, parameters are declared by name only, without a type. After the “:”, the function block starts. Everything within the function block needs to be indented. A function can also return a value with the “return” statement. The following function adds one to a number:

def myiter(n):
    return n + 1
val = myiter(33)
val

Output:

34
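Since parameters carry no type, the same function accepts any value that supports the operation inside it. The following sketch re-uses “myiter” from above; the float and string inputs are chosen purely for illustration.

```python
def myiter(n):
    return n + 1


print(myiter(33))   # 34
print(myiter(2.5))  # 3.5

# A string cannot be added to a number, so this call fails at runtime
try:
    myiter("33")
except TypeError as err:
    print("TypeError:", err)
```

The error only shows up when the function is called, not when it is defined.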

As you would expect, it is also possible to nest further levels by indenting control structures and the like. The following sample creates a range of the numbers 0 to 4 (“range(5)”) and passes it to a function that iterates over each item and prints it.

def printall(vals):
    for val in vals:
        print(val)
values = range(5)
printall(values)

Output:

0
1
2
3
4

Lambda Expressions

A very cool feature of most modern programming languages is the availability of lambda expressions. With them, it is possible to significantly reduce code complexity by writing simple functions as one-liners. For instance, the first sample, “myiter”, could also be written as a lambda expression. A lambda expression is an anonymous function defined in-line; it is often passed to another function that calls it on each item in a list or another iterable. It is very useful for data manipulations. A lambda expression is introduced with the following statement:

lambda variable: STATEMENT

In this case, “variable” stands for one or more parameters to work with in the following statement, which has to be a single expression. For the previous sample, it would be a number, so we only need one parameter. For a dictionary, you can either use a single parameter that holds the entry or provide both key and value as two parameters, for instance x, y. A lambda expression matching the previous sample is this:

v = map(lambda x: x + 1, values)
printall(v)

Output:

1
2
3
4
5

In the above statement, we used Python’s built-in “map” function, which calls a function (here a lambda) on each item of the specified iterable. We re-used the “values” range from the sample above. Note that “map” does not modify “values”; it returns a new iterator in which each item is increased by one. As you can see, it is very easy to work with lambda expressions in Python and they are very useful to keep your code simple and clean.
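Two details are easy to miss here: in Python 3, “map” returns a lazy iterator, and a lambda can take more than one parameter. The following sketch shows both; the two short lists passed to the second “map” call are made-up sample data.

```python
values = range(5)

# list() consumes the iterator returned by map and collects the results
incremented = list(map(lambda x: x + 1, values))
print(incremented)   # [1, 2, 3, 4, 5]
print(list(values))  # [0, 1, 2, 3, 4] -> the original range is unchanged

# A lambda with two parameters; map then takes one iterable per parameter
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20, 30]))
print(sums)          # [11, 22, 33]
```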

In the next tutorial, we will have a look at how to encapsulate methods and functions into classes and packages. We will also have a look at inheritance.

In the last tutorial, we learned about the basics of Python. Now, we focus on something every developer needs: control structures. In this post, I will explain the if-statement and two loops.

If-then-else

This is something every programmer learns at the very beginning. The good news is: Python can do it, too :). The syntax is very easy:

if expression:
    IF-BLOCK
elif expression:
    ELSE-IF-BLOCK
else:
    ELSE-BLOCK

An if-statement starts with “if”, immediately followed by the expression. Please note that there are no brackets around the expression like in C-like languages. After the expression comes a “:”, and then the if-block starts. Everything that should be executed in the if-block is written with an indent. After the if-block is finished, there is either an elif (else-if) block, an else block, or the end of the entire statement. The following example shows this:

ds = 12
if ds > 10:
    print("TRUE")
else: 
    print("FALSE")
    
if ds > 15:
    print("TRUE")
else:
    print("FALSE")

Output:

TRUE
FALSE

The if-statement also knows an else-if. Basically, you can check for different conditions within one statement. The following shows the else-if (elif) block:

if ds < 10:
    print("TRUE")
elif ds > 11:
    print("FALSE")

Output:

FALSE
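To see all three branches in one statement, here is a minimal sketch. The function name “classify” and the returned labels are made up for illustration.

```python
def classify(ds):
    # Conditions are checked from top to bottom; the first match wins
    if ds < 10:
        return "small"
    elif ds > 11:
        return "large"
    else:
        # Reached only if none of the conditions above is true
        return "in between"


print(classify(5))   # small
print(classify(12))  # large
print(classify(10))  # in between
```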

While-Loop

A very important loop is the while-loop. The while-loop executes code as long as a condition is true. A notable feature of Python is the “else” block of the while-loop. The else-block is executed once the condition of the while-loop becomes false; it is skipped if the loop is left with “break”. You can use this for cleanup or the like. The syntax of the while-loop is as follows:

while(expression):
    WHILE-BLOCK
else:
    ELSE-BLOCK

In Python, you can also use “continue” and “break” in your loop. Both have different effects: continue skips the rest of the current iteration, whereas break terminates the execution of the entire loop. You might need break for error handling in a loop. A simple loop counting down from 12 looks like the following:

ds = 12
while(ds > 0):
    ds -= 1
    if ds == 0: continue
    print(ds)
else:
    print("we're done here")

Output:

11
10
9
8
7
6
5
4
3
2
1
we're done here

In the above loop, we used the else-block and added a check that skips the print when the value is 0. We count downwards from 12 (but the output starts at 11, since the value is already decreased before the first print).

In the following sample, we replace “continue” with “break”. Because “break” leaves the loop early, the else-block is skipped and “we're done here” is not printed:

ds = 12
while(ds > 0):
    ds -= 1
    if ds == 0: break
    print(ds)
else:
    print("we're done here")

Output:

11
10
9
8
7
6
5
4
3
2
1

For-Loop

The for-loop is the other loop in Python that is used with Apache Spark. It is mainly used to iterate over datasets. For the for-loop, we have to forget how for-loops look in C-like languages. There is no numeric counter any more. We only specify a name for the current item and the collection or list to iterate over. The syntax is very easy:

for iterator in iterable:
    FOR-BLOCK

Normally, you would iterate over a list, a dictionary or the like. In our sample, we use the “persons” dictionary we created in a previous sample. Please note one thing: it contains different types, so not all keys and values are strings. If you want to concatenate and print them, you first need to convert each non-string value. That’s why we use “str()” for conversion:

for person in persons:
    print(str(person) + " is " + str(persons[person]))

Output:

mario is 35
vienna is austria
3 is age
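Instead of looking up each value with “persons[person]”, you can also iterate over key/value pairs directly with “items()”. In the sketch below, the content of “persons” is an assumption reconstructed from the output above; the original dictionary was defined in an earlier tutorial.

```python
# Assumed content, reconstructed from the printed output above
persons = {"mario": 35, "vienna": "austria", 3: "age"}

lines = []
for key, value in persons.items():
    # str() converts non-string keys and values before concatenation
    lines.append(str(key) + " is " + str(value))

for line in lines:
    print(line)
```

Dictionaries keep their insertion order in current Python versions (3.7 and later), so the lines appear in the order in which the entries were added.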

The output here is also very clear. Now you might be disappointed by the missing counter-based for-loop. The good thing is that you can still get one with the built-in “range” function. It isn’t the same as you might be used to, but it might get you into Python faster ;). With “range”, the sample looks like this:

for i in range(5):
    print(i)

Output:

0
1
2
3
4
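“range” also accepts a start value and a step, and the built-in “enumerate” gives you a counter while iterating over a collection. The following sketch shows both; the list “names” is made-up sample data.

```python
# range(start, stop, step): stop is exclusive
stepped = list(range(2, 10, 3))
print(stepped)    # [2, 5, 8]

# A negative step counts downwards
countdown = list(range(5, 0, -1))
print(countdown)  # [5, 4, 3, 2, 1]

# enumerate yields (index, item) pairs, so no manual counter is needed
names = ["car", "bike", "motorcycle"]
indexed = list(enumerate(names))
for i, name in indexed:
    print(str(i) + ": " + name)
```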

Easy, isn’t it? Now, we are ready to have a look at functions in our following tutorial.