
Python for Spark Tutorial – Control structures in Python


In the last tutorial, we learned the basics of Python. Now we focus on something every developer needs: control structures. In this post, I will explain the if-statement and two loops.

If-then-else

This is something every programmer learns at the very beginning. The good news is: Python can do it too :). The syntax is very easy:

if expression:
    IF-BLOCK
elif expression:
    ELSE-IF-BLOCK
else:
    ELSE-BLOCK

An if-statement starts with “if”, immediately followed by the expression. Please note that no parentheses are needed around the expression, unlike in C-like languages. The expression is followed by a “:”. The if-block starts on the next line and is written with an indent; it contains everything that should be executed when the expression is true. After the if-block is finished, there is either an elif (else-if) block, an else block, or the end of the entire statement. The following example shows this:

ds = 12

if ds > 10:
    print("TRUE")
else: 
    print("FALSE")
    
if ds > 15:
    print("TRUE")
else:
    print("FALSE")
TRUE
FALSE

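The if-statement also supports else-if (elif), which lets you check several conditions within one statement. The conditions are tested from top to bottom, and only the first matching block is executed. The following shows the elif block, with ds still set to 12: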

if ds < 10:
    print("TRUE")
elif ds > 11:
    print("FALSE")
FALSE
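
Note that the elif sample above prints nothing for values such as 10 or 11, because neither condition matches and there is no else-block. A minimal sketch of a complete if/elif/else chain could look like the following. The function name "classify" and its labels are made up for illustration and are not part of the original tutorial:

```python
def classify(ds):
    """Return a label for ds (hypothetical helper, for illustration only)."""
    if ds < 10:
        return "small"
    elif ds > 11:
        return "large"
    else:
        # Reached only when both conditions above are false (10 <= ds <= 11)
        return "in between"


for value in (5, 10, 12):
    print(value, classify(value))
```

Only one branch runs per call, so the order of the conditions matters when they overlap.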

While-Loop

A very important loop is the while-loop. The while-loop executes its block as long as a condition is true. A notable feature of Python is the optional “else” block on the while-loop. The else-block is executed once the condition of the while-loop becomes false, but not when the loop is left with “break”. You can use this for cleanup or the like. The syntax of the while-loop is as follows:

while expression:
    WHILE-BLOCK
else:
    ELSE-BLOCK

In Python, you can also use “continue” and “break” in your loop. They have different effects: continue skips the rest of the current iteration and jumps back to the condition check, whereas break terminates the entire loop. You might need break for error handling in a loop. A simple loop counting down from 12 looks like the following:

ds = 12

while ds > 0:
    ds -= 1
    if ds == 0:
        continue
    print(ds)
else:
    print("we're done here")
11
10
9
8
7
6
5
4
3
2
1
we're done here

In the above loop, the else-block was used, and we added a check that skips the print when ds is 0. We count down from 12, but the output starts at 11, since the value is already decreased before the first print.

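In the following sample, we replace “continue” with “break”. Because break leaves the loop before the condition becomes false, the else-block is not executed and “we're done here” is not printed: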

ds = 12

while ds > 0:
    ds -= 1
    if ds == 0:
        break
    print(ds)
else:
    print("we're done here")
11
10
9
8
7
6
5
4
3
2
1
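
One place where the interplay of break and else can be useful is a search loop: break when the item is found, and let the else-block handle the "not found" case. The following is only a sketch; the helper name "find_first_negative" and its sample data are assumptions for illustration:

```python
def find_first_negative(values):
    """Return the index of the first negative value, or None (hypothetical helper)."""
    i = 0
    while i < len(values):
        if values[i] < 0:
            found = i
            break  # leaves the loop and skips the else-block
        i += 1
    else:
        # Runs only when the loop ended because the condition became false
        found = None
    return found


print(find_first_negative([3, 1, -4, 2]))
print(find_first_negative([1, 2]))
```

The first call stops at index 2 via break; the second call runs the loop to the end, so the else-block sets the result to None.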

For-Loop

The for-loop is the other loop you will use when writing Python for Apache Spark. It is mainly used to iterate over datasets. With the for-loop, we have to let go of how for-loops look in C-like languages: there is no numeric counter that we initialize, test and increment. We only specify a name for the current item and the collection to iterate over. The syntax is very easy:

for iterator in iterable:
    FOR-BLOCK

Normally, you would iterate over an array, list, map or the like. In our sample, we will use the “persons” map (a dictionary in Python) that we created in a previous sample. Iterating over a dictionary yields its keys, so we look up each value with persons[person]. Please note one thing: the keys and values have different types, so not all of them are strings. To concatenate them with “+” for printing, you first need to convert each non-string value. That’s why we use “str()” for conversion:

for person in persons:
    print(str(person) + " is " + str(persons[person]))
mario is 35
vienna is austria
3 is age
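
Since “persons” is not defined in this post, the dictionary in the following sketch is an assumption reconstructed from the output above. It shows an alternative way to write the same loop: items() yields key and value together, and an f-string converts non-string values for us, so str() is not needed:

```python
# Assumed contents of "persons", reconstructed from the output above
persons = {"mario": 35, "vienna": "austria", 3: "age"}

lines = []
for key, value in persons.items():
    # f-strings format non-string keys and values automatically
    lines.append(f"{key} is {value}")

print("\n".join(lines))
```

Dictionaries keep their insertion order (guaranteed since Python 3.7), so the lines appear in the order the entries were added.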

The output here is also very clear. Now you might be disappointed by the missing “counter” for-loop. The good thing is that you can still get one with the built-in “range” function (it is not a keyword). It isn’t the same as you might be used to, but might get you into Python faster ;). range(5) produces the numbers 0 to 4, so with range the sample looks like this:

for i in range(5):
    print(i)
0
1
2
3
4
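
range also accepts a start value and a step, and the built-in enumerate() gives you a counter while still iterating over the items themselves. A short sketch follows; the list "names" is sample data made up for illustration:

```python
# range(start, stop, step): start is included, stop is excluded
evens = list(range(0, 10, 2))
countdown = list(range(5, 0, -1))
print(evens)
print(countdown)

# enumerate() yields (index, item) pairs
names = ["mario", "vienna"]  # sample data for illustration only
indexed = []
for i, name in enumerate(names):
    indexed.append((i, name))
    print(i, name)
```

Here evens holds 0, 2, 4, 6, 8 and countdown holds 5 down to 1, which is closer to the counting loops known from C-like languages.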

Easy, isn’t it? Now we are ready to have a look at functions in the next tutorial.

I lead a team of Senior Experts in Data & Data Science as Head of Data & Analytics and AI at A1 Telekom Austria Group. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data & Data Science.
