Python has a really great standard library. In the next two tutorial sessions, we will have a first look at this standard library. We will mainly focus on what is relevant for Spark developers in the long run. Today, we will focus on FuncTools and IterTools, the next tutorial will deal with some mathematical functions. But first, let’s start with “reduce

The reduce() function

Basically, the reduce function takes an iterable list and executes a function on it. In most of the cases, this will be a lambda function but it could also be a normal function. In our sample, we take some values and create the sum of it by moving from left to right:

from functools import reduce
values = [1,4,5,3,2]
reduce(lambda x,y: x+y, values)

And we get the expected output

15

The sorted() function

Another very useful function is the “sorted” function. Basically, this sorts values or pairs of tuples in an array. The easiest way to apply it is to do it with our previous values (which were unsorted!):

print(sorted(values))

The output is now in the expected sorting:

[1, 2, 3, 4, 5]

However, we can still improve this by even sorting complex objects. Sorted takes a key to sort on, and this is passed as a lamdba expression. We state that we want to sort it by age. Make sure that you still have the “Person” class from our previous tutorial:

perli = [Person("Mario", "Meir-Huber", 35, 1.0), Person("Helena", "Meir-Huber", 5, 1.0)]
print(perli)
print(sorted(perli, key=lambda p: p.age))

As you can see, our values are now sorted based on the age member.

[Person(firstname='Mario', lastname='Meir-Huber', age=35, score=1.0), Person(firstname='Helena', lastname='Meir-Huber', age=5, score=1.0)]
[Person(firstname='Helena', lastname='Meir-Huber', age=5, score=1.0), Person(firstname='Mario', lastname='Meir-Huber', age=35, score=1.0)]

The chain() function

The chain() method is very helpful if you want to hook up two lists with the same objects in it. Basically, we take the Person-Class again and create a new instance. We then chain the two lists together:

import itertools
perstwo = [Person("Some", "Other", 46, 1.0)]
persons = itertools.chain(perli, perstwo)
for pers in persons:
    print(pers.firstname)

Also here, we get the expected output:

Mario
Helena
Some

The groupby() function

Another great feature when working with data is grouping of data. Python also allows us to do so. The groupby() method takes two parameters: the list to group and the key as lambda expression. We create a new array of tuple pairs and group by the family name:

from itertools import groupby
pl = [("Meir-Huber", "Mario"), ("Meir-Huber", "Helena"), ("Some", "Other")]
    
for k,v in groupby(pl, lambda p: p[0]):
    
    print("Family {}".format(k))
    
    for p in v:
        print("\tFamily member: {}".format(p[1]))

Basically, the groupby() method returns the key (as the value type) and the objects as list in the key group. This means that another iteration is necessary in order to access the elements in the group. The output of the above sample looks like this:

Family Meir-Huber
	Family member: Mario
	Family member: Helena
Family Some
	Family member: Other

The repeat() function

A nice function is the repeat() function. Basically, it copies an element several times. For instance, if we want to copy a person 4 times, this can be done like this:

lst = itertools.repeat(perstwo, 4)
for p in lst:
    print(p)

And also the output is just as expected:

[Person(firstname='Some', lastname='Other', age=46, score=1.0)]
[Person(firstname='Some', lastname='Other', age=46, score=1.0)]
[Person(firstname='Some', lastname='Other', age=46, score=1.0)]
[Person(firstname='Some', lastname='Other', age=46, score=1.0)]

The takewhile() and the dropwhile() function

Two functions – takewhile and dropwhile – are also very helpful in Python. Basically, they are very similar, but their result is the opposite form each other. takewhile runs until a condition is true, dropwhile runs once a condition is false. Takewhile will take elements from an array/list as long as the predicate is true (e.g. lower than 20, this would mean that elements are only considered as long as they are below 20) – Dropwhile with the same condition would remove elements as long as their values are below 20. The following sample shows this:

vals = range(1,40)
for v in itertools.takewhile(lambda vl: vl<20, vals):
    print(v)
    
print("######")
for v in itertools.dropwhile(lambda vl: vl<20, vals):
    print(v)

And also here, the output is as expected:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
######
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

As you can see, these are quite helpful functions. In our last Python tutorial, we will have a look at some basic mathematical and statistical functions.

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!