One thing that everyone that deals with data is with classes that make data accessible to the code as objects. In all cases – and Python isn’t different here – wrapper classes and O/R mappers have to be written. However, Python has a powerful decorator for us at hand, that allows us to ease up or work. This decorator is called “dataclass”

The dataclass in Python

The nice thing about the dataclass decorator is that it enables us to add a great set of functionality to an object containing data without the need to re-write it always. Basically, this decorator adds the following functionality:

  • __init__: the constructor with all defined member variables. In order to use this, the member variables must be initialised with its type – which is rather uncommon in Python
  • __repr__: this pretty prints the class with all its member variables as a string
  • __eq__: a function to compare two classes for ordering
  • order functions: this creates several order functions such as __lt__ (lower than), __gt__ (greater than), __le__ (lower equals) and __ge__ (greater equals)
  • __hash__: adds a hash-function to the class
  • frozen: prevents the class from adding/deleting attributes on runtime

The definition for a dataclass in Python is easy:

class Classname():

You can also add each of the above described properties separately, e.g. with frozen=True or alike.

In the following sample, we will create a Person-Dataclass.

from dataclasses import dataclass
class Person:
    firstname: str
    lastname: str
    age: int
    score: float
p = Person("Mario", "Meir-Huber", 35, 1.0)

Please note the differences in how to annotate the member variables. You can see that there is now no need for a constructor anymore, since this is already done for you. When you print the class, the __repr__() function is called. The output should look like the following:

Person(firstname='Mario', lastname='Meir-Huber', age=35, score=1.0)

As you can see, the dataclass abstracts a lot of our problems. In the next tutorial we will have a look at IterTools and FuncTools.

If you are not yet familiar with Spark, have a look at the Spark Tutorial i created here. Also, I will create more tutorials on Python and Machine Learning in the future, so make sure to check back often to the Big Data & Data Science tutorial overview. I hope you liked this tutorial. If you have any suggestions and what to improve, please feel free to get in touch with me! If you want to learn more about Python, I also recommend you the official page.

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply