In the last tutorials, we had a look at methods, classes and decorators. Now, let’s have a brief look at asynchronous operations in Python. Most of the time, Spark abstracts this away for us, but it is nevertheless useful to have a basic understanding of it. In this tutorial, we will look at Python’s async and await functionality.
Python Async and await functionality
You define a function as asynchronous by adding the “async” keyword in front of the function definition. It is written like this:
async def FUNCTION_NAME():
    FUNCTION-BLOCK
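As a minimal sketch of that syntax: calling an async function does not run its body, it only creates a coroutine object that still has to be executed by an event loop. The function name `greet` is purely illustrative, and the example assumes a plain script on Python 3.7 or later, where `asyncio.run` is available (inside a notebook, which already runs an event loop, you would write `await greet("world")` instead).

```python
import asyncio


async def greet(name: str) -> str:
    # Hypothetical example function; the body of an async function
    # is ordinary Python code.
    return f"Hello, {name}"


# Calling the function only creates a coroutine object, nothing runs yet.
coro = greet("world")
is_coroutine = asyncio.iscoroutine(coro)

# asyncio.run starts an event loop, runs the coroutine and returns its result.
result = asyncio.run(coro)
print(is_coroutine, result)
```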
Another keyword in that context is “await”. Anything that can be used with “await”, such as a call to an async function, is called awaitable. When you add “await”, the calling code pauses at that line until the awaited operation has finished; other tasks that are already scheduled can still run in the meantime. If you simply await one call after another, you lose the benefit of asynchronous execution, but the order of operations is easier to reason about, for example when working with web data. In the following code, we create an async function that sleeps for a random number of seconds (between 1 and 10). We call the function twice with the “await” keyword.
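import asyncio
import random

async def func():
    # sleep for a random number of seconds between 1 and 10 (inclusive)
    tim = random.randint(1, 10)
    await asyncio.sleep(tim)
    print(f"Function finished after {tim} seconds")

# top-level await works in a notebook or the asyncio REPL;
# in a script, wrap these calls in an async function and pass it to asyncio.run()
await func()
await func()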
In the output, you can see that the program first waited for the first call to finish and only then started the second one. All of the execution happened sequentially, not concurrently.
Function finished after 9 seconds
Function finished after 9 seconds
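To make the sequential behaviour measurable, here is a small sketch that times two awaited calls. It uses short fixed delays instead of random ones so the result is predictable; the names `work` and `main` are illustrative, and the example again assumes a plain script started with `asyncio.run`.

```python
import asyncio
import time


async def work(delay: float) -> float:
    # Stand-in for a slow operation such as a web request.
    await asyncio.sleep(delay)
    return delay


async def main() -> float:
    start = time.perf_counter()
    # Each await finishes before the next call starts.
    await work(0.1)
    await work(0.2)
    return time.perf_counter() - start


elapsed_sequential = asyncio.run(main())
print(f"sequential run took {elapsed_sequential:.2f} seconds")
```

The elapsed time is roughly the sum of both delays, which is exactly what "sequential" means here.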
Python can also run several async functions concurrently. This is done via tasks. We use the function “create_task” from the asyncio library to schedule a function for execution on the event loop without waiting for it. To see how this works, we invoke the function several times and add a print statement at the end of the code.
Parallel execution in Python async
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
print("doing something else ...")
This now looks very different from the previous sample. The print statement is the first to show up, and all code paths finish after 10 seconds at most. This is due to the fact that (A) “create_task” only schedules the function and returns immediately, so the print statement runs before any of the tasks has finished, and (B) all tasks wait concurrently, so the total time is determined by the longest sleep interval, which is at most 10 seconds.
doing something else ...
Function finished after 1 seconds
Function finished after 1 seconds
Function finished after 3 seconds
Function finished after 4 seconds
Function finished after 5 seconds
Function finished after 7 seconds
Function finished after 7 seconds
Function finished after 7 seconds
Function finished after 8 seconds
Function finished after 10 seconds
Function finished after 10 seconds
Function finished after 10 seconds
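In a script you usually want to wait for the tasks and collect their results rather than fire and forget them. One common way to do that, sketched here as an alternative to the bare “create_task” calls, is `asyncio.gather`, which runs several awaitables concurrently and returns their results in the order they were passed in. The helper names and the short fixed delays are assumptions chosen for illustration.

```python
import asyncio
import time


async def work(delay: float) -> float:
    # Stand-in for a slow operation such as a web request.
    await asyncio.sleep(delay)
    return delay


async def main():
    start = time.perf_counter()
    # gather wraps the coroutines in tasks and waits for all of them.
    results = await asyncio.gather(work(0.1), work(0.2), work(0.3))
    return results, time.perf_counter() - start


results, elapsed_concurrent = asyncio.run(main())
print(results, f"{elapsed_concurrent:.2f} seconds")
```

The total time is close to the longest delay, not the sum of all delays. If you do use “create_task” directly in a script, keep a reference to the returned task; the asyncio documentation warns that a task without any reference can be garbage-collected before it finishes.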
However, there are also some issues with async operations. You can never say for sure how long a task takes to execute. It could finish fast or it could take forever, due to a weak network connection or an overloaded server. Therefore, you might want to specify a timeout, which is the maximum time to wait for an operation. In Python, this is done via the “wait_for” function. It takes the awaitable to execute and the timeout in seconds. If the timeout expires, the operation is cancelled and a “TimeoutError” is raised, which we can catch with a try block.
Dealing with TimeoutError in Python
try:
    await asyncio.wait_for(func(), timeout=3.0)
except asyncio.TimeoutError:
    print("Timeout occurred")
In roughly two thirds of the cases (whenever the random sleep is longer than 3 seconds), our function will run into a timeout. The code then prints this:
Timeout occurred
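Because the sleep time above is random, the outcome differs from run to run. The following sketch uses fixed delays so that both outcomes can be seen side by side; `slow` and `run_with_timeout` are hypothetical helper names, and the example assumes a plain script started with `asyncio.run`.

```python
import asyncio


async def slow(delay: float) -> str:
    # Stand-in for an operation with an unpredictable duration.
    await asyncio.sleep(delay)
    return "finished"


async def run_with_timeout(delay: float, timeout: float) -> str:
    try:
        # wait_for cancels the operation if the timeout expires first.
        return await asyncio.wait_for(slow(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"


async def main():
    in_time = await run_with_timeout(delay=0.05, timeout=1.0)
    too_late = await run_with_timeout(delay=1.0, timeout=0.05)
    return in_time, too_late


outcomes = asyncio.run(main())
print(outcomes)
```

Returning a marker string instead of printing inside the except block is just one design choice; in real code you might retry, log the failure or re-raise.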
Each task can also be inspected and controlled. Whenever you call the “create_task” function, it returns a Task object. A finished task has either completed normally, been cancelled or raised an error. In the next sample, we create a new task and wait for its completion. We then check if the task was done or cancelled. You could also check for an error and retrieve the error message from it.
Create_Task in Python
task = asyncio.create_task(func())
print("running task")
await task
# done() is also True for a cancelled task, so check cancelled() first
if task.cancelled():
    print("Task was cancelled")
elif task.done():
    print("Task was done")
In our case, no error should have occurred and thus the output should be the following:
running task
Function finished after 8 seconds
Task was done
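The cancelled and error states mentioned above can be sketched as follows. One task raises an error, the other is cancelled with `Task.cancel()`; `asyncio.wait` is used to wait for both because it does not re-raise their errors. The function names are illustrative. Note that `exception()` must not be called on a cancelled task, since that raises `CancelledError`, so `cancelled()` is checked separately.

```python
import asyncio


async def fails():
    await asyncio.sleep(0.01)
    raise ValueError("boom")


async def sleeps_long():
    await asyncio.sleep(10)


async def main() -> dict:
    failing = asyncio.create_task(fails())
    long_running = asyncio.create_task(sleeps_long())

    await asyncio.sleep(0.05)   # give the failing task time to finish
    long_running.cancel()       # request cancellation of the other task

    # Wait until both tasks are finished without re-raising their errors.
    await asyncio.wait([failing, long_running])

    error = failing.exception()  # the ValueError raised inside fails()
    return {
        "failing_done": failing.done(),
        "failing_error": f"{type(error).__name__}: {error}",
        "long_running_done": long_running.done(),
        "long_running_cancelled": long_running.cancelled(),
    }


state = asyncio.run(main())
print(state)
```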
Now we know how to work with async operations in Python. In our next tutorial, we will have a deeper look into how to work with Strings.
If you are not yet familiar with Spark, have a look at the Spark Tutorial I created here. Also, I will create more tutorials on Python and Machine Learning in the future, so make sure to check back often to the Big Data & Data Science tutorial overview. I hope you liked this tutorial. If you have any suggestions on what to improve, please feel free to get in touch with me! If you want to learn more about Python, I also recommend the official page.