In the last tutorials, we had a look at methods, classes and deorators. Now, let’s have a brief look at asynchronous operations in Python. Most of the time, this is anyway abstracted for us via Spark, but it is nevertheless relevant to have some basic understanding of it. Basically, you define a method to be asynchronous by simply adding “async” as keyword ahead of the method definition. This is written like that:

async def FUNCTION_NAME():

FUNCTION-BLOCK

Another keyword in that context is “await”. Basically, every function that is doing something asynchronous is awaitable. When adding “await”, nothing else happens until the asynchronous function has finished. This means that you might loose the benefit of asynchronous execution but get better handling when working with web data. In the following code, we create an async function that sleeps some seconds (between 1 and 10). We call the function twice with the “await” operator.

import asyncio
import random
async def func():
    tim = random.randint(1,10)
    await asyncio.sleep(tim)
    print(f"Function finished after {tim} seconds")
    
await func()
await func()

In the output, you can see that it was first waited for the first function to finish and only then the second one was executed. Basically, all of the execution happened sequentially, not in parallel.

Function finished after 9 seconds
Function finished after 9 seconds

Python also knows parallel execution. This is done via Tasks. We use the Method “create_task” from the asyncio library in order to execute a function in parallel. In order to see how this works, we invoke the function several times and add a print-statement at the end of the code.

asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
asyncio.create_task(func())
print("doing something else ...")

This now looks very different to the previous sample. The print statement is the first to show up, and all code path finish after 9 seconds max. This is due to the fact that (A) the first execution finishes after 1 second – thus the print statement is the first to be shown, since it is executed immediately. (B) Everything is executed in parallel and the maximum sleep interval is 9 seconds.

doing something else ...
Function finished after 1 seconds
Function finished after 1 seconds
Function finished after 3 seconds
Function finished after 4 seconds
Function finished after 5 seconds
Function finished after 7 seconds
Function finished after 7 seconds
Function finished after 7 seconds
Function finished after 8 seconds
Function finished after 10 seconds
Function finished after 10 seconds
Function finished after 10 seconds

However, there are also some issues with async operations. You can never say how long it takes a task to execute. It could finish fast or it could also take forever, due to a weak network connection or an overloaded server. Therefore, you might want to specify a timeout, which is the maximum an operation should be waited for. In Python, this is done via the “wait_for” method. It basically takes the function to execute and the timeout in seconds. In case the call runs into a timeout, a “TimeoutError” is raised. This allows us to surround it with a try-block.

try:
    await asyncio.wait_for(func(), timeout=3.0)
except asyncio.TimeoutError:
    print("Timeout occured")

In two third of the cases, our function will run into a timeout. The function should return this:

Timeout occured

Each task that should be executed can also be controlled. Whenever you call the “create_task” function, it returns a Task-object. A task can either be done, cancelled or contain an error. In the next sample, we create a new task and wait for it’s completion. We then check if the task was done or cancelled. You could also check for an error and retrieve the error message from it.

task = asyncio.create_task(func())
print("running task")
await task
if task.done():
    print("Task was done")
elif task.cancelled():
    print("Task was cancelled")

In our case, no error should have occurred and thus the output should be the following:

running task
Function finished after 8 seconds
Task was done

Now we know how to work with async operations in Python. In our next tutorial, we will have a deeper look into how to work with Strings.

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!