On demand data in Python, Part 3
Learn how to dramatically improve the performance of software with
input/output processing
In Part 1 of this series, you looked at Python iterators, and in Part 2
you learned about itertools. In this part, you’re going to learn about
coroutines, a special sort of generator function. You’ll also learn about
another powerful but tricky standard library module: asyncio.
Imagine you walk into a tiny restaurant. There are three tables and only
one server. You know what to expect. The server will come to you with the
menu, and return for your order. Once the cook has prepared the order, the
server will bring it to you. After you finish eating, the server brings you
the check, and returns to the table when you’re ready with payment.
The other tables are happily enjoying their meals because while you are
thinking of what to order, or while the cook is preparing your food, or
while you are eating your food, the server is available to handle any of
these steps with other tables. You might have to wait a few minutes from
time to time when more than one table needs attention from the server at
the same time, but the wait shouldn’t be very noticeable.
If it takes the average dinner party an hour from walking in the door to
walking out the door as the only occupied table, it might take an hour and
ten minutes if there is one other table, and an hour and twenty if both
other tables are occupied. That’s not too bad.
Now imagine instead that you walk in, but it turns out that the server only
attends to one table until they have completed all the steps. You could
wait up to two hours before you even begin your meal because the server
deals with the other tables first. This would not be a popular
restaurant.
Synchronous versus Asynchronous
Surprisingly, much of the computer code we write works like this very
inefficient restaurant. In computer terms, the unpopular restaurant case
is called serial operation and the server’s actions are said to be
synchronous. The restaurant situation we’re used to, where the server
switches among the tables as they progress through their needs, is called
concurrent operation (often loosely described as parallel) and the
server’s actions are said to be asynchronous.
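To make the difference concrete, here is a minimal sketch that times the two styles of service. The names visit_table, serve_serially, serve_concurrently, and timed are made up for this illustration, the delays are fractions of a second rather than minutes, and asyncio.run assumes Python 3.7 or newer. The asyncio pieces it relies on are explained later in this tutorial.

```python
import asyncio
import time


async def visit_table(table_number, seconds):
    # Stand-in for one table's whole visit; the sleep is idle waiting time
    await asyncio.sleep(seconds)
    return table_number


async def serve_serially(visits):
    # The unpopular restaurant: finish one table before starting the next
    finished = []
    for number, seconds in visits:
        finished.append(await visit_table(number, seconds))
    return finished


async def serve_concurrently(visits):
    # The usual restaurant: all visits are in progress at once
    return await asyncio.gather(
        *(visit_table(number, seconds) for number, seconds in visits)
    )


def timed(coroutine):
    # Run a coroutine to completion and report how long it took
    start = time.perf_counter()
    result = asyncio.run(coroutine)
    return result, time.perf_counter() - start


visits = [(1, 0.2), (2, 0.2), (3, 0.2)]
_, serial_seconds = timed(serve_serially(visits))
_, concurrent_seconds = timed(serve_concurrently(visits))
print('Serial service took about', round(serial_seconds, 2), 'seconds')
print('Concurrent service took about', round(concurrent_seconds, 2), 'seconds')
```

The serial version takes roughly the sum of the three delays, while the concurrent version takes roughly as long as the single longest delay.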
The reason I’m spending so much time on this analogy is that it illustrates
one of the most important techniques a developer should properly learn in
order to write scalable applications which use databases, networks, and
other such resources that depend on input/output (I/O). Real world
restaurants use asynchronous processes because they wouldn’t be desirable
or competitive otherwise. Ideally, real-world programs would tend to use
asynchronous processes, but doing so takes the right tools, libraries,
skill, and practice by the developer. This tutorial, the third part of the
series, is a gentle introduction to how to do so in Python.
I do want to mention that Python’s facilities to support asynchronous
programming are quite layered and tricky, with many different ways to do
things. Some of these facilities are relatively recent additions and still
have some of the trimmings from their experimental stage. Nevertheless,
the topic is important, and it’s well worth persevering. I will
deliberately guide you through a pragmatic subset of these facilities, and
once you are familiar with the basic ideas, you can explore other
approaches on your own.
Coroutines
You learned in the previous tutorial about generator functions, and how
these were different from regular functions. When the caller invokes
regular functions, the process starts at the top and exits at one place
depending on the function’s logic. With generators, the caller can enter
and exit a single function multiple times, suspending and resuming its
execution.
A function that can be entered and exited multiple times, suspended and
resumed each time, is called a coroutine. A generator is just a simplified
sort of coroutine. Python has several types of coroutines, but the focus
of this tutorial is the type that’s designed to support asynchronous
programming. Let’s go back to the restaurant analogy. The
menu/order/eat/check/payment sequence for each table is a separate
coroutine, but the server suspends and resumes attention to each table so
that all three coroutines can run at the same time, possibly in different
stages of the process. The well-trained brain of the server acts as a
scheduler for juggling these parallel coroutines.
In the synchronous restaurant, everything is a regular function. It is
entered once, when the party arrives, and exited once, when they leave.
Only one such function is running at a time, so parties might have to wait
as long as two hours to start their own dining experience.
In the asynchronous restaurant, a coroutine function is entered for the
first time when the party arrives, creating a coroutine object, and exited
for the last time when they leave, in which case the coroutine object is
no longer needed. However, after the server brings the menus they can
suspend the coroutine object for that particular table and check to see if
any of the other tables need attention. Same thing after any given table
has ordered, received their check, etc.
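The distinction between a coroutine function and a coroutine object can be checked directly. The following is a small sketch, not part of the restaurant program: greet_party is a hypothetical name, and asyncio.run assumes Python 3.7 or newer.

```python
import asyncio
import inspect


async def greet_party(table_number):
    # Nothing in this body runs until the coroutine object is awaited or scheduled
    return 'Welcome, table ' + str(table_number)


# Calling the coroutine function only creates a coroutine object
party = greet_party(1)
is_coroutine_object = inspect.iscoroutine(party)

# Handing the object to an event loop is what actually runs the body
greeting = asyncio.run(party)
print(is_coroutine_object, greeting)
```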
Restaurant server code
Taking advantage of the fact that Python is almost as readable as
pseudocode, here is an actual implementation of the server coroutine.
async def serve_table(table_number):
    await get_menus()
    print('Welcome. Please sit at table', table_number, 'Here are your menus')
    order = await get_order()
    print('Table', table_number, 'what will you be having today?')
    await prepare_order(order)
    print('Table', table_number, 'here is your meal:', order)
    await eat()
    print('Table', table_number, 'here is your check')
    await get_payment()
    print('Thanks for visiting us! (table', table_number, ')')
Rather than just def, this function is defined using async def. This marks
it as an asynchronous coroutine function. I’ll mention in passing that
there are also asynchronous generator functions, which are defined with
async def and have a yield statement somewhere in the body, but those are a
special case and beyond the scope of this tutorial series. Honestly, the
zoo of function/generator/coroutine types in Python 3 is rather
bewildering, but again I’m going to ignore some of the possibilities in
this tutorial series and present a simple pathway to get you started.
Within the body of serve_table is a series of await expressions. Each one
creates a coroutine object from the called coroutine function and runs it,
suspending serve_table until the result is ready. Whenever the awaited
coroutine has to wait, control goes back to the event loop, which can run
any other coroutines that are ready. This is the equivalent of the
restaurant server starting a process such as having the cook begin
preparing a meal and at the same time checking to see if any of the other
tables need attention.
This juggling of tasks happens in the well-trained server’s brain, and the
equivalent of this in Python is called the event loop. We’ll return to
this in a moment.
More coroutines
Let’s look at the implementations of the other coroutines invoked by
serve_table.
#These snippets need the following imports, as in the full listing
import asyncio
import random

async def get_menus():
    delay_minutes = random.randrange(3)  #0 to 2 minutes
    await asyncio.sleep(delay_minutes)  #Pretend a second is a minute

async def get_order():
    delay_minutes = random.randrange(10)  #0 to 9 minutes
    await asyncio.sleep(delay_minutes)
    order = random.choice(['Special of the day', 'Fish & Chips', 'Pasta'])
    return order

async def prepare_order(order):
    delay_minutes = random.randrange(10, 20)  #10 to 19 minutes
    await asyncio.sleep(delay_minutes)
    print(' [Order ready from kitchen: ', order, ']')

async def eat():
    delay_minutes = random.randrange(20, 40)  #20 to 39 minutes
    await asyncio.sleep(delay_minutes)

async def get_payment():
    delay_minutes = random.randrange(10)  #0 to 9 minutes
    await asyncio.sleep(delay_minutes)
These functions use a sleep timer to simulate taking time to do some
processing. The random.randrange function picks an integer at random from
a range that excludes the upper bound, so random.randrange(3) returns 0,
1, or 2. The function asyncio.sleep is a special coroutine which suspends
action for the given number of seconds. During this sleep period, the
event loop is, of course, free to run any other coroutine that’s ready. As
usual, you invoke this using the await keyword.
I’ll take this moment to mention that you can only use the await keyword
from the body of an asynchronous coroutine function (that is, one defined
using async def). Using await anywhere else is a syntax error.
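You can confirm this rule without running any coroutine by asking Python to compile two small pieces of source text. This is only an illustrative sketch. The function names inside the strings are hypothetical and are never called, because compiling does not execute the code.

```python
# The same await expression, once in a plain function and once in a coroutine
source_inside_plain_function = (
    "def take_order():\n"
    "    await get_order()\n"
)
source_inside_coroutine = (
    "async def take_order():\n"
    "    await get_order()\n"
)

try:
    compile(source_inside_plain_function, '<example>', 'exec')
    rejected = False
except SyntaxError as error:
    rejected = True
    print('Rejected at compile time:', error.msg)

# The async def version compiles without complaint
accepted = compile(source_inside_coroutine, '<example>', 'exec') is not None
print('Plain def rejected:', rejected, '/ async def accepted:', accepted)
```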
Notice the get_order coroutine returns a value. This value is passed back
as the result of the await expression in the caller.
Pulling it all together: the event loop
I mentioned the event loop earlier. You need some special set-up code to
get into asynchronous mode, creating an event loop that schedules and
manages coroutines as you have coded them to run as cooperating tasks.
When asyncio schedules a coroutine on the event loop, it wraps the
coroutine in a task, so you will often see the two terms used almost
interchangeably. When a coroutine uses await to wait on another coroutine
that has to pause, control actually goes back to the event loop. The event
loop is like the well-trained brain of the server.
Here is code for running the restaurant serve coroutines we’ve defined so
far.
#Create coroutines for three tables
gathered_coroutines = asyncio.gather(
    serve_table(1),
    serve_table(2),
    serve_table(3)
)

#asyncio uses event loops to manage its operation
loop = asyncio.get_event_loop()

#This is the entry from synchronous to asynchronous code. It will block
#until the coroutine passed in has completed
loop.run_until_complete(gathered_coroutines)

#We're done with the event loop
loop.close()
The asyncio.gather function takes one or more coroutines, schedules them
all to run, and returns a single awaitable that completes only after all
the gathered coroutines have completed. It’s used here to run the
coroutines for three tables in the event loop, which is obtained using
asyncio.get_event_loop. The call to loop.run_until_complete runs whatever
it is given until it completes. Because it’s passed a gathered set of
three coroutines, it ends up running until all three of those are
complete. Of course, each serve_table coroutine invokes additional
coroutines, such as get_menus and get_order, invoked using await and then
scheduled by the event loop.
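If you are on Python 3.7 or newer, asyncio.run offers a shorter entry point that creates the event loop, runs one coroutine to completion, and closes the loop for you. The sketch below is an alternative to the set-up code above rather than the approach this series uses, and its serve_table is a shortened stand-in with made-up delays and return values.

```python
import asyncio


async def serve_table(table_number):
    # Shortened stand-in for the tutorial's serve_table coroutine
    await asyncio.sleep(0.01 * table_number)
    return 'Table ' + str(table_number) + ' served'


async def main():
    # gather schedules all three coroutines and waits for every one to finish;
    # results come back in the order the coroutines were passed in
    return await asyncio.gather(serve_table(1), serve_table(2), serve_table(3))


# asyncio.run creates an event loop, runs main() to completion, and closes the loop
results = asyncio.run(main())
print(results)
```

Wrapping the gather call in a main coroutine means it is created while the loop is already running, which is the arrangement newer Python versions expect.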
The full program
Listing 1. serve_tables.py is the entire program
import random
import asyncio

async def get_menus():
    delay_minutes = random.randrange(3)  #0 to 2 minutes
    await asyncio.sleep(delay_minutes)  #Pretend a second is a minute

async def get_order():
    delay_minutes = random.randrange(10)  #0 to 9 minutes
    await asyncio.sleep(delay_minutes)
    order = random.choice(['Special of the day', 'Fish & Chips', 'Pasta'])
    return order

async def prepare_order(order):
    delay_minutes = random.randrange(10, 20)  #10 to 19 minutes
    await asyncio.sleep(delay_minutes)
    print(' [Order ready from kitchen: ', order, ']')

async def eat():
    delay_minutes = random.randrange(20, 40)  #20 to 39 minutes
    await asyncio.sleep(delay_minutes)

async def get_payment():
    delay_minutes = random.randrange(10)  #0 to 9 minutes
    await asyncio.sleep(delay_minutes)

async def serve_table(table_number):
    await get_menus()
    print('Welcome. Please sit at table', table_number, 'Here are your menus')
    order = await get_order()
    print('Table', table_number, 'what will you be having today?')
    await prepare_order(order)
    print('Table', table_number, 'here is your meal:', order)
    await eat()
    print('Table', table_number, 'here is your check')
    await get_payment()
    print('Thanks for visiting us! (table', table_number, ')')

#Create coroutines for three tables
gathered_coroutines = asyncio.gather(
    serve_table(1),
    serve_table(2),
    serve_table(3)
)

#asyncio uses event loops to manage its operation
loop = asyncio.get_event_loop()

#This is the entry from synchronous to asynchronous code. It will block
#until the coroutine passed in has completed
loop.run_until_complete(gathered_coroutines)

#We're done with the event loop
loop.close()
Here is an example of output from running this program.
Welcome. Please sit at table 1 Here are your menus
Welcome. Please sit at table 2 Here are your menus
Table 1 what will you be having today?
Welcome. Please sit at table 3 Here are your menus
Table 3 what will you be having today?
Table 2 what will you be having today?
 [Order ready from kitchen: Pasta ]
Table 1 here is your meal: Pasta
 [Order ready from kitchen: Fish & Chips ]
Table 3 here is your meal: Fish & Chips
 [Order ready from kitchen: Special of the day ]
Table 2 here is your meal: Special of the day
Table 3 here is your check
Table 1 here is your check
Thanks for visiting us! (table 3 )
Thanks for visiting us! (table 1 )
Table 2 here is your check
Thanks for visiting us! (table 2 )
Notice a delay of a few seconds between most of these lines. This is the
sleep delay in the various coroutines which simulates the time it takes to
do things in the restaurant. Time is compressed, with one second of the
program representing one minute in the restaurant. Because the sleep
delays are of a random length, the messages appear in a different order
each time you run the program.
Also, notice that the program doesn’t always begin neatly with table 1,
then table 2, then table 3. The asyncio.gather function schedules the
coroutines you give it, but the random delays mean their output appears in
no particular order.
The main thing to appreciate here is the flow of cooperative multitasking.
Study the full listing above, while running and tweaking the code until
you have a good feel of how coroutines release and regain control.
Sometimes all three serve_table coroutine objects invoke one of the other
coroutines, all of which happen to be waiting on a sleep delay. That’s
when you don’t see any output for a few seconds. At those times the event
loop is patiently waiting until one of the coroutines is ready to resume.
Adding a coroutine
I mentioned the delays between lines of output when you run the program in
Listing 1. It is more user-friendly to show some sort of progress
indicator. You can use the magic of cooperative multitasking to implement
this. The coroutine function below displays a dot a couple of times a
second as a progress indicator.
async def progress_indicator(delay, loop):
    while True:
        try:
            await asyncio.sleep(delay)
        except asyncio.CancelledError:
            break
        #Print a dot, with no newline afterward & force the output to appear immediately
        print('.', end='', flush=True)
        #Check if this is the last remaining task, and exit if so
        #(asyncio.Task.all_tasks was removed in Python 3.9; newer versions use asyncio.all_tasks)
        num_active_tasks = [ task for task in asyncio.Task.all_tasks(loop)
                             if not task.done() ]
        if len(num_active_tasks) == 1:
            break
This function takes two parameters: the minimum delay between printing
dots, and the event loop object, which is the same object you saw created
near the bottom of Listing 1. There are quite a few bits of asyncio to
which you’ll want to pass a loop object, just to make sure you’re keeping
the cooperation among a controlled group of coroutines. In this case, pass
the event loop to asyncio.Task.all_tasks, which returns a set of all the
tasks (that is, scheduled coroutines) in that event loop, including those
which have completed. To get only the ones that have not completed, filter
the set further using task.done. Note that asyncio.Task.all_tasks was
removed in Python 3.9; on Python 3.7 and newer, the module-level
asyncio.all_tasks function replaces it and returns only unfinished tasks.
Say you create a coroutine object from this function, passing in a delay of
0.5. It goes straight into an infinite loop, in a way you might remember
from infinite generators in the previous tutorial. It then invokes the
sleep delay but accounts for an exception if an external entity cancels
the coroutine, which can happen in several ways. In such cases, the
coroutine is interrupted with the asyncio.CancelledError exception,
causing us to break out of the infinite loop.
After the coroutine resumes normally, it prints a dot and then checks
whether all other coroutines have run their course. If progress_indicator
is the single remaining coroutine, it breaks out of the infinite loop.
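The cancellation path described above can also be used on purpose. As a sketch of an alternative design, not the one this tutorial uses, the code that starts the indicator can cancel it once the tables are done, so the indicator never has to count tasks. This assumes Python 3.7 or newer for asyncio.run and asyncio.create_task. The shortened serve_table, the tiny delays, and the returned dot count are all made up for illustration.

```python
import asyncio


async def progress_indicator(delay):
    dots = 0
    while True:
        try:
            await asyncio.sleep(delay)
        except asyncio.CancelledError:
            # Cancellation arrives here while the coroutine is suspended in sleep
            break
        print('.', end='', flush=True)
        dots += 1
    return dots


async def serve_table(table_number):
    # Shortened stand-in for the tutorial's serve_table coroutine
    await asyncio.sleep(0.05 * table_number)
    print('Table', table_number, 'served')


async def main():
    # Start the indicator as its own task so it runs alongside the tables
    indicator = asyncio.create_task(progress_indicator(0.02))
    await asyncio.gather(serve_table(1), serve_table(2), serve_table(3))
    # All tables are done, so tell the indicator to stop
    indicator.cancel()
    return await indicator


dots_printed = asyncio.run(main())
print()
print('Dots printed:', dots_printed)
```

Because the coroutine catches asyncio.CancelledError and leaves its loop, the task finishes normally and its return value is still available to the caller.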
Listing 2. A full listing updated to use the progress_indicator coroutine
import random
import asyncio

async def get_menus():
    delay_minutes = random.randrange(3)  #0 to 2 minutes
    await asyncio.sleep(delay_minutes)  #Pretend a second is a minute

async def get_order():
    delay_minutes = random.randrange(10)  #0 to 9 minutes
    await asyncio.sleep(delay_minutes)
    order = random.choice(['Special of the day', 'Fish & Chips', 'Pasta'])
    return order

async def prepare_order(order):
    delay_minutes = random.randrange(10, 20)  #10 to 19 minutes
    await asyncio.sleep(delay_minutes)
    print(' [Order ready from kitchen: ', order, ']')

async def eat():
    delay_minutes = random.randrange(20, 40)  #20 to 39 minutes
    await asyncio.sleep(delay_minutes)

async def get_payment():
    delay_minutes = random.randrange(10)  #0 to 9 minutes
    await asyncio.sleep(delay_minutes)

async def progress_indicator(delay, loop):
    while True:
        try:
            await asyncio.sleep(delay)
        except asyncio.CancelledError:
            break
        #Print a dot, with no newline afterward & force the output to appear immediately
        print('.', end='', flush=True)
        #Check if this is the last remaining task, and exit if so
        #(asyncio.Task.all_tasks was removed in Python 3.9; newer versions use asyncio.all_tasks)
        num_active_tasks = [ task for task in asyncio.Task.all_tasks(loop)
                             if not task.done() ]
        if len(num_active_tasks) == 1:
            break

async def serve_table(table_number):
    await get_menus()
    print('Welcome. Please sit at table', table_number, 'Here are your menus')
    order = await get_order()
    print('Table', table_number, 'what will you be having today?')
    await prepare_order(order)
    print('Table', table_number, 'here is your meal:', order)
    await eat()
    print('Table', table_number, 'here is your check')
    await get_payment()
    print('Thanks for visiting us! (table', table_number, ')')

#asyncio uses event loops to manage its operation
loop = asyncio.get_event_loop()

#Create coroutines for three tables
gathered_coroutines = asyncio.gather(
    serve_table(1),
    serve_table(2),
    serve_table(3),
    progress_indicator(0.5, loop)
)

#This is the entry from synchronous to asynchronous code. It will block
#until the coroutine passed in has completed
loop.run_until_complete(gathered_coroutines)

#We're done with the event loop
loop.close()
Notice that now the event loop is obtained before the gathering of
coroutines is created. That’s because the loop must be passed to
progress_indicator, as you can see in the list of coroutines to be
gathered.
The following output is from a sample run:
.Welcome. Please sit at table 3 Here are your menus
..Welcome. Please sit at table 2 Here are your menus
Welcome. Please sit at table 1 Here are your menus
........Table 3 what will you be having today?
......Table 1 what will you be having today?
....Table 2 what will you be having today?
.......... [Order ready from kitchen: Fish & Chips ]
Table 3 here is your meal: Fish & Chips
.............. [Order ready from kitchen: Fish & Chips ]
Table 2 here is your meal: Fish & Chips
...... [Order ready from kitchen: Pasta ]
Table 1 here is your meal: Pasta
..............................Table 3 here is your check
Thanks for visiting us! (table 3 )
..........Table 2 here is your check
..Thanks for visiting us! (table 2 )
......................Table 1 here is your check
..........Thanks for visiting us! (table 1 )
.
The progress indicator dots appear regularly, about every half second.
What sort of multitasking is this?
If you’ve ever done multithreading or multiprocessing in Python, you might
wonder how this asyncio cooperative multitasking approach compares. The
main difference is that in the asyncio approach, you’re not actually
trying to have two coroutines do something at the same physical moment in
time, just as a restaurant server can’t give table 1 menus at the exact
same time that they serve table 3 its meal. What the asyncio event loop is
doing is taking advantage of the natural downtimes within tasks, allowing
coroutines to do work when there is work to do, but then cede control to
other coroutines when they go idle.
A coroutine doesn’t have control of when it gets to run again, and there is
a reason this is called cooperative multitasking. If one coroutine spends
too long without yielding control back to the event loop, it blocks
everything, causing unnecessary delay, and you lose the multitasking
benefits. This means you must first of all make sure your program is
suited to be implemented this way, and you must then carefully code your
program by breaking it into coroutines which release control to each other
at suitable times. This can be trickier than it sounds because you could
innocently call a regular function from a coroutine which takes a long
time, and the problem won’t be readily apparent.
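The following sketch shows how an innocent-looking regular call can stall everything. A ticker coroutine records when it gets to run, and the longest gap between its turns is measured twice: once beside a coroutine that calls the regular time.sleep, and once beside one that awaits asyncio.sleep. All names and delays here are invented for the demonstration, and asyncio.run assumes Python 3.7 or newer.

```python
import asyncio
import time


async def ticker(ticks, delay):
    # Record a timestamp at the start and after every short sleep
    stamps = [time.perf_counter()]
    for _ in range(ticks):
        await asyncio.sleep(delay)
        stamps.append(time.perf_counter())
    return stamps


async def blocking_cook():
    await asyncio.sleep(0.01)   # Let the ticker get started first
    time.sleep(0.3)             # Regular function: the event loop is stuck here


async def cooperative_cook():
    await asyncio.sleep(0.01)
    await asyncio.sleep(0.3)    # Suspends, so other coroutines can run


async def longest_gap(cook):
    # Run the ticker next to the given cook and find its longest wait
    stamps, _ = await asyncio.gather(ticker(5, 0.05), cook())
    return max(later - earlier for earlier, later in zip(stamps, stamps[1:]))


blocked_gap = asyncio.run(longest_gap(blocking_cook))
cooperative_gap = asyncio.run(longest_gap(cooperative_cook))
print('Longest ticker gap beside time.sleep:', round(blocked_gap, 2))
print('Longest ticker gap beside asyncio.sleep:', round(cooperative_gap, 2))
```

Beside the blocking version, the ticker cannot run until time.sleep returns, so one of its gaps stretches to about the full blocking time even though it only asked to sleep for a fraction of that.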
As a general rule of thumb, asyncio event loops are best for programs that
frequently connect to networks, or that do a lot of querying of a database
and the like. Waiting for a remote server or database to respond to a
request or query is an ideal time to release control to the event loop. In
the past, programmers tended to use threads in such cases, but asyncio
event loops are a much clearer and more flexible way to program than
multithreading. One complication is that to gain the full benefits of
asyncio event loops you need your network and database APIs to be coded in
asyncio coroutines. Luckily, there are now many Python third-party
libraries implemented to take advantage of asyncio.
Nevertheless, you might sometimes run into a case where you want to use
asyncio but need to use a library that does not support asyncio. In other
words, you need to call synchronous code from asynchronous code without
spoiling the multitasking. You can do this by handing the synchronous call
to an executor, for example through the event loop’s run_in_executor
method, which runs the synchronous code in a separate thread or process. I
wanted to mention this because you might be wondering, but further detail
is outside the scope of these tutorials.
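For the curious, here is a minimal sketch of the executor idea, offered as a starting point rather than a full treatment. The function slow_synchronous_query is a hypothetical stand-in for a library call with no asyncio support, heartbeat exists only to show that other coroutines keep running, and asyncio.run and asyncio.get_running_loop assume Python 3.7 or newer.

```python
import asyncio
import time


def slow_synchronous_query(seconds):
    # Hypothetical stand-in for a library call with no asyncio support
    time.sleep(seconds)
    return 'query result'


async def heartbeat(count, delay):
    # Counts how many times this coroutine got to run while the query was busy
    beats = 0
    for _ in range(count):
        await asyncio.sleep(delay)
        beats += 1
    return beats


async def main():
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default executor, a thread pool
    query = loop.run_in_executor(None, slow_synchronous_query, 0.2)
    start = time.perf_counter()
    result, beats = await asyncio.gather(query, heartbeat(4, 0.05))
    return result, beats, time.perf_counter() - start


result, beats, elapsed = asyncio.run(main())
print(result, beats, round(elapsed, 2))
```

The blocking call runs in a worker thread, so the event loop stays free to run heartbeat, and the two finish in about the time of the slower one rather than the sum of both.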
Conclusion
As you get more and more proficient with asyncio, you’ll learn of other
exotic concepts related to the technique, including the impressively named
“futures.” You’ll also learn that there are different ways for a coroutine
to release control to the event loop, including async with and async for.
Both statements arrived in Python 3.5, though the asynchronous generators
that make async for most convenient need Python 3.6 or newer. I won’t
cover async for, since this tutorial series has Python 3.5 as the minimum
requirement, but in the next tutorial, you will learn about async with,
along with other cool techniques.