asyncio
Updated:
I read this stackoverflow post that explained python asyncio very well.
I’ve tried to understand asyncio a few times in the past year but they’ve all been false starts.
It’s never really stuck and I could never understand the fundamental concepts to comprehend how it was different from
threading in concurrent.futures and why/when I should be using it.
What is Asyncio
TLDR
asyncio is stdlib that implements an event loop and orchestrates multiple “threads” to allow concurrency.
History
asyncio is part of the stdlib that uses the new async features in python 3.4+ to facilitate event-driven programming.
I believe asyncio was included starting in python 3.6, after there were enough features implemented in the language to
support it.
What we need to understand is that it’s an implementation of the event loop that allows for asynchronous programming. If someone wishes to design a different event loop, all the rest of the async concepts can still be used.
Shipping all these async feature support and now easy way to use it would be silly.
But it’s important to understand the history.
Kind of like urllib2, which is part of the stdlib but no one really uses directly anymore: requests and urllib3
are the new hotness.
asyncio might suffer the same fate in the future and that’s okay, it was important and useful at one point in time.
Async Concepts
Generators
Creation
A generator is an object that results when a function yield execution.
yield pauses execution of the function, snapshots the frame, and then moves back to the previous frame to continue execution.
Calling next() on a generator will resume execution with the frame reloaded.
def foo():
yield 1
yield 2
# gen is a generator object. The function is not yet executed
gen = foo()
print(next(gen))
next(gen)
Generators have existed before, they’re commonly used in place of lazily loaded loops or list/dict comprehension.
The literal form uses () with the for i in j format.
Communicating With Generator
One less known feature is that it’s bi-directional and you can send values to it.
def foo():
val = yield 1
if val == 'okay dokie'
print('nice')
else:
print('error')
gen = foo()
total = next(gen) + 10
gen.send('okay dokie')
You can even throw exccption into the generator with throw().
This will load the frame, then immediately raise the exception with the stacktrace point set appropriately.
yield from
The yield from keyword was introduced in python 3.4.
It’s works like a tunnel, allow a functionA to yield from a nested functionB.
Then when the caller of functionA send or throw, it is proxied to functionB
def foo():
print('I am foo')
yield from bar()
print('I am done')
def bar():
val = yield 1
print f'I received {val} from the outside.'
gen = foo()
next(gen)
gen.send('oh hi')
Coroutines
A coroutine is a function that can be suspended.
There’s nothing special from it, it simply uses different keywords (async def) to help python differentiate between an
async function vs. normal function with yield.
await is how you call a coroutine, which is similar to yield from.
This thread will suspend until something tells it to wake up again.
Futures
concurrent.futures let’s you create promises.
I actually don’t know the nuances of JavaScript promises but it sure looks the same to me.
Futures let you create function calls as objects, attach callbacks for when they finish, check their status, etc.
If all you did was create a future and immediately call result(), you would have a normal function call and program flow.
The add_done_callback method on a future is what async features are going to use.
They add a callback that will be called when future is complete.
It calls send on the generator that was await, which signals for it to continue execution.
Tasks
Tasks are a future that wrap coroutine and do all this glue code to setting up callbacks.
I think this concept can get munged with coroutines, as it’s a very light syntactic-ish sugar to reduce boilerplate code.
Back to Asyncio
So asyncio uses all these features in unison to implement asynchronous event loop.
-
First you create your coroutines and load them into the
asyncioevent loop queue. -
The event loop picks up the first task and runs it synchronously until it hits an
await. This triggers a task creation, callback, blah blah. -
Continue, with all this nested
awaitfunctions until you reachasynciosystem level calls. These are network requests or file IO. This is where the real smarts is implemented. -
selectis the OS system call that lets you know if there is data on the buffer or not. You can choose to block or not.asynciochecks it non-blocking and if there is data, it knows that the future is done and marks it as such. -
This triggers the
add_done_callback, which has been hooked up to callsendto the coroutine to wake it up and send the data.
Conclusion
This “under-the-hood” look into asyncio was eye-opening. I estimate about 70% of this information is correct but that’s not the point. I was able to clear the mystery up and learn more about why it’s necessary to have all this extra syntax and fluff and what it buys us.
Event-driven concurrency is a new-ish paradigm and it’s looking like it’s easier to achieve concurrency this way than the traditional “spin up N threads” approach. It feels like it’s safer because you don’t need to worry about sharing thread context. Sometimes I am okay with a single thread but I just want that one http request to background for a little bit.