Making Requests Non-blocking in Tornado

Tornado is one of the most popular web frameworks for Python. It is built around a single-threaded I/O loop (also known as an event loop), which lets it handle a large number of concurrent connections efficiently. However, because handlers normally run on that one thread (multiple threads are supported only in advanced configurations), any “blocking” task blocks the whole server: while it runs, the framework cannot pick up the next task waiting to be processed.

Note: This post has been updated for modern Tornado (6.x), which runs on Python 3 and is built around native async/await coroutines. It was originally written for Tornado 4 in 2017; since then the old @tornado.web.asynchronous decorator has been removed (in Tornado 6.0), and the @tornado.gen.coroutine + yield style is no longer recommended.

The Problem: A Blocking Handler Freezes Everything

For example, this is the wrong way of writing a handler:

import time
import tornado.web


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write('[MainHandler] Hello, world')


class ComplexHandler(tornado.web.RequestHandler):
    def get(self):
        result = self.get_complex_result()
        self.write('[ComplexHandler] Result = %d.\n' % result)

    def get_complex_result(self):
        time.sleep(5)   # Assume the complex calculation takes 5 seconds
        return 100      # Assume the final result is 100

Note that time.sleep(5) blocks the entire event loop. Tornado runs every handler on a single thread and processes one callback at a time, so while get_complex_result() is sleeping, nothing else runs, not even an unrelated request to MainHandler. You can observe this by firing two requests at once: the second one only begins after the first returns, roughly five seconds later. This is exactly the trap the Asynchronous and non-Blocking I/O guide warns about: a single slow call drags down the whole server.
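The effect can be reproduced without a web server. Tornado 6 runs on top of the asyncio event loop, so the following minimal sketch uses asyncio alone. The names blocking_handler, fast_handler, and measure_wait are hypothetical stand-ins for the two handlers, and the delay is shortened from five seconds to keep the demo quick.

```python
import asyncio
import time


async def blocking_handler(delay):
    # Stand-in for ComplexHandler: time.sleep() holds the loop's only thread.
    time.sleep(delay)
    return "complex"


async def fast_handler(started_at):
    # Stand-in for MainHandler: report how long it waited before running.
    return time.monotonic() - started_at


async def measure_wait(delay=0.2):
    started_at = time.monotonic()
    # Both "requests" arrive at the same moment.
    slow = asyncio.create_task(blocking_handler(delay))
    fast = asyncio.create_task(fast_handler(started_at))
    await slow
    return await fast


if __name__ == "__main__":
    waited = asyncio.run(measure_wait())
    print(f"fast handler waited {waited:.2f}s behind the blocking one")
```

The fast handler does no work of its own, yet it cannot start until the blocking call returns, so its wait is at least as long as the blocking delay.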

The Fix: Native async / await Coroutines

The solution is to write the handler as a native coroutine with async / await.

import asyncio
import tornado.web


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write('[MainHandler] Hello, world')


class ComplexHandler(tornado.web.RequestHandler):
    async def get(self):
        await asyncio.sleep(5)   # Yields control back to the event loop
        self.write('[ComplexHandler] Hello, world')

When the coroutine hits await asyncio.sleep(5), it suspends itself and hands control back to the event loop, which is then free to serve other requests; once the five seconds elapse, the loop resumes the coroutine right where it left off. As a result, the application can serve requests to MainHandler while a request to ComplexHandler is still waiting. The two are handled concurrently on the same thread, not in parallel.
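To see the suspension and resumption in isolation, here is a small sketch that again uses plain asyncio rather than a running Tornado server. The function names are hypothetical stand-ins for the two handlers, and the delay is shortened for the demo.

```python
import asyncio
import time


async def complex_handler(delay):
    await asyncio.sleep(delay)   # Suspends here; the loop runs other work
    return "complex"


async def main_handler():
    return "main"


async def serve_both(delay=0.2):
    finished = []   # Records the order in which the "requests" complete

    async def track(name, coro):
        result = await coro
        finished.append(name)
        return result

    start = time.monotonic()
    # The slow request is submitted first, the quick one second.
    await asyncio.gather(
        track("complex", complex_handler(delay)),
        track("main", main_handler()),
    )
    return finished, time.monotonic() - start


if __name__ == "__main__":
    order, elapsed = asyncio.run(serve_both())
    print(order, f"{elapsed:.2f}s")
```

Although the slow request is submitted first, the quick one completes first, because the slow coroutine gives up the loop at its await instead of holding it.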

Blocking Libraries: Offload to a Thread Pool

There is one important caveat, though: await is not magic. It only helps when the thing you await is itself non-blocking. Writing await time.sleep(5) would not even work: the call blocks for five seconds, returns None, and the await then raises a TypeError because None is not awaitable. Likewise, calling a blocking library inside an async def handler still freezes the whole loop. Many third-party packages (synchronous database drivers, requests) and operations such as heavy file access simply have no async-aware version. For those, a ThreadPoolExecutor lets you push the blocking call off the event loop’s thread.

import concurrent.futures
import time

import tornado.concurrent
import tornado.web


class ComplexHandler(tornado.web.RequestHandler):
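    # Class attribute shared by all handler instances; run_on_executor
    # looks up self.executor by default.
    executor = concurrent.futures.ThreadPoolExecutor(5)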

    async def get(self):
        result = await self.get_complex_result()
        self.write('The final result is %d.\n' % result)

    @tornado.concurrent.run_on_executor
    def get_complex_result(self):
        print('Before Sleep.')
        time.sleep(5)    # Assume the complex calculation takes 5 seconds
        print('After Sleep.')
        return 100       # Assume the final result is 100

@tornado.concurrent.run_on_executor runs the decorated function in the handler’s executor and returns an awaitable, so the event loop stays free while the blocking work happens on a worker thread. The ThreadPoolExecutor(5) above caps the work at five concurrent threads; size the pool to match how many blocking calls you expect to run in parallel.
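The same offloading can be written without the decorator by handing the function to the loop's run_in_executor method at the call site. Tornado's documentation generally recommends this form over run_on_executor, and inside a handler the equivalent call would go through tornado.ioloop.IOLoop.current().run_in_executor(...). The sketch below uses the asyncio loop directly so that it runs on its own. The ticker coroutine and the handle_request name are assumptions added for illustration, to show that the loop keeps running while the worker thread sleeps.

```python
import asyncio
import concurrent.futures
import time

executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)


def get_complex_result(delay):
    time.sleep(delay)   # Blocking call, executed on a worker thread
    return 100


async def handle_request(delay=0.2):
    loop = asyncio.get_running_loop()
    ticks = 0

    async def ticker():
        # Counts loop iterations to show the loop is still alive.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    tick_task = asyncio.create_task(ticker())
    # Hand the blocking function to the pool and await its result.
    result = await loop.run_in_executor(executor, get_complex_result, delay)
    tick_task.cancel()
    try:
        await tick_task
    except asyncio.CancelledError:
        pass
    return result, ticks


if __name__ == "__main__":
    result, ticks = asyncio.run(handle_request())
    print(f"result={result}, loop ticked {ticks} times meanwhile")
```

A non-zero tick count means the event loop kept scheduling other coroutines during the blocking call, which is the property the thread pool is meant to preserve.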

Keep in mind that threads sidestep blocking I/O but not CPU-bound work: because of Python’s GIL, only one thread executes Python bytecode at a time. The thread-pool trick is therefore ideal for I/O-bound blocking calls (network, disk, a synchronous DB driver) and far less useful for heavy computation, which is better moved to a separate process.
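For the CPU-bound case, one option is to swap the thread pool for a process pool. The sketch below is only an illustration under stated assumptions: count_primes is a hypothetical stand-in for a heavy computation, and the pool size of two is arbitrary. It uses the asyncio loop directly; the function passed to the pool must be defined at module level so that it can be pickled and sent to the worker process.

```python
import asyncio
import concurrent.futures


def count_primes(limit):
    """CPU-bound stand-in: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


async def handle_cpu_bound(limit, pool):
    loop = asyncio.get_running_loop()
    # The computation runs in another process, so it does not compete
    # with the event loop for the GIL.
    return await loop.run_in_executor(pool, count_primes, limit)


def main(limit=50_000):
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as pool:
        return asyncio.run(handle_cpu_bound(limit, pool))
```

Call main() from under an if __name__ == "__main__": guard, which multiprocessing requires on platforms that start workers by spawning a fresh interpreter. Arguments and return values are pickled between processes, so this pays off only when the computation outweighs that overhead.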

References