Warning! The Event Loop May Be Blocked
Matt Kaye
Published on May 27th, 2025
Blocked event loops are, by far, the most common problem we see when providing support to Hatchet users. If you use Hatchet, and Hatchet’s Python SDK in particular, you might’ve seen a warning like this:
THE TIME TO START THE STEP RUN IS TOO LONG, THE MAIN THREAD MAY BE BLOCKED
Scary! Let’s talk through what’s going on under the hood, some possible causes of this warning in Hatchet, and how to debug it effectively.
New to async / await and event loops in Python? I’d recommend quickly checking out FastAPI’s async documentation before getting started here. Hatchet handles synchronous and asynchronous work very similarly to FastAPI.
Blocking I/O
First and foremost, in the vast majority of cases, this scary warning from Hatchet is caused by the event loop being blocked. And if the event loop is blocked, there’s a very good chance that some code it is trying to run (read: a Hatchet task) is doing some blocking work. The asyncio documentation states its recommendation for handling blocking functions eloquently, in one sentence that gets right to the crux of the issue:
Blocking (CPU-bound) code should not be called directly.
CPU-bound work in the simplest terms is work that spends most of its time doing actual computation as opposed to e.g. waiting for some external process (like an API call) to complete.
Importantly, using e.g. requests.get to make an API call also (confusingly) counts as blocking in this sense, even though it’s just waiting rather than computing: requests is not async, so while it waits the program cannot context switch to some other work running in the event loop.
A simple example of a blocked loop
Let’s give a simple example, which we’ll come back to later as a helpful debugging strategy. We’ll first write two functions:
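The original listing is not reproduced here, so the following is a minimal sketch of what the two functions might look like, reconstructed from the log output shown further down. The names blocking and non_blocking, the loop count of three, and the half-second delay are all assumptions.

```python
import asyncio
import time


async def blocking(n: int = 3, delay: float = 0.5) -> None:
    """Async in name only: time.sleep never yields to the event loop."""
    for i in range(n):
        print("Blocking", i)
        time.sleep(delay)  # holds the event loop hostage for `delay` seconds


async def non_blocking(name: str = "Non-blocking", n: int = 3, delay: float = 0.5) -> None:
    """Yields to the event loop on every iteration via asyncio.sleep."""
    for i in range(n):
        print(name, i)
        await asyncio.sleep(delay)  # other tasks can run while this one waits
```

The only difference between the two is the sleep call: time.sleep keeps control of the thread, while await asyncio.sleep hands control back to the loop.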
And let’s run these concurrently with asyncio.gather and asyncio.create_task:
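A sketch of such a runner, again an assumption rather than the original listing. The two functions are repeated in compact form so the snippet runs on its own, and main is a hypothetical name.

```python
import asyncio
import time


async def blocking(n: int = 3, delay: float = 0.5) -> None:
    for i in range(n):
        print("Blocking", i)
        time.sleep(delay)  # blocks the whole event loop


async def non_blocking(name: str = "Non-blocking", n: int = 3, delay: float = 0.5) -> None:
    for i in range(n):
        print(name, i)
        await asyncio.sleep(delay)  # yields to the event loop


async def main(delay: float = 0.5) -> None:
    # Both tasks are scheduled up front; the blocking one runs first and
    # never yields, so the non-blocking one cannot start until it finishes.
    await asyncio.gather(
        asyncio.create_task(blocking(delay=delay)),
        asyncio.create_task(non_blocking(delay=delay)),
    )


if __name__ == "__main__":
    asyncio.run(main())
```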
If you run this code, you’ll see logs like this:
Blocking 0
Blocking 1
Blocking 2
Non-blocking 0
Non-blocking 1
Non-blocking 2
On the other hand, you can run two tasks running the non-blocking function concurrently as you’d expect:
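As a sketch (the function name, the labels A and B, and the delay are assumptions), the well-behaved version might look like this:

```python
import asyncio


async def non_blocking(name: str, n: int = 3, delay: float = 0.5) -> None:
    for i in range(n):
        print(name, i)
        await asyncio.sleep(delay)  # yields, so the other task gets a turn


async def main(delay: float = 0.5) -> None:
    # Two copies of the same cooperative coroutine, labelled A and B.
    await asyncio.gather(
        asyncio.create_task(non_blocking("A", delay=delay)),
        asyncio.create_task(non_blocking("B", delay=delay)),
    )


if __name__ == "__main__":
    asyncio.run(main())
```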
Which results in the logs below. Note that the output from the two tasks, A and B, is interleaved, indicating that they’re correctly running concurrently.
A 0
B 0
A 1
B 1
A 2
B 2
If you were to run code like the blocking example in a Hatchet task, you’d see the scary warning from above.
Understanding the Problem
The long and short of the problem here, as so nicely put by the asyncio documentation, is that if some async code is doing anything blocking, then everything else will need to wait for that blocking operation to complete. This means that if you have a Hatchet worker running 1,000 tasks concurrently and one of them does something blocking, none of your other tasks will run while that blocking operation is happening.
Some common (and some less common) examples of blocking operations might include:
- Making a synchronous API call using requests.get
- Performing a synchronous database operation using psycopg, such as running an expensive SELECT statement that takes a long time to complete
- Running a CPU-bound algorithm, such as solving a Sudoku puzzle
In each of these cases, while this work is happening, no other async work on your Hatchet workers will be able to progress. We see some interesting and scary behavior if we run some blocking code in Hatchet. We’ll share some ideas for how to work around each of these blocking operations below.
Here we define a few tasks, one which is async and does blocking work (time.sleep), one which is sync and does blocking work (time.sleep), and one that is async and does non-blocking work (asyncio.sleep).
As an experiment, we can run them as follows to simulate what might happen in a production environment:
The intention of this example is to first kick off the non-blocking sync and async tasks, let them start to process, then kick off the blocking task, let it start to process, and finally kick off the non-blocking sync task again, and then let all of them complete. The worker logs are illustrative:
[INFO] 🪓 -- 2025-05-23 16:27:01,165 - ------------------------------------------
[INFO] 🪓 -- 2025-05-23 16:27:01,165 - STARTING HATCHET...
[INFO] 🪓 -- 2025-05-23 16:27:01,169 - starting runner...
[INFO] 🪓 -- 2025-05-23 16:27:05,224 - rx: start step run: 7c7f831c-316d-4331-b9d4-9e264c63b82f/non_blocking_async:non_blocking_async
[INFO] 🪓 -- 2025-05-23 16:27:05,225 - run: start step: non_blocking_async:non_blocking_async/7c7f831c-316d-4331-b9d4-9e264c63b82f
Non blocking async 0
[INFO] 🪓 -- 2025-05-23 16:27:05,226 - rx: start step run: 1e9e1a58-f8e5-43a6-8ebc-20b84b62eef9/non_blocking_sync:non_blocking_sync
[INFO] 🪓 -- 2025-05-23 16:27:05,226 - run: start step: non_blocking_sync:non_blocking_sync/1e9e1a58-f8e5-43a6-8ebc-20b84b62eef9
Non blocking sync 0
Non blocking async 1
Non blocking sync 1
[INFO] 🪓 -- 2025-05-23 16:27:06,236 - rx: start step run: 3ce959c5-0b05-4202-a6de-bfcf4600a517/blocking:blocking
[INFO] 🪓 -- 2025-05-23 16:27:06,237 - run: start step: blocking:blocking/3ce959c5-0b05-4202-a6de-bfcf4600a517
Blocking 0
Non blocking sync 2
Blocking 1
[INFO] 🪓 -- 2025-05-23 16:27:07,245 - rx: start step run: 7742df98-169f-4afa-9075-e43c8b3ea8df/non_blocking_sync:non_blocking_sync
Non blocking sync 3
Blocking 2
[WARNING] 🪓 -- 2025-05-23 16:27:08,899 - THE TIME TO START THE STEP RUN IS TOO LONG, THE MAIN THREAD MAY BE BLOCKED: Waiting Steps 1 <Task pending name='Task-5' coro=<WorkerActionListenerProcess.start_blocked_main_loop() running at /Users/matt/Documents/GitHub/hatchet/sdks/python/hatchet_sdk/worker/action_listener_process.py:163>>
Non blocking sync 4
Blocking 3
[WARNING] 🪓 -- 2025-05-23 16:27:09,900 - THE TIME TO START THE STEP RUN IS TOO LONG, THE MAIN THREAD MAY BE BLOCKED: Waiting Steps 1 <Task pending name='Task-5' coro=<WorkerActionListenerProcess.start_blocked_main_loop() running at /Users/matt/Documents/GitHub/hatchet/sdks/python/hatchet_sdk/worker/action_listener_process.py:163>>
Non blocking sync 5
Blocking 4
[WARNING] 🪓 -- 2025-05-23 16:27:10,902 - THE TIME TO START THE STEP RUN IS TOO LONG, THE MAIN THREAD MAY BE BLOCKED: Waiting Steps 1 <Task pending name='Task-5' coro=<WorkerActionListenerProcess.start_blocked_main_loop() running at /Users/matt/Documents/GitHub/hatchet/sdks/python/hatchet_sdk/worker/action_listener_process.py:163>>
Blocking 5
[WARNING] 🪓 -- 2025-05-23 16:27:11,904 - THE TIME TO START THE STEP RUN IS TOO LONG, THE MAIN THREAD MAY BE BLOCKED: Waiting Steps 1 <Task pending name='Task-5' coro=<WorkerActionListenerProcess.start_blocked_main_loop() running at /Users/matt/Documents/GitHub/hatchet/sdks/python/hatchet_sdk/worker/action_listener_process.py:163>>
[INFO] 🪓 -- 2025-05-23 16:27:12,257 - finished step run: blocking:blocking/3ce959c5-0b05-4202-a6de-bfcf4600a517
[INFO] 🪓 -- 2025-05-23 16:27:12,258 - run: start step: non_blocking_sync:non_blocking_sync/7742df98-169f-4afa-9075-e43c8b3ea8df
Non blocking async 2
[INFO] 🪓 -- 2025-05-23 16:27:12,258 - finished step run: non_blocking_sync:non_blocking_sync/1e9e1a58-f8e5-43a6-8ebc-20b84b62eef9
Non blocking sync 0
[WARNING] 🪓 -- 2025-05-23 16:27:12,259 - THE TIME TO START THE STEP RUN IS TOO LONG, THE MAIN THREAD MAY BE BLOCKED: time to start: 5.012734889984131s
Non blocking async 3
Non blocking sync 1
Non blocking async 4
Non blocking sync 2
Non blocking async 5
Non blocking sync 3
[INFO] 🪓 -- 2025-05-23 16:27:16,262 - finished step run: non_blocking_async:non_blocking_async/7c7f831c-316d-4331-b9d4-9e264c63b82f
Non blocking sync 4
Non blocking sync 5
[INFO] 🪓 -- 2025-05-23 16:27:18,283 - finished step run: non_blocking_sync:non_blocking_sync/7742df98-169f-4afa-9075-e43c8b3ea8df
^C[INFO] 🪓 -- 2025-05-23 16:27:23,532 - received signal SIGINT...
[INFO] 🪓 -- 2025-05-23 16:27:24,270 - gracefully exiting runner...
[INFO] 🪓 -- 2025-05-23 16:27:25,272 - closing worker 'test-worker'...
[INFO] 🪓 -- 2025-05-23 16:27:25,272 - 👋
Here’s a play-by-play of what happened:
1. The non-blocking sync and async work starts, and their logs are interleaved (as you’d expect, since Hatchet runs tasks concurrently).
2. We see the internal event "run: start step: blocking:blocking", indicating that the worker has now started running the blocking task.
3. After that log, we stop seeing any "Non blocking async" logs, as the event loop is blocked. Notice that at this point, we continue to see "Non blocking sync" logs. This is an important design decision in Hatchet. Hatchet runs synchronous tasks in a thread pool so they can be executed in a non-blocking way, which means that once a sync task has started, it can continue executing even if the main event loop is blocked.
4. We receive a "start step run" event, which indicates the last run has been triggered: "rx: start step run: 7742df98-169f-4afa-9075-e43c8b3ea8df/non_blocking_sync:non_blocking_sync". Importantly, you might expect that since this task is sync, it will be executed correctly without being blocked, similarly to how the previous one was in step 3. This is not the case! Since the event loop is blocked, Hatchet cannot begin to execute this task run, which is why we immediately start seeing the scary warning log: "THE TIME TO START THE STEP RUN IS TOO LONG, THE MAIN THREAD MAY BE BLOCKED: Waiting Steps 1".
5. We get a finished event for the blocking step: "finished step run: blocking:blocking".
6. The scary warning goes away, and we immediately go back to business as usual, seeing the next "Non blocking async 2" log. Importantly, this index (2) was where it left off before, but it “slept” for about six seconds (the duration of the blocking task), as opposed to the one second that we intended, between log lines.
7. All of the remaining work completes, including the new "Non blocking sync" task starting and finishing.
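The behavior in the play-by-play, where a running sync task keeps going while a newly triggered one is stuck, can be reproduced in plain asyncio. This is a small simulation and not Hatchet’s implementation. It only assumes, as described above, that sync work is handed to a thread pool by code running on the event loop. All names here (sync_task, blocking_async_task, dispatch_later) are hypothetical.

```python
import asyncio
import time


def sync_task(events: list[str], name: str, steps: int, delay: float) -> None:
    """Plain sync work; runs in a worker thread once it has been dispatched."""
    for i in range(steps):
        events.append(f"{name} {i}")
        time.sleep(delay)


async def blocking_async_task(events: list[str], steps: int, delay: float) -> None:
    """Async task that blocks the event loop with time.sleep."""
    for i in range(steps):
        events.append(f"blocking {i}")
        time.sleep(delay)


async def dispatch_later(events: list[str], name: str, after: float, delay: float) -> None:
    """Stands in for a run that is triggered while the loop is blocked."""
    await asyncio.sleep(after)  # cannot wake up until the loop is free again
    await asyncio.to_thread(sync_task, events, name, 2, delay)


async def main(delay: float = 0.2) -> list[str]:
    events: list[str] = []
    # sync-1 is dispatched to a thread before anything blocks the loop.
    first = asyncio.create_task(asyncio.to_thread(sync_task, events, "sync-1", 4, delay))
    await asyncio.sleep(delay / 2)
    # sync-2 is due shortly after the blocking task has taken over the loop.
    second = asyncio.create_task(dispatch_later(events, "sync-2", delay, delay))
    blocker = asyncio.create_task(blocking_async_task(events, 4, delay))
    await asyncio.gather(first, second, blocker)
    return events


if __name__ == "__main__":
    print("\n".join(asyncio.run(main())))
```

In this sketch sync-1 keeps logging while the loop is blocked because it already lives in its own thread, whereas sync-2 only starts after the last blocking entry, since the loop has to be free to hand it to the pool.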
Debugging
So you’re seeing the scary warning: Now what?
Turn on asyncio’s DEBUG mode
asyncio has a debug mode, which will give you more observability into the async operations that your worker is doing.
This will log warnings about slow callbacks and provide additional information about tasks that are taking too long.
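As a minimal sketch outside of Hatchet, debug mode can be switched on by passing debug=True to asyncio.run, or by setting the PYTHONASYNCIODEBUG=1 environment variable. The coroutine names below are hypothetical, and the threshold is lowered via loop.slow_callback_duration purely for illustration (the default is 0.1 seconds).

```python
import asyncio
import logging
import time

# asyncio reports slow callbacks through the "asyncio" logger at WARNING level.
logging.basicConfig(level=logging.WARNING)


async def blocking_step(seconds: float = 0.3) -> None:
    time.sleep(seconds)  # blocks the loop, so debug mode should flag it


async def main() -> None:
    # Optional: flag anything that holds the loop for more than 50 ms.
    asyncio.get_running_loop().slow_callback_duration = 0.05
    await blocking_step()


if __name__ == "__main__":
    asyncio.run(main(), debug=True)
```

With debug mode on, the asyncio logger emits a warning of the form "Executing <Task ...> took N seconds" that names the offending coroutine.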
Look for obviously blocking code
First line of defense: look for things that are obviously blocking. API calls, database operations, for loops doing something involved or running many iterations, and so on. Depending on what the problem is, there are different ways to handle different situations:
- If you’re making API calls with requests or similar, try using aiohttp instead to make the calls async.
- If you’re using psycopg2 or similar synchronous database libraries for database I/O, try using asyncpg or psycopg[binary] with asyncio support instead, to make database operations async.
- If you’re relying on an external library that does not provide async methods, try wrapping the methods in asyncio.to_thread to run them in a separate thread so they don’t block the main event loop. For example: await asyncio.to_thread(some_blocking_function, arg1, arg2).
- Similarly, if you have some expensive CPU-bound work (see: solving Sudoku), use asyncio.to_thread there too to offload the work to a separate thread. Because of Python’s GIL, this generally keeps the event loop responsive rather than making the computation itself faster; for heavy computation, a process pool is worth considering.
As a last resort, you can also change your tasks from async to sync so that Hatchet runs them in its thread pool, although we don’t recommend this in the majority of cases.
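To make the asyncio.to_thread suggestion concrete, here is a small sketch. slow_sync_call is a hypothetical stand-in for any blocking library call, and time.sleep stands in for its waiting. Three calls are offloaded at once, and because none of them blocks the loop they overlap instead of running back to back.

```python
import asyncio
import time


def slow_sync_call(seconds: float) -> float:
    """Hypothetical blocking call, e.g. a sync HTTP or database client."""
    time.sleep(seconds)
    return seconds


async def main(seconds: float = 0.5) -> float:
    start = time.perf_counter()
    # Each call runs in the default thread pool; the event loop stays free.
    await asyncio.gather(
        *(asyncio.to_thread(slow_sync_call, seconds) for _ in range(3))
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(main())
    print(f"3 calls finished in {elapsed:.2f}s")
```

Called directly inside a coroutine, the same three calls would take roughly three times as long and would stall every other task on the loop in the meantime.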
Instrument your code
If you’ve resolved all of the obvious issues but the Scary Warning ™️ is still popping up, instrumenting your code can help find the bottleneck. Hatchet’s Python SDK provides an OpenTelemetry Instrumentor, which allows you to easily export traces and spans from your Hatchet workers. If you have some long-running tasks (or long start times), you can use the traces to get a better sense for what might be blocking. In particular, if there are some async operations that appear to just be hanging for significantly longer durations than they should take, this is a good indication they’re being blocked by something.
Similarly, you can also instrument your code with the AsyncioInstrumentor and other, similar instrumentors depending on other tools in your stack.
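The idea behind reading traces this way can be shown with nothing but the standard library. The sketch below is not the OpenTelemetry setup. It is a hand-rolled stand-in in which timed plays the role of a span, and every name in it is hypothetical. An operation that should take about 50 ms ends up taking far longer because another coroutine hogs the loop.

```python
import asyncio
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, expected: float, results: dict[str, float]):
    """Poor man's span: record the duration and flag suspiciously slow ones."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        results[label] = elapsed
        if elapsed > 2 * expected:
            print(f"{label}: took {elapsed:.2f}s, expected ~{expected:.2f}s")


async def quick_io(results: dict[str, float]) -> None:
    with timed("quick_io", expected=0.05, results=results):
        await asyncio.sleep(0.05)  # should be done after ~50 ms


async def hog() -> None:
    time.sleep(0.3)  # blocks the loop while quick_io is waiting


async def main() -> dict[str, float]:
    results: dict[str, float] = {}
    await asyncio.gather(quick_io(results), hog())
    return results


if __name__ == "__main__":
    asyncio.run(main())
```

The slow operation here is the victim rather than the culprit, which is the pattern to look for in real traces: an innocent await that hangs usually points at something else blocking the loop at the same time.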
Run your code separately from Hatchet
Finally, another thing to try is running your code outside of Hatchet, in a fashion similar to how we did above, by creating async tasks and using gather to run them concurrently. If there’s blocking behavior, it’ll be apparent when one of the tasks is blocked.
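One way to set this up, sketched under the assumption that your task body can be called as a plain coroutine, is to run it next to a "canary" that ticks at a fixed interval and then look at the largest gap between ticks. The names canary, run_suspect, and check_for_blocking are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def canary(interval: float, gaps: list[float], stop: asyncio.Event) -> None:
    """Ticks every `interval` seconds and records how long each tick really took."""
    last = time.perf_counter()
    while not stop.is_set():
        await asyncio.sleep(interval)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


async def run_suspect(suspect: Callable[[], Awaitable[None]], stop: asyncio.Event) -> None:
    try:
        await suspect()
    finally:
        stop.set()  # tell the canary to stop once the suspect is done


async def check_for_blocking(
    suspect: Callable[[], Awaitable[None]], interval: float = 0.05
) -> float:
    """Return the largest gap between canary ticks while `suspect` ran."""
    gaps: list[float] = []
    stop = asyncio.Event()
    await asyncio.gather(
        asyncio.create_task(canary(interval, gaps, stop)),
        asyncio.create_task(run_suspect(suspect, stop)),
    )
    return max(gaps, default=0.0)


async def suspicious() -> None:
    time.sleep(0.3)  # replace with the body of the task you want to test


if __name__ == "__main__":
    worst = asyncio.run(check_for_blocking(suspicious))
    print(f"largest gap between ticks: {worst:.2f}s")
```

A largest gap close to the tick interval suggests the code is cooperating with the loop; a gap many times larger suggests something in it is blocking.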
Takeaways
Blocked event loops can significantly impact the performance of your Hatchet workers, causing tasks to wait unnecessarily and triggering those scary warning messages. We added the scary warning to the SDK to help flag that something might be blocking the loop. Note that it’s not always an indication that the event loop is blocked, but it’s a hint that something might be wrong.
By following the debugging steps outlined in this post, you should be able to:
- Identify blocking code in your async functions
- Replace synchronous operations with asynchronous alternatives
- Offload CPU-bound work to separate threads using asyncio.to_thread
- Use instrumentation to triangulate performance bottlenecks
To reiterate the main point from the start of the post, taken directly from the asyncio documentation:
Blocking (CPU-bound) code should not be called directly.