Timeouts in Hatchet
Timeouts are an important concept in Hatchet that allow you to control how long a workflow or task is allowed to run before it is considered to have failed. This is useful for ensuring that your workflows don’t run indefinitely and consume unnecessary resources. Timeouts in Hatchet are treated as failures and the task will be retried if specified.
There are two types of timeouts in Hatchet:
- Scheduling Timeouts (Default 5m) - the time a task is allowed to wait in the queue before it is cancelled
- Execution Timeouts (Default 60s) - the time a task is allowed to run before it is considered to have failed
Timeout Format
In Hatchet, timeouts are specified using a string in the format <number><unit>
, where <number>
is an integer and <unit>
is one of:
s
for secondsm
for minutesh
for hours
For example:
10s
means 10 seconds4m
means 4 minutes1h
means 1 hour
If no unit is specified, seconds are assumed.
In the Python SDK, timeouts can also be specified as a datetime.timedelta
object.
Task-Level Timeouts
You can specify execution and scheduling timeouts for a task using the execution_timeout
and schedule_timeout
parameters when creating a task.
In these tasks, both timeouts are specified, meaning:
- If the task is not scheduled before the
schedule_timeout
is reached, it will be cancelled. - If the task does not complete before the
execution_timeout
is reached (after starting), it will be cancelled.
A timed out step does not guarantee that the step will be stopped immediately. The step will be stopped as soon as the worker is able to stop the step. See cancellation for more information.
Refreshing Timeouts
In some cases, you may need to extend the timeout for a step while it is running. This can be done using the refreshTimeout
method provided by the step context (ctx
).
For example:
In this example, the step initially would exceed its execution timeout. But before it does, we call the refreshTimeout
method, which extends the timeout and allows it to complete. Importantly, refreshing a timeout is an additive operation - the new timeout is added to the existing timeout. So for instance, if the task originally had a timeout of 30s
and we call refreshTimeout("15s")
, the new timeout will be 45s
.
The refreshTimeout
function can be called multiple times within a step to further extend the timeout as needed.
Use Cases
Timeouts are useful in a variety of scenarios:
- Ensuring workflows don’t run indefinitely and consume unnecessary resources
- Failing workflows early if a critical step takes too long
- Keeping workflows responsive by ensuring individual steps complete in a timely manner
- Preventing infinite loops or hung processes from blocking the entire system
For example, if you have a workflow that makes an external API call, you may want to set a timeout to ensure the workflow fails quickly if the API is unresponsive, rather than waiting indefinitely.
By carefully considering timeouts for your workflows and steps, you can build more resilient and responsive systems with Hatchet.