Timeouts in Hatchet

Timeouts let you control how long a task may wait or run before Hatchet considers it to have failed. This keeps tasks from running indefinitely and consuming unnecessary resources. Hatchet treats a timeout as a failure, so a timed-out task is retried if retries are configured for it.

There are two types of timeouts in Hatchet:

Scheduling Timeouts (Default 5m) - the time a task is allowed to wait in the queue before it is cancelled
Execution Timeouts (Default 60s) - the time a task is allowed to run before it is considered to have failed
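
The way the two timeouts apply at different stages of a task's life can be sketched as a small model. This is an illustration only, not Hatchet's implementation: `task_outcome` and its parameters are hypothetical names, and treating a task that lands exactly on a limit as still within it is an assumption.

```python
from datetime import timedelta

# Defaults stated above: 5 minutes to be scheduled, 60 seconds to execute.
DEFAULT_SCHEDULE_TIMEOUT = timedelta(minutes=5)
DEFAULT_EXECUTION_TIMEOUT = timedelta(seconds=60)


def task_outcome(
    queued_for: timedelta,
    ran_for: timedelta,
    schedule_timeout: timedelta = DEFAULT_SCHEDULE_TIMEOUT,
    execution_timeout: timedelta = DEFAULT_EXECUTION_TIMEOUT,
) -> str:
    """Classify a task by which timeout, if any, it would hit (illustrative model)."""
    if queued_for > schedule_timeout:
        # The task waited too long in the queue and never started running.
        return "schedule_timeout"
    if ran_for > execution_timeout:
        # The task started, but ran past its execution limit.
        return "execution_timeout"
    return "completed"
```

With the defaults, a task that waits six minutes in the queue hits the scheduling timeout, while one that starts promptly but runs for 90 seconds hits the execution timeout.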

Timeout Format

In Hatchet, timeouts are specified using a string in the format <number><unit>, where <number> is an integer and <unit> is one of:

s for seconds
m for minutes
h for hours

For example:

10s means 10 seconds
4m means 4 minutes
1h means 1 hour

If no unit is specified, seconds are assumed.

In the Python SDK, timeouts can also be specified as a datetime.timedelta object.
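
To make the format concrete, here is a minimal sketch of a parser for these strings. `parse_timeout` is a hypothetical helper written for illustration; it is not part of the Hatchet SDK, and the SDK's own parsing may accept or reject inputs differently.

```python
import re
from datetime import timedelta

# Seconds per unit, matching the units listed above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_PATTERN = re.compile(r"(\d+)([smh]?)")


def parse_timeout(value: str) -> timedelta:
    """Convert a '<number><unit>' string into a timedelta (hypothetical helper)."""
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid timeout: {value!r}")
    number, unit = match.groups()
    # A missing unit falls back to seconds, as described above.
    return timedelta(seconds=int(number) * _UNIT_SECONDS[unit or "s"])
```

Under these assumptions, `parse_timeout("4m")` and `timedelta(minutes=4)` describe the same duration, and a bare `"30"` is read as 30 seconds.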

Task-Level Timeouts

You can specify execution and scheduling timeouts for a task using the execution_timeout and schedule_timeout parameters (executionTimeout and scheduleTimeout in TypeScript, ExecutionTimeout in Go) when creating a task.

examples/python/timeout/worker.py

# Specify execution and schedule timeouts on a task
@timeout_wf.task(
    execution_timeout=timedelta(seconds=5), schedule_timeout=timedelta(minutes=10)
)
def timeout_task(input: EmptyModel, ctx: Context) -> dict[str, str]:
    # Sleeps longer than the 5-second execution timeout, so this task times out
    time.sleep(30)
    return {"status": "success"}

examples/typescript/with_timeouts/workflow.ts

export const withTimeouts = hatchet.task({
  name: 'with-timeouts',
  // time the task can wait in the queue before it is cancelled
  scheduleTimeout: '10s',
  // time the task can run before it is cancelled
  executionTimeout: '10s',
  fn: async (input: SimpleInput, ctx) => {
    // wait 15 seconds
    await sleep(15000);

    // get the abort controller
    const { abortController } = ctx;

    // if the abort controller is aborted, throw an error
    if (abortController.signal.aborted) {
      throw new Error('cancelled');
    }

    return {
      TransformedMessage: input.Message.toLowerCase(),
    };
  },
});

examples/go/workflows/timeouts.go

	// Create a task with a timeout of 3 seconds that tries to sleep for 10 seconds
	timeout := factory.NewTask(
		create.StandaloneTask{
			Name:             "timeout-task",
			ExecutionTimeout: 3 * time.Second, // Task will timeout after 3 seconds
		}, func(ctx worker.HatchetContext, input TimeoutInput) (*TimeoutResult, error) {
			// Sleep for 10 seconds
			time.Sleep(10 * time.Second)

			// Check if the context was cancelled due to timeout
			select {
			case <-ctx.Done():
				return nil, errors.New("TASK TIMED OUT")
			default:
				// Continue execution
			}

			return &TimeoutResult{
				Completed: true,
			}, nil
		},
		hatchet,
	)

In the Python and TypeScript examples, both timeouts are specified (the Go example sets only an execution timeout), meaning:

If the task is not scheduled before the schedule timeout is reached, it will be cancelled.
If the task does not complete before the execution timeout is reached (measured from when it starts running), it will be cancelled.

⚠️

A timed-out task is not guaranteed to stop immediately. It is stopped as soon as the worker is able to stop it. See cancellation for more information.
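
One consequence of the warning above is that long-running work should check for cancellation at regular points, so the worker has a chance to stop it. The following is a minimal sketch of that pattern using a plain `threading.Event` as a stand-in for a cancellation signal. It does not use the Hatchet API, and `process_items` is a hypothetical function.

```python
import threading


def process_items(items: list[int], cancelled: threading.Event) -> list[int]:
    """Process items one at a time, stopping at the next check after cancellation."""
    done: list[int] = []
    for item in items:
        # Cooperative check: work only stops here, between items,
        # not at the instant the signal is set.
        if cancelled.is_set():
            break
        done.append(item * 2)
    return done
```

In a real task, the check would consult the SDK's own cancellation signal, such as the abort controller or context used in the TypeScript and Go examples above.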

Refreshing Timeouts

In some cases, you may need to extend the timeout for a task while it is running. This can be done using the refresh method provided by the task context (ctx): refresh_timeout in Python, refreshTimeout in TypeScript, and RefreshTimeout in Go.

For example:

examples/python/timeout/worker.py

@refresh_timeout_wf.task(execution_timeout=timedelta(seconds=4))
def refresh_task(input: EmptyModel, ctx: Context) -> dict[str, str]:
    ctx.refresh_timeout(timedelta(seconds=10))
    time.sleep(5)

    return {"status": "success"}

examples/typescript/with_timeouts/workflow.ts

export const refreshTimeout = hatchet.task({
  name: 'refresh-timeout',
  executionTimeout: '10s',
  scheduleTimeout: '10s',
  fn: async (input: SimpleInput, ctx) => {
    // adds 15 seconds to the execution timeout
    ctx.refreshTimeout('15s');
    await sleep(15000);

    // get the abort controller
    const { abortController } = ctx;

    // now this condition will not be met
    // if the abort controller is aborted, throw an error
    if (abortController.signal.aborted) {
      throw new Error('cancelled');
    }

    return {
      TransformedMessage: input.Message.toLowerCase(),
    };
  },
});

examples/go/workflows/timeouts.go

	timeout := factory.NewTask(
		create.StandaloneTask{
			Name:             "timeout-task",
			ExecutionTimeout: 3 * time.Second, // Task will timeout after 3 seconds
		}, func(ctx worker.HatchetContext, input TimeoutInput) (*TimeoutResult, error) {

			// Refresh the timeout by 10 seconds (new timeout will be 13 seconds)
			ctx.RefreshTimeout("10s")

			// Sleep for 10 seconds
			time.Sleep(10 * time.Second)

			// Check if the context was cancelled due to timeout
			select {
			case <-ctx.Done():
				return nil, errors.New("TASK TIMED OUT")
			default:
				// Continue execution
			}

			return &TimeoutResult{
				Completed: true,
			}, nil
		},
		hatchet,
	)

In each of these examples, the task would otherwise exceed its execution timeout. Before that happens, it calls the refresh method, which extends the timeout and allows the task to complete. Importantly, refreshing a timeout is an additive operation: the increment is added to the existing timeout. For instance, if the task originally had a timeout of 30s and calls refreshTimeout("15s"), the new timeout will be 45s.

The refresh method can be called multiple times within a task to extend the timeout further as needed.
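
The additive behaviour can be checked with a little arithmetic. The class below is a toy model written for illustration, not Hatchet code; `ExecutionBudget` and `refresh` are hypothetical names.

```python
from datetime import timedelta


class ExecutionBudget:
    """Toy model of an execution timeout that can be refreshed (not Hatchet code)."""

    def __init__(self, timeout: timedelta) -> None:
        self.timeout = timeout

    def refresh(self, increment: timedelta) -> timedelta:
        # Additive: the increment extends the current timeout rather than replacing it.
        self.timeout += increment
        return self.timeout


budget = ExecutionBudget(timedelta(seconds=30))
budget.refresh(timedelta(seconds=15))  # 30s + 15s
budget.refresh(timedelta(seconds=15))  # a second call extends it again
```

After both calls, the modelled timeout is the original 30 seconds plus two 15-second increments.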

Use Cases

Timeouts are useful in a variety of scenarios:

Ensuring tasks don’t run indefinitely and consume unnecessary resources
Failing tasks early if a critical step takes too long
Keeping workflows responsive by ensuring individual tasks complete in a timely manner
Preventing infinite loops or hung processes from blocking the entire system

For example, if you have a task that makes an external API call, you may want to set a timeout to ensure the task fails quickly if the API is unresponsive, rather than waiting indefinitely.
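
An execution timeout bounds the task as a whole. Inside the task, you may also want to bound the call itself so that it fails well before that limit. The sketch below shows one generic way to do this with the Python standard library. `call_with_deadline` is a hypothetical helper, not part of Hatchet, and most HTTP clients offer their own timeout option that would normally be preferable.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_deadline(fn: Callable[[], T], seconds: float) -> T:
    """Run fn in a worker thread and stop waiting for it after `seconds`."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        return executor.submit(fn).result(timeout=seconds)
    except FutureTimeout:
        raise TimeoutError(f"call exceeded {seconds}s") from None
    finally:
        # Do not block on a stuck call. Note that the worker thread is not
        # killed; it keeps running until fn returns on its own.
        executor.shutdown(wait=False)
```

The caller gets a `TimeoutError` promptly, but the underlying call is only abandoned, not interrupted, which is the same caveat that applies to timed-out tasks.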

By choosing timeouts deliberately for each of your tasks, you can build more resilient and responsive systems with Hatchet.