
Retry Policies

Simple Task Retries

Hatchet provides a simple and effective way to handle failures in your tasks using a retry policy. You specify the number of times a task should be retried if it fails, which helps improve the reliability and resilience of your workflows.

Task-level retries can be added to both Standalone Tasks and Workflow Tasks.

How it works

When a task fails (i.e. throws an error or returns a non-zero exit code), Hatchet can automatically retry the task based on the retries configuration defined in the task object. Here’s how it works:

  1. If a task fails and retries is set to a value greater than 0, Hatchet will catch the error and retry the task.
  2. The task will be retried up to the specified number of times, with each retry being executed after a short delay to avoid overwhelming the system.
  3. If the task succeeds during any of the retries, the task will continue as normal.
  4. If the task continues to fail after exhausting all the specified retries, the task will be marked as failed.

This simple retry mechanism can help to mitigate transient failures, such as network issues or temporary unavailability of external services, without requiring complex error handling logic in your task code.
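The steps above can be sketched in plain Python. This is an illustration of the retry semantics only, not Hatchet's implementation:

```python
import time

def run_with_retries(task, retries: int, delay_seconds: float = 1.0):
    """Illustrative retry loop: run `task`, retrying up to `retries` times."""
    attempt = 0
    while True:
        try:
            return task()  # success: the task continues as normal
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: the task is marked as failed
            attempt += 1
            time.sleep(delay_seconds)  # short delay between attempts

# Example: a task that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky, retries=3, delay_seconds=0))  # prints "ok" on the third attempt
```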

How to use task-level retries

To enable retries for a task, simply add the retries property to the task object in your task definition:
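For example, with the Python SDK (a sketch — the decorator and parameter names follow the Python SDK and may differ slightly by SDK and version):

```python
from hatchet_sdk import Context, EmptyModel, Hatchet

hatchet = Hatchet()

# `retries=3` means the task is attempted once, then retried up to 3 times.
@hatchet.task(name="fetch-data", retries=3)
def fetch_data(input: EmptyModel, ctx: Context) -> dict:
    # Task logic that may fail transiently goes here.
    ...
```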

You can add the retries property to any task, and Hatchet will handle the retry logic automatically.

It’s important to note that task-level retries are not suitable for all types of failures. For example, if a task fails due to a programming error or an invalid configuration, retrying the task will likely not resolve the issue. In these cases, you should fix the underlying problem in your code or configuration rather than relying on retries.

Additionally, if a task interacts with external services or databases, you should ensure that the operation is idempotent (i.e. can be safely repeated without changing the result) before enabling retries. Otherwise, retrying the task could lead to unintended side effects or inconsistencies in your data.

Accessing the Retry Count in a Running Task

If you need to access the current retry count within a task, you can use the retryCount method available in the task context:
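In the Python SDK, the current attempt is exposed on the task context as shown below (property naming varies by SDK — for example, retryCount in TypeScript — so check your SDK's reference):

```python
from hatchet_sdk import Context, EmptyModel, Hatchet

hatchet = Hatchet()

@hatchet.task(name="fetch-data", retries=3)
def fetch_data(input: EmptyModel, ctx: Context) -> dict:
    # retry_count is 0 on the first attempt and increments on each retry
    if ctx.retry_count > 0:
        print(f"retry attempt {ctx.retry_count}")
    ...
```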

Exponential Backoff

Hatchet also supports exponential backoff for retries, which can be useful for handling failures in a more resilient manner. Exponential backoff increases the delay between retries exponentially, giving the failing service more time to recover before the next retry.
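A sketch of configuring backoff alongside retries in the Python SDK (the parameter names backoff_factor and backoff_max_seconds are assumptions based on the Python SDK's task options; verify them against your SDK version):

```python
from hatchet_sdk import Context, EmptyModel, Hatchet

hatchet = Hatchet()

@hatchet.task(
    name="call-flaky-service",
    retries=5,
    backoff_factor=2.0,       # delay grows roughly exponentially per attempt
    backoff_max_seconds=60,   # cap on the delay between attempts
)
def call_flaky_service(input: EmptyModel, ctx: Context) -> dict:
    ...
```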

Bypassing Retry Logic

The Hatchet SDKs each expose a NonRetryable exception, which allows you to bypass pre-configured retry logic for the task. If your task raises this exception, it will not be retried. This allows you to circumvent the default retry behavior in instances where you don’t want to or cannot safely retry. Some examples in which this might be useful include:

  1. A task that calls an external API which returns a 4XX response code.
  2. A task that contains a single non-idempotent operation that can fail but cannot safely be rerun on failure, such as a billing operation.
  3. A failure that requires manual intervention to resolve.

In these cases, even though retries is set to a non-zero number (meaning the task would ordinarily retry), Hatchet will not retry.
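A sketch of the first case with the Python SDK (the exception is exposed in the Python SDK as NonRetryableException, though the exact name and import path may vary by SDK and version; billing_api is a hypothetical external client):

```python
from hatchet_sdk import Context, EmptyModel, Hatchet
from hatchet_sdk.exceptions import NonRetryableException  # name/location may vary by version

hatchet = Hatchet()

@hatchet.task(name="charge-customer", retries=3)
def charge_customer(input: EmptyModel, ctx: Context) -> None:
    resp = billing_api.charge(input)  # billing_api is hypothetical, for illustration
    if 400 <= resp.status_code < 500:
        # A 4XX response will not succeed on retry; fail immediately instead.
        raise NonRetryableException(f"billing rejected the request: {resp.status_code}")
```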

Python SDK Client Retry Behavior

The retry behavior described above applies to task execution inside Hatchet. The Python SDK also has separate, client-side retry behavior for certain REST and gRPC calls made by the SDK itself.

These two mechanisms are independent: task retries control whether Hatchet re-runs a task after it fails on a worker, while SDK client retries control whether the Python SDK retries its own API calls to Hatchet.

Default client retry behavior

By default, the Python SDK retries certain client calls with exponential backoff, with max_attempts defaulting to 5.

REST API calls

| Error Type | Retried by Default |
| --- | --- |
| HTTP 5xx (server errors) | Yes |
| HTTP 404 (not found) | Yes |
| HTTP 429 (too many requests) | No |
| HTTP 400, 401, 403, 409, 422 (client errors) | No |
| Transport errors (timeout, connection, TLS, protocol) | No |

gRPC calls

| Status Code | Retried |
| --- | --- |
| UNAVAILABLE, DEADLINE_EXCEEDED, INTERNAL | Yes |
| RESOURCE_EXHAUSTED, ABORTED, UNKNOWN | Yes |
| UNIMPLEMENTED, NOT_FOUND, INVALID_ARGUMENT | No |
| ALREADY_EXISTS, UNAUTHENTICATED, PERMISSION_DENIED | No |

REST 404 responses are retried by default because some REST reads can observe replication lag between the core database and the OLAP database.

Configuring Python SDK client retries

The Python SDK exposes client retry configuration through TenacityConfig, either directly in ClientConfig or via environment variables.

```python
import os

from hatchet_sdk import Hatchet
from hatchet_sdk.config import ClientConfig, HTTPMethod, TenacityConfig

hatchet = Hatchet(
    config=ClientConfig(
        token=os.environ["HATCHET_CLIENT_TOKEN"],
        tenacity=TenacityConfig(
            max_attempts=5,
            retry_429=False,
            retry_transport_errors=False,
            retry_transport_methods=[HTTPMethod.GET, HTTPMethod.DELETE],
        ),
    )
)
```

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_attempts | int | Maximum number of retry attempts. Set to 0 to disable retries. | 5 |
| retry_429 | bool | Enable retries for HTTP 429 Too Many Requests responses. | False |
| retry_transport_errors | bool | Enable retries for REST transport-level errors (timeout, connection, TLS). | False |
| retry_transport_methods | list[HTTPMethod] | HTTP methods to retry on transport errors when retry_transport_errors is enabled. | [GET, DELETE] |

You can also configure these via environment variables:

| Environment Variable | Description |
| --- | --- |
| HATCHET_CLIENT_TENACITY_MAX_ATTEMPTS | Maximum retry attempts |
| HATCHET_CLIENT_TENACITY_RETRY_429 | Enable 429 retries (true/false) |
| HATCHET_CLIENT_TENACITY_RETRY_TRANSPORT_ERRORS | Enable transport error retries (true/false) |
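
For example, a deployment might tighten client retries via the environment (the values here are illustrative):

```shell
export HATCHET_CLIENT_TENACITY_MAX_ATTEMPTS=3
export HATCHET_CLIENT_TENACITY_RETRY_429=true
export HATCHET_CLIENT_TENACITY_RETRY_TRANSPORT_ERRORS=false
```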

Idempotency considerations

⚠️ When retry_transport_errors is enabled, only idempotent HTTP methods (GET, DELETE) are retried by default. Non-idempotent methods (POST, PUT, PATCH) are excluded because retrying them after a transport error could result in duplicate operations if the original request succeeded but the response was lost.

You can add non-idempotent methods to retry_transport_methods, but only do so if:

  1. Your operations are idempotent (for example, because they use idempotency keys), or
  2. You understand and accept the risk of duplicate operations
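As a sketch, opting POST into transport-error retries would look like the following (HTTPMethod.POST is an assumption — the table above only shows GET and DELETE — so verify the member exists in your SDK version):

```python
from hatchet_sdk.config import ClientConfig, HTTPMethod, TenacityConfig

# Only safe if your POST handlers are idempotent (e.g. via idempotency keys),
# or you accept the risk of duplicate operations.
config = ClientConfig(
    tenacity=TenacityConfig(
        retry_transport_errors=True,
        retry_transport_methods=[HTTPMethod.GET, HTTPMethod.DELETE, HTTPMethod.POST],
    ),
)
```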

Retry timing

Python SDK client retries use exponential backoff with jitter. Fine-grained backoff timing is not currently configurable through TenacityConfig.
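The timing itself isn't configurable, but the shape of a full-jitter exponential schedule can be illustrated in plain Python (base and cap here are illustrative values, not TenacityConfig parameters):

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)].

    `base` and `cap` are illustrative, not the SDK's actual parameters.
    """
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

# Delays grow exponentially on average but are randomized, which avoids
# synchronized retry storms across many clients.
for attempt in range(5):
    print(f"attempt {attempt}: up to {min(30.0, 0.5 * 2**attempt):.1f}s")
```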

Conclusion

Hatchet’s task-level retry feature is a simple and effective way to handle transient failures, improving the reliability and resilience of your workflows. By specifying the number of retries for each task, you can recover from temporary issues without writing complex error handling logic.

Remember to use retries judiciously and only for tasks that are idempotent and can safely be repeated. For failures against rate-limited or slowly recovering services, combine retries with the exponential backoff support described above, and use the NonRetryable exception to opt out of retries when a failure cannot safely be retried.