Retry Strategies in Hatchet: An Overview

Hatchet provides two kinds of retry strategies, automatic and manual, for handling failures and keeping your workflows reliable. Automatic retries absorb transient failures without intervention, while manual retries let you address failures that first need investigation, a code fix, or corrected input data.

Retry Strategies

  1. Automatic, Simple Step-Level Retries

    • Configure a specific number of retries for each step in your workflow
    • Automatically retry steps that fail due to transient issues
    • Mitigate the impact of temporary failures without manual intervention

  2. Manual Retries

    • Use the web dashboard to view all workflow and step runs, along with their status
    • Investigate failures, modify input data, and manually retry failed steps or workflows
    • Useful for addressing non-transient failures, such as bugs or issues with external dependencies
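
The behavior of automatic step-level retries can be illustrated with a short, plain-Python sketch. It does not use the Hatchet SDK: run_step_with_retries, TransientError, and flaky_step are hypothetical names made up for this example, and it assumes that a configured retry count means "extra attempts after the first one".

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for a temporary failure, such as a timeout."""


def run_step_with_retries(step: Callable[[], T], retries: int) -> T:
    """Run `step`, retrying up to `retries` extra times if it raises.

    Illustrative only: a real engine would also record each attempt
    and may wait between attempts.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempts = retries + 1  # the first attempt plus the configured retries
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == attempts:
                raise  # retries exhausted: the failure surfaces to the caller
    raise AssertionError("unreachable")


calls = {"count": 0}


def flaky_step() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary outage")
    return "ok"


result = run_step_with_retries(flaky_step, retries=3)
print(result, calls["count"])  # prints: ok 3
```

Once the retries are exhausted the exception propagates. That corresponds to the point where a failed run is left for manual investigation and retry.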

A Note on Dead Letter Queues

A dead letter queue (DLQ) is a messaging concept used to handle messages that cannot be processed successfully. In the context of workflow management, a DLQ can be used to store failed workflow instances that require manual intervention or further analysis.

While Hatchet does not have a built-in dead letter queue feature, the persistence of failed workflow instances in the dashboard serves a similar purpose. By keeping a record of failed instances, Hatchet allows you to track and manage failures, perform root cause analysis, and take appropriate actions, such as modifying input data or updating your workflow code before manually retrying the failed instances.

The term "dead letter queue" is more commonly associated with messaging systems such as Amazon SQS, and with Kafka-based pipelines such as Kafka Connect, which can be configured to move messages that repeatedly fail processing to a separate queue or topic for later handling. In Hatchet, failed instances are not moved to a separate queue; they remain in the run history, visible in the dashboard, until you manage them manually.
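
To make the contrast concrete, here is a minimal plain-Python sketch of the "persist failures in place" model. It is not Hatchet's implementation or API: RunRecord, RunHistory, the status strings, and the charge step are all hypothetical. Failed runs stay in the same history as successful ones, and a manual retry creates a new run, optionally with modified input.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunRecord:
    run_id: int
    payload: dict
    status: str = "PENDING"  # hypothetical status names
    error: str = ""


@dataclass
class RunHistory:
    """All runs live in one list; failures are flagged, not moved elsewhere."""

    runs: List[RunRecord] = field(default_factory=list)

    def execute(self, step: Callable[[dict], None], payload: dict) -> RunRecord:
        record = RunRecord(run_id=len(self.runs) + 1, payload=dict(payload))
        self.runs.append(record)
        try:
            step(payload)
            record.status = "SUCCEEDED"
        except Exception as exc:
            record.status = "FAILED"
            record.error = str(exc)  # kept for root cause analysis
        return record

    def failed(self) -> List[RunRecord]:
        return [r for r in self.runs if r.status == "FAILED"]

    def manual_retry(
        self,
        step: Callable[[dict], None],
        run_id: int,
        new_payload: Optional[dict] = None,
    ) -> RunRecord:
        """Re-run a recorded run, optionally with corrected input."""
        original = next(r for r in self.runs if r.run_id == run_id)
        payload = new_payload if new_payload is not None else original.payload
        return self.execute(step, payload)


def charge(payload: dict) -> None:
    """Hypothetical step that rejects non-positive amounts."""
    if payload["amount"] <= 0:
        raise ValueError("amount must be positive")


history = RunHistory()
first = history.execute(charge, {"amount": -5})
retry = history.manual_retry(charge, first.run_id, new_payload={"amount": 5})
print([(r.run_id, r.status) for r in history.runs])
# prints: [(1, 'FAILED'), (2, 'SUCCEEDED')]
```

The failed run stays in the history after the retry succeeds, so the record of what went wrong is still available for later analysis.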