
Durable Workflows

A durable workflow is work whose execution state lives in Hatchet instead of in your process. When you run a durable workflow, the orchestrator owns that state: it records progress, survives your worker crashing or scaling down, and resumes from the last checkpoint so work is not lost or duplicated.

Why durable?

With ordinary tasks, “where we are” in the workflow lives in memory. If the process dies, that state is gone. With durable workflows, execution state is stored in the Hatchet event log. The orchestrator can therefore:

  • Recover from failures — replay from the last recorded step on another worker instead of restarting from scratch.
  • Handle long waits — release the worker slot during “wait 24 hours” or “wait for this event” steps, then resume when the wait completes.
  • Manage distributed state — keep multi-step, branching, or long-running flows consistent and replayable across workers and restarts.

Your code describes the steps; Hatchet makes them durable and resumable.
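To make the recovery idea concrete, here is a minimal sketch (not Hatchet's actual implementation) of the event-log principle: each completed step is recorded in a durable log, so a restarted run replays recorded results instead of redoing the work.

```python
# Illustrative sketch of checkpoint-and-replay (not Hatchet internals):
# completed steps are recorded in a durable event log, so a restarted
# run replays them instead of re-executing.

def run_workflow(steps, event_log):
    """Run steps in order; `event_log` maps step name -> recorded result."""
    results = {}
    for name, fn in steps:
        if name in event_log:                # completed before the crash
            results[name] = event_log[name]  # replay from the log
            continue
        results[name] = fn(results)          # do the work...
        event_log[name] = results[name]      # ...then checkpoint it
    return results

# Simulate a first run that crashed after step_a: only step_a is logged.
log = {"step_a": 10}
steps = [
    ("step_a", lambda r: 10),
    ("step_b", lambda r: r["step_a"] * 2),
]
out = run_workflow(steps, log)
# step_a is replayed from the log; step_b runs and is checkpointed.
```

The same log lets a different worker pick up the run: nothing about "where we are" lives in the process that crashed.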

Two patterns

Hatchet supports two patterns for building durable workflows, and you can mix them within the same application. Both are durable — the difference is how you express the work, and it comes down to whether you know the shape of the work ahead of time.

[Diagram] Durable task — shape of work is dynamic: a single function (do_work → sleep_for(24h) checkpoint → spawn_tasks() fan-out to N child tasks → wait_for_results() checkpoint → process_results()), runnable on any worker. Procedural · checkpoints · N decided at runtime.
[Diagram] DAG workflow — shape of work is known upfront: Extract → Transform A ∥ Transform B → Load (waits for both). Declared graph · fixed shape · each task independent.

Durable task execution — The shape of work is dynamic. A single long-running function that can pause for time or external signals (SleepFor, WaitForEvent) and spawn child tasks at runtime. Use durable tasks when:

  • The work is IO-bound — waiting for time to pass, external events, or human approval
  • The number of subtasks is determined at runtime (dynamic fan-out)
  • You need procedural control flow — loops, branches, or agent-style reasoning
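A toy sketch of the dynamic pattern (hypothetical names, not the Hatchet SDK): a procedural function decides its fan-out size at runtime and checkpoints each phase, so a resumed run skips work that already completed.

```python
# Illustrative sketch (not the Hatchet SDK): a procedural "durable task"
# whose number of children is decided at runtime, with checkpoints so a
# resumed run skips completed phases.

def durable_task(payload, checkpoints):
    # Phase 1: decide the fan-out size at runtime (not declared upfront).
    if "n_children" not in checkpoints:
        checkpoints["n_children"] = len(payload["items"])
    n = checkpoints["n_children"]

    # Phase 2: spawn one child per item; each child result is checkpointed.
    children = checkpoints.setdefault("children", {})
    for i in range(n):
        if i not in children:
            children[i] = payload["items"][i] ** 2  # the child task's work

    # Phase 3: aggregate once all children have completed.
    return sum(children.values())

cp = {}  # in a real system this state would live in the orchestrator
total = durable_task({"items": [1, 2, 3]}, cp)
```

Because `n` comes from the payload, the graph of work cannot be drawn before the run starts — which is exactly what rules out a static DAG here.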

Directed acyclic graphs (DAGs) — The shape of work is known upfront. You declare which tasks run, in what order, and what depends on what. Hatchet handles execution, parallelism, and retries within that fixed structure. Use DAGs when:

  • You have a well-defined pipeline (ETL, multi-step data processing)
  • Every task and dependency is known before the workflow starts
  • You want the full graph visible in the dashboard for debugging and monitoring
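By contrast, a DAG's structure can be written down before anything runs. A minimal sketch (not the Hatchet SDK) of the ETL shape from the diagram, using Python's standard-library topological sorter to derive a valid execution order:

```python
# Illustrative sketch (not the Hatchet SDK): an ETL pipeline declared as
# a fixed DAG — every task and dependency known before execution — then
# ordered topologically. Tasks with no path between them (the two
# transforms) are eligible to run in parallel.
from graphlib import TopologicalSorter

dag = {
    "extract":     [],                              # no parents
    "transform_a": ["extract"],                     # parallel with transform_b
    "transform_b": ["extract"],
    "load":        ["transform_a", "transform_b"],  # waits for both
}

order = list(TopologicalSorter(dag).static_order())
```

Because the whole graph exists up front, the orchestrator (or a dashboard) can render it, schedule independent branches concurrently, and retry a single failed node without touching the rest.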

Choosing a pattern

DAGs are easier to visualize and reason about — every task, dependency, and data flow is visible as a graph. Durable tasks offer more flexibility — they can branch, loop, and spawn children dynamically — but their runtime behavior is harder to predict from the code alone. When in doubt, start with a DAG and reach for a durable task only when you need capabilities a static graph can’t express. You can always mix both patterns in the same application.

How workflows relate to tasks

A workflow is a container of tasks. Both standalone tasks and workflows are runnables — they share the same API (run, run_no_wait, schedule, and the other trigger methods all work identically).
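The "same API" idea can be sketched as a shared interface (hypothetical names, not the Hatchet SDK): triggering code calls `run` without caring whether it holds a standalone task or a workflow of tasks.

```python
# Illustrative sketch (hypothetical names, not the Hatchet SDK): tasks
# and workflows both satisfy one "runnable" interface, so callers
# trigger them identically.
from typing import Protocol


class Runnable(Protocol):
    def run(self, input: dict) -> dict: ...


class Task:
    def __init__(self, fn):
        self.fn = fn

    def run(self, input: dict) -> dict:
        return self.fn(input)


class Workflow:
    """A container of tasks, run in sequence, threading results through."""
    def __init__(self, tasks: list):
        self.tasks = tasks

    def run(self, input: dict) -> dict:
        for t in self.tasks:
            input = t.run(input)
        return input


double = Task(lambda d: {"x": d["x"] * 2})
inc = Task(lambda d: {"x": d["x"] + 1})
pipeline = Workflow([double, inc])

# The same call works on both.
a = double.run({"x": 3})
b = pipeline.run({"x": 3})
```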