# Hatchet Documentation
> Hatchet is a distributed task queue and workflow engine for modern applications. It provides durable execution, concurrency control, rate limiting, and observability for background tasks and workflows in Python, TypeScript, and Go.
---
---
asIndexPage: true
---
import { Callout } from "nextra/components";
import LanguageSwitcher from "@/components/LanguageSwitcher";
import Keywords from "@/components/Keywords";
# What is Hatchet?
Hatchet is a developer platform that helps engineering teams build and deploy mission-critical AI agents, durable workflows, and background tasks. It supports applications written in Python, Typescript, Go and Ruby, and can be used as a service through [Hatchet Cloud](https://cloud.onhatchet.run) or [self-hosting](/self-hosting) (we're [open-source and 100% MIT-licensed](https://github.com/hatchet-dev/hatchet)). Hatchet provides a full platform for queuing, automatic retries, real-time monitoring, alerting, and logging.
Unlike a traditional queuing system, Hatchet is built around the concept of durability. Every task and agent invocation is durably persisted in Hatchet, allowing for debugging, retries and replays, and more complex features like [durable workflows](/v1/durable-execution).
## Using these docs
Every docs page in the user guide uses inline code snippets across all four SDKs which are generated from tested examples:
## Concepts
There are three primary concepts to understand when getting started with Hatchet:
- **[Tasks](/v1/tasks)** — the fundamental unit of work. A task wraps a single function and gives Hatchet everything it needs to schedule, execute, and observe it.
- **[Workers](/v1/workers)** — long-running processes in your infrastructure that pick up and execute tasks.
- **[Durable Workflows](/v1/durable-execution)** — compose multiple tasks into durable pipelines with dependencies, retries, and checkpointing.
All tasks and workflows are **defined as code**, making them easy to version, test, and deploy.
## Use cases
While Hatchet is a general-purpose orchestration platform, it's particularly well-suited for:
- **AI agents** — Hatchet's durability features allow agents to automatically checkpoint their current state and pick up where they left off when faced with unexpected errors. Hatchet's observability features and distributed-first approach are built for debugging long-running agents at scale.
- **Massive parallelization** - Hatchet is built to handle millions of parallel task executions without overloading your workers. Worker-level slot control allows your workers to only accept the amount of work they can handle, while features like [fairness](/v1/concurrency) and [priorities](/v1/priority) are built to help scale massively parallel ingestion.
- **Mission-critical workloads** - everything in Hatchet is durable by default. This means that every task, DAG, event or agent invocation is stored in a durable event log and ready to be replayed at some point in the future.
## Self Hosting
If you plan on self-hosting or have requirements for an on-premise deployment, there are some additional considerations:
**Minimal Infra Dependencies** - Hatchet is built on top of PostgreSQL and for simple workloads, [it's all you need](/self-hosting/hatchet-lite).
**Fully Featured Open Source** - Hatchet is 100% MIT licensed, so you can run the same application code against [Hatchet Cloud](https://cloud.onhatchet.run) to get started quickly or [self-host](/self-hosting) when you need more control.
## Production Readiness
Hatchet has been battle-tested in production environments, processing billions of tasks per month for scale-ups and enterprises across various industries. Our open source offering is deployed over 10k times per month, while Hatchet Cloud supports hundreds of companies running at scale.
> "With Hatchet, we've scaled our indexing workflows effortlessly, reducing failed runs by 50% and doubling our user base in just two weeks!"
> — Soohoon, Co-Founder @ Greptile
> "Hatchet enables Aevy to process up to 50,000 documents in under an hour through optimized parallel execution, compared to nearly a week with our previous setup."
> — Ymir, CTO @ Aevy
## Ready to get started?
Get started quickly with the **[Hatchet Cloud Quickstart](/v1/quickstart)** or **[self-hosting](/self-hosting)**.
---
---
asIndexPage: true
---
import { snippets } from "@/lib/generated/snippets";
import { Snippet } from "@/components/code";
import { Callout, Card, Cards, Steps, Tabs } from "nextra/components";
import UniversalTabs from "../../components/UniversalTabs";
import Keywords from "@/components/Keywords";
# Hatchet Cloud Quickstart
By the end of this guide you'll have a worker running locally that executes a simple task triggered from the CLI.
> **Info:** This guide walks you through getting set up on Hatchet Cloud. If you'd like to
> self-host Hatchet, please see the [self-hosted quickstart](/self-hosting)
> instead.
### Sign up
If you haven't already signed up for Hatchet Cloud, please register [here](https://cloud.onhatchet.run).
### Set up your tenant
In Hatchet Cloud, you'll be shown a screen to create your first organization and tenant. A tenant is a logical separation of your environments (e.g. `dev`, `staging`, `production`). Each tenant has its own set of users who can access it.
After creating the tenant, you can simply follow the instructions in the Hatchet Cloud dashboard to set up your first quickstart project and workflow. We have copied the instructions in the following steps.
### Install the Hatchet CLI
#### Native Install (Recommended)
**MacOS, Linux, WSL**
```sh
curl -fsSL https://install.hatchet.run/install.sh | bash
```
#### Homebrew
**MacOS**
```sh
brew install hatchet-dev/hatchet/hatchet --cask
```
### Set your Hatchet profile
You will need to create a Hatchet CLI profile to connect to your Hatchet Cloud tenant. You can do this using the `hatchet profile add` command:
```sh
hatchet profile add
```
Note that the Hatchet Cloud dashboard will provide you with an API token to use when creating your profile.
### Run the quickstart
You can run the Hatchet Cloud quickstart using the `hatchet quickstart` command:
```sh
hatchet quickstart
```
### Run your worker
After setting up the quickstart project, you can run your worker locally by following the instructions printed after the quickstart command. This will involve using the `hatchet worker dev` command:
```sh
hatchet worker dev
```
### Trigger a workflow
Finally, you can trigger your workflow using the `hatchet trigger simple` command:
```sh
hatchet trigger simple
```
### (Optional) Install Hatchet docs MCP and Agent Skills
Get Hatchet documentation directly in your AI coding assistant (Cursor, Claude Code, and more):
```sh copy
hatchet docs install
```
Get agent skills for common CLI operations:
```sh copy
hatchet skills install
```
See the [full setup guide](/v1/using-coding-agents) for manual configuration options.
And that's it! You should now have a Hatchet project set up on Hatchet Cloud with a worker running locally.
## Next Steps
Once you've completed the quickstart, continue to the next section to learn how to [create your first task](/v1/tasks).
---
McpUrl,
CursorDeeplinkButton,
CursorMcpConfig,
ClaudeCodeCommand,
CursorTabLabel,
ClaudeCodeTabLabel,
OtherAgentsTabLabel,
} from "@/components/McpSetup";
import Keywords from "@/components/Keywords";
# Using Coding Agents
Hatchet is designed to work well with AI coding agents. This page covers how to give your agent access to Hatchet documentation and step-by-step skills for common CLI operations.
> **Info:** **Prerequisite:** The `hatchet skills install` and `hatchet docs install`
> commands require the Hatchet CLI. See the [CLI reference](/cli) for
> installation instructions.
## Agent Skills
Agent skills are reference documents that teach AI coding agents how to use the Hatchet CLI — triggering workflows, starting workers, debugging runs, and more.
Run the following command in your project root to install the skill package:
```bash copy
hatchet skills install
```
This creates a `skills/hatchet-cli/` directory with step-by-step reference files and appends a section to your project's `AGENTS.md` (and `CLAUDE.md`) pointing agents to the right file for each task.
**Install to a custom directory:**
```bash copy
hatchet skills install --dir ./my-project
```
After installation, commit the `skills/` directory and `AGENTS.md` to version control so all agents working in the repo benefit automatically.
### Available references
Reference, When to use
`references/setup-cli.md`, Installing the CLI, creating or listing profiles
`references/start-worker.md`, Starting a dev worker for local development
`references/trigger-and-watch.md`, Triggering a workflow and polling for completion
`references/debug-run.md`, Diagnosing a failed, stuck, or unexpected run
`references/replay-run.md`, Re-running a previous workflow with same or new input
## MCP Server
Hatchet documentation is available as an **MCP (Model Context Protocol) server**, so AI coding assistants like Cursor and Claude Code can search and reference Hatchet docs directly.
MCP endpoint:
#### Hatchet CLI
```bash copy
hatchet docs install claude-code
```
If `claude` is on your PATH, this runs the command automatically. Otherwise it prints it for you to copy.
#### Command
Run this command in your terminal:
For any AI tool that supports [llms.txt](https://llmstxt.org/), Hatchet docs are available at:
| Resource | URL |
|----------|-----|
| **llms.txt** (index) | [docs.hatchet.run/llms.txt](https://docs.hatchet.run/llms.txt) |
| **llms-full.txt** (all docs) | [docs.hatchet.run/llms-full.txt](https://docs.hatchet.run/llms-full.txt) |
| **Per-page markdown** | `docs.hatchet.run/llms/{section}/{page}.md` |
| **MCP endpoint** | |
Every documentation page also includes a `` header
pointing to its markdown version, and a "View as Markdown" link at the top of the page.
## llms.txt
For any AI tool that supports [llms.txt](https://llmstxt.org/), Hatchet docs are available at:
Resource, URL
**llms.txt** (index), [docs.hatchet.run/llms.txt](https://docs.hatchet.run/llms.txt)
**llms-full.txt** (all docs), [docs.hatchet.run/llms-full.txt](https://docs.hatchet.run/llms-full.txt)
**Per-page markdown**, `docs.hatchet.run/llms/{section}/{page}.md`
**MCP endpoint**,
Every documentation page also includes a `` header
pointing to its markdown version, and a "View as Markdown" link at the top of the page.
---
# Tasks
The fundamental unit of work in Hatchet is a **task**. At its most basic level, a task is just a function. You can invoke a task on its own (a "standalone" task), compose tasks into a [DAG workflow](/v1/patterns/directed-acyclic-graphs), or use [durable task composition](/v1/child-spawning) to spawn child tasks at runtime.
Every task you invoke is **durable** - Hatchet persists it, its state, and its results even after it finishes running.
## Defining a task
A task needs a name and a function. The function accepts an [input](#input-and-output) and a [context](#the-context-object).
#### Python
```python
class SimpleInput(BaseModel):
message: str
class SimpleOutput(BaseModel):
transformed_message: str
# Declare the task to run
@hatchet.task(name="first-task", input_validator=SimpleInput)
def first_task(input: SimpleInput, ctx: Context) -> SimpleOutput:
print("first-task task called")
return SimpleOutput(transformed_message=input.message.lower())
```
#### Typescript
```typescript
import { hatchet } from '../hatchet-client';
// (optional) Define the input type for the workflow
export type SimpleInput = {
Message: string;
};
export const simple = hatchet.task({
name: 'simple',
retries: 3,
fn: async (input: SimpleInput) => {
return {
TransformedMessage: input.Message.toLowerCase(),
};
},
});
```
#### Go
```go
type SimpleInput struct {
Message string `json:"message"`
}
type SimpleOutput struct {
Result string `json:"result"`
}
task := client.NewStandaloneTask("process-message", func(ctx hatchet.Context, input SimpleInput) (SimpleOutput, error) {
return SimpleOutput{
Result: "Processed: " + input.Message,
}, nil
})
```
#### Ruby
```ruby
FIRST_TASK = HATCHET.task(name: "first-task") do |input, ctx|
puts "first-task called"
{ "transformed_message" => input["message"].downcase }
end
```
## Input and output
Every task receives an **input** - a JSON-serializable object passed when the task is triggered. The value that is returned from the task becomes the task's **output**, which callers receive when they await the result of the task.
Hatchet's SDKs support type-checked and runtime-validated input and output types for tasks, so that you can integrate your Hatchet tasks into your codebase in a type-safe and predictable way that provides you all of the guarantees you get from, for example, replacing the Hatchet task run with a local function call.
You can refer to the [examples above](#defining-a-task) to see how to provide validators for task inputs and outputs.
## The context object
In addition to input and output payloads, every task receives a **context**. The context provides Hatchet-related information that might be useful to the execution of the task at runtime. For instance, you might access the workflow run ID, the task run ID, or the retry count from the context and have your task's application logic do something with those values.
The context also provides helper methods for interacting with a number of Hatchet's features, such as [managing cancellations](/v1/cancellation), [refreshing timeouts](/v1/timeouts#refreshing-timeouts), [pushing stream events](/v1/streaming#pushing-stream-events) and more.
#### Python
See the [Python SDK reference](/reference/python/context) for more details
#### Typescript
See the [TypeScript SDK reference](/reference/typescript/Context) for more
details
#### Go
See the [Go SDK
reference](https://pkg.go.dev/github.com/hatchet-dev/hatchet/sdks/go#Context)
for more details
#### Ruby
Ruby SDK reference coming soon! For now, see the [Python SDK
reference](/reference/python/context) to get a sense of what's available.
## Configuration
Tasks can be configured to handle common problems in distributed systems. For example, you might want to automatically retry a task when an external API returns a transient error, or limit how many instances of a task run at the same time to avoid overwhelming a downstream service.
Concept, What it does
[Retries](/v1/retry-policies), Retry the task on failure, with optional backoff.
[Timeouts](/v1/timeouts), Limit how long a task may wait to be scheduled or to run.
[Concurrency](/v1/concurrency), Distribute load fairly between your customers.
[Rate limits](/v1/rate-limits), Throttle task execution over a time window.
[Priority](/v1/priority), Influence scheduling order relative to other queued tasks.
[Worker affinity](/v1/advanced-assignment/worker-affinity), Prefer or require specific workers for this task.
## How tasks execute on workers
Tasks don't run on their own. [Workers](/v1/workers) execute them. A worker is a long-running process that registers one or more tasks with Hatchet. When you trigger a task, Hatchet places it in a queue and assigns it to an available worker that has registered that task. When a task completes, the Hatchet SDK running on the worker sends the result back to Hatchet, which marks the task as a success or failure, displays the results, and so on.
---
# Workers
Workers in Hatchet are the long-running processes that execute [tasks](/v1/tasks). In the broadest sense, it may be helpful to think of a worker as a simple `while` loop that receives a new task assignment from Hatchet, executes the task, and reports the results back.
When workers are spun up - in any environment, be it locally, on a VM, etc. - they will register themselves with Hatchet to start receiving and executing tasks.
## Declaring a worker
A worker needs a name and a set of tasks (or workflows, more on this later) to register:
#### Python
```python
def main() -> None:
worker = hatchet.worker("dag-worker", workflows=[dag_workflow])
worker.start()
```
#### Typescript
```typescript
import { hatchet } from '../hatchet-client';
import { simple } from './workflow';
import { parent, child } from './workflow-with-child';
import { simpleWithZod } from './zod';
async function main() {
const worker = await hatchet.worker('simple-worker', {
// 👀 Declare the workflows that the worker can execute
workflows: [simple, simpleWithZod, parent, child],
// 👀 Declare the number of concurrent task runs the worker can accept
slots: 100,
});
await worker.start();
}
if (require.main === module) {
main();
}
```
#### Go
```go
worker, err := client.NewWorker("simple-worker", hatchet.WithWorkflows(task))
if err != nil {
log.Fatalf("failed to create worker: %v", err)
}
interruptCtx, cancel := cmdutils.NewInterruptContext()
defer cancel()
err = worker.StartBlocking(interruptCtx)
if err != nil {
log.Fatalf("failed to start worker: %v", err)
}
```
#### Ruby
```ruby
def main
worker = HATCHET.worker("dag-worker", workflows: [DAG_WORKFLOW])
worker.start
end
```
When a worker starts, it registers each of its tasks and workflows with Hatchet. From that point on, Hatchet knows to route matching tasks to that worker.
One important note is that multiple workers can register the same task. In this scenario, Hatchet distributes work across all of them, allowing for simple horizontal scaling.
## Starting a worker
#### CLI (recommended)
The fastest way to run a worker during development is with the Hatchet CLI. This handles authentication and hot reloads on code changes:
```bash
hatchet worker dev
```
#### Script
You can also run workers without the CLI, which you're likely to do in a production setting, for instance. To do this, you'll first need to set a `HATCHET_CLIENT_TOKEN` environment variable, or provide it via parameters when creating the Hatchet client.
> **Info:** If you don't already have a token, you can generate one in the "API Tokens" section under "Settings" in the dashboard.
```bash
export HATCHET_CLIENT_TOKEN=""
```
If you're running a self-hosted engine without TLS enabled, also set:
```bash
export HATCHET_CLIENT_TLS_STRATEGY=none
```
Then run the worker:
#### Python
```bash
python worker.py
```
#### Typescript
Add a script to your `package.json`:
```json
"scripts": {
"start:worker": "ts-node src/worker.ts"
}
```
Then run it:
```bash
npm run start:worker
```
#### Go
```bash
go run main.go
```
#### Ruby
```bash
bundle exec ruby worker.rb
```
Once the worker starts, you will see logs confirming it is connected:
```
[INFO] 🪓 -- STARTING HATCHET...
[DEBUG] 🪓 -- 'test-worker' waiting for ['simpletask:step1']
[DEBUG] 🪓 -- acquired action listener: efc4aaf2-...
[DEBUG] 🪓 -- sending heartbeat
```
> **Info:** For self-hosted engines, there may be additional gRPC configuration options
> needed. See the [Self-Hosting](/self-hosting/worker-configuration-options)
> docs for details.
## Slots
Every worker has a fixed number of **slots** that control how many tasks it can run concurrently, which can be configured with the `slots` option on the worker. For instance, if `slots` is set to 5, the worker will run up to five tasks concurrently at any time. Any additional tasks wait in the queue until a slot opens up. Slots are a **local** limit. They protect the individual worker from attempting to run more tasks concurrently than desired, which can help control resource usage by the worker.
The default slot count for workers in Hatchet is 100. In many cases, leaving the default as-is will be perfectly fine, especially when first getting set up with Hatchet.
---
# Running Tasks
Once you've [defined some tasks](/v1/tasks) and registered them on a [worker](/v1/workers), you're ready to run them!
Hatchet lets you run tasks in a number of ways, which support different application needs. In broad strokes, these are [fire-and-wait](#fire-and-wait), various forms of [fire-and-forget-style triggering](#fire-and-forget), and scheduling tasks to run either [periodically](#crons) or at some [specific time in the future](#scheduled-runs).
## Fire-and-wait
Fire-and-wait is a common way of triggering a task which blocks until it completes and returns a result. This is particularly useful for situations where you want to do something with the result of your task. For instance, if your task generates some LLM output which you want to return to the user or persist in the database, you might trigger a task, wait for it to complete, collect its result, and then continue on with your application logic.
#### Python
Call `run` or `aio_run` on a `Task` or `Workflow` object to invoke it. These methods block until the task or workflow completes and return the result.
```python
from examples.child.worker import SimpleInput, child_task
child_task.run(SimpleInput(message="Hello, World!"))
```
The run methods in Hatchet's Python SDK also have async flavors you can `await`. These are prefixed with `aio_`, such as `aio_run`.
```python
result = await child_task.aio_run(SimpleInput(message="Hello, World!"))
```
Note that the type of `input` here is a Pydantic model that matches the input schema of the task or workflow being triggered.
#### Typescript
Call `run` on the `Task` object to invoke it. This returns a promise that resolves when the task completes and returns the result.
```typescript
const res = await parent.run(
{
Message: 'HeLlO WoRlD',
},
{
additionalMetadata: {
test: 'test',
},
}
);
// 👀 Access the results of the Task
console.log(res.TransformedMessage);
```
#### Go
Call `Run` on the `Task` object to invoke it. This blocks until the task completes and returns the result.
```go
result, err := task.Run(context.Background(), SimpleInput{Message: "Hello, World!"})
if err != nil {
return err
}
```
#### Ruby
Call `run` on the `Task` object to invoke it. This blocks until the task completes and returns the result.
```ruby
result = CHILD_TASK_WF.run({ "message" => "Hello, World!" })
```
## Fire-and-forget
On the other hand, fire-and-forget-style triggering enqueues a task without waiting for the result. This is useful for background jobs like sending emails, processing uploads, or kicking off long-running pipelines where the application does _not_ need to wait for the result to continue along.
#### Python
Call `run_no_wait` on a `Task` or `Workflow` object to enqueue it fire-and-forget. This returns a `WorkflowRunRef` you can use to access the run ID and result later.
```python
ref = say_hello.run(input=HelloInput(name="World"), wait_for_result=False)
```
There's also an async flavor:
```python
ref = await say_hello.aio_run(
input=HelloInput(name="Async World"), wait_for_result=False
)
```
Note that the type of `input` here is a Pydantic model that matches the input schema of the task.
#### Typescript
Call `run_no_wait` on the `Task` object to enqueue it without waiting for the result. This returns a `WorkflowRunRef`.
#### Go
Call `RunNoWait` on the `Task` object to enqueue it without waiting for the result. This returns a `WorkflowRunRef`.
```go
runRef, err := task.RunNoWait(context.Background(), SimpleInput{Message: "Hello, World!"})
if err != nil {
return err
}
fmt.Println(runRef.RunId)
```
#### Ruby
Call `run_no_wait` on the `Task` object to enqueue it without waiting for the result. This returns a `WorkflowRunRef`.
```ruby
ref = SAY_HELLO.run_no_wait({ "name" => "World" })
```
When running a task fire-and-forget-style, you can also always retrieve the result later on. The workflow run ref that's returned from these trigger methods has a result method, which lets you retrieve the result of the triggered task if you need it later.
#### Python
Use `ref.result()` to block until the result is available:
```python
result = ref.result()
```
or await `aio_result`:
```python
result = await ref.aio_result()
```
#### Typescript
```typescript
// the return object of the enqueue method is a WorkflowRunRef which includes a listener for the result of the workflow
const result = await run.output;
console.log(result);
// if you need to subscribe to the result of the workflow at a later time, you can use the runRef method and the stored runId
const ref = hatchet.runRef(runId);
const result2 = await ref.output;
console.log(result2);
```
#### Go
```go
result, err := runRef.Result()
if err != nil {
return err
}
var resultOutput SimpleOutput
err = result.TaskOutput("process-message").Into(&resultOutput)
if err != nil {
return err
}
fmt.Println(resultOutput.Result)
```
#### Ruby
```ruby
result = ref.result
```
### Triggering from events
Another method of running tasks fire-and-forget style is via [pushing events](/v1/external-events/pushing-events) to Hatchet. If a task is configured to be triggered by an event, then when the event is pushed to Hatchet, a corresponding run will be triggered using the payload of the event as the input to the run.
You can push events to Hatchet directly using the SDKs, or via [webhooks](/v1/webhooks), which are converted into Hatchet events internally on ingestion.
## Crons
Another common way to run tasks is on a cron schedule, which Hatchet [supports natively](/v1/cron-runs). Crons are useful for running tasks that are expected to run at the same time every day, such as data processing pipelines, reconciliation jobs, and so on.
Note that Hatchet supports second-level cron granularity, although in most cases using the minutes as the most granular level is perfectly fine.
## Scheduled runs
Finally, Hatchet also supports [scheduling a run to be triggered at a specific time in the future](/v1/scheduled-runs). This is particularly useful for situations where you want to wait a known amount of time before running some task, such as sending a follow up email or a welcome email to a new user, or allowing your customers to choose when they want something to run, such as sending a reminder, for instance.
## Triggering from the dashboard
Finally, there are a number of pages in the dashboard that have a `Trigger Run` button. You can provide run parameters such as input, additional metadata, and a scheduled time.

---
# Introduction to Durable Execution
At its core, Hatchet is a _**durable execution**_ platform. Unfortunately, durable execution is an overloaded, often-confusing term. If you're new to durable execution, or are curious for a refresher, we wrote a [blog post outlining the core ideas](https://hatchet.run/blog/durable-execution).
At its most basic level, durable execution provides a toolbox that, when used correctly, gives you some guarantees about tasks and workflows you write in Hatchet that you wouldn't get from an ordinary task queueing system.
## Guarantees
One of the main promises of durable execution, when used correctly, is to give your tasks something closer to [exactly-once semantics](https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/) than you'd get from traditional task queues. In practice, this means that a durable task can guarantee that your application logic is cached correctly and retry-safe, such that every time a piece of a durable task completes, it creates a new checkpoint (an entry in a durable event log), from which we can replay without needing to re-execute the actual application logic.
This means that if you run a durable task to a midway point and the worker it's running on crashes, you can replay the task from whatever checkpoint it last reached without re-running any of the previous steps or duplicating any work. This is priceless in systems that cannot reasonably be made idempotent, so replaying on failure is impossible.
## Core Assumptions
The core assumption of durable execution in Hatchet is that durable tasks only do one of two things: They can _wait_ for something, such as a [sleep to complete](/v1/sleep) or an [event to be received](/v1/events), or they can _[spawn child tasks](/v1/child-spawning)_. These operations can also be composed, such that you can have a durable task wait for _either_ a sleep to complete _or_ an event to be pushed, whichever comes first. You can achieve this behavior by using [or groups](/v1/patterns/durable-task-execution#or-groups).
## Example Uses
There are lots of cases where durable execution is useful. A few common ones where it's an obvious choice are:
1. Agentic workflows, especially ones that require human-in-the-loop steps, which continuously spawn children, collect results, spawn more children, and so on, in a loop. Durable tasks are an obvious fit here, since the durable task can be replayed from where it left off without losing any of the progress that was made by the agent in the past, and without needing to e.g. replay the human-in-the-loop portions of the task, such as approvals or similar.
2. Tasks that are hard to make idempotent, where we cannot replay part of the task once it's completed. For example, something that involves sending an email to a customer, or updating a value in a table midway through.
3. Dynamic workflows, where we build a DAG at runtime by selecting which child workflows to spawn based on the input to the durable task or the results of upstream checkpoints. This is particularly useful for powering tools like drag-and-drop DAG builders.
## Learn More!
There are lots of durable execution concepts and features to cover, and we're only just scratching the surface here! Check out our more detailed [durable execution documentation](/v1/patterns) to keep learning and building.
---
# Scheduled Runs
> This example assumes we have a [task](/v1/tasks) registered on a running [worker](/v1/workers).
Scheduled runs allow you to trigger a task at a specific time in the future. Some example use cases of scheduling runs might include:
- Sending a reminder email at a specific time after a user took an action.
- Running a one-time maintenance task at a predetermined time as determined by your application. For instance, you might want to run a database vacuum during a maintenance window any time a task matches a certain criteria.
- Allowing a customer to decide when they want your application to perform a specific task. For instance, if your application is a simple alarm app that sends a customer a notification at a time that they specify, you might create a scheduled run for each alarm that the customer sets.
Hatchet supports scheduled runs to run on a schedule defined in a few different ways:
- [Programmatically](/v1/scheduled-runs#programmatically-creating-scheduled-runs): Use the Hatchet SDKs to dynamically set the schedule of a task.
- [Hatchet Dashboard](/v1/scheduled-runs#managing-scheduled-runs-in-the-hatchet-dashboard): Manually create scheduled runs from the Hatchet Dashboard.
> **Warning:** The scheduled time is when Hatchet **enqueues** the task, not when the run
> starts. Scheduling constraints like concurrency limits, rate limits, and retry
> policies can affect run start times.
## Programmatically Creating Scheduled Runs
### Create a Scheduled Run
You can create dynamic scheduled runs programmatically via the API to run tasks at a specific time in the future.
Here's an example of creating a scheduled run to trigger a task tomorrow at noon:
#### Python
```python
from datetime import datetime
from examples.simple.worker import simple
schedule = simple.schedule(datetime(2025, 3, 14, 15, 9, 26))
## 👀 do something with the id
print(schedule.id)
```
#### Typescript
```typescript
const runAt = new Date(new Date().setHours(12, 0, 0, 0) + 24 * 60 * 60 * 1000);
const scheduled = await simple.schedule(runAt, {
Message: 'hello',
});
// 👀 Get the scheduled run ID of the workflow
// it may be helpful to store the scheduled run ID of the workflow
// in a database or other persistent storage for later use
const scheduledRunId = scheduled.metadata.id;
console.log(scheduledRunId);
```
#### Go
```go
scheduledRun, err := client.Schedules().Create(
context.Background(),
"scheduled",
features.CreateScheduledRunTrigger{
TriggerAt: time.Now().Add(1 * time.Minute),
Input: map[string]interface{}{"message": "Hello, World!"},
},
)
if err != nil {
log.Fatalf("failed to create scheduled run: %v", err)
}
```
#### Ruby
```ruby
schedule = SIMPLE.schedule(Time.now + 86_400, input: { "message" => "Hello, World!" })
## do something with the id
puts schedule.metadata.id
```
In this example you can have different scheduled times for different customers, or dynamically set the scheduled time based on some other business logic.
When creating a scheduled run via the API, you will receive a scheduled run object with a metadata property containing the id of the scheduled run. This id can be used to reference the scheduled run when deleting the scheduled run and is often stored in a database or other persistence layer.
> **Info:** Note: Be mindful of the time zone of the scheduled run. Scheduled runs are
> **always** stored and returned in UTC.
### Deleting a Scheduled Run
You can delete a scheduled run by calling the `delete` method on the scheduled client.
#### Python
```python
hatchet.scheduled.delete(scheduled_id=scheduled_run.metadata.id)
```
#### Typescript
```typescript
await hatchet.scheduled.delete(scheduled);
```
#### Go
```go
err = client.Schedules().Delete(
context.Background(),
scheduledRun.Metadata.Id,
)
if err != nil {
log.Fatalf("failed to delete scheduled run: %v", err)
}
```
#### Ruby
```ruby
hatchet.scheduled.delete(scheduled_run.metadata.id)
```
### Listing Scheduled Runs
You can list all scheduled runs for a task by calling the `list` method on the scheduled client.
#### Python
```python
scheduled_runs = hatchet.scheduled.list()
```
#### Typescript
```typescript
const scheduledRuns = await hatchet.scheduled.list({
workflow: simple,
});
console.log(scheduledRuns);
```
#### Go
```go
scheduledRuns, err := client.Schedules().List(
context.Background(),
rest.WorkflowScheduledListParams{},
)
if err != nil {
log.Fatalf("failed to list scheduled runs: %v", err)
}
```
#### Ruby
```ruby
scheduled_runs = hatchet.scheduled.list
```
### Rescheduling a Scheduled Run
If you need to change the trigger time for an existing scheduled run, you can reschedule it by updating its `triggerAt`.
#### Python
```python
hatchet.scheduled.update(
scheduled_id=scheduled_run.metadata.id,
trigger_at=datetime.now(tz=timezone.utc) + timedelta(hours=1),
)
```
#### Typescript
```typescript
await hatchet.scheduled.update(scheduledRunId, {
triggerAt: new Date(Date.now() + 60 * 60 * 1000),
});
```
#### Ruby
```ruby
hatchet.scheduled.update(
scheduled_run.metadata.id,
trigger_at: Time.now + 3600
)
```
> **Warning:** You can only reschedule scheduled runs created via the API (not runs created
> via a code-defined schedule), and Hatchet may reject rescheduling if the run
> has already triggered.
### Bulk operations (delete / reschedule)
Hatchet supports bulk operations for scheduled runs. You can bulk delete scheduled runs, and you can bulk reschedule scheduled runs by providing a list of updates.
#### Python
```python
hatchet.scheduled.bulk_delete(scheduled_ids=[id])
hatchet.scheduled.bulk_delete(
workflow_id="workflow_id",
statuses=[ScheduledRunStatus.SCHEDULED],
additional_metadata={"customer_id": "customer-a"},
)
```
```python
hatchet.scheduled.bulk_update(
[
(id, datetime.now(tz=timezone.utc) + timedelta(hours=2)),
]
)
```
#### Typescript
```typescript
await hatchet.scheduled.bulkDelete({
scheduledRuns: [scheduledRunId],
});
```
```typescript
await hatchet.scheduled.bulkUpdate([
{ scheduledRun: scheduledRunId, triggerAt: new Date(Date.now() + 2 * 60 * 60 * 1000) },
]);
```
#### Ruby
```ruby
hatchet.scheduled.bulk_delete(scheduled_ids: [id])
```
```ruby
hatchet.scheduled.bulk_update(
[[id, Time.now + 7200]]
)
```
## Managing Scheduled Runs in the Hatchet Dashboard
In the Hatchet Dashboard, you can view and manage scheduled runs for your tasks.
Navigate to "Triggers" > "Scheduled Runs" in the left sidebar and click "Create Scheduled Run" at the top right.
You can specify run parameters such as Input, Additional Metadata, and the Scheduled Time.

You can also manage existing scheduled runs:
- **Single-run actions**: Use the per-row actions menu to **Reschedule** or **Delete** an individual scheduled run.
- **Bulk actions**: Use the **Actions** menu to bulk **Delete** or **Reschedule** either:
- The selected rows, or
- All rows matching the current filters (including “all” if no filters are set).
> **Info:** In the dashboard, reschedule/delete actions may be disabled for runs that were
> created via a code-defined schedule, and rescheduling may be disabled for runs
> that have already triggered.
## Scheduled Run Considerations
When using scheduled runs, there are a few considerations to keep in mind:
1. **Time Zone**: Scheduled runs are stored and returned in UTC. Make sure to consider the time zone when defining your scheduled time.
2. **Execution Time**: The actual execution time of a scheduled run may vary slightly from the scheduled time. Hatchet makes a best-effort attempt to enqueue the task as close to the scheduled time as possible, but there may be slight delays due to system load or other factors.
3. **Missed Schedules**: If a scheduled task is missed (e.g., due to system downtime), Hatchet will not automatically run the missed instances when the service comes back online.
4. **Overlapping Schedules**: If a task is still running when a second scheduled run is scheduled to start, Hatchet will start a new instance of the task or respect [concurrency](/v1/concurrency) policy.
---
# Recurring Runs with Cron
> This example assumes we have a [task](/v1/tasks) registered on a running [worker](/v1/workers).
A [Cron](https://en.wikipedia.org/wiki/Cron) is a time-based job scheduler that allows you to define when a task should be executed automatically on a pre-determined schedule.
Some example use cases for cron-style tasks might include:
1. Running a daily report at a specific time.
2. Sending weekly digest emails to users about their activity from the past week.
3. Running a monthly billing process to generate invoices for customers.
Hatchet supports cron triggers to run on a schedule defined in a few different ways:
- [Task Definitions](/v1/cron-runs#defining-a-cron-in-your-task-definition): Define a cron expression in your task definition to trigger the task on a predefined schedule.
- [Dynamic Programmatically](/v1/cron-runs#programmatically-creating-cron-triggers): Use the Hatchet SDKs to dynamically set the cron schedule of a task.
- [Hatchet Dashboard](/v1/cron-runs#managing-cron-jobs-in-the-hatchet-dashboard): Manually create cron triggers from the Hatchet Dashboard.
> **Warning:** The expression is when Hatchet **enqueues** the task, not when the run starts.
> Scheduling constraints like concurrency limits, rate limits, and retry
> policies can affect run start times.
### Cron Expression Syntax
Cron expressions in Hatchet follow the standard cron syntax. Hatchet supports both 5-field and 6-field expressions.
A cron expression consists of 5 to 6 fields separated by spaces. If there are 6 fields, the first field is seconds; if
there are 5 fields, the first field is minutes.
```
┌───────────── second (0 - 59) (optional)
│ ┌───────────── minute (0 - 59)
│ │ ┌───────────── hour (0 - 23)
│ │ │ ┌───────────── day of the month (1 - 31)
│ │ │ │ ┌───────────── month (1 - 12)
│ │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday)
* * * * * *
```
Each field can contain a specific value, an asterisk (`*`) to represent all possible values, or a range of values. Here are some examples of cron expressions:
- `0 0 * * *`: Run every day at midnight
- `*/15 * * * *`: Run every 15 minutes
- `0 9 * * 1`: Run every Monday at 9 AM
- `0 0 1 * *`: Run on the first day of every month at midnight
- `30 * * * * *`: Run at 30 seconds past every minute (6-field)
> **Info:** Keep in mind, Hatchet Cloud meters by Task runs so use seconds wisely. Have
> questions about pricing? [Contact us](https://hatchet.run/office-hours)
## Defining a Cron in Your Task Definition
You can define a task with a cron schedule by configuring the cron expression as part of the task definition:
#### Python-Sync
```python
# Adding a cron trigger to a workflow is as simple
# as adding a `cron expression` to the `on_cron`
# prop of the workflow definition
cron_workflow = hatchet.workflow(name="CronWorkflow", on_crons=["* * * * *"])
@cron_workflow.task()
def step1(input: EmptyModel, ctx: Context) -> dict[str, str]:
return {
"time": "step1",
}
```
#### Python-Async
```python
# Adding a cron trigger to a workflow is as simple
# as adding a `cron expression` to the `on_cron`
# prop of the workflow definition
cron_workflow = hatchet.workflow(name="CronWorkflow", on_crons=["* * * * *"])
@cron_workflow.task()
def step1(input: EmptyModel, ctx: Context) -> dict[str, str]:
return {
"time": "step1",
}
```
#### Typescript
```typescript
export const onCron = hatchet.workflow({
name: 'on-cron-workflow',
on: {
// 👀 add a cron expression to run the workflow every 15 minutes
cron: '*/15 * * * *',
},
});
```
#### Go
```go
dailyCleanup := client.NewStandaloneTask("cleanup-temp-files", func(ctx hatchet.Context, input CronInput) (CronOutput, error) {
log.Printf("Running daily cleanup at %s", input.Timestamp)
time.Sleep(2 * time.Second)
return CronOutput{
JobName: "daily-cleanup",
ExecutedAt: time.Now().Format(time.RFC3339),
NextRun: "Next run: tomorrow at 2 AM",
}, nil
},
hatchet.WithWorkflowCron("0 2 * * *"),
hatchet.WithWorkflowCronInput(CronInput{
Timestamp: time.Now().Format(time.RFC3339),
}),
hatchet.WithWorkflowDescription("Daily cleanup and maintenance tasks"),
)
```
#### Ruby
```ruby
CRON_WORKFLOW = HATCHET.workflow(
name: "CronWorkflow",
on_crons: ["*/5 * * * *"]
)
CRON_WORKFLOW.task(:cron_task) do |input, ctx|
puts "Cron task executed at #{Time.now}"
{ "status" => "success" }
end
```
In the examples above, we set the `on cron` property of the task. The property specifies the cron expression that determines when the task should be triggered.
Note: When modifying a cron in your task definition, it will override any cron
schedule for previous crons defined in previous task definitions, but crons
created via the API or Dashboard will still be respected.
## Programmatically Creating Cron Triggers
### Create a Cron Trigger
You can create dynamic cron triggers programmatically via the API. This is useful if you want to create a cron trigger that is not known at the time of task definition,
Here's an example of creating a a cron to trigger a report for a specific customer every day at noon:
#### Python-Sync
```python
cron_trigger = dynamic_cron_workflow.create_cron(
cron_name="customer-a-daily-report",
expression="0 12 * * *",
input=DynamicCronInput(name="John Doe"),
additional_metadata={
"customer_id": "customer-a",
},
)
id = cron_trigger.metadata.id # the id of the cron trigger
```
#### Python-Async
```python
cron_trigger = await dynamic_cron_workflow.aio_create_cron(
cron_name="customer-a-daily-report",
expression="0 12 * * *",
input=DynamicCronInput(name="John Doe"),
additional_metadata={
"customer_id": "customer-a",
},
)
cron_trigger.metadata.id # the id of the cron trigger
```
#### Typescript
```typescript
const cron = await simple.cron('simple-daily', '0 0 * * *', {
Message: 'hello',
});
// it may be useful to save the cron id for later
const cronId = cron.metadata.id;
```
#### Go
```go
createdCron, err := client.Crons().Create(context.Background(), "cleanup-temp-files", features.CreateCronTrigger{
Name: "daily-cleanup",
Expression: "0 0 * * *",
Input: map[string]interface{}{
"timestamp": time.Now().Format(time.RFC3339),
},
AdditionalMetadata: map[string]interface{}{
"description": "Daily cleanup and maintenance tasks",
},
})
if err != nil {
return err
}
```
#### Ruby
```ruby
cron_trigger = dynamic_cron_workflow.create_cron(
"customer-a-daily-report",
"0 12 * * *",
input: { "name" => "John Doe" }
)
id = cron_trigger.metadata.id
```
In this example you can have different expressions for different customers, or dynamically set the expression based on some other business logic.
When creating a cron via the API, you will receive a cron trigger object with a metadata property containing the id of the cron trigger. This id can be used to reference the cron trigger when deleting the cron trigger and is often stored in a database or other persistence layer.
Note: Cron Name and Expression are required fields when creating a cron
trigger and we enforce a unique constraint on the two.
### Delete a Cron Trigger
You can delete a cron trigger by passing the cron object or a cron trigger id to the delete method.
#### Python-Sync
```python
hatchet.cron.delete(cron_id=cron_trigger.metadata.id)
```
#### Python-Async
```python
await hatchet.cron.aio_delete(cron_id=cron_trigger.metadata.id)
```
#### Typescript
```typescript
await hatchet.crons.delete(cronId);
```
#### Go
```go
err = client.Crons().Delete(context.Background(), createdCron.Metadata.Id)
if err != nil {
return err
}
```
#### Ruby
```ruby
hatchet.cron.delete(cron_trigger.metadata.id)
```
Note: Deleting a cron trigger will not cancel any currently running instances
of the task. It will simply stop the cron trigger from triggering the task
again.
### List Cron Triggers
Retrieves a list of all task cron triggers matching the criteria.
#### Python-Sync
```python
cron_triggers = hatchet.cron.list()
```
#### Python-Async
```python
await hatchet.cron.aio_list()
```
#### Typescript
```typescript
const crons = await hatchet.crons.list({
workflow: simple,
});
```
#### Go
```go
cronList, err := client.Crons().List(context.Background(), rest.CronWorkflowListParams{
AdditionalMetadata: &[]string{"description:Daily cleanup and maintenance tasks"},
})
if err != nil {
return err
}
```
#### Ruby
```ruby
cron_triggers = hatchet.cron.list
```
## Managing Cron Triggers in the Hatchet Dashboard
In the Hatchet Dashboard, you can view and manage cron triggers for your tasks.
Navigate to "Triggers" > "Cron Jobs" in the left sidebar and click "Create Cron Job" at the top right.
You can specify run parameters such as Input, Additional Metadata, and the Expression.

## Cron Considerations
When using cron triggers, there are a few considerations to keep in mind:
1. **Time Zone**: Cron schedules are UTC. Make sure to consider the time zone when defining your cron expressions.
2. **Execution Time**: The actual execution time of a cron-triggered task may vary slightly from the scheduled time. Hatchet makes a best-effort attempt to enqueue the task as close to the scheduled time as possible, but there may be slight delays due to system load or other factors.
3. **Missed Schedules**: If a scheduled task is missed (e.g., due to system downtime), Hatchet will **not** automatically run the missed instances. It will wait for the next scheduled time to trigger the task.
4. **Overlapping Schedules**: If a task is still running when the next scheduled time arrives, Hatchet will start a new instance of the task or respect the [concurrency](/v1/concurrency) policy.
---
# Bulk Run Many Tasks
Often you may want to run a task multiple times with different inputs. There is significant overhead (i.e. network roundtrips) to write the task, so if you're running multiple tasks, it's best to use the bulk run methods.
#### Python
You can use the `aio_run_many` method to bulk run a task. This will return a list of results.
```python
greetings = ["Hello, World!", "Hello, Moon!", "Hello, Mars!"]
results = await child_task.aio_run_many(
[
# run each greeting as a task in parallel
child_task.create_bulk_run_item(
input=SimpleInput(message=greeting),
)
for greeting in greetings
]
)
# this will await all results and return a list of results
print(results)
```
> **Info:** `Workflow.create_bulk_run_item` is a typed helper to create the inputs for
> each task.
There are additional bulk methods available on the `Workflow` object.
- `aio_run_many`
- `aio_run_many_no_wait`
And blocking variants:
- `run_many`
- `run_many_no_wait`
As with the run methods, you can call bulk methods from within a task and the runs will be associated with the parent task in the dashboard.
#### Typescript
You can use the `run` method directly to bulk run tasks by passing an array of inputs. This will return a list of results.
```typescript
const res = await simple.run([
{
Message: 'HeLlO WoRlD',
},
{
Message: 'Hello MoOn',
},
]);
// 👀 Access the results of the Task
console.log(res[0].TransformedMessage);
console.log(res[1].TransformedMessage);
```
There are additional bulk methods available on the `Task` object.
- `run`
- `runNoWait`
You can also use `runMany` and `runManyNoWait` for per-run options while keeping the same bulk execution behavior.
```typescript
const runManyRes = await simple.runMany([
{
input: {
Message: 'HeLlO WoRlD',
},
},
{
input: {
Message: 'Hello MoOn',
},
opts: {
priority: 3,
},
},
]);
console.log(runManyRes[0].TransformedMessage);
console.log(runManyRes[1].TransformedMessage);
```
Additional `Task` bulk methods:
- `runMany`
- `runManyNoWait`
As with the run methods, you can call bulk methods on the task fn context parameter within a task and the runs will be associated with the parent task in the dashboard.
```typescript
const parent = hatchet.task({
name: 'simple',
fn: async (input: SimpleInput, ctx) => {
// Bulk run two tasks in parallel
const child = await ctx.bulkRunChildren([
{
workflow: simple,
input: {
Message: 'Hello, World!',
},
},
{
workflow: simple,
input: {
Message: 'Hello, Moon!',
},
},
]);
return {
TransformedMessage: `${child[0].TransformedMessage} ${child[1].TransformedMessage}`,
};
},
});
```
Available bulk methods on the `Context` object are: - `bulkRunChildren` - `bulkRunChildrenNoWait`
#### Go
You can use the `RunMany` method directly on the `Workflow` or `StandaloneTask` instance to bulk run tasks by passing an array of inputs. This will return a list of run IDs.
```go
// Prepare inputs as []RunManyOpt for bulk run
inputs := make([]hatchet.RunManyOpt, len(bulkInputs))
for i, input := range bulkInputs {
inputs[i] = hatchet.RunManyOpt{
Input: input,
}
}
// Run workflows in bulk
ctx := context.Background()
runRefs, err := workflow.RunMany(ctx, inputs)
if err != nil {
log.Fatalf("failed to run bulk workflows: %v", err)
}
```
Additional bulk methods are coming soon for the Go SDK. Join our [Discord](https://hatchet.run/discord) to stay up to date.
#### Ruby
```ruby
greetings = ["Hello, World!", "Hello, Moon!", "Hello, Mars!"]
results = CHILD_TASK_WF.run_many(
greetings.map do |greeting|
CHILD_TASK_WF.create_bulk_run_item(
input: { "message" => greeting }
)
end
)
puts results
```
---
# Webhooks
Webhooks allow external systems to trigger Hatchet workflows by sending HTTP requests to dedicated endpoints. This enables real-time integration with third-party services like GitHub, Stripe, Slack, or any system that can send webhook events.
## Guides
We have step-by-step guides for the most common webhook integrations:
- [**Stripe**](/cookbooks/webhooks-stripe) — payments, subscriptions, invoices
- [**GitHub**](/cookbooks/webhooks-github) — pull requests, issues, pushes
- [**Slack**](/cookbooks/webhooks-slack) — slash commands, interactive components, event subscriptions
## Creating a webhook
To create a webhook, you'll need to fill out some fields that tell Hatchet how to determine which workflows to trigger from your webhook, and how to validate it when it arrives from the sender. In particular, you'll need to provide the following fields:
#### Name
The **Webhook Name** is tenant-unique (meaning a single tenant can only use each name once), and is used to create the URL for where the incoming webhook request should be sent. For instance, if your tenant id was `d60181b7-da6c-4d4c-92ec-8aa0fc74b3e5` and your webhook name was `my-webhook`, then the URL might look like `https://cloud.onhatchet.run/api/v1/stable/tenants/d60181b7-da6c-4d4c-92ec-8aa0fc74b3e5/webhooks/my-webhook`. Note that you can copy this URL in the dashboard.
#### Source
The **Source** indicates the source of the webhook, which can be a pre-provided one for easy setup like Stripe or Github, or a "generic" one, which lets you configure all of the necessary fields for your webhook integration based on what the webhook sender provides.
#### Event Key Expression
The **Event Key Expression** is a [CEL](https://cel.dev/) expression that you can use to create a dynamic event key from the payload and headers of the incoming webhook. You can either set this to a constant value, like `webhook`, or you could set it to something dynamic using those two options. Some examples:
1. `'stripe:' + input.type` would create event keys where `'stripe:'` is a prefix for all keys indicating the webhook came from Stripe, and `input.type` selects the `type` field off of the webhook payload and uses it to create the final event key. The result might look something like `stripe:payment_intent.created`.
2. `'github:' + headers['x-github-event'] + ':' + input.action` could create a key like `github:star:created`
> **Info:** The result of the event key expression is what Hatchet will use as the event
> key, so you'd need to set a matching event key as a trigger on your workflows
> in order to trigger them from the webhooks you create. For instance, you might
> add `on_events=["stripe:payment_intent.created"]` to listen for payment intent
> created events in the previous example.
#### Scope Expression (Optional)
The **Scope Expression** is an optional [CEL](https://cel.dev/) expression that evaluates to a string used to filter which workflows to trigger. This is useful when you have multiple workflows listening to the same event key but want to route to specific workflows based on the webhook content.
Like the event key expression, you have access to `input` (the webhook payload) and `headers` (the request headers). Some examples:
1. `input.customer_id` would use the customer ID from the payload as the scope
2. `headers['x-organization-id']` would use a header value as the scope
3. `input.metadata.environment` could route to different workflows based on environment
#### Static Payload (Optional)
The **Static Payload** is an optional JSON object that gets merged with the incoming webhook payload before it's passed to your workflows. This is useful for:
- Adding constant metadata to all events from this webhook
- Injecting configuration values that aren't in the original payload
- Overriding specific fields from the incoming payload
> **Info:** When there's a key collision between the incoming webhook payload and the
> static payload, the static payload values take precedence.
For example, if you set a static payload of `{"source": "stripe", "environment": "production"}` and receive a webhook with `{"type": "payment_intent.created", "source": "api"}`, the final payload passed to your workflow would be `{"type": "payment_intent.created", "source": "stripe", "environment": "production"}`.
#### Authentication
Finally, you'll need to specify how Hatchet should authenticate incoming webhook requests. For non-generic sources like Stripe and Github, Hatchet has presets for most of the fields, so in most cases you'd only need to provide a secret.
If you're using a generic source, then you'll need to specify an authentication method (either basic auth, an API key, HMAC-based auth), and provide the required fields (such as a username and password in the basic auth case).
> **Warning:** Hatchet encrypts any secrets you provide for validating incoming webhooks.
The different authentication methods require different fields to be provided:
- **Pre-configured sources** (Stripe, GitHub, Slack): Only require a webhook secret
- **Generic sources** require different fields depending on the selected authentication method:
- **Basic Auth**: Requires a username and password
- **API Key**: Requires header name containing the key on incoming requests, and secret key itself
- **HMAC**: Requires a header name containing the secret on incoming requests, the secret itself, an encoding method (e.g. hex, base64), and an algorithm (e.g. `SHA256`, `SHA1`, etc.).
## Usage
While you're creating your webhook (and also after you've created it), you can copy the webhook URL, which is what you'll provide to the webhook _sender_.
Once you've done that, the last thing to do is register the event keys you want your workers to listen for so that they can be triggered by incoming webhooks.
For examples on how to do this, see the [documentation on event triggers](/v1/external-events/run-on-event).
---
# Pushing Events
You can push an event to Hatchet by calling the `push` method on the Hatchet event client and providing the event name and payload. Any tasks that have registered an [event trigger](/v1/external-events/run-on-event) for that event key will be run.
#### Python
```python
hatchet.event.push("user:create", {"should_skip": False})
```
#### Typescript
```typescript
const res = await hatchet.events.push('simple-event:create', {
Message: 'hello',
ShouldSkip: false,
});
```
#### Go
```go
err := client.Events().Push(
context.Background(),
"simple-event:create",
EventInput{
Message: "Hello, World!",
},
)
if err != nil {
return err
}
```
#### Ruby
```ruby
HATCHET.event.push("user:create", { "should_skip" => false })
```
> **Info:** Event triggers evaluate tasks to run at the time of the event. If an event is
> received before the task is registered, the task will not be run.
---
# Event Trigger
> This example assumes we have a [task](/v1/tasks) registered on a running [worker](/v1/workers).
Run-on-event allows you to trigger one or more tasks when a specific event occurs. This is useful when you need to execute a task in response to an ephemeral event where the result is not important. A few common use cases for event-triggered task runs are:
1. Running a task when an ephemeral event is received, such as a webhook or a message from a queue.
2. When you want to run multiple independent tasks in response to a single event. For instance, if you wanted to run a `send_welcome_email` task, and you also wanted to run a `grant_new_user_credits` task, and a `reward_referral` task, all triggered by the signup. In this case, you might declare all three of those tasks with an event trigger for `user:signup`, and then have them all kick off when that event happens.
> **Info:** Event triggers evaluate tasks to run at the time of the event. If an event is
> received before the task is registered, the task will not be run.
## Declaring Event Triggers
To run a task on an event, you need to declare the event that will trigger the task. This is done by declaring the `on_events` property in the task declaration.
#### Python
```python
EVENT_KEY = "user:create"
SECONDARY_KEY = "foobarbaz"
WILDCARD_KEY = "subscription:*"
class EventWorkflowInput(BaseModel):
should_skip: bool
event_workflow = hatchet.workflow(
name="EventWorkflow",
on_events=[EVENT_KEY, SECONDARY_KEY, WILDCARD_KEY],
input_validator=EventWorkflowInput,
)
```
#### Typescript
```typescript
export const lower = hatchet.workflow({
name: 'lower',
// 👀 Declare the event that will trigger the workflow
onEvents: ['simple-event:create'],
});
```
#### Go
```go
const SimpleEvent = "simple-event:create"
func Lower(client *hatchet.Client) *hatchet.StandaloneTask {
return client.NewStandaloneTask(
"lower", func(ctx hatchet.Context, input EventInput) (*LowerTaskOutput, error) {
return &LowerTaskOutput{
TransformedMessage: strings.ToLower(input.Message),
}, nil
},
hatchet.WithWorkflowEvents(SimpleEvent),
)
}
```
#### Ruby
```ruby
EVENT_KEY = "user:create"
SECONDARY_KEY = "foobarbaz"
WILDCARD_KEY = "subscription:*"
EVENT_WORKFLOW = HATCHET.workflow(
name: "EventWorkflow",
on_events: [EVENT_KEY, SECONDARY_KEY, WILDCARD_KEY]
)
```
> **Info:** Note: Multiple tasks can be triggered by the same event.
> **Info:** As of engine version 0.65.0, Hatchet supports wildcard event triggers using
> the `*` wildcard pattern. For example, you could register `subscription:*` as
> your event key, which would match incoming events like `subcription:create`,
> `subscription:renew`, `subscription:cancel`, and so on.
---
# Event Filters
Events can be _filtered_ in Hatchet, which allows you to push events to Hatchet and only trigger task runs from them in certain cases. **If you enable filters on a workflow, your workflow will be triggered once for each matching filter on any incoming event with a matching scope** (more on scopes below).
## Basic Usage
There are two ways to create filters in Hatchet.
### Default filters on the workflow
The simplest way to create a filter is to register it declaratively with your workflow when it's created. For example:
#### Python
```python
event_workflow_with_filter = hatchet.workflow(
name="EventWorkflow",
on_events=[EVENT_KEY, SECONDARY_KEY, WILDCARD_KEY],
input_validator=EventWorkflowInput,
default_filters=[
DefaultFilter(
expression="true",
scope="example-scope",
payload={
"main_character": "Anna",
"supporting_character": "Stiva",
"location": "Moscow",
},
)
],
)
```
#### Typescript
```typescript
export const lowerWithFilter = hatchet.workflow({
name: 'lower',
// 👀 Declare the event that will trigger the workflow
onEvents: ['simple-event:create'],
defaultFilters: [
{
expression: 'true',
scope: 'example-scope',
payload: {
mainCharacter: 'Anna',
supportingCharacter: 'Stiva',
location: 'Moscow',
},
},
],
});
```
#### Go
```go
func LowerWithFilter(client *hatchet.Client) *hatchet.StandaloneTask {
return client.NewStandaloneTask(
"lower", accessFilterPayload,
hatchet.WithWorkflowEvents(SimpleEvent),
hatchet.WithFilters(types.DefaultFilter{
Expression: "true",
Scope: "example-scope",
Payload: map[string]interface{}{
"main_character": "Anna",
"supporting_character": "Stiva",
"location": "Moscow"},
}),
)
}
```
#### Ruby
```ruby
EVENT_WORKFLOW_WITH_FILTER = HATCHET.workflow(
name: "EventWorkflow",
on_events: [EVENT_KEY, SECONDARY_KEY, WILDCARD_KEY],
default_filters: [
Hatchet::DefaultFilter.new(
expression: "true",
scope: "example-scope",
payload: {
"main_character" => "Anna",
"supporting_character" => "Stiva",
"location" => "Moscow"
}
)
]
)
EVENT_WORKFLOW.task(:task) do |input, ctx|
puts "event received"
ctx.filter_payload
end
```
In each of these cases, we register a filter with the workflow. Note that these "declarative" filters are overwritten each time your workflow is updated, so the ids associated with them will not be stable over time. This allows you to modify a filter in-place or remove a filter, and not need to manually delete it over the API.
### Filters feature client
You also can create event filters by using the `filters` clients on the SDKs:
#### Python
```python
hatchet.filters.create(
workflow_id=event_workflow.id,
expression="input.should_skip == false",
scope="foobarbaz",
payload={
"main_character": "Anna",
"supporting_character": "Stiva",
"location": "Moscow",
},
)
```
#### Typescript
```typescript
hatchet.filters.create({
workflowId: lower.id,
expression: 'input.ShouldSkip == false',
scope: 'foobarbaz',
payload: {
main_character: 'Anna',
supporting_character: 'Stiva',
location: 'Moscow',
},
});
```
#### Go
```go
_, err = client.Filters().Create(
context.Background(),
rest.V1CreateFilterRequest{
WorkflowId: uuid.MustParse("bb866b59-5a86-451b-8023-10d451db11d3"),
Expression: "true",
Scope: "example-scope",
},
)
if err != nil {
return err
}
```
#### Ruby
```ruby
HATCHET_CLIENT.filters.create(
workflow_id: EVENT_WORKFLOW.id,
expression: "input.should_skip == false",
scope: "foobarbaz",
payload: {
"main_character" => "Anna",
"supporting_character" => "Stiva",
"location" => "Moscow"
}
)
```
> **Warning:** Note the `scope` argument to the filter is required **both when creating a
> filter, and when pushing events**. If the scope on filter creation does not
> match the scope provided when pushing events, the filter will not apply.
Then, push an event that uses the filter to determine whether to run. For instance, this run will be skipped, since the payload does not match the expression:
#### Python
```python
hatchet.event.push(
event_key=EVENT_KEY,
payload={
"should_skip": True,
},
scope="foobarbaz",
)
```
#### Typescript
```typescript
hatchet.events.push(
SIMPLE_EVENT,
{
Message: 'hello',
ShouldSkip: true,
},
{
scope: 'foobarbaz',
}
);
```
#### Go
```go
skipPayload := map[string]interface{}{
"shouldSkip": true,
}
skipScope := "foobarbaz"
err = client.Events().Push(
context.Background(),
"simple-event:create",
skipPayload,
v0Client.WithFilterScope(&skipScope),
)
if err != nil {
return err
}
```
#### Ruby
```ruby
HATCHET_CLIENT.event.push(
EVENT_KEY,
{ "should_skip" => true },
scope: "foobarbaz"
)
```
But this one will be triggered since the payload _does_ match the expression:
#### Python
```python
hatchet.event.push(
event_key=EVENT_KEY,
payload={
"should_skip": False,
},
scope="foobarbaz",
)
```
#### Typescript
```typescript
hatchet.events.push(
SIMPLE_EVENT,
{
Message: 'hello',
ShouldSkip: false,
},
{
scope: 'foobarbaz',
}
);
```
#### Go
```go
triggerPayload := map[string]interface{}{
"shouldSkip": false,
}
triggerScope := "foobarbaz"
err = client.Events().Push(
context.Background(),
"simple-event:create",
triggerPayload,
v0Client.WithFilterScope(&triggerScope),
)
if err != nil {
return err
}
```
#### Ruby
```ruby
HATCHET_CLIENT.event.push(
EVENT_KEY,
{ "should_skip" => false },
scope: "foobarbaz"
)
```
> **Info:** In Hatchet, filters are "positive", meaning that we look for _matches_ to the
> filter to determine which tasks to trigger.
## Accessing the filter payload
You can access the filter payload by using the `Context` in the task that was triggered by your event:
#### Python
```python
@event_workflow_with_filter.task()
def filtered_task(input: EventWorkflowInput, ctx: Context) -> None:
print(ctx.filter_payload)
```
#### Typescript
```typescript
lowerWithFilter.task({
name: 'lowerWithFilter',
fn: (input, ctx) => {
console.log(ctx.filterPayload());
},
});
```
#### Go
```go
func accessFilterPayload(ctx hatchet.Context, input EventInput) (*LowerTaskOutput, error) {
fmt.Println(ctx.FilterPayload())
return &LowerTaskOutput{
TransformedMessage: strings.ToLower(input.Message),
}, nil
}
```
#### Ruby
```ruby
EVENT_WORKFLOW_WITH_FILTER.task(:filtered_task) do |input, ctx|
puts ctx.filter_payload.inspect
end
```
## Advanced Usage
In addition to referencing `input` in the expression (which corresponds to the _event_ payload), you can also reference the following fields:
1. `payload` corresponds to the _filter_ payload (which was part of the request when the filter was created).
2. `additional_metadata` allows for filtering based on `additional_metadata` sent with the event.
3. `event_key` allows for filtering based on the key of the event, such as `user:created`.
---
## Invoking Tasks From Other Services
While Hatchet recommends importing your workflows and standalone tasks directly to use for triggering runs, this only works in a monorepo or similar setups where you have access to those objects. However, it's common to have a polyrepo, have code written in multiple languages, or otherwise not be able to import your workflows and standalone tasks directly.
Hatchet provides stub tasks for these cases, allowing you to trigger your tasks from anywhere in a type-safe way with only minor code duplication.
### Creating a "Stub" Task on your External Service (Recommended)
The recommended way to trigger a run from a service where you _cannot_ import the workflow or standalone task definition directly is to create a "stub" task or workflow on your external service. This is a Hatchet task or workflow that has the same name and input/output types as the task you want to trigger on your Hatchet worker, but without the function or other configuration.
This allows you to have a polyglot, fully typed interface with full SDK support.
#### Typescript
```typescript
import { hatchet } from '../hatchet-client';
// (optional) Define the input type for the workflow
export type SimpleInput = {
Message: string;
};
// (optional) Define the output type for the workflow
export type SimpleOutput = {
'to-lower': {
TransformedMessage: string;
};
};
// declare the workflow with the same name as the
// workflow name on the worker
export const simple = hatchet.workflow({
name: 'simple',
});
// you can use all the same run methods on the stub
// with full type-safety
simple.run({ Message: 'Hello, World!' });
simple.runNoWait({ Message: 'Hello, World!' });
simple.schedule(new Date(), { Message: 'Hello, World!' });
simple.cron('my-cron', '0 0 * * *', { Message: 'Hello, World!' });
```
#### Python
Consider a task with an implementation like this:
```python
from pydantic import BaseModel
from hatchet_sdk import Context, Hatchet
class TaskInput(BaseModel):
user_id: int
class TaskOutput(BaseModel):
ok: bool
hatchet = Hatchet()
@hatchet.task(name="externally-triggered-task", input_validator=TaskInput)
async def externally_triggered_task(input: TaskInput, ctx: Context) -> TaskOutput:
return TaskOutput(ok=True)
```
To trigger this task from a separate service where the code is not shared, start by defining models that match the input and output types of the task defined above.
```python
class TaskInput(BaseModel):
user_id: int
class TaskOutput(BaseModel):
ok: bool
```
Next, create the stub task.
```python
stub = hatchet.stubs.task(
# make sure the name and schemas exactly match the implementation
name="externally-triggered-task",
input_validator=TaskInput,
output_validator=TaskOutput,
)
```
Finally, use the stub to trigger the underlying task, and (optionally) retrieve the result.
```python
# input type checks properly
result = await stub.aio_run(input=TaskInput(user_id=1234))
# `result.ok` type checks properly
print("Is successful:", result.ok)
```
#### Go
```go
package main
import (
"context"
"fmt"
"log"
hatchet "github.com/hatchet-dev/hatchet/sdks/go"
)
type StubInput struct {
Message string `json:"message"`
}
type StubOutput struct {
Ok bool `json:"ok"`
}
func StubWorkflow(client *hatchet.Client) *hatchet.StandaloneTask {
return client.NewStandaloneTask("stub-workflow", func(ctx hatchet.Context, input StubInput) (StubOutput, error) {
return StubOutput{
Ok: true,
}, nil
})
}
func main() {
client, err := hatchet.NewClient()
if err != nil {
log.Fatalf("failed to create hatchet client: %v", err)
}
task := StubWorkflow(client)
// we are simply running the task here, but it can be implemented in another service / worker
// and in another language with the same name and input-output types
result, err := task.Run(context.Background(), StubInput{Message: "Hello, World!"})
if err != nil {
log.Fatalf("failed to run task: %v", err)
}
fmt.Println(result)
}
```
#### Ruby
Note that this approach requires code duplication, which can break type
safety. For instance, if the input type to your workflow changes, you need to
remember to also change the type passed to the stub. Some ways to mitigate
risks here are helpful comments reminding developers to keep these types in
sync, code generation tools, and end-to-end tests.
---
# Simple Task Retries
Hatchet provides a simple and effective way to handle failures in your tasks using a retry policy. This feature allows you to specify the number of times a task should be retried if it fails, helping to improve the reliability and resilience of your tasks.
> **Info:** Task-level retries can be added to both `Standalone Tasks` and `Workflow
> Tasks`.
## How it works
When a task fails (i.e. throws an error or returns a non-zero exit code), Hatchet can automatically retry the task based on the `retries` configuration defined in the task object. Here's how it works:
1. If a task fails and `retries` is set to a value greater than 0, Hatchet will catch the error and retry the task.
2. The task will be retried up to the specified number of times, with each retry being executed after a short delay to avoid overwhelming the system.
3. If the task succeeds during any of the retries, the task will continue as normal.
4. If the task continues to fail after exhausting all the specified retries, the task will be marked as failed.
This simple retry mechanism can help to mitigate transient failures, such as network issues or temporary unavailability of external services, without requiring complex error handling logic in your task code.
## How to use task-level retries
To enable retries for a task, simply add the `retries` property to the task object in your task definition:
#### Python
```python
@simple_workflow.task(retries=3)
def always_fail(input: EmptyModel, ctx: Context) -> dict[str, str]:
raise Exception("simple task failed")
```
#### Typescript
```typescript
export const retries = hatchet.task({
name: 'retries',
retries: 3,
fn: async (_, ctx) => {
throw new Error('intentional failure');
},
});
```
#### Go
```go
retries := client.NewStandaloneTask("retries-task", func(ctx hatchet.Context, input RetriesInput) (*RetriesResult, error) {
return nil, errors.New("intentional failure")
}, hatchet.WithRetries(3))
```
#### Ruby
```ruby
SIMPLE_RETRY_WORKFLOW.task(:always_fail, retries: 3) do |input, ctx|
raise "simple task failed"
end
```
You can add the `retries` property to any task, and Hatchet will handle the retry logic automatically.
It's important to note that task-level retries are not suitable for all types of failures.
For example, if a task fails due to a programming error or an invalid configuration, retrying the task will likely not resolve the issue.
In these cases, you should fix the underlying problem in your code or configuration rather than relying on retries. See [Bypassing retry logic](#bypassing-retry-logic).
Additionally, if a task interacts with external services or databases, you should ensure that the operation is idempotent (i.e. can be safely repeated without changing the result) before enabling retries. Otherwise, retrying the task could lead to unintended side effects or inconsistencies in your data.
## Accessing the Retry Count in a Running Task
You can access the current retry count on the task's context object:
#### Python
```python
@simple_workflow.task(retries=3)
def fail_twice(input: EmptyModel, ctx: Context) -> dict[str, str]:
if ctx.retry_count < 2:
raise Exception("simple task failed")
return {"status": "success"}
```
#### Typescript
```typescript
export const retriesWithCount = hatchet.task({
name: 'retries-with-count',
retries: 3,
fn: async (_, ctx) => {
// > Get the current retry count
const retryCount = ctx.retryCount();
console.log(`Retry count: ${retryCount}`);
if (retryCount < 2) {
throw new Error('intentional failure');
}
return {
message: 'success',
};
},
});
```
#### Go
```go
retriesWithCount := client.NewStandaloneTask("fail-twice-task", func(ctx hatchet.Context, input RetriesWithCountInput) (*RetriesWithCountResult, error) {
// Get the current retry count
retryCount := ctx.RetryCount()
fmt.Printf("Retry count: %d\n", retryCount)
if retryCount < 2 {
return nil, errors.New("intentional failure")
}
return &RetriesWithCountResult{
Message: "success",
}, nil
}, hatchet.WithRetries(3))
```
#### Ruby
```ruby
SIMPLE_RETRY_WORKFLOW.task(:fail_twice, retries: 3) do |input, ctx|
raise "simple task failed" if ctx.retry_count < 2
{ "status" => "success" }
end
```
## Exponential Backoff
Hatchet also supports exponential backoff for retries, which can be useful for handling failures in a more resilient manner. Exponential backoff increases the delay between retries exponentially, giving the failing service more time to recover before the next retry.
#### Python
```python
@backoff_workflow.task(
retries=10,
# 👀 Maximum number of seconds to wait between retries
backoff_max_seconds=10,
# 👀 Factor to increase the wait time between retries.
# This sequence will be 2s, 4s, 8s, 10s, 10s, 10s... due to the maxSeconds limit
backoff_factor=2.0,
)
def backoff_task(input: EmptyModel, ctx: Context) -> dict[str, str]:
if ctx.retry_count < 3:
raise Exception("backoff task failed")
return {"status": "success"}
```
#### Typescript
```typescript
export const withBackoff = hatchet.task({
name: 'with-backoff',
retries: 10,
backoff: {
// 👀 Maximum number of seconds to wait between retries
maxSeconds: 10,
// 👀 Factor to increase the wait time between retries.
// This sequence will be 2s, 4s, 8s, 10s, 10s, 10s... due to the maxSeconds limit
factor: 2,
},
fn: async () => {
throw new Error('intentional failure');
},
});
```
#### Go
```go
withBackoff := client.NewStandaloneTask("with-backoff-task", func(ctx hatchet.Context, input BackoffInput) (*BackoffResult, error) {
return nil, errors.New("intentional failure")
}, hatchet.WithRetries(3), hatchet.WithRetryBackoff(2, 10))
```
#### Ruby
```ruby
BACKOFF_WORKFLOW.task(
:backoff_task,
retries: 10,
# Maximum number of seconds to wait between retries
backoff_max_seconds: 10,
# Factor to increase the wait time between retries.
# This sequence will be 2s, 4s, 8s, 10s, 10s, 10s... due to the maxSeconds limit
backoff_factor: 2.0
) do |input, ctx|
raise "backoff task failed" if ctx.retry_count < 3
{ "status" => "success" }
end
```
## Bypassing Retry logic
The Hatchet SDKs each expose a `NonRetryable` exception, which allows you to bypass pre-configured retry logic for the task. **If your task raises this exception, it will not be retried.** This allows you to circumvent the default retry behavior in instances where you don't want to or cannot safely retry. Some examples in which this might be useful include:
1. A task that calls an external API which returns a 4XX response code.
2. A task that contains a single non-idempotent operation that can fail but cannot safely be rerun on failure, such as a billing operation.
3. A failure that requires manual intervention to resolve.
#### Python
```python
@non_retryable_workflow.task(retries=1)
def should_not_retry(input: EmptyModel, ctx: Context) -> None:
raise NonRetryableException("This task should not retry")
```
#### Typescript
```typescript
const shouldNotRetry = nonRetryableWorkflow.task({
name: 'should-not-retry',
fn: () => {
throw new NonRetryableError('This task should not retry');
},
retries: 1,
});
```
#### Go
```go
retries := client.NewStandaloneTask("non-retryable-task", func(ctx hatchet.Context, input NonRetryableInput) (*NonRetryableResult, error) {
return nil, worker.NewNonRetryableError(errors.New("intentional failure"))
}, hatchet.WithRetries(3))
```
#### Ruby
```ruby
NON_RETRYABLE_WORKFLOW.task(:should_not_retry, retries: 1) do |input, ctx|
raise Hatchet::NonRetryableError, "This task should not retry"
end
NON_RETRYABLE_WORKFLOW.task(:should_retry_wrong_exception_type, retries: 1) do |input, ctx|
raise TypeError, "This task should retry because it's not a NonRetryableError"
end
NON_RETRYABLE_WORKFLOW.task(:should_not_retry_successful_task, retries: 1) do |input, ctx|
# no-op
end
```
In these cases, even though `retries` is set to a non-zero number (meaning the task would ordinarily retry), Hatchet will not retry.
## Python SDK Client Retry Behavior
The retry behavior described above is for task execution inside Hatchet. The Python SDK also has separate retry behavior for certain client-side REST and gRPC calls made by the SDK itself.
These client retries are configured separately from task retries and do not control whether a task is retried after failing in a worker.
> **Info:** Task retries and SDK client retries are separate mechanisms. Task retries
> control whether Hatchet retries a task after task failure. SDK client retries
> control whether the Python SDK retries certain API calls to Hatchet.
### Default client retry behavior
By default, the Python SDK retries certain client calls with exponential backoff, with `max_attempts` defaulting to 5.
**REST API calls**
Error Type, Retried by Default
HTTP 5xx (server errors), Yes
HTTP 404 (not found), Yes
HTTP 429 (too many requests), No
HTTP 400, 401, 403, 409, 422 (client errors), No
Transport errors (timeout, connection, TLS, protocol), No
**gRPC calls**
Status Code, Retried
`UNAVAILABLE`, `DEADLINE_EXCEEDED`, `INTERNAL`, Yes
`RESOURCE_EXHAUSTED`, `ABORTED`, `UNKNOWN`, Yes
`UNIMPLEMENTED`, `NOT_FOUND`, `INVALID_ARGUMENT`, No
`ALREADY_EXISTS`, `UNAUTHENTICATED`, `PERMISSION_DENIED`, No
> **Info:** REST 404 responses are retried by default because some REST reads can observe
> replication lag between the core database and the OLAP database.
### Configuring Python SDK client retries
The Python SDK exposes client retry configuration through `TenacityConfig`, either directly in `ClientConfig` or via environment variables.
```python
import os
from hatchet_sdk import Hatchet
from hatchet_sdk.config import ClientConfig, HTTPMethod, TenacityConfig
hatchet = Hatchet(
config=ClientConfig(
token=os.environ["HATCHET_CLIENT_TOKEN"],
tenacity=TenacityConfig(
max_attempts=5,
retry_429=False,
retry_transport_errors=False,
retry_transport_methods=[HTTPMethod.GET, HTTPMethod.DELETE],
),
)
)
```
Name, Type, Description, Default
`max_attempts`, `int`, Maximum number of retry attempts. Set to 0 to disable retries., `5`
`retry_429`, `bool`, Enable retries for HTTP 429 Too Many Requests responses., `False`
`retry_transport_errors`, `bool`, Enable retries for REST transport-level errors (timeout, connection, TLS)., `False`
`retry_transport_methods`, `list[HTTPMethod]`, HTTP methods to retry on transport errors when `retry_transport_errors` is enabled., `[GET, DELETE]`
You can also configure these via environment variables:
Environment Variable, Description
`HATCHET_CLIENT_TENACITY_MAX_ATTEMPTS`, Maximum retry attempts
`HATCHET_CLIENT_TENACITY_RETRY_429`, Enable 429 retries (`true`/`false`)
`HATCHET_CLIENT_TENACITY_RETRY_TRANSPORT_ERRORS`, Enable transport error retries (`true`/`false`)
### Idempotency considerations
> **Warning:** When `retry_transport_errors` is enabled, only idempotent HTTP methods (`GET`,
> `DELETE`) are retried by default. Non-idempotent methods (`POST`, `PUT`,
> `PATCH`) are excluded because retrying them after a transport error could
> result in duplicate operations if the original request succeeded but the
> response was lost.
You can add non-idempotent methods to `retry_transport_methods`, but only do so if:
1. Your operations are idempotent (for example, because they use idempotency keys), or
2. You understand and accept the risk of duplicate operations
### Retry timing
Python SDK client retries use exponential backoff with jitter. Fine-grained backoff timing is not currently configurable through `TenacityConfig`.
## Conclusion
Hatchet's task-level retry feature is a simple and effective way to handle transient failures in your tasks, improving the reliability and resilience of your tasks. By specifying the number of retries for each task, you can ensure that your tasks can recover from temporary issues without requiring complex error handling logic.
Remember to use retries judiciously and only for tasks that are idempotent. For more advanced retry strategies, such as exponential backoff or circuit breaking, stay tuned for future updates to Hatchet's retry capabilities.
---
# Timeouts in Hatchet
Timeouts are an important concept in Hatchet that allow you to control how long a task is allowed to run before it is considered to have failed. This is useful for ensuring that your tasks don't run indefinitely and consume unnecessary resources. Timeouts in Hatchet are treated as failures and the task will be [retried](./retry-policies.mdx) if specified.
There are two types of timeouts in Hatchet:
1. **Scheduling Timeouts** (Default 5m) - the time a task is allowed to wait in the queue before it is cancelled
2. **Execution Timeouts** (Default 60s) - the time a task is allowed to run before it is considered to have failed
## Timeout Format
In Hatchet, timeouts are specified using a string in the format ``, where `` is an integer and `` is one of:
- `s` for seconds
- `m` for minutes
- `h` for hours
For example:
- `10s` means 10 seconds
- `4m` means 4 minutes
- `1h` means 1 hour
If no unit is specified, seconds are assumed.
> **Info:** In the Python SDK, timeouts can also be specified as a `datetime.timedelta`
> object.
### Task-Level Timeouts
You can specify execution and scheduling timeouts for a task using the `execution_timeout` and `schedule_timeout` parameters when creating a task.
#### Ruby
```python
# 👀 Specify an execution timeout on a task
@timeout_wf.task(
execution_timeout=timedelta(seconds=5), schedule_timeout=timedelta(minutes=10)
)
def timeout_task(input: EmptyModel, ctx: Context) -> dict[str, str]:
time.sleep(30)
return {"status": "success"}
```
#### Tab 2
```typescript
export const withTimeouts = hatchet.task({
name: 'with-timeouts',
// time the task can wait in the queue before it is cancelled
scheduleTimeout: '10s',
// time the task can run before it is cancelled
executionTimeout: '10s',
fn: async (input: SimpleInput, ctx) => {
// wait 15 seconds
await sleep(15000);
// get the abort controller
const { abortController } = ctx;
// if the abort controller is aborted, throw an error
if (abortController.signal.aborted) {
throw new Error('cancelled');
}
return {
TransformedMessage: input.Message.toLowerCase(),
};
},
});
```
#### Tab 3
```go
// Task that will timeout - sleeps for 10 seconds but has 3 second timeout
_ = timeoutWorkflow.NewTask("timeout-task",
func(ctx hatchet.Context, input TimeoutInput) (TimeoutOutput, error) {
log.Printf("Starting task that will timeout. Message: %s", input.Message)
// Sleep for 10 seconds (will be interrupted by timeout)
time.Sleep(10 * time.Second)
// This should not be reached due to timeout
log.Println("Task completed successfully (this shouldn't be reached)")
return TimeoutOutput{
Status: "completed",
Completed: true,
}, nil
},
hatchet.WithExecutionTimeout(3*time.Second), // 3 second timeout
)
```
#### Tab 4
```ruby
# Specify an execution timeout on a task
TIMEOUT_WF.task(:timeout_task, execution_timeout: 5, schedule_timeout: 600) do |input, ctx|
sleep 30
{ "status" => "success" }
end
REFRESH_TIMEOUT_WF = HATCHET.workflow(name: "RefreshTimeoutWorkflow")
```
In these tasks, both timeouts are specified, meaning:
1. If the task is not scheduled before the `schedule_timeout` is reached, it will be cancelled.
2. If the task does not complete before the `execution_timeout` is reached (after starting), it will be cancelled.
> **Warning:** A timed out task does not guarantee that the task will be stopped immediately.
> The task will be stopped as soon as the worker is able to stop the task. See
> [cancellation](./cancellation.mdx) for more information.
## Refreshing Timeouts
In some cases, you may need to extend the timeout for a task while it is running. This can be done by using the task context.
For example:
#### Ruby
```python
@refresh_timeout_wf.task(execution_timeout=timedelta(seconds=4))
def refresh_task(input: EmptyModel, ctx: Context) -> dict[str, str]:
ctx.refresh_timeout(timedelta(seconds=10))
time.sleep(5)
return {"status": "success"}
```
#### Tab 2
```typescript
export const refreshTimeout = hatchet.task({
name: 'refresh-timeout',
executionTimeout: '10s',
scheduleTimeout: '10s',
fn: async (input: SimpleInput, ctx) => {
// adds 15 seconds to the execution timeout
ctx.refreshTimeout('15s');
await sleep(15000);
// get the abort controller
const { abortController } = ctx;
// now this condition will not be met
// if the abort controller is aborted, throw an error
if (abortController.signal.aborted) {
throw new Error('cancelled');
}
return {
TransformedMessage: input.Message.toLowerCase(),
};
},
});
```
#### Tab 3
```go
// Create workflow with timeout refresh example
refreshTimeoutWorkflow := client.NewWorkflow("refresh-timeout-demo",
hatchet.WithWorkflowDescription("Demonstrates timeout refresh functionality"),
hatchet.WithWorkflowVersion("1.0.0"),
)
// Task that refreshes its timeout to avoid timing out
_ = refreshTimeoutWorkflow.NewTask("refresh-timeout-task",
func(ctx hatchet.Context, input TimeoutInput) (TimeoutOutput, error) {
log.Printf("Starting task with timeout refresh. Message: %s", input.Message)
// Refresh timeout by 10 seconds
log.Println("Refreshing timeout by 10 seconds...")
err := ctx.RefreshTimeout("10s")
if err != nil {
log.Printf("Failed to refresh timeout: %v", err)
return TimeoutOutput{
Status: "failed",
Completed: false,
}, err
}
// Now sleep for 5 seconds (should complete successfully)
log.Println("Sleeping for 5 seconds...")
time.Sleep(5 * time.Second)
log.Println("Task completed successfully after timeout refresh")
return TimeoutOutput{
Status: "completed",
Completed: true,
}, nil
},
hatchet.WithExecutionTimeout(3*time.Second), // Initial 3 second timeout
)
```
#### Tab 4
```ruby
REFRESH_TIMEOUT_WF.task(:refresh_task, execution_timeout: 4) do |input, ctx|
ctx.refresh_timeout(10)
sleep 5
{ "status" => "success" }
end
```
In this example, the task initially would exceed its execution timeout. But before it does, we call the `refreshTimeout` method, which extends the timeout and allows it to complete. Importantly, refreshing a timeout is an additive operation - the new timeout is added to the existing timeout. So for instance, if the task originally had a timeout of `30s` and we call `refreshTimeout("15s")`, the new timeout will be `45s`.
The task timeout can be refreshed multiple times within a task to further extend the timeout as needed.
---
# Cancellation in Hatchet Tasks
Hatchet provides a mechanism for canceling task executions gracefully, allowing you to signal to running tasks that they should stop running. Cancellation can be triggered on graceful termination of a worker or automatically through concurrency control strategies like [`CANCEL_IN_PROGRESS`](./concurrency.mdx#cancel_in_progress), which cancels currently running task instances to free up slots for new instances when the concurrency limit is reached.
When a task is canceled, Hatchet sends a cancellation signal to the task. The task can then check for the cancellation signal and take appropriate action, such as cleaning up resources, aborting network requests, or gracefully terminating their execution.
## Cancellation Mechanisms
#### Python
```python
@cancellation_workflow.task()
def check_flag(input: EmptyModel, ctx: Context) -> dict[str, str]:
for i in range(3):
time.sleep(1)
# Note: Checking the status of the exit flag is mostly useful for cancelling
# sync tasks without needing to forcibly kill the thread they're running on.
if ctx.exit_flag:
print("Task has been cancelled")
raise ValueError("Task has been cancelled")
return {"error": "Task should have been cancelled"}
```
```python
@cancellation_workflow.task()
async def self_cancel(input: EmptyModel, ctx: Context) -> dict[str, str]:
await asyncio.sleep(2)
## Cancel the task
await ctx.aio_cancel()
await asyncio.sleep(10)
return {"error": "Task should have been cancelled"}
```
#### Typescript
```typescript
export const cancellation = hatchet.task({
name: 'cancellation',
fn: async (_, ctx) => {
await sleep(10 * 1000);
if (ctx.cancelled) {
throw new Error('Task was cancelled');
}
return {
Completed: true,
};
},
});
```
```typescript
export const abortSignal = hatchet.task({
name: 'abort-signal',
fn: async (_, { abortController }) => {
try {
const response = await axios.get('https://api.example.com/data', {
signal: abortController.signal,
});
// Handle the response
} catch (error) {
if (axios.isCancel(error)) {
// Request was canceled
console.log('Request canceled');
} else {
// Handle other errors
}
}
},
});
```
#### Go
```go
// Add a long-running task that can be cancelled
_ = workflow.NewTask("long-running-task", func(ctx hatchet.Context, input CancellationInput) (CancellationOutput, error) {
log.Printf("Starting long-running task with message: %s", input.Message)
// Simulate long-running work with cancellation checking
for i := 0; i < 10; i++ {
select {
case <-ctx.Done():
log.Printf("Task cancelled after %d steps", i)
return CancellationOutput{
Status: "cancelled",
Completed: false,
}, nil
default:
log.Printf("Working... step %d/10", i+1)
time.Sleep(1 * time.Second)
}
}
log.Println("Task completed successfully")
return CancellationOutput{
Status: "completed",
Completed: true,
}, nil
}, hatchet.WithExecutionTimeout(30*time.Second))
```
#### Ruby
```ruby
CANCELLATION_WORKFLOW.task(:check_flag) do |input, ctx|
3.times do
sleep 1
# Note: Checking the status of the exit flag is mostly useful for cancelling
# sync tasks without needing to forcibly kill the thread they're running on.
if ctx.cancelled?
puts "Task has been cancelled"
raise "Task has been cancelled"
end
end
{ "error" => "Task should have been cancelled" }
end
```
```ruby
CANCELLATION_WORKFLOW.task(:self_cancel) do |input, ctx|
sleep 2
## Cancel the task
ctx.cancel
sleep 10
{ "error" => "Task should have been cancelled" }
end
```
## Cancellation Best Practices
When working with cancellation in Hatchet tasks, consider the following best practices:
1. **Graceful Termination**: When a task receives a cancellation signal, aim to terminate its execution gracefully. Clean up any resources, abort pending operations, and perform any necessary cleanup tasks before returning from the task function.
2. **Cancellation Checks**: Regularly check for cancellation signals within long-running tasks or loops. This allows the task to respond to cancellation in a timely manner and avoid unnecessary processing.
3. **Asynchronous Operations**: If a task performs asynchronous operations, such as network requests or file I/O, consider passing the cancellation signal to those operations. Many libraries and APIs support cancellation through the `AbortSignal` interface.
4. **Error Handling**: Handle cancellation errors appropriately. Distinguish between cancellation errors and other types of errors to provide meaningful error messages and take appropriate actions.
5. **Cancellation Propagation**: If a task invokes other functions or libraries, consider propagating the cancellation signal to those dependencies. This ensures that cancellation is handled consistently throughout the task.
## Additional Features
In addition to the methods of cancellation listed here, Hatchet also supports [bulk cancellation](./bulk-retries-and-cancellations.mdx), which allows you to cancel many tasks in bulk using either their IDs or a set of filters, which is often the easiest way to cancel many things at once.
## Conclusion
Cancellation is a powerful feature in Hatchet that allows you to gracefully stop task executions when needed. Remember to follow best practices when implementing cancellation in your tasks, such as graceful termination, regular cancellation checks, handling asynchronous operations, proper error handling, and cancellation propagation.
By incorporating cancellation into your Hatchet tasks and workflows, you can build more resilient and responsive systems that can adapt to changing circumstances and user needs.
---
# Bulk Cancellations and Replays
V1 adds the ability to cancel or replay task runs in bulk, which you can now do either in the Hatchet Dashboard or programmatically via the SDKs and the REST API.
There are two ways of bulk cancelling or replaying tasks in both cases:
1. You can provide a list of task run ids to cancel or replay, which will cancel or replay all of the tasks in the list.
2. You can provide a list of filters, similar to the list of filters on task runs in the Dashboard, and cancel or replay runs matching those filters. For instance, if you wanted to replay all failed runs of a `SimpleTask` from the past fifteen minutes that had the `foo` field in `additional_metadata` set to `bar`, you could apply those filters and replay all of the matching runs.
### Bulk Operations by Run Ids
The first way to bulk cancel or replay runs is by providing a list of run ids. This is the most straightforward way to cancel or replay runs in bulk.
#### Python
> **Info:** In the Python SDK, the mechanics of bulk replaying and bulk cancelling tasks
> are exactly the same. The only change would be replacing e.g.
> `hatchet.runs.bulk_cancel` with `hatchet.runs.bulk_replay`.
First, we'll start by fetching a task via the REST API.
```python
from datetime import datetime, timedelta, timezone
from hatchet_sdk import BulkCancelReplayOpts, Hatchet, RunFilter, V1TaskStatus
hatchet = Hatchet()
workflows = hatchet.workflows.list()
assert workflows.rows
workflow = workflows.rows[0]
```
Now that we have a task, we'll get runs for it, so that we can use them to bulk cancel by run id.
```python
workflow_runs = hatchet.runs.list(workflow_ids=[workflow.metadata.id])
```
And finally, we can cancel the runs in bulk.
```python
workflow_run_ids = [workflow_run.metadata.id for workflow_run in workflow_runs.rows]
bulk_cancel_by_ids = BulkCancelReplayOpts(ids=workflow_run_ids)
hatchet.runs.bulk_cancel(bulk_cancel_by_ids)
```
> **Info:** Note that the Python SDK also exposes async versions of each of these methods:
>
> - `workflows.list` -> `await workflows.aio_list`
> - `runs.list` -> `await runs.aio_list`
> - `runs.bulk_cancel` -> `await runs.aio_bulk_cancel`
#### Go
> **Info:** Just like in the Python SDK, the mechanics of bulk replaying and bulk
> cancelling tasks are exactly the same.
First, we'll start by fetching a task via the REST API.
```python
from datetime import datetime, timedelta, timezone
from hatchet_sdk import BulkCancelReplayOpts, Hatchet, RunFilter, V1TaskStatus
hatchet = Hatchet()
workflows = hatchet.workflows.list()
assert workflows.rows
workflow = workflows.rows[0]
```
Now that we have a task, we'll get runs for it, so that we can use them to bulk cancel by run id.
```python
workflow_runs = hatchet.runs.list(workflow_ids=[workflow.metadata.id])
```
And finally, we can cancel the runs in bulk.
```python
workflow_run_ids = [workflow_run.metadata.id for workflow_run in workflow_runs.rows]
bulk_cancel_by_ids = BulkCancelReplayOpts(ids=workflow_run_ids)
hatchet.runs.bulk_cancel(bulk_cancel_by_ids)
```
#### Ruby
```ruby
hatchet = Hatchet::Client.new
workflows = hatchet.workflows.list
workflow = workflows.rows.first
```
```ruby
workflow_runs = hatchet.runs.list(workflow_ids: [workflow.metadata.id])
```
```ruby
workflow_run_ids = workflow_runs.rows.map { |run| run.metadata.id }
hatchet.runs.bulk_cancel(ids: workflow_run_ids)
```
### Bulk Operations by Filters
The second way to bulk cancel or replay runs is by providing a list of filters. This is the most powerful way to cancel or replay runs in bulk, as it allows you to cancel or replay all runs matching a set of arbitrary filters without needing to provide IDs for the runs in advance.
#### Python
The example below provides some filters you might use to cancel or replay runs in bulk. Importantly, these filters are very similar to the filters you can use in the Hatchet Dashboard to filter which task runs are displaying.
```python
bulk_cancel_by_filters = BulkCancelReplayOpts(
filters=RunFilter(
since=datetime.today() - timedelta(days=1),
until=datetime.now(tz=timezone.utc),
statuses=[V1TaskStatus.RUNNING],
workflow_ids=[workflow.metadata.id],
additional_metadata={"key": "value"},
)
)
hatchet.runs.bulk_cancel(bulk_cancel_by_filters)
```
Running this request will cancel all task runs matching the filters provided.
#### Go
The example below provides some filters you might use to cancel or replay runs in bulk. Importantly, these filters are very similar to the filters you can use in the Hatchet Dashboard to filter which task runs are displaying.
```python
bulk_cancel_by_filters = BulkCancelReplayOpts(
filters=RunFilter(
since=datetime.today() - timedelta(days=1),
until=datetime.now(tz=timezone.utc),
statuses=[V1TaskStatus.RUNNING],
workflow_ids=[workflow.metadata.id],
additional_metadata={"key": "value"},
)
)
hatchet.runs.bulk_cancel(bulk_cancel_by_filters)
```
Running this request will cancel all task runs matching the filters provided.
#### Ruby
```ruby
hatchet.runs.bulk_cancel(
since: Time.now - 86_400,
until_time: Time.now,
statuses: ["RUNNING"],
workflow_ids: [workflow.metadata.id],
additional_metadata: { "key" => "value" }
)
```
# Manual Retries
Hatchet provides a manual retry mechanism that allows you to handle failed task instances flexibly from the Hatchet dashboard.
Navigate to the specific task in the Hatchet dashboard and click on the failed run. From there, you can inspect the details of the run, including the input data and the failure reason for each task.
To retry a failed task, simply click on the task in the run details view and then click the "Replay" button. This will create a new instance of the task, starting from the failed task, and using the same input data as the original run.
Manual retries give you full control over when and how to reprocess failed instances. For example, you may choose to wait until an external service is back online before retrying instances that depend on that service, or you may need to deploy a bug fix to your task code before retrying instances that were affected by the bug.
## A Note on Dead Letter Queues
A dead letter queue (DLQ) is a messaging concept used to handle messages that cannot be processed successfully. In the context of task management, a DLQ can be used to store failed task instances that require manual intervention or further analysis.
While Hatchet does not have a built-in dead letter queue feature, the persistence of failed task instances in the dashboard serves a similar purpose. By keeping a record of failed instances, Hatchet allows you to track and manage failures, perform root cause analysis, and take appropriate actions, such as modifying input data or updating your task code before manually retrying the failed instances.
It's important to note that the term "dead letter queue" is more commonly associated with messaging systems like Apache Kafka or Amazon SQS, where unprocessed messages are automatically moved to a separate queue for manual handling. In Hatchet, the failed instances are not automatically moved to a separate queue but are instead persisted in the dashboard for manual management.
---
# Concurrency Control in Hatchet Tasks
Hatchet provides powerful concurrency control features to help you manage the execution of your tasks. This is particularly useful when you have tasks that may be triggered frequently or have long-running steps, and you want to limit the number of concurrent executions to prevent overloading your system, ensure fairness, or avoid race conditions.
> **Info:** Concurrency strategies can be added to both `Tasks` and `Workflows`.
### Why use concurrency control?
There are several reasons why you might want to use concurrency control in your Hatchet tasks:
1. **Fairness**: When you have multiple clients or users triggering tasks, concurrency control can help ensure fair access to resources. By limiting the number of concurrent runs per client or user, you can prevent a single client from monopolizing the system and ensure that all clients get a fair share of the available resources.
2. **Resource management**: If your task steps are resource-intensive (e.g., they make external API calls or perform heavy computations), running too many instances concurrently can overload your system. By limiting concurrency, you can ensure your system remains stable and responsive.
3. **Avoiding race conditions**: If your task steps modify shared resources, running multiple instances concurrently can lead to race conditions and inconsistent data. Concurrency control helps you avoid these issues by ensuring only a limited number of instances run at a time.
4. **Compliance with external service limits**: If your task steps interact with external services that have rate limits, concurrency control can help you stay within those limits and avoid being throttled or blocked.
5. **Spike Protection**: When you have tasks that are triggered by external events, such as webhooks or user actions, you may experience spikes in traffic that can overwhelm your system. Concurrency control can help you manage these spikes by limiting the number of concurrent runs and queuing new runs until resources become available.
### Available Strategies:
- [`GROUP_ROUND_ROBIN`](#group-round-robin): Distribute task instances across available slots in a round-robin fashion based on the `key` function.
- [`CANCEL_IN_PROGRESS`](#cancel-in-progress): Cancel the currently running task instances for the same concurrency key to free up slots for the new instance.
- [`CANCEL_NEWEST`](#cancel-newest): Cancel the newest task instance for the same concurrency key to free up slots for the new instance.
> We're always open to adding more strategies to fit your needs. Join our [discord](https://hatchet.run/discord) to let us know.
### Setting concurrency on workers
In addition to setting concurrency limits at the task level, you can also control concurrency at the worker level by passing the `slots` option when creating a new `Worker` instance:
#### Python
```python
class WorkflowInput(BaseModel):
group: str
concurrency_limit_rr_workflow = hatchet.workflow(
name="ConcurrencyDemoWorkflowRR",
concurrency=ConcurrencyExpression(
expression="input.group",
max_runs=1,
limit_strategy=ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN,
),
input_validator=WorkflowInput,
)
```
#### Typescript
```typescript
export const simpleConcurrency = hatchet.workflow({
name: 'simple-concurrency',
concurrency: {
maxRuns: 1,
limitStrategy: ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN,
expression: 'input.GroupKey',
},
});
```
#### Go
```go
var maxRuns int32 = 1
strategy := types.GroupRoundRobin
return client.NewStandaloneTask("simple-concurrency",
func(ctx worker.HatchetContext, input ConcurrencyInput) (*TransformedOutput, error) {
// Random sleep between 200ms and 1000ms
time.Sleep(time.Duration(200+rand.Intn(800)) * time.Millisecond)
return &TransformedOutput{
TransformedMessage: input.Message,
}, nil
},
hatchet.WithWorkflowConcurrency(types.Concurrency{
Expression: "input.GroupKey",
MaxRuns: &maxRuns,
LimitStrategy: &strategy,
}),
)
```
#### Ruby
```ruby
CONCURRENCY_LIMIT_RR_WORKFLOW = HATCHET.workflow(
name: "ConcurrencyDemoWorkflowRR",
concurrency: Hatchet::ConcurrencyExpression.new(
expression: "input.group",
max_runs: 1,
limit_strategy: :group_round_robin
)
)
CONCURRENCY_LIMIT_RR_WORKFLOW.task(:step1) do |input, ctx|
puts "starting step1"
sleep 2
puts "finished step1"
end
```
This example will only let 1 run in each group run at a given time to fairly distribute the load across the workers.
## Group Round Robin
### How it works
When a new task instance is triggered, the `GROUP_ROUND_ROBIN` strategy will:
1. Determine the group that the instance belongs to based on the `key` function defined in the task's concurrency configuration.
2. Check if there are any available slots for the instance's group based on the `slots` limit of available workers.
3. If a slot is available, the new task instance starts executing immediately.
4. If no slots are available, the new task instance is added to a queue for its group.
5. When a running task instance completes and a slot becomes available for a group, the next queued instance for that group (in round-robin order) is dequeued and starts executing.
This strategy ensures that task instances are processed fairly across different groups, preventing any one group from monopolizing the available resources. It also helps to reduce latency for instances within each group, as they are processed in a round-robin fashion rather than strictly in the order they were triggered.
### When to use `GROUP_ROUND_ROBIN`
The `GROUP_ROUND_ROBIN` strategy is particularly useful in scenarios where:
- You have multiple clients or users triggering task instances, and you want to ensure fair resource allocation among them.
- You want to process instances within each group in a round-robin fashion to minimize latency and ensure that no single instance within a group is starved for resources.
- You have long-running task instances and want to avoid one group's instances monopolizing the available slots.
Keep in mind that the `GROUP_ROUND_ROBIN` strategy may not be suitable for all use cases, especially those that require strict ordering or prioritization of the most recent events.
## Cancel In Progress
### How it works
When a new task instance is triggered, the `CANCEL_IN_PROGRESS` strategy will:
1. Determine the group that the instance belongs to based on the `key` function defined in the task's concurrency configuration.
2. Check if there are any available slots for the instance's group based on the `maxRuns` limit of available workers.
3. If a slot is available, the new task instance starts executing immediately.
4. If there are no available slots, currently running task instances for the same concurrency key are cancelled to free up slots for the new instance.
5. The new task instance starts executing immediately.
### When to use Cancel In Progress
The `CANCEL_IN_PROGRESS` strategy is particularly useful in scenarios where:
- You have long-running task instances that may become stale or irrelevant if newer instances are triggered.
- You want to prioritize processing the most recent data or events, even if it means canceling older task instances.
- You have resource-intensive tasks where it's more efficient to cancel an in-progress instance and start a new one than to wait for the old instance to complete.
- Your user UI allows for multiple inputs, but only the most recent is relevant (i.e. chat messages, form submissions, etc.).
## Cancel Newest
### How it works
The `CANCEL_NEWEST` strategy is similar to `CANCEL_IN_PROGRESS`, but it cancels the newly enqueued run instead of the oldest.
### When to use `CANCEL_NEWEST`
The `CANCEL_NEWEST` strategy is particularly useful in scenarios where:
- You want to allow in progress runs to complete before starting new work.
- You have long-running task instances and want to avoid one group's instances monopolizing the available slots.
## Multiple concurrency strategies
You can also combine multiple concurrency strategies to create a more complex concurrency control system. For example, you can use one group key to represent a specific team, and another group to represent a specific resource in that team, giving you more control over the rate at which tasks are executed.
#### Python
```python
class WorkflowInput(BaseModel):
name: str
digit: str
concurrency_workflow_level_workflow = hatchet.workflow(
name="ConcurrencyWorkflowLevel",
input_validator=WorkflowInput,
concurrency=[
ConcurrencyExpression(
expression="input.digit",
max_runs=DIGIT_MAX_RUNS,
limit_strategy=ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN,
),
ConcurrencyExpression(
expression="input.name",
max_runs=NAME_MAX_RUNS,
limit_strategy=ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN,
),
],
)
```
#### Typescript
```typescript
export const multipleConcurrencyKeys = hatchet.workflow({
name: 'simple-concurrency',
concurrency: [
{
maxRuns: 1,
limitStrategy: ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN,
expression: 'input.Tier',
},
{
maxRuns: 1,
limitStrategy: ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN,
expression: 'input.Account',
},
],
});
```
#### Go
```go
strategy := types.GroupRoundRobin
var maxRuns int32 = 20
return client.NewStandaloneTask("multi-concurrency",
func(ctx worker.HatchetContext, input ConcurrencyInput) (*TransformedOutput, error) {
// Random sleep between 200ms and 1000ms
time.Sleep(time.Duration(200+rand.Intn(800)) * time.Millisecond)
return &TransformedOutput{
TransformedMessage: input.Message,
}, nil
},
hatchet.WithWorkflowConcurrency(
types.Concurrency{
Expression: "input.Tier",
MaxRuns: &maxRuns,
LimitStrategy: &strategy,
}, types.Concurrency{
Expression: "input.Account",
MaxRuns: &maxRuns,
LimitStrategy: &strategy,
},
),
)
```
#### Ruby
```ruby
CONCURRENCY_WORKFLOW_LEVEL_WORKFLOW = HATCHET.workflow(
name: "ConcurrencyWorkflowLevel",
concurrency: [
Hatchet::ConcurrencyExpression.new(
expression: "input.digit",
max_runs: DIGIT_MAX_RUNS_WL,
limit_strategy: :group_round_robin
),
Hatchet::ConcurrencyExpression.new(
expression: "input.name",
max_runs: NAME_MAX_RUNS_WL,
limit_strategy: :group_round_robin
)
]
)
CONCURRENCY_WORKFLOW_LEVEL_WORKFLOW.task(:task_1) do |input, ctx|
sleep SLEEP_TIME_WL
end
CONCURRENCY_WORKFLOW_LEVEL_WORKFLOW.task(:task_2) do |input, ctx|
sleep SLEEP_TIME_WL
end
```
---
# Rate Limiting Step Runs in Hatchet
Hatchet allows you to enforce rate limits on task runs, enabling you to control the rate at which your service runs consume resources, such as external API calls, database queries, or other services. By defining rate limits, you can prevent task runs from exceeding a certain number of requests per time window (e.g., per second, minute, or hour), ensuring efficient resource utilization and avoiding overloading external services.
The state of active rate limits can be viewed in the dashboard in the `Rate Limit` resource tab.
## Dynamic vs Static Rate Limits
Hatchet offers two patterns for Rate Limiting task runs:
1. [Dynamic Rate Limits](#dynamic-rate-limits): Allows for complex rate limiting scenarios, such as per-user limits, by using `input` or `additional_metadata` keys to upsert a limit at runtime.
2. [Static Rate Limits](#static-rate-limits): Allows for simple rate limiting for resources known prior to runtime (e.g., external APIs).
## Dynamic Rate Limits
Dynamic rate limits are ideal for complex scenarios where rate limits need to be partitioned by resources that are only known at runtime.
This pattern is especially useful for:
1. Rate limiting individual users or tenants
2. Implementing variable rate limits based on subscription tiers or user roles
3. Dynamically adjusting limits based on real-time system load or other factors
### How It Works
1. Define the dynamic rate limit key with a CEL (Common Expression Language) Expression on the key, referencing either `input` or `additional_metadata`.
2. Provide this key as part of the workflow trigger or event `input` or `additional_metadata` at runtime.
3. Hatchet will create or update the rate limit based on the provided key and enforce it for the step run.
> **Info:** Note: Dynamic keys are a shared resource, this means the same rendered cel on
> multiple steps will be treated as one global rate limit.
### Declaring and Consuming Dynamic Rate Limits
#### Ruby
> Note: `dynamic_key` must be a CEL expression. `units` and `limits` can be either an integer or a CEL expression.
We can add one or more rate limits to a task by adding the `rate_limits` configuration to the task definition.
```python
@rate_limit_workflow.task(
rate_limits=[
RateLimit(
dynamic_key="input.user_id",
units=1,
limit=10,
duration=RateLimitDuration.MINUTE,
)
]
)
def step_2(input: RateLimitInput, ctx: Context) -> None:
print("executed step_2")
```
#### Tab 2
> Note: `dynamicKey` must be a CEL expression. `units` and `limit` can be either an integer or a CEL expression.
We can add one or more rate limits to a task by adding the `rate_limits` configuration to the task definition.
```typescript
const task2 = hatchet.task({
name: 'task2',
fn: (input: { userId: string }) => {
console.log('executed task2 for user: ', input.userId);
},
rateLimits: [
{
dynamicKey: 'input.userId',
units: 1,
limit: 10,
duration: RateLimitDuration.MINUTE,
},
],
});
```
#### Tab 3
> Note: Go requires both a key and KeyExpr be set and the LimitValueExpr must be a CEL.
```go
userUnits := 1
userLimit := "10"
duration := types.Minute
dynamicTask := client.NewStandaloneTask("task2",
func(ctx hatchet.Context, input APIRequest) (string, error) {
log.Printf("executed task2 for user: %s", input.UserID)
return "completed", nil
},
hatchet.WithRateLimits(&types.RateLimit{
Key: "input.userId",
Units: &userUnits,
LimitValueExpr: &userLimit,
Duration: &duration,
}),
)
```
#### Tab 4
```ruby
RATE_LIMIT_WORKFLOW.task(
:step_2,
rate_limits: [
Hatchet::RateLimit.new(
dynamic_key: "input.user_id",
units: 1,
limit: 10,
duration: :minute
)
]
) do |input, ctx|
puts "executed step_2"
end
```
## Static Rate Limits
Static Rate Limits (formerly known as Global Rate Limits) are defined as part of your worker startup lifecycle prior to runtime. This model provides a single "source of truth" for pre-defined resources such as:
1. External API resources that have a rate limit across all users or tenants
2. Database connection pools with a maximum number of concurrent connections
3. Shared computing resources with limited capacity
### How It Works
1. Declare static rate limits using the `put_rate_limit` method in the `Admin` client before starting your worker.
2. Specify the units of consumption for a specific rate limit key in each step definition using the `rate_limits` configuration.
3. Hatchet enforces the defined rate limits by tracking the number of units consumed by each step run across all workflow runs.
If a step run exceeds the rate limit, Hatchet re-queues the step run until the rate limit is no longer exceeded.
### Declaring Static Limits
Define the static rate limits that can be consumed by any step run across all workflow runs using the `put_rate_limit` method in the `Admin` client within your code.
#### Ruby
```python
RATE_LIMIT_KEY = "test-limit"
hatchet.rate_limits.put(RATE_LIMIT_KEY, 2, RateLimitDuration.SECOND)
```
#### Tab 2
{" "}
```typescript
hatchet.ratelimits.upsert({
key: 'api-service-rate-limit',
limit: 10,
duration: RateLimitDuration.SECOND,
});
```
#### Tab 3
```go
err = client.RateLimits().Upsert(features.CreateRatelimitOpts{
Key: RATE_LIMIT_KEY,
Limit: 10,
Duration: types.Second,
})
if err != nil {
log.Fatalf("failed to create rate limit: %v", err)
}
```
#### Tab 4
```ruby
def main
HATCHET.rate_limits.put(RATE_LIMIT_KEY, 2, :second)
worker = HATCHET.worker(
"rate-limit-worker", slots: 10, workflows: [RATE_LIMIT_WORKFLOW]
)
worker.start
end
```
### Consuming Static Rate Limits
With your rate limit key defined, specify the units of consumption for a specific key in each step definition by adding the `rate_limits` configuration to your step definition in your workflow.
#### Ruby
```python
RATE_LIMIT_KEY = "test-limit"
@rate_limit_workflow.task(rate_limits=[RateLimit(static_key=RATE_LIMIT_KEY, units=1)])
def step_1(input: RateLimitInput, ctx: Context) -> None:
print("executed step_1")
```
#### Tab 2
```typescript
const RATE_LIMIT_KEY = 'api-service-rate-limit';
const task1 = hatchet.task({
name: 'task1',
rateLimits: [
{
staticKey: RATE_LIMIT_KEY,
units: 1,
},
],
fn: (input) => {
console.log('executed task1');
},
});
```
#### Tab 3
```go
units := 1
staticTask := client.NewStandaloneTask("task1",
func(ctx hatchet.Context, input APIRequest) (string, error) {
log.Println("executed task1")
return "completed", nil
},
hatchet.WithRateLimits(&types.RateLimit{
Key: RATE_LIMIT_KEY,
Units: &units,
}),
)
```
#### Tab 4
```ruby
RATE_LIMIT_KEY = "test-limit"
RATE_LIMIT_WORKFLOW.task(
:step_1,
rate_limits: [Hatchet::RateLimit.new(static_key: RATE_LIMIT_KEY, units: 1)]
) do |input, ctx|
puts "executed step_1"
end
```
### Limiting Workflow Runs
To rate limit an entire workflow run, it's recommended to specify the rate limit configuration on the entry step (i.e., the first step in the workflow). This will gate the execution of all downstream steps in the workflow.
---
# Assigning priority to tasks in Hatchet
Hatchet allows you to assign different `priority` values to your tasks depending on how soon you want them to run. `priority` can be set to either `1`, `2`, or `3`, (`low`, `medium`, and `high`, respectively) with relatively higher values resulting in that task being picked up before others of the same type. **By default, runs in Hatchet have a priority of 1 (low) unless otherwise specified.**
Priority only affects multiple runs of a _single_ workflow. If you have two different workflows (A and B) and set A to globally have a priority of 3, and B to globally have a priority of 1, this does _not_ guarantee that if there is one task from A and one from B in the queue, that A's task will be run first.
However, _within_ A, if you enqueue one task with priority 3 and one with priority 1, the priority 3 task will be run first.
A couple of common use cases for assigning priorities are things like:
1. Having high-priority (e.g. paying, new, etc.) customers be prioritized over lower-priority ones, allowing them to get faster turnaround times on their tasks.
2. Having tasks triggered via your API run with higher priority than the same tasks triggered by a cron.
## Setting priority for a task or workflow
There are a few different ways to set priorities for tasks or workflows in Hatchet.
### Workflow-level default priority
First, you can set a default priority at the workflow level:
#### Ruby
```python
DEFAULT_PRIORITY = Priority.LOW
SLEEP_TIME = 0.25
priority_workflow = hatchet.workflow(
name="PriorityWorkflow",
default_priority=DEFAULT_PRIORITY,
)
```
#### Tab 2
```typescript
export const priorityWf = hatchet.workflow({
name: 'priority-wf',
defaultPriority: Priority.LOW,
});
```
#### Tab 3
```go
workflow := client.NewWorkflow(
"priority",
hatchet.WithWorkflowDefaultPriority(features.RunPriorityLow),
)
```
#### Tab 4
```ruby
DEFAULT_PRIORITY = 1
SLEEP_TIME = 0.25
PRIORITY_WORKFLOW = HATCHET.workflow(
name: "PriorityWorkflow",
default_priority: DEFAULT_PRIORITY
)
PRIORITY_WORKFLOW.task(:priority_task) do |input, ctx|
puts "Priority: #{ctx.priority}"
sleep SLEEP_TIME
end
```
This will assign the same default priority to all runs of this workflow (and all of the workflow's corresponding tasks), but will have no effect without also setting run-level priorities, since every run will use the same default.
### Priority-on-trigger
When you trigger a run, you can set the priority of the triggered run to override its default priority.
#### Ruby
```python
low_prio = priority_workflow.run(
## 👀 Adding priority and key to metadata to show them in the dashboard
priority=Priority.LOW,
additional_metadata={"priority": "low", "key": 1},
wait_for_result=False,
)
high_prio = priority_workflow.run(
## 👀 Adding priority and key to metadata to show them in the dashboard
priority=Priority.HIGH,
additional_metadata={"priority": "high", "key": 1},
wait_for_result=False,
)
```
#### Tab 2
```typescript
const run = priority.run(new Date(Date.now() + 60 * 60 * 1000), { priority: Priority.HIGH });
```
#### Tab 3
```go
ref, err := client.RunNoWait(
context.Background(),
workflow.GetName(),
PriorityInput{},
hatchet.WithRunPriority(features.RunPriorityLow),
)
if err != nil {
return err
}
```
#### Tab 4
```ruby
low_prio = PRIORITY_WORKFLOW.run_no_wait(
{},
options: Hatchet::TriggerWorkflowOptions.new(
priority: 1,
additional_metadata: { "priority" => "low", "key" => 1 }
)
)
high_prio = PRIORITY_WORKFLOW.run_no_wait(
{},
options: Hatchet::TriggerWorkflowOptions.new(
priority: 3,
additional_metadata: { "priority" => "high", "key" => 1 }
)
)
```
Similarly, you can also assign a priority to scheduled and cron workflows.
#### Ruby
```python
schedule = priority_workflow.schedule(
run_at=datetime.now(tz=timezone.utc) + timedelta(minutes=1),
priority=Priority.HIGH,
)
cron = priority_workflow.create_cron(
cron_name="my-scheduled-cron",
expression="0 * * * *",
priority=Priority.HIGH,
)
```
#### Tab 2
```typescript
const scheduled = priority.schedule(
new Date(Date.now() + 60 * 60 * 1000),
{},
{ priority: Priority.HIGH }
);
const delayed = priority.delay(60 * 60 * 1000, {}, { priority: Priority.HIGH });
const cron = priority.cron(
`daily-cron-${Math.random()}`,
'0 0 * * *',
{},
{ priority: Priority.HIGH }
);
```
#### Tab 3
```go
priority := features.RunPriorityHigh
schedule, err := client.Schedules().Create(
context.Background(),
workflow.GetName(),
features.CreateScheduledRunTrigger{
Priority: &priority,
},
)
if err != nil {
return err
}
cron, err := client.Crons().Create(
context.Background(),
workflow.GetName(),
features.CreateCronTrigger{
Priority: &priority,
},
)
if err != nil {
return err
}
```
#### Tab 4
```ruby
schedule = PRIORITY_WORKFLOW.schedule(
Time.now + 60,
options: Hatchet::TriggerWorkflowOptions.new(priority: 3)
)
cron = PRIORITY_WORKFLOW.create_cron(
"my-scheduled-cron",
"0 * * * *",
input: {},
)
```
In these cases, the priority set on the trigger will override the default priority, so these runs will be processed ahead of lower-priority ones.
---
# Durable Tasks
Use durable tasks when **you don't know the shape of work ahead of time**. For example, an AI agent that picks its next action based on a model response, a fan-out where N is determined by the input data, or a pipeline that branches and spawns sub-workflows based on intermediate results. In all of these cases, the "graph" of work doesn't exist when the task starts; it emerges at runtime as the task makes decisions and [spawns children](/v1/child-spawning).
A durable task is a single long-running function that acts as an **orchestrator**: it spawns child tasks, waits for their results, makes decisions, and spawns more. Hatchet checkpoints its progress so it can recover from crashes, survive long waits, and resume on any worker without re-executing completed work.
> **Info:** If you know the full graph of work upfront (every task and dependency is fixed
> before execution begins), use a [DAG](/v1/patterns/directed-acyclic-graphs)
> instead. You can always [mix both patterns](/v1/patterns/mixing-patterns) in
> the same application.
## When to Use Durable Tasks
Scenario, Why Durable?
**Dynamic fan-out** (N unknown), Spawn children based on runtime data; wait for results without holding a slot.
**Agentic workflows**, An agent decides what to do next, spawns subtasks, loops, or stops at runtime.
**Long waits** (hours/days), Worker slots are freed during waits; no wasted compute.
**Human-in-the-loop**, Wait for approval events without holding resources
**Multi-step with inline pauses**, `SleepFor` and `WaitForEvent` let you express complex procedural flows.
**Crash-resilient pipelines**, Automatically resume from checkpoints after failures.
## How It Works
A durable task builds the workflow at runtime through **child spawning**. The task function runs, inspects data, and decides what to do next by spawning child tasks. The parent is [evicted](/v1/task-eviction) while children execute, freeing its worker slot. When children complete, the parent resumes from its checkpoint and continues.
```mermaid
sequenceDiagram
participant P as Durable Task
participant H as Hatchet
participant W as Workers
P->>H: Spawn Child A, Child B, Child C...N
H-->>P: Evicted (slot freed)
H->>W: Schedule children across fleet
W->>H: Child results
H->>P: Resume from checkpoint
P->>P: Inspect results, decide next step
P->>H: Spawn more children, sleep, or finish
```
This is fundamentally different from a DAG, where every task and dependency is declared before execution begins. With durable tasks, the number of children, which branches to take, and whether to loop or stop are all determined by your code at runtime.
### Checkpoints
Each call to `SleepFor`, `WaitForEvent`, `WaitFor`, `Memo`, or `RunChild` creates a checkpoint in the durable event log. These checkpoints record the task's progress.
### Worker slot is freed during waits
When a durable task enters a wait (sleep, event, or child result), Hatchet [evicts](/v1/task-eviction) it from the worker. The slot is immediately available for other tasks.
### Task resumes from checkpoint
When the wait completes, Hatchet re-queues the task on any available worker. It replays the event log up to the last checkpoint and resumes execution from there. Completed operations are not re-executed.
## The Durable Context
Declare a task as durable (using `durable_task` instead of `task`) and it receives a `DurableContext` instead of a normal `Context`. The `DurableContext` extends `Context` with methods for checkpointing and waiting:
Method, Purpose
**`SleepFor(duration)`**, Pause for a fixed duration. Respects the original sleep time on restart; if interrupted after 23 of 24 hours, only sleeps 1 more hour.
**`WaitForEvent(key, expr)`**, Wait for an external event by key, with optional [CEL filter](https://github.com/google/cel-spec) expression on the payload.
**`WaitFor(conditions)`**, General-purpose wait accepting any combination of sleep conditions, event conditions, or or-groups. `SleepFor` and `WaitForEvent` are convenience wrappers around this method.
**`Memo(function)`**, Run functions whose outputs are memoized based on the input arguments.
**`RunChild(task, input)`**, Spawn a child task and wait for its result. The parent is evicted during the wait.
## Example Task
```python
durable_workflow = hatchet.workflow(name="DurableWorkflow")
```
Now add tasks to the workflow. The first is a regular task; the second is a durable task that sleeps and waits for an event:
```python
EVENT_KEY = "durable-example:event"
SLEEP_TIME = 5
REPLAY_RESET_SLEEP_TIME = 3
@durable_workflow.task()
async def ephemeral_task(input: EmptyModel, ctx: Context) -> None:
print("Running non-durable task")
class AwaitedEvent(BaseModel):
id: str
@durable_workflow.durable_task()
async def durable_task(input: EmptyModel, ctx: DurableContext) -> dict[str, str | int]:
print("Waiting for sleep")
sleep = await ctx.aio_sleep_for(duration=timedelta(seconds=SLEEP_TIME))
print("Sleep finished")
print("Waiting for event")
event = await ctx.aio_wait_for_event(
EVENT_KEY, "true", payload_validator=AwaitedEvent
)
print("Event received")
return {
"status": "success",
"event_id": event.id,
"sleep_duration_seconds": sleep.duration.seconds,
}
```
> **Info:** The `durable_task` decorator gives the function a `DurableContext` instead of
> a regular `Context`. This is the only difference in declaration; the task
> registers and runs on the same worker as regular tasks.
If this task is interrupted at any time, it will continue from where it left off. If the task calls `ctx.aio_sleep_for` for 24 hours and is interrupted after 23 hours, it will only sleep for 1 more hour on restart.
### Or Groups
Durable tasks can combine multiple wait conditions using [or groups](/v1/conditions#or-groups). For example, you could wait for either an event or a sleep (whichever comes first):
```python
@durable_workflow.durable_task()
async def wait_for_or_group_1(
_i: EmptyModel, ctx: DurableContext
) -> dict[str, str | int | float]:
start = time.time()
wait_result = await ctx.aio_wait_for(
uuid4().hex,
or_(
SleepCondition(timedelta(seconds=SLEEP_TIME)),
UserEventCondition(event_key=EVENT_KEY),
),
)
key = list(wait_result.keys())[0]
event_id = list(wait_result[key].keys())[0]
return {
"runtime": time.time() - start,
"key": key,
"event_id": event_id,
}
```
## Spawning Child Tasks
Child spawning is the primary way durable tasks build workflows at runtime. A durable task can spawn any runnable (regular tasks, other durable tasks, or entire DAG workflows), wait for results, and decide what to do next.
Child type, Example
**Regular task**, Spawn a stateless task for a quick computation or API call.
**Durable task**, Spawn another durable task that has its own checkpoints, sleeps, and event waits.
**DAG workflow**, Spawn an entire multi-task workflow and wait for its final output.
The parent is evicted while children execute, so it consumes no resources. The number and type of children can be determined dynamically based on input, intermediate results, or model outputs.
See [Child Spawning](/v1/child-spawning) for patterns and full examples.
> **Info:** For an in-depth look at how durable execution works internally, see [this blog
> post](https://hatchet.run/blog/durable-execution).
---
# Declarative Workflow Design (DAGs)
Hatchet workflows are designed in a **Directed Acyclic Graph (DAG)** format, where each task is a node in the graph, and the dependencies between tasks are the edges. This structure ensures that workflows are organized, predictable, and free from circular dependencies.
## How DAG Workflows Work
### You declare the graph
Define tasks and their dependencies upfront. Hatchet knows the full shape of work before execution begins.
### Hatchet executes in order
Tasks run as soon as their parents complete. Independent tasks run in parallel automatically. A worker slot is only assigned when a task is ready to execute, so tasks waiting on parents consume no resources. Each task has configurable [retry policies](/v1/retry-policies) and [timeouts](/v1/timeouts).
### Results flow downstream
Task outputs are cached and passed to child tasks. If a failure occurs mid-workflow, completed tasks don't re-run.
### Everything is observable
Every task execution is tracked in the dashboard — inputs, outputs, durations, and errors. You can see exactly where a workflow succeeded or failed.
## Defining a Workflow
Start by declaring a workflow with a name. The workflow object can declare additional workflow-level configuration options which we'll cover later.
The returned object is an instance of the `Workflow` class, which is the primary interface for interacting with the workflow (i.e. [running](/v1/running-your-task#run-and-wait), [enqueuing](/v1/running-your-task#fire-and-forget), [scheduling](/v1/scheduled-runs), etc).
#### Python
```python
dag_workflow = hatchet.workflow(name="DAGWorkflow")
```
#### Typescript
```typescript
// First, we declare the workflow
export const dag = hatchet.workflow({
name: 'simple',
});
```
#### Go
```go
workflow := client.NewWorkflow("dag-workflow")
```
#### Ruby
```ruby
DAG_WORKFLOW = HATCHET.workflow(name: "DAGWorkflow")
```
The Workflow return object can be interacted with in the same way as a
[task](/v1/tasks), however, it can only take a subset of options which are
applied at the task level.
## Defining a Task
Now that we have a workflow, we can define a task to be executed as part of the workflow. Tasks are defined by calling the `task` method on the workflow object.
The `task` method takes a name and a function that defines the task's behavior. The function will receive the workflow's input and return the task's output. Tasks also accept a number of other configuration options, which are covered elsewhere in our documentation.
#### Python
In Python, the `task` method is a decorator, which is used like this to wrap a function:
```python
@dag_workflow.task(execution_timeout=timedelta(seconds=5))
def step1(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
```
The function takes two arguments: `input`, which is a Pydantic model, and `ctx`, which is the Hatchet `Context` object. We'll discuss both of these more later.
> **Info:** In the internals of Hatchet, the task is called using _positional arguments_, meaning that you can name `input` and `ctx` whatever you like.
>
> For instance, `def task_1(foo: EmptyModel, bar: Context) -> None:` is perfectly valid.
#### Typescript
```typescript
// Next, we declare the tasks bound to the workflow
const toLower = dag.task({
name: 'to-lower',
fn: (input) => {
return {
TransformedMessage: input.Message.toLowerCase(),
};
},
});
```
The `fn` argument is a function that takes the workflow's input and a
context object. The context object contains information about the workflow
run (e.g. the run ID, the workflow's input, etc). It can be synchronous or
asynchronous.
#### Go
```go
step1 := workflow.NewTask("step-1", func(ctx hatchet.Context, input Input) (StepOutput, error) {
return StepOutput{
Step: 1,
Result: input.Value * 2,
}, nil
})
```
#### Ruby
```ruby
STEP1 = DAG_WORKFLOW.task(:step1, execution_timeout: 5) do |input, ctx|
{ "random_number" => rand(1..100) }
end
STEP2 = DAG_WORKFLOW.task(:step2, execution_timeout: 5) do |input, ctx|
{ "random_number" => rand(1..100) }
end
```
## Building a DAG with Task Dependencies
The power of Hatchet's workflow design comes from connecting tasks into a DAG structure. Tasks can specify dependencies (parents) which must complete successfully before the task can start.
#### Python
```python
@dag_workflow.task(execution_timeout=timedelta(seconds=5))
async def step2(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
@dag_workflow.task(parents=[step1, step2])
async def step3(input: EmptyModel, ctx: Context) -> RandomSum:
one = ctx.task_output(step1).random_number
two = ctx.task_output(step2).random_number
return RandomSum(sum=one + two)
```
#### Typescript
```typescript
dag.task({
name: 'reverse',
parents: [toLower],
fn: async (input, ctx) => {
const lower = await ctx.parentOutput(toLower);
return {
Original: input.Message,
Transformed: lower.TransformedMessage.split('').reverse().join(''),
};
},
});
```
#### Go
```go
step2 := workflow.NewTask("step-2", func(ctx hatchet.Context, input Input) (StepOutput, error) {
// Get output from step 1
var step1Output StepOutput
if err := ctx.ParentOutput(step1, &step1Output); err != nil {
return StepOutput{}, err
}
return StepOutput{
Step: 2,
Result: step1Output.Result + 10,
}, nil
}, hatchet.WithParents(step1))
```
#### Ruby
```ruby
DAG_WORKFLOW.task(:step3, parents: [STEP1, STEP2]) do |input, ctx|
one = ctx.task_output(STEP1)["random_number"]
two = ctx.task_output(STEP2)["random_number"]
{ "sum" => one + two }
end
DAG_WORKFLOW.task(:step4, parents: [STEP1, :step3]) do |input, ctx|
puts(
"executed step4",
Time.now.strftime("%H:%M:%S"),
input.inspect,
ctx.task_output(STEP1).inspect,
ctx.task_output(:step3).inspect
)
{ "step4" => "step4" }
end
```
## Accessing Parent Task Outputs
As shown in the examples above, tasks can access outputs from their parent tasks using the context object:
#### Python
```python
@dag_workflow.task(execution_timeout=timedelta(seconds=5))
async def step2(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
@dag_workflow.task(parents=[step1, step2])
async def step3(input: EmptyModel, ctx: Context) -> RandomSum:
one = ctx.task_output(step1).random_number
two = ctx.task_output(step2).random_number
return RandomSum(sum=one + two)
```
#### Typescript
```typescript
dag.task({
name: 'task-with-parent-output',
parents: [toLower],
fn: async (input, ctx) => {
const lower = await ctx.parentOutput(toLower);
return {
Original: input.Message,
Transformed: lower.TransformedMessage.split('').reverse().join(''),
};
},
});
```
#### Go
```go
// Inside a task with parent dependencies
var parentOutput ParentOutputType
err := ctx.ParentOutput(parentTask, &parentOutput)
if err != nil {
return nil, err
}
```
#### Ruby
```ruby
DAG_WORKFLOW.task(:step3, parents: [STEP1, STEP2]) do |input, ctx|
one = ctx.task_output(STEP1)["random_number"]
two = ctx.task_output(STEP2)["random_number"]
{ "sum" => one + two }
end
DAG_WORKFLOW.task(:step4, parents: [STEP1, :step3]) do |input, ctx|
puts(
"executed step4",
Time.now.strftime("%H:%M:%S"),
input.inspect,
ctx.task_output(STEP1).inspect,
ctx.task_output(:step3).inspect
)
{ "step4" => "step4" }
end
```
## Running a Workflow
You can run workflows directly or enqueue them for asynchronous execution. All the same methods for running a task are available for workflows!
#### Python
```python
dag_workflow.run()
```
#### Typescript
```typescript
const input = { Message: 'Hello, World!' };
// Run workflow and wait for the result
const result = await simple.run(input);
// Enqueue workflow to be executed asynchronously
const runReference = await simple.runNoWait(input);
```
#### Go
```go
// Run workflow and wait for the result
result, err := simple.Run(ctx, input)
// Enqueue workflow to be executed asynchronously
runID, err := simple.RunNoWait(ctx, input)
```
#### Ruby
```ruby
result = DAG_WORKFLOW.run
puts result
```
## Pre-Determined Pipelines
DAGs naturally model fixed multi-stage pipelines where the sequence of tasks and their dependencies are known before execution. ETL workflows, document processing pipelines, and CI/CD workflows all follow this pattern: each stage depends on the previous, and the overall structure is visible and predictable in the dashboard.
---
# Best Practices
## Choosing a Pattern
Use a **DAG** for any portion of work whose shape you know upfront, and use a **durable task** to orchestrate the parts whose shape is dynamic. You can mix them freely within the same application and even within the same workflow.
Scenario, Pattern
Fixed pipeline, every step is known, DAG
Fixed pipeline, but one step needs a long wait, DAG with a durable task node
Dynamic orchestration of known pipelines, Durable task spawning DAGs
Fully dynamic, shape decided at runtime, Durable task spawning tasks/durable tasks
Agent that reasons and acts in a loop, Durable task spawning children per iteration
[DAGs](/v1/patterns/directed-acyclic-graphs) are inherently deterministic, since their shape is predefined and intermediate results are cached. If your workflow can be represented as a DAG, prefer that. Reach for a durable task only when you need capabilities a static graph can't express.
> **Info:** You don't have to choose one pattern for your entire application. Different
> workflows can use different patterns, and a single workflow can mix them.
> Start with the simplest pattern that fits and add complexity only when needed.
## Mixing Patterns
### A durable task inside a DAG
A DAG workflow can include a durable task as one of its nodes. The durable task checkpoints and waits like any other, while the rest of the DAG proceeds according to its declared dependencies.
This is useful when most of your pipeline is a fixed graph but one step needs dynamic behavior, for example a pipeline where one stage runs an agentic loop that decides what to do at runtime.
```mermaid
graph LR
A[Prepare Data] --> B[Durable: Agentic Loop]
B --> C[Publish Results]
style B stroke:#3392FF,stroke-dasharray: 5 5
```
The durable task (`Agentic Loop`) can spawn children, sleep, wait for events, or loop until a condition is met. When it completes, the downstream `Publish Results` task runs automatically.
### Spawning a DAG from a durable task
A durable task can spawn an entire DAG workflow as a child, wait for its result, and then continue. This lets you use procedural control flow to decide _which_ pipeline to run and _how many times_ to run it, while the pipeline itself is a well-defined graph.
```mermaid
graph TD
DT[Durable Task] -->|spawns| DAG1[DAG: Process Batch 1]
DT -->|spawns| DAG2[DAG: Process Batch 2]
DT -->|spawns| DAG3[DAG: Process Batch N]
DAG1 -->|result| DT
DAG2 -->|result| DT
DAG3 -->|result| DT
style DT stroke:#3392FF
style DAG1 stroke:#22C55E
style DAG2 stroke:#22C55E
style DAG3 stroke:#22C55E
```
The durable task decides at runtime how many batches to process, spawns a DAG workflow for each one, and collects the results. The DAG workflows run in parallel across your worker fleet while the durable task's slot is freed.
### Durable tasks spawning durable tasks
A durable task can spawn other durable tasks as children, each with their own checkpoints and event waits. This creates a tree of durable work that's entirely driven by runtime logic.
```mermaid
graph TD
Root[Durable: Orchestrator] -->|spawns| A[Durable: Agent A]
Root -->|spawns| B[Durable: Agent B]
A -->|spawns| A1[Task: Subtask]
A -->|spawns| A2[Task: Subtask]
B -->|spawns| B1[Durable: Sub-Agent]
B1 -->|spawns| B1a[Task: Subtask]
style Root stroke:#3392FF,stroke-dasharray: 5 5
style A stroke:#3392FF,stroke-dasharray: 5 5
style B stroke:#3392FF,stroke-dasharray: 5 5
style B1 stroke:#3392FF,stroke-dasharray: 5 5
```
This pattern is ideal for agent-based systems where each level of the tree decides what to do next. Each durable task in the tree can sleep, wait for events, or spawn more children, and none of them hold a worker slot while waiting.
## Determinism in Durable Tasks
Durable tasks must be **deterministic** between checkpoints. The task should always perform the same sequence of operations in between retries. This is what allows Hatchet to replay the task from the last checkpoint. If a task is not deterministic, it may produce different results on each retry, which can lead to unexpected behavior.
### Rules for determinism
1. **Only call methods available on the `DurableContext`**: a common way to introduce non-determinism is to call methods that produce side effects. If you need to fetch data from a database, call an API, or otherwise interact with external systems, spawn those operations as a **child task** using `RunChild`. Durable tasks are [evicted](/v1/task-eviction) at every wait point and replayed from checkpoint on resume. Any side effect not behind a checkpoint will re-execute.
2. **When updating durable tasks, always guarantee backwards compatibility**: if you change the order of checkpoint operations in a durable task, you may break determinism. For example, if you call `SleepFor` followed by `WaitFor`, and then change the order of those calls, Hatchet will not be able to replay the task correctly. The task may have already been checkpointed at the first call to `SleepFor`, and changing the order makes that checkpoint meaningless.
---
# Child Spawning
A task can spawn child tasks at runtime, including other durable tasks or entire DAG workflows. Children run independently on any available worker, and the parent can wait for their results.
Both durable tasks and DAG tasks support child spawning with the same core API. The key difference is that durable tasks free the parent's worker slot while waiting (via [eviction](/v1/task-eviction)), while DAG tasks hold their slot for the duration of execution.
#### Durable Tasks
## Spawning from Durable Tasks
A durable task can spawn child tasks at runtime. This is one of the core reasons to choose durable tasks over DAGs: the shape of work is decided as the task runs, not declared upfront.
> **Info:** Waiting for child results puts the parent task into an [evictable
> state](/v1/task-eviction), the worker slot is freed and the parent is
> re-queued when results are available.
Because the parent is evicted while children execute:
- **No slot waste** — the parent doesn't hold a worker slot while N children run across your fleet.
- **No deadlocks** — because the parent is evicted, it can't starve its own children for slots.
- **Dynamic N** — you decide how many children to spawn based on runtime data (input size, API responses, agent reasoning).
### Spawning child tasks
Use the context object to spawn a child task from within a durable task. The child runs independently on any available worker.
#### Python
```python
from examples.fanout.worker import ChildInput, child_wf
# 👀 example: run this inside of a parent task to spawn a child
child_wf.run(
ChildInput(a="b"),
)
```
#### Typescript
```typescript
export const parentSingleChild = hatchet.task({
name: 'parent-single-child',
fn: async () => {
const childRes = await child.run({ N: 1 });
return {
Result: childRes.Value,
};
},
});
```
#### Go
```go
// Inside a parent task
childResult, err := childWorkflow.Run(hCtx, ChildInput{
Value: 1,
})
if err != nil {
return err
}
```
#### Ruby
```ruby
FANOUT_CHILD_WF.run({ "a" => "b" })
```
### Parallel fan-out
Spawn many children at once and wait for all results. The parent is evicted during the wait, so it consumes no resources while children run.
#### Python
```python
async def run_child_workflows(n: int) -> list[dict[str, Any]]:
return await child_wf.aio_run_many(
[
child_wf.create_bulk_run_item(
input=ChildInput(a=str(i)),
)
for i in range(n)
]
)
```
#### Typescript
```typescript
type ParentInput = {
N: number;
};
export const parent = hatchet.task({
name: 'parent',
fn: async (input: ParentInput, ctx) => {
const n = input.N;
const promises = [];
for (let i = 0; i < n; i++) {
promises.push(child.run({ N: i }));
}
const childRes = await Promise.all(promises);
const sum = childRes.reduce((acc, curr) => acc + curr.Value, 0);
return {
Result: sum,
};
},
});
```
#### Go
```go
// Run multiple child tasks in parallel using goroutines
var wg sync.WaitGroup
var mu sync.Mutex
results := make([]*ChildOutput, 0, n)
wg.Add(n)
for i := 0; i < n; i++ {
go func(index int) {
defer wg.Done()
result, err := childWorkflow.Run(hCtx, ChildInput{Value: index})
if err != nil {
return
}
var childOutput ChildOutput
err = result.Into(&childOutput)
if err != nil {
return
}
mu.Lock()
results = append(results, &childOutput)
mu.Unlock()
}(i)
}
wg.Wait()
```
#### Ruby
```ruby
def run_child_workflows(n)
FANOUT_CHILD_WF.run_many(
n.times.map do |i|
FANOUT_CHILD_WF.create_bulk_run_item(
input: { "a" => i.to_s }
)
end
)
end
```
### What children can be
A durable task can spawn any runnable:
Child type, Example
**Regular task**, Spawn a stateless task for a quick computation or API call.
**Durable task**, Spawn another durable task that has its own checkpoints, sleeps, and event waits.
**DAG workflow**, Spawn an entire multi-task workflow and wait for its final output.
### Error handling
#### Python
```python
try:
child_wf.run(
ChildInput(a="b"),
)
except Exception as e:
print(f"Child workflow failed: {e}")
```
#### Typescript
```typescript
export const withErrorHandling = hatchet.task({
name: 'parent-error-handling',
fn: async () => {
try {
const childRes = await child.run({ N: 1 });
return {
Result: childRes.Value,
};
} catch (error) {
// decide how to proceed here
return {
Result: -1,
};
}
},
});
```
#### Go
```go
result, err := childWorkflow.Run(hCtx, ChildInput{Value: 1})
if err != nil {
// Handle error from child workflow
fmt.Printf("Child workflow failed: %v\n", err)
// Decide how to proceed - retry, skip, or fail the parent
}
```
#### Ruby
```ruby
begin
FANOUT_CHILD_WF.run({ "a" => "b" })
rescue StandardError => e
puts "Child workflow failed: #{e.message}"
end
```
#### DAGs
## Spawning from DAG Tasks
DAG tasks can also spawn child tasks procedurally during execution. This lets you combine a fixed pipeline structure with dynamic child work inside individual tasks.
### Creating parent and child tasks
To implement child task spawning, you first need to create both parent and child task definitions.
#### Python
First, we'll declare a couple of tasks for the parent and child:
```python
class ParentInput(BaseModel):
n: int = 100
class ChildInput(BaseModel):
a: str
parent_wf = hatchet.workflow(name="FanoutParent", input_validator=ParentInput)
child_wf = hatchet.workflow(name="FanoutChild", input_validator=ChildInput)
@parent_wf.task(execution_timeout=timedelta(minutes=5))
async def spawn(input: ParentInput, ctx: Context) -> dict[str, Any]:
print("spawning child")
result = await child_wf.aio_run_many(
[
child_wf.create_bulk_run_item(
input=ChildInput(a=str(i)),
additional_metadata={"hello": "earth"},
key=f"child{i}",
)
for i in range(input.n)
],
)
print(f"results {result}")
return {"results": result}
```
We also created a step on the parent task that spawns the child tasks. Now, we'll add a couple of steps to the child task:
```python
@child_wf.task()
async def process(input: ChildInput, ctx: Context) -> dict[str, str]:
print(f"child process {input.a}")
return {"status": input.a}
@child_wf.task(parents=[process])
async def process2(input: ChildInput, ctx: Context) -> dict[str, str]:
process_output = ctx.task_output(process)
a = process_output["status"]
return {"status2": a + "2"}
```
And that's it! The fanout parent will run and spawn the child, and then will collect the results from its steps.
#### Typescript
```typescript
import sleep from '@hatchet-dev/typescript-sdk/util/sleep';
import { hatchet } from '../hatchet-client';
// (optional) Define the input type for the workflow
export type ChildInput = {
Message: string;
};
export type ParentInput = {
Message: string;
};
export const child = hatchet.workflow({
name: 'child',
});
export const child1 = child.task({
name: 'child1',
fn: async (input: ChildInput, ctx) => {
await sleep(30 * 1000);
ctx.logger.info('hello from the child1', { hello: 'moon' });
return {
TransformedMessage: input.Message.toLowerCase(),
};
},
});
export const child2 = child.task({
name: 'child2',
fn: (input: ChildInput, ctx) => {
ctx.logger.info('hello from the child2');
return {
TransformedMessage: input.Message.toLowerCase(),
};
},
});
export const child3 = child.task({
name: 'child3',
parents: [child1, child2],
fn: async (input: ChildInput, ctx) => {
ctx.logger.info('hello from the child3');
return {
TransformedMessage: input.Message.toLowerCase(),
};
},
});
export const parent = hatchet.task({
name: 'parent',
executionTimeout: '5m',
fn: async (input: ParentInput, ctx) => {
const c = await ctx.runChild(child, {
Message: input.Message,
});
return {
TransformedMessage: 'not implemented',
};
},
});
```
#### Go
```go
type ParentInput struct {
Count int `json:"count"`
}
type ParentOutput struct {
Sum int `json:"sum"`
}
func Parent(client *hatchet.Client) *hatchet.StandaloneTask {
return client.NewStandaloneTask("parent-task",
func(ctx hatchet.Context, input ParentInput) (ParentOutput, error) {
log.Printf("Parent workflow spawning %d child workflows", input.Count)
// Spawn multiple child workflows and collect results
sum := 0
for i := 0; i < input.Count; i++ {
log.Printf("Spawning child workflow %d/%d", i+1, input.Count)
// Spawn child workflow and wait for result
childResult, err := Child(client).Run(ctx, ChildInput{
Value: i + 1,
})
if err != nil {
return ParentOutput{}, fmt.Errorf("failed to spawn child workflow %d: %w", i, err)
}
var childOutput ChildOutput
err = childResult.Into(&childOutput)
if err != nil {
return ParentOutput{}, fmt.Errorf("failed to get child workflow result: %w", err)
}
sum += childOutput.Result
log.Printf("Child workflow %d completed with result: %d", i+1, childOutput.Result)
}
log.Printf("All child workflows completed. Total sum: %d", sum)
return ParentOutput{
Sum: sum,
}, nil
},
)
}
type ChildInput struct {
Value int `json:"value"`
}
type ChildOutput struct {
Result int `json:"result"`
}
func Child(client *hatchet.Client) *hatchet.StandaloneTask {
return client.NewStandaloneTask("child-task",
func(ctx hatchet.Context, input ChildInput) (ChildOutput, error) {
return ChildOutput{
Result: input.Value * 2,
}, nil
},
)
}
```
#### Ruby
```ruby
FANOUT_PARENT_WF = HATCHET.workflow(name: "FanoutParent")
FANOUT_CHILD_WF = HATCHET.workflow(name: "FanoutChild")
FANOUT_PARENT_WF.task(:spawn, execution_timeout: 300) do |input, ctx|
puts "spawning child"
n = input["n"] || 100
result = FANOUT_CHILD_WF.run_many(
n.times.map do |i|
FANOUT_CHILD_WF.create_bulk_run_item(
input: { "a" => i.to_s },
options: Hatchet::TriggerWorkflowOptions.new(
additional_metadata: { "hello" => "earth" },
key: "child#{i}"
)
)
end
)
puts "results #{result}"
{ "results" => result }
end
```
```ruby
FANOUT_CHILD_PROCESS = FANOUT_CHILD_WF.task(:process) do |input, ctx|
puts "child process #{input['a']}"
{ "status" => input["a"] }
end
FANOUT_CHILD_WF.task(:process2, parents: [FANOUT_CHILD_PROCESS]) do |input, ctx|
process_output = ctx.task_output(FANOUT_CHILD_PROCESS)
a = process_output["status"]
{ "status2" => "#{a}2" }
end
```
### Running child tasks
To spawn and run a child task from a parent task, use the appropriate method for your language:
#### Python
```python
from examples.fanout.worker import ChildInput, child_wf
# 👀 example: run this inside of a parent task to spawn a child
child_wf.run(
ChildInput(a="b"),
)
```
#### Typescript
```typescript
export const parentSingleChild = hatchet.task({
name: 'parent-single-child',
fn: async () => {
const childRes = await child.run({ N: 1 });
return {
Result: childRes.Value,
};
},
});
```
#### Go
```go
// Inside a parent task
childResult, err := childWorkflow.Run(hCtx, ChildInput{
Value: 1,
})
if err != nil {
return err
}
```
#### Ruby
```ruby
FANOUT_CHILD_WF.run({ "a" => "b" })
```
### Parallel child task execution
Spawn multiple child tasks in parallel:
#### Python
```python
async def run_child_workflows(n: int) -> list[dict[str, Any]]:
return await child_wf.aio_run_many(
[
child_wf.create_bulk_run_item(
input=ChildInput(a=str(i)),
)
for i in range(n)
]
)
```
#### Typescript
```typescript
type ParentInput = {
N: number;
};
export const parent = hatchet.task({
name: 'parent',
fn: async (input: ParentInput, ctx) => {
const n = input.N;
const promises = [];
for (let i = 0; i < n; i++) {
promises.push(child.run({ N: i }));
}
const childRes = await Promise.all(promises);
const sum = childRes.reduce((acc, curr) => acc + curr.Value, 0);
return {
Result: sum,
};
},
});
```
#### Go
```go
// Run multiple child tasks in parallel using goroutines
var wg sync.WaitGroup
var mu sync.Mutex
results := make([]*ChildOutput, 0, n)
wg.Add(n)
for i := 0; i < n; i++ {
go func(index int) {
defer wg.Done()
result, err := childWorkflow.Run(hCtx, ChildInput{Value: index})
if err != nil {
return
}
var childOutput ChildOutput
err = result.Into(&childOutput)
if err != nil {
return
}
mu.Lock()
results = append(results, &childOutput)
mu.Unlock()
}(i)
}
wg.Wait()
```
#### Ruby
```ruby
def run_child_workflows(n)
FANOUT_CHILD_WF.run_many(
n.times.map do |i|
FANOUT_CHILD_WF.create_bulk_run_item(
input: { "a" => i.to_s }
)
end
)
end
```
### Error handling
#### Python
```python
try:
child_wf.run(
ChildInput(a="b"),
)
except Exception as e:
print(f"Child workflow failed: {e}")
```
#### Typescript
```typescript
export const withErrorHandling = hatchet.task({
name: 'parent-error-handling',
fn: async () => {
try {
const childRes = await child.run({ N: 1 });
return {
Result: childRes.Value,
};
} catch (error) {
// decide how to proceed here
return {
Result: -1,
};
}
},
});
```
#### Go
```go
result, err := childWorkflow.Run(hCtx, ChildInput{Value: 1})
if err != nil {
// Handle error from child workflow
fmt.Printf("Child workflow failed: %v\n", err)
// Decide how to proceed - retry, skip, or fail the parent
}
```
#### Ruby
```ruby
begin
FANOUT_CHILD_WF.run({ "a" => "b" })
rescue StandardError => e
puts "Child workflow failed: #{e.message}"
end
```
## Common Patterns
### Dynamic fan-out / fan-in
Process a list of items whose length is only known at runtime. Spawn one child per item, collect all results, then continue. Document processing and batch processing are canonical examples: when a batch of files arrives, a parent fans out to one child per document; each child parses, extracts, and validates its document in parallel across your worker fleet.
[Concurrency](/v1/concurrency) controls how many children run simultaneously. Hatchet distributes child tasks across available workers, so adding workers increases throughput without code changes. For rate-limited external services (OCR, LLM APIs), combine with [Rate Limits](/v1/rate-limits) to throttle child execution across all workers.
### Agent loops
An **agent loop** is implemented by having a durable task spawn a new child run of itself with updated input until a termination condition is met. Each iteration is a separate child task, giving full observability in the dashboard. AI agents use this pattern when they reason about what to do, spawn a subtask (or a sub-workflow), inspect the result, and decide whether to continue, branch, or stop.
### Recursive workflows
A durable task spawns child durable tasks, each of which may spawn their own children. This creates a tree of work that's entirely driven by runtime logic, useful for crawlers, recursive search, and tree-structured computations.
## Use cases
1. **Dynamic fan-out processing** — When the number of parallel tasks is determined at runtime.
2. **Reusable workflow components** — Create modular workflows that can be reused across different parent workflows.
3. **Resource-intensive operations** — Spread computation across multiple workers.
4. **Agent-based systems** — Allow AI agents to spawn new workflows based on their reasoning.
5. **Long-running operations** — Break down long operations into smaller, trackable units of work.
---
# Sleep & Delays
Sleep pauses a task for a specified duration while freeing the worker slot. No resources are consumed during the wait, whether the pause lasts seconds or weeks.
Both durable tasks and DAGs support sleeping, but the API differs: durable tasks call `SleepFor` dynamically at runtime, while DAGs declare a sleep condition upfront on the task definition.
#### Durable Tasks
## Durable Sleep
Durable sleep pauses execution for a specified amount of time and frees the worker slot until the sleep expires.
> **Info:** Sleeping puts the task into an [evictable state](/v1/task-eviction), the
> worker slot is freed and the task is re-queued when the sleep expires.
Unlike a language-level sleep (e.g. `time.sleep` in Python or `setTimeout` in Node), durable sleep is guaranteed to respect the original duration across interruptions. A language-level sleep ties the wait to the local process, so if the process restarts, the sleep starts over from zero.
For example, say you'd like to send a notification to a user after 24 hours. With `time.sleep`, if the task is interrupted after 23 hours, it will restart and sleep for 24 hours again (47 hours total). With durable sleep, Hatchet tracks the original deadline server-side, so the task will only sleep for 1 more hour on restart.
### Using durable sleep
Durable sleep can be used by calling the `SleepFor` method on the `DurableContext` object. This method takes a duration as an argument and will sleep for that duration.
#### Python
```python
@hatchet.durable_task(name="DurableSleepTask")
async def durable_sleep_task(input: EmptyModel, ctx: DurableContext) -> None:
res = await ctx.aio_sleep_for(timedelta(seconds=5))
print("got result", res)
```
#### Typescript
```typescript
durableSleep.durableTask({
name: 'durable-sleep',
executionTimeout: '10m',
fn: async (_, ctx) => {
console.log('sleeping for 5s');
const sleepRes = await ctx.sleepFor('5s');
console.log('done sleeping for 5s', sleepRes);
return {
Value: 'done',
};
},
});
```
#### Go
```go
task := client.NewStandaloneDurableTask("long-running-task", func(ctx hatchet.DurableContext, input DurableInput) (DurableOutput, error) {
log.Printf("Starting task, will sleep for %d seconds", input.Delay)
if _, err := ctx.SleepFor(time.Duration(input.Delay) * time.Second); err != nil {
return DurableOutput{}, err
}
log.Printf("Finished sleeping, processing message: %s", input.Message)
return DurableOutput{
ProcessedAt: time.Now().Format(time.RFC3339),
Message: "Processed: " + input.Message,
}, nil
})
```
#### Ruby
```ruby
DURABLE_SLEEP_TASK = HATCHET.durable_task(name: "DurableSleepTask") do |input, ctx|
res = ctx.sleep_for(duration: 5)
puts "got result #{res}"
end
```
#### DAGs
## Sleep Conditions
Sleep conditions pause a DAG task for a specified duration before it runs. Use them when a task should wait for a fixed amount of time after its parent tasks complete.
Unlike durable sleep (which is called dynamically at runtime), DAG sleep conditions are declared upfront on the task definition. Both free the worker slot during the wait.
### Using sleep conditions
Declare a task with a `wait_for` sleep condition. The task will wait for its parent tasks to complete, then sleep for the specified duration before executing.
#### Python
```python
@task_condition_workflow.task(
parents=[start], wait_for=[SleepCondition(timedelta(seconds=10))]
)
def wait_for_sleep(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
```
#### Typescript
```typescript
const waitForSleep = taskConditionWorkflow.task({
name: 'waitForSleep',
parents: [start],
waitFor: [new SleepCondition('10s')],
fn: () => {
return {
randomNumber: Math.floor(Math.random() * 100) + 1,
};
},
});
```
#### Go
```go
waitForSleep := workflow.NewTask("wait-for-sleep", func(ctx hatchet.Context, _ any) (StepOutput, error) {
return StepOutput{RandomNumber: rand.Intn(100) + 1}, nil //nolint:gosec
},
hatchet.WithParents(start),
hatchet.WithWaitFor(hatchet.SleepCondition(10*time.Second)),
)
```
#### Ruby
```ruby
WAIT_FOR_SLEEP = TASK_CONDITION_WORKFLOW.task(
:wait_for_sleep,
parents: [COND_START],
wait_for: [Hatchet::SleepCondition.new(10)]
) do |input, ctx|
{ "random_number" => rand(1..100) }
end
```
This task will first wait for its parent to complete, then sleep for the specified duration before executing.
### Combining with other conditions
Sleep conditions can be combined with other conditions using or groups. For example, you can wait for _either_ a sleep duration or an event (whichever comes first). See [Combining Conditions](/v1/conditions#or-groups) for details.
---
# Events
Tasks can pause until an external event arrives before continuing. This is the foundation for human-in-the-loop workflows, webhook-driven pipelines, and any flow that depends on signals from outside the task.
Both durable tasks and DAGs support waiting for events. Durable tasks call `WaitForEvent` dynamically at runtime, while DAGs declare event conditions upfront on the task definition.
Events are delivered by [pushing events](/v1/external-events/pushing-events) into Hatchet using the event client. The event key you push must match the key your task is waiting for.
#### Durable Tasks
## Wait For Events
Wait For Events lets a durable task pause until an external event arrives. Even if the task is interrupted and requeued while waiting, the event will still be processed. When it resumes, it reads the event from the durable event log and continues.
> **Info:** Waiting for an event puts the task into an [evictable
> state](/v1/task-eviction), the worker slot is freed and the task is re-queued
> when the event arrives.
### Declaring a wait for event
Wait For Event is declared using the context method `WaitFor` (or utility method `WaitForEvent`) on the `DurableContext` object.
#### Python
```python
@hatchet.durable_task(name="DurableEventTask")
async def durable_event_task(input: EmptyModel, ctx: DurableContext) -> None:
res = await ctx.aio_wait_for_event(
"user:update",
)
print("got event", res)
```
#### Typescript
```typescript
export const durableEvent = hatchet.durableTask({
name: 'durable-event',
executionTimeout: '10m',
fn: async (_, ctx) => {
const res = await ctx.waitForEvent(EVENT_KEY);
console.log('res', res);
return {
Value: 'done',
};
},
});
```
#### Go
```go
task := client.NewStandaloneDurableTask("long-running-task", func(ctx hatchet.DurableContext, input DurableInput) (DurableOutput, error) {
log.Printf("Starting task, will sleep for %d seconds", input.Delay)
if _, err := ctx.WaitForEvent("user:updated", ""); err != nil {
return DurableOutput{}, err
}
log.Printf("Finished waiting for event, processing message: %s", input.Message)
return DurableOutput{
ProcessedAt: time.Now().Format(time.RFC3339),
Message: "Processed: " + input.Message,
}, nil
})
```
#### Ruby
```ruby
DURABLE_EVENT_TASK = HATCHET.durable_task(name: "DurableEventTask") do |input, ctx|
res = ctx.wait_for(
"event",
Hatchet::UserEventCondition.new(event_key: "user:update")
)
puts "got event #{res}"
end
DURABLE_EVENT_TASK_WITH_FILTER = HATCHET.durable_task(name: "DurableEventWithFilterTask") do |input, ctx|
```
### Event filters
Events can be filtered using [CEL](https://github.com/google/cel-spec) expressions. For example, to only receive `user:update` events for a specific user:
#### Python
```python
res = await ctx.aio_wait_for_event("user:update", "input.user_id == '1234'")
```
#### Typescript
```typescript
const res = await ctx.waitForEvent(EVENT_KEY, "input.userId == '1234'");
```
#### Go
```go
if _, err := ctx.WaitForEvent("user:updated", "input.status_code == 200"); err != nil {
return DurableOutput{}, err
}
```
#### Ruby
```ruby
res = ctx.wait_for(
"event",
Hatchet::UserEventCondition.new(
event_key: "user:update",
expression: "input.user_id == '1234'"
)
)
puts "got event #{res}"
end
```
### Pushing events
For a waiting task to resume, something must [push an event](/v1/external-events/pushing-events) into Hatchet with a matching key. You can do this from any service that has access to the Hatchet client.
#### Python
```python
hatchet.event.push("user:create", {"should_skip": False})
```
#### Typescript
```typescript
const res = await hatchet.events.push('simple-event:create', {
Message: 'hello',
ShouldSkip: false,
});
```
#### Go
```go
err := client.Events().Push(
context.Background(),
"simple-event:create",
EventInput{
Message: "Hello, World!",
},
)
if err != nil {
return err
}
```
#### Ruby
```ruby
HATCHET.event.push("user:create", { "should_skip" => false })
```
When the pushed event's key matches what a durable task is waiting for (and passes any CEL filter), the task is re-queued and resumes from its checkpoint.
#### DAGs
## Event Conditions
Event conditions let a DAG task react to external events. A task can wait for an event before running, be skipped when an event arrives, or be cancelled by an event.
Unlike durable tasks (where `WaitForEvent` is called dynamically at runtime), DAG event conditions are declared upfront on the task definition.
### Usage modes
Event conditions can be used with three operators:
- **`wait_for`** — the task waits for the event before starting.
- **`skip_if`** — the task is skipped if the event arrives.
- **`cancel_if`** — the task is cancelled if the event arrives.
> **Warning:** A task cancelled by `cancel_if` behaves like any other cancellation in Hatchet
> — downstream tasks will be cancelled as well.
### Waiting for an event
Declare a task with a `wait_for` event condition. The task will not start until the specified event is pushed into Hatchet.
#### Python
```python
@task_condition_workflow.task(
parents=[start],
wait_for=[
or_(
SleepCondition(duration=timedelta(minutes=1)),
UserEventCondition(event_key="wait_for_event:start"),
)
],
)
def wait_for_event(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
```
#### Typescript
```typescript
const waitForEvent = taskConditionWorkflow.task({
name: 'waitForEvent',
parents: [start],
waitFor: [Or(new SleepCondition('1m'), new UserEventCondition('wait_for_event:start', 'true'))],
fn: () => {
return {
randomNumber: Math.floor(Math.random() * 100) + 1,
};
},
});
```
#### Go
```go
waitForEvent := workflow.NewTask("wait-for-event", func(ctx hatchet.Context, _ any) (StepOutput, error) {
return StepOutput{RandomNumber: rand.Intn(100) + 1}, nil //nolint:gosec
},
hatchet.WithParents(start),
hatchet.WithWaitFor(hatchet.OrCondition(
hatchet.SleepCondition(1*time.Minute),
hatchet.UserEventCondition("wait_for_event:start", ""),
)),
)
```
#### Ruby
```ruby
WAIT_FOR_EVENT = TASK_CONDITION_WORKFLOW.task(
:wait_for_event,
parents: [COND_START],
wait_for: [
Hatchet.or_(
Hatchet::SleepCondition.new(60),
Hatchet::UserEventCondition.new(event_key: "wait_for_event:start")
)
]
) do |input, ctx|
{ "random_number" => rand(1..100) }
end
```
### Skipping on an event
Declare a task with a `skip_if` event condition. The task will be skipped if the event arrives before the task starts.
#### Python
```python
@task_condition_workflow.task(
parents=[start],
wait_for=[SleepCondition(timedelta(seconds=30))],
skip_if=[UserEventCondition(event_key="skip_on_event:skip")],
)
def skip_on_event(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
```
#### Typescript
```typescript
const skipOnEvent = taskConditionWorkflow.task({
name: 'skipOnEvent',
parents: [start],
waitFor: [new SleepCondition('10s')],
skipIf: [new UserEventCondition('skip_on_event:skip', 'true')],
fn: () => {
return {
randomNumber: Math.floor(Math.random() * 100) + 1,
};
},
});
```
#### Go
```go
skipOnEvent := workflow.NewTask("skip-on-event", func(ctx hatchet.Context, _ any) (StepOutput, error) {
return StepOutput{RandomNumber: rand.Intn(100) + 1}, nil //nolint:gosec
},
hatchet.WithParents(start),
hatchet.WithWaitFor(hatchet.SleepCondition(30*time.Second)),
hatchet.WithSkipIf(hatchet.UserEventCondition("skip_on_event:skip", "")),
)
```
#### Ruby
```ruby
SKIP_ON_EVENT = TASK_CONDITION_WORKFLOW.task(
:skip_on_event,
parents: [COND_START],
wait_for: [Hatchet::SleepCondition.new(30)],
skip_if: [Hatchet::UserEventCondition.new(event_key: "skip_on_event:skip")]
) do |input, ctx|
{ "random_number" => rand(1..100) }
end
```
### Event filters
Events can be filtered using [CEL](https://github.com/google/cel-spec) expressions. The CEL expression is evaluated against the event payload, and the condition only matches if the expression returns `true`. This works identically to event filters in durable tasks.
### Pushing events
For a waiting task to proceed, something must [push an event](/v1/external-events/pushing-events) into Hatchet with a matching key. You can do this from any service that has access to the Hatchet client.
#### Python
```python
hatchet.event.push("user:create", {"should_skip": False})
```
#### Typescript
```typescript
const res = await hatchet.events.push('simple-event:create', {
Message: 'hello',
ShouldSkip: false,
});
```
#### Go
```go
err := client.Events().Push(
context.Background(),
"simple-event:create",
EventInput{
Message: "Hello, World!",
},
)
if err != nil {
return err
}
```
#### Ruby
```ruby
HATCHET.event.push("user:create", { "should_skip" => false })
```
### Combining with other conditions
Event conditions can be combined with parent and sleep conditions using or groups. For example, you can wait for _either_ an event or a timeout (whichever comes first). See [Conditions & Branching](/v1/conditions) for details.
---
# Conditions & Branching
Workflows often need to branch: run different paths depending on data, skip steps when conditions aren't met, or wait for a combination of signals before proceeding. Both durable tasks and DAGs support conditional logic, but the approach differs.
#### Durable Tasks
## Procedural Branching
Durable tasks use standard language control flow (`if`/`else`, `match`, loops) to branch at runtime. Because the task is a single long-running function, you can make decisions based on any data available during execution: inputs, intermediate results, API responses, or child task outputs.
```python
@workflow.durable_task()
async def process(input: ProcessInput, ctx: DurableContext):
result = await ctx.run_child(analyze_task, input)
if result["score"] > 0.8:
await ctx.run_child(fast_path_task, result)
else:
await ctx.run_child(slow_path_task, result)
await ctx.run_child(review_task, result)
```
This is one of the key advantages of durable tasks: branching logic is expressed directly in code, making it easy to handle complex, dynamic flows. Each branch can spawn different children, sleep for different durations, or wait for different events.
> **Warning:** Branching logic must be **deterministic** between checkpoints. If the task is
> evicted and replayed, the same branches must execute in the same order. Base
> decisions on checkpoint outputs (child results, event payloads) rather than
> wall-clock time or external state that may change between replays. See [Best
> Practices](/v1/patterns/mixing-patterns#determinism-in-durable-tasks) for
> details.
## Or Groups
Durable tasks can combine multiple wait conditions using or groups. An or group evaluates to `True` if **at least one** of its conditions is satisfied, letting you express "proceed on timeout or event, whichever comes first."
#### Python
```python
@durable_workflow.durable_task()
async def wait_for_or_group_1(
_i: EmptyModel, ctx: DurableContext
) -> dict[str, str | int | float]:
start = time.time()
wait_result = await ctx.aio_wait_for(
uuid4().hex,
or_(
SleepCondition(timedelta(seconds=SLEEP_TIME)),
UserEventCondition(event_key=EVENT_KEY),
),
)
key = list(wait_result.keys())[0]
event_id = list(wait_result[key].keys())[0]
return {
"runtime": time.time() - start,
"key": key,
"event_id": event_id,
}
```
`or_()` wraps a `SleepCondition` and a `UserEventCondition` into a single or group. The task will resume as soon as either the sleep expires or the event arrives.
#### Typescript
```typescript
export const durableEvent = hatchet.durableTask({
name: 'durable-event',
executionTimeout: '10m',
fn: async (_, ctx) => {
const res = await ctx.waitForEvent(EVENT_KEY);
console.log('res', res);
return {
Value: 'done',
};
},
});
```
#### Go
```go
task := client.NewStandaloneDurableTask("long-running-task", func(ctx hatchet.DurableContext, input DurableInput) (DurableOutput, error) {
log.Printf("Starting task, will sleep for %d seconds", input.Delay)
if _, err := ctx.WaitForEvent("user:updated", ""); err != nil {
return DurableOutput{}, err
}
log.Printf("Finished waiting for event, processing message: %s", input.Message)
return DurableOutput{
ProcessedAt: time.Now().Format(time.RFC3339),
Message: "Processed: " + input.Message,
}, nil
})
```
#### Ruby
```ruby
DURABLE_EVENT_TASK = HATCHET.durable_task(name: "DurableEventTask") do |input, ctx|
res = ctx.wait_for(
"event",
Hatchet::UserEventCondition.new(event_key: "user:update")
)
puts "got event #{res}"
end
DURABLE_EVENT_TASK_WITH_FILTER = HATCHET.durable_task(name: "DurableEventWithFilterTask") do |input, ctx|
```
#### DAGs
## Parent Conditions
Parent conditions let a DAG task decide whether to run based on the output of a parent task. This enables branching logic within a DAG: different paths can execute depending on runtime data, while the overall graph structure remains fixed and visible in the dashboard.
Parent conditions can be used with two operators:
- **`skip_if`** — skip the task if the parent output matches the condition.
- **`cancel_if`** — cancel the task (and its downstream dependents) if the parent output matches the condition.
> **Warning:** A task cancelled by `cancel_if` behaves like any other cancellation in Hatchet
> — downstream tasks will be cancelled as well.
### Branching example
A common pattern is to create two sibling tasks with complementary parent conditions. For example, one task runs when a value is greater than 50 and the other runs when it is less than or equal to 50. Only one branch executes per run.
First, declare a base task that returns a value:
#### Python
```python
@task_condition_workflow.task()
def start(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
```
#### Typescript
```typescript
const start = taskConditionWorkflow.task({
name: 'start',
fn: () => {
return {
randomNumber: Math.floor(Math.random() * 100) + 1,
};
},
});
```
#### Go
```go
start := workflow.NewTask("start", func(ctx hatchet.Context, _ any) (StepOutput, error) {
return StepOutput{RandomNumber: rand.Intn(100) + 1}, nil //nolint:gosec
})
```
#### Ruby
```ruby
COND_START = TASK_CONDITION_WORKFLOW.task(:start) do |input, ctx|
{ "random_number" => rand(1..100) }
end
```
Then add two branches that use `ParentCondition` with `skip_if`:
#### Python
```python
@task_condition_workflow.task(
parents=[wait_for_sleep],
skip_if=[
ParentCondition(
parent=wait_for_sleep,
expression="output.random_number > 50",
)
],
)
def left_branch(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
@task_condition_workflow.task(
parents=[wait_for_sleep],
skip_if=[
ParentCondition(
parent=wait_for_sleep,
expression="output.random_number <= 50",
)
],
)
def right_branch(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
```
#### Typescript
```typescript
const leftBranch = taskConditionWorkflow.task({
name: 'leftBranch',
parents: [waitForSleep],
skipIf: [new ParentCondition(waitForSleep, 'output.randomNumber > 50')],
fn: () => {
return {
randomNumber: Math.floor(Math.random() * 100) + 1,
};
},
});
const rightBranch = taskConditionWorkflow.task({
name: 'rightBranch',
parents: [waitForSleep],
skipIf: [new ParentCondition(waitForSleep, 'output.randomNumber <= 50')],
fn: () => {
return {
randomNumber: Math.floor(Math.random() * 100) + 1,
};
},
});
```
#### Go
```go
leftBranch := workflow.NewTask("left-branch", func(ctx hatchet.Context, _ any) (StepOutput, error) {
return StepOutput{RandomNumber: rand.Intn(100) + 1}, nil //nolint:gosec
},
hatchet.WithParents(waitForSleep),
hatchet.WithSkipIf(hatchet.ParentCondition(waitForSleep, "output.random_number > 50")),
)
rightBranch := workflow.NewTask("right-branch", func(ctx hatchet.Context, _ any) (StepOutput, error) {
return StepOutput{RandomNumber: rand.Intn(100) + 1}, nil //nolint:gosec
},
hatchet.WithParents(waitForSleep),
hatchet.WithSkipIf(hatchet.ParentCondition(waitForSleep, "output.random_number <= 50")),
)
```
#### Ruby
```ruby
LEFT_BRANCH = TASK_CONDITION_WORKFLOW.task(
:left_branch,
parents: [WAIT_FOR_SLEEP],
skip_if: [
Hatchet::ParentCondition.new(
parent: WAIT_FOR_SLEEP,
expression: "output.random_number > 50"
)
]
) do |input, ctx|
{ "random_number" => rand(1..100) }
end
RIGHT_BRANCH = TASK_CONDITION_WORKFLOW.task(
:right_branch,
parents: [WAIT_FOR_SLEEP],
skip_if: [
Hatchet::ParentCondition.new(
parent: WAIT_FOR_SLEEP,
expression: "output.random_number <= 50"
)
]
) do |input, ctx|
{ "random_number" => rand(1..100) }
end
```
These two tasks check whether the output of the base task was greater or less than `50`, respectively. Only one of the two will run per workflow execution.
### Checking if a task was skipped
Downstream tasks can check whether a parent was skipped using `ctx.was_skipped`:
#### Python
```python
@task_condition_workflow.task(
parents=[
start,
wait_for_sleep,
wait_for_event,
skip_on_event,
left_branch,
right_branch,
],
)
def sum(input: EmptyModel, ctx: Context) -> RandomSum:
one = ctx.task_output(start).random_number
two = ctx.task_output(wait_for_event).random_number
three = ctx.task_output(wait_for_sleep).random_number
four = (
ctx.task_output(skip_on_event).random_number
if not ctx.was_skipped(skip_on_event)
else 0
)
five = (
ctx.task_output(left_branch).random_number
if not ctx.was_skipped(left_branch)
else 0
)
six = (
ctx.task_output(right_branch).random_number
if not ctx.was_skipped(right_branch)
else 0
)
return RandomSum(sum=one + two + three + four + five + six)
```
#### Typescript
```typescript
taskConditionWorkflow.task({
name: 'sum',
parents: [start, waitForSleep, waitForEvent, skipOnEvent, leftBranch, rightBranch],
fn: async (_, ctx: Context) => {
const one = (await ctx.parentOutput(start)).randomNumber;
const two = (await ctx.parentOutput(waitForEvent)).randomNumber;
const three = (await ctx.parentOutput(waitForSleep)).randomNumber;
const four = (await ctx.parentOutput(skipOnEvent))?.randomNumber || 0;
const five = (await ctx.parentOutput(leftBranch))?.randomNumber || 0;
const six = (await ctx.parentOutput(rightBranch))?.randomNumber || 0;
return {
sum: one + two + three + four + five + six,
};
},
});
```
#### Go
```go
_ = workflow.NewTask("sum", func(ctx hatchet.Context, _ any) (RandomSum, error) {
var startOut StepOutput
err := ctx.ParentOutput(start, &startOut)
if err != nil {
return RandomSum{}, err
}
var waitForEventOut StepOutput
err = ctx.ParentOutput(waitForEvent, &waitForEventOut)
if err != nil {
return RandomSum{}, err
}
var waitForSleepOut StepOutput
err = ctx.ParentOutput(waitForSleep, &waitForSleepOut)
if err != nil {
return RandomSum{}, err
}
total := startOut.RandomNumber + waitForEventOut.RandomNumber + waitForSleepOut.RandomNumber
if !ctx.WasSkipped(skipOnEvent) {
var out StepOutput
err = ctx.ParentOutput(skipOnEvent, &out)
if err == nil {
total += out.RandomNumber
}
}
if !ctx.WasSkipped(leftBranch) {
var out StepOutput
err = ctx.ParentOutput(leftBranch, &out)
if err == nil {
total += out.RandomNumber
}
}
if !ctx.WasSkipped(rightBranch) {
var out StepOutput
err = ctx.ParentOutput(rightBranch, &out)
if err == nil {
total += out.RandomNumber
}
}
return RandomSum{Sum: total}, nil
}, hatchet.WithParents(
start,
waitForSleep,
waitForEvent,
skipOnEvent,
leftBranch,
rightBranch,
))
```
#### Ruby
```ruby
TASK_CONDITION_WORKFLOW.task(
:sum,
parents: [COND_START, WAIT_FOR_SLEEP, WAIT_FOR_EVENT, SKIP_ON_EVENT, LEFT_BRANCH, RIGHT_BRANCH]
) do |input, ctx|
one = ctx.task_output(COND_START)["random_number"]
two = ctx.task_output(WAIT_FOR_EVENT)["random_number"]
three = ctx.task_output(WAIT_FOR_SLEEP)["random_number"]
four = ctx.was_skipped?(SKIP_ON_EVENT) ? 0 : ctx.task_output(SKIP_ON_EVENT)["random_number"]
five = ctx.was_skipped?(LEFT_BRANCH) ? 0 : ctx.task_output(LEFT_BRANCH)["random_number"]
six = ctx.was_skipped?(RIGHT_BRANCH) ? 0 : ctx.task_output(RIGHT_BRANCH)["random_number"]
{ "sum" => one + two + three + four + five + six }
end
```
## Or Groups
DAG tasks can declare multiple conditions that work together to control when and whether a task runs. Conditions of different types (parent conditions, [event conditions](/v1/events), and [sleep conditions](/v1/sleep)) can be mixed on a single task using **or groups**.
An **or group** is a set of conditions combined with an `Or` operator. The group evaluates to `True` if **at least one** of its conditions is satisfied. Multiple or groups on the same task are combined with `AND`, so every group must have at least one satisfied condition for the task to proceed.
This lets you express arbitrarily complex sets of conditions in [conjunctive normal form](https://en.wikipedia.org/wiki/Conjunctive_normal_form) (CNF).
### Sleep + Event example
The most common combination is a sleep condition with an event condition: proceed when an external signal arrives _or_ after a timeout (whichever comes first). This is ideal for human-in-the-loop workflows where you want a deadline.
#### Python
```python
@task_condition_workflow.task(
parents=[start],
wait_for=[
or_(
SleepCondition(duration=timedelta(minutes=1)),
UserEventCondition(event_key="wait_for_event:start"),
)
],
)
def wait_for_event(input: EmptyModel, ctx: Context) -> StepOutput:
return StepOutput(random_number=random.randint(1, 100))
```
`or_()` wraps a `SleepCondition` and a `UserEventCondition` into a single or group. The task will start as soon as either the sleep expires or the event arrives.
#### Typescript
```typescript
const waitForEvent = taskConditionWorkflow.task({
name: 'waitForEvent',
parents: [start],
waitFor: [Or(new SleepCondition('1m'), new UserEventCondition('wait_for_event:start', 'true'))],
fn: () => {
return {
randomNumber: Math.floor(Math.random() * 100) + 1,
};
},
});
```
`Or()` wraps a `SleepCondition` and a `UserEventCondition` into a single or group. The task will start as soon as either the sleep expires or the event arrives.
#### Go
```go
waitForEvent := workflow.NewTask("wait-for-event", func(ctx hatchet.Context, _ any) (StepOutput, error) {
return StepOutput{RandomNumber: rand.Intn(100) + 1}, nil //nolint:gosec
},
hatchet.WithParents(start),
hatchet.WithWaitFor(hatchet.OrCondition(
hatchet.SleepCondition(1*time.Minute),
hatchet.UserEventCondition("wait_for_event:start", ""),
)),
)
```
`hatchet.WithWaitFor` and `hatchet.WithSkipIf` attach conditions to the task. The task will wait for the sleep to expire before starting, and will be skipped if the event arrives.
#### Ruby
```ruby
WAIT_FOR_EVENT = TASK_CONDITION_WORKFLOW.task(
:wait_for_event,
parents: [COND_START],
wait_for: [
Hatchet.or_(
Hatchet::SleepCondition.new(60),
Hatchet::UserEventCondition.new(event_key: "wait_for_event:start")
)
]
) do |input, ctx|
{ "random_number" => rand(1..100) }
end
```
`Hatchet.or_()` wraps a `SleepCondition` and a `UserEventCondition` into a single or group. The task will start as soon as either the sleep expires or the event arrives.
### Multiple or groups
For more complex logic, you can declare multiple or groups on a single task. Consider three conditions:
- **Condition A**: Parent output is greater than 50
- **Condition B**: Sleep for 30 seconds
- **Condition C**: Receive the `payment:processed` event
To proceed if (A _or_ B) **and** (A _or_ C), declare two or groups:
1. Group 1: `A or B`
2. Group 2: `A or C`
The task will run once both groups are satisfied. If A is true, both groups pass immediately. If A is false, the task needs both B (sleep expires) and C (event arrives).
### Common combinations
Combination, Use case
Sleep + Event, Proceed after a timeout _or_ when an external signal arrives (whichever comes first)
Parent + Event, Proceed if a parent output meets a threshold _or_ a manual override event arrives
Parent + Sleep, Proceed if a parent indicates readiness _or_ after a maximum wait time
All three, Complex gates combining data-driven, time-based, and event-driven conditions
---
# Error Handling
When a task fails, you need a way to run cleanup logic, send notifications, or trigger compensating actions. Both durable tasks and DAGs support error handling, but the mechanism differs: durable tasks use standard try/catch blocks, while DAGs declare a special on-failure task.
#### Durable Tasks
## Try/Catch in Durable Tasks
Durable tasks are regular functions, so you handle errors with your language's native error handling (`try`/`except` in Python, `try`/`catch` in TypeScript/Go). This gives you full control over what happens when a child task or operation fails.
### Handling child task errors
When spawning child tasks, wrap the call in a try/catch block to handle failures gracefully:
#### Python
```python
try:
child_wf.run(
ChildInput(a="b"),
)
except Exception as e:
print(f"Child workflow failed: {e}")
```
#### Typescript
```typescript
export const withErrorHandling = hatchet.task({
name: 'parent-error-handling',
fn: async () => {
try {
const childRes = await child.run({ N: 1 });
return {
Result: childRes.Value,
};
} catch (error) {
// decide how to proceed here
return {
Result: -1,
};
}
},
});
```
#### Go
```go
result, err := childWorkflow.Run(hCtx, ChildInput{Value: 1})
if err != nil {
// Handle error from child workflow
fmt.Printf("Child workflow failed: %v\n", err)
// Decide how to proceed - retry, skip, or fail the parent
}
```
#### Ruby
```ruby
begin
FANOUT_CHILD_WF.run({ "a" => "b" })
rescue StandardError => e
puts "Child workflow failed: #{e.message}"
end
```
### Common patterns
- **Retry with backoff** — Catch the error, sleep, and retry the child task.
- **Fallback logic** — If a primary path fails, spawn a different child task as a fallback.
- **Partial failure handling** — In a fan-out, collect results from successful children and handle failures individually rather than failing the entire workflow.
- **Cleanup** — Release resources, cancel in-progress work, or notify external systems.
#### DAGs
## On-Failure Tasks
The on-failure task is a special task that runs when any task in the workflow fails. It lets you handle errors, perform cleanup, or trigger notifications declaratively as part of the workflow definition.
### Defining an on-failure task
You can define an on-failure task on your workflow the same as you'd define any other task:
#### Python
```python
# This workflow will fail because the step will throw an error
# we define an onFailure step to handle this case
on_failure_wf = hatchet.workflow(name="OnFailureWorkflow")
@on_failure_wf.task(execution_timeout=timedelta(seconds=1))
def step1(input: EmptyModel, ctx: Context) -> None:
# 👀 this step will always raise an exception
raise Exception(ERROR_TEXT)
# 👀 After the workflow fails, this special step will run
@on_failure_wf.on_failure_task()
def on_failure(input: EmptyModel, ctx: Context) -> dict[str, str]:
# 👀 we can do things like perform cleanup logic
# or notify a user here
# 👀 Fetch the errors from upstream step runs from the context
print(ctx.task_run_errors)
return {"status": "success"}
```
Note: Only one on-failure task can be defined per workflow.
#### Typescript
```typescript
// This workflow will fail because `step1` throws. We define an `onFailure` handler to run cleanup.
export const failureWorkflow = hatchet.workflow({
name: 'on-failure-workflow',
});
failureWorkflow.task({
name: 'step1',
executionTimeout: '1s',
fn: async () => {
throw new Error(ERROR_TEXT);
},
});
// 👀 After the workflow fails, this special step will run
failureWorkflow.onFailure({
name: 'on_failure',
fn: async (_input, ctx) => {
console.log('onFailure for run:', ctx.workflowRunId());
console.log('upstream errors:', ctx.errors());
return {
status: 'success',
};
},
});
```
#### Go
```go
multiStepWorkflow.OnFailure(func(ctx hatchet.Context, input FailureInput) (FailureHandlerOutput, error) {
log.Printf("Multi-step failure handler called for input: %s", input.Message)
stepErrors := ctx.StepRunErrors()
var errorDetails string
for stepName, errorMsg := range stepErrors {
log.Printf("Multi-step: Step '%s' failed with error: %s", stepName, errorMsg)
errorDetails += stepName + ": " + errorMsg + "; "
}
// Access successful step outputs for cleanup
var step1Output TaskOutput
if err := ctx.StepOutput("first-step", &step1Output); err == nil {
log.Printf("First step completed successfully with: %s", step1Output.Message)
}
return FailureHandlerOutput{
FailureHandled: true,
ErrorDetails: "Multi-step workflow failed: " + errorDetails,
OriginalInput: input.Message,
}, nil
})
```
#### Ruby
```ruby
# This workflow will fail because the step will throw an error
# we define an onFailure step to handle this case
ON_FAILURE_WF = HATCHET.workflow(name: "OnFailureWorkflow")
ON_FAILURE_WF.task(:step1, execution_timeout: 1) do |input, ctx|
# This step will always raise an exception
raise ERROR_TEXT
end
# After the workflow fails, this special step will run
ON_FAILURE_WF.on_failure_task do |input, ctx|
# We can do things like perform cleanup logic
# or notify a user here
# Fetch the errors from upstream step runs from the context
puts ctx.task_run_errors.inspect
{ "status" => "success" }
end
```
The on-failure task will be executed only if any of the main tasks in the workflow fail.
### Use cases
- Performing cleanup tasks after a task failure in a workflow
- Sending notifications or alerts about the failure
- Logging additional information for debugging purposes
- Triggering a compensating action or a fallback task
---
# Resource Management During Waits
When a task needs to wait (for time, an event, or child results), how does Hatchet handle the worker slot? The answer depends on which pattern you're using.
#### Durable Tasks
## Task Eviction
When a durable task enters a wait, whether from `SleepFor`, `WaitForEvent`, or `WaitFor`, Hatchet **evicts** the task from the worker. The worker slot is released, the task's progress is persisted in the durable event log, and the task does not consume slots or hold resources while it is idle.
This is what makes durable tasks fundamentally different from regular tasks: a regular task consumes a slot for the entire duration of execution, even if it's just sleeping. A durable task gives the slot back the moment it starts waiting.
### How eviction works
```mermaid
graph LR
QUEUED -->|Assigned to worker| RUNNING
RUNNING -->|Hits SleepFor / WaitForEvent| EVICTED
EVICTED -->|Wait completes or event arrives| QUEUED
```
1. **Task reaches a wait.** The durable task calls `SleepFor`, `WaitForEvent`, or `WaitFor`.
2. **Checkpoint is written.** Hatchet records the current progress in the durable event log.
3. **Worker slot is freed.** The task is evicted from the worker. The slot is immediately available for other tasks.
4. **Wait completes.** When the sleep expires or the expected event arrives, Hatchet re-queues the task.
5. **Task resumes on any available worker.** A worker picks up the task, replays the event log to the last checkpoint, and continues execution from where it left off.
The resumed task does not need to run on the same worker that originally started it. Any worker that has registered the task can pick it up.
### Why eviction matters
Without eviction, a task that sleeps for 24 hours would consume a slot for the entire duration, wasting capacity that could be running other work. With eviction, the slot is freed immediately.
This is especially important for:
- **Long waits** — Tasks that sleep for hours or days should not hold slots.
- **Human-in-the-loop** — Waiting for a human to approve or respond could take minutes or weeks. Eviction ensures no resources are held in the meantime.
- **Large fan-outs** — A parent task that spawns thousands of children and waits for results can release its slot while the children run, preventing deadlocks where the parent holds resources that the children need.
### Separate slot pools
Durable tasks consume slots from a **separate slot pool** than regular tasks. This prevents a common deadlock: if durable and regular tasks shared the same pool, a durable task waiting on child tasks could hold the very slot those children need to execute.
By isolating slot pools, Hatchet ensures that durable tasks waiting on children never starve the workers that need to run those children.
### Eviction and determinism
Because a task may be evicted and resumed on a different worker at any time, the code between checkpoints must be [deterministic](/v1/patterns/mixing-patterns#determinism-in-durable-tasks). On resume, Hatchet replays the event log; it does not re-execute completed operations. If the code has changed between the original run and the replay, the checkpoint sequence may not match, leading to unexpected behavior.
#### DAGs
## No Eviction Needed
DAG tasks do not require eviction because they are **never assigned to a worker until they can actually run**. A worker slot is only allocated when all of the task's conditions are met: parent tasks have completed, sleep durations have elapsed, and expected events have arrived.
This means resources are only consumed during active execution, never during waits.
### How DAG scheduling works
```mermaid
graph LR
PENDING -->|"All conditions met (parents, sleep, events)"| QUEUED
QUEUED -->|Assigned to worker| RUNNING
RUNNING -->|Completes| COMPLETED
```
1. **Task is pending.** The task exists in the workflow but is not queued. No worker slot is allocated. No resources are consumed.
2. **Conditions are met.** All parent tasks have completed, any sleep duration has elapsed, and any required events have arrived.
3. **Task is queued.** Only now does Hatchet place the task in the queue for worker assignment.
4. **Task runs to completion.** A worker picks up the task, executes it, and the slot is freed.
### Why this matters
Because DAG tasks are only scheduled when ready, there is no wasted capacity:
- **Sleep conditions** — A task that waits 24 hours after its parent completes does not hold a slot. It sits in a pending state until the timer expires, then gets queued.
- **Event conditions** — A task waiting for an external event consumes no resources. When the event arrives, the task is queued and assigned a slot.
- **Parent dependencies** — Tasks waiting on upstream results are not queued until those results are available.
This is one of the advantages of DAGs: the scheduling model is simpler. You declare the conditions upfront, and Hatchet handles the timing. There is no eviction, no checkpointing, and no replay, because the task never starts until it's ready to run straight through.
> **Info:** If you need a task to start running and then pause partway through (for
> example, to wait for an event based on intermediate results), use a [durable
> task](/v1/patterns/durable-task-execution) instead. DAG tasks run from start
> to finish once scheduled.
---
# Dockerizing Hatchet Applications
This guide explains how to create Dockerfiles for Hatchet applications. There are examples for Python, TypeScript, Go, and Ruby applications here.
## Entrypoint Configuration for Hatchet
Before creating your Dockerfile, understand that Hatchet workers require specific entry point configuration:
1. The entry point must run code that runs the Hatchet worker. This can be done by calling the `worker.start()` method in your respective SDK.
2. Proper environment variables must be set for Hatchet SDK
3. The worker should be configured to handle your workflows using the `worker.register` method or by passing workflows into the worker constructor or factory.
## Example Dockerfiles
#### Python - Poetry
```dockerfile
FROM python:3.13-slim
ENV PYTHONUNBUFFERED=1 \
POETRY_VERSION=1.4.2 \
HATCHET_ENV=production
# Install system dependencies and Poetry
RUN apt-get update && \
apt-get install -y curl && \
curl -sSL https://install.python-poetry.org | python3 - && \
ln -s /root/.local/bin/poetry /usr/local/bin/poetry && \
apt-get clean && \
rm -rf /var/lib/apt/lists/\*
WORKDIR /app
COPY pyproject.toml poetry.lock\* /app/
RUN poetry config virtualenvs.create false && \
poetry install --no-interaction --no-ansi
COPY . /app
CMD ["poetry", "run", "python", "worker.py"]
```
> **Info:** If you're using a poetry script to run your worker, you can replace `poetry run python worker.py` with `poetry run ` in the CMD.
#### Python - pip
```dockerfile
FROM python:3.13-slim
ENV PYTHONUNBUFFERED=1 \
HATCHET_ENV=production
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
CMD ["python", "worker.py"]
```
#### JavaScript - npm
```dockerfile
# Stage 1: Build
FROM node:18 AS builder
WORKDIR /app
COPY package\*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Production
FROM node:22-alpine
WORKDIR /app
COPY package\*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
ENV NODE_ENV=production
CMD ["node", "dist/worker.js"]
```
> **Info:** Use `npm ci` instead of `npm install` for more reliable builds. It's faster and ensures consistent installs across environments.
#### JavaScript - pnpm
```dockerfile
# Stage 1: Build
FROM node:18 AS builder
WORKDIR /app
# Install pnpm
RUN npm install -g pnpm
COPY pnpm-lock.yaml package.json ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm build
# Stage 2: Production
FROM node:22-alpine
WORKDIR /app
RUN npm install -g pnpm
COPY pnpm-lock.yaml package.json ./
RUN pnpm install --frozen-lockfile --prod
COPY --from=builder /app/dist ./dist
ENV NODE_ENV=production
CMD ["node", "dist/worker.js"]
```
> **Info:** PNPM's `--frozen-lockfile` flag ensures consistent installs and fails if an update is needed.
#### JavaScript - yarn
```dockerfile
# Stage 1: Build
FROM node:18 AS builder
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile
COPY . .
RUN yarn build
# Stage 2: Production
FROM node:22-alpine
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile --production
COPY --from=builder /app/dist ./dist
ENV NODE_ENV=production
CMD ["node", "dist/worker.js"]
```
> **Info:** Yarn's `--frozen-lockfile` ensures your dependencies match the lock file exactly.
#### Go
```dockerfile
# Stage 1: Build
FROM golang:1.25-alpine3.21 AS builder
WORKDIR /app
COPY . .
RUN go mod download
RUN go build -o hatchet-worker .
# Stage 2: Production
FROM golang:1.25-alpine3.21
WORKDIR /app
COPY --from=builder hatchet-worker .
CMD ["/app/hatchet-worker"]
```
#### Ruby
```dockerfile
FROM ruby:3.3-slim
ENV HATCHET_ENV=production
# Install system dependencies for native gems
RUN apt-get update && \
apt-get install -y build-essential && \
apt-get clean && \
rm -rf /var/lib/apt/lists/\*
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle config set --local without 'development test' && \
bundle install
COPY . /app
CMD ["bundle", "exec", "ruby", "worker.rb"]
```
> **Info:** If you're using a Rake task or binstub to start your worker, replace the CMD with the appropriate command, e.g. `CMD ["bundle", "exec", "rake", "hatchet:worker"]`.
```
---
# Autoscaling Workers
Hatchet provides a Task Stats API that enables you to implement autoscaling for your worker pools. By querying real-time queue depths and task distribution, you can dynamically scale workers based on actual workload demand.
## Task Stats API
The Task Stats endpoint returns current statistics for queued and running tasks across your tenant, broken down by task name, queue, and concurrency group.
### Endpoint
```
GET /api/v1/tenants/{tenantId}/task-stats
```
### Authentication
The endpoint requires Bearer token authentication using a valid API token:
```
Authorization: Bearer
```
### Response Format
The response is a JSON object keyed by task name, with each task containing statistics for queued and running states:
```json
{
"my-task": {
"queued": {
"total": 150,
"queues": {
"my-task:default": 100,
"my-task:priority": 50
},
"concurrency": [
{
"expression": "input.user_id",
"type": "GROUP_ROUND_ROBIN",
"keys": {
"user-123": 10,
"user-456": 15
}
}
],
"oldest": "2024-01-15T10:30:00Z"
},
"running": {
"total": 25,
"oldest": "2024-01-15T10:25:00Z",
"concurrency": []
}
}
}
```
Each task stat includes:
- **total**: The total count of tasks in this state
- **concurrency**: Distribution across concurrency groups (if concurrency limits are configured)
- **oldest**: Timestamp of the oldest task in the specified state
These are available only for `queued` tasks:
- **queues**: A breakdown of task counts by queue name
### Example Usage
```bash
curl -H "Authorization: Bearer your-api-token-here" \
https://cloud.onhatchet.run/api/v1/tenants/707d0855-80ab-4e1f-a156-f1c4546cbf52/task-stats
```
## Autoscaling with KEDA
[KEDA](https://keda.sh) (Kubernetes Event-driven Autoscaling) can use the Task Stats API to automatically scale your worker deployments based on queue depth.
### Setting Up a KEDA ScaledObject
Create a `ScaledObject` that queries the Hatchet Task Stats API and scales your worker deployment based on the number of queued tasks:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: hatchet-worker-scaler
spec:
scaleTargetRef:
name: hatchet-worker
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: metrics-api
metadata:
targetValue: "100"
url: "https://cloud.onhatchet.run/api/v1/tenants/YOUR_TENANT_ID/task-stats"
valueLocation: "my-task.queued.total"
authMode: "bearer"
authenticationRef:
name: hatchet-api-token
---
apiVersion: v1
kind: Secret
metadata:
name: hatchet-api-token
type: Opaque
stringData:
token: "your-api-token-here"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: hatchet-api-token
spec:
secretTargetRef:
- parameter: token
name: hatchet-api-token
key: token
```
> **Info:** The `valueLocation` field uses JSONPath-style notation to extract a specific
> value from the response. Adjust `my-task` to match your actual task name.
### Scaling Based on Multiple Tasks
If you have multiple task types handled by the same worker, you can create multiple triggers or use a custom metrics endpoint that aggregates the totals:
```yaml
triggers:
- type: metrics-api
metadata:
targetValue: "50"
url: "https://cloud.onhatchet.run/api/v1/tenants/YOUR_TENANT_ID/task-stats"
valueLocation: "task-a.queued.total"
authMode: "bearer"
authenticationRef:
name: hatchet-api-token
- type: metrics-api
metadata:
targetValue: "50"
url: "https://cloud.onhatchet.run/api/v1/tenants/YOUR_TENANT_ID/task-stats"
valueLocation: "task-b.queued.total"
authMode: "bearer"
authenticationRef:
name: hatchet-api-token
```
---
# Sticky Worker Assignment (Beta)
> **Info:** This feature is currently in beta and may be subject to change.
Sticky assignment is a task property that allows you to specify that all child tasks should be assigned to the same worker for the duration of its execution. This can be useful in situations like when you need to maintain expensive local memory state across multiple tasks in a workflow or ensure that certain tasks are processed by the same worker for consistency.
> **Warning:** This feature is only compatible with long lived workers, and not webhook
> workers.
## Setting Sticky Assignment
Sticky assignment is set on the task level by adding the `sticky` property to the task definition. When a task is marked as sticky, all steps within that task will be assigned to the same worker for the duration of the task execution.
> **Warning:** While sticky assignment can be useful in certain scenarios, it can also
> introduce potential bottlenecks if the assigned worker becomes unavailable, or
> if local state is not maintained when the job is picked up. Be sure to
> consider the implications of sticky assignment when designing your tasks and
> have a plan in place to handle local state issues.
There are two strategies for setting sticky assignment for [DAG](./dags.mdx) workflows:
- `SOFT`: All tasks in the workflow will attempt to be assigned to the same worker, but if that worker is unavailable, it will be assigned to another worker.
- `HARD`: All taks in the workflow will only be assigned to the same worker. If that worker is unavailable, the workflow run will not be assigned to another worker and will remain in a pending state until the original worker becomes available or timeout is reached. (See [Scheduling Timeouts](./timeouts.mdx#task-level-timeouts))
#### Ruby
```python
sticky_workflow = hatchet.workflow(
name="StickyWorkflow",
# 👀 Specify a sticky strategy when declaring the workflow
sticky=StickyStrategy.SOFT,
)
@sticky_workflow.task()
def step1a(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
return {"worker": ctx.worker_id}
@sticky_workflow.task()
def step1b(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
return {"worker": ctx.worker_id}
```
#### Tab 2
```typescript
export const sticky = hatchet.task({
name: 'sticky',
retries: 3,
sticky: StickyStrategy.SOFT,
fn: async (_, ctx) => {
// specify a child workflow to run on the same worker
const result = await child.run(
{
N: 1,
},
{ sticky: true }
);
return {
result,
};
},
});
```
#### Tab 3
```go
func StickyDag(client *hatchet.Client) *hatchet.Workflow {
stickyDag := client.NewWorkflow("sticky-dag",
hatchet.WithWorkflowStickyStrategy(types.StickyStrategy_SOFT),
)
_ = stickyDag.NewTask("sticky-task",
func(ctx worker.HatchetContext, input StickyInput) (interface{}, error) {
workerId := ctx.Worker().ID()
return &StickyResult{
Result: workerId,
}, nil
},
)
_ = stickyDag.NewTask("sticky-task-2",
func(ctx worker.HatchetContext, input StickyInput) (interface{}, error) {
workerId := ctx.Worker().ID()
return &StickyResult{
Result: workerId,
}, nil
},
)
return stickyDag
}
```
#### Tab 4
```ruby
STICKY_WORKFLOW = HATCHET.workflow(
name: "StickyWorkflow",
# Specify a sticky strategy when declaring the workflow
sticky: :soft
)
STEP1A = STICKY_WORKFLOW.task(:step1a) do |input, ctx|
{ "worker" => ctx.worker.id }
end
STEP1B = STICKY_WORKFLOW.task(:step1b) do |input, ctx|
{ "worker" => ctx.worker.id }
end
```
In this example, the `sticky` property is set to `SOFT`, which means that the task will attempt to be assigned to the same worker for the duration of its execution. If the original worker is unavailable, the task will be assigned to another worker.
## Sticky Child Tasks
It is possible to spawn child tasks on the same worker as the parent task by setting the `sticky` property to `true` in the `run` method options. This can be useful when you need to maintain local state across multiple tasks or ensure that child tasks are processed by the same worker for consistency.
However, the child task must:
1. Specify a `sticky` strategy in the child task's definition
2. Be registered with the same worker as the parent task
If either condition is not met, an error will be thrown when the child task is spawned.
#### Ruby
```python
sticky_child_workflow = hatchet.workflow(
name="StickyChildWorkflow", sticky=StickyStrategy.SOFT
)
@sticky_workflow.task(parents=[step1a, step1b])
async def step2(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
ref = await sticky_child_workflow.aio_run(
sticky=True,
wait_for_result=False,
)
await ref.aio_result()
return {"worker": ctx.worker_id}
@sticky_child_workflow.task()
def child(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
return {"worker": ctx.worker_id}
```
#### Tab 2
```typescript
export const sticky = hatchet.task({
name: 'sticky',
retries: 3,
sticky: StickyStrategy.SOFT,
fn: async (_, ctx) => {
// specify a child workflow to run on the same worker
const result = await child.run(
{
N: 1,
},
{ sticky: true }
);
return {
result,
};
},
});
```
#### Tab 3
```go
func Sticky(client *hatchet.Client) *hatchet.StandaloneTask {
sticky := client.NewStandaloneTask("sticky-task",
func(ctx worker.HatchetContext, input StickyInput) (*StickyResult, error) {
// Run a child workflow on the same worker
childWorkflow := Child(client)
childResult, err := childWorkflow.Run(ctx, ChildInput{N: 1}, hatchet.WithRunSticky(true))
if err != nil {
return nil, err
}
var childOutput ChildResult
err = childResult.Into(&childOutput)
if err != nil {
return nil, err
}
return &StickyResult{
Result: fmt.Sprintf("child-result-%s", childOutput.Result),
}, nil
},
)
return sticky
}
```
#### Tab 4
```ruby
STICKY_CHILD_WORKFLOW = HATCHET.workflow(
name: "StickyChildWorkflow",
sticky: :soft
)
STICKY_WORKFLOW.task(:step2, parents: [STEP1A, STEP1B]) do |input, ctx|
ref = STICKY_CHILD_WORKFLOW.run_no_wait(
options: Hatchet::TriggerWorkflowOptions.new(sticky: true)
)
ref.result
{ "worker" => ctx.worker.id }
end
STICKY_CHILD_WORKFLOW.task(:child) do |input, ctx|
{ "worker" => ctx.worker.id }
end
```
---
# Worker Affinity Assignment (Beta)
> **Info:** This feature is currently in beta and may be subject to change.
It is often desirable to assign workflows to specific workers based on certain criteria, such as worker capabilities, resource availability, or location. Worker affinity allows you to specify that a workflow should be assigned to a specific worker based on worker label state. Labels can be set dynamically on workers to reflect their current state, such as a specific model loaded into memory or specific disk requirements.
Specific tasks can then specify desired label state to ensure that workflows are assigned to workers that meet specific criteria. If no worker meets the specified criteria, the task run will remain in a pending state until a suitable worker becomes available or the task is cancelled. (See [Scheduling Timeouts](./timeouts.mdx#task-level-timeouts))
## Specifying Worker Labels
Labels can be set on workers when they are registered with Hatchet. Labels are key-value pairs that can be used to specify worker capabilities, resource availability, or other criteria that can be used to match workflows to workers. Values can be strings or numbers, and multiple labels can be set on a worker.
#### Python
```python
worker = hatchet.worker(
"affinity-worker",
slots=10,
labels={
"model": "fancy-ai-model-v2",
"memory": 512,
},
workflows=[affinity_worker_workflow],
)
worker.start()
```
#### Typescript
```typescript
const workflow = hatchet.workflow({
name: 'affinity-workflow',
description: 'test',
});
workflow.task({
name: 'step1',
fn: async (_, ctx) => {
const results = [];
for (let i = 0; i < 50; i++) {
const result = await childWorkflow.run({});
results.push(result);
}
console.log('Spawned 50 child workflows');
console.log('Results:', await Promise.all(results));
return { step1: 'step1 results!' };
},
});
```
#### Go
```go
worker, err := client.NewWorker("affinity-worker",
hatchet.WithWorkflows(affinityWorkflow),
hatchet.WithSlots(10),
hatchet.WithLabels(map[string]any{
"model": "fancy-ai-model-v2",
"memory": 512,
}),
)
```
#### Ruby
```ruby
def main
worker = HATCHET.worker(
"affinity-worker",
slots: 10,
labels: {
"model" => "fancy-ai-model-v2",
"memory" => 512
},
workflows: [AFFINITY_WORKER_WORKFLOW]
)
worker.start
end
```
## Specifying Step Desired Labels
You can specify desired worker label state for specific tasks in a workflow by setting the `desired_worker_labels` property on the task definition. This property is an object where the keys are the label keys and the values are objects with the following properties:
- `value`: The desired value of the label
- `comparator` (default: `EQUAL`): The comparison operator to use when matching the label value.
- `EQUAL`: The label value must be equal to the desired value
- `NOT_EQUAL`: The label value must not be equal to the desired value
- `GREATER_THAN`: The label value must be greater than the desired value
- `GREATER_THAN_OR_EQUAL`: The label value must be greater than or equal to the desired value
- `LESS_THAN`: The label value must be less than the desired value
- `LESS_THAN_OR_EQUAL`: The label value must be less than or equal to the desired value
- `required` (default: `true`): Whether the label is required for the task to run. If `true`, the task will remain in a pending state until a worker with the desired label state becomes available. If `false`, the worker will be prioritized based on the sum of the highest matching weights.
- `weight` (optional, default: `100`): The weight of the label. Higher weights are prioritized over lower weights when selecting a worker for the task. If multiple workers have the same highest weight, the worker with the highest sum of weights will be selected. Ignored if `required` is `true`.
#### Ruby
```python
affinity_worker_workflow = hatchet.workflow(name="AffinityWorkflow")
@affinity_worker_workflow.task(
desired_worker_labels=[
DesiredWorkerLabel(key="model", value="fancy-ai-model-v2", weight=10),
DesiredWorkerLabel(
key="memory",
value=256,
required=True,
comparator=WorkerLabelComparator.LESS_THAN,
),
],
)
```
#### Tab 2
```typescript
const workflow = hatchet.workflow({
name: 'affinity-workflow',
description: 'test',
});
workflow.task({
name: 'step1',
fn: async (_, ctx) => {
const results = [];
for (let i = 0; i < 50; i++) {
const result = await childWorkflow.run({});
results.push(result);
}
console.log('Spawned 50 child workflows');
console.log('Results:', await Promise.all(results));
return { step1: 'step1 results!' };
},
});
```
#### Tab 3
```go
err = w.RegisterWorkflow(
&worker.WorkflowJob{
On: worker.Events("user:create:affinity"),
Name: "affinity",
Description: "affinity",
Steps: []*worker.WorkflowStep{
worker.Fn(func(ctx worker.HatchetContext) (result *taskOneOutput, err error) {
return &taskOneOutput{
Message: ctx.Worker().ID(),
}, nil
}).
SetName("task-one").
SetDesiredLabels(map[string]*types.DesiredWorkerLabel{
"model": {
Value: "fancy-ai-model-v2",
Weight: 10,
},
"memory": {
Value: 512,
Required: true,
Comparator: types.ComparatorPtr(types.WorkerLabelComparator_GREATER_THAN),
},
}),
},
},
)
```
#### Tab 4
```ruby
AFFINITY_WORKER_WORKFLOW = HATCHET.workflow(name: "AffinityWorkflow")
```
> **Warning:** Use extra care when using worker affinity with [sticky assignment `HARD`
> strategy](./sticky-assignment.mdx). In this case, it is recommended to set
> desired labels on the first task of the workflow to ensure that the workflow
> is assigned to a worker that meets the desired criteria and remains on that
> worker for the duration of the workflow.
### Dynamic Worker Labels
Labels can also be set dynamically on workers using the `upsertLabels` method. This can be useful when worker state changes over time, such as when a new model is loaded into memory or when a worker's resource availability changes.
#### Ruby
```python
async def step(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
if ctx.worker_labels.get("model") != "fancy-ai-model-v2":
ctx.worker.upsert_labels({"model": "unset"})
# DO WORK TO EVICT OLD MODEL / LOAD NEW MODEL
ctx.worker.upsert_labels({"model": "fancy-ai-model-v2"})
return {"worker": ctx.worker_id}
```
#### Tab 2
```typescript
const childWorkflow = hatchet.workflow({
name: 'child-affinity-workflow',
description: 'test',
});
childWorkflow.task({
name: 'child-step1',
desiredWorkerLabels: {
model: {
value: 'xyz',
required: true,
},
},
fn: async (ctx) => {
return { childStep1: 'childStep1 results!' };
},
});
```
#### Tab 3
```go
err = w.RegisterWorkflow(
&worker.WorkflowJob{
On: worker.Events("user:create:affinity"),
Name: "affinity",
Description: "affinity",
Steps: []*worker.WorkflowStep{
worker.Fn(func(ctx worker.HatchetContext) (result *taskOneOutput, err error) {
model := ctx.Worker().GetLabels()["model"]
if model != "fancy-vision-model" {
ctx.Worker().UpsertLabels(map[string]interface{}{
"model": nil,
})
// Do something to load the model
evictModel();
loadNewModel("fancy-vision-model");
ctx.Worker().UpsertLabels(map[string]interface{}{
"model": "fancy-vision-model",
})
}
return &taskOneOutput{
Message: ctx.Worker().ID(),
}, nil
}).
SetName("task-one").
SetDesiredLabels(map[string]*types.DesiredWorkerLabel{
"model": {
Value: "fancy-vision-model",
Weight: 10,
},
"memory": {
Value: 512,
Required: true,
Comparator: types.WorkerLabelComparator_GREATER_THAN,
},
}),
},
},
)
```
#### Tab 4
```ruby
AFFINITY_WORKER_WORKFLOW.task(
:step,
desired_worker_labels: {
"model" => Hatchet::DesiredWorkerLabel.new(value: "fancy-ai-model-v2", weight: 10),
"memory" => Hatchet::DesiredWorkerLabel.new(
value: 256,
required: true,
comparator: :less_than
)
}
) do |input, ctx|
if ctx.worker.labels["model"] != "fancy-ai-model-v2"
ctx.worker.upsert_labels("model" => "unset")
# DO WORK TO EVICT OLD MODEL / LOAD NEW MODEL
ctx.worker.upsert_labels("model" => "fancy-ai-model-v2")
end
{ "worker" => ctx.worker.id }
end
```
---
# Manual Slot Release
The Hatchet execution model sets a number of available slots for running tasks in a workflow. When a task is running, it occupies a slot, and if a worker has no available slots, it will not be able to run any more tasks concurrently.
In some cases, you may have a task in your workflow that is resource-intensive and requires exclusive access to a shared resource, such as a database connection or a GPU compute instance. To ensure that other tasks in the workflow can run concurrently, you can manually release the slot after the resource-intensive task has completed, but the task still has non-resource-intensive work to do (i.e. upload or cleanup).
> **Warning:** This is an advanced feature and should be used with caution. Manually
> releasing the slot can have unintended side effects on system performance and
> concurrency. For example, if the worker running the task dies, the task will
> not be reassigned and will remain in a running state until manually
> terminated.
## Using Manual Slot Release
You can manually release a slot in from within a running task in your workflow using the Hatchet context method `release_slot`:
#### Go
```python
slot_release_workflow = hatchet.workflow(name="SlotReleaseWorkflow")
@slot_release_workflow.task()
def step1(input: EmptyModel, ctx: Context) -> dict[str, str]:
print("RESOURCE INTENSIVE PROCESS")
time.sleep(10)
# 👀 Release the slot after the resource-intensive process, so that other steps can run
ctx.release_slot()
print("NON RESOURCE INTENSIVE PROCESS")
return {"status": "success"}
```
#### Ruby
```go
_ = workflow.NewTask("step1", func(ctx hatchet.Context, _ any) (*StepOutput, error) {
fmt.Println("RESOURCE INTENSIVE PROCESS")
time.Sleep(10 * time.Second)
// Release the slot after the resource-intensive process,
// so that other steps can run on this worker.
if releaseErr := ctx.ReleaseSlot(); releaseErr != nil {
return nil, fmt.Errorf("failed to release slot: %w", releaseErr)
}
fmt.Println("NON RESOURCE INTENSIVE PROCESS")
return &StepOutput{Status: "success"}, nil
})
```
#### Tab 3
```ruby
SLOT_RELEASE_WORKFLOW = HATCHET.workflow(name: "SlotReleaseWorkflow")
SLOT_RELEASE_WORKFLOW.task(:step1) do |input, ctx|
puts "RESOURCE INTENSIVE PROCESS"
sleep 10
# Release the slot after the resource-intensive process, so that other steps can run
ctx.release_slot
puts "NON RESOURCE INTENSIVE PROCESS"
{ "status" => "success" }
end
```
In the above examples, the `release_slot()` method is called after the resource-intensive process has completed. This allows other tasks in the workflow to start executing while the current task continues with non-resource-intensive tasks.
> **Info:** Manually releasing the slot does not terminate the current task. The task will
> continue executing until it completes or encounters an error.
## Use Cases
Some common use cases for Manual Slot Release include:
- Performing data processing or analysis that requires significant CPU, GPU, or memory resources
- Acquiring locks or semaphores to access shared resources
- Executing long-running tasks that don't need to block other tasks after some initial work is done
By utilizing Manual Slot Release, you can optimize the concurrency and resource utilization of your workflows, allowing multiple tasks to run in parallel when possible.
---
# Logging
Hatchet comes with a built-in logging view where you can push logs from your workflows. This is useful for debugging and monitoring your workflows.
#### Ruby
You can use either Python's built-in `logging` package, or the `context.log` method for more control over the logs that are sent.
## Using the built-in `logging` package
You can pass a custom logger to the `Hatchet` class when initializing it. For example:
```python
import logging
from hatchet_sdk import ClientConfig, Hatchet
logging.basicConfig(level=logging.INFO)
root_logger = logging.getLogger()
hatchet = Hatchet(
config=ClientConfig(
logger=root_logger,
),
)
```
It's recommended that you pass the root logger to the `Hatchet` class, as this will ensure that all logs are captured by the Hatchet logger. If you have workflows defined in multiple files, they should be children of the root logger. For example, with the following file structure:
```
workflows/
workflow.py
client.py
worker.py
workflow.py
```
You should pass the root logger to the `Hatchet` class in `client.py`:
```python
import logging
from hatchet_sdk import ClientConfig, Hatchet
logging.basicConfig(level=logging.INFO)
root_logger = logging.getLogger()
hatchet = Hatchet(
config=ClientConfig(
logger=root_logger,
),
)
```
And then in `workflows/workflow.py`, you should create a child logger:
```python
import logging
import time
from examples.logger.client import hatchet
from hatchet_sdk import Context, EmptyModel
logger = logging.getLogger(__name__)
logging_workflow = hatchet.workflow(
name="LoggingWorkflow",
)
@logging_workflow.task()
def root_logger(input: EmptyModel, ctx: Context) -> dict[str, str]:
for i in range(12):
logger.info(f"executed step1 - {i}")
logger.info({"step1": "step1"})
time.sleep(0.1)
return {"status": "success"}
```
## Using the `context.log` method
You can also use the `context.log` method to log messages from your workflows. This method is available on the `Context` object that is passed to each task in your workflow. For example:
```python
@logging_workflow.task()
def context_logger(input: EmptyModel, ctx: Context) -> dict[str, str]:
for i in range(12):
ctx.log(f"executed step1 - {i}")
ctx.log({"step1": "step1"})
time.sleep(0.1)
return {"status": "success"}
```
Each task is currently limited to 1000 log lines.
#### Tab 2
In TypeScript, there are two options for logging from your tasks. The first is to use the `ctx.log()` method (from the `Context`) to send logs:
```typescript
const workflow = hatchet.workflow({
name: 'logger-example',
description: 'test',
on: {
event: 'user:create',
},
});
workflow.task({
name: 'logger-step1',
fn: async (_, ctx) => {
// log in a for loop
for (let i = 0; i < 10; i++) {
ctx.logger.info(`log message ${i}`);
await sleep(200);
}
return { step1: 'completed step run' };
},
});
```
This has the benefit of being easy to use out of the box (no setup required!), but it's limited in its flexibiliy and how pluggable it is with your existing logging setup.
Hatchet also allows you to "bring your own" logger when you define a workflow:
```typescript
const logger = pino();
class PinoLogger implements Logger {
logLevel: LogLevel;
context: string;
constructor(context: string, logLevel: LogLevel = 'DEBUG') {
this.logLevel = logLevel;
this.context = context;
}
debug(message: string, extra?: JsonObject): void {
logger.debug(extra, message);
}
info(message: string, extra?: JsonObject): void {
logger.info(extra, message);
}
green(message: string, extra?: JsonObject): void {
logger.info(extra, `%c${message}`);
}
warn(message: string, error?: Error, extra?: JsonObject): void {
logger.warn(extra, `${message} ${error}`);
}
error(message: string, error?: Error, extra?: JsonObject): void {
logger.error(extra, `${message} ${error}`);
}
// optional util method
util(key: string, message: string, extra?: JsonObject): void {
// for example you may want to expose a trace method
if (key === 'trace') {
logger.info(extra, 'trace');
}
}
}
const hatchet = Hatchet.init({
log_level: 'DEBUG',
logger: (ctx, level) => new PinoLogger(ctx, level),
});
```
In this example, we create Pino logger that implement's Hatchet's `Logger` interface and pass it to the Hatchet client constructor. We can then use that logger in our steps:
```typescript
const workflow = hatchet.workflow({
name: 'logger-example',
description: 'test',
on: {
event: 'user:create',
},
});
workflow.task({
name: 'logger-step1',
fn: async (_, ctx) => {
// log in a for loop
for (let i = 0; i < 10; i++) {
ctx.logger.info(`log message ${i}`);
await sleep(200);
}
return { step1: 'completed step run' };
},
});
```
#### Tab 3
```ruby
require "hatchet-sdk"
require "logger"
logger = Logger.new($stdout)
logger.level = Logger::INFO
HATCHET = Hatchet::Client.new(debug: true) unless defined?(HATCHET)
LOGGING_WORKFLOW = HATCHET.workflow(name: "LoggingWorkflow")
LOGGING_WORKFLOW.task(:root_logger) do |input, ctx|
12.times do |i|
logger.info("executed step1 - #{i}")
logger.info({ "step1" => "step1" }.inspect)
sleep 0.1
end
{ "status" => "success" }
end
```
```ruby
LOGGING_WORKFLOW.task(:context_logger) do |input, ctx|
12.times do |i|
ctx.log("executed step1 - #{i}")
ctx.log({ "step1" => "step1" }.inspect)
sleep 0.1
end
{ "status" => "success" }
end
```
---
# OpenTelemetry
Hatchet supports exporting traces from your tasks to an [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) to improve visibility into your Hatchet tasks.
## Setup
#### Python
Install the `otel` extra:
```bash
pip install hatchet-sdk[otel]
```
Then create the instrumentor and call `instrument()`:
```python
from hatchet_sdk.opentelemetry.instrumentor import HatchetInstrumentor
HatchetInstrumentor().instrument()
```
By default, `HatchetInstrumentor` creates a `TracerProvider` that sends spans to the Hatchet engine's OTLP collector. You can also pass your own `TracerProvider`:
```python
HatchetInstrumentor(
tracer_provider=your_tracer_provider,
).instrument()
```
### Options
Option, Type, Default, Description
`tracer_provider`, `TracerProvider`, —, Custom TracerProvider. If not set, one is created automatically.
`enable_hatchet_otel_collector`, `bool`, `True`, Send traces to the Hatchet engine's OTLP collector.
`schedule_delay_millis`, `int`, —, Delay between consecutive exports of the BatchSpanProcessor.
`max_export_batch_size`, `int`, —, Maximum batch size for the BatchSpanProcessor.
`max_queue_size`, `int`, —, Maximum queue size for the BatchSpanProcessor.
#### TypeScript
Install the required OpenTelemetry packages:
```bash
npm install @opentelemetry/api @opentelemetry/instrumentation @opentelemetry/sdk-trace-base @opentelemetry/exporter-trace-otlp-grpc
```
Register the instrumentor before creating any Hatchet clients or workers:
```typescript
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
import { HatchetInstrumentor } from "@hatchet-dev/typescript-sdk/opentelemetry";
registerInstrumentations({
instrumentations: [new HatchetInstrumentor()],
});
```
By default, `HatchetInstrumentor` sends spans to the Hatchet engine's OTLP collector. You can disable this with `enableHatchetCollector: false`.
### Options
Option, Type, Default, Description
`enableHatchetCollector`, `boolean`, `true`, Send traces to the Hatchet engine's OTLP collector.
`clientConfig`, `object`, —, Override Hatchet client config for the collector connection.
`includeTaskNameInSpanName`, `boolean`, `false`, Append the task action ID to the `hatchet.start_step_run` span name.
`excludedAttributes`, `array`, `[]`, List of `hatchet.*` attribute keys to exclude from spans.
#### Go
Import the `opentelemetry` package (commonly aliased as `hatchetotel`):
```go
import hatchetotel "github.com/hatchet-dev/hatchet/sdks/go/opentelemetry"
```
Create the instrumentor, then register its middleware on the worker:
```go
instrumentor, err := hatchetotel.NewInstrumentor()
if err != nil {
log.Fatalf("failed to create instrumentor: %v", err)
}
worker.Use(instrumentor.Middleware())
```
By default, `NewInstrumentor` creates a `TracerProvider` that sends spans to the Hatchet engine's OTLP collector.
Remember to shut down the instrumentor on exit to flush remaining spans:
```go
defer instrumentor.Shutdown(context.Background())
```
### Options
Option, Description
`hatchetotel.WithTracerProvider(tp)`, Use a custom `*sdktrace.TracerProvider` instead of creating a new one.
`hatchetotel.DisableHatchetCollector()`, Disable sending traces to the Hatchet engine's OTLP collector.
`hatchetotel.WithBatchSpanProcessorOptions(...)`, Configure the `BatchSpanProcessor` for the Hatchet collector.
#### Ruby
> **Info:** OpenTelemetry support for the Ruby SDK is coming soon.
## Spans
By default, Hatchet creates spans at the following points in the lifecycle of a task run:
1. **Producer spans** — when a trigger is called on the client side (e.g. `run()`, `runNoWait()`, `push()`, `schedule()`).
2. **Consumer spans** — when a worker handles a task run, such as starting to run the task (`hatchet.start_step_run`) or cancelling a task (`hatchet.cancel_step_run`).
### Span Names
Span Name, Kind, Description
`hatchet.start_step_run`, `CONSUMER`, Worker begins executing a task.
`hatchet.cancel_step_run`, `CONSUMER`, Worker cancels a running task.
`hatchet.run_workflow`, `PRODUCER`, Client triggers a workflow run.
`hatchet.run_workflows`, `PRODUCER`, Client triggers multiple workflow runs.
`hatchet.schedule_workflow`, `PRODUCER`, Client creates a scheduled workflow run.
`hatchet.push_event`, `PRODUCER`, Client pushes an event.
`hatchet.bulk_push_event`, `PRODUCER`, Client pushes events in bulk.
`hatchet.durable.wait_for`, `INTERNAL`, Durable task waits for a signal/condition.
### Span Attributes
All spans include an `instrumentor` attribute set to `"hatchet"`. Consumer spans (`hatchet.start_step_run`) include the following `hatchet.*` attributes:
Attribute, Type, Description
`hatchet.tenant_id`, `string`, The tenant ID for the task run.
`hatchet.worker_id`, `string`, The worker handling the task.
`hatchet.workflow_run_id`, `string`, The workflow run ID.
`hatchet.step_run_id`, `string`, The task run ID.
`hatchet.step_id`, `string`, The task ID.
`hatchet.workflow_name`, `string`, The workflow name.
`hatchet.action_name`, `string`, The action ID (format: `workflowName:taskName`).
`hatchet.step_name`, `string`, The human-readable task name.
`hatchet.retry_count`, `int`, Current retry attempt (0-indexed).
`hatchet.workflow_id`, `string`, The workflow definition ID (if available).
`hatchet.workflow_version_id`, `string`, The workflow version ID (if available).
`hatchet.parent_workflow_run_id`, `string`, Parent workflow run ID (for child workflows).
`hatchet.child_workflow_index`, `int`, Child workflow index (for child workflows).
`hatchet.child_workflow_key`, `string`, Child workflow key (for child workflows).
Producer spans (`hatchet.run_workflow`) include `hatchet.step_name` (the workflow being triggered) and, on success, `hatchet.child_workflow_run_id`.
### Context Propagation
All SDKs:
1. Automatically inject W3C `traceparent` into `additionalMetadata` on producer spans, so consumer spans on the worker become children of the trigger span.
2. Provide an `HatchetAttributeSpanProcessor` that propagates `hatchet.*` attributes from the parent task run span to all child spans created within the task. This means any custom spans you create inside a task function will automatically carry the same `hatchet.*` attributes.
---
# Worker Health Checks
The Python SDK allows you to enable and ping a healthcheck to check on the status of your worker.
### Usage
First, set the `HATCHET_CLIENT_WORKER_HEALTHCHECK_ENABLED` environment variable to `True`. Once that flag is set, two health check endpoints will be available (on port `8001` by default):
1. `/health` - Returns **200** when the worker listener is healthy, otherwise **503** with body `{"status":"HEALTHY"}` or `{"status":"UNHEALTHY"}`.
2. `/metrics` - A metrics endpoint intended to be used by a monitoring system like Prometheus.
### Custom Port
You can set a custom port with the `HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT` environment variable, e.g. `HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT=8002`.
### Event loop blocked threshold
If the worker listener process event loop becomes blocked for longer than a threshold, `/health` will return **503**.
You can configure this threshold (in seconds) with:
- `HATCHET_CLIENT_WORKER_HEALTHCHECK_EVENT_LOOP_BLOCK_THRESHOLD_SECONDS` (default: `5.0`)
#### Example request to `/health`:
```bash
curl localhost:8001/health
{"status":"HEALTHY"}
```
#### Example request to `/metrics`:
```bash
curl localhost:8001/metrics
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 18782.0
python_gc_objects_collected_total{generation="1"} 4907.0
python_gc_objects_collected_total{generation="2"} 244.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 308.0
python_gc_collections_total{generation="1"} 27.0
python_gc_collections_total{generation="2"} 2.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="15",version="3.10.15"} 1.0
# HELP hatchet_worker_listener_health_my_worker Listener health (1 healthy, 0 unhealthy)
# TYPE hatchet_worker_listener_health_my_worker gauge
hatchet_worker_listener_health_my_worker 1.0
# HELP hatchet_worker_event_loop_lag_seconds_my_worker Event loop lag in seconds (listener process)
# TYPE hatchet_worker_event_loop_lag_seconds_my_worker gauge
hatchet_worker_event_loop_lag_seconds_my_worker 0.0
```
#### Example Prometheus Configuration for `/metrics`:
```yaml
scrape_configs:
- job_name: "hatchet"
scrape_interval: 5s
static_configs:
- targets: ["localhost:8001"]
```
#### Example Prometheus Query
An example query to check if the worker is healthy might look something like:
```
(hatchet_worker_listener_health_my_worker{instance="localhost:8001", job="hatchet"}) or vector(0)
```
---
# Prometheus Metrics
> **Info:** Only available in the Dedicated tier and above on Hatchet Cloud, [reach
> out](https://hatchet.run/office-hours) to upgrade.
Hatchet exports Prometheus Metrics for your tenant which can be scraped with services like Grafana and DataDog.
## Tenant Metrics
> **Warning:** Only works with v1 tenants
Metrics for individual tenants are available in Prometheus Text Format via a REST API endpoint.
### Endpoint
```
GET /api/v1/tenants/{tenantId}/prometheus-metrics
```
### Authentication
The endpoint requires Bearer token authentication using a valid API token:
```
Authorization: Bearer
```
### Response Format
The response is returned in standard Prometheus Text Format, including:
- HELP comments describing each metric
- TYPE declarations (counter, gauge, etc.)
- Metric samples with labels and values
### Example Usage
```bash
curl -H "Authorization: Bearer your-api-token-here" \
https://cloud.onhatchet.run/api/v1/tenants/707d0855-80ab-4e1f-a156-f1c4546cbf52/prometheus-metrics
```
---
# Additional Metadata
Hatchet allows you to attach arbitrary key-value string pairs to events and task runs, which can be used for filtering, searching, or any other lookup purposes. This additional metadata is not part of the event payload or task input data but provides supplementary information for better organization and discoverability.
> **Info:** Additional metadata can be added to `Runs`, `Scheduled Runs`, `Cron Runs`, and
> `Events`. The data is propagated from parents to children or from events to
> runs.
You can attach additional metadata when pushing events or triggering task runs using the Hatchet client libraries:
#### Event Push
#### Ruby
```python
hatchet.event.push(
"user:create",
{"userId": "1234", "should_skip": False},
additional_metadata={"source": "api"}, # Arbitrary key-value pair
)
```
#### Tab 2
```typescript
const withMetadata = await hatchet.events.push(
'user:create',
{
test: 'test',
},
{
additionalMetadata: {
source: 'api', // Arbitrary key-value pair
},
}
);
```
#### Tab 3
```go
err = client.Events().Push(
context.Background(),
"user:create",
Input{Message: "hello"},
v0Client.WithEventMetadata(
map[string]string{"version": "1.0.0"},
),
)
if err != nil {
log.Fatalf("failed to push event: %v", err)
}
```
#### Tab 4
```ruby
HATCHET.event.push(
"user:create",
{ "userId" => "1234", "should_skip" => false },
additional_metadata: { "source" => "api" }
)
```
#### Task Run Trigger
#### Ruby
```python
simple.run(
additional_metadata={"source": "api"}, # Arbitrary key-value pair
)
```
#### Tab 2
```typescript
const withMetadata = simple.run(
{
Message: 'HeLlO WoRlD',
},
{
additionalMetadata: {
source: 'api', // Arbitrary key-value pair
},
}
);
```
#### Tab 3
```go
_, err = client.Run(
context.Background(),
"my-workflow",
Input{Message: "hello"},
hatchet.WithRunMetadata(
map[string]string{"version": "1.0.0"},
),
)
if err != nil {
log.Fatalf("failed to run workflow: %v", err)
}
```
#### Tab 4
```ruby
SIMPLE.run(
{},
options: Hatchet::TriggerWorkflowOptions.new(
additional_metadata: { "source" => "api" }
)
)
```
## Filtering in the Dashboard
Once you've attached additional metadata to events or task runs, this data will be available in the Event and Task Run list views in the Hatchet dashboard. You can use the filter input field to search for events or task runs based on the additional metadata key-value pairs you've attached.
For example, you can filter events by the `source` metadata keys to quickly find events originating from a specific source or environment.

## Use Cases
Some common use cases for additional metadata include:
- Tagging events or task runs with environment information (e.g., `production`, `staging`, `development`)
- Specifying the source or origin of events (e.g., `api`, `webhook`, `manual`)
- Categorizing events or task runs based on business-specific criteria (e.g., `priority`, `region`, `product`)
By leveraging additional metadata, you can enhance the organization, searchability, and discoverability of your events and task runs within Hatchet.
---
# Middleware & Dependency Injection
Middleware lets you run logic **before** and **after** every task on a client, without touching individual task definitions. Common uses include injecting request IDs, enriching inputs with shared data, encrypting/decrypting payloads, and normalizing or augmenting outputs.
This feature is experimental, and middleware hook signatures may change in
future releases.
#### Python
Hatchet's Python SDK uses FastAPI-style dependency injection to run logic
before tasks and inject the results as parameters. Dependencies are declared
as functions and wired into tasks with `Depends`.
#### Typescript
Middleware hooks are registered on the client with `withMiddleware` and are
fully type-safe — TypeScript sees the union of fields from the task input
type and any values returned by `before` hooks, and similarly for task
outputs and `after` hooks.
#### Go
> **Info:** Middleware support for the Go SDK is coming soon. Join our
> [Discord](https://hatchet.run/discord) to stay up to date.
#### Ruby
In Ruby, this pattern uses callable objects (lambdas/procs) passed as `deps`
when defining tasks. Dependencies are evaluated before each task run and
made available via `ctx.deps`.
## Defining Middleware
#### Python
Define your dependency functions — they receive the workflow input and context, and their return values are injected into the task as parameters.
```python
async def async_dep(input: EmptyModel, ctx: Context) -> str:
return ASYNC_DEPENDENCY_VALUE
def sync_dep(input: EmptyModel, ctx: Context) -> str:
return SYNC_DEPENDENCY_VALUE
@asynccontextmanager
async def async_cm_dep(
input: EmptyModel, ctx: Context, async_dep: Annotated[str, Depends(async_dep)]
) -> AsyncGenerator[str, None]:
try:
yield ASYNC_CM_DEPENDENCY_VALUE + "_" + async_dep
finally:
pass
@contextmanager
def sync_cm_dep(
input: EmptyModel, ctx: Context, sync_dep: Annotated[str, Depends(sync_dep)]
) -> Generator[str, None, None]:
try:
yield SYNC_CM_DEPENDENCY_VALUE + "_" + sync_dep
finally:
pass
@contextmanager
def base_cm_dep(input: EmptyModel, ctx: Context) -> Generator[str, None, None]:
try:
yield CHAINED_CM_VALUE
finally:
pass
def chained_dep(
input: EmptyModel, ctx: Context, base_cm: Annotated[str, Depends(base_cm_dep)]
) -> str:
return "chained_" + base_cm
@asynccontextmanager
async def base_async_cm_dep(
input: EmptyModel, ctx: Context
) -> AsyncGenerator[str, None]:
try:
yield CHAINED_ASYNC_CM_VALUE
finally:
pass
async def chained_async_dep(
input: EmptyModel,
ctx: Context,
base_async_cm: Annotated[str, Depends(base_async_cm_dep)],
) -> str:
return "chained_" + base_async_cm
```
#### Typescript
Create a client and attach middleware with `before` and `after` hooks.
- **`before(input, ctx)`** runs before the task. Its return value **replaces** the task input.
- **`after(output, ctx, input)`** runs after the task. Its return value **replaces** the task output.
```typescript
import { HatchetClient, HatchetMiddleware } from '@hatchet/v1';
export type GlobalInputType = {
first: number;
second: number;
};
export type GlobalOutputType = {
extra: number;
};
const myMiddleware = {
before: (input, ctx) => {
console.log('before', input.first);
return { ...input, dependency: 'abc-123' };
},
after: (output, ctx, input) => {
return { ...output, additionalData: 2 };
},
} satisfies HatchetMiddleware;
export const hatchetWithMiddleware = HatchetClient.init<
GlobalInputType,
GlobalOutputType
>().withMiddleware(myMiddleware);
```
**Spread the original value if you want to keep it.** The return value of each hook **replaces** the input (or output) entirely — it does not shallow-merge. If you omit `...input` in a `before` hook, the original fields are lost. The same applies to `...output` in an `after` hook.
```typescript
// ✅ Keeps original fields and adds `requestId`
before: (input) => ({ ...input, requestId: crypto.randomUUID() })
// ❌ Replaces input entirely — task only receives { requestId }
before: (input) => ({ requestId: crypto.randomUUID() })
```
### Chaining Middleware
You can chain multiple `.withMiddleware()` calls to run hooks in sequence. Each `before` hook receives the return value of the previous `before` hook (or the original input for the first hook), and each `after` hook receives the return value of the previous `after` hook.
```typescript
const firstMiddleware = {
before: (input, ctx) => {
console.log('before', input.first);
return { ...input, dependency: 'abc-123' };
},
after: (output, ctx, input) => {
return { ...output, firstExtra: 3 };
},
} satisfies HatchetMiddleware;
const secondMiddleware = {
before: (input, ctx) => {
console.log('before', input.dependency); // available from previous middleware
return { ...input, anotherDep: true };
},
after: (output, ctx, input) => {
return { ...output, secondExtra: 4 };
},
} satisfies HatchetMiddleware;
export const hatchetWithMiddlewareChaining = HatchetClient.init()
.withMiddleware(firstMiddleware)
.withMiddleware(secondMiddleware);
```
#### Go
> **Info:** Middleware support for the Go SDK is coming soon. Join our [Discord](https://hatchet.run/discord) to stay up to date.
#### Ruby
Define your dependencies as callable objects (lambdas). They receive the input, context, and optionally a hash of previously resolved dependencies for chaining.
```ruby
sync_dep = ->(_input, _ctx) { SYNC_DEPENDENCY_VALUE }
async_dep = ->(_input, _ctx) { ASYNC_DEPENDENCY_VALUE }
sync_cm_dep = lambda { |_input, _ctx, deps|
"#{SYNC_CM_DEPENDENCY_VALUE}_#{deps[:sync_dep]}"
}
async_cm_dep = lambda { |_input, _ctx, deps|
"#{ASYNC_CM_DEPENDENCY_VALUE}_#{deps[:async_dep]}"
}
chained_dep = ->(_input, _ctx, deps) { "chained_#{CHAINED_CM_VALUE}" }
chained_async_dep = ->(_input, _ctx, deps) { "chained_#{CHAINED_ASYNC_CM_VALUE}" }
```
## How Middleware Executes
#### Python
Dependencies are resolved before each task execution. Each dependency function receives the original workflow input and the task context, and its return value is injected as a named parameter to the task function.
#### Typescript
When a task runs, the worker applies middleware hooks in this order:
### Before hooks run in registration order
Each `before` hook receives the current input and the task `Context`. Its return value **replaces** the input for the next hook (or the task itself). Returning `undefined` (or `void`) skips replacement and passes the input through unchanged.
### The task function executes
The task receives the final input after all `before` hooks have run.
### After hooks run in registration order
Each `after` hook receives the current output, the task `Context`, and the final input. Its return value **replaces** the output for the next hook (or the final result). Returning `undefined` skips replacement.
Both `before` and `after` hooks can be **async** — return a `Promise` and it will be awaited before proceeding.
> **Info:** If a middleware hook throws an error, the task run fails with that error. There is no built-in error recovery within middleware — use try/catch inside your hooks if you need graceful fallback.
### The `ctx` Parameter
The second parameter of both `before` and `after` hooks is the task `Context` object. This gives middleware access to:
- `ctx.workflowRunId` — the ID of the current workflow run
- `ctx.stepRunId` — the ID of the current step run
- `ctx.log()` — emit structured logs visible in the Hatchet dashboard
- `ctx.cancel()` — cancel the current run from within middleware
### Global Types vs Middleware Types
There are two ways extra fields end up on a task's input:
Mechanism, Set via, Required at call site?, Available at runtime?
**Global input type**, `HatchetClient.init()`, Yes — callers must provide these fields, Yes
**Middleware before hook**, `.withMiddleware({ before })`, No — injected automatically by the worker, Yes
Global input types (`T` in `init()`) represent fields that **callers must supply** when triggering a task. This is useful when you know every task must always receive certain parameters — for example, a `userId` for authentication or a `tenantId` for multi-tenant routing. By declaring these as the global type, TypeScript enforces that every caller provides them.
Middleware `before` hooks, on the other hand, inject fields that are **computed at runtime** (e.g. request IDs, decrypted secrets, fetched config) and are **not** required from callers.
```typescript
type RequiredContext = { userId: string; orgId: string };
const client = HatchetClient.init()
.withMiddleware({
before: (input) => ({
...input,
resolvedAt: Date.now(), // injected, not required from caller
permissions: lookupPerms(input.userId), // derived from global type
}),
});
// Callers MUST provide userId and orgId — TypeScript enforces this
await myTask.run({ userId: 'usr_123', orgId: 'org_456', /* ...task fields */ });
```
#### Go
> **Info:** Middleware support for the Go SDK is coming soon. Join our [Discord](https://hatchet.run/discord) to stay up to date.
#### Ruby
Dependencies are resolved in the order they are declared in the `deps` hash. Each dependency function can optionally receive already-resolved dependencies as its third argument, enabling chaining.
## Using Middleware in Tasks
#### Python
Inject dependencies into your tasks using `Depends` and type annotations. The dependency results are passed directly as function parameters.
```python
@hatchet.task()
async def async_task_with_dependencies(
_i: EmptyModel,
ctx: Context,
async_dep: Annotated[str, Depends(async_dep)],
sync_dep: Annotated[str, Depends(sync_dep)],
async_cm_dep: Annotated[str, Depends(async_cm_dep)],
sync_cm_dep: Annotated[str, Depends(sync_cm_dep)],
chained_dep: Annotated[str, Depends(chained_dep)],
chained_async_dep: Annotated[str, Depends(chained_async_dep)],
) -> Output:
return Output(
sync_dep=sync_dep,
async_dep=async_dep,
async_cm_dep=async_cm_dep,
sync_cm_dep=sync_cm_dep,
chained_dep=chained_dep,
chained_async_dep=chained_async_dep,
)
```
Your dependency functions must take two positional arguments: the workflow input and the `Context` (the same as any other task).
#### Typescript
Tasks created from a middleware-enabled client automatically receive the merged input and output types. There is no extra configuration needed on the task itself.
```typescript
import { hatchetWithMiddleware } from './client';
type TaskInput = {
message: string;
};
type TaskOutput = {
message: string;
};
export const taskWithMiddleware = hatchetWithMiddleware.task({
name: 'task-with-middleware',
fn: (input, _ctx) => {
console.log('task', input.message); // string (from TaskInput)
console.log('task', input.first); // number (from GlobalInputType)
console.log('task', input.second); // number (from GlobalInputType)
console.log('task', input.dependency); // string (from Pre Middleware)
return {
message: input.message,
extra: 1,
};
},
});
// !!
```
The task's `input` type is the intersection of `TaskInput`, `GlobalInputType`, and the return type of the `before` middleware hook. The task's return type must satisfy `TaskOutput` and `GlobalOutputType`, while the caller receives the intersection of those with the `after` middleware return type.
#### Go
> **Info:** Middleware support for the Go SDK is coming soon. Join our [Discord](https://hatchet.run/discord) to stay up to date.
#### Ruby
Pass a `deps` hash when defining a task. The resolved dependency values are available inside the task block via `ctx.deps`.
```ruby
ASYNC_TASK_WITH_DEPS = HATCHET.task(
name: "async_task_with_dependencies",
deps: {
sync_dep: sync_dep,
async_dep: async_dep,
sync_cm_dep: sync_cm_dep,
async_cm_dep: async_cm_dep,
chained_dep: chained_dep,
chained_async_dep: chained_async_dep
}
) do |input, ctx|
{
"sync_dep" => ctx.deps[:sync_dep],
"async_dep" => ctx.deps[:async_dep],
"async_cm_dep" => ctx.deps[:async_cm_dep],
"sync_cm_dep" => ctx.deps[:sync_cm_dep],
"chained_dep" => ctx.deps[:chained_dep],
"chained_async_dep" => ctx.deps[:chained_async_dep]
}
end
SYNC_TASK_WITH_DEPS = HATCHET.task(
name: "sync_task_with_dependencies",
deps: {
sync_dep: sync_dep,
async_dep: async_dep,
sync_cm_dep: sync_cm_dep,
async_cm_dep: async_cm_dep,
chained_dep: chained_dep,
chained_async_dep: chained_async_dep
}
) do |input, ctx|
{
"sync_dep" => ctx.deps[:sync_dep],
"async_dep" => ctx.deps[:async_dep],
"async_cm_dep" => ctx.deps[:async_cm_dep],
"sync_cm_dep" => ctx.deps[:sync_cm_dep],
"chained_dep" => ctx.deps[:chained_dep],
"chained_async_dep" => ctx.deps[:chained_async_dep]
}
end
DURABLE_ASYNC_TASK_WITH_DEPS = HATCHET.durable_task(
name: "durable_async_task_with_dependencies",
deps: {
sync_dep: sync_dep,
async_dep: async_dep,
sync_cm_dep: sync_cm_dep,
async_cm_dep: async_cm_dep,
chained_dep: chained_dep,
chained_async_dep: chained_async_dep
}
) do |input, ctx|
{
"sync_dep" => ctx.deps[:sync_dep],
"async_dep" => ctx.deps[:async_dep],
"async_cm_dep" => ctx.deps[:async_cm_dep],
"sync_cm_dep" => ctx.deps[:sync_cm_dep],
"chained_dep" => ctx.deps[:chained_dep],
"chained_async_dep" => ctx.deps[:chained_async_dep]
}
end
DURABLE_SYNC_TASK_WITH_DEPS = HATCHET.durable_task(
name: "durable_sync_task_with_dependencies",
deps: {
sync_dep: sync_dep,
async_dep: async_dep,
sync_cm_dep: sync_cm_dep,
async_cm_dep: async_cm_dep,
chained_dep: chained_dep,
chained_async_dep: chained_async_dep
}
) do |input, ctx|
{
"sync_dep" => ctx.deps[:sync_dep],
"async_dep" => ctx.deps[:async_dep],
"async_cm_dep" => ctx.deps[:async_cm_dep],
"sync_cm_dep" => ctx.deps[:sync_cm_dep],
"chained_dep" => ctx.deps[:chained_dep],
"chained_async_dep" => ctx.deps[:chained_async_dep]
}
end
DI_WORKFLOW = HATCHET.workflow(name: "dependency-injection-workflow")
# Workflow tasks with dependencies follow the same pattern
DI_WORKFLOW.task(:wf_task_with_dependencies) do |input, ctx|
{
"sync_dep" => SYNC_DEPENDENCY_VALUE,
"async_dep" => ASYNC_DEPENDENCY_VALUE
}
end
```
## Running a Worker
#### Python
No special worker configuration is needed — dependencies are evaluated automatically each time a task runs.
#### Typescript
Workers are created from the same middleware-enabled client. No special setup is required — the middleware hooks are applied automatically when tasks execute.
```typescript
import { hatchetWithMiddleware } from './client';
import { taskWithMiddleware } from './workflow';
async function main() {
const worker = await hatchetWithMiddleware.worker('task-with-middleware', {
workflows: [taskWithMiddleware],
});
await worker.start();
}
if (require.main === module) {
main();
}
```
#### Go
> **Info:** Middleware support for the Go SDK is coming soon. Join our [Discord](https://hatchet.run/discord) to stay up to date.
#### Ruby
No special worker configuration is needed — dependencies are resolved automatically before each task execution.
## Practical Examples
The examples below show TypeScript middleware for common production patterns. Each can be adapted to the Python dependency injection model by extracting the same logic into a dependency function.
### End-to-End Encryption
Encrypt sensitive input fields before they reach the Hatchet server, and decrypt the output on the way back. This ensures plaintext data never leaves your worker process.
```typescript
import { HatchetClient, HatchetMiddleware } from '@hatchet/v1';
import { randomUUID, createCipheriv, createDecipheriv, randomBytes } from 'crypto';
```
> **Info:** The `before` hook decrypts incoming data so your task function works with
> plaintext. The `after` hook encrypts the output before it is stored. The
> encryption key never leaves the worker environment.
### Offloading Large Payloads to S3
When task inputs or outputs exceed Hatchet's payload size limit (or you simply want to keep large blobs out of the control plane), upload them to S3 and pass a signed URL instead.
```typescript
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
const ALGORITHM = 'aes-256-gcm';
const KEY = Buffer.from(process.env.ENCRYPTION_KEY!, 'hex');
type EncryptedEnvelope = { ciphertext: string; iv: string; tag: string };
function encrypt(plaintext: string): EncryptedEnvelope {
const iv = randomBytes(16);
const cipher = createCipheriv(ALGORITHM, KEY, iv);
const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
return {
ciphertext: encrypted.toString('base64'),
iv: iv.toString('base64'),
tag: cipher.getAuthTag().toString('base64'),
};
}
function decrypt(ciphertext: string, iv: string, tag: string): string {
const decipher = createDecipheriv(ALGORITHM, KEY, Buffer.from(iv, 'base64'));
decipher.setAuthTag(Buffer.from(tag, 'base64'));
return decipher.update(ciphertext, 'base64', 'utf8') + decipher.final('utf8');
}
type EncryptedInput = { encrypted?: EncryptedEnvelope };
const e2eEncryption: HatchetMiddleware = {
before: (input) => {
if (!input.encrypted) {
return input;
}
const { ciphertext, iv, tag } = input.encrypted;
const decrypted = JSON.parse(decrypt(ciphertext, iv, tag));
return { ...input, ...decrypted, encrypted: undefined };
},
after: (output) => {
const payload = JSON.stringify(output);
return { encrypted: encrypt(payload) };
},
};
const encryptionClient = HatchetClient.init().withMiddleware(e2eEncryption);
const s3 = new S3Client({ region: process.env.AWS_REGION });
const BUCKET = process.env.S3_BUCKET!;
const PAYLOAD_THRESHOLD = 256 * 1024; // 256 KB
async function uploadToS3(data: unknown): Promise {
const key = `hatchet-payloads/${randomUUID()}.json`;
await s3.send(
new PutObjectCommand({
Bucket: BUCKET,
Key: key,
Body: JSON.stringify(data),
ContentType: 'application/json',
})
);
return getSignedUrl(s3, new GetObjectCommand({ Bucket: BUCKET, Key: key }), {
expiresIn: 3600,
});
}
async function downloadFromS3(url: string): Promise {
const res = await fetch(url);
return res.json();
}
type S3Input = { s3Url?: string };
const s3Offload: HatchetMiddleware = {
before: async (input) => {
if (input.s3Url) {
const restored = (await downloadFromS3(input.s3Url)) as Record;
return { ...restored, s3Url: undefined };
}
return input;
},
after: async (output) => {
const serialized = JSON.stringify(output);
if (serialized.length > PAYLOAD_THRESHOLD) {
const url = await uploadToS3(output);
return { s3Url: url };
}
return output;
},
};
const s3Client = HatchetClient.init().withMiddleware(s3Offload);
```
The caller is responsible for uploading oversized inputs to S3 before triggering the task. The `before` hook only handles the download side. You can use the same `uploadToS3` helper on the caller side to upload the input and pass `{ __s3Url: url }` as the task input.
## FAQ
### What is Hatchet middleware and how does it differ from Express middleware?
Hatchet middleware runs **inside the worker process** around each task invocation — not on an HTTP request path. A `before` hook transforms input before the task runs, and an `after` hook transforms output after. Unlike Express middleware, there is no `next()` function; hooks return their result directly and the runner chains them automatically.
### Can I use middleware with both tasks and workflows?
Yes. Middleware is registered on the `HatchetClient` instance, so it applies to every task created from that client — whether the task is a standalone `client.task()` or part of a multi-step `client.workflow()`. Each step in a workflow will have middleware applied independently.
### Does middleware run on the server or on the worker?
Middleware runs entirely **on the worker**. The Hatchet server never sees or executes your middleware code. This is what makes patterns like end-to-end encryption possible — plaintext data stays within your infrastructure.
### What happens if my middleware throws an error?
If a `before` or `after` hook throws (or returns a rejected `Promise`), the task run fails with that error. There is no automatic retry of middleware itself, but the task's configured retry policy will still apply, re-running the task (and its middleware) from scratch.
### Can I use async/await in middleware hooks?
Yes. Both `before` and `after` hooks can be synchronous or asynchronous. If a hook returns a `Promise`, the worker will `await` it before proceeding to the next hook or the task function.
### How do I share state between `before` and `after` hooks?
The `after` hook receives the task input (after `before` hooks have run) as its third argument. Add fields in `before` (e.g. `startedAt`, `traceId`) and read them from `input` in `after`. There is no separate shared context object — the input itself is the carrier.
### Does middleware apply to child tasks spawned via fanout?
Middleware is scoped to the **client instance**. If a child task is defined on the same middleware-enabled client, its middleware will run when that child task executes. If the child task uses a different client instance, only that client's middleware (if any) applies.
### Can I selectively skip middleware for certain tasks?
Middleware applies to **all** tasks on a given client. To skip middleware for specific tasks, create a second client without middleware and define those tasks on it. This is a deliberate design choice — middleware is a cross-cutting concern, and selective opt-out is handled at the client boundary.
### Is there a performance overhead to using middleware?
Middleware hooks are plain JavaScript functions that run in-process on the worker. The overhead is the execution time of your hook code. For lightweight operations (adding a field, logging), the overhead is negligible. For heavier operations (network calls like S3 uploads or decryption), the task's total duration will include that time, so keep hooks as efficient as possible.
### What is the difference between global types and middleware types in TypeScript?
Global types (`HatchetClient.init()`) define fields that **callers must provide** when triggering a task. Middleware types (inferred from `withMiddleware` return values) define fields that are **injected at runtime** by the worker. Both end up on the task's `input` type, but only global types appear in the caller-facing `run()` signature.
### Can I use middleware for rate limiting or authentication?
Yes. A `before` hook can check rate limits, validate API keys, or verify JWTs before the task runs. If the check fails, throw an error to abort the task. However, for rate limiting specifically, consider using Hatchet's built-in [rate limiting](/home/rate-limits) feature, which operates at the scheduling layer and is more efficient than in-worker checks.
### How do I test middleware in isolation?
Middleware hooks are plain functions — you can unit-test them directly by calling them with mock input and a mock context object. For integration tests, the e2e test pattern of creating a client, attaching middleware, defining a task, starting a worker, and asserting on the result works well. See the [middleware example on GitHub](https://github.com/hatchet-dev/hatchet/tree/main/examples/typescript/middleware) for a complete test setup.
---
# Streaming in Hatchet
Hatchet tasks can stream data back to a consumer in real-time. This has a number of valuable uses, such as streaming the results of an LLM call back from a Hatchet worker to a frontend or sending progress updates as a task chugs along.
## Publishing Stream Events
You can stream data out of a task run by using the `put_stream` (or equivalent) method on the `Context`.
#### Python
```python
anna_karenina = """
Happy families are all alike; every unhappy family is unhappy in its own way.
Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him.
"""
def create_chunks(content: str, n: int) -> Generator[str, None, None]:
for i in range(0, len(content), n):
yield content[i : i + n]
chunks = list(create_chunks(anna_karenina, 10))
@hatchet.task()
async def stream_task(input: EmptyModel, ctx: Context) -> None:
# 👀 Sleeping to avoid race conditions
await asyncio.sleep(2)
for chunk in chunks:
await ctx.aio_put_stream(chunk)
await asyncio.sleep(0.20)
```
#### Typescript
```typescript
const annaKarenina = `
Happy families are all alike; every unhappy family is unhappy in its own way.
Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him.
`;
function* createChunks(content: string, n: number): Generator {
for (let i = 0; i < content.length; i += n) {
yield content.slice(i, i + n);
}
}
export const streamingTask = hatchet.task({
name: 'stream-example',
fn: async (_, ctx) => {
await sleep(2000);
for (const chunk of createChunks(annaKarenina, 10)) {
ctx.putStream(chunk);
await sleep(200);
}
},
});
```
#### Go
```go
const annaKarenina = `
Happy families are all alike; every unhappy family is unhappy in its own way.
Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him.
`
func createChunks(content string, n int) []string {
var chunks []string
for i := 0; i < len(content); i += n {
end := i + n
if end > len(content) {
end = len(content)
}
chunks = append(chunks, content[i:end])
}
return chunks
}
func StreamTask(ctx hatchet.Context, input StreamTaskInput) (*StreamTaskOutput, error) {
time.Sleep(2 * time.Second)
chunks := createChunks(annaKarenina, 10)
for _, chunk := range chunks {
ctx.PutStream(chunk)
time.Sleep(200 * time.Millisecond)
}
return &StreamTaskOutput{
Message: "Streaming completed",
}, nil
}
```
#### Ruby
```ruby
ANNA_KARENINA = <<~TEXT
Happy families are all alike; every unhappy family is unhappy in its own way.
Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him.
TEXT
STREAM_CHUNKS = ANNA_KARENINA.scan(/.{1,10}/)
STREAM_TASK = HATCHET.task(name: "stream_task") do |input, ctx|
# Sleeping to avoid race conditions
sleep 2
STREAM_CHUNKS.each do |chunk|
ctx.put_stream(chunk)
sleep 0.20
end
end
```
This task will stream small chunks of content through Hatchet, which can then be consumed elsewhere. Here we use some text as an example, but this is intended to replicate streaming the results of an LLM call back to a consumer.
## Consuming Streams
You can easily consume stream events by using the stream method on the workflow run ref that the various [fire-and-forget](/v1/running-your-task#fire-and-forget) methods return.
#### Python
```python
ref = await stream_task.aio_run(wait_for_result=False)
async for chunk in hatchet.runs.subscribe_to_stream(ref.workflow_run_id):
print(chunk, flush=True, end="")
```
#### Typescript
```typescript
const ref = await streamingTask.runNoWait({});
const id = await ref.getWorkflowRunId();
for await (const content of hatchet.runs.subscribeToStream(id)) {
process.stdout.write(content);
}
```
#### Go
```go
func main() {
client, err := hatchet.NewClient()
if err != nil {
log.Fatalf("Failed to create Hatchet client: %v", err)
}
ctx := context.Background()
streamingWorkflow := shared.StreamingWorkflow(client)
workflowRun, err := streamingWorkflow.RunNoWait(ctx, shared.StreamTaskInput{})
if err != nil {
log.Fatalf("Failed to run workflow: %v", err)
}
id := workflowRun.RunId
stream := client.Runs().SubscribeToStream(ctx, id)
for content := range stream {
fmt.Print(content)
}
fmt.Println("\nStreaming completed!")
}
```
#### Ruby
```ruby
ref = STREAM_TASK.run_no_wait
HATCHET.runs.subscribe_to_stream(ref.workflow_run_id) do |chunk|
print chunk
end
```
In the examples above, this will result in the famous text below being gradually printed to the console, bit by bit.
```
Happy families are all alike; every unhappy family is unhappy in its own way.
Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him.
```
You must begin consuming the stream before any events are published. Any
events published before a consumer is initialized will be dropped. In
practice, this will not be an issue in most cases, but adding a short sleep
before beginning streaming results back can help.
## Streaming to a Web Application
It's common to want to stream events out of a Hatchet task and back to the frontend of your application, for consumption by an end user. As mentioned before, some clear cases where this is useful would be for streaming back progress of some long-running task for a customer to monitor, or streaming back the results of an LLM call.
In both cases, we recommend using your application's backend as a proxy for the stream, where you would subscribe to the stream of events from Hatchet, and then stream events through to the frontend as they're received by the backend.
#### Python
For example, with FastAPI, you'd do the following:
```python
hatchet = Hatchet()
app = FastAPI()
@app.get("/stream")
async def stream() -> StreamingResponse:
ref = await stream_task.aio_run(wait_for_result=False)
return StreamingResponse(
hatchet.runs.subscribe_to_stream(ref.workflow_run_id), media_type="text/plain"
)
```
#### Typescript
For example, with NextJS backend-as-frontend, you'd do the following:
```typescript
export async function GET(): Promise {
try {
const ref = await streamingTask.runNoWait({});
const workflowRunId = await ref.getWorkflowRunId();
const stream = Readable.from(hatchet.runs.subscribeToStream(workflowRunId));
return new Response(Readable.toWeb(stream), {
headers: {
'Content-Type': 'text/plain',
'Cache-Control': 'no-cache',
Connection: 'keep-alive',
},
});
} catch (error) {
return new Response('Internal Server Error', { status: 500 });
}
}
```
#### Go
For example, with Go's built-in HTTP server, you'd do the following:
```go
func main() {
client, err := hatchet.NewClient()
if err != nil {
log.Fatalf("Failed to create Hatchet client: %v", err)
}
streamingWorkflow := shared.StreamingWorkflow(client)
http.HandleFunc("/stream", func(w http.ResponseWriter, r *http.Request) {
ctx := context.Background()
w.Header().Set("Content-Type", "text/plain")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")
workflowRun, err := streamingWorkflow.RunNoWait(ctx, shared.StreamTaskInput{})
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
stream := client.Runs().SubscribeToStream(ctx, workflowRun.RunId)
flusher, _ := w.(http.Flusher)
for content := range stream {
fmt.Fprint(w, content)
if flusher != nil {
flusher.Flush()
}
}
})
server := &http.Server{
Addr: ":8000",
ReadTimeout: 5 * time.Second,
WriteTimeout: 10 * time.Second,
}
if err := server.ListenAndServe(); err != nil {
log.Println("Failed to start server:", err)
}
}
```
#### Ruby
Then, assuming you run the server on port `8000`, running `curl -N http://localhost:8000/stream` would result in the text streaming back to your console from Hatchet through your FastAPI proxy.
---
# Managing Environments with Hatchet
## Multiple Developers, One Orchestrator
When multiple developers share a single Hatchet orchestrator, conflicts can arise as workflow runs and events intermingle. Without proper isolation, one developer's workflows might interfere with another's testing or development work.
Hatchet provides three key solutions for managing this challenge: namespaces, multi-tenancy, and a local Hatchet instance.
### Solution 1: Multi-Tenancy
The easiest way to isolate environments for different developers or teams is to use Hatchet's multi-tenancy feature. Each tenant represents a separate environment with its own set of workflows and workers. To add a new tenant for each developer, create an organization and follow these steps:
1. Access the organization dropdown in the dashboard (top right)
2. Select the `+` icon next to your organization's name
3. Generate a new token for that tenant
4. Each developer configures their environment with their designated tenant token
### Solution 2: Local Hatchet Instance
If you are using Hatchet locally, you can create a local instance of Hatchet to manage your isolated local development environment. Follow instructions [here](/self-hosting/hatchet-lite) to get started.
---
# Troubleshooting Hatchet Workers
This guide covers common issues when deploying and operating Hatchet workers.
## Quick debugging checklist
Before diving into specific issues, run through these checks:
1. **Verify your API token** — make sure `HATCHET_CLIENT_TOKEN` matches the token generated in the Hatchet dashboard for your tenant.
2. **Check worker logs** — look for connection errors, heartbeat failures, or crash traces in your worker output.
3. **Check the dashboard** — navigate to the Workers tab to see if your worker is registered and healthy.
4. **Confirm network connectivity** — workers need to reach the Hatchet engine over gRPC. Firewalls, VPNs, or missing TLS configuration can block this.
5. **Check SDK version** — ensure your SDK version is compatible with your engine version. Mismatches can cause subtle failures.
## Could not send task to worker
If you see this error in the event history of a task, it could mean several things:
1. The worker is closing its network connection while the task is being sent. This could be caused by the worker crashing or going offline.
2. The payload is too large for the worker to accept or the Hatchet engine to send. The default maximum payload size is 4MB. Consider reducing the size of the input data or output data of your tasks.
3. The worker has a large backlog of tasks in-flight on the network connection and is rejecting new tasks. This can occur if workers are geographically distant from the Hatchet engine or if there are network issues causing delays. Hatchet Cloud runs by default in `us-west-2` (Oregon, USA), so consider deploying your workers in a region close to that for the best performance.
If you are self-hosting, you can increase the maximum backlog size via the `SERVER_GRPC_WORKER_STREAM_MAX_BACKLOG_SIZE` environment variable in your Hatchet engine configuration. The default is 20.
## No workers visible in dashboard
If you have deployed workers but they are not visible in the Hatchet dashboard, it is likely that:
1. Your API token is invalid or incorrect. Ensure that the token you are using to start the worker matches the token generated in the Hatchet dashboard for your tenant.
2. Worker heartbeats are not reaching the Hatchet engine. You will see noisy logs in the worker output if this is the case.
## Tasks stuck in QUEUED state
If tasks remain in the `QUEUED` state and never move to `RUNNING`:
1. **No workers registered for the task** — check the Workers tab in the dashboard and confirm a worker is registered that handles the task name. If you recently renamed a task, make sure the worker has been restarted with the updated code.
2. **All worker slots are full** — if every slot is occupied by other tasks, new tasks will wait in the queue. Check worker utilization in the dashboard or increase the [slot count](/v1/workers#slots).
3. **Concurrency or rate limit is blocking** — if you've configured [concurrency limits](/v1/concurrency) or [rate limits](/v1/rate-limits), tasks may be held back intentionally. Review your configuration.
## Worker keeps disconnecting
If your worker repeatedly connects and then drops:
1. **Resource exhaustion** — the worker process may be running out of memory or CPU and getting killed by the OS or orchestrator (OOM kill). Check system logs and increase resource limits.
2. **Network instability** — intermittent connectivity between the worker and the Hatchet engine will cause reconnection cycles. Check for packet loss or high latency between the worker and the engine.
3. **Graceful shutdown not configured** — if your deployment platform sends `SIGTERM` and the worker doesn't handle it, in-flight tasks may be interrupted. Ensure your worker handles shutdown signals and gives tasks time to complete.
## Phantom workers active in dashboard
This is often due to workers still running in your deployed environment. We see this most often with very long termination periods for workers, or in local development environments where worker processes are leaking. If you are in a local development environment, you can usually view running Hatchet worker processes via `ps -a | grep worker` (or whatever your entrypoint binary is called) and kill them manually.
---
# Architecture & Guarantees
This page explains how Hatchet is put together, what the main components do, and what reliability guarantees you should design your workers around.
## Architecture overview
Hatchet has three main moving pieces:
- **API server**: the HTTP surface area for triggering workflows, querying state, and powering the UI
- **Engine**: schedules and dispatches work, enforces dependencies/policies, and records state transitions durably
- **Workers**: your processes that run the actual task code
State is stored durably (PostgreSQL is the source of truth). In many deployments that’s enough—no separate broker required—while self-hosted high-throughput setups can add additional components (for example, RabbitMQ) based on your needs.
Hatchet Cloud and self-hosted Hatchet share the same architecture; the difference is who runs and operates the Hatchet services.
```mermaid
graph LR
subgraph "External (Optional)"
EXT[Webhooks Events]
end
subgraph "Your Infrastructure"
APP[Your API, App, Service, etc.]
W[Workers]
end
subgraph "Hatchet"
API[API Server]
ENG[Engine]
DB[(Database)]
end
EXT --> API
APP <--> API
API --> ENG
ENG <--> DB
API <--> DB
ENG <-.->|gRPC| W
classDef userInfra fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#0d47a1
classDef hatchet fill:#f1f8e9,stroke:#388e3c,stroke-width:2px,color:#1b5e20
classDef external fill:#fff8e1,stroke:#f57c00,stroke-width:2px,color:#e65100
class APP,W userInfra
class API,ENG,DB hatchet
class EXT external
```
## Core components
### API server
The API server is the front door to Hatchet. It’s what your application and the Hatchet UI talk to in order to:
- trigger workflows with input data
- query workflow/task state (and, where supported, subscribe to updates)
- manage resources like schedules and settings
- ingest webhooks/events (optional)
### Engine
The engine is responsible for turning “a workflow should run” into “these tasks are ready and should be executed.” In practice, it:
- evaluates workflow dependencies
- enforces policies like concurrency limits, rate limits, and priorities
- schedules ready tasks and dispatches them to connected workers
- records state transitions durably and applies retry/timeout behavior
- runs scheduled/cron workflows
Workers connect to the engine over bidirectional gRPC, which allows low-latency dispatch and frequent status updates.
### Workers
Workers are your processes. They connect to the engine, receive tasks, run your code, and report progress/results back to Hatchet.
Workers are intentionally flexible: you can run them locally, in containers, or on VMs, and you can scale workers independently from the Hatchet services. You can also run different “types” of workers (and even different languages) depending on what your system needs.
### Storage (and optional messaging)
PostgreSQL is the durable store for workflow definitions and execution state (queued/running/completed, inputs/outputs, retries, etc.). In self-hosted deployments, you can start with PostgreSQL-only and add components like RabbitMQ if you need higher throughput.
## Guarantees & tradeoffs
Hatchet aims to sit in the middle: more structure than a simple queue, but simpler to operate than a full distributed workflow system.
### Good fit for
- **Workflow orchestration** with dependencies, retries, and timeouts
- **Durable background jobs** where “don’t lose work” matters
- **Moderate to high throughput** systems (and a path to higher scale with tuning/sharding). If you’re pushing the limits, [contact us](https://hatchet.run/contact).
- **Multi-language / polyglot workers**
- **Teams already on PostgreSQL** who want operational simplicity
- **Cloud or air-gapped environments** ([Hatchet Cloud](https://cloud.onhatchet.run) or [self-hosting](/self-hosting))
### Not a good fit for
- **Extremely high throughput** without sharding/custom tuning (for example, sustaining 10,000+ tasks/sec)
- **Sub-millisecond dispatch latency** requirements
- **In-memory-only queuing** where durability is unnecessary
- **Serverless-only runtimes** (e.g. AWS Lambda / Cloud Functions) as your primary worker model
## Core reliability guarantees
### At-least-once task execution
Hatchet is **at least once**: tasks are not silently dropped, and failures retry according to your configuration. This also means **a task can run more than once**, so your task code should be **idempotent** (or otherwise safe to retry).
### Durable state transitions
Workflow state is persisted in PostgreSQL, and state transitions are performed transactionally. This helps keep dependency resolution consistent and makes the system resilient to restarts and transient failures.
### Execution policies are explicit
By default, task assignment is FIFO, and you can change execution behavior using:
- [Concurrency policies](/v1/concurrency)
- [Rate limits](/v1/rate-limits)
- [Priorities](/v1/priority)
### Stateless services; resilient workers
The engine and API server are designed to restart without losing state, which also enables horizontal scaling by running multiple instances. Workers reconnect after network interruptions and can run close to your services (or close to Hatchet) depending on your latency goals.
## Performance expectations
Real-world performance depends heavily on topology (worker ↔ engine network latency), database sizing, and workload shape.
- **Dispatch latency**: often sub-50ms with PostgreSQL-backed storage; in optimized, “hot worker” setups it can be closer to ~25ms P95.
- **Throughput**: varies by setup. PostgreSQL-only deployments often handle hundreds of tasks/sec per engine instance; higher throughput typically requires additional tuning and/or components like RabbitMQ. With tuning and sharding, Hatchet can scale into the high tens of thousands of tasks/sec—[contact us](https://hatchet.run/contact) if you want to design for that.
- **Common bottlenecks**: DB connection limits, large payloads (e.g. > 1MB), complex dependency graphs, and cross-region latency.
> **Warning:** **Not seeing expected performance?**
>
> If you're not seeing the performance you expect, please [reach out to us](https://hatchet.run/office-hours) or [join our community](https://hatchet.run/discord) to explore tuning options.
## Next Steps
- **[Quick Start](/v1/quickstart)**: set up your first Hatchet worker
- **[Self-Hosting](/self-hosting)**: deploy Hatchet on your own infrastructure
---
# Cloud vs OSS
Hatchet is available as **Hatchet Cloud** (managed) and as **open source** (self-hosted). The programming model is the same: you write tasks/workflows in code and run workers that connect to Hatchet.
This page helps you decide which deployment model fits your team.
## Quick decision guide
Choose **Hatchet Cloud** if you want:
- the Hatchet control plane operated for you (upgrades, scaling, backups)
- the fastest path to production
- a status page and managed incident response
Choose **self-hosted (OSS)** if you need:
- full control over infrastructure and networking
- strict data residency or air-gapped environments
- a deployment you can customize and operate with your own tooling
## What’s the same in both
- **SDK + worker model**: your workers run your code and connect to Hatchet
- **Durability + retries**: tasks are durably tracked and retry according to configuration
- **Observability surfaces**: you can inspect runs, workers, and workflow history
- **Core semantics**: the same workflows/tasks/concurrency patterns apply
## What changes (operational responsibilities)
### Hatchet Cloud (managed)
Hatchet runs and operates the Hatchet services. You bring:
- your worker processes
- your application code that triggers workflows
- your operational policies (timeouts, retries, concurrency, rate limits)
For security and compliance documentation, see the **[Hatchet Trust Center](https://trust.hatchet.run/)**. For current incidents and historical uptime, see **[status.hatchet.run](https://status.hatchet.run/)**.
### Self-hosted (OSS)
You run and operate the Hatchet services and their dependencies. Typical responsibilities include:
- provisioning and scaling the Hatchet services
- managing PostgreSQL (and any optional components you deploy)
- backups, upgrades, and monitoring
- network security and access control for the API/DB
If you’re planning production usage, start with:
- [Self Hosting](/self-hosting)
- [High Availability](/self-hosting/high-availability)
- [Security](/v1/security)
## Migrating between Cloud and self-hosted
You can move between deployment models without rewriting worker code. In practice, migration usually means:
- pointing workers and clients at a new endpoint
- swapping credentials/tokens
- validating environment-specific settings (TLS, networking, retention, etc.)
## Next steps
- **[Quickstart](/v1/quickstart)**: run a worker and trigger your first workflow
- **[Architecture & Guarantees](/v1/architecture-and-guarantees)**: understand reliability semantics and tradeoffs
- **[Self Hosting](/self-hosting)**: deploy Hatchet on your own infrastructure
---
# Security
This page points you to Hatchet's security resources and highlights the most important security considerations for Hatchet Cloud and self-hosted deployments.
## Trust center
Hatchet is SOC 2 Type II, HIPAA, and GDPR compliant. Company-level security practices, compliance reports, and security documentation are available at the **[Hatchet Trust Center](https://trust.hatchet.run/)**.
## Same source, same security
Hatchet Cloud and self-hosted Hatchet run the same codebase. The open source project is 100% MIT licensed and undergoes regular third-party penetration testing. Findings are remediated across both deployment models, so security improvements benefit all users equally.
## Hatchet Cloud
Hatchet Cloud is Hatchet's managed service:
- **Encryption in transit**: all API and worker traffic is encrypted with TLS. gRPC connections between workers and the engine use TLS by default.
- **Encryption at rest**: data stored in Hatchet Cloud is encrypted at rest.
- **Tenant isolation**: each tenant's data is logically isolated. Requests are authenticated and scoped to a single tenant.
- **Authentication**: API tokens are scoped per-tenant with configurable expiration. The dashboard supports SSO via Google, GitHub, and more coming soon.
- **Penetration testing**: Hatchet Cloud is regularly tested by independent security firms. Findings are tracked and remediated on a defined timeline.
- **Infrastructure**: Hatchet Cloud runs on AWS with private networking, automated patching, and centralized logging.
For the definitive controls, policies, and compliance reports, refer to the **[Hatchet Trust Center](https://trust.hatchet.run/)**.
## Self-hosted
When you self-host Hatchet, your security posture depends on how you deploy and operate the Hatchet services and their dependencies. A practical baseline:
- **Put TLS in front of the API**: terminate TLS at your ingress/load balancer (or directly on the API) and only expose it to the networks that need it.
- **Treat tokens and DB credentials as secrets**: use a secrets manager and rotate credentials; avoid committing secrets into git or baking them into images.
- **Limit network reachability**: restrict access to the Hatchet API and PostgreSQL to trusted networks (VPC, private subnets, or Kubernetes network policies).
- **Use least privilege**: run Hatchet with the minimum DB permissions needed; don't reuse "admin" DB credentials.
- **Stay current**: keep Hatchet and dependencies up to date to pick up security fixes.
See [Self Hosting](/self-hosting) for deployment and configuration guidance, or [contact us](https://hatchet.run/contact) for help.
---
# Audit Logs
Hatchet records audit logs for key actions performed across your organization, giving you visibility into who did what, when, and from where.
> **Info:** Audit logs are available on **Business** plans and above. If you're on the
> open-source edition and need audit logs, [contact
> us](mailto:support@hatchet.run) to learn more about upgrading.
## What gets logged
Every audit log entry captures the following:
Field, Description
**Actor**, The user or API key that performed the action
**Action**, The operation performed (e.g. `ApiTokenCreate`, `TenantMemberDelete`)
**Resource type**, The type of resource acted upon (e.g. `workflow-run`, `api-token`)
**Resource ID**, The specific resource that was affected
**IP address**, The IP address of the actor (HTTP requests only)
**User agent**, The user agent string of the request (HTTP requests only)
**Timestamp**, When the action occurred
**Correlation ID**, An optional ID for grouping related actions together (gRPC requests)
## Audited actions
The following actions are currently recorded as audit log entries:
Action, Resource Type, Description
`TenantInviteAccept`, `tenant-invite`, A user accepts a tenant invitation
`TenantMemberDelete`, `tenant-member`, A tenant member is removed
`ApiTokenCreate`, `api-token`, A new API token is created
`ApiTokenUpdateRevoke`, `api-token`, An API token is revoked
`V1WorkflowRunCreate`, `workflow-run`, A workflow run is triggered via the API
`ScheduledWorkflowRunCreate`, `scheduled-workflow`, A scheduled workflow run is created
## Actor types
Audit log entries distinguish between two types of actors:
- **User** — actions performed by a logged-in user through the dashboard or API. These entries include the actor's IP address and user agent.
- **API key** — actions performed programmatically via an API key (e.g. triggering workflow runs over gRPC). These entries may include a correlation ID for grouping related actions.
## Retention
Audit log entries are retained for **30 days**. Entries older than 30 days are automatically removed.
## Viewing audit logs
Organization admins can view audit logs in the Hatchet dashboard under the organization settings. Logs can be filtered by tenant and time range.
## API access
Audit logs can also be retrieved programmatically via the Management API:
```
GET /api/v1/management/organizations/{organization}/audit-logs
```
Query parameters:
Parameter, Type, Default, Description
`tenant`, UUID, all active tenants in the organization, Filter logs to a specific tenant
`limit`, integer, `1000`, Maximum number of results to return
`offset`, integer, `0`, Number of results to skip
`since`, ISO 8601, 24 hours ago, Start of the time range
`until`, ISO 8601, now, End of the time range
Results are ordered by timestamp descending (most recent first).
---
# Region availability
Hatchet Cloud is available in multiple regions so you can run workloads close to your users and data.
## Current regions
**Hatchet Cloud** ([cloud.onhatchet.run](https://cloud.onhatchet.run)) is currently deployed in **us-west-2** (Oregon).
We are expanding availability. Planned or available regions include:
Region, Location, Status
us-west-2, Oregon (US), **Live**
us-east-1, N. Virginia (US), Private Beta
eu-west-1, Ireland, Private Beta
ap-southeast-2, Sydney, Private Beta
## Request a region
We are always open to rolling out new regions based on demand. If you need a specific region for latency or compliance, [contact us](https://hatchet.run/contact) and we can discuss availability.
---
# Uptime and status
For Hatchet Cloud availability and incident updates, use the status page. For self-hosted deployments, availability depends on your own infrastructure.
## Hatchet Cloud status page
Use **[status.hatchet.run](https://status.hatchet.run/)** for real-time status and incident history for Hatchet Cloud and related services:
- **API**: Hatchet API availability
- **Hatchet Cloud**: `cloud.onhatchet.run`
- **Website**: `hatchet.run` and documentation sites
You can also subscribe to updates (email/SMS/etc.) directly from the status page.
## Self-hosted deployments
If you self-host Hatchet, you’re responsible for uptime, monitoring, backups, and upgrade procedures.
- **Deployment guidance**: [Self Hosting](/self-hosting)
- **Redundancy & failover**: [High Availability](/self-hosting/high-availability)
---
# Developer experience
Hatchet is designed to be practical day-to-day: write workflows in code, run workers locally with a tight feedback loop, and debug production runs with good visibility.
## Workflows as code
You define tasks and workflows in your application code, then trigger them with input data. Hatchet handles the operational pieces you’d otherwise build yourself:
- **Durability** (work isn’t lost on restarts)
- **Retries/timeouts**
- **Concurrency and rate limiting**
- **Visibility into what ran, where, and why**
## Dashboard (UI)
The dashboard is where you go to understand “what is happening right now?”:
- **Runs**: status, inputs/outputs, and execution history
- **Workers**: connected workers and health
- **Workflows**: definitions and recent activity
- **Settings**: tenants, API tokens, configuration
It’s useful for debugging, operational checks, and ad-hoc triggers.
## CLI
The [Hatchet CLI](/reference/cli) is the fastest way to develop and operate Hatchet from your terminal:
- **`hatchet worker dev`**: run a local worker with hot reload
- **`hatchet trigger`**: trigger a workflow from the command line (handy for smoke tests)
- **`hatchet tui`**: terminal UI for runs/workers/workflows
- **`hatchet profile`**: switch between tenants and environments
See the [CLI reference](/reference/cli) for installation and the full command set.
## Coding agents (MCP)
If you use AI coding tools in your editor, Hatchet’s docs can be used via an [MCP (Model Context Protocol) server](/v1/using-coding-agents). We also publish “agent skills” (short, step-by-step playbooks) so coding agents can run common Hatchet workflows—like starting a worker, triggering a workflow, and debugging a run—without guessing at CLI usage.
See [Using Coding Agents](/v1/using-coding-agents) for setup.
---
# Frequently Asked Questions
This page provides answers to a number of the most common questions we're asked, to help you keep making great use of Hatchet!
## How do I choose how many slots to set on my worker?
The default slot count for workers in Hatchet is 100. In many cases, leaving the default as-is will be perfectly fine, especially when first getting set up with Hatchet.
Over time, you'll likely run into one of two issues: Resource starvation (meaning the worker is using up too much memory, CPU, etc.), or wanting to squeeze more juice out of your workers.
If your workers are resource starved, there are basically two options:
1. Reduce the slot count, so the worker runs less work concurrently. This is a blunt instrument, in the sense that it doesn't let you _tune_ resources to the needs of the workload running on the worker. For instance, if you're using 100% of your memory but only 10% of your CPU, reducing the slot count will likely help the worker stay online, but you'll be significantly under-utilizing CPU. In this case, you can:
2. Reconfigure the specs of the machine the worker is running on. For instance, in the example above, you might be able to migrate from a CPU-optimized machine to a memory-optimized one, which will give you more efficient resource utilization across the board.
On the other hand, if your workers are underutilizing resources, your options are:
1. Increase the number of slots on them so they can pick up more work. This is especially helpful for heavily I/O bound tasks, which generally are spending most of their time waiting.
2. Similar to the opposite case of resource starvation, you can scale down the resource requirements of the machine the worker is running on.
> **Info:** In general, we recommend not pushing the number of slots on a single worker
> much past 250-300. At this point, it likely makes sense to scale more
> horizontally.
## Why am I seeing missed heartbeats and task reassignments?
Hatchet uses heartbeats to monitor worker health. Workers send a heartbeat every **4 seconds**. If the engine does not receive a heartbeat for **30 seconds**, the engine considers the worker to be inactive, and re-queues its in-flight tasks for other workers to pick up.
There are a number of common reasons a worker might miss heartbeats:
- **Process crash** - the worker process exits unexpectedly (OOM kill, unhandled exception, SIGKILL).
- **Network disruption** - the connection between the worker and the Hatchet engine is interrupted (DNS failure, firewall change, cloud network blip).
- **Resource pressure** - High CPU or memory usage can starve the worker for resources
---
---
asIndexPage: true
---
import { Cards } from "nextra/components";
# Cookbooks
Our cookbooks are guides to help you solve common problems you'll find easy to tackle with Hatchet.
## Webhooks
Receive webhooks from external services and use them to trigger tasks in Hatchet. Each guide walks through setup end-to-end — creating the webhook in Hatchet, wiring it up to the source, and writing a task that handles the incoming event.
Handle payment events, subscription changes, and other Stripe webhooks.
React to pushes, pull requests, issues, and other GitHub events.
Respond to Slack events like messages, reactions, and slash commands.
---
# Stripe Webhooks
Stripe sends webhooks for all sorts of events, such as payments succeeding, subscription cancellations, invoice creation, and so on. This guide walks through setting up webhooks from Stripe to trigger events directly in Hatchet.
## Setup
### Get your Stripe webhook signing secret
In the [Stripe Dashboard](https://dashboard.stripe.com/webhooks), you'll create a new webhook endpoint. Don't fill in the URL yet — you'll get that from Hatchet in the next step. See [Stripe's webhooks guide](https://docs.stripe.com/webhooks) for more details on setting this up.
For now, note the **signing secret** that Stripe generates for you (it starts with `whsec_`). You'll need this to tell Hatchet how to verify incoming requests.
### Create the webhook in Hatchet
In the Hatchet dashboard, go to **Webhooks** and create a new webhook with the following settings:
Field, Value
**Name**, `stripe` (or whatever you'd like)
**Source**, Stripe
**Event Key Expression**, `'stripe:' + input.type`
**Secret**, Your `whsec_...` signing secret
The event key expression here takes the `type` field from Stripe's payload (something like `payment_intent.created`) and prefixes it with `stripe:` so your event keys are namespaced. When a webhook from Stripe is ingested with an `input.type` of `payment_intent.created`, there will be a corresponding Hatchet event with a key of `stripe:payment_intent.created` created.
Once you've created the webhook, copy the URL that Hatchet generates.
### Add the URL to Stripe
Go back to the Stripe Dashboard webhook you created in step 1 and paste in the Hatchet webhook URL. Select the events you want to listen for (or just select all of them — Hatchet will only trigger workflows that match the event key).
### Write a task that listens for Stripe events
Now you just need a task with a matching `on_events` trigger. For example, to handle successful payments:
#### Python
```python
class StripeObject(BaseModel):
customer: str
amount: int
class StripeData(BaseModel):
object: StripeObject
class StripePaymentInput(BaseModel):
type: str
data: StripeData
class StripePaymentOutput(BaseModel):
customer: str
amount: int
@hatchet.task(
input_validator=StripePaymentInput,
on_events=["stripe:payment_intent.succeeded"],
)
def handle_stripe_payment(
input: StripePaymentInput, ctx: Context
) -> StripePaymentOutput:
customer = input.data.object.customer
amount = input.data.object.amount
print(f"Payment of {amount} from {customer}")
return StripePaymentOutput(customer=customer, amount=amount)
```
#### Typescript
```typescript
type StripePaymentInput = {
type: string;
data: {
object: {
customer: string;
amount: number;
};
};
};
export const handleStripePayment = hatchet.task({
name: 'handle-stripe-payment',
on: {
event: 'stripe:payment_intent.succeeded',
},
fn: async (input: StripePaymentInput) => {
const { customer, amount } = input.data.object;
console.log(`Payment of ${amount} from ${customer}`);
return { customer, amount };
},
});
```
#### Go
```go
type StripePaymentInput struct {
Type string `json:"type"`
Data struct {
Object struct {
Customer string `json:"customer"`
Amount int `json:"amount"`
} `json:"object"`
} `json:"data"`
}
stripePayment := client.NewStandaloneTask(
"handle-stripe-payment",
func(ctx hatchet.Context, input StripePaymentInput) (*struct {
Customer string `json:"customer"`
Amount int `json:"amount"`
}, error) {
fmt.Printf("Payment of %d from %s\n", input.Data.Object.Amount, input.Data.Object.Customer)
return &struct {
Customer string `json:"customer"`
Amount int `json:"amount"`
}{
Customer: input.Data.Object.Customer,
Amount: input.Data.Object.Amount,
}, nil
},
hatchet.WithWorkflowEvents("stripe:payment_intent.succeeded"),
)
```
#### Ruby
```ruby
HANDLE_STRIPE_PAYMENT = HATCHET.task(
name: "handle-stripe-payment",
on_events: ["stripe:payment_intent.succeeded"]
) do |input, ctx|
customer = input["data"]["object"]["customer"]
amount = input["data"]["object"]["amount"]
puts "Payment of #{amount} from #{customer}"
{ "customer" => customer, "amount" => amount }
end
```
### Test it
You can use Stripe's "Send test webhook" feature in the dashboard, or trigger a real event in test mode. You should see the task run appear in the Hatchet dashboard.
---
# GitHub Webhooks
GitHub can send webhooks for repository events — pushes, pull requests, issues, releases, and so on. This guide walks through connecting GitHub webhooks to Hatchet.
## Setup
### Create the webhook in Hatchet
In the Hatchet dashboard, go to **Webhooks** and create a new webhook with the following settings:
Field, Value
**Name**, `github` (or whatever you'd like)
**Source**, GitHub
**Event Key Expression**, `'github:' + headers['x-github-event'] + ':' + input.action`
**Secret**, A secret string of your choosing (you'll use the same one in GitHub)
A quick note on the event key expression: GitHub sends the event type (like `pull_request` or `issues`) in the `x-github-event` header, and the specific action (like `opened` or `closed`) in the payload's `action` field. The expression above combines them to produce keys like `github:pull_request:opened`.
Not all GitHub events have an `action` field, though. Push events, for instance, don't. If you want to handle events that might not have an `action`, you could use a simpler expression like `'github:' + headers['x-github-event']` and handle action-level routing in your task logic instead. Or you could create two separate webhooks — one for action-based events and one for action-less events.
Once you've created the webhook, copy the URL.
### Configure the webhook in GitHub
Go to your repository (or organization) settings, find **Webhooks**, and add a new webhook. See [GitHub's webhook docs](https://docs.github.com/en/webhooks/using-webhooks/creating-webhooks) for the full walkthrough.
1. **Payload URL**: Paste the Hatchet webhook URL.
2. **Content type**: Select `application/json`.
3. **Secret**: Enter the same secret you used when creating the webhook in Hatchet.
4. **Events**: Choose "Let me select individual events" and pick the ones you care about, or select "Send me everything" if you prefer.
> **Warning:** Make sure you set the content type to `application/json`. GitHub defaults to
> `application/x-www-form-urlencoded`, which won't work with Hatchet's JSON
> payload parsing.
### Write a task that listens for GitHub events
Here's an example that triggers when a pull request is opened:
#### Python
```python
class GitHubPullRequest(BaseModel):
number: int
title: str
class GitHubRepository(BaseModel):
full_name: str
class GitHubPRInput(BaseModel):
action: str
pull_request: GitHubPullRequest
repository: GitHubRepository
class GitHubPROutput(BaseModel):
repo: str
pr: int
@hatchet.task(
input_validator=GitHubPRInput,
on_events=["github:pull_request:opened"],
)
def handle_github_pr(input: GitHubPRInput, ctx: Context) -> GitHubPROutput:
repo = input.repository.full_name
pr_number = input.pull_request.number
title = input.pull_request.title
print(f"PR #{pr_number} opened on {repo}: {title}")
return GitHubPROutput(repo=repo, pr=pr_number)
```
#### Typescript
```typescript
type GitHubPRInput = {
action: string;
pull_request: {
number: number;
title: string;
};
repository: {
full_name: string;
};
};
export const handleGitHubPR = hatchet.task({
name: 'handle-github-pr',
on: {
event: 'github:pull_request:opened',
},
fn: async (input: GitHubPRInput) => {
const repo = input.repository.full_name;
const prNumber = input.pull_request.number;
const { title } = input.pull_request;
console.log(`PR #${prNumber} opened on ${repo}: ${title}`);
return { repo, pr: prNumber };
},
});
```
#### Go
```go
type GitHubPRInput struct {
Action string `json:"action"`
PullRequest struct {
Number int `json:"number"`
Title string `json:"title"`
} `json:"pull_request"`
Repository struct {
FullName string `json:"full_name"`
} `json:"repository"`
}
githubPR := client.NewStandaloneTask(
"handle-github-pr",
func(ctx hatchet.Context, input GitHubPRInput) (*struct {
Repo string `json:"repo"`
PR int `json:"pr"`
}, error) {
fmt.Printf("PR #%d opened on %s: %s\n", input.PullRequest.Number, input.Repository.FullName, input.PullRequest.Title)
return &struct {
Repo string `json:"repo"`
PR int `json:"pr"`
}{
Repo: input.Repository.FullName,
PR: input.PullRequest.Number,
}, nil
},
hatchet.WithWorkflowEvents("github:pull_request:opened"),
)
```
#### Ruby
```ruby
HANDLE_GITHUB_PR = HATCHET.task(
name: "handle-github-pr",
on_events: ["github:pull_request:opened"]
) do |input, ctx|
repo = input["repository"]["full_name"]
pr_number = input["pull_request"]["number"]
title = input["pull_request"]["title"]
puts "PR ##{pr_number} opened on #{repo}: #{title}"
{ "repo" => repo, "pr" => pr_number }
end
```
### Test it
After saving the webhook in GitHub, GitHub will send a `ping` event to verify the connection. You can also use the "Redeliver" button in GitHub's webhook settings to replay past events, or just open a PR to trigger a real event.
---
# Slack Webhooks
Slack has several different ways to send data to your app — slash commands, interactive components (buttons, modals, etc.), and event subscriptions. They each have different payload formats and authentication mechanisms, which makes the setup a bit more involved than other webhook integrations. This guide walks through each one.
> **Info:** Slack's different interaction modes use different content types. Event
> subscriptions send JSON, but slash commands and interactive components send
> form-encoded data. Hatchet handles both, but it's good to be aware of the
> difference when writing your CEL expressions and task logic.
## Slack App Setup
Before configuring anything in Hatchet, you'll need a Slack app. If you don't already have one:
1. Go to [api.slack.com/apps](https://api.slack.com/apps) and click **Create New App**. See [Slack's getting started guide](https://api.slack.com/quickstart) if this is your first time.
2. Choose **From scratch**, give it a name, and select your workspace.
3. Once created, go to **Basic Information** and note the **Signing Secret** — you'll need this for Hatchet. See [Slack's signing secret docs](https://api.slack.com/authentication/verifying-requests-from-slack) for more on how request verification works.
## Event Subscriptions
Event subscriptions are what Slack uses to notify your app about things happening in the workspace — messages being posted, channels being created, users joining, and so on.
### Create the webhook in Hatchet
Field, Value
**Name**, `slack-events`
**Source**, Slack
**Event Key Expression**, `'slack:event:' + input.event.type`
**Secret**, Your Slack app's signing secret
Copy the generated URL.
### Enable Event Subscriptions in Slack
In your Slack app settings, go to [**Event Subscriptions**](https://api.slack.com/events), toggle it on, and paste the Hatchet webhook URL into the **Request URL** field.
> **Info:** Slack will send a challenge request to verify the URL. Hatchet handles this
> automatically — you should see a green checkmark confirming the URL is
> verified.
Then, under **Subscribe to bot events**, add the events you want to listen for (e.g., `message.channels`, `app_mention`, `member_joined_channel`).
### Write a task
#### Python
```python
class SlackEvent(BaseModel):
type: str
user: str
text: str
channel: str
class SlackEventInput(BaseModel):
event: SlackEvent
class SlackEventOutput(BaseModel):
handled: bool
@hatchet.task(
input_validator=SlackEventInput,
on_events=["slack:event:app_mention"],
)
def handle_slack_mention(input: SlackEventInput, ctx: Context) -> SlackEventOutput:
print(
f"Mentioned by {input.event.user} in {input.event.channel}: {input.event.text}"
)
return SlackEventOutput(handled=True)
```
#### Typescript
```typescript
type SlackEventInput = {
event: {
type: string;
user: string;
text: string;
channel: string;
};
};
export const handleSlackMention = hatchet.task({
name: 'handle-slack-mention',
on: {
event: 'slack:event:app_mention',
},
fn: async (input: SlackEventInput) => {
const { user, text, channel } = input.event;
console.log(`Mentioned by ${user} in ${channel}: ${text}`);
return { handled: true };
},
});
```
#### Go
```go
type SlackEventInput struct {
Event struct {
Type string `json:"type"`
User string `json:"user"`
Text string `json:"text"`
Channel string `json:"channel"`
} `json:"event"`
}
slackMention := client.NewStandaloneTask(
"handle-slack-mention",
func(ctx hatchet.Context, input SlackEventInput) (*struct {
Handled bool `json:"handled"`
}, error) {
fmt.Printf("Mentioned by %s in %s: %s\n", input.Event.User, input.Event.Channel, input.Event.Text)
return &struct {
Handled bool `json:"handled"`
}{Handled: true}, nil
},
hatchet.WithWorkflowEvents("slack:event:app_mention"),
)
```
#### Ruby
```ruby
HANDLE_SLACK_MENTION = HATCHET.task(
name: "handle-slack-mention",
on_events: ["slack:event:app_mention"]
) do |input, ctx|
event = input["event"]
puts "Mentioned by #{event["user"]} in #{event["channel"]}: #{event["text"]}"
{ "handled" => true }
end
```
## Slash Commands
Slash commands work differently from event subscriptions. When a user types something like `/deploy production`, Slack sends a form-encoded POST to your configured URL. The payload includes the command, the text after it, the user, the channel, and a `response_url` you can use to send a response back.
### Create the webhook in Hatchet
Field, Value
**Name**, `slack-commands`
**Source**, Slack
**Event Key Expression**, `'slack:command:' + input.command`
**Secret**, Your Slack app's signing secret
Copy the generated URL.
> **Info:** Even though slash commands send form-encoded payloads, Hatchet parses them
> into a JSON object so you can use the same `input.field` syntax in your CEL
> expressions.
### Add the slash command in Slack
In your Slack app settings, go to [**Slash Commands**](https://api.slack.com/interactivity/slash-commands) and create a new command. Set the **Request URL** to the Hatchet webhook URL you just copied.
### Write a task
The `input.command` field includes the leading slash (e.g., `/deploy`), so your event key will look like `slack:command:/deploy`.
#### Python
```python
class SlackCommandInput(BaseModel):
command: str
text: str
user_name: str
response_url: str
class SlackCommandOutput(BaseModel):
command: str
args: str
@hatchet.task(
input_validator=SlackCommandInput,
on_events=["slack:command:/deploy"],
)
def handle_slack_command(input: SlackCommandInput, ctx: Context) -> SlackCommandOutput:
print(f"{input.user_name} ran {input.command} {input.text}")
return SlackCommandOutput(command=input.command, args=input.text)
```
#### Typescript
```typescript
type SlackCommandInput = {
command: string;
text: string;
user_name: string;
response_url: string;
};
export const handleSlackCommand = hatchet.task({
name: 'handle-slack-command',
on: {
event: 'slack:command:/deploy',
},
fn: async (input: SlackCommandInput) => {
console.log(`${input.user_name} ran ${input.command} ${input.text}`);
return { command: input.command, args: input.text };
},
});
```
#### Go
```go
type SlackCommandInput struct {
Command string `json:"command"`
Text string `json:"text"`
UserName string `json:"user_name"`
ResponseURL string `json:"response_url"`
}
slackCommand := client.NewStandaloneTask(
"handle-slack-command",
func(ctx hatchet.Context, input SlackCommandInput) (*struct {
Command string `json:"command"`
Args string `json:"args"`
}, error) {
fmt.Printf("%s ran %s %s\n", input.UserName, input.Command, input.Text)
return &struct {
Command string `json:"command"`
Args string `json:"args"`
}{
Command: input.Command,
Args: input.Text,
}, nil
},
hatchet.WithWorkflowEvents("slack:command:/deploy"),
)
```
#### Ruby
```ruby
HANDLE_SLACK_COMMAND = HATCHET.task(
name: "handle-slack-command",
on_events: ["slack:command:/deploy"]
) do |input, ctx|
puts "#{input["user_name"]} ran #{input["command"]} #{input["text"]}"
{ "command" => input["command"], "args" => input["text"] }
end
```
## Interactive Components
Interactive components — buttons, menus, modals — send payloads to an **Interactivity Request URL** when a user interacts with them. These are also form-encoded, with the actual payload nested inside a `payload` field as a JSON string.
### Create the webhook in Hatchet
Field, Value
**Name**, `slack-interactions`
**Source**, Slack
**Event Key Expression**, `'slack:interaction:' + input.type`
**Secret**, Your Slack app's signing secret
### Enable Interactivity in Slack
In your Slack app settings, go to [**Interactivity & Shortcuts**](https://api.slack.com/interactivity/handling), toggle it on, and paste the Hatchet webhook URL into the **Request URL** field.
### Write a task
#### Python
```python
class SlackAction(BaseModel):
action_id: str
class SlackUser(BaseModel):
username: str
class SlackInteractionInput(BaseModel):
type: str
actions: list[SlackAction]
user: SlackUser
class SlackInteractionOutput(BaseModel):
action: str
@hatchet.task(
input_validator=SlackInteractionInput,
on_events=["slack:interaction:block_actions"],
)
def handle_slack_interaction(
input: SlackInteractionInput, ctx: Context
) -> SlackInteractionOutput:
action = input.actions[0]
print(f"{input.user.username} clicked button: {action.action_id}")
return SlackInteractionOutput(action=action.action_id)
```
#### Typescript
```typescript
type SlackInteractionInput = {
type: string;
actions: Array<{ action_id: string }>;
user: { username: string };
};
export const handleSlackInteraction = hatchet.task({
name: 'handle-slack-interaction',
on: {
event: 'slack:interaction:block_actions',
},
fn: async (input: SlackInteractionInput) => {
const [action] = input.actions;
console.log(`${input.user.username} clicked button: ${action.action_id}`);
return { action: action.action_id };
},
});
```
#### Go
```go
type SlackInteractionInput struct {
Type string `json:"type"`
Actions []struct {
ActionID string `json:"action_id"`
} `json:"actions"`
User struct {
Username string `json:"username"`
} `json:"user"`
}
slackInteraction := client.NewStandaloneTask(
"handle-slack-interaction",
func(ctx hatchet.Context, input SlackInteractionInput) (*struct {
Action string `json:"action"`
}, error) {
action := input.Actions[0]
fmt.Printf("%s clicked button: %s\n", input.User.Username, action.ActionID)
return &struct {
Action string `json:"action"`
}{Action: action.ActionID}, nil
},
hatchet.WithWorkflowEvents("slack:interaction:block_actions"),
)
```
#### Ruby
```ruby
HANDLE_SLACK_INTERACTION = HATCHET.task(
name: "handle-slack-interaction",
on_events: ["slack:interaction:block_actions"]
) do |input, ctx|
action = input["actions"][0]
puts "#{input["user"]["username"]} clicked button: #{action["action_id"]}"
{ "action" => action["action_id"] }
end
```
---
# Self-Hosting the Hatchet Control Plane
Self-hosting Hatchet means running your own instance of the **Hatchet Control Plane** - the central orchestration system that manages workflows, schedules tasks, and coordinates worker execution. This is different from running workers, which can connect to any Hatchet instance (self-hosted or Hatchet Cloud).
## What You're Self-Hosting
When you self-host Hatchet, you're deploying:
- **API Server** - REST APIs for workflow management
- **Engine** - gRPC API for core workflow orchestration and task scheduling
- **Database** - PostgreSQL for storing workflow state and metadata
- **Message Queue (optional)** - RabbitMQ for inter-service communication and high-throughput real-time updates
- **Dashboard** - Web UI for monitoring workflows and debugging
Your **workers** (the processes that execute your workflow steps) will connect to your self-hosted control plane and execute tasks.
## Deployment Options
The fastest way to get a Hatchet instance running locally is with the [Hatchet CLI](/cli) (which wraps Hatchet Lite):
```sh
hatchet server start
```
There are currently three supported ways to self-host the Hatchet Control Plane:
Docker:
1. [Hatchet Lite](./self-hosting/hatchet-lite.mdx) - Single docker image with bundled engine and API (development, testing, or low-throughput production)
2. [Docker Compose](./self-hosting/docker-compose.mdx) - Multi-container setup with PostgreSQL and RabbitMQ (production)
Kubernetes:
1. [Quickstart with Helm](./self-hosting/kubernetes-quickstart.mdx) - Production-ready Helm charts (production)
---
# Hatchet Lite Deployment
To get up and running quickly, you can deploy via the `hatchet-lite` image. This image is designed for development and low-volume use-cases.
### Prerequisites
This deployment requires [Docker](https://docs.docker.com/engine/install/) installed locally to work.
### Getting Hatchet Lite Running
#### Hatchet CLI
The easiest way to get Hatchet Lite running is via the Hatchet CLI. Simply run the following command:
```sh
hatchet server start
```
#### With Postgres (Default)
To use Postgres as both your DB and message queue, copy the following `docker-compose.hatchet.yml` file to the root of your repository:
> **Info:** If you have an existing Postgres instance already running, you can simply
> point `DATABASE_URL` to that instance and ignore the `postgres` service
> deployment in the following docker-compose file.
```yaml filename="docker-compose.hatchet.yml" copy
version: "3.8"
name: hatchet-lite
services:
postgres:
image: postgres:15.6
command: postgres -c 'max_connections=200'
restart: always
environment:
- POSTGRES_USER=hatchet
- POSTGRES_PASSWORD=hatchet
- POSTGRES_DB=hatchet
volumes:
- hatchet_lite_postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"]
interval: 10s
timeout: 10s
retries: 5
start_period: 10s
hatchet-lite:
image: ghcr.io/hatchet-dev/hatchet/hatchet-lite:latest
ports:
- "8888:8888"
- "7077:7077"
depends_on:
postgres:
condition: service_healthy
environment:
# Refer to https://docs.hatchet.run/self-hosting/configuration-options
# for a list of all supported environment variables
DATABASE_URL: "postgresql://hatchet:hatchet@postgres:5432/hatchet?sslmode=disable"
SERVER_AUTH_COOKIE_DOMAIN: localhost
SERVER_AUTH_COOKIE_INSECURE: "t"
SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
SERVER_GRPC_INSECURE: "t"
SERVER_GRPC_BROADCAST_ADDRESS: localhost:7077
SERVER_GRPC_PORT: "7077"
SERVER_URL: http://localhost:8888
SERVER_AUTH_SET_EMAIL_VERIFIED: "t"
SERVER_DEFAULT_ENGINE_VERSION: "V1"
SERVER_INTERNAL_CLIENT_INTERNAL_GRPC_BROADCAST_ADDRESS: localhost:7077
volumes:
- "hatchet_lite_config:/config"
volumes:
hatchet_lite_postgres_data:
hatchet_lite_config:
```
Then run `docker-compose -f docker-compose.hatchet.yml up` to get the Hatchet Lite instance running.
#### With Postgres + RabbitMQ
To use Postgres as your DB and RabbitMQ as the message queue, copy the following `docker-compose.hatchet.yml` file to the root of your repository:
> **Info:** If you have an existing Postgres instance already running, you can simply
> point `DATABASE_URL` to that instance and ignore the `postgres` service
> deployment in the following docker-compose file.
```yaml filename="docker-compose.hatchet.yml" copy
version: "3.8"
name: hatchet-lite
services:
postgres:
image: postgres:15.6
command: postgres -c 'max_connections=200'
restart: always
environment:
- POSTGRES_USER=hatchet
- POSTGRES_PASSWORD=hatchet
- POSTGRES_DB=hatchet
volumes:
- hatchet_lite_postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"]
interval: 10s
timeout: 10s
retries: 5
start_period: 10s
rabbitmq:
image: "rabbitmq:3-management"
hostname: "rabbitmq"
ports:
- "5672:5672"
- "15672:15672"
environment:
RABBITMQ_DEFAULT_USER: "user"
RABBITMQ_DEFAULT_PASS: "password"
volumes:
- "hatchet_rabbitmq_data:/var/lib/rabbitmq"
- "hatchet_rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf"
healthcheck:
test: ["CMD", "rabbitmqctl", "status"]
interval: 30s
timeout: 10s
retries: 5
hatchet-lite:
image: ghcr.io/hatchet-dev/hatchet/hatchet-lite:latest
ports:
- "8888:8888"
- "7077:7077"
depends_on:
postgres:
condition: service_healthy
environment:
SERVER_MSGQUEUE_KIND: rabbitmq
SERVER_MSGQUEUE_RABBITMQ_URL: "amqp://user:password@rabbitmq:5672/"
# Refer to https://docs.hatchet.run/self-hosting/configuration-options
# for a list of all supported environment variables
DATABASE_URL: "postgresql://hatchet:hatchet@postgres:5432/hatchet?sslmode=disable"
SERVER_AUTH_COOKIE_DOMAIN: localhost
SERVER_AUTH_COOKIE_INSECURE: "t"
SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
SERVER_GRPC_INSECURE: "t"
SERVER_GRPC_BROADCAST_ADDRESS: localhost:7077
SERVER_GRPC_PORT: "7077"
SERVER_URL: http://localhost:8888
SERVER_AUTH_SET_EMAIL_VERIFIED: "t"
SERVER_DEFAULT_ENGINE_VERSION: "V1"
volumes:
- "hatchet_lite_config:/config"
volumes:
hatchet_lite_postgres_data:
hatchet_lite_config:
hatchet_rabbitmq_data:
hatchet_rabbitmq.conf:
```
Then run `docker-compose -f docker-compose.hatchet.yml up` to get the Hatchet Lite instance running.
### Accessing Hatchet Lite
Once the Hatchet Lite instance is running, you can access the Hatchet Lite UI at [http://localhost:8888](http://localhost:8888).
By default, a user is created with the following credentials:
```
Email: admin@example.com
Password: Admin123!!
```
After logging in, follow the steps in the UI to create your first tenant and run your first workflow!
---
# Docker Compose Deployment
This guide shows how to deploy Hatchet using Docker Compose for a production-ready deployment. If you'd like to get up and running quickly, you can also deploy Hatchet using the `hatchet-lite` image following the tutorial here: [Hatchet Lite Deployment](/self-hosting/hatchet-lite).
This guide uses RabbitMQ as a message broker for Hatchet. This is optional: if you'd like to use Postgres as a message broker, modify the `setup-config` service in the `docker-compose.yml` file with the following env var, and delete all RabbitMQ references:
```sh
SERVER_MSGQUEUE_KIND=postgres
```
## Quickstart
### Prerequisites
This deployment requires [Docker](https://docs.docker.com/engine/install/) installed locally to work.
### Create files
We will be creating a `docker-compose.yml` file in the root of your repository:
```
root/
docker-compose.yml
docker-compose.yml
```
```yaml filename="docker-compose.yml" copy
version: "3.8"
services:
postgres:
image: postgres:15.6
command: postgres -c 'max_connections=1000'
restart: always
hostname: "postgres"
environment:
- POSTGRES_USER=hatchet
- POSTGRES_PASSWORD=hatchet
- POSTGRES_DB=hatchet
ports:
- "5435:5432"
volumes:
- hatchet_postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"]
interval: 10s
timeout: 10s
retries: 5
start_period: 10s
rabbitmq:
image: "rabbitmq:3-management"
hostname: "rabbitmq"
ports:
- "5673:5672" # RabbitMQ
- "15673:15672" # Management UI
environment:
RABBITMQ_DEFAULT_USER: "user"
RABBITMQ_DEFAULT_PASS: "password"
volumes:
- "hatchet_rabbitmq_data:/var/lib/rabbitmq"
- "hatchet_rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf" # Configuration file mount
healthcheck:
test: ["CMD", "rabbitmqctl", "status"]
interval: 10s
timeout: 10s
retries: 5
migration:
image: ghcr.io/hatchet-dev/hatchet/hatchet-migrate:latest
command: /hatchet/hatchet-migrate
environment:
DATABASE_URL: "postgres://hatchet:hatchet@postgres:5432/hatchet"
depends_on:
postgres:
condition: service_healthy
setup-config:
image: ghcr.io/hatchet-dev/hatchet/hatchet-admin:latest
command: /hatchet/hatchet-admin quickstart --skip certs --generated-config-dir /hatchet/config --overwrite=false
environment:
DATABASE_URL: "postgres://hatchet:hatchet@postgres:5432/hatchet"
SERVER_MSGQUEUE_RABBITMQ_URL: amqp://user:password@rabbitmq:5672/
SERVER_AUTH_COOKIE_DOMAIN: localhost:8080
SERVER_AUTH_COOKIE_INSECURE: "t"
SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
SERVER_GRPC_INSECURE: "t"
SERVER_GRPC_BROADCAST_ADDRESS: localhost:7077
SERVER_DEFAULT_ENGINE_VERSION: "V1"
SERVER_INTERNAL_CLIENT_INTERNAL_GRPC_BROADCAST_ADDRESS: hatchet-engine:7070
volumes:
- hatchet_certs:/hatchet/certs
- hatchet_config:/hatchet/config
depends_on:
migration:
condition: service_completed_successfully
rabbitmq:
condition: service_healthy
postgres:
condition: service_healthy
hatchet-engine:
image: ghcr.io/hatchet-dev/hatchet/hatchet-engine:latest
command: /hatchet/hatchet-engine --config /hatchet/config
restart: on-failure
depends_on:
setup-config:
condition: service_completed_successfully
migration:
condition: service_completed_successfully
ports:
- "7077:7070"
environment:
DATABASE_URL: "postgres://hatchet:hatchet@postgres:5432/hatchet"
SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
SERVER_GRPC_INSECURE: "t"
volumes:
- hatchet_certs:/hatchet/certs
- hatchet_config:/hatchet/config
hatchet-dashboard:
image: ghcr.io/hatchet-dev/hatchet/hatchet-dashboard:latest
command: sh ./entrypoint.sh --config /hatchet/config
ports:
- 8080:80
restart: on-failure
depends_on:
setup-config:
condition: service_completed_successfully
migration:
condition: service_completed_successfully
environment:
DATABASE_URL: "postgres://hatchet:hatchet@postgres:5432/hatchet"
volumes:
- hatchet_certs:/hatchet/certs
- hatchet_config:/hatchet/config
volumes:
hatchet_postgres_data:
hatchet_rabbitmq_data:
hatchet_rabbitmq.conf:
hatchet_config:
hatchet_certs:
```
### Get Hatchet up and running
To start the services, run the following command in the root of your repository:
```bash
docker compose up
```
Wait for the `hatchet-engine` and `hatchet-dashboard` services to start.
### Accessing Hatchet
Once the Hatchet instance is running, you can access the Hatchet UI at [http://localhost:8080](http://localhost:8080).
By default, a user is created with the following credentials:
```
Email: admin@example.com
Password: Admin123!!
```
## Run tasks against the Hatchet instance
To run tasks against this instance, you will first need to create an API token for your worker. There are two ways to do this:
1. **Using a CLI command**:
You can run the following command to create a token:
```sh
docker compose run --no-deps setup-config /hatchet/hatchet-admin token create --config /hatchet/config --tenant-id 707d0855-80ab-4e1f-a156-f1c4546cbf52
```
2. **Using the Hatchet dashboard**:
- Log in to the Hatchet dashboard.
- Navigate to the "Settings" page.
- Click on the "API Tokens" tab.
- Click on "Create API Token".
Now that you have an API token, see the guide [here](https://docs.hatchet.run/home/setup) for how to run your first task.
## Repulling images
The docker compose file above uses the `latest` tag for all images. This means that if you want to pull the latest version of the images, you can run the following command:
```bash
docker compose pull
```
## Connecting to the engine from within Docker
If you're also running your worker application inside of `docker-compose`, you should modify the `SERVER_GRPC_BROADCAST_ADDRESS` environment variable in the `setup-config` service to use `hatchet-engine` as the hostname. For example:
```yaml
SERVER_GRPC_BROADCAST_ADDRESS: "hatchet-engine:7077"
```
Make sure your worker depends on hatchet-engine:
```yaml
worker:
depends_on:
hatchet-engine:
condition: service_started
```
> **Info:** **Note:** modifying the GRPC broadcast address or server URL will require
> re-issuing an API token.
## Additional Docker configuration
### Increase Postgres shared memory
By default, containers have a 64 MB shared memory segment (`/dev/shm`). For larger Hatchet deployments this can be too small and may lead to slow queries or an unresponsive dashboard. Increase the shared memory size for the `postgres` service:
```yaml filename="docker-compose.yml" copy
# ...
services:
postgres:
image: postgres:15.6
shm_size: 1g # Increase shared memory (adjust as needed)
command: postgres -c 'max_connections=1000'
restart: always
hostname: "postgres"
environment:
- POSTGRES_USER=hatchet
- POSTGRES_PASSWORD=hatchet
- POSTGRES_DB=hatchet
ports:
- "5435:5432"
volumes:
- hatchet_postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"]
interval: 10s
timeout: 10s
retries: 5
start_period: 10s
# ...
```
---
# Kubernetes Quickstart
## Prerequisites
- A Kubernetes cluster currently set as the current context in `kubectl`
- `kubectl` and `helm` installed
## Quickstart
### Get Hatchet Running
To deploy `hatchet-stack`, run the following commands:
```sh
helm repo add hatchet https://hatchet-dev.github.io/hatchet-charts
helm install hatchet-stack hatchet/hatchet-stack --set caddy.enabled=true
```
This default installation will run the Hatchet server as an internal service in the cluster and spins up a reverse proxy via `Caddy` to get local access. To view the Hatchet server, run the following command:
```sh
export NAMESPACE=default # TODO: replace with your namespace
export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app=caddy" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace $NAMESPACE $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace $NAMESPACE port-forward $POD_NAME 8080:$CONTAINER_PORT
```
And then navigate to `http://localhost:8080` to see the Hatchet frontend running. You can log into Hatchet with the following credentials:
```
Email: admin@example.com
Password: Admin123!!
```
### Port forward to the Hatchet engine
```sh
export NAMESPACE=default # TODO: replace with your namespace
export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app.kubernetes.io/name=engine" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace $NAMESPACE $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace $NAMESPACE port-forward $POD_NAME 7070:$CONTAINER_PORT
```
This will spin up the Hatchet engine service on `localhost:7070` which you can then connect to from the examples.
### Generate an API token
To generate an API token, navigate to the `Settings` tab in the Hatchet frontend and click on the `API Tokens` tab. Click the `Generate API Token` button to create a new token. Store this token somewhere safe.
### Run your first worker
Now that you have an API token, see the guide [here](https://docs.hatchet.run/home/setup) for how to run your first task.
---
# Kubernetes Deployment via Glasskube
## Prerequisites
- A Kubernetes cluster currently set as the current context in `kubectl`
- `docker`, `openssl`, `kubectl` and [`glasskube`](https://glasskube.dev) installed
## What is Glasskube?
[Glasskube](https://glasskube.dev) is an alternative package manager for Kubernetes and part of the CNCF landscape. Glasskube is designed as a Cloud Native application and every installed package is represented by a Custom Resource.
[`glasskube/glasskube`](https://github.com/glasskube/glasskube/) is in active development, with _good first issues_
available for new contributors.
## Quickstart
### Generate encryption keys
There are 4 encryption secrets required for Hatchet to run which can be generated via the following bash script (requires `docker` and `openssl`):
```sh filename=generate.sh copy
#!/bin/bash
# Define an alias for generating random strings. This needs to be a function in a script.
randstring() {
openssl rand -base64 69 | tr -d "\n=+/" | cut -c1-$1
}
# Create keys directory
mkdir -p ./keys
# Function to clean up the keys directory
cleanup() {
rm -rf ./keys
}
# Register the cleanup function to be called on the EXIT signal
trap cleanup EXIT
# Check if Docker is installed
if ! command -v docker &> /dev/null
then
echo "Docker could not be found. Please install Docker."
exit 1
fi
# Generate keysets using Docker
docker run --user $(id -u):$(id -g) -v $(pwd)/keys:/hatchet/keys ghcr.io/hatchet-dev/hatchet/hatchet-admin:latest /hatchet/hatchet-admin keyset create-local-keys --key-dir /hatchet/keys
# Read keysets from files
SERVER_ENCRYPTION_MASTER_KEYSET=$(<./keys/master.key)
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET=$(<./keys/private_ec256.key)
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET=$(<./keys/public_ec256.key)
# Generate the random strings for SERVER_AUTH_COOKIE_SECRETS
SERVER_AUTH_COOKIE_SECRET1=$(randstring 16)
SERVER_AUTH_COOKIE_SECRET2=$(randstring 16)
# Create the YAML file
cat > hatchet-secret.yaml <"
HATCHET_CLIENT_TLS_STRATEGY=none
```
You will need this in the following example.
### Port forward to the Hatchet engine
```sh
export NAMESPACE=hatchet # TODO: change if you modified the namespace
export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app.kubernetes.io/name=hatchet-engine,app.kubernetes.io/instance=hatchet" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace $NAMESPACE $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace $NAMESPACE port-forward $POD_NAME 7070:$CONTAINER_PORT
```
This will spin up the Hatchet engine service on `localhost:7070` which you can then connect to from the examples.
### Generate an API token
To generate an API token, navigate to the `Settings` tab in the Hatchet frontend and click on the `API Tokens` tab. Click the `Generate API Token` button to create a new token. Store this token somewhere safe.
### Run your first worker
Now that you have an API token, see the guide [here](https://docs.hatchet.run/home/setup) for how to run your first task.
---
# Kubernetes Networking
## Overview
By default, the Kubernetes Helm chart does not expose any of the Hatchet services over an ingress. There are three services which can possibly be exposed:
1. `hatchet-engine`
2. `hatchet-stack-api`
3. `hatchet-stack-frontend`
To expose these services, you will need to do the following:
1. Configure ingresses for `frontend` and `engine` services (and optionally the `api` service). We recommend configuring the ingress to reverse proxy `/api` endpoints to the `hatchet-stack-api` service, and configuring a separate ingress to proxy to `hatchet-engine`.
2. Update the following configuration variables:
```yaml
api:
env:
SERVER_AUTH_COOKIE_DOMAIN: "hatchet.example.com" # example.com should be replaced with your domain
SERVER_URL: "https://hatchet.example.com" # example.com should be replaced with your domain
SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
SERVER_GRPC_INSECURE: "false"
SERVER_GRPC_BROADCAST_ADDRESS: "hatchet-engine.example.com:443" # example.com should be replaced with your domain
engine:
env:
SERVER_AUTH_COOKIE_DOMAIN: "hatchet.example.com" # example.com should be replaced with your domain
SERVER_URL: "https://hatchet.example.com" # example.com should be replaced with your domain
SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
SERVER_GRPC_INSECURE: "false"
SERVER_GRPC_BROADCAST_ADDRESS: "engine.hatchet.example.com:443" # example.com should be replaced with your domain
```
## Example: `nginx-ingress`
Let's walk through an example of exposing Hatchet over `hatchet.example.com` (for the API and frontend) and `engine.hatchet.example.com` (for the engine).
We'll be deploying this with SSL enabled, which requires a valid certificate. We recommend using [cert-manager](https://cert-manager.io/docs/) to manage your certificates. This guide assumes that you have a cert-manager `ClusterIssuer` called `letsencrypt-prod` configured.
Here's an example `values.yaml` file for this setup:
```yaml
api:
env:
# TODO: insert these values from the output of the keyset generation command
SERVER_AUTH_COOKIE_SECRETS: "$SERVER_AUTH_COOKIE_SECRET1 $SERVER_AUTH_COOKIE_SECRET2"
SERVER_ENCRYPTION_MASTER_KEYSET: "$SERVER_ENCRYPTION_MASTER_KEYSET"
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET: "$SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET"
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET: "$SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET"
SERVER_AUTH_COOKIE_DOMAIN: "hatchet.example.com" # example.com should be replaced with your domain
SERVER_URL: "https://hatchet.example.com" # example.com should be replaced with your domain
SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
SERVER_GRPC_INSECURE: "false"
SERVER_GRPC_BROADCAST_ADDRESS: "engine.hatchet.example.com:443" # example.com should be replaced with your domain
engine:
env:
# TODO: insert these values from the output of the keyset generation command
SERVER_AUTH_COOKIE_SECRETS: "$SERVER_AUTH_COOKIE_SECRET1 $SERVER_AUTH_COOKIE_SECRET2"
SERVER_ENCRYPTION_MASTER_KEYSET: "$SERVER_ENCRYPTION_MASTER_KEYSET"
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET: "$SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET"
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET: "$SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET"
SERVER_AUTH_COOKIE_DOMAIN: "hatchet.example.com" # example.com should be replaced with your domain
SERVER_URL: "https://hatchet.example.com" # example.com should be replaced with your domain
SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
SERVER_GRPC_INSECURE: "false"
SERVER_GRPC_BROADCAST_ADDRESS: "engine.hatchet.example.com:443" # example.com should be replaced with your domain
ingress:
enabled: true
ingressClassName: nginx
labels: {}
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/auth-tls-verify-client: "optional"
nginx.ingress.kubernetes.io/auth-tls-secret: "${kubernetes_namespace.cloud.metadata[0].name}/engine-cert"
nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/grpc-backend: "true"
nginx.ingress.kubernetes.io/server-snippet: |
grpc_read_timeout 1d;
grpc_send_timeout 1h;
client_header_timeout 1h;
client_body_timeout 1h;
hosts:
- host: engine.hatchet.example.com
paths:
- path: /
backend:
serviceName: hatchet-engine
servicePort: 7070
tls:
- hosts:
- engine.hatchet.example.com
secretName: engine-cert
servicePort: 7070
frontend:
ingress:
enabled: true
ingressClassName: nginx
labels: {}
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: 50m
nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: hatchet.example.com
paths:
- path: /api
backend:
serviceName: hatchet-api
servicePort: 8080
- path: /
backend:
serviceName: hatchet-frontend
servicePort: 8080
tls:
- secretName: hatchet-api
hosts:
- hatchet.example.com
```
---
# Configuring the Helm Chart
## Shared Config
For the `hatchet-stack` and `hatchet-ha` Helm charts, the `sharedConfig` object in the `values.yaml` file allows you to configure shared settings for all backend services. The default values are:
```yaml
sharedConfig:
# you can disable shared config by setting this to false
enabled: true
# these are the most commonly configured values
serverUrl: "http://localhost:8080"
serverAuthCookieDomain: "localhost:8080" # the domain for the auth cookie
serverAuthCookieInsecure: "t" # allows cookies to be set over http
serverAuthSetEmailVerified: "t" # automatically sets email_verified to true for all users
serverAuthBasicAuthEnabled: "t" # allows login via basic auth (email/password)
grpcBroadcastAddress: "localhost:7070" # the endpoint for the gRPC server, exposed via the `grpc` service
grpcInsecure: "true" # allows gRPC to be served over http
defaultAdminEmail: "admin@example.com" # in exposed/production environments, change this to a valid email
defaultAdminPassword: "Admin123!!" # in exposed/production environments, change this to a secure password
# you can set additional environment variables here, which will override any defaults
env: {}
```
### Networking
- **`sharedConfig.serverUrl`** (default: `"http://localhost:8080"`): specifies the base URL for the server. This URL should be the public-facing URL of the Hatchet API server (which is typically bundled behind a reverse proxy with the Hatchet frontend).
- **`sharedConfig.grpcBroadcastAddress`** (default: `"localhost:7070"`): defines the address for the gRPC server endpoint, which is exposed via the `grpc` service.
- **`sharedConfig.grpcInsecure`** (default: `"true"`): when set to `true`, allows the gRPC server to be served over HTTP instead of HTTPS. Use this in non-production environments only.
### Authentication
- **`sharedConfig.serverAuthCookieDomain`** (default: `"localhost:8080"`): specifies the domain for the authentication cookie. Should be set to the appropriate domain when deploying to production.
- **`sharedConfig.serverAuthCookieInsecure`** (default: `"t"`): if set to `"t"`, allows authentication cookies to be set over HTTP, useful for local development. In production, use a secure setting.
- **`sharedConfig.serverAuthSetEmailVerified`** (default: `"t"`): automatically sets `email_verified` to `true` for all users. This is useful for testing environments where email verification is not necessary.
- **`sharedConfig.serverAuthBasicAuthEnabled`** (default: `"t"`): enables basic authentication (using email and password) for users. Should be enabled if the system needs to support user logins via email/password.
- **`sharedConfig.defaultAdminEmail`** (default: `"admin@example.com"`): specifies the email for the default administrator account. Change this to a valid email when deploying to production environments.
- **`sharedConfig.defaultAdminPassword`** (default: `"Admin123!!"`): defines the password for the default administrator account. This should be changed to a strong password for production deployments.
### Additional Env Variables
You can set additional environment variables for the backend services using the `env` object. For example:
```yaml
sharedConfig:
env:
MY_ENV_VAR: "my-value"
```
This will set the environment variable `MY_ENV_VAR` to `"my-value"` for all backend services. These values will override any default environment settings for the services.
### Seeding Data
The `sharedConfig` object also allows you to seed the database with a default tenant and user. The following environment variables are used for seeding:
````yaml
The following environment variables are used to seed the database:
```yaml
seed:
defaultAdminEmail: "admin@example.com" # in exposed/production environments, change this to a valid email
defaultAdminPassword: "Admin123!!" # in exposed/production environments, change this to a secure password
env:
ADMIN_NAME: "Admin User"
DEFAULT_TENANT_NAME: "Default"
DEFAULT_TENANT_SLUG: "default"
DEFAULT_TENANT_ID: "707d0855-80ab-4e1f-a156-f1c4546cbf52"
````
---
# Setting up Hatchet with an external database
## Connecting to Postgres
To connect to an external Postgres instance, set `postgres.enabled` to `false` in the `values.yaml` file. This will disable the internal Postgres instance and allow you to connect to an external database. You should then add the following configuration for the `hatchet-stack` or `hatchet-ha` charts:
> Note: Either `DATABASE_URL` or `DATABASE_POSTGRES_*` are required
```yaml
sharedConfig:
env:
DATABASE_URL: "postgres://:@:5432/?sslmode=disable"
DATABASE_POSTGRES_HOST: ""
DATABASE_POSTGRES_PORT: "5432"
DATABASE_POSTGRES_USERNAME: ""
DATABASE_POSTGRES_PASSWORD: ""
DATABASE_POSTGRES_DB_NAME: ""
DATABASE_POSTGRES_SSL_MODE: "disable"
```
## Mounting environment variables
Environment variables can also be mounted from secrets or configmaps via the `deploymentEnvFrom` field, which corresponds to the `envFrom` field in a Kubernetes deployment. For example, to mount the `DATABASE_URL` environment variable from a secret, you can use the following configuration:
```yaml
hatchet-api:
deploymentEnvFrom:
- secretRef:
name: hatchet-api-secrets
key: DATABASE_URL
hatchet-engine:
deploymentEnvFrom:
- secretRef:
name: hatchet-api-secrets
key: DATABASE_URL
```
For more information on mounting environment variables from secrets, refer to the [Kubernetes documentation](https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables).
## Migrations
In order for migrations to run, the database user requires permissions to write and modify schemas **on a clean database**. It is therefore recommended to create a separate database instance where Hatchet can run and grant permissions on this database to the Hatchet user. For example, to create a new database and user `hatchet` in Postgres, run the following commands (**warning:** change the username/password for production usage):
```sql
create database hatchet;
create role hatchet
with
login password 'hatchet';
grant hatchet to postgres;
alter database hatchet owner to hatchet;
```
---
# High Availability
If you are running Hatchet in a high-throughput production environment, you may want to set up an HA (High Availability) configuration to ensure that your system remains available in the event of infrastructure failures or other issues.
There are multiple levels that you can configure Hatchet to be high availability:
- At the **database level** by using a managed Postgres provider like AWS RDS or Google Cloud SQL which supports HA options.
- At the **RabbitMQ level** by configuring the RabbitMQ cluster to have at least 3 replicas across multiple zones within a region.
- At the **Hatchet Engine/API level** by running multiple instances of the Hatchet engine behind a load balancer and splitting the different Hatchet services into separate deployments.
This guide will focus on the last level of high availability.
To view an end-to-end example of configuring Hatchet for high availability on GCP using Terraform, check out the GCP deployment guide [here](https://github.com/hatchet-dev/hatchet-infra-examples/blob/main/self-hosting/gcp)
## HA Helm Chart
Hatchet offers an HA Helm chart that can be used to deploy Hatchet in a high availability configuration. To use this Helm chart:
```sh
helm repo add hatchet https://hatchet-dev.github.io/hatchet-charts
helm install hatchet-ha hatchet/hatchet-ha
```
This chart accepts the same parameters as `hatchet-stack` for the top-level `api`, `frontend`, `postgres` and `rabbitmq` objects, but you can additionally configure the following services:
```yaml
grpc:
replicaCount: 4
controllers:
replicaCount: 2
scheduler:
replicaCount: 2
```
See the [Helm configuration](./kubernetes-helm-configuration) guide for more information on configuring the Hatchet Helm charts.
---
# Configuration Options
The Hatchet server and engine can be configured via environment variables using several prefixes. This document contains a comprehensive list of all 197+ available options organized by component.
## Environment Variable Prefixes
Hatchet uses the following environment variable prefixes:
- **`SERVER_`** (172 variables) - Main server configuration including runtime, authentication, encryption, monitoring, and integrations
- **`DATABASE_`** (19 variables) - PostgreSQL database connection and configuration
- **`READ_REPLICA_`** (4 variables) - Read replica database configuration
- **`ADMIN_`** (3 variables) - Administrator user setup for initial seeding
- **`DEFAULT_`** (3 variables) - Default tenant configuration
- **`SCHEDULER_`** (1 variable) - Scheduler-specific rate limiting
- **`SEED_`** (1 variable) - Development environment seeding
- **`CACHE_`** (1 variable) - Cache duration settings
_Note: This documentation excludes `HATCHET*CLIENT*_` variables which are specific to Go SDK client configuration.\*
## Required Environment Variables
The following variables are **absolutely required** for Hatchet to start successfully:
### Encryption Keys (Required - Choose One Strategy)
**Option A: Local Encryption Keys**
```bash
SERVER_ENCRYPTION_MASTER_KEYSET=""
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET=""
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET=""
```
**Option B: File-based Keys**
```bash
SERVER_ENCRYPTION_MASTER_KEYSET_FILE="/path/to/master.keyset"
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET_FILE="/path/to/jwt-public.keyset"
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET_FILE="/path/to/jwt-private.keyset"
```
**Option C: Google Cloud KMS**
```bash
SERVER_ENCRYPTION_CLOUDKMS_ENABLED=true
SERVER_ENCRYPTION_CLOUDKMS_KEY_URI="gcp-kms://your-key-uri"
SERVER_ENCRYPTION_CLOUDKMS_CREDENTIALS_JSON=""
```
### Authentication Secrets (Required)
```bash
SERVER_AUTH_COOKIE_SECRETS=""
```
### Database Connection (Required)
**Option A: Connection String**
```bash
DATABASE_URL="postgresql://user:password@host:port/dbname"
```
**Option B: Individual Parameters** (uses defaults if not specified)
```bash
DATABASE_POSTGRES_HOST=your-postgres-host
DATABASE_POSTGRES_PASSWORD=your-secure-password
```
## Minimal Configuration Example
> **Warning:** This example is for local development when Hatchet connects to PostgreSQL
> running on the same host. For Docker Compose deployments, use your database
> service name, such as `postgres`, instead of `127.0.0.1`. See the [Docker
> Compose deployment guide](/self-hosting/docker-compose).
```bash
# Database
DATABASE_URL='postgresql://hatchet:hatchet@127.0.0.1:5431/hatchet'
# Encryption (using key files - recommended for development)
SERVER_ENCRYPTION_MASTER_KEYSET_FILE=./keys/master.key
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET_FILE=./keys/private_ec256.key
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET_FILE=./keys/public_ec256.key
# Authentication
SERVER_AUTH_COOKIE_SECRETS="your-secret-key-1 your-secret-key-2"
SERVER_AUTH_SET_EMAIL_VERIFIED=true
# Basic server config
SERVER_PORT=8080
SERVER_URL=http://localhost:8080
# Development settings (optional but recommended)
SERVER_GRPC_INSECURE=true
SERVER_INTERNAL_CLIENT_BASE_STRATEGY=none
SERVER_LOGGER_LEVEL=error
SERVER_LOGGER_FORMAT=console
DATABASE_LOGGER_LEVEL=error
DATABASE_LOGGER_FORMAT=console
```
Generate encryption keys with:
```bash
go run ./cmd/hatchet-admin keyset create-local-keys --key-dir ./keys
```
## Runtime Configuration
Variables marked with ⚠️ are conditionally required when specific features are enabled.
Variable, Description, Default Value
`SERVER_PORT`, Port for the core server, `8080`
`SERVER_URL`, Full server URL, including protocol, `http://localhost:8080`
`SERVER_GRPC_PORT`, Port for the GRPC service, `7070`
`SERVER_GRPC_BIND_ADDRESS`, GRPC server bind address, `127.0.0.1`
`SERVER_GRPC_BROADCAST_ADDRESS`, GRPC server broadcast address, `127.0.0.1:7070`
`SERVER_GRPC_INSECURE`, Controls if the GRPC server is insecure, `false`
`SERVER_ENFORCE_LIMITS`, Enforce tenant limits, `false`
`SERVER_ALLOW_SIGNUP`, Allow new tenant signups, `true`
`SERVER_ALLOW_INVITES`, Allow new invites, `true`
`SERVER_ALLOW_CREATE_TENANT`, Allow tenant creation, `true`
`SERVER_ALLOW_CHANGE_PASSWORD`, Allow password changes, `true`
`SERVER_HEALTHCHECK`, Enable healthcheck endpoint, `true`
`SERVER_HEALTHCHECK_PORT`, Healthcheck port, `8733`
`SERVER_GRPC_MAX_MSG_SIZE`, gRPC max message size, `4194304`
`SERVER_GRPC_RATE_LIMIT`, gRPC rate limit, `1000`
`SCHEDULER_CONCURRENCY_RATE_LIMIT`, Scheduler concurrency rate limit, `20`
`SCHEDULER_CONCURRENCY_POLLING_MIN_INTERVAL`, Minimum concurrency polling interval, `500ms`
`SCHEDULER_CONCURRENCY_POLLING_MAX_INTERVAL`, Maximum concurrency polling interval, `5s`
`SCHEDULER_ADVISORY_LOCK_TIMEOUT`, Timeout for in-memory advisory lock, `5s`
`SERVER_SERVICES`, Services to run, `["all"]`
`SERVER_PAUSED_CONTROLLERS`, Paused controllers
`SERVER_ENABLE_DATA_RETENTION`, Enable data retention, `true`
`SERVER_ENABLE_WORKER_RETENTION`, Enable worker retention, `false`
`SERVER_MAX_PENDING_INVITES`, Max pending invites, `100`
`SERVER_DISABLE_TENANT_PUBS`, Disable tenant pubsub
`SERVER_MAX_INTERNAL_RETRY_COUNT`, Max internal retry count, `10`
`SERVER_PREVENT_TENANT_VERSION_UPGRADE`, Prevent tenant version upgrades, `false`
`SERVER_DEFAULT_ENGINE_VERSION`, Default engine version, `V1`
`SERVER_REPLAY_ENABLED`, Enable task replay, `true`
## Database Configuration
> **Info:** In Docker Compose deployments, use the database service name in `DATABASE_URL`
> rather than `127.0.0.1`. Inside a container, `127.0.0.1` refers to the
> container itself. The localhost defaults shown in this section are intended
> for local development on the same host.
Variable, Description, Default Value
`DATABASE_URL`, PostgreSQL connection string constructed from database settings if unset
`DATABASE_POSTGRES_HOST`, PostgreSQL host, `127.0.0.1`
`DATABASE_POSTGRES_PORT`, PostgreSQL port, `5431`
`DATABASE_POSTGRES_USERNAME`, PostgreSQL username, `hatchet`
`DATABASE_POSTGRES_PASSWORD`, PostgreSQL password, `hatchet`
`DATABASE_POSTGRES_DB_NAME`, PostgreSQL database name, `hatchet`
`DATABASE_POSTGRES_SSL_MODE`, PostgreSQL SSL mode, `disable`
`DATABASE_MAX_CONNS`, Max database connections, `50`
`DATABASE_MIN_CONNS`, Min database connections, `10`
`DATABASE_MAX_QUEUE_CONNS`, Max queue connections, `50`
`DATABASE_MIN_QUEUE_CONNS`, Min queue connections, `10`
`DATABASE_MAX_CONN_LIFETIME`, Max lifetime of a connection, `15m`
`DATABASE_MAX_CONN_IDLE_TIME`, Max time a connection can be idle before being closed, `1m`
`DATABASE_LOG_QUERIES`, Log database queries, `false`
`DATABASE_PGBOUNCER_ENABLED`, Enable pgbouncer support; requires `DATABASE_DIRECT_URL` to be set, `false`
`DATABASE_DIRECT_URL`, Direct PostgreSQL connection string bypassing pgbouncer for DDL operations
`DATABASE_DIRECT_MAX_CONNS`, Max connections for the direct (non-pgbouncer) pool, `2`
`DATABASE_DIRECT_MIN_CONNS`, Min connections for the direct (non-pgbouncer) pool, `1`
`CACHE_DURATION`, Cache duration, `5s`
`ADMIN_EMAIL`, Admin email for seeding, `admin@example.com`
`ADMIN_PASSWORD`, Admin password for seeding, `Admin123!!`
`ADMIN_NAME`, Admin name for seeding, `Admin`
`DEFAULT_TENANT_NAME`, Default tenant name, `Default`
`DEFAULT_TENANT_SLUG`, Default tenant slug, `default`
`DEFAULT_TENANT_ID`, Default tenant ID
`SEED_DEVELOPMENT`, Development seeding flag
`READ_REPLICA_ENABLED`, Enable read replica, `false`
`READ_REPLICA_DATABASE_URL`, Read replica database URL
`READ_REPLICA_MAX_CONNS`, Read replica max connections, `50`
`READ_REPLICA_MIN_CONNS`, Read replica min connections, `10`
`DATABASE_LOGGER_LEVEL`, Database logger level
`DATABASE_LOGGER_FORMAT`, Database logger format
## Security Check Configuration
Variable, Description, Default Value
`SERVER_SECURITY_CHECK_ENABLED`, Enable security check, `true`
`SERVER_SECURITY_CHECK_ENDPOINT`, Security check endpoint, `https://security.hatchet.run`
## Limit Configuration
Variable, Description, Default Value
`SERVER_LIMITS_DEFAULT_TENANT_RETENTION_PERIOD`, Default tenant retention period, `720h`
`SERVER_LIMITS_DEFAULT_WORKER_LIMIT`, Default worker limit, `4`
`SERVER_LIMITS_DEFAULT_WORKER_ALARM_LIMIT`, Default worker alarm limit, `2`
`SERVER_LIMITS_DEFAULT_EVENT_LIMIT`, Default event limit, `1000`
`SERVER_LIMITS_DEFAULT_EVENT_ALARM_LIMIT`, Default event alarm limit, `750`
`SERVER_LIMITS_DEFAULT_EVENT_WINDOW`, Default event window, `24h`
`SERVER_LIMITS_DEFAULT_CRON_LIMIT`, Default cron limit, `5`
`SERVER_LIMITS_DEFAULT_CRON_ALARM_LIMIT`, Default cron alarm limit, `2`
`SERVER_LIMITS_DEFAULT_SCHEDULE_LIMIT`, Default schedule limit, `1000`
`SERVER_LIMITS_DEFAULT_SCHEDULE_ALARM_LIMIT`, Default schedule alarm limit, `750`
`SERVER_LIMITS_DEFAULT_TASK_RUN_LIMIT`, Default task run limit, `2000`
`SERVER_LIMITS_DEFAULT_TASK_RUN_ALARM_LIMIT`, Default task run alarm limit, `1500`
`SERVER_LIMITS_DEFAULT_TASK_RUN_WINDOW`, Default task run window, `24h`
`SERVER_LIMITS_DEFAULT_WORKER_SLOT_LIMIT`, Default worker slot limit, `4000`
`SERVER_LIMITS_DEFAULT_WORKER_SLOT_ALARM_LIMIT`, Default worker slot alarm limit, `3000`
## Alerting Configuration
Variable, Description, Default Value
`SERVER_ALERTING_SENTRY_ENABLED`, Enable Sentry for alerting
`SERVER_ALERTING_SENTRY_DSN`, Sentry DSN
`SERVER_ALERTING_SENTRY_ENVIRONMENT`, Sentry environment, `development`
`SERVER_ALERTING_SENTRY_SAMPLE_RATE`, Sentry sample rate, `1.0`
`SERVER_ANALYTICS_POSTHOG_ENABLED`, Enable PostHog analytics
`SERVER_ANALYTICS_POSTHOG_API_KEY`, PostHog API key
`SERVER_ANALYTICS_POSTHOG_ENDPOINT`, PostHog endpoint
`SERVER_ANALYTICS_POSTHOG_FE_API_HOST`, PostHog frontend API host
`SERVER_ANALYTICS_POSTHOG_FE_API_KEY`, PostHog frontend API key
`SERVER_PYLON_ENABLED`, Enable Pylon
`SERVER_PYLON_APP_ID` ⚠️, Pylon app ID (required if Pylon enabled)
`SERVER_PYLON_SECRET`, Pylon secret
## Encryption Configuration
Variable, Description, Default Value
`SERVER_ENCRYPTION_MASTER_KEYSET`, Raw master keyset, base64-encoded JSON string
`SERVER_ENCRYPTION_MASTER_KEYSET_FILE`, Path to the master keyset file
`SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET`, Public JWT keyset, base64-encoded JSON string
`SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET_FILE`, Path to the public JWT keyset file
`SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET`, Private JWT keyset, base64-encoded JSON string
`SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET_FILE`, Path to the private JWT keyset file
`SERVER_ENCRYPTION_CLOUDKMS_ENABLED`, Whether Google Cloud KMS is enabled, `false`
`SERVER_ENCRYPTION_CLOUDKMS_KEY_URI`, URI of the key in Google Cloud KMS
`SERVER_ENCRYPTION_CLOUDKMS_CREDENTIALS_JSON`, JSON credentials for Google Cloud KMS
## Authentication Configuration
Variable, Description, Default Value
`SERVER_AUTH_RESTRICTED_EMAIL_DOMAINS`, Restricted email domains
`SERVER_AUTH_BASIC_AUTH_ENABLED`, Whether basic auth is enabled, `true`
`SERVER_AUTH_SET_EMAIL_VERIFIED`, Whether the user's email is set to verified automatically, `false`
`SERVER_AUTH_COOKIE_NAME`, Name of the cookie, `hatchet`
`SERVER_AUTH_COOKIE_DOMAIN`, Domain for the cookie
`SERVER_AUTH_COOKIE_SECRETS`, Cookie secrets
`SERVER_AUTH_COOKIE_INSECURE`, Whether the cookie is insecure, `false`
`SERVER_AUTH_GOOGLE_ENABLED`, Whether Google auth is enabled, `false`
`SERVER_AUTH_GOOGLE_CLIENT_ID` ⚠️, Google auth client ID (required if Google auth enabled)
`SERVER_AUTH_GOOGLE_CLIENT_SECRET` ⚠️, Google auth client secret (required if Google auth enabled)
`SERVER_AUTH_GOOGLE_SCOPES`, Google auth scopes, `["openid", "profile", "email"]`
`SERVER_AUTH_GITHUB_ENABLED`, Whether GitHub auth is enabled, `false`
`SERVER_AUTH_GITHUB_CLIENT_ID` ⚠️, GitHub auth client ID (required if GitHub auth enabled)
`SERVER_AUTH_GITHUB_CLIENT_SECRET` ⚠️, GitHub auth client secret (required if GitHub auth enabled)
`SERVER_AUTH_GITHUB_SCOPES`, GitHub auth scopes, `["read:user", "user:email"]`
## Task Queue Configuration
Variable, Description, Default Value
`SERVER_MSGQUEUE_KIND`, Message queue kind, `rabbitmq`
`SERVER_MSGQUEUE_RABBITMQ_URL`, RabbitMQ URL
`SERVER_MSGQUEUE_RABBITMQ_QOS`, RabbitMQ QoS, `100`
`SERVER_REQUEUE_LIMIT`, Requeue limit, `100`
`SERVER_SINGLE_QUEUE_LIMIT`, Single queue limit, `100`
`SERVER_UPDATE_HASH_FACTOR`, Update hash factor, `100`
`SERVER_UPDATE_CONCURRENT_FACTOR`, Update concurrent factor, `10`
## TLS Configuration
Variable, Description, Default Value
`SERVER_TLS_STRATEGY`, TLS strategy
`SERVER_TLS_CERT`, TLS certificate
`SERVER_TLS_CERT_FILE`, Path to the TLS certificate file
`SERVER_TLS_KEY`, TLS key
`SERVER_TLS_KEY_FILE`, Path to the TLS key file
`SERVER_TLS_ROOT_CA`, TLS root CA
`SERVER_TLS_ROOT_CA_FILE`, Path to the TLS root CA file
`SERVER_TLS_SERVER_NAME`, TLS server name
`SERVER_INTERNAL_CLIENT_BASE_STRATEGY`, Internal client TLS strategy
`SERVER_INTERNAL_CLIENT_BASE_INHERIT_BASE`, Inherit base TLS config, `true`
`SERVER_INTERNAL_CLIENT_TLS_BASE_CERT`, Internal client TLS cert
`SERVER_INTERNAL_CLIENT_TLS_BASE_CERT_FILE`, Internal client TLS cert file
`SERVER_INTERNAL_CLIENT_TLS_BASE_KEY`, Internal client TLS key
`SERVER_INTERNAL_CLIENT_TLS_BASE_KEY_FILE`, Internal client TLS key file
`SERVER_INTERNAL_CLIENT_TLS_BASE_ROOT_CA`, Internal client TLS root CA
`SERVER_INTERNAL_CLIENT_TLS_BASE_ROOT_CA_FILE`, Internal client TLS root CA file
`SERVER_INTERNAL_CLIENT_TLS_SERVER_NAME`, Internal client TLS server name
`SERVER_INTERNAL_CLIENT_INTERNAL_GRPC_BROADCAST_ADDRESS`, Internal gRPC broadcast address
## Logging Configuration
Variable, Description, Default Value
`SERVER_LOGGER_LEVEL`, Logger level
`SERVER_LOGGER_FORMAT`, Logger format
`SERVER_LOG_INGESTION_ENABLED`, Enable log ingestion, `true`
`SERVER_ADDITIONAL_LOGGERS_QUEUE_LEVEL`, Queue logger level
`SERVER_ADDITIONAL_LOGGERS_QUEUE_FORMAT`, Queue logger format
`SERVER_ADDITIONAL_LOGGERS_PGXSTATS_LEVEL`, PGX stats logger level
`SERVER_ADDITIONAL_LOGGERS_PGXSTATS_FORMAT`, PGX stats logger format
## OpenTelemetry Configuration
Variable, Description, Default Value
`SERVER_OTEL_SERVICE_NAME`, Service name for OpenTelemetry
`SERVER_OTEL_COLLECTOR_URL`, Collector URL for OpenTelemetry
`SERVER_OTEL_INSECURE`, Whether to use an insecure connection to the collector URL
`SERVER_OTEL_TRACE_ID_RATIO`, OpenTelemetry trace ID ratio
`SERVER_OTEL_COLLECTOR_AUTH`, OpenTelemetry Collector Authorization header value
`SERVER_OTEL_METRICS_ENABLED`, Enable OpenTelemetry metrics collection, `false`
`SERVER_PROMETHEUS_ENABLED`, Enable Prometheus, `false`
`SERVER_PROMETHEUS_ADDRESS`, Prometheus address, `:9090`
`SERVER_PROMETHEUS_PATH`, Prometheus metrics path, `/metrics`
`SERVER_PROMETHEUS_SERVER_URL`, Prometheus server URL
`SERVER_PROMETHEUS_SERVER_USERNAME`, Prometheus server username
`SERVER_PROMETHEUS_SERVER_PASSWORD`, Prometheus server password
## Tenant Alerting Configuration
Variable, Description, Default Value
`SERVER_TENANT_ALERTING_SLACK_ENABLED`, Enable Slack for tenant alerting
`SERVER_TENANT_ALERTING_SLACK_CLIENT_ID`, Slack client ID
`SERVER_TENANT_ALERTING_SLACK_CLIENT_SECRET`, Slack client secret
`SERVER_TENANT_ALERTING_SLACK_SCOPES`, Slack scopes, `["incoming-webhook"]`
`SERVER_EMAIL_KIND`, Email integration kind, `postmark`
`SERVER_EMAIL_POSTMARK_ENABLED`, Enable Postmark
`SERVER_EMAIL_POSTMARK_SERVER_KEY`, Postmark server key
`SERVER_EMAIL_POSTMARK_FROM_EMAIL`, Postmark from email
`SERVER_EMAIL_POSTMARK_FROM_NAME`, Postmark from name, `Hatchet Support`
`SERVER_EMAIL_POSTMARK_SUPPORT_EMAIL`, Postmark support email
`SERVER_EMAIL_SMTP_ENABLED`, Enable SMTP
`SERVER_EMAIL_SMTP_SERVER_ADDR`, SMTP server address
`SERVER_EMAIL_SMTP_FROM_EMAIL`, SMTP from email
`SERVER_EMAIL_SMTP_FROM_NAME`, SMTP from name, `Hatchet Support`
`SERVER_EMAIL_SMTP_SUPPORT_EMAIL`, SMTP support email
`SERVER_EMAIL_SMTP_AUTH_USERNAME`, SMTP authentication username
`SERVER_EMAIL_SMTP_AUTH_PASSWORD`, SMTP authentication password
`SERVER_MONITORING_ENABLED`, Enable monitoring, `true`
`SERVER_MONITORING_PERMITTED_TENANTS`, Permitted tenants for monitoring
`SERVER_MONITORING_PROBE_TIMEOUT`, Monitoring probe timeout, `30s`
`SERVER_MONITORING_TLS_ROOT_CA_FILE`, Monitoring TLS root CA file
`SERVER_SAMPLING_ENABLED`, Enable sampling, `false`
`SERVER_SAMPLING_RATE`, Sampling rate, `1.0`
`SERVER_OPERATIONS_JITTER`, Operations jitter in milliseconds, `0`
`SERVER_OPERATIONS_POLL_INTERVAL`, Operations poll interval in seconds, `2`
## Cron Operations Configuration
Variable, Description, Default Value
`SERVER_CRON_OPERATIONS_TASK_ANALYZE_CRON_INTERVAL`, Interval for running ANALYZE on task-related tables, `3h`
`SERVER_CRON_OPERATIONS_OLAP_ANALYZE_CRON_INTERVAL`, Interval for running ANALYZE on OLAP/analytics tables, `3h`
`SERVER_CRON_OPERATIONS_DB_HEALTH_METRICS_INTERVAL`, Interval for collecting database health metrics (OTel), `60s`
`SERVER_CRON_OPERATIONS_OLAP_METRICS_INTERVAL`, Interval for collecting OLAP metrics (OTel), `5m`
`SERVER_CRON_OPERATIONS_WORKER_METRICS_INTERVAL`, Interval for collecting worker metrics (OTel), `60s`
`SERVER_CRON_OPERATIONS_YESTERDAY_RUN_COUNT_HOUR`, Hour (0-23) at which to collect yesterday's workflow run count (OTel), `0`
`SERVER_CRON_OPERATIONS_YESTERDAY_RUN_COUNT_MINUTE`, Minute (0-59) at which to collect yesterday's workflow run count, `5`
`SERVER_WAIT_FOR_FLUSH`, Default wait for flush, `1ms`
`SERVER_MAX_CONCURRENT`, Default max concurrent, `50`
`SERVER_FLUSH_PERIOD_MILLISECONDS`, Default flush period, `10ms`
`SERVER_FLUSH_ITEMS_THRESHOLD`, Default flush threshold, `100`
`SERVER_FLUSH_STRATEGY`, Default flush strategy, `DYNAMIC`
## OLAP Database Configuration
Variable, Description, Default Value
`SERVER_OLAP_STATUS_UPDATE_DAG_BATCH_SIZE_LIMIT`, Batch size limit for running DAG status updates, `1000`
`SERVER_OLAP_STATUS_UPDATE_TASK_BATCH_SIZE_LIMIT`, Batch size limit for running task status updates, `1000`
---
## Prometheus Metrics for Hatchet
> **Warning:** Only works with v1 tenants
This document provides an overview of the Prometheus metrics exposed by Hatchet, setup instructions for the metrics endpoint, and example PromQL queries to analyze them.
### Setup
To enable Prometheus metrics for your Hatchet instance, you can set the following environment variables. The corresponding configuration YAML values are mentioned in parantheses. If you are deploying [Hatchet in HA mode](/self-hosting/high-availability), these should be set on both the `controllers` as well as `scheduler` deployments.
- Required
- **`SERVER_PROMETHEUS_ENABLED`** (`prometheus.enabled`)
- Default: `false`
- Description: Enables or disables the Prometheus metrics HTTP server.
- Optional
- **`SERVER_PROMETHEUS_ADDRESS`** (`prometheus.address`)
- Default: `":9090"`
- Description: The network address and port to bind the Prometheus metrics server to.
- **`SERVER_PROMETHEUS_PATH`** (`prometheus.path`)
- Default: `"/metrics"`
- Description: The HTTP path at which metrics will be exposed.
Once enabled, you can setup any scraper that supports ingesting Prometheus metrics.
#### Tenant metrics endpoint
> **Info:** This step requires communication with a service that scrapes Hatchet
> Prometheus metrics.
To enable the [tenant API endpoint](/v1/prometheus-metrics) you can set the following environment variables:
- Required
- **`SERVER_PROMETHEUS_SERVER_URL`** (`prometheus.prometheusServerURL`)
- Description: The Prometheus server URL.
- Optional
- **`SERVER_PROMETHEUS_SERVER_USERNAME`** (`prometheus.prometheusServerUsername`)
- Description: The username to access the Prometheus instance via HTTP basic auth.
- **`SERVER_PROMETHEUS_SERVER_PASSWORD`** (`prometheus.prometheusServerPassword`)
- Description: The password to access the Prometheus instance via HTTP basic auth.
**Example environment setup:**
```bash
export SERVER_PROMETHEUS_ENABLED=true
export SERVER_PROMETHEUS_ADDRESS=":9999"
export SERVER_PROMETHEUS_PATH="/custom-metrics"
```
Restart your Hatchet server after setting these variables to apply the changes.
---
### Global Metrics
Metric Name, Type, Description
`hatchet_queue_invocations_total`, Counter, The total number of invocations of the queuer function
`hatchet_created_tasks_total`, Counter, The total number of tasks created
`hatchet_retried_tasks_total`, Counter, The total number of tasks retried
`hatchet_succeeded_tasks_total`, Counter, The total number of tasks that succeeded
`hatchet_failed_tasks_total`, Counter, The total number of tasks that failed (in a final state, not including retries)
`hatchet_skipped_tasks_total`, Counter, The total number of tasks that were skipped
`hatchet_cancelled_tasks_total`, Counter, The total number of tasks cancelled
`hatchet_assigned_tasks_total`, Counter, The total number of tasks assigned to a worker
`hatchet_scheduling_timed_out_total`, Counter, The total number of tasks that timed out while waiting to be scheduled
`hatchet_rate_limited_total`, Counter, The total number of tasks that were rate limited
`hatchet_queued_to_assigned_total`, Counter, The total number of unique tasks that were queued and later assigned to a worker
`hatchet_queued_to_assigned_seconds`, Histogram, Buckets of time (in seconds) spent in the queue before being assigned to a worker
`hatchet_reassigned_tasks_total`, Counter, The total number of tasks that were reassigned to a worker
#### Example PromQL Queries
##### 1. Rate of calls to the queuer method
```promql
rate(hatchet_queue_invocations_total[5m])
```
##### 2. Average queue time in milliseconds
```promql
# Calculates average queue time over the past 5 minutes, converted to ms
rate(hatchet_queued_to_assigned_seconds_sum[5m])
/ rate(hatchet_queued_to_assigned_seconds_count[5m])
* 1e3
```
##### 3. Success and failure rates
```promql
rate(hatchet_succeeded_tasks_total[5m])
rate(hatchet_failed_tasks_total[5m])
```
##### 4. Queue time distribution (histogram)
```promql
sum by (le) (
rate(hatchet_queued_to_assigned_seconds_bucket[5m])
)
```
##### 5. Rate of tasks created vs. retried
```promql
rate(hatchet_created_tasks_total[5m])
rate(hatchet_retried_tasks_total[5m])
```
##### 6. Task Assignment Rate
```promql
rate(hatchet_assigned_tasks_total[5m])
```
##### 7. Scheduling Timeout Rate
```promql
rate(hatchet_scheduling_timed_out_total[5m])
```
##### 8. Rate Limiting Impact
```promql
rate(hatchet_rate_limited_total[5m])
```
##### 9. Task Completion Ratio (Success vs Total)
```promql
rate(hatchet_succeeded_tasks_total[5m])
/
(rate(hatchet_succeeded_tasks_total[5m]) + rate(hatchet_failed_tasks_total[5m]))
```
##### 10. Task Cancellation Rate
```promql
rate(hatchet_cancelled_tasks_total[5m])
```
##### 11. Task Skip Rate
```promql
rate(hatchet_skipped_tasks_total[5m])
```
##### 12. Queue Processing Efficiency (Assigned vs Created)
```promql
rate(hatchet_assigned_tasks_total[5m]) / rate(hatchet_created_tasks_total[5m])
```
##### 13. Task Reassignment Rate
```promql
rate(hatchet_reassigned_tasks_total[5m])
```
### Tenant Metrics
Metric Name, Type, Description
`hatchet_tenant_workflow_duration_milliseconds`, Histogram, Duration of workflow execution in milliseconds (DAGs and single tasks)
`hatchet_tenant_queue_invocations_total`, Counter, The total number of invocations of the queuer function
`hatchet_tenant_created_tasks_total`, Counter, The total number of tasks created
`hatchet_tenant_retried_tasks_total`, Counter, The total number of tasks retried
`hatchet_tenant_succeeded_tasks_total`, Counter, The total number of tasks that succeeded
`hatchet_tenant_failed_tasks_total`, Counter, The total number of tasks that failed (in a final state, not including retries)
`hatchet_tenant_skipped_tasks_total`, Counter, The total number of tasks that were skipped
`hatchet_tenant_cancelled_tasks_total`, Counter, The total number of tasks cancelled
`hatchet_tenant_assigned_tasks`, Counter, The total number of tasks assigned to a worker
`hatchet_tenant_scheduling_timed_out`, Counter, The total number of tasks that timed out while waiting to be scheduled
`hatchet_tenant_rate_limited`, Counter, The total number of tasks that were rate limited
`hatchet_tenant_queued_to_assigned`, Counter, The total number of unique tasks that were queued and later got assigned to a worker
`hatchet_tenant_queued_to_assigned_time_seconds`, Histogram, Buckets of time in seconds spent in the queue before being assigned to a worker
`hatchet_tenant_reassigned_tasks`, Counter, The total number of tasks that were reassigned to a worker
`hatchet_tenant_used_worker_slots`, Gauge, The current number of worker slots being used
`hatchet_tenant_available_worker_slots`, Gauge, The current number of worker slots available (free)
`hatchet_tenant_worker_slots`, Gauge, The total number of worker slots (free + used)
#### Example PromQL Queries
##### 1. Workflow Duration by Tenant and Status
```promql
rate(hatchet_tenant_workflow_duration_milliseconds_sum[5m])
by (tenant_id, workflow_name, status)
/
rate(hatchet_tenant_workflow_duration_milliseconds_count[5m])
by (tenant_id, workflow_name, status)
```
##### 2. Tenant Queue Performance (95th percentile)
```promql
histogram_quantile(0.95,
rate(hatchet_tenant_queued_to_assigned_time_seconds_bucket[5m])
) by (tenant_id)
```
##### 3. Tenant Error Rate by Workflow
```promql
rate(hatchet_tenant_failed_tasks_total[5m]) by (tenant_id)
/
rate(hatchet_tenant_created_tasks_total[5m]) by (tenant_id)
```
##### 4. Tenant Task Throughput
```promql
rate(hatchet_tenant_succeeded_tasks_total[5m]) by (tenant_id)
```
##### 5. Tenant Retry Rate
```promql
rate(hatchet_tenant_retried_tasks_total[5m]) by (tenant_id)
/
rate(hatchet_tenant_created_tasks_total[5m]) by (tenant_id)
```
##### 6. Workflow Duration Distribution by Tenant
```promql
sum by (tenant_id, le) (
rate(hatchet_tenant_workflow_duration_milliseconds_bucket[5m])
)
```
##### 7. Tenant Rate Limiting Impact
```promql
rate(hatchet_tenant_rate_limited[5m]) by (tenant_id)
```
##### 8. Per-Tenant Queue Utilization
```promql
rate(hatchet_tenant_queue_invocations_total[5m]) by (tenant_id)
```
##### 9. Tenant Scheduling Timeouts
```promql
rate(hatchet_tenant_scheduling_timed_out[5m]) by (tenant_id)
```
##### 10. Tenant Task Assignment Success Rate
```promql
rate(hatchet_tenant_assigned_tasks[5m]) by (tenant_id)
/
rate(hatchet_tenant_created_tasks_total[5m]) by (tenant_id)
```
##### 11. Tenant Task Reassignment Rate
```promql
rate(hatchet_tenant_reassigned_tasks[5m]) by (tenant_id)
```
### Cross-Tenant Analysis
#### Example PromQL Queries
##### 1. Top 5 Tenants by Task Volume
```promql
topk(5,
sum by (tenant_id) (
rate(hatchet_tenant_created_tasks_total[1h])
)
)
```
##### 2. Slowest Workflows Across All Tenants
```promql
topk(10,
rate(hatchet_tenant_workflow_duration_milliseconds_sum[5m])
/
rate(hatchet_tenant_workflow_duration_milliseconds_count[5m])
) by (tenant_id, workflow_name)
```
##### 3. Tenant Resource Consumption Comparison
```promql
sum by (tenant_id) (
rate(hatchet_tenant_workflow_duration_milliseconds_sum[1h])
)
/ 1000 / 60 # Convert to minutes
```
### Integration with Prometheus
This endpoint can be used to configure Prometheus to scrape tenant-specific metrics:
```yaml
scrape_configs:
- job_name: "hatchet-tenant-metrics"
static_configs:
- targets: ["cloud.onhatchet.run"]
metrics_path: "/api/v1/tenants/707d0855-80ab-4e1f-a156-f1c4546cbf52/prometheus-metrics"
scheme: "https"
authorization:
credentials: "your-api-token-here"
```
**Note:** Replace `cloud.onhatchet.run` with the URL where your Hatchet instance is hosted.
This provides tenant-isolated metrics that can be scraped directly by Prometheus or consumed by other monitoring tools that support the Prometheus text format.
---
# Worker Configuration Options
The Hatchet worker can be configured via environment variables and programmatic options. This document contains a list of all available options.
## Basic Configuration
Variable, Description, Default Value
`HATCHET_CLIENT_TOKEN`, Authentication token for the worker
`HATCHET_CLIENT_HOST_PORT`, GRPC server host and port, \* Inherited from token
`HATCHET_CLIENT_API_URL` (TypeScript SDK), API server host and port, \* Inherited from token
`HATCHET_CLIENT_SERVER_URL` (Go SDK), API server host and port, \* Inherited from token
`HATCHET_CLIENT_NAMESPACE`, Namespace prefix for the worker, \* Inherited from token
## Worker Runtime Configuration
Variable, Description, Default Value
`name`, Friendly name of the worker
`slots`, Maximum number of concurrent runs, `100`
`durable_slots`, Maximum number of concurrent durable tasks, `1000`
## Worker healthcheck server (Python SDK)
These variables enable a local HTTP server that exposes `/health` and `/metrics` for a running worker.
Variable, Description, Default Value
`HATCHET_CLIENT_WORKER_HEALTHCHECK_ENABLED`, Enable the local worker healthcheck server, `false`
`HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT`, Port for the local worker healthcheck server, `8001`
`HATCHET_CLIENT_WORKER_HEALTHCHECK_EVENT_LOOP_BLOCK_THRESHOLD_SECONDS`, If the worker listener process event loop is blocked longer than this threshold, `/health` returns 503, `5.0`
## TLS Configuration
Variable, Description, Default Value
`HATCHET_CLIENT_TLS_STRATEGY`, TLS strategy (tls, mtls, none), `tls`
`HATCHET_CLIENT_TLS_CERT_FILE`, Path to TLS certificate file
`HATCHET_CLIENT_TLS_KEY_FILE`, Path to TLS key file
`HATCHET_CLIENT_TLS_ROOT_CA_FILE`, Path to TLS root CA file
`HATCHET_CLIENT_TLS_SERVER_NAME`, TLS server name
## Logging Configuration
Variable, Description, Default Value
`HATCHET_CLIENT_LOG_LEVEL`, Log level for the worker client, `WARN`
`HATCHET_CLIENT_LOG_FORMAT`, Log format for the worker client, `console`
`HATCHET_CLIENT_GRPC_MAX_RECV_MESSAGE_LENGTH`, Maximum gRPC message receive size (Python SDK only), `4MB`
`HATCHET_CLIENT_GRPC_MAX_SEND_MESSAGE_LENGTH`, Maximum gRPC message send size (Python SDK only), `4MB`
---
# Upgrading and Downgrading Hatchet
This guide covers how to safely upgrade and downgrade self-hosted Hatchet instances, with strategies for production-critical workloads.
## Overview
For production-critical deployments, we recommend the following workflow:
1. **Snapshot** your database before upgrading
2. **Upgrade** the Hatchet engine to the new version
3. **Verify** the upgrade is working as expected
4. If something goes wrong, **downgrade** by restoring the snapshot or running down migrations
## Step 1: Take a Database Snapshot
Before any version change, create a point-in-time snapshot of your database. This gives you a fast, reliable rollback path if the upgrade causes issues.
Refer to the backup and restore documentation for your database provider:
- **PostgreSQL (self-managed):** [Backup and Restore](https://www.postgresql.org/docs/current/backup.html)
- **AWS RDS:** [Backing Up and Restoring](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_CommonTasks.BackupRestore.html)
- **Google Cloud SQL:** [Backup and Recovery](https://docs.cloud.google.com/sql/docs/postgres/backup-recovery/backups)
- **Azure Database for PostgreSQL:** [Backup](https://learn.microsoft.com/en-us/azure/backup/backup-azure-database-postgresql)
## Step 2: Upgrade Engine Versions
Once your snapshot is in place, upgrade the Hatchet engine.
> **Info:** Hatchet runs database migrations automatically on engine startup. No separate
> migration step is required when upgrading.
### Docker Compose
Update the image tags in your `docker-compose.yml`:
```yaml
services:
hatchet-engine:
image: ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.78.26
# ... rest of configuration
hatchet-dashboard:
image: ghcr.io/hatchet-dev/hatchet/hatchet-dashboard:v0.78.26
# ... rest of configuration
```
Then redeploy:
> **Warning:** This can cause some downtime till the containers are back up.
```bash
docker-compose pull
docker-compose down
docker-compose up -d
```
### Kubernetes (Helm)
The Hatchet Helm charts use a `sharedConfig.image.tag` value that sets the image tag for all components (engine, API, frontend, migrations). Set this to the target Hatchet version:
```yaml
# values.yaml
sharedConfig:
image:
tag: "v0.78.26"
```
Then upgrade the release:
```bash
# hatchet-stack (standard deployment)
helm upgrade hatchet ./charts/hatchet-stack \
--namespace hatchet \
--values values.yaml
# hatchet-ha (high-availability deployment)
helm upgrade hatchet ./charts/hatchet-ha \
--namespace hatchet \
--values values.yaml
```
> **Info:** You can also override individual component tags (e.g., `engine.image.tag`,
> `frontend.image.tag`), but `sharedConfig.image.tag` takes precedence when set.
### Verification
After upgrading, verify the deployment is healthy:
1. Check that the engine is running and accepting connections
2. Confirm the dashboard loads and shows the correct version
3. Run a test workflow to verify end-to-end functionality
4. Monitor logs for migration errors or unexpected warnings
```bash
# Docker Compose
docker-compose logs hatchet-engine | head -50
# Kubernetes
kubectl logs -n hatchet -l app=hatchet-engine --tail=50
```
## Step 3: Downgrade if Needed
If the upgrade causes issues, you have two options depending on your situation:
> **Warning:** Both the following options will result in data loss and some downtime.
### Option A: Restore from Database Snapshot (Recommended for Production)
This is the fastest and safest rollback path. It returns your database to the exact state before the upgrade, avoiding any risk of incomplete down migrations.
### Stop all Hatchet services
Shut down all Hatchet engine instances to prevent writes during the restore.
```bash
# Docker Compose
docker-compose down
# Kubernetes
kubectl scale deployment hatchet-engine -n hatchet --replicas=0
```
### Restore the database snapshot
Follow the restore procedure for your database provider (see [Step 1](#step-1-take-a-database-snapshot) for links to the relevant documentation).
### Deploy the previous Hatchet version
Update your deployment to use the previous version's image tags (see [Upgrade Engine Versions](#step-2-upgrade-engine-versions) for the relevant deployment method) and redeploy.
### Verify the rollback
Confirm the engine starts, the dashboard loads, and workflows execute correctly.
### Option B: Run Down Migrations (Manual)
If you don't have a database snapshot or prefer a more targeted rollback, you can run down migrations to revert schema changes. See the [Downgrading DB Schema Manually](/self-hosting/downgrading-db-schema-manually) guide for detailed instructions on:
- Finding the target migration version for your desired Hatchet version
- Running `hatchet-migrate --down `
- Deploying the older engine version
> **Warning:** Down migrations may not fully reverse all data changes (e.g., dropped columns
> lose their data). For production-critical workloads, prefer restoring from a
> snapshot when possible.
---
# Downgrading DB Schema Manually
This guide explains how to safely downgrade your Hatchet DB schema to a previous version.
> **Info:** For production-critical workloads, see the [Upgrading and
> Downgrading](/self-hosting/upgrading-downgrading) guide which covers database
> snapshots, upgrading, and safe rollback strategies.
> **Warning:** Downgrading may result in data loss. Always test downgrades in a
> non-production environment first.
## Overview
Downgrading Hatchet involves two steps:
1. Running down migrations to revert database schema changes
2. Deploying the older Hatchet version
## Prerequisites
- **Critical:** Backup your database before downgrading
- Ensure the target version supports the current data in your database
- Have access to run `hatchet-migrate` command
- Verify that all migrations between your current version and target version have down migrations
## Finding the Target Migration Version
To downgrade to a specific Hatchet version, you need to identify the last migration that was included in that version.
Visit the Hatchet GitHub repository for your target version:
```
https://github.com/hatchet-dev/hatchet/tree/{GIT_TAG}/cmd/hatchet-migrate/migrate/migrations
```
Replace `{GIT_TAG}` with your target version (e.g., `v0.71.0`).
Find the last migration file in that directory - the timestamp at the beginning of the filename is your target migration version.
**Example:**
- Target Hatchet version: `v0.71.0`
- Last migration file: `20250813183355_v1_0_36.sql`
- Migration version: `20250813183355`
## Running Down Migrations
> **Info:** Use a stable release of the `hatchet-migrate` binary (avoid alpha tags) from
> the [Hatchet releases page](https://github.com/hatchet-dev/hatchet/tags) to
> ensure down migrations work correctly.
Once you have identified the target migration version, use the `hatchet-migrate` command with the `--down` flag:
```bash
hatchet-migrate --down 20241023223039
```
This will:
1. Connect to your database using the `DATABASE_URL` environment variable
2. Check the current migration version
3. Run all down migrations from the current version to the target version
4. Display progress and confirm when complete
## Deploying the Older Version
After successfully running the down migrations, deploy the older Hatchet version:
### Docker Compose
Update your `docker-compose.yml`:
```yaml
services:
hatchet-engine:
image: ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.71.0
# ... rest of configuration
hatchet-dashboard:
image: ghcr.io/hatchet-dev/hatchet/hatchet-dashboard:v0.71.0
# ... rest of configuration
```
Then restart:
```bash
docker-compose down
docker-compose up -d
```
---
# Benchmarking Hatchet
This page provides example benchmarks for Hatchet throughput and latency on an 8 CPU database (Amazon RDS, `m7g.2xlarge` instance type). These benchmarks were all run against a v1 Hatchet engine running version `v0.55.26`. For more information on the setup, see the [Setup](#setup) section. Note that on better hardware, there will be significantly better performance: we have tested up to 10k/s on an `m7g.8xlarge` instance.
The best way to benchmark Hatchet is to run your own benchmarks in your own environment. The benchmarks below are provided as a reference point for what you might expect to see in a typical setup. To run your own benchmarks, see the [Running your own benchmarks](#running-your-own-benchmarks) section.
## Throughput
Below are summarized throughput benchmarks run at different incoming event rates. For each run, we note the database CPU utilization and estimated IOPS, which are the most relevant metrics for tracking performance on the database.
Throughput (runs/s), Database CPU, Database IOPs
100, 15%, 400
500, 60%, 600
2000, 83%, 800
## Latency
Benchmarks run using event-based triggering: this approximately doubles the queueing time of a workflow. The average latency of events in Hatchet can be approximated by two measurements that Hatchet reports:
- **Average execution time per executed event**: The time from when the event starts execution to when it completes.
- **Average write time per event**: The acknowledgement time for Hatchet to write the event.
Below is a table summarizing these latencies:
Throughput (runs/s), Average Execution Time (ms), Average Write Time (ms)
100, ~40, ~2.5
500, ~48, ~2.6
2000, ~220, ~5.7
For workloads up to around 100-500 events per second, the latency remains relatively low. As throughput scales toward 2000 events per second, the overall average execution time increases (though the Hatchet engine remained stable throughout the tests).
## Running your own benchmarks
Hatchet publishes a public load testing container which can be used for benchmarking. This container is available at `ghcr.io/hatchet-dev/hatchet/hatchet-loadtest`. It acts as a Hatchet worker and event emitter, so it simply expects a `HATCHET_CLIENT_TOKEN` to be set in the environment.
For example, to run 100 events/second for 60 seconds, you can use the following command:
```bash
docker run -e HATCHET_CLIENT_TOKEN=your-token ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "100" -d "60s" --level "warn" --slots "100"
```
The event emitter which is bundled into the container has difficulty emitting more than 2k events/s. As a result, to test higher throughputs, it is recommended to run multiple containers in parallel. Since each container manages its own workflows and worker, it is recommended to use the `HATCHET_CLIENT_NAMESPACE` environment variable to ensure that workflows are not duplicated across containers. For example:
```bash
# first container
docker run -e HATCHET_CLIENT_TOKEN=your-token -e HATCHET_CLIENT_NAMESPACE=loadtest1 ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "2000" -d "60s" --level "warn" --slots "100"
# second container
docker run -e HATCHET_CLIENT_TOKEN=your-token -e HATCHET_CLIENT_NAMESPACE=loadtest2 ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "2000" -d "60s" --level "warn" --slots "100"
```
### Reference
This container takes the following arguments:
```sh
Usage:
loadtest [flags]
Flags:
-c, --concurrency int concurrency specifies the maximum events to run at the same time
-D, --delay duration delay specifies the time to wait in each event to simulate slow tasks
-d, --duration duration duration specifies the total time to run the load test (default 10s)
-F, --eventFanout int eventFanout specifies the number of events to fanout (default 1)
-e, --events int events per second (default 10)
-f, --failureRate float32 failureRate specifies the rate of failure for the worker
-h, --help help for loadtest
-l, --level string logLevel specifies the log level (debug, info, warn, error) (default "info")
-P, --payloadSize string payload specifies the size of the payload to send (default "0kb")
-s, --slots int slots specifies the number of slots to use in the worker
-w, --wait duration wait specifies the total time to wait until events complete (default 10s)
-p, --workerDelay duration workerDelay specifies the time to wait before starting the worker
```
### Running a benchmark on Kubernetes
You can use the following Pod manifest to run the load test on Kubernetes (make sure to fill in `HATCHET_CLIENT_TOKEN`):
```yaml
apiVersion: v1
kind: Pod
metadata:
name: loadtest1a
namespace: staging
spec:
restartPolicy: Never
containers:
- image: ghcr.io/hatchet-dev/hatchet/hatchet-loadtest:v0.56.0
imagePullPolicy: Always
name: loadtest
command: ["/hatchet/hatchet-load-test"]
args:
- loadtest
- --duration
- "60s"
- --events
- "100"
- --slots
- "100"
- --wait
- "10s"
- --level
- warn
env:
- name: HATCHET_CLIENT_TOKEN
value: "your-token"
- name: HATCHET_CLIENT_NAMESPACE
value: "loadtest1a"
resources:
limits:
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
```
## Setup
All tests were run on a Kubernetes cluster on AWS configured with:
- **Hatchet engine replicas:** 2 (using `c7i.4xlarge` instances to ensure CPU was not a bottleneck)
- **Database:** `m7g.2xlarge` instance type (Amazon RDS)
- **Hatchet version:** `v0.55.26`
- **AWS region:** `us-west-1`
The database configuration was chosen to avoid disk and CPU contention until higher throughputs were reached. We observed that up to around 2000 events/second, the chosen database instance size kept up without major performance degradation. The Hatchet engine was deployed with 2 replicas, and each engine instance had ample CPU headroom on `c7i.4xlarge` nodes.
---
# Data Retention
In Hatchet engine version `0.36.0` and above, you can configure the default data retention per tenant for workflow runs and events. The default value is set to 30 days, which means that all workflow runs which were created over 30 days ago and are in a final state (i.e. completed or failed), and all events which were created over 30 days ago, will be deleted.
This can be configured by setting the following environment variable to a Go duration string:
```sh
SERVER_LIMITS_DEFAULT_TENANT_RETENTION_PERIOD=720h # 30 days
```
---
# Tuning Hatchet for Performance
Generally, with a reasonable database instance (4 CPU, 8GB RAM) and small payload sizes, Hatchet can handle hundreds of events and workflow runs per second. However, as throughput increases, you will start to see performance degradation. The most common causes of performance degradation are listed below.
## Database Connection Pooling
The default max connection pool size is 50 per engine instance. If you have a high throughput, you may need to increase this value. This value can be set via the `DATABASE_MAX_CONNS` environment variable on the engine. Note that if you increase this value, you will need to increase the [`max_connections`](https://www.postgresql.org/docs/current/runtime-config-connection.html) value on your Postgres instance as well.
## High Database CPU
Due to the nature of Hatchet workloads, the first bottleneck you will typically see on the database is CPU. If you have access to database query performance metrics, it is worth checking the cause of high CPU. If there is high lock contention on a query, please let the Hatchet team know, as we are looking to reduce lock contention in future releases. Otherwise, if you are seeing high CPU usage without any lock contention, you should increase the number of cores on your database instance.
If you are performing a high number of inserts, particularly in a short period of time, and this correlates with high CPU usage, you can improve performance in several ways by using bulk endpoints or tuning the buffer settings.
### Using bulk endpoints
There are two main ways to initiate workflows, by sending events to Hatchet and by starting workflows directly. In most example workflows, we push a single event or workflow at a time, but it is possible to send multiple events or workflows in one request.
#### Events
#### Python
```python
hatchet.event.bulk_push(
events=[
BulkPushEventWithMetadata(
key="user:create",
payload={"userId": str(i), "should_skip": False},
)
for i in range(10)
]
)
```
#### Typescript
```typescript
const events = [
{
payload: { test: 'test1' },
additionalMetadata: { user_id: 'user1', source: 'test' },
},
{
payload: { test: 'test2' },
additionalMetadata: { user_id: 'user2', source: 'test' },
},
{
payload: { test: 'test3' },
additionalMetadata: { user_id: 'user3', source: 'test' },
},
];
await hatchet.events.bulkPush('user:create', events);
```
#### Go
```go
c, err := client.New(
client.WithHostPort("127.0.0.1", 7077),
)
if err != nil {
panic(err)
}
events := []client.EventWithMetadata{
{
Event: &events.TestEvent{
Name: "testing",
},
AdditionalMetadata: map[string]string{"hello": "world1"},
Key: "user:create",
},
{
Event: &events.TestEvent{
Name: "testing2",
},
AdditionalMetadata: map[string]string{"hello": "world2"},
Key: "user:create",
},
}
c.Event().BulkPush(
context.Background(),
events,
)
```
> **Warning:** There is a maximum limit of 1000 events per request.
#### Workflows
#### Python
```python
@bulk_parent_wf.task(execution_timeout=timedelta(minutes=5))
async def spawn(input: ParentInput, ctx: Context) -> dict[str, list[dict[str, Any]]]:
# 👀 Create each workflow run to spawn
child_workflow_runs = [
bulk_child_wf.create_bulk_run_item(
input=ChildInput(a=str(i)),
key=f"child{i}",
additional_metadata={"hello": "earth"},
)
for i in range(input.n)
]
# 👀 Run workflows in bulk to improve performance
spawn_results = await bulk_child_wf.aio_run_many(child_workflow_runs)
return {"results": spawn_results}
```
#### Typescript
```typescript
const parent = hatchet.task({
name: 'simple',
fn: async (input: SimpleInput, ctx) => {
// Bulk run two tasks in parallel
const child = await ctx.bulkRunChildren([
{
workflow: simple,
input: {
Message: 'Hello, World!',
},
},
{
workflow: simple,
input: {
Message: 'Hello, Moon!',
},
},
]);
return {
TransformedMessage: `${child[0].TransformedMessage} ${child[1].TransformedMessage}`,
};
},
});
```
#### Go
```go
w.RegisterWorkflow(
&worker.WorkflowJob{
Name: "parent-workflow",
On: worker.Event("fanout:create"),
Description: "Example workflow for spawning child workflows.",
Steps: []*worker.WorkflowStep{
worker.Fn(func(ctx worker.HatchetContext) error {
// Prepare the batch of workflows to spawn
childWorkflows := make([]*worker.SpawnWorkflowsOpts, 10)
for i := 0; i < 10; i++ {
childInput := "child-input-" + strconv.Itoa(i)
childWorkflows[i] = &worker.SpawnWorkflowsOpts{
WorkflowName: "child-workflow",
Input: childInput,
Key: "child-key-" + strconv.Itoa(i),
}
}
// Spawn all workflows in bulk using SpawnWorkflows
createdWorkflows, err := ctx.SpawnWorkflows(childWorkflows)
if err != nil {
return err
}
return nil
}),
},
},
)
```
> **Warning:** There is a maximum limit of 1000 workflows per bulk request.
### Tuning Buffer Settings
Hatchet has configurable write buffers which enable it to reduce the total number of database queries by batching DB writes. This speeds up throughput dramatically at the expense of a slight increase in latency. In general, increasing the buffer size and reducing the buffer flush frequency reduces the CPU load on the DB.
The two most important configurable settings for the buffers are
1. **Flush Period:** The amount of milliseconds to wait between subsequent writes to the database
2. **Max Buffer Size:** The maximum size of the internal buffer writing to the database.
The following environment variables are all configurable:
```sh
# Default values if the values below are not set
SERVER_FLUSH_PERIOD_MILLISECONDS
SERVER_FLUSH_ITEMS_THRESHOLD
# Settings for writing workflow runs to the database
SERVER_WORKFLOWRUNBUFFER_FLUSH_PERIOD_MILLISECONDS
SERVER_WORKFLOWRUNBUFFER_FLUSH_ITEMS_THRESHOLD
# Settings for writing events to the database
SERVER_EVENTBUFFER_FLUSH_PERIOD_MILLISECONDS
SERVER_EVENTBUFFER_FLUSH_ITEMS_THRESHOLD
# Settings for releasing slots for workers
SERVER_RELEASESEMAPHOREBUFFER_FLUSH_PERIOD_MILLISECONDS
SERVER_RELEASESEMAPHOREBUFFER_FLUSH_ITEMS_THRESHOLD
# Settings for writing queue items to the database
SERVER_QUEUESTEPRUNBUFFER_FLUSH_PERIOD_MILLISECONDS
SERVER_QUEUESTEPRUNBUFFER_FLUSH_ITEMS_THRESHOLD
```
A buffer configuration for higher throughput might look like the following:
```sh
# Default values if the values below are not set
SERVER_FLUSH_PERIOD_MILLISECONDS=250
SERVER_FLUSH_ITEMS_THRESHOLD=1000
# Settings for writing workflow runs to the database
SERVER_WORKFLOWRUNBUFFER_FLUSH_PERIOD_MILLISECONDS=100
SERVER_WORKFLOWRUNBUFFER_FLUSH_ITEMS_THRESHOLD=500
# Settings for writing events to the database
SERVER_EVENTBUFFER_FLUSH_PERIOD_MILLISECONDS=1000
SERVER_EVENTBUFFER_FLUSH_ITEMS_THRESHOLD=1000
# Settings for releasing slots for workers
SERVER_RELEASESEMAPHOREBUFFER_FLUSH_PERIOD_MILLISECONDS=100
SERVER_RELEASESEMAPHOREBUFFER_FLUSH_ITEMS_THRESHOLD=200
# Settings for writing queue items to the database
SERVER_QUEUESTEPRUNBUFFER_FLUSH_PERIOD_MILLISECONDS=100
SERVER_QUEUESTEPRUNBUFFER_FLUSH_ITEMS_THRESHOLD=500
```
Benchmarking and tuning on your own infrastructure is recommended to find the optimal values for your workload and use case.
## Slow Time to Start
With higher throughput, you may see a slower time to start for each step run in a workflow. The reason for this is typically that each step run needs to be processed in an internal message queue before getting sent to the worker. You can increase the throughput of this internal queue by setting the following environment variable (default value of `100`):
```
SERVER_MSGQUEUE_RABBITMQ_QOS=200
```
Note that this refers to the number of messages that can be processed in parallel, and each message typically corresponds to at least one database write, so it will not improve performance if this value is significantly higher than the `DATABASE_MAX_CONNS` value. If you are seeing warnings in the engine logs that you are saturating connections, consider decreasing this value.
## Database Settings and Autovacuum
There are several scenarios where Postgres flags may need to be modified to improve performance. By default, every workflow run and step run are stored for 30 days in the Postgres instance. Without tuning autovacuum settings, you may see high table bloat across many tables. If you are storing > 500 GB of workflow run or step run data, we recommend the following autovacuum settings to autovacuum more aggressively:
```
autovacuum_max_workers=10
autovacuum_vacuum_scale_factor=0.1
autovacuum_analyze_scale_factor=0.05
autovacuum_vacuum_threshold=25
autovacuum_analyze_threshold=25
autovacuum_vacuum_cost_delay=10
autovacuum_vacuum_cost_limit=1000
```
If your database has enough memory capacity, you may need to increase the `work_mem` or `maintenance_work_mem` value. For example, on database instances with a large amount of memory available, we typically set the following settings:
```
maintenance_work_mem=2147483647
work_mem=125828
```
Additionally, if there is enough disk capacity, you may see improved performance setting the following flag:
```
max_wal_size=15040
```
## Scaling the Hatchet Engine
By default, the Hatchet engine runs all internal services on a single instance. The internal services on the Hatchet engine are as follows:
- **grpc-api**: the gRPC endpoint for the Hatchet engine. This is the primary endpoint for Hatchet workers. Not to be confused with the Hatchet REST API, which is a separate service that we typically refer to as `api`.
- **controllers**: the internal service that manages the lifecycle of workflow runs, step runs, and events. This service is write-heavy on the database and read-heavy from the message queue.
- **scheduler**: the internal service that schedules step runs to workers. This service is both read-heavy and write-heavy on the database.
It is possible to horizontally scale the Hatchet engine by running multiple instances of the engine. However, if you are seeing a large number of warnings from the scheduler when running the other services in the same engine instance, we recommend running the scheduler on a separate instance. See the [high availability](./high-availability) documentation for more information on how to run the scheduler on a separate instance.
---
# Read Replica Support
For high-throughput production deployments, Hatchet supports database read replicas to distribute database load and improve read performance. This feature allows you to direct read queries to a separate database instance while continuing to send write operations to the primary database. **This can significantly improve performance in read-heavy workloads without requiring application changes.**
You can enable read replica support by setting the following environment variables:
```bash
READ_REPLICA_ENABLED=true
READ_REPLICA_DATABASE_URL='postgresql://hatchet:hatchet@127.0.0.1:5432/hatchet'
READ_REPLICA_MAX_CONNS=200
READ_REPLICA_MIN_CONNS=50
```
## Configuration Options
- `READ_REPLICA_ENABLED`: Set to `true` to enable read replica support
- `READ_REPLICA_DATABASE_URL`: Connection string for the read replica database
- `READ_REPLICA_MAX_CONNS`: Maximum number of connections in the read replica connection pool
- `READ_REPLICA_MIN_CONNS`: Minimum number of connections to maintain in the read replica connection pool
## Limitations
- Replication lag may result in slightly stale or missing data being returned from read operations
- The read replica is only utilized by analytical queries (to load workflow runs, task runs, and metrics in the UI)
---
# Trace Sampling
For a very high-volume setup, it may be desirable to sample results for the dashboard for the purpose of limiting the amount of data stored in the Hatchet database. **This does not impact the behavior of the Hatchet engine and all tasks will still be processed.** This can be done by setting the following environment variables:
```bash
SERVER_SAMPLING_ENABLED=t
SERVER_SAMPLING_RATE=0.1 # only 10% of results will be sampled
```
Sampling is done at the workflow run level, so all tasks within the same workflow will be sampled, along with all of their events. Sampling has the following limitations:
- Parent tasks which spawn child tasks are not guaranteed to be sampled, even if their children are. This means that the child task may be shown in the dashboard without a corresponding parent task, and vice versa.
- There is no way to configure sampling to ensure that failure events are sampled.
- Only tasks which are sampled can be cancelled or replayed via the REST APIs: do not use this feature if dependent on programmatic cancellations and replays.
---
# SMTP Server
Configure email delivery for tenant invites and Hatchet alerts using any standard SMTP provider (Gmail, SendGrid, AWS SES, etc).
## Prerequisites
- An SMTP provider that supports [PLAIN](https://datatracker.ietf.org/doc/html/rfc4616/) authentication with a username and password.
## Configuration
Set the following environment variables:
```bash
# Enable SMTP
export SERVER_EMAIL_KIND=smtp
export SERVER_EMAIL_SMTP_ENABLED=true
# Connection Settings
export SERVER_EMAIL_SMTP_SERVER_ADDR=smtp.gmail.com:587 # Host and port
export SERVER_EMAIL_SMTP_AUTH_USERNAME=your-email@yourdomain.com # Username or API Key ID
export SERVER_EMAIL_SMTP_AUTH_PASSWORD=your-password # Password or API Secret Key
# Sender Identity
export SERVER_EMAIL_SMTP_FROM_EMAIL=noreply@yourdomain.com # Sender email address
export SERVER_EMAIL_SMTP_SUPPORT_EMAIL=support@yourdomain.com # Support contact email
export SERVER_EMAIL_SMTP_FROM_NAME="Hatchet" # (Optional) Display name
```
## Provider Reference
Common configuration values for major providers:
Provider, Server Address, Username, Password
**[Gmail](https://support.google.com/mail/answer/185833?hl=en)**, `smtp.gmail.com:587`, Your Email, [App Password](https://myaccount.google.com/apppasswords)
**[SendGrid](https://docs.sendgrid.com/for-developers/sending-email/integrating-with-the-smtp-api)**, `smtp.sendgrid.net:587`, `apikey`, Your API Key
**[AWS SES](https://docs.aws.amazon.com/ses/latest/dg/send-email-smtp.html)**, `email-smtp.us-east-1.amazonaws.com:587`, IAM Username, IAM Secret
**[Outlook](https://support.microsoft.com/en-us/office/pop-imap-and-smtp-settings-8361e398-8af4-4e97-b147-6c6c4ac95353)**, `smtp.office365.com:587`, Your Email, Your Password
> **Info:** To request another provider or SMTP authentication protocol, open a feature
> request on
> [GitHub](https://github.com/hatchet-dev/hatchet/issues/new?template=feature_request.md).
---
# Hatchet Python SDK Reference
This is the Python SDK reference, documenting methods available for interacting with Hatchet resources. Check out the [user guide](../../home) for an introduction for getting your first tasks running.
## The Hatchet Python Client
Main client for interacting with the Hatchet SDK.
This class provides access to various client interfaces and utility methods for working with Hatchet workers, workflows, tasks, and our various feature clients.
Methods:
Name, Description
`worker`, Create a Hatchet worker on which to run workflows.
`workflow`, Define a Hatchet workflow, which can then declare `task`s and be `run`, `schedule`d, and so on.
`task`, A decorator to transform a function into a standalone Hatchet task that runs as part of a workflow.
`durable_task`, A decorator to transform a function into a standalone Hatchet _durable_ task that runs as part of a workflow.
### Attributes
#### `cron`
The cron client is a client for managing cron workflows within Hatchet.
#### `event`
The event client, which you can use to push events to Hatchet.
#### `logs`
The logs client is a client for interacting with Hatchet's logs API.
#### `metrics`
The metrics client is a client for reading metrics out of Hatchet's metrics API.
#### `rate_limits`
The rate limits client is a wrapper for Hatchet's gRPC API that makes it easier to work with rate limits in Hatchet.
#### `runs`
The runs client is a client for interacting with task and workflow runs within Hatchet.
#### `scheduled`
The scheduled client is a client for managing scheduled workflows within Hatchet.
#### `workers`
The workers client is a client for managing workers programmatically within Hatchet.
#### `workflows`
The workflows client is a client for managing workflows programmatically within Hatchet.
Note that workflows are the declaration, _not_ the individual runs. If you're looking for runs, use the `RunsClient` instead.
#### `tenant_id`
The tenant id you're operating in.
#### `namespace`
The current namespace you're interacting with.
### Functions
#### `worker`
Create a Hatchet worker on which to run workflows.
Parameters:
Name, Type, Description, Default
`name`, `str`, The name of the worker., _required_
`slots`, `int \, None`, slot count for standard tasks., `None`
`durable_slots`, `int \, None`, slot count for durable tasks., `None`
`labels`, `dict[str, str \, int] \, None`, A dictionary of labels to assign to the worker. For more details, view examples on affinity and worker labels., `None`
`workflows`, `list[BaseWorkflow[Any]] \, None`, A list of workflows to register on the worker, as a shorthand for calling `register_workflow` on each or `register_workflows` on all of them., `None`
`lifespan`, `LifespanFn \, None`, A lifespan function to run on the worker. This function will be called when the worker is started, and can be used to perform any setup or teardown tasks., `None`
Returns:
Type, Description
`Worker`, The created `Worker` object, which exposes an instance method `start` which can be called to start the worker.
#### `workflow`
Define a Hatchet workflow, which can then declare `task`s and be `run`, `schedule`d, and so on.
Parameters:
Name, Type, Description, Default
`name`, `str`, The name of the workflow., _required_
`description`, `str \, None`, A description for the workflow, `None`
`input_validator`, `type[TWorkflowInput] \, None`, A Pydantic model to use as a validator for the `input` to the tasks in the workflow. If no validator is provided, defaults to an `EmptyModel` under the hood. The `EmptyModel` is a Pydantic model with no fields specified, and with the `extra` config option set to `"allow"`., `None`
`on_events`, `list[str] \, None`, A list of event triggers for the workflow - events which cause the workflow to be run., `None`
`on_crons`, `list[str] \, None`, A list of cron triggers for the workflow., `None`
`version`, `str \, None`, A version for the workflow, `None`
`sticky`, `StickyStrategy \, None`, A sticky strategy for the workflow, `None`
`default_priority`, `int`, The priority of the workflow. Higher values will cause this workflow to have priority in scheduling over other, lower priority ones., `1`
`concurrency`, `int \, ConcurrencyExpression \, list[ConcurrencyExpression] \, None`, A concurrency object controlling the concurrency settings for this workflow. If an integer is provided, it is treated as a constant concurrency limit with a `GROUP_ROUND_ROBIN` strategy, which means that only `N` runs of the task may execute at any given time., `None`
`task_defaults`, `TaskDefaults`, A `TaskDefaults` object controlling the default task settings for this workflow., `TaskDefaults()`
`default_filters`, `list[DefaultFilter] \, None`, A list of filters to create with the workflow is created. Note that this is a helper to allow you to create filters "declaratively" without needing to make a separate API call once the workflow is created to create them., `None`
`default_additional_metadata`, `JSONSerializableMapping \, None`, A dictionary of additional metadata to attach to each run of this workflow by default., `None`
Returns:
Type, Description
`Workflow[EmptyModel] \, Workflow[TWorkflowInput]`, The created `Workflow` object, which can be used to declare tasks, run the workflow, and so on.
#### `task`
A decorator to transform a function into a standalone Hatchet task that runs as part of a workflow.
Parameters:
Name, Type, Description, Default
`name`, `str \, None`, The name of the task. If not specified, defaults to the name of the function being wrapped by the `task` decorator., `None`
`description`, `str \, None`, An optional description for the task., `None`
`input_validator`, `type[TWorkflowInput] \, None`, A Pydantic model to use as a validator for the input to the task. If no validator is provided, defaults to an `EmptyModel`., `None`
`on_events`, `list[str] \, None`, A list of event triggers for the task - events which cause the task to be run., `None`
`on_crons`, `list[str] \, None`, A list of cron triggers for the task., `None`
`version`, `str \, None`, A version for the task., `None`
`sticky`, `StickyStrategy \, None`, A sticky strategy for the task., `None`
`default_priority`, `int`, The priority of the task. Higher values will cause this task to have priority in scheduling., `1`
`concurrency`, `int \, ConcurrencyExpression \, list[ConcurrencyExpression] \, None`, A concurrency object controlling the concurrency settings for this task. If an integer is provided, it is treated as a constant concurrency limit with a `GROUP_ROUND_ROBIN` strategy, which means that only `N` runs of the task may execute at any given time., `None`
`schedule_timeout`, `Duration`, The maximum time allowed for scheduling the task., `timedelta(minutes=5)`
`execution_timeout`, `Duration`, The maximum time allowed for executing the task., `timedelta(seconds=60)`
`retries`, `int`, The number of times to retry the task before failing., `0`
`rate_limits`, `list[RateLimit] \, None`, A list of rate limit configurations for the task., `None`
`desired_worker_labels`, `dict[str, DesiredWorkerLabel] \, None`, A dictionary of desired worker labels that determine to which worker the task should be assigned., `None`
`backoff_factor`, `float \, None`, The backoff factor for controlling exponential backoff in retries., `None`
`backoff_max_seconds`, `int \, None`, The maximum number of seconds to allow retries with exponential backoff to continue., `None`
`default_filters`, `list[DefaultFilter] \, None`, A list of filters to create with the task is created. Note that this is a helper to allow you to create filters "declaratively" without needing to make a separate API call once the task is created to create them., `None`
`default_additional_metadata`, `JSONSerializableMapping \, None`, A dictionary of additional metadata to attach to each run of this task by default., `None`
Returns:
Type, Description
`Callable[[Callable[Concatenate[EmptyModel, Context, P], R \, CoroutineLike[R]]], Standalone[EmptyModel, R]] \, Callable[[Callable[Concatenate[TWorkflowInput, Context, P], R \, CoroutineLike[R]]], Standalone[TWorkflowInput, R]]`, A decorator which creates a `Standalone` task object.
#### `durable_task`
A decorator to transform a function into a standalone Hatchet _durable_ task that runs as part of a workflow.
Parameters:
Name, Type, Description, Default
`name`, `str \, None`, The name of the task. If not specified, defaults to the name of the function being wrapped by the `task` decorator., `None`
`description`, `str \, None`, An optional description for the task., `None`
`input_validator`, `type[TWorkflowInput] \, None`, A Pydantic model to use as a validator for the input to the task. If no validator is provided, defaults to an `EmptyModel`., `None`
`on_events`, `list[str] \, None`, A list of event triggers for the task - events which cause the task to be run., `None`
`on_crons`, `list[str] \, None`, A list of cron triggers for the task., `None`
`version`, `str \, None`, A version for the task., `None`
`sticky`, `StickyStrategy \, None`, A sticky strategy for the task., `None`
`default_priority`, `int`, The priority of the task. Higher values will cause this task to have priority in scheduling., `1`
`concurrency`, `int \, ConcurrencyExpression \, list[ConcurrencyExpression] \, None`, A concurrency object controlling the concurrency settings for this task. If an integer is provided, it is treated as a constant concurrency limit with a `GROUP_ROUND_ROBIN` strategy, which means that only `N` runs of the task may execute at any given time., `None`
`schedule_timeout`, `Duration`, The maximum time allowed for scheduling the task., `timedelta(minutes=5)`
`execution_timeout`, `Duration`, The maximum time allowed for executing the task., `timedelta(seconds=60)`
`retries`, `int`, The number of times to retry the task before failing., `0`
`rate_limits`, `list[RateLimit] \, None`, A list of rate limit configurations for the task., `None`
`desired_worker_labels`, `dict[str, DesiredWorkerLabel] \, None`, A dictionary of desired worker labels that determine to which worker the task should be assigned., `None`
`backoff_factor`, `float \, None`, The backoff factor for controlling exponential backoff in retries., `None`
`backoff_max_seconds`, `int \, None`, The maximum number of seconds to allow retries with exponential backoff to continue., `None`
`default_filters`, `list[DefaultFilter] \, None`, A list of filters to create with the task is created. Note that this is a helper to allow you to create filters "declaratively" without needing to make a separate API call once the task is created to create them., `None`
`default_additional_metadata`, `JSONSerializableMapping \, None`, A dictionary of additional metadata to attach to each run of this task by default., `None`
Returns:
Type, Description
`Callable[[Callable[Concatenate[EmptyModel, DurableContext, P], R \, CoroutineLike[R]]], Standalone[EmptyModel, R]] \, Callable[[Callable[Concatenate[TWorkflowInput, DurableContext, P], R \, CoroutineLike[R]]], Standalone[TWorkflowInput, R]]`, A decorator which creates a `Standalone` task object.
---
# Context
The Hatchet Context class provides helper methods and useful data to tasks at runtime. It is passed as the second argument to all tasks and durable tasks.
There are two types of context classes you'll encounter:
- `Context` - The standard context for regular tasks with methods for logging, task output retrieval, cancellation, and more
- `DurableContext` - An extended context for durable tasks that includes additional methods for durable execution like `aio_wait_for` and `aio_sleep_for`
## Context
Methods:
Name, Description
`was_skipped`, Check if a given task was skipped. You can read about skipping in [the docs](/v1/conditions#checking-if-a-task-was-skipped).
`task_output`, Get the output of a parent task in a DAG.
`cancel`, Cancel the current task run. This will call the Hatchet API to cancel the step run and set the exit flag to True.
`aio_cancel`, Cancel the current task run. This will call the Hatchet API to cancel the step run and set the exit flag to True.
`done`, Check if the current task run has been cancelled.
`log`, Log a line to the Hatchet API. This will send the log line to the Hatchet API and return immediately.
`release_slot`, Manually release the slot for the current step run to free up a slot on the worker. Note that this is an advanced feature and should be used with caution.
`put_stream`, Put a stream event to the Hatchet API. This will send the data to the Hatchet API and return immediately. You can then subscribe to the stream from a separate consumer.
`refresh_timeout`, Refresh the timeout for the current task run. You can read about refreshing timeouts in [the docs](../../home/timeouts#refreshing-timeouts).
`fetch_task_run_error`, **DEPRECATED**: Use `get_task_run_error` instead.
### Attributes
#### `was_triggered_by_event`
A property that indicates whether the workflow was triggered by an event.
Returns:
Type, Description
`bool`, True if the workflow was triggered by an event, False otherwise.
#### `workflow_input`
The input to the workflow, as a dictionary. It's recommended to use the `input` parameter to the task (the first argument passed into the task at runtime) instead of this property.
Returns:
Type, Description
`JSONSerializableMapping`, The input to the workflow.
#### `lifespan`
The worker lifespan, if it exists. You can read about lifespans in [the docs](../../home/lifespans).
**Note: You'll need to cast the return type of this property to the type returned by your lifespan generator.**
#### `workflow_run_id`
The id of the current workflow run.
Returns:
Type, Description
`str`, The id of the current workflow run.
#### `retry_count`
The retry count of the current task run, which corresponds to the number of times the task has been retried.
Returns:
Type, Description
`int`, The retry count of the current task run.
#### `attempt_number`
The attempt number of the current task run, which corresponds to the number of times the task has been attempted, including the initial attempt. This is one more than the retry count.
Returns:
Type, Description
`int`, The attempt number of the current task run.
#### `additional_metadata`
The additional metadata sent with the current task run.
Returns:
Type, Description
`JSONSerializableMapping`, The additional metadata sent with the current task run, or None if no additional metadata was sent.
#### `parent_workflow_run_id`
The parent workflow run id of the current task run, if it exists. This is useful for knowing which workflow run spawned this run as a child.
Returns:
Type, Description
`str \, None`, The parent workflow run id of the current task run, or None if it does not exist.
#### `priority`
The priority that the current task was run with.
Returns:
Type, Description
`int \, None`, The priority of the current task run, or None if no priority was set.
#### `workflow_id`
The id of the workflow that this task belongs to.
Returns:
Type, Description
`str \, None`, The id of the workflow that this task belongs to.
#### `workflow_version_id`
The id of the workflow version that this task belongs to.
Returns:
Type, Description
`str \, None`, The id of the workflow version that this task belongs to.
#### `task_run_errors`
A helper intended to be used in an on-failure step to retrieve the errors that occurred in upstream task runs.
Returns:
Type, Description
`dict[str, str]`, A dictionary mapping task names to their error messages.
### Functions
#### `was_skipped`
Check if a given task was skipped. You can read about skipping in [the docs](/v1/conditions#checking-if-a-task-was-skipped).
Parameters:
Name, Type, Description, Default
`task`, `Task[TWorkflowInput, R]`, The task to check the status of (skipped or not)., _required_
Returns:
Type, Description
`bool`, True if the task was skipped, False otherwise.
#### `task_output`
Get the output of a parent task in a DAG.
Parameters:
Name, Type, Description, Default
`task`, `Task[TWorkflowInput, R]`, The task whose output you want to retrieve., _required_
Returns:
Type, Description
`R`, The output of the parent task, validated against the task's validators.
Raises:
Type, Description
`ValueError`, If the task was skipped or if the step output for the task is not found.
#### `cancel`
Cancel the current task run. This will call the Hatchet API to cancel the step run and set the exit flag to True.
Returns:
Type, Description
`None`, None
#### `aio_cancel`
Cancel the current task run. This will call the Hatchet API to cancel the step run and set the exit flag to True.
Returns:
Type, Description
`None`, None
#### `done`
Check if the current task run has been cancelled.
Returns:
Type, Description
`bool`, True if the task run has been cancelled, False otherwise.
#### `log`
Log a line to the Hatchet API. This will send the log line to the Hatchet API and return immediately.
Parameters:
Name, Type, Description, Default
`line`, `str \, JSONSerializableMapping`, The line to log. Can be a string or a JSON serializable mapping., _required_
`raise_on_error`, `bool`, If True, will raise an exception if the log fails. Defaults to False., `False`
Returns:
Type, Description
`None`, None
#### `release_slot`
Manually release the slot for the current step run to free up a slot on the worker. Note that this is an advanced feature and should be used with caution.
Returns:
Type, Description
`None`, None
#### `put_stream`
Put a stream event to the Hatchet API. This will send the data to the Hatchet API and return immediately. You can then subscribe to the stream from a separate consumer.
Parameters:
Name, Type, Description, Default
`data`, `str \, bytes`, The data to send to the Hatchet API. Can be a string or bytes., _required_
Returns:
Type, Description
`None`, None
#### `refresh_timeout`
Refresh the timeout for the current task run. You can read about refreshing timeouts in [the docs](../../home/timeouts#refreshing-timeouts).
Parameters:
Name, Type, Description, Default
`increment_by`, `str \, timedelta`, The amount of time to increment the timeout by. Can be a string (e.g. "5m") or a timedelta object., _required_
Returns:
Type, Description
`None`, None
#### `fetch_task_run_error`
**DEPRECATED**: Use `get_task_run_error` instead.
A helper intended to be used in an on-failure step to retrieve the error that occurred in a specific upstream task run.
Parameters:
Name, Type, Description, Default
`task`, `Task[TWorkflowInput, R]`, The task whose error you want to retrieve., _required_
Returns:
Type, Description
`str \, None`, The error message of the task run, or None if no error occurred.
## DurableContext
Bases: `Context`
Methods:
Name, Description
`aio_wait_for`, Durably wait for either a sleep or an event.
`aio_sleep_for`, Lightweight wrapper for durable sleep. Allows for shorthand usage of `ctx.aio_wait_for` when specifying a sleep condition.
### Functions
#### `aio_wait_for`
Durably wait for either a sleep or an event.
Parameters:
Name, Type, Description, Default
`signal_key`, `str`, The key to use for the durable event. This is used to identify the event in the Hatchet API., _required_
`*conditions`, `SleepCondition \, UserEventCondition \, OrGroup`, The conditions to wait for. Can be a SleepCondition or UserEventCondition., `()`
Returns:
Type, Description
`dict[str, Any]`, A dictionary containing the results of the wait.
Raises:
Type, Description
`ValueError`, If the durable event listener is not available.
#### `aio_sleep_for`
Lightweight wrapper for durable sleep. Allows for shorthand usage of `ctx.aio_wait_for` when specifying a sleep condition.
For more complicated conditions, use `ctx.aio_wait_for` directly.
---
# Cron Client
Bases: `BaseRestClient`
The cron client is a client for managing cron workflows within Hatchet.
Methods:
Name, Description
`aio_create`, Create a new workflow cron trigger.
`aio_delete`, Delete a workflow cron trigger.
`aio_get`, Retrieve a specific workflow cron trigger by ID.
`aio_list`, Retrieve a list of all workflow cron triggers matching the criteria.
`create`, Create a new workflow cron trigger.
`delete`, Delete a workflow cron trigger.
`get`, Retrieve a specific workflow cron trigger by ID.
`list`, Retrieve a list of all workflow cron triggers matching the criteria.
### Functions
#### `aio_create`
Create a new workflow cron trigger.
Parameters:
Name, Type, Description, Default
`workflow_name`, `str`, The name of the workflow to trigger., _required_
`cron_name`, `str`, The name of the cron trigger., _required_
`expression`, `str`, The cron expression defining the schedule., _required_
`input`, `JSONSerializableMapping`, The input data for the cron workflow., _required_
`additional_metadata`, `JSONSerializableMapping`, Additional metadata associated with the cron trigger., _required_
`priority`, `int \, None`, The priority of the cron workflow trigger., `None`
Returns:
Type, Description
`CronWorkflows`, The created cron workflow instance.
#### `aio_delete`
Delete a workflow cron trigger.
Parameters:
Name, Type, Description, Default
`cron_id`, `str`, The ID of the cron trigger to delete., _required_
Returns:
Type, Description
`None`, None
#### `aio_get`
Retrieve a specific workflow cron trigger by ID.
Parameters:
Name, Type, Description, Default
`cron_id`, `str`, The cron trigger ID or CronWorkflows instance to retrieve., _required_
Returns:
Type, Description
`CronWorkflows`, The requested cron workflow instance.
#### `aio_list`
Retrieve a list of all workflow cron triggers matching the criteria.
Parameters:
Name, Type, Description, Default
`offset`, `int \, None`, The offset to start the list from., `None`
`limit`, `int \, None`, The maximum number of items to return., `None`
`workflow_id`, `str \, None`, The ID of the workflow to filter by., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Filter by additional metadata keys., `None`
`order_by_field`, `CronWorkflowsOrderByField \, None`, The field to order the list by., `None`
`order_by_direction`, `WorkflowRunOrderByDirection \, None`, The direction to order the list by., `None`
`workflow_name`, `str \, None`, The name of the workflow to filter by., `None`
`cron_name`, `str \, None`, The name of the cron trigger to filter by., `None`
Returns:
Type, Description
`CronWorkflowsList`, A list of cron workflows.
#### `create`
Create a new workflow cron trigger.
Parameters:
Name, Type, Description, Default
`workflow_name`, `str`, The name of the workflow to trigger., _required_
`cron_name`, `str`, The name of the cron trigger., _required_
`expression`, `str`, The cron expression defining the schedule., _required_
`input`, `JSONSerializableMapping`, The input data for the cron workflow., _required_
`additional_metadata`, `JSONSerializableMapping`, Additional metadata associated with the cron trigger., _required_
`priority`, `int \, None`, The priority of the cron workflow trigger., `None`
Returns:
Type, Description
`CronWorkflows`, The created cron workflow instance.
#### `delete`
Delete a workflow cron trigger.
Parameters:
Name, Type, Description, Default
`cron_id`, `str`, The ID of the cron trigger to delete., _required_
Returns:
Type, Description
`None`, None
#### `get`
Retrieve a specific workflow cron trigger by ID.
Parameters:
Name, Type, Description, Default
`cron_id`, `str`, The cron trigger ID or CronWorkflows instance to retrieve., _required_
Returns:
Type, Description
`CronWorkflows`, The requested cron workflow instance.
#### `list`
Retrieve a list of all workflow cron triggers matching the criteria.
Parameters:
Name, Type, Description, Default
`offset`, `int \, None`, The offset to start the list from., `None`
`limit`, `int \, None`, The maximum number of items to return., `None`
`workflow_id`, `str \, None`, The ID of the workflow to filter by., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Filter by additional metadata keys., `None`
`order_by_field`, `CronWorkflowsOrderByField \, None`, The field to order the list by., `None`
`order_by_direction`, `WorkflowRunOrderByDirection \, None`, The direction to order the list by., `None`
`workflow_name`, `str \, None`, The name of the workflow to filter by., `None`
`cron_name`, `str \, None`, The name of the cron trigger to filter by., `None`
Returns:
Type, Description
`CronWorkflowsList`, A list of cron workflows.
---
# Filters Client
Bases: `BaseRestClient`
The filters client is a client for interacting with Hatchet's filters API.
Methods:
Name, Description
`aio_create`, Create a new filter.
`aio_delete`, Delete a filter by its ID.
`aio_get`, Get a filter by its ID.
`aio_list`, List filters for a given tenant.
`aio_update`, Update a filter by its ID.
`create`, Create a new filter.
`delete`, Delete a filter by its ID.
`get`, Get a filter by its ID.
`list`, List filters for a given tenant.
`update`, Update a filter by its ID.
### Functions
#### `aio_create`
Create a new filter.
Parameters:
Name, Type, Description, Default
`workflow_id`, `str`, The ID of the workflow to associate with the filter., _required_
`expression`, `str`, The expression to evaluate for the filter., _required_
`scope`, `str`, The scope for the filter., _required_
`payload`, `JSONSerializableMapping \, None`, The payload to send with the filter., `None`
Returns:
Type, Description
`V1Filter`, The created filter.
#### `aio_delete`
Delete a filter by its ID.
Parameters:
Name, Type, Description, Default
`filter_id`, `str`, The ID of the filter to delete., _required_
Returns:
Type, Description
`V1Filter`, The deleted filter.
#### `aio_get`
Get a filter by its ID.
Parameters:
Name, Type, Description, Default
`filter_id`, `str`, The ID of the filter to retrieve., _required_
Returns:
Type, Description
`V1Filter`, The filter with the specified ID.
#### `aio_list`
List filters for a given tenant.
Parameters:
Name, Type, Description, Default
`limit`, `int \, None`, The maximum number of filters to return., `None`
`offset`, `int \, None`, The number of filters to skip before starting to collect the result set., `None`
`workflow_ids`, `list[str] \, None`, A list of workflow IDs to filter by., `None`
`scopes`, `list[str] \, None`, A list of scopes to filter by., `None`
Returns:
Type, Description
`V1FilterList`, A list of filters matching the specified criteria.
#### `aio_update`
Update a filter by its ID.
Parameters:
Name, Type, Description, Default
`filter_id`, `str`, The ID of the filter to update., _required_
`updates`, `V1UpdateFilterRequest`, The updates to apply to the filter., _required_
Returns:
Type, Description
`V1Filter`, The updated filter.
#### `create`
Create a new filter.
Parameters:
Name, Type, Description, Default
`workflow_id`, `str`, The ID of the workflow to associate with the filter., _required_
`expression`, `str`, The expression to evaluate for the filter., _required_
`scope`, `str`, The scope for the filter., _required_
`payload`, `JSONSerializableMapping \, None`, The payload to send with the filter., `None`
Returns:
Type, Description
`V1Filter`, The created filter.
#### `delete`
Delete a filter by its ID.
Parameters:
Name, Type, Description, Default
`filter_id`, `str`, The ID of the filter to delete., _required_
Returns:
Type, Description
`V1Filter`, The deleted filter.
#### `get`
Get a filter by its ID.
Parameters:
Name, Type, Description, Default
`filter_id`, `str`, The ID of the filter to retrieve., _required_
Returns:
Type, Description
`V1Filter`, The filter with the specified ID.
#### `list`
List filters for a given tenant.
Parameters:
Name, Type, Description, Default
`limit`, `int \, None`, The maximum number of filters to return., `None`
`offset`, `int \, None`, The number of filters to skip before starting to collect the result set., `None`
`workflow_ids`, `list[str] \, None`, A list of workflow IDs to filter by., `None`
`scopes`, `list[str] \, None`, A list of scopes to filter by., `None`
Returns:
Type, Description
`V1FilterList`, A list of filters matching the specified criteria.
#### `update`
Update a filter by its ID.
Parameters:
Name, Type, Description, Default
`filter_id`, `str`, The ID of the filter to update., _required_
`updates`, `V1UpdateFilterRequest`, The updates to apply to the filter., _required_
Returns:
Type, Description
`V1Filter`, The updated filter.
---
# Logs Client
Bases: `BaseRestClient`
The logs client is a client for interacting with Hatchet's logs API.
Methods:
Name, Description
`aio_list`, List log lines for a given task run.
`list`, List log lines for a given task run.
### Functions
#### `aio_list`
List log lines for a given task run.
Parameters:
Name, Type, Description, Default
`task_run_id`, `str`, The ID of the task run to list logs for., _required_
`limit`, `int`, Maximum number of log lines to return (default: 1000)., `1000`
`since`, `datetime \, None`, The start time to get logs for., `None`
`until`, `datetime \, None`, The end time to get logs for., `None`
Returns:
Type, Description
`V1LogLineList`, A list of log lines for the specified task run.
#### `list`
List log lines for a given task run.
Parameters:
Name, Type, Description, Default
`task_run_id`, `str`, The ID of the task run to list logs for., _required_
`limit`, `int`, Maximum number of log lines to return (default: 1000)., `1000`
`since`, `datetime \, None`, The start time to get logs for., `None`
`until`, `datetime \, None`, The end time to get logs for., `None`
Returns:
Type, Description
`V1LogLineList`, A list of log lines for the specified task run.
---
# Metrics Client
Bases: `BaseRestClient`
The metrics client is a client for reading metrics out of Hatchet's metrics API.
Methods:
Name, Description
`aio_get_queue_metrics`, Retrieve the current queue metrics for the tenant.
`aio_get_task_metrics`, Retrieve task metrics, grouped by status (queued, running, completed, failed, cancelled).
`aio_get_task_stats`, Get task statistics for the tenant.
`aio_scrape_tenant_prometheus_metrics`, Scrape Prometheus metrics for the tenant. Returns the metrics in Prometheus text format.
`get_queue_metrics`, Retrieve the current queue metrics for the tenant.
`get_task_metrics`, Retrieve task metrics, grouped by status (queued, running, completed, failed, cancelled).
`get_task_stats`, Get task statistics for the tenant.
`scrape_tenant_prometheus_metrics`, Scrape Prometheus metrics for the tenant. Returns the metrics in Prometheus text format.
### Functions
#### `aio_get_queue_metrics`
Retrieve the current queue metrics for the tenant.
Returns:
Type, Description
`dict[str, Any]`, The current queue metrics
#### `aio_get_task_metrics`
Retrieve task metrics, grouped by status (queued, running, completed, failed, cancelled).
Parameters:
Name, Type, Description, Default
`since`, `datetime \, None`, Start time for the metrics query (defaults to the past day if unset), `None`
`until`, `datetime \, None`, End time for the metrics query, `None`
`workflow_ids`, `list[str] \, None`, List of workflow IDs to filter the metrics by, `None`
`parent_task_external_id`, `str \, None`, ID of the parent task to filter by (note that parent task here refers to the task that spawned this task as a child), `None`
`triggering_event_external_id`, `str \, None`, ID of the triggering event to filter by, `None`
Returns:
Type, Description
`TaskMetrics`, Task metrics
#### `aio_get_task_stats`
Get task statistics for the tenant.
Returns:
Type, Description
`dict[str, TaskStat]`, A dictionary mapping task names to their statistics.
#### `aio_scrape_tenant_prometheus_metrics`
Scrape Prometheus metrics for the tenant. Returns the metrics in Prometheus text format.
Returns:
Type, Description
`str`, The metrics, returned in Prometheus text format
#### `get_queue_metrics`
Retrieve the current queue metrics for the tenant.
Returns:
Type, Description
`dict[str, Any]`, The current queue metrics
#### `get_task_metrics`
Retrieve task metrics, grouped by status (queued, running, completed, failed, cancelled).
Parameters:
Name, Type, Description, Default
`since`, `datetime \, None`, Start time for the metrics query (defaults to the past day if unset), `None`
`until`, `datetime \, None`, End time for the metrics query, `None`
`workflow_ids`, `list[str] \, None`, List of workflow IDs to filter the metrics by, `None`
`parent_task_external_id`, `str \, None`, ID of the parent task to filter by (note that parent task here refers to the task that spawned this task as a child), `None`
`triggering_event_external_id`, `str \, None`, ID of the triggering event to filter by, `None`
Returns:
Type, Description
`TaskMetrics`, Task metrics
#### `get_task_stats`
Get task statistics for the tenant.
Returns:
Type, Description
`dict[str, TaskStat]`, A dictionary mapping task names to their statistics.
#### `scrape_tenant_prometheus_metrics`
Scrape Prometheus metrics for the tenant. Returns the metrics in Prometheus text format.
Returns:
Type, Description
`str`, The metrics, returned in Prometheus text format
---
# Rate Limits Client
Bases: `BaseRestClient`
The rate limits client is a wrapper for Hatchet's gRPC API that makes it easier to work with rate limits in Hatchet.
Methods:
Name, Description
`aio_put`, Put a rate limit for a given key.
`put`, Put a rate limit for a given key.
### Functions
#### `aio_put`
Put a rate limit for a given key.
Parameters:
Name, Type, Description, Default
`key`, `str`, The key to set the rate limit for., _required_
`limit`, `int`, The rate limit to set., _required_
`duration`, `RateLimitDuration`, The duration of the rate limit., `SECOND`
Returns:
Type, Description
`None`, None
#### `put`
Put a rate limit for a given key.
Parameters:
Name, Type, Description, Default
`key`, `str`, The key to set the rate limit for., _required_
`limit`, `int`, The rate limit to set., _required_
`duration`, `RateLimitDuration`, The duration of the rate limit., `SECOND`
Returns:
Type, Description
`None`, None
---
# Runs Client
Bases: `BaseRestClient`
The runs client is a client for interacting with task and workflow runs within Hatchet.
Methods:
Name, Description
`get`, Get workflow run details for a given workflow run ID.
`aio_get`, Get workflow run details for a given workflow run ID.
`get_status`, Get workflow run status for a given workflow run ID.
`aio_get_status`, Get workflow run status for a given workflow run ID.
`list`, List task runs according to a set of filters.
`aio_list`, List task runs according to a set of filters.
`create`, Trigger a new workflow run.
`aio_create`, Trigger a new workflow run.
`replay`, Replay a task or workflow run.
`aio_replay`, Replay a task or workflow run.
`bulk_replay`, Replay task or workflow runs in bulk, according to a set of filters.
`aio_bulk_replay`, Replay task or workflow runs in bulk, according to a set of filters.
`cancel`, Cancel a task or workflow run.
`aio_cancel`, Cancel a task or workflow run.
`bulk_cancel`, Cancel task or workflow runs in bulk, according to a set of filters.
`aio_bulk_cancel`, Cancel task or workflow runs in bulk, according to a set of filters.
`get_result`, Get the result of a workflow run by its external ID.
`aio_get_result`, Get the result of a workflow run by its external ID.
`get_run_ref`, Get a reference to a workflow run.
`get_task_run`, Get task run details for a given task run ID.
`aio_get_task_run`, Get task run details for a given task run ID.
`bulk_cancel_by_filters_with_pagination`, Cancel runs matching the specified filters in chunks.
`bulk_replay_by_filters_with_pagination`, Replay runs matching the specified filters in chunks.
`aio_bulk_cancel_by_filters_with_pagination`, Cancel runs matching the specified filters in chunks.
`aio_bulk_replay_by_filters_with_pagination`, Replay runs matching the specified filters in chunks.
`subscribe_to_stream`
### Functions
#### `get`
Get workflow run details for a given workflow run ID.
Parameters:
Name, Type, Description, Default
`workflow_run_id`, `str`, The ID of the workflow run to retrieve details for., _required_
Returns:
Type, Description
`V1WorkflowRunDetails`, Workflow run details for the specified workflow run ID.
#### `aio_get`
Get workflow run details for a given workflow run ID.
Parameters:
Name, Type, Description, Default
`workflow_run_id`, `str`, The ID of the workflow run to retrieve details for., _required_
Returns:
Type, Description
`V1WorkflowRunDetails`, Workflow run details for the specified workflow run ID.
#### `get_status`
Get workflow run status for a given workflow run ID.
Parameters:
Name, Type, Description, Default
`workflow_run_id`, `str`, The ID of the workflow run to retrieve details for., _required_
Returns:
Type, Description
`V1TaskStatus`, The task status
#### `aio_get_status`
Get workflow run status for a given workflow run ID.
Parameters:
Name, Type, Description, Default
`workflow_run_id`, `str`, The ID of the workflow run to retrieve details for., _required_
Returns:
Type, Description
`V1TaskStatus`, The task status
#### `list`
List task runs according to a set of filters.
Parameters:
Name, Type, Description, Default
`since`, `datetime \, None`, The start time for filtering task runs., `None`
`only_tasks`, `bool`, Whether to only list task runs., `False`
`offset`, `int \, None`, The offset for pagination., `None`
`limit`, `int \, None`, The maximum number of task runs to return., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses to filter task runs by., `None`
`until`, `datetime \, None`, The end time for filtering task runs., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata to filter task runs by., `None`
`workflow_ids`, `list[str] \, None`, The workflow IDs to filter task runs by., `None`
`worker_id`, `str \, None`, The worker ID to filter task runs by., `None`
`parent_task_external_id`, `str \, None`, The parent task external ID to filter task runs by., `None`
`triggering_event_external_id`, `str \, None`, The event id that triggered the task run., `None`
`include_payloads`, `bool`, Whether to include payloads in the response., `True`
Returns:
Type, Description
`V1TaskSummaryList`, A list of task runs matching the specified filters.
#### `aio_list`
List task runs according to a set of filters.
Parameters:
Name, Type, Description, Default
`since`, `datetime \, None`, The start time for filtering task runs., `None`
`only_tasks`, `bool`, Whether to only list task runs., `False`
`offset`, `int \, None`, The offset for pagination., `None`
`limit`, `int \, None`, The maximum number of task runs to return., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses to filter task runs by., `None`
`until`, `datetime \, None`, The end time for filtering task runs., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata to filter task runs by., `None`
`workflow_ids`, `list[str] \, None`, The workflow IDs to filter task runs by., `None`
`worker_id`, `str \, None`, The worker ID to filter task runs by., `None`
`parent_task_external_id`, `str \, None`, The parent task external ID to filter task runs by., `None`
`triggering_event_external_id`, `str \, None`, The event id that triggered the task run., `None`
`include_payloads`, `bool`, Whether to include payloads in the response., `True`
Returns:
Type, Description
`V1TaskSummaryList`, A list of task runs matching the specified filters.
#### `create`
Trigger a new workflow run.
IMPORTANT: It's preferable to use `Workflow.run` (and similar) to trigger workflows if possible. This method is intended to be an escape hatch. For more details, see [the documentation](../../../sdk/python/runnables#workflow).
Parameters:
Name, Type, Description, Default
`workflow_name`, `str`, The name of the workflow to trigger., _required_
`input`, `JSONSerializableMapping`, The input data for the workflow run., _required_
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata associated with the workflow run., `None`
`priority`, `int \, None`, The priority of the workflow run., `None`
Returns:
Type, Description
`V1WorkflowRunDetails`, The details of the triggered workflow run.
#### `aio_create`
Trigger a new workflow run.
IMPORTANT: It's preferable to use `Workflow.run` (and similar) to trigger workflows if possible. This method is intended to be an escape hatch. For more details, see [the documentation](../../../sdk/python/runnables#workflow).
Parameters:
Name, Type, Description, Default
`workflow_name`, `str`, The name of the workflow to trigger., _required_
`input`, `JSONSerializableMapping`, The input data for the workflow run., _required_
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata associated with the workflow run., `None`
`priority`, `int \, None`, The priority of the workflow run., `None`
Returns:
Type, Description
`V1WorkflowRunDetails`, The details of the triggered workflow run.
#### `replay`
Replay a task or workflow run.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The external ID of the task or workflow run to replay., _required_
Returns:
Type, Description
`None`, None
#### `aio_replay`
Replay a task or workflow run.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The external ID of the task or workflow run to replay., _required_
Returns:
Type, Description
`None`, None
#### `bulk_replay`
Replay task or workflow runs in bulk, according to a set of filters.
Parameters:
Name, Type, Description, Default
`opts`, `BulkCancelReplayOpts`, Options for bulk replay, including filters and IDs., _required_
Returns:
Type, Description
`None`, None
#### `aio_bulk_replay`
Replay task or workflow runs in bulk, according to a set of filters.
Parameters:
Name, Type, Description, Default
`opts`, `BulkCancelReplayOpts`, Options for bulk replay, including filters and IDs., _required_
Returns:
Type, Description
`None`, None
#### `cancel`
Cancel a task or workflow run.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The external ID of the task or workflow run to cancel., _required_
Returns:
Type, Description
`None`, None
#### `aio_cancel`
Cancel a task or workflow run.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The external ID of the task or workflow run to cancel., _required_
Returns:
Type, Description
`None`, None
#### `bulk_cancel`
Cancel task or workflow runs in bulk, according to a set of filters.
Parameters:
Name, Type, Description, Default
`opts`, `BulkCancelReplayOpts`, Options for bulk cancel, including filters and IDs., _required_
Returns:
Type, Description
`None`, None
#### `aio_bulk_cancel`
Cancel task or workflow runs in bulk, according to a set of filters.
Parameters:
Name, Type, Description, Default
`opts`, `BulkCancelReplayOpts`, Options for bulk cancel, including filters and IDs., _required_
Returns:
Type, Description
`None`, None
#### `get_result`
Get the result of a workflow run by its external ID.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The external ID of the workflow run to retrieve the result for., _required_
Returns:
Type, Description
`JSONSerializableMapping`, The result of the workflow run.
#### `aio_get_result`
Get the result of a workflow run by its external ID.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The external ID of the workflow run to retrieve the result for., _required_
Returns:
Type, Description
`JSONSerializableMapping`, The result of the workflow run.
#### `get_run_ref`
Get a reference to a workflow run.
Parameters:
Name, Type, Description, Default
`workflow_run_id`, `str`, The ID of the workflow run to get a reference to., _required_
Returns:
Type, Description
`WorkflowRunRef`, A reference to the specified workflow run.
#### `get_task_run`
Get task run details for a given task run ID.
Parameters:
Name, Type, Description, Default
`task_run_id`, `str`, The ID of the task run to retrieve details for., _required_
Returns:
Type, Description
`V1TaskSummary`, Task run details for the specified task run ID.
#### `aio_get_task_run`
Get task run details for a given task run ID.
Parameters:
Name, Type, Description, Default
`task_run_id`, `str`, The ID of the task run to retrieve details for., _required_
Returns:
Type, Description
`V1TaskSummary`, Task run details for the specified task run ID.
#### `bulk_cancel_by_filters_with_pagination`
Cancel runs matching the specified filters in chunks.
The motivation for this method is to provide an easy way to perform bulk operations by filters over a larger number of runs than the API would normally be able to handle, with automatic pagination and chunking to help limit the pressure on the API.
This method first pulls the IDs of the runs from the API, and then feeds them back to the API in chunks.
Parameters:
Name, Type, Description, Default
`sleep_time`, `int`, The time to sleep between processing chunks, in seconds., `3`
`chunk_size`, `int`, The maximum number of run IDs to process in each chunk., `500`
`since`, `datetime \, None`, The start time for filtering runs., `None`
`until`, `datetime \, None`, The end time for filtering runs., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses to filter runs by., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata to filter runs by., `None`
`workflow_ids`, `list[str] \, None`, The workflow IDs to filter runs by., `None`
#### `bulk_replay_by_filters_with_pagination`
Replay runs matching the specified filters in chunks.
The motivation for this method is to provide an easy way to perform bulk operations by filters over a larger number of runs than the API would normally be able to handle, with automatic pagination and chunking to help limit the pressure on the API.
This method first pulls the IDs of the runs from the API, and then feeds them back to the API in chunks.
Parameters:
Name, Type, Description, Default
`sleep_time`, `int`, The time to sleep between processing chunks, in seconds., `3`
`chunk_size`, `int`, The maximum number of run IDs to process in each chunk., `500`
`since`, `datetime \, None`, The start time for filtering runs., `None`
`until`, `datetime \, None`, The end time for filtering runs., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses to filter runs by., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata to filter runs by., `None`
`workflow_ids`, `list[str] \, None`, The workflow IDs to filter runs by., `None`
#### `aio_bulk_cancel_by_filters_with_pagination`
Cancel runs matching the specified filters in chunks.
The motivation for this method is to provide an easy way to perform bulk operations by filters over a larger number of runs than the API would normally be able to handle, with automatic pagination and chunking to help limit the pressure on the API.
This method first pulls the IDs of the runs from the API, and then feeds them back to the API in chunks.
Parameters:
Name, Type, Description, Default
`sleep_time`, `int`, The time to sleep between processing chunks, in seconds., `3`
`chunk_size`, `int`, The maximum number of run IDs to process in each chunk., `500`
`since`, `datetime \, None`, The start time for filtering runs., `None`
`until`, `datetime \, None`, The end time for filtering runs., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses to filter runs by., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata to filter runs by., `None`
`workflow_ids`, `list[str] \, None`, The workflow IDs to filter runs by., `None`
#### `aio_bulk_replay_by_filters_with_pagination`
Replay runs matching the specified filters in chunks.
The motivation for this method is to provide an easy way to perform bulk operations by filters over a larger number of runs than the API would normally be able to handle, with automatic pagination and chunking to help limit the pressure on the API.
This method first pulls the IDs of the runs from the API, and then feeds them back to the API in chunks.
Parameters:
Name, Type, Description, Default
`sleep_time`, `int`, The time to sleep between processing chunks, in seconds., `3`
`chunk_size`, `int`, The maximum number of run IDs to process in each chunk., `500`
`since`, `datetime \, None`, The start time for filtering runs., `None`
`until`, `datetime \, None`, The end time for filtering runs., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses to filter runs by., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata to filter runs by., `None`
`workflow_ids`, `list[str] \, None`, The workflow IDs to filter runs by., `None`
#### `subscribe_to_stream`
---
# Scheduled Client
Bases: `BaseRestClient`
The scheduled client is a client for managing scheduled workflows within Hatchet.
Methods:
Name, Description
`aio_bulk_delete`, Bulk delete scheduled workflow runs.
`aio_bulk_update`, Bulk reschedule scheduled workflow runs.
`aio_create`, Creates a new scheduled workflow run.
`aio_delete`, Deletes a scheduled workflow run by its ID.
`aio_get`, Retrieves a specific scheduled workflow by scheduled run trigger ID.
`aio_list`, Retrieves a list of scheduled workflows based on provided filters.
`aio_update`, Reschedule a scheduled workflow run by its ID.
`bulk_delete`, Bulk delete scheduled workflow runs.
`bulk_update`, Bulk reschedule scheduled workflow runs.
`create`, Creates a new scheduled workflow run.
`delete`, Deletes a scheduled workflow run by its ID.
`get`, Retrieves a specific scheduled workflow by scheduled run trigger ID.
`list`, Retrieves a list of scheduled workflows based on provided filters.
`update`, Reschedule a scheduled workflow run by its ID.
### Functions
#### `aio_bulk_delete`
Bulk delete scheduled workflow runs.
Parameters:
Name, Type, Description, Default
`scheduled_ids`, `list[str] \, None`, Explicit list of scheduled workflow run IDs to delete., `None`
`workflow_id`, `str \, None`, Filter by workflow ID., `None`
`parent_workflow_run_id`, `str \, None`, Filter by parent workflow run ID., `None`
`parent_step_run_id`, `str \, None`, Filter by parent step run ID., `None`
`statuses`, `list[ScheduledRunStatus] \, None`, Filter by scheduled run statuses., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Filter by additional metadata key/value pairs., `None`
Returns:
Type, Description
`ScheduledWorkflowsBulkDeleteResponse`, The bulk delete response containing deleted IDs and per-item errors.
Raises:
Type, Description
`ValueError`, If neither `scheduled_ids` nor any filter field is provided.
#### `aio_bulk_update`
Bulk reschedule scheduled workflow runs.
See `bulk_update` for parameter details.
#### `aio_create`
Creates a new scheduled workflow run.
IMPORTANT: It's preferable to use `Workflow.run` (and similar) to trigger workflows if possible. This method is intended to be an escape hatch. For more details, see [the documentation](../../../sdk/python/runnables#workflow).
Parameters:
Name, Type, Description, Default
`workflow_name`, `str`, The name of the workflow to schedule., _required_
`trigger_at`, `datetime`, The datetime when the run should be triggered., _required_
`input`, `JSONSerializableMapping`, The input data for the scheduled workflow., _required_
`additional_metadata`, `JSONSerializableMapping`, Additional metadata associated with the future run as a key-value pair., _required_
Returns:
Type, Description
`ScheduledWorkflows`, The created scheduled workflow instance.
#### `aio_delete`
Deletes a scheduled workflow run by its ID.
Parameters:
Name, Type, Description, Default
`scheduled_id`, `str`, The ID of the scheduled workflow run to delete., _required_
Returns:
Type, Description
`None`, None
#### `aio_get`
Retrieves a specific scheduled workflow by scheduled run trigger ID.
Parameters:
Name, Type, Description, Default
`scheduled_id`, `str`, The scheduled workflow trigger ID to retrieve., _required_
Returns:
Type, Description
`ScheduledWorkflows`, The requested scheduled workflow instance.
#### `aio_list`
Retrieves a list of scheduled workflows based on provided filters.
Parameters:
Name, Type, Description, Default
`offset`, `int \, None`, The offset to use in pagination., `None`
`limit`, `int \, None`, The maximum number of scheduled workflows to return., `None`
`workflow_id`, `str \, None`, The ID of the workflow to filter by., `None`
`parent_workflow_run_id`, `str \, None`, The ID of the parent workflow run to filter by., `None`
`statuses`, `list[ScheduledRunStatus] \, None`, A list of statuses to filter by., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata to filter by., `None`
`order_by_field`, `ScheduledWorkflowsOrderByField \, None`, The field to order the results by., `None`
`order_by_direction`, `WorkflowRunOrderByDirection \, None`, The direction to order the results by., `None`
Returns:
Type, Description
`ScheduledWorkflowsList`, A list of scheduled workflows matching the provided filters.
#### `aio_update`
Reschedule a scheduled workflow run by its ID.
Parameters:
Name, Type, Description, Default
`scheduled_id`, `str`, The ID of the scheduled workflow run to reschedule., _required_
`trigger_at`, `datetime`, The datetime when the run should be triggered., _required_
Returns:
Type, Description
`ScheduledWorkflows`, The updated scheduled workflow instance.
#### `bulk_delete`
Bulk delete scheduled workflow runs.
Provide either:
- `scheduled_ids` (explicit list of scheduled run IDs), or
- one or more filter fields (`workflow_id`, `parent_workflow_run_id`, `parent_step_run_id`, `statuses`, `additional_metadata`)
Parameters:
Name, Type, Description, Default
`scheduled_ids`, `list[str] \, None`, Explicit list of scheduled workflow run IDs to delete., `None`
`workflow_id`, `str \, None`, Filter by workflow ID., `None`
`parent_workflow_run_id`, `str \, None`, Filter by parent workflow run ID., `None`
`parent_step_run_id`, `str \, None`, Filter by parent step run ID., `None`
`statuses`, `list[ScheduledRunStatus] \, None`, Filter by scheduled run statuses., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Filter by additional metadata key/value pairs., `None`
Returns:
Type, Description
`ScheduledWorkflowsBulkDeleteResponse`, The bulk delete response containing deleted IDs and per-item errors.
Raises:
Type, Description
`ValueError`, If neither `scheduled_ids` nor any filter field is provided.
#### `bulk_update`
Bulk reschedule scheduled workflow runs.
Parameters:
Name, Type, Description, Default
`updates`, `list[ScheduledWorkflowsBulkUpdateItem] \, list[tuple[str, datetime]]`, Either: - a list of `(scheduled_id, trigger_at)` tuples, or - a list of `ScheduledWorkflowsBulkUpdateItem` objects, _required_
Returns:
Type, Description
`ScheduledWorkflowsBulkUpdateResponse`, The bulk update response containing updated IDs and per-item errors.
#### `create`
Creates a new scheduled workflow run.
IMPORTANT: It's preferable to use `Workflow.run` (and similar) to trigger workflows if possible. This method is intended to be an escape hatch. For more details, see [the documentation](../../../sdk/python/runnables#workflow).
Parameters:
Name, Type, Description, Default
`workflow_name`, `str`, The name of the workflow to schedule., _required_
`trigger_at`, `datetime`, The datetime when the run should be triggered., _required_
`input`, `JSONSerializableMapping`, The input data for the scheduled workflow., _required_
`additional_metadata`, `JSONSerializableMapping`, Additional metadata associated with the future run as a key-value pair., _required_
Returns:
Type, Description
`ScheduledWorkflows`, The created scheduled workflow instance.
#### `delete`
Deletes a scheduled workflow run by its ID.
Parameters:
Name, Type, Description, Default
`scheduled_id`, `str`, The ID of the scheduled workflow run to delete., _required_
Returns:
Type, Description
`None`, None
#### `get`
Retrieves a specific scheduled workflow by scheduled run trigger ID.
Parameters:
Name, Type, Description, Default
`scheduled_id`, `str`, The scheduled workflow trigger ID to retrieve., _required_
Returns:
Type, Description
`ScheduledWorkflows`, The requested scheduled workflow instance.
#### `list`
Retrieves a list of scheduled workflows based on provided filters.
Parameters:
Name, Type, Description, Default
`offset`, `int \, None`, The offset to use in pagination., `None`
`limit`, `int \, None`, The maximum number of scheduled workflows to return., `None`
`workflow_id`, `str \, None`, The ID of the workflow to filter by., `None`
`parent_workflow_run_id`, `str \, None`, The ID of the parent workflow run to filter by., `None`
`statuses`, `list[ScheduledRunStatus] \, None`, A list of statuses to filter by., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata to filter by., `None`
`order_by_field`, `ScheduledWorkflowsOrderByField \, None`, The field to order the results by., `None`
`order_by_direction`, `WorkflowRunOrderByDirection \, None`, The direction to order the results by., `None`
Returns:
Type, Description
`ScheduledWorkflowsList`, A list of scheduled workflows matching the provided filters.
#### `update`
Reschedule a scheduled workflow run by its ID.
Note: the server may reject rescheduling if the scheduled run has already
triggered, or if it was created via code definition (not via API).
Parameters:
Name, Type, Description, Default
`scheduled_id`, `str`, The ID of the scheduled workflow run to reschedule., _required_
`trigger_at`, `datetime`, The datetime when the run should be triggered., _required_
Returns:
Type, Description
`ScheduledWorkflows`, The updated scheduled workflow instance.
---
# Webhooks Client
Bases: `BaseRestClient`
The webhooks client provides methods for managing incoming webhooks in Hatchet.
Webhooks allow external systems to trigger Hatchet workflows by sending HTTP requests to dedicated endpoints. This enables real-time integration with third-party services like GitHub, Stripe, Slack, or any system that can send webhook events.
Methods:
Name, Description
`aio_create`, Create a new webhook.
`aio_delete`, Delete a webhook by its name.
`aio_get`, Get a webhook by its name.
`aio_list`, List webhooks for a given tenant.
`aio_update`, Update a webhook by its name.
`create`, Create a new webhook.
`delete`, Delete a webhook by its name.
`get`, Get a webhook by its name.
`list`, List webhooks for a given tenant.
`update`, Update a webhook by its name.
### Functions
#### `aio_create`
Create a new webhook.
Parameters:
Name, Type, Description, Default
`source_name`, `V1WebhookSourceName`, The source name identifying the external system sending webhook events., _required_
`name`, `str`, The name of the webhook., _required_
`event_key_expression`, `str`, A CEL expression used to extract the event key from the incoming payload., _required_
`auth`, `V1WebhookBasicAuth \, V1WebhookAPIKeyAuth \, V1WebhookHMACAuth`, The authentication configuration for the webhook (basic auth, API key, or HMAC)., _required_
`scope_expression`, `str \, None`, An optional CEL expression used to extract the scope from the incoming payload., `None`
`static_payload`, `dict[str, Any] \, None`, An optional static payload to merge into every triggered event., `None`
Returns:
Type, Description
`V1Webhook`, The created webhook.
#### `aio_delete`
Delete a webhook by its name.
Parameters:
Name, Type, Description, Default
`webhook_name`, `str`, The name of the webhook to delete., _required_
Returns:
Type, Description
`V1Webhook`, The deleted webhook.
#### `aio_get`
Get a webhook by its name.
Parameters:
Name, Type, Description, Default
`webhook_name`, `str`, The name of the webhook to retrieve., _required_
Returns:
Type, Description
`V1Webhook`, The webhook with the specified name.
#### `aio_list`
List webhooks for a given tenant.
Parameters:
Name, Type, Description, Default
`limit`, `int \, None`, The maximum number of webhooks to return., `None`
`offset`, `int \, None`, The number of webhooks to skip before starting to collect the result set., `None`
`webhook_names`, `list[str] \, None`, A list of webhook names to filter by., `None`
`source_names`, `list[V1WebhookSourceName] \, None`, A list of source names to filter by., `None`
Returns:
Type, Description
`V1WebhookList`, A list of webhooks matching the specified criteria.
#### `aio_update`
Update a webhook by its name.
Parameters:
Name, Type, Description, Default
`webhook_name`, `str`, The name of the webhook to update., _required_
`event_key_expression`, `str \, None`, An updated CEL expression used to extract the event key from the incoming payload., `None`
`scope_expression`, `str \, None`, An updated CEL expression used to extract the scope from the incoming payload., `None`
`static_payload`, `dict[str, Any] \, None`, An updated static payload to merge into every triggered event., `None`
Returns:
Type, Description
`V1Webhook`, The updated webhook.
#### `create`
Create a new webhook.
Parameters:
Name, Type, Description, Default
`source_name`, `V1WebhookSourceName`, The source name identifying the external system sending webhook events., _required_
`name`, `str`, The name of the webhook., _required_
`event_key_expression`, `str`, A CEL expression used to extract the event key from the incoming payload., _required_
`auth`, `V1WebhookBasicAuth \, V1WebhookAPIKeyAuth \, V1WebhookHMACAuth`, The authentication configuration for the webhook (basic auth, API key, or HMAC)., _required_
`scope_expression`, `str \, None`, An optional CEL expression used to extract the scope from the incoming payload., `None`
`static_payload`, `dict[str, Any] \, None`, An optional static payload to merge into every triggered event., `None`
Returns:
Type, Description
`V1Webhook`, The created webhook.
#### `delete`
Delete a webhook by its name.
Parameters:
Name, Type, Description, Default
`webhook_name`, `str`, The name of the webhook to delete., _required_
Returns:
Type, Description
`V1Webhook`, The deleted webhook.
#### `get`
Get a webhook by its name.
Parameters:
Name, Type, Description, Default
`webhook_name`, `str`, The name of the webhook to retrieve., _required_
Returns:
Type, Description
`V1Webhook`, The webhook with the specified name.
#### `list`
List webhooks for a given tenant.
Parameters:
Name, Type, Description, Default
`limit`, `int \, None`, The maximum number of webhooks to return., `None`
`offset`, `int \, None`, The number of webhooks to skip before starting to collect the result set., `None`
`webhook_names`, `list[str] \, None`, A list of webhook names to filter by., `None`
`source_names`, `list[V1WebhookSourceName] \, None`, A list of source names to filter by., `None`
Returns:
Type, Description
`V1WebhookList`, A list of webhooks matching the specified criteria.
#### `update`
Update a webhook by its name.
Parameters:
Name, Type, Description, Default
`webhook_name`, `str`, The name of the webhook to update., _required_
`event_key_expression`, `str \, None`, An updated CEL expression used to extract the event key from the incoming payload., `None`
`scope_expression`, `str \, None`, An updated CEL expression used to extract the scope from the incoming payload., `None`
`static_payload`, `dict[str, Any] \, None`, An updated static payload to merge into every triggered event., `None`
Returns:
Type, Description
`V1Webhook`, The updated webhook.
---
# Workers Client
Bases: `BaseRestClient`
The workers client is a client for managing workers programmatically within Hatchet.
Methods:
Name, Description
`aio_get`, Get a worker by its ID.
`aio_list`, List all workers in the tenant determined by the client config.
`aio_update`, Update a worker by its ID.
`get`, Get a worker by its ID.
`list`, List all workers in the tenant determined by the client config.
`update`, Update a worker by its ID.
### Functions
#### `aio_get`
Get a worker by its ID.
Parameters:
Name, Type, Description, Default
`worker_id`, `str`, The ID of the worker to retrieve., _required_
Returns:
Type, Description
`Worker`, The worker.
#### `aio_list`
List all workers in the tenant determined by the client config.
Returns:
Type, Description
`WorkerList`, A list of workers.
#### `aio_update`
Update a worker by its ID.
Parameters:
Name, Type, Description, Default
`worker_id`, `str`, The ID of the worker to update., _required_
`opts`, `UpdateWorkerRequest`, The update options., _required_
Returns:
Type, Description
`Worker`, The updated worker.
#### `get`
Get a worker by its ID.
Parameters:
Name, Type, Description, Default
`worker_id`, `str`, The ID of the worker to retrieve., _required_
Returns:
Type, Description
`Worker`, The worker.
#### `list`
List all workers in the tenant determined by the client config.
Returns:
Type, Description
`WorkerList`, A list of workers.
#### `update`
Update a worker by its ID.
Parameters:
Name, Type, Description, Default
`worker_id`, `str`, The ID of the worker to update., _required_
`opts`, `UpdateWorkerRequest`, The update options., _required_
Returns:
Type, Description
`Worker`, The updated worker.
---
# Workflows Client
Bases: `BaseRestClient`
The workflows client is a client for managing workflows programmatically within Hatchet.
Note that workflows are the declaration, _not_ the individual runs. If you're looking for runs, use the `RunsClient` instead.
Methods:
Name, Description
`aio_delete`, Permanently delete a workflow.
`aio_get`, Get a workflow by its ID.
`aio_get_version`, Get a workflow version by the workflow ID and an optional version.
`aio_list`, List all workflows in the tenant determined by the client config that match optional filters.
`delete`, Permanently delete a workflow.
`get`, Get a workflow by its ID.
`get_version`, Get a workflow version by the workflow ID and an optional version.
`list`, List all workflows in the tenant determined by the client config that match optional filters.
### Functions
#### `aio_delete`
Permanently delete a workflow.
**DANGEROUS: This will delete a workflow and all of its data**
Parameters:
Name, Type, Description, Default
`workflow_id`, `str`, The ID of the workflow to delete., _required_
Returns:
Type, Description
`None`, None
#### `aio_get`
Get a workflow by its ID.
Parameters:
Name, Type, Description, Default
`workflow_id`, `str`, The ID of the workflow to retrieve., _required_
Returns:
Type, Description
`Workflow`, The workflow.
#### `aio_get_version`
Get a workflow version by the workflow ID and an optional version.
Parameters:
Name, Type, Description, Default
`workflow_id`, `str`, The ID of the workflow to retrieve the version for., _required_
`version`, `str \, None`, The version of the workflow to retrieve. If None, the latest version is returned., `None`
Returns:
Type, Description
`WorkflowVersion`, The workflow version.
#### `aio_list`
List all workflows in the tenant determined by the client config that match optional filters.
Parameters:
Name, Type, Description, Default
`workflow_name`, `str \, None`, The name of the workflow to filter by., `None`
`limit`, `int \, None`, The maximum number of items to return., `None`
`offset`, `int \, None`, The offset to start the list from., `None`
Returns:
Type, Description
`WorkflowList`, A list of workflows.
#### `delete`
Permanently delete a workflow.
**DANGEROUS: This will delete a workflow and all of its data**
Parameters:
Name, Type, Description, Default
`workflow_id`, `str`, The ID of the workflow to delete., _required_
Returns:
Type, Description
`None`, None
#### `get`
Get a workflow by its ID.
Parameters:
Name, Type, Description, Default
`workflow_id`, `str`, The ID of the workflow to retrieve., _required_
Returns:
Type, Description
`Workflow`, The workflow.
#### `get_version`
Get a workflow version by the workflow ID and an optional version.
Parameters:
Name, Type, Description, Default
`workflow_id`, `str`, The ID of the workflow to retrieve the version for., _required_
`version`, `str \, None`, The version of the workflow to retrieve. If None, the latest version is returned., `None`
Returns:
Type, Description
`WorkflowVersion`, The workflow version.
#### `list`
List all workflows in the tenant determined by the client config that match optional filters.
Parameters:
Name, Type, Description, Default
`workflow_name`, `str \, None`, The name of the workflow to filter by., `None`
`limit`, `int \, None`, The maximum number of items to return., `None`
`offset`, `int \, None`, The offset to start the list from., `None`
Returns:
Type, Description
`WorkflowList`, A list of workflows.
---
# Runnables
`Runnables` in the Hatchet SDK are things that can be run, namely tasks and workflows. The two main types of runnables you'll encounter are:
- `Workflow`, which lets you define tasks and call all of the run, schedule, etc. methods
- `Standalone`, which is a single task that's returned by `hatchet.task` and can be run, scheduled, etc.
## Workflow
Bases: `BaseWorkflow[TWorkflowInput]`
A Hatchet workflow, which allows you to define tasks to be run and perform actions on the workflow.
Workflows in Hatchet represent coordinated units of work that can be triggered, scheduled, or run on a cron schedule. Each workflow can contain multiple tasks that can be arranged in dependencies (DAGs), have customized retry behavior, timeouts, concurrency controls, and more.
Example:
```python
from pydantic import BaseModel
from hatchet_sdk import Hatchet
class MyInput(BaseModel):
name: str
hatchet = Hatchet()
workflow = hatchet.workflow("my-workflow", input_type=MyInput)
@workflow.task()
def greet(input, ctx):
return f"Hello, {input.name}!"
# Run the workflow
result = workflow.run(MyInput(name="World"))
```
Workflows support various execution patterns including:
- One-time execution with `run()` or `aio_run()`
- Scheduled execution with `schedule()`
- Cron-based recurring execution with `create_cron()`
- Bulk operations with `run_many()`
Tasks within workflows can be defined with `@workflow.task()` or `@workflow.durable_task()` decorators and can be arranged into complex dependency patterns.
Methods:
Name, Description
`task`, A decorator to transform a function into a Hatchet task that runs as part of a workflow.
`durable_task`, A decorator to transform a function into a durable Hatchet task that runs as part of a workflow.
`on_failure_task`, A decorator to transform a function into a Hatchet on-failure task that runs as the last step in a workflow that had at least one task fail.
`on_success_task`, A decorator to transform a function into a Hatchet on-success task that runs as the last step in a workflow that had all upstream tasks succeed.
`run`, Run the workflow synchronously and wait for it to complete.
`aio_run`, Run the workflow asynchronously and wait for it to complete.
`run_no_wait`, Synchronously trigger a workflow run without waiting for it to complete.
`aio_run_no_wait`, Asynchronously trigger a workflow run without waiting for it to complete.
`run_many`, Run a workflow in bulk and wait for all runs to complete.
`aio_run_many`, Run a workflow in bulk and wait for all runs to complete.
`run_many_no_wait`, Run a workflow in bulk without waiting for all runs to complete.
`aio_run_many_no_wait`, Run a workflow in bulk without waiting for all runs to complete.
`schedule`, Schedule a workflow to run at a specific time.
`aio_schedule`, Schedule a workflow to run at a specific time.
`create_cron`, Create a cron job for the workflow.
`aio_create_cron`, Create a cron job for the workflow.
`create_bulk_run_item`, Create a bulk run item for the workflow. This is intended to be used in conjunction with the various `run_many` methods.
`list_runs`, List runs of the workflow.
`aio_list_runs`, List runs of the workflow.
`create_filter`, Create a new filter.
`aio_create_filter`, Create a new filter.
### Attributes
#### `name`
The (namespaced) name of the workflow.
#### `tasks`
#### `id`
`cached`
Get the ID of the workflow.
Returns:
Type, Description
`str`, The ID of the workflow.
Raises:
Type, Description
`ValueError`, If no workflow ID is found for the workflow name.
### Functions
#### `task`
A decorator to transform a function into a Hatchet task that runs as part of a workflow.
Parameters:
Name, Type, Description, Default
`name`, `str \, None`, The name of the task. If not specified, defaults to the name of the function being wrapped by the `task` decorator., `None`
`schedule_timeout`, `Duration`, The maximum time to wait for the task to be scheduled. The run will be canceled if the task does not begin within this time., `timedelta(minutes=5)`
`execution_timeout`, `Duration`, The maximum time to wait for the task to complete. The run will be canceled if the task does not complete within this time., `timedelta(seconds=60)`
`parents`, `list[Task[TWorkflowInput, Any]] \, None`, A list of tasks that are parents of the task. Note: Parents must be defined before their children., `None`
`retries`, `int`, The number of times to retry the task before failing., `0`
`rate_limits`, `list[RateLimit] \, None`, A list of rate limit configurations for the task., `None`
`desired_worker_labels`, `dict[str, DesiredWorkerLabel] \, None`, A dictionary of desired worker labels that determine to which worker the task should be assigned. See documentation and examples on affinity and worker labels for more details., `None`
`backoff_factor`, `float \, None`, The backoff factor for controlling exponential backoff in retries., `None`
`backoff_max_seconds`, `int \, None`, The maximum number of seconds to allow retries with exponential backoff to continue., `None`
`concurrency`, `int \, list[ConcurrencyExpression] \, None`, A list of concurrency expressions for the task. If an integer is provided, it is treated as a constant concurrency limit with a `GROUP_ROUND_ROBIN` strategy, which means that only `N` runs of the task may execute at any given time., `None`
`wait_for`, `list[Condition \, OrGroup] \, None`, A list of conditions that must be met before the task can run., `None`
`skip_if`, `list[Condition \, OrGroup] \, None`, A list of conditions that, if met, will cause the task to be skipped., `None`
`cancel_if`, `list[Condition \, OrGroup] \, None`, A list of conditions that, if met, will cause the task to be canceled., `None`
Returns:
Type, Description
`Callable[[Callable[Concatenate[TWorkflowInput, Context, P], R \, CoroutineLike[R]]], Task[TWorkflowInput, R]]`, A decorator which creates a `Task` object.
#### `durable_task`
A decorator to transform a function into a durable Hatchet task that runs as part of a workflow.
**IMPORTANT:** This decorator creates a _durable_ task, which works using Hatchet's durable execution capabilities. This is an advanced feature of Hatchet.
See the Hatchet docs for more information on durable execution to decide if this is right for you.
Parameters:
Name, Type, Description, Default
`name`, `str \, None`, The name of the task. If not specified, defaults to the name of the function being wrapped by the `task` decorator., `None`
`schedule_timeout`, `Duration`, The maximum time to wait for the task to be scheduled. The run will be canceled if the task does not begin within this time., `timedelta(minutes=5)`
`execution_timeout`, `Duration`, The maximum time to wait for the task to complete. The run will be canceled if the task does not complete within this time., `timedelta(seconds=60)`
`parents`, `list[Task[TWorkflowInput, Any]] \, None`, A list of tasks that are parents of the task. Note: Parents must be defined before their children., `None`
`retries`, `int`, The number of times to retry the task before failing., `0`
`rate_limits`, `list[RateLimit] \, None`, A list of rate limit configurations for the task., `None`
`desired_worker_labels`, `dict[str, DesiredWorkerLabel] \, None`, A dictionary of desired worker labels that determine to which worker the task should be assigned. See documentation and examples on affinity and worker labels for more details., `None`
`backoff_factor`, `float \, None`, The backoff factor for controlling exponential backoff in retries., `None`
`backoff_max_seconds`, `int \, None`, The maximum number of seconds to allow retries with exponential backoff to continue., `None`
`concurrency`, `int \, list[ConcurrencyExpression] \, None`, A list of concurrency expressions for the task. If an integer is provided, it is treated as a constant concurrency limit with a `GROUP_ROUND_ROBIN` strategy, which means that only `N` runs of the task may execute at any given time., `None`
`wait_for`, `list[Condition \, OrGroup] \, None`, A list of conditions that must be met before the task can run., `None`
`skip_if`, `list[Condition \, OrGroup] \, None`, A list of conditions that, if met, will cause the task to be skipped., `None`
`cancel_if`, `list[Condition \, OrGroup] \, None`, A list of conditions that, if met, will cause the task to be canceled., `None`
Returns:
Type, Description
`Callable[[Callable[Concatenate[TWorkflowInput, DurableContext, P], R \, CoroutineLike[R]]], Task[TWorkflowInput, R]]`, A decorator which creates a `Task` object.
#### `on_failure_task`
A decorator to transform a function into a Hatchet on-failure task that runs as the last step in a workflow that had at least one task fail.
Parameters:
Name, Type, Description, Default
`name`, `str \, None`, The name of the on-failure task. If not specified, defaults to the name of the function being wrapped by the `on_failure_task` decorator., `None`
`schedule_timeout`, `Duration`, The maximum time to wait for the task to be scheduled. The run will be canceled if the task does not begin within this time., `timedelta(minutes=5)`
`execution_timeout`, `Duration`, The maximum time to wait for the task to complete. The run will be canceled if the task does not complete within this time., `timedelta(seconds=60)`
`retries`, `int`, The number of times to retry the on-failure task before failing., `0`
`rate_limits`, `list[RateLimit] \, None`, A list of rate limit configurations for the on-failure task., `None`
`backoff_factor`, `float \, None`, The backoff factor for controlling exponential backoff in retries., `None`
`backoff_max_seconds`, `int \, None`, The maximum number of seconds to allow retries with exponential backoff to continue., `None`
`concurrency`, `int \, list[ConcurrencyExpression] \, None`, A list of concurrency expressions for the on-failure task. If an integer is provided, it is treated as a constant concurrency limit with a `GROUP_ROUND_ROBIN` strategy, which means that only `N` runs of the task may execute at any given time., `None`
Returns:
Type, Description
`Callable[[Callable[Concatenate[TWorkflowInput, Context, P], R \, CoroutineLike[R]]], Task[TWorkflowInput, R]]`, A decorator which creates a `Task` object.
#### `on_success_task`
A decorator to transform a function into a Hatchet on-success task that runs as the last step in a workflow that had all upstream tasks succeed.
Parameters:
Name, Type, Description, Default
`name`, `str \, None`, The name of the on-success task. If not specified, defaults to the name of the function being wrapped by the `on_success_task` decorator., `None`
`schedule_timeout`, `Duration`, The maximum time to wait for the task to be scheduled. The run will be canceled if the task does not begin within this time., `timedelta(minutes=5)`
`execution_timeout`, `Duration`, The maximum time to wait for the task to complete. The run will be canceled if the task does not complete within this time., `timedelta(seconds=60)`
`retries`, `int`, The number of times to retry the on-success task before failing, `0`
`rate_limits`, `list[RateLimit] \, None`, A list of rate limit configurations for the on-success task., `None`
`backoff_factor`, `float \, None`, The backoff factor for controlling exponential backoff in retries., `None`
`backoff_max_seconds`, `int \, None`, The maximum number of seconds to allow retries with exponential backoff to continue., `None`
`concurrency`, `int \, list[ConcurrencyExpression] \, None`, A list of concurrency expressions for the on-success task. If an integer is provided, it is treated as a constant concurrency limit with a `GROUP_ROUND_ROBIN` strategy, which means that only `N` runs of the task may execute at any given time., `None`
Returns:
Type, Description
`Callable[[Callable[Concatenate[TWorkflowInput, Context, P], R \, CoroutineLike[R]]], Task[TWorkflowInput, R]]`, A decorator which creates a Task object.
#### `run`
Run the workflow synchronously and wait for it to complete.
This method triggers a workflow run, blocks until completion, and returns the final result.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow, must match the workflow's input type., `cast(TWorkflowInput, EmptyModel())`
`options`, `TriggerWorkflowOptions`, Additional options for workflow execution like metadata and parent workflow ID., `TriggerWorkflowOptions()`
Returns:
Type, Description
`dict[str, Any]`, The result of the workflow execution as a dictionary.
#### `aio_run`
Run the workflow asynchronously and wait for it to complete.
This method triggers a workflow run, awaits until completion, and returns the final result.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow, must match the workflow's input type., `cast(TWorkflowInput, EmptyModel())`
`options`, `TriggerWorkflowOptions`, Additional options for workflow execution like metadata and parent workflow ID., `TriggerWorkflowOptions()`
Returns:
Type, Description
`dict[str, Any]`, The result of the workflow execution as a dictionary.
#### `run_no_wait`
Synchronously trigger a workflow run without waiting for it to complete. This method is useful for starting a workflow run and immediately returning a reference to the run without blocking while the workflow runs.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`options`, `TriggerWorkflowOptions`, Additional options for workflow execution., `TriggerWorkflowOptions()`
Returns:
Type, Description
`WorkflowRunRef`, A `WorkflowRunRef` object representing the reference to the workflow run.
#### `aio_run_no_wait`
Asynchronously trigger a workflow run without waiting for it to complete. This method is useful for starting a workflow run and immediately returning a reference to the run without blocking while the workflow runs.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`options`, `TriggerWorkflowOptions`, Additional options for workflow execution., `TriggerWorkflowOptions()`
Returns:
Type, Description
`WorkflowRunRef`, A `WorkflowRunRef` object representing the reference to the workflow run.
#### `run_many`
Run a workflow in bulk and wait for all runs to complete. This method triggers multiple workflow runs, blocks until all of them complete, and returns the final results.
Parameters:
Name, Type, Description, Default
`workflows`, `list[WorkflowRunTriggerConfig]`, A list of `WorkflowRunTriggerConfig` objects, each representing a workflow run to be triggered., _required_
`return_exceptions`, `bool`, If `True`, exceptions will be returned as part of the results instead of raising them., `False`
Returns:
Type, Description
`list[dict[str, Any]] \, list[dict[str, Any] \, BaseException]`, A list of results for each workflow run.
#### `aio_run_many`
Run a workflow in bulk and wait for all runs to complete. This method triggers multiple workflow runs, blocks until all of them complete, and returns the final results.
Parameters:
Name, Type, Description, Default
`workflows`, `list[WorkflowRunTriggerConfig]`, A list of `WorkflowRunTriggerConfig` objects, each representing a workflow run to be triggered., _required_
`return_exceptions`, `bool`, If `True`, exceptions will be returned as part of the results instead of raising them., `False`
Returns:
Type, Description
`list[dict[str, Any]] \, list[dict[str, Any] \, BaseException]`, A list of results for each workflow run.
#### `run_many_no_wait`
Run a workflow in bulk without waiting for all runs to complete.
This method triggers multiple workflow runs and immediately returns a list of references to the runs without blocking while the workflows run.
Parameters:
Name, Type, Description, Default
`workflows`, `list[WorkflowRunTriggerConfig]`, A list of `WorkflowRunTriggerConfig` objects, each representing a workflow run to be triggered., _required_
Returns:
Type, Description
`list[WorkflowRunRef]`, A list of `WorkflowRunRef` objects, each representing a reference to a workflow run.
#### `aio_run_many_no_wait`
Run a workflow in bulk without waiting for all runs to complete.
This method triggers multiple workflow runs and immediately returns a list of references to the runs without blocking while the workflows run.
Parameters:
Name, Type, Description, Default
`workflows`, `list[WorkflowRunTriggerConfig]`, A list of `WorkflowRunTriggerConfig` objects, each representing a workflow run to be triggered., _required_
Returns:
Type, Description
`list[WorkflowRunRef]`, A list of `WorkflowRunRef` objects, each representing a reference to a workflow run.
#### `schedule`
Schedule a workflow to run at a specific time.
Parameters:
Name, Type, Description, Default
`run_at`, `datetime`, The time at which to schedule the workflow., _required_
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`options`, `ScheduleTriggerWorkflowOptions`, Additional options for workflow execution., `ScheduleTriggerWorkflowOptions()`
Returns:
Type, Description
`WorkflowVersion`, A `WorkflowVersion` object representing the scheduled workflow.
#### `aio_schedule`
Schedule a workflow to run at a specific time.
Parameters:
Name, Type, Description, Default
`run_at`, `datetime`, The time at which to schedule the workflow., _required_
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`options`, `ScheduleTriggerWorkflowOptions`, Additional options for workflow execution., `ScheduleTriggerWorkflowOptions()`
Returns:
Type, Description
`WorkflowVersion`, A `WorkflowVersion` object representing the scheduled workflow.
#### `create_cron`
Create a cron job for the workflow.
Parameters:
Name, Type, Description, Default
`cron_name`, `str`, The name of the cron job., _required_
`expression`, `str`, The cron expression that defines the schedule for the cron job., _required_
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata for the cron job., `None`
`priority`, `int \, None`, The priority of the cron job. Must be between 1 and 3, inclusive., `None`
Returns:
Type, Description
`CronWorkflows`, A `CronWorkflows` object representing the created cron job.
#### `aio_create_cron`
Create a cron job for the workflow.
Parameters:
Name, Type, Description, Default
`cron_name`, `str`, The name of the cron job., _required_
`expression`, `str`, The cron expression that defines the schedule for the cron job., _required_
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata for the cron job., `None`
`priority`, `int \, None`, The priority of the cron job. Must be between 1 and 3, inclusive., `None`
Returns:
Type, Description
`CronWorkflows`, A `CronWorkflows` object representing the created cron job.
#### `create_bulk_run_item`
Create a bulk run item for the workflow. This is intended to be used in conjunction with the various `run_many` methods.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`key`, `str \, None`, The key for the workflow run. This is used to identify the run in the bulk operation and for deduplication., `None`
`options`, `TriggerWorkflowOptions`, Additional options for the workflow run., `TriggerWorkflowOptions()`
Returns:
Type, Description
`WorkflowRunTriggerConfig`, A `WorkflowRunTriggerConfig` object that can be used to trigger the workflow run, which you then pass into the `run_many` methods.
#### `list_runs`
List runs of the workflow.
Parameters:
Name, Type, Description, Default
`since`, `datetime \, None`, The start time for the runs to be listed., `None`
`until`, `datetime \, None`, The end time for the runs to be listed., `None`
`limit`, `int`, The maximum number of runs to be listed., `100`
`offset`, `int \, None`, The offset for pagination., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses of the runs to be listed., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata for filtering the runs., `None`
`worker_id`, `str \, None`, The ID of the worker that ran the tasks., `None`
`parent_task_external_id`, `str \, None`, The external ID of the parent task., `None`
`only_tasks`, `bool`, Whether to list only task runs., `False`
`triggering_event_external_id`, `str \, None`, The event id that triggered the task run., `None`
Returns:
Type, Description
`list[V1TaskSummary]`, A list of `V1TaskSummary` objects representing the runs of the workflow.
#### `aio_list_runs`
List runs of the workflow.
Parameters:
Name, Type, Description, Default
`since`, `datetime \, None`, The start time for the runs to be listed., `None`
`until`, `datetime \, None`, The end time for the runs to be listed., `None`
`limit`, `int`, The maximum number of runs to be listed., `100`
`offset`, `int \, None`, The offset for pagination., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses of the runs to be listed., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata for filtering the runs., `None`
`worker_id`, `str \, None`, The ID of the worker that ran the tasks., `None`
`parent_task_external_id`, `str \, None`, The external ID of the parent task., `None`
`only_tasks`, `bool`, Whether to list only task runs., `False`
`triggering_event_external_id`, `str \, None`, The event id that triggered the task run., `None`
Returns:
Type, Description
`list[V1TaskSummary]`, A list of `V1TaskSummary` objects representing the runs of the workflow.
#### `create_filter`
Create a new filter.
Parameters:
Name, Type, Description, Default
`expression`, `str`, The expression to evaluate for the filter., _required_
`scope`, `str`, The scope for the filter., _required_
`payload`, `JSONSerializableMapping \, None`, The payload to send with the filter., `None`
Returns:
Type, Description
`V1Filter`, The created filter.
#### `aio_create_filter`
Create a new filter.
Parameters:
Name, Type, Description, Default
`expression`, `str`, The expression to evaluate for the filter., _required_
`scope`, `str`, The scope for the filter., _required_
`payload`, `JSONSerializableMapping \, None`, The payload to send with the filter., `None`
Returns:
Type, Description
`V1Filter`, The created filter.
## Task
Bases: `Generic[TWorkflowInput, R]`
Methods:
Name, Description
`mock_run`, Mimic the execution of a task. This method is intended to be used to unit test
`aio_mock_run`, Mimic the execution of a task. This method is intended to be used to unit test
### Functions
#### `mock_run`
Mimic the execution of a task. This method is intended to be used to unit test tasks without needing to interact with the Hatchet engine. Use `mock_run` for sync tasks and `aio_mock_run` for async tasks.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput \, None`, The input to the task., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata to attach to the task., `None`
`parent_outputs`, `dict[str, JSONSerializableMapping] \, None`, Outputs from parent tasks, if any. This is useful for mimicking DAG functionality. For instance, if you have a task `step_2` that has a `parent` which is `step_1`, you can pass `parent_outputs={"step_1": {"result": "Hello, world!"}}` to `step_2.mock_run()` to be able to access `ctx.task_output(step_1)` in `step_2`., `None`
`retry_count`, `int`, The number of times the task has been retried., `0`
`lifespan`, `Any`, The lifespan to be used in the task, which is useful if one was set on the worker. This will allow you to access `ctx.lifespan` inside of your task., `None`
`dependencies`, `dict[str, Any] \, None`, Dependencies to be injected into the task. This is useful for tasks that have dependencies defined using `Depends`. **IMPORTANT**: You must pass the dependencies _directly_, **not** the `Depends` objects themselves. For example, if you have a task that has a dependency `config: Annotated[str, Depends(get_config)]`, you should pass `dependencies={"config": "config_value"}` to `aio_mock_run`., `None`
Returns:
Type, Description
`R`, The output of the task.
Raises:
Type, Description
`TypeError`, If the task is an async function and `mock_run` is called, or if the task is a sync function and `aio_mock_run` is called.
#### `aio_mock_run`
Mimic the execution of a task. This method is intended to be used to unit test tasks without needing to interact with the Hatchet engine. Use `mock_run` for sync tasks and `aio_mock_run` for async tasks.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput \, None`, The input to the task., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata to attach to the task., `None`
`parent_outputs`, `dict[str, JSONSerializableMapping] \, None`, Outputs from parent tasks, if any. This is useful for mimicking DAG functionality. For instance, if you have a task `step_2` that has a `parent` which is `step_1`, you can pass `parent_outputs={"step_1": {"result": "Hello, world!"}}` to `step_2.mock_run()` to be able to access `ctx.task_output(step_1)` in `step_2`., `None`
`retry_count`, `int`, The number of times the task has been retried., `0`
`lifespan`, `Any`, The lifespan to be used in the task, which is useful if one was set on the worker. This will allow you to access `ctx.lifespan` inside of your task., `None`
`dependencies`, `dict[str, Any] \, None`, Dependencies to be injected into the task. This is useful for tasks that have dependencies defined using `Depends`. **IMPORTANT**: You must pass the dependencies _directly_, **not** the `Depends` objects themselves. For example, if you have a task that has a dependency `config: Annotated[str, Depends(get_config)]`, you should pass `dependencies={"config": "config_value"}` to `aio_mock_run`., `None`
Returns:
Type, Description
`R`, The output of the task.
Raises:
Type, Description
`TypeError`, If the task is an async function and `mock_run` is called, or if the task is a sync function and `aio_mock_run` is called.
## Standalone
Bases: `BaseWorkflow[TWorkflowInput]`, `Generic[TWorkflowInput, R]`
Methods:
Name, Description
`run`, Run the workflow synchronously and wait for it to complete.
`aio_run`, Run the workflow asynchronously and wait for it to complete.
`run_no_wait`, Trigger a workflow run without waiting for it to complete.
`aio_run_no_wait`, Asynchronously trigger a workflow run without waiting for it to complete.
`run_many`, Run a workflow in bulk and wait for all runs to complete.
`aio_run_many`, Run a workflow in bulk and wait for all runs to complete.
`run_many_no_wait`, Run a workflow in bulk without waiting for all runs to complete.
`aio_run_many_no_wait`, Run a workflow in bulk without waiting for all runs to complete.
`schedule`, Schedule a workflow to run at a specific time.
`aio_schedule`, Schedule a workflow to run at a specific time.
`create_cron`, Create a cron job for the workflow.
`aio_create_cron`, Create a cron job for the workflow.
`create_bulk_run_item`, Create a bulk run item for the workflow. This is intended to be used in conjunction with the various `run_many` methods.
`list_runs`, List runs of the workflow.
`aio_list_runs`, List runs of the workflow.
`create_filter`, Create a new filter.
`aio_create_filter`, Create a new filter.
`delete`, Permanently delete the workflow.
`aio_delete`, Permanently delete the workflow.
`get_run_ref`, Get a reference to a task run by its run ID.
`get_result`, Get the result of a task run by its run ID.
`aio_get_result`, Get the result of a task run by its run ID.
`mock_run`, Mimic the execution of a task. This method is intended to be used to unit test
`aio_mock_run`, Mimic the execution of a task. This method is intended to be used to unit test
### Functions
#### `run`
Run the workflow synchronously and wait for it to complete.
This method triggers a workflow run, blocks until completion, and returns the extracted result.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`options`, `TriggerWorkflowOptions`, Additional options for workflow execution., `TriggerWorkflowOptions()`
Returns:
Type, Description
`R`, The extracted result of the workflow execution.
#### `aio_run`
Run the workflow asynchronously and wait for it to complete.
This method triggers a workflow run, awaits until completion, and returns the extracted result.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow, must match the workflow's input type., `cast(TWorkflowInput, EmptyModel())`
`options`, `TriggerWorkflowOptions`, Additional options for workflow execution like metadata and parent workflow ID., `TriggerWorkflowOptions()`
Returns:
Type, Description
`R`, The extracted result of the workflow execution.
#### `run_no_wait`
Trigger a workflow run without waiting for it to complete.
This method triggers a workflow run and immediately returns a reference to the run without blocking while the workflow runs.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow, must match the workflow's input type., `cast(TWorkflowInput, EmptyModel())`
`options`, `TriggerWorkflowOptions`, Additional options for workflow execution like metadata and parent workflow ID., `TriggerWorkflowOptions()`
Returns:
Type, Description
`TaskRunRef[TWorkflowInput, R]`, A `TaskRunRef` object representing the reference to the workflow run.
#### `aio_run_no_wait`
Asynchronously trigger a workflow run without waiting for it to complete. This method is useful for starting a workflow run and immediately returning a reference to the run without blocking while the workflow runs.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`options`, `TriggerWorkflowOptions`, Additional options for workflow execution., `TriggerWorkflowOptions()`
Returns:
Type, Description
`TaskRunRef[TWorkflowInput, R]`, A `TaskRunRef` object representing the reference to the workflow run.
#### `run_many`
Run a workflow in bulk and wait for all runs to complete. This method triggers multiple workflow runs, blocks until all of them complete, and returns the final results.
Parameters:
Name, Type, Description, Default
`workflows`, `list[WorkflowRunTriggerConfig]`, A list of `WorkflowRunTriggerConfig` objects, each representing a workflow run to be triggered., _required_
`return_exceptions`, `bool`, If `True`, exceptions will be returned as part of the results instead of raising them., `False`
Returns:
Type, Description
`list[R] \, list[R \, BaseException]`, A list of results for each workflow run.
#### `aio_run_many`
Run a workflow in bulk and wait for all runs to complete. This method triggers multiple workflow runs, blocks until all of them complete, and returns the final results.
Parameters:
Name, Type, Description, Default
`workflows`, `list[WorkflowRunTriggerConfig]`, A list of `WorkflowRunTriggerConfig` objects, each representing a workflow run to be triggered., _required_
`return_exceptions`, `bool`, If `True`, exceptions will be returned as part of the results instead of raising them., `False`
Returns:
Type, Description
`list[R] \, list[R \, BaseException]`, A list of results for each workflow run.
#### `run_many_no_wait`
Run a workflow in bulk without waiting for all runs to complete.
This method triggers multiple workflow runs and immediately returns a list of references to the runs without blocking while the workflows run.
Parameters:
Name, Type, Description, Default
`workflows`, `list[WorkflowRunTriggerConfig]`, A list of `WorkflowRunTriggerConfig` objects, each representing a workflow run to be triggered., _required_
Returns:
Type, Description
`list[TaskRunRef[TWorkflowInput, R]]`, A list of `WorkflowRunRef` objects, each representing a reference to a workflow run.
#### `aio_run_many_no_wait`
Run a workflow in bulk without waiting for all runs to complete.
This method triggers multiple workflow runs and immediately returns a list of references to the runs without blocking while the workflows run.
Parameters:
Name, Type, Description, Default
`workflows`, `list[WorkflowRunTriggerConfig]`, A list of `WorkflowRunTriggerConfig` objects, each representing a workflow run to be triggered., _required_
Returns:
Type, Description
`list[TaskRunRef[TWorkflowInput, R]]`, A list of `WorkflowRunRef` objects, each representing a reference to a workflow run.
#### `schedule`
Schedule a workflow to run at a specific time.
Parameters:
Name, Type, Description, Default
`run_at`, `datetime`, The time at which to schedule the workflow., _required_
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`options`, `ScheduleTriggerWorkflowOptions`, Additional options for workflow execution., `ScheduleTriggerWorkflowOptions()`
Returns:
Type, Description
`WorkflowVersion`, A `WorkflowVersion` object representing the scheduled workflow.
#### `aio_schedule`
Schedule a workflow to run at a specific time.
Parameters:
Name, Type, Description, Default
`run_at`, `datetime`, The time at which to schedule the workflow., _required_
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`options`, `ScheduleTriggerWorkflowOptions`, Additional options for workflow execution., `ScheduleTriggerWorkflowOptions()`
Returns:
Type, Description
`WorkflowVersion`, A `WorkflowVersion` object representing the scheduled workflow.
#### `create_cron`
Create a cron job for the workflow.
Parameters:
Name, Type, Description, Default
`cron_name`, `str`, The name of the cron job., _required_
`expression`, `str`, The cron expression that defines the schedule for the cron job., _required_
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata for the cron job., `None`
`priority`, `int \, None`, The priority of the cron job. Must be between 1 and 3, inclusive., `None`
Returns:
Type, Description
`CronWorkflows`, A `CronWorkflows` object representing the created cron job.
#### `aio_create_cron`
Create a cron job for the workflow.
Parameters:
Name, Type, Description, Default
`cron_name`, `str`, The name of the cron job., _required_
`expression`, `str`, The cron expression that defines the schedule for the cron job., _required_
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata for the cron job., `None`
`priority`, `int \, None`, The priority of the cron job. Must be between 1 and 3, inclusive., `None`
Returns:
Type, Description
`CronWorkflows`, A `CronWorkflows` object representing the created cron job.
#### `create_bulk_run_item`
Create a bulk run item for the workflow. This is intended to be used in conjunction with the various `run_many` methods.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput`, The input data for the workflow., `cast(TWorkflowInput, EmptyModel())`
`key`, `str \, None`, The key for the workflow run. This is used to identify the run in the bulk operation and for deduplication., `None`
`options`, `TriggerWorkflowOptions`, Additional options for the workflow run., `TriggerWorkflowOptions()`
Returns:
Type, Description
`WorkflowRunTriggerConfig`, A `WorkflowRunTriggerConfig` object that can be used to trigger the workflow run, which you then pass into the `run_many` methods.
#### `list_runs`
List runs of the workflow.
Parameters:
Name, Type, Description, Default
`since`, `datetime \, None`, The start time for the runs to be listed., `None`
`until`, `datetime \, None`, The end time for the runs to be listed., `None`
`limit`, `int`, The maximum number of runs to be listed., `100`
`offset`, `int \, None`, The offset for pagination., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses of the runs to be listed., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata for filtering the runs., `None`
`worker_id`, `str \, None`, The ID of the worker that ran the tasks., `None`
`parent_task_external_id`, `str \, None`, The external ID of the parent task., `None`
`only_tasks`, `bool`, Whether to list only task runs., `False`
`triggering_event_external_id`, `str \, None`, The event id that triggered the task run., `None`
Returns:
Type, Description
`list[V1TaskSummary]`, A list of `V1TaskSummary` objects representing the runs of the workflow.
#### `aio_list_runs`
List runs of the workflow.
Parameters:
Name, Type, Description, Default
`since`, `datetime \, None`, The start time for the runs to be listed., `None`
`until`, `datetime \, None`, The end time for the runs to be listed., `None`
`limit`, `int`, The maximum number of runs to be listed., `100`
`offset`, `int \, None`, The offset for pagination., `None`
`statuses`, `list[V1TaskStatus] \, None`, The statuses of the runs to be listed., `None`
`additional_metadata`, `dict[str, str] \, None`, Additional metadata for filtering the runs., `None`
`worker_id`, `str \, None`, The ID of the worker that ran the tasks., `None`
`parent_task_external_id`, `str \, None`, The external ID of the parent task., `None`
`only_tasks`, `bool`, Whether to list only task runs., `False`
`triggering_event_external_id`, `str \, None`, The event id that triggered the task run., `None`
Returns:
Type, Description
`list[V1TaskSummary]`, A list of `V1TaskSummary` objects representing the runs of the workflow.
#### `create_filter`
Create a new filter.
Parameters:
Name, Type, Description, Default
`expression`, `str`, The expression to evaluate for the filter., _required_
`scope`, `str`, The scope for the filter., _required_
`payload`, `JSONSerializableMapping \, None`, The payload to send with the filter., `None`
Returns:
Type, Description
`V1Filter`, The created filter.
#### `aio_create_filter`
Create a new filter.
Parameters:
Name, Type, Description, Default
`expression`, `str`, The expression to evaluate for the filter., _required_
`scope`, `str`, The scope for the filter., _required_
`payload`, `JSONSerializableMapping \, None`, The payload to send with the filter., `None`
Returns:
Type, Description
`V1Filter`, The created filter.
#### `delete`
Permanently delete the workflow.
**DANGEROUS: This will delete a workflow and all of its data**
#### `aio_delete`
Permanently delete the workflow.
**DANGEROUS: This will delete a workflow and all of its data**
#### `get_run_ref`
Get a reference to a task run by its run ID.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The ID of the run to get the reference for., _required_
Returns:
Type, Description
`TaskRunRef[TWorkflowInput, R]`, A `TaskRunRef` object representing the reference to the task run.
#### `get_result`
Get the result of a task run by its run ID.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The ID of the run to get the result for., _required_
Returns:
Type, Description
`R`, The result of the task run.
#### `aio_get_result`
Get the result of a task run by its run ID.
Parameters:
Name, Type, Description, Default
`run_id`, `str`, The ID of the run to get the result for., _required_
Returns:
Type, Description
`R`, The result of the task run.
#### `mock_run`
Mimic the execution of a task. This method is intended to be used to unit test tasks without needing to interact with the Hatchet engine. Use `mock_run` for sync tasks and `aio_mock_run` for async tasks.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput \, None`, The input to the task., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata to attach to the task., `None`
`parent_outputs`, `dict[str, JSONSerializableMapping] \, None`, Outputs from parent tasks, if any. This is useful for mimicking DAG functionality. For instance, if you have a task `step_2` that has a `parent` which is `step_1`, you can pass `parent_outputs={"step_1": {"result": "Hello, world!"}}` to `step_2.mock_run()` to be able to access `ctx.task_output(step_1)` in `step_2`., `None`
`retry_count`, `int`, The number of times the task has been retried., `0`
`lifespan`, `Any`, The lifespan to be used in the task, which is useful if one was set on the worker. This will allow you to access `ctx.lifespan` inside of your task., `None`
`dependencies`, `dict[str, Any] \, None`, Dependencies to be injected into the task. This is useful for tasks that have dependencies defined using `Depends`. **IMPORTANT**: You must pass the dependencies _directly_, **not** the `Depends` objects themselves. For example, if you have a task that has a dependency `config: Annotated[str, Depends(get_config)]`, you should pass `dependencies={"config": "config_value"}` to `aio_mock_run`., `None`
Returns:
Type, Description
`R`, The output of the task.
#### `aio_mock_run`
Mimic the execution of a task. This method is intended to be used to unit test tasks without needing to interact with the Hatchet engine. Use `mock_run` for sync tasks and `aio_mock_run` for async tasks.
Parameters:
Name, Type, Description, Default
`input`, `TWorkflowInput \, None`, The input to the task., `None`
`additional_metadata`, `JSONSerializableMapping \, None`, Additional metadata to attach to the task., `None`
`parent_outputs`, `dict[str, JSONSerializableMapping] \, None`, Outputs from parent tasks, if any. This is useful for mimicking DAG functionality. For instance, if you have a task `step_2` that has a `parent` which is `step_1`, you can pass `parent_outputs={"step_1": {"result": "Hello, world!"}}` to `step_2.mock_run()` to be able to access `ctx.task_output(step_1)` in `step_2`., `None`
`retry_count`, `int`, The number of times the task has been retried., `0`
`lifespan`, `Any`, The lifespan to be used in the task, which is useful if one was set on the worker. This will allow you to access `ctx.lifespan` inside of your task., `None`
`dependencies`, `dict[str, Any] \, None`, Dependencies to be injected into the task. This is useful for tasks that have dependencies defined using `Depends`. **IMPORTANT**: You must pass the dependencies _directly_, **not** the `Depends` objects themselves. For example, if you have a task that has a dependency `config: Annotated[str, Depends(get_config)]`, you should pass `dependencies={"config": "config_value"}` to `aio_mock_run`., `None`
Returns:
Type, Description
`R`, The output of the task.
---
# Working with `asyncio`
Hatchet's Python SDK, similarly to other popular libraries like FastAPI, Langchain, etc., makes heavy use of `asyncio`, and recommends that you do as well!
To learn the basics of `asyncio`, check out [this introduction from
FastAPI](https://fastapi.tiangolo.com/async/).
However, as is the case in FastAPI, when using `asyncio` in Hatchet, you need to be careful to not have any blocking logic in the functions you define as tasks, as this will block the asyncio event loop and prevent additional work from executing until the blocking operation has completed.
For example, this is async-safe:
```python
async def async_safe() -> int:
await asyncio.sleep(5)
return 42
```
But this is not:
```python
async def blocking() -> int:
time.sleep(5)
return 42
```
In the second case, your worker will not be able to process any other work that's defined as async until the five-second sleep has finished.
### Using `asyncio.to_thread` and `loop.run_in_executor`
To avoid problems caused by blocking code, you can run your blocking code in an executor with `asyncio.to_thread` or, more verbosely, `loop.run_in_executor`. The two examples below are async-safe and will no longer block the event loop.
```python
async def to_thread() -> int:
await asyncio.to_thread(time.sleep, 5)
return 42
```
```python
async def run_in_executor() -> int:
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, time.sleep, 5)
return 42
```
### More Resources for working with `asyncio`
If you're looking for more info on developing with AsyncIO more broadly, we highly recommend the following
resources:
- Python's Documentation on [Developing with
AsyncIO](https://docs.python.org/3/library/asyncio-dev.html)
- Tusamma's Medium post about [How AsyncIO
works](https://medium.com/@tssovi/how-does-asyncio-works-f5386316b7fa)
- Zac Hatfield-Dodds's PyCon 2023 talk on [Async: scaling structured concurrency with static and dynamic analysis](https://www.youtube.com/watch?v=FrpUb6OEbcc)
---
# Pydantic Support
The V1 Hatchet SDK leans heavily on [Pydantic](https://docs.pydantic.dev/latest/) (both internally and externally) for handling validation of workflow inputs and outputs, method inputs, and more.
### Usage
To enable Pydantic for validation, you'll need to:
1. Provide an `input_validator` as a parameter to your `workflow`.
2. Add return type hints for your `tasks`.
### Default Behavior
By default, if no `input_validator` is provided, the `EmptyModel` is used, which is a Pydantic model that accepts any input. For example:
```python
from hatchet_sdk import Context, DurableContext, EmptyModel, Hatchet
hatchet = Hatchet()
@hatchet.task()
def simple(input: EmptyModel, ctx: Context) -> dict[str, str]:
return {"result": "Hello, world!"}
@hatchet.durable_task()
async def simple_durable(input: EmptyModel, ctx: DurableContext) -> dict[str, str]:
# durable tasks should be async
return {"result": "Hello, world!"}
def main() -> None:
worker = hatchet.worker(
"test-worker",
workflows=[simple, simple_durable],
)
worker.start()
```
In this simple example, the `input` that's injected into the task accepts an argument `input`, which is of type `EmptyModel`. The `EmptyModel` can be imported directly from Hatchet, and is an alias for:
```python
from pydantic import BaseModel, ConfigDict
class EmptyModel(BaseModel):
model_config = ConfigDict(extra="allow")
```
Note that since `extra="allow"` is set, workflows will not fail with validation errors if an extra field is provided.
### Example Usage
We highly recommend creating Pydantic models to represent your workflow inputs and outputs. This will help you catch errors early and ensure that your workflows are well-typed. For example, consider a fanout workflow like this:
```python
class ParentInput(BaseModel):
n: int = 100
class ChildInput(BaseModel):
a: str
parent_wf = hatchet.workflow(name="FanoutParent", input_validator=ParentInput)
child_wf = hatchet.workflow(name="FanoutChild", input_validator=ChildInput)
@parent_wf.task(execution_timeout=timedelta(minutes=5))
async def spawn(input: ParentInput, ctx: Context) -> dict[str, Any]:
print("spawning child")
result = await child_wf.aio_run_many(
[
child_wf.create_bulk_run_item(
input=ChildInput(a=str(i)),
additional_metadata={"hello": "earth"},
key=f"child{i}",
)
for i in range(input.n)
],
)
print(f"results {result}")
return {"results": result}
```
In this case, we've defined two workflows: a parent and a child. They both have their inputs typed, and the parent spawns the child. Note that `child_wf.create_workflow_run_config` is typed, so the type checker (and your IDE) know the type of the input to the child workflow.
Then, the child tasks are defined as follows:
```python
@child_wf.task()
async def process(input: ChildInput, ctx: Context) -> dict[str, str]:
print(f"child process {input.a}")
return {"status": input.a}
@child_wf.task(parents=[process])
async def process2(input: ChildInput, ctx: Context) -> dict[str, str]:
process_output = ctx.task_output(process)
a = process_output["status"]
return {"status2": a + "2"}
```
In the children, the inputs are validated by Pydantic, so you can access their attributes directly without needing a type cast or parsing a dictionary with the inputs instead.
---
# Lifespans
Lifespans are an **experimental feature** in Hatchet, and are subject to
change.
Hatchet's Python SDK allows you define a **_lifespan_**, which is an async generator that runs when your worker starts up and cleans up when it exits, which lets you share state across all of the tasks running on the worker. This behaves almost identically to [FastAPI's lifespans](https://fastapi.tiangolo.com/advanced/events/), and is intended to be used in the same way. Lifespans are useful for sharing state like connection pools across all tasks on a single worker. They also work great for loading expensive machine learning models into memory before the worker starts.
We recommend only using lifespans for storing **_immutable_** state to share
between tasks running on your worker. The intention is not to e.g. store a
counter of the number of tasks that a worker has run and increment that
counter on each task run. This is prone to unexpected behavior due to
concurrency in Hatchet.
## Usage
To use Hatchet's `lifespan` feature, define an async generator and pass it into your `worker`:
```python
class Lifespan(BaseModel):
model_config = ConfigDict(arbitrary_types_allowed=True)
foo: str
pool: ConnectionPool
async def lifespan() -> AsyncGenerator[Lifespan, None]:
print("Running lifespan!")
with ConnectionPool("postgres://hatchet:hatchet@localhost:5431/hatchet") as pool:
yield Lifespan(
foo="bar",
pool=pool,
)
print("Cleaning up lifespan!")
worker = hatchet.worker(
"test-worker", slots=1, workflows=[lifespan_workflow], lifespan=lifespan
)
```
When the worker starts, it will run the lifespan up to the `yield`. Then, on worker shutdown, it will clean up by running everything after the `yield` (the same as with any other generator).
Your lifespan must only `yield` **_once_**.
Then, to use your lifespan in a task, you can extract it from the context with `Context.lifespan`.
```python
class TaskOutput(BaseModel):
num_rows: int
external_ids: list[UUID]
lifespan_workflow = hatchet.workflow(name="LifespanWorkflow")
@lifespan_workflow.task()
def sync_lifespan_task(input: EmptyModel, ctx: Context) -> TaskOutput:
pool = cast(Lifespan, ctx.lifespan).pool
with pool.connection() as conn:
query = conn.execute("SELECT * FROM v1_lookup_table_olap LIMIT 5;")
rows = query.fetchall()
for row in rows:
print(row)
print("executed sync task with lifespan", ctx.lifespan)
return TaskOutput(
num_rows=len(rows),
external_ids=[cast(UUID, row[0]) for row in rows],
)
```
For type checking, cast the `Context.lifespan` to whatever type your lifespan
generator yields.
And that's it! Now, any task running on the worker with the lifespan provided will have access to the lifespan data.
---
# Dependency Injection
Dependency injection is an **experimental feature** in Hatchet, and is subject
to change.
Hatchet's Python SDK allows you to inject **_dependencies_** into your tasks, FastAPI style. These dependencies can be either synchronous or asynchronous functions. They are executed before the task is triggered, and their results are injected into the task as parameters.
This behaves almost identically to [FastAPI's dependency injection](https://fastapi.tiangolo.com/tutorial/dependencies/), and is intended to be used in the same way. Dependencies are useful for sharing logic between tasks that you'd like to avoid repeating, or would like to factor out of the task logic itself (e.g. to make testing easier).
Since dependencies are run before tasks are executed, having many dependencies (or any that take a long time to evaluate) can cause tasks to experience significantly delayed start times, as they must wait for all dependencies to finish evaluating.
## Usage
To add dependencies to your tasks, import `Depends` from the `hatchet_sdk`. Then:
```python
async def async_dep(input: EmptyModel, ctx: Context) -> str:
return ASYNC_DEPENDENCY_VALUE
def sync_dep(input: EmptyModel, ctx: Context) -> str:
return SYNC_DEPENDENCY_VALUE
@asynccontextmanager
async def async_cm_dep(
input: EmptyModel, ctx: Context, async_dep: Annotated[str, Depends(async_dep)]
) -> AsyncGenerator[str, None]:
try:
yield ASYNC_CM_DEPENDENCY_VALUE + "_" + async_dep
finally:
pass
@contextmanager
def sync_cm_dep(
input: EmptyModel, ctx: Context, sync_dep: Annotated[str, Depends(sync_dep)]
) -> Generator[str, None, None]:
try:
yield SYNC_CM_DEPENDENCY_VALUE + "_" + sync_dep
finally:
pass
@contextmanager
def base_cm_dep(input: EmptyModel, ctx: Context) -> Generator[str, None, None]:
try:
yield CHAINED_CM_VALUE
finally:
pass
def chained_dep(
input: EmptyModel, ctx: Context, base_cm: Annotated[str, Depends(base_cm_dep)]
) -> str:
return "chained_" + base_cm
@asynccontextmanager
async def base_async_cm_dep(
input: EmptyModel, ctx: Context
) -> AsyncGenerator[str, None]:
try:
yield CHAINED_ASYNC_CM_VALUE
finally:
pass
async def chained_async_dep(
input: EmptyModel,
ctx: Context,
base_async_cm: Annotated[str, Depends(base_async_cm_dep)],
) -> str:
return "chained_" + base_async_cm
```
In this example, we've declared two dependencies: one synchronous and one asynchronous. You can do anything you like in your dependencies, such as creating database sessions, managing configuration, sharing instances of service-layer logic, and more.
Once you've defined your dependency functions, inject them into your tasks as follows:
```python
@hatchet.task()
async def async_task_with_dependencies(
_i: EmptyModel,
ctx: Context,
async_dep: Annotated[str, Depends(async_dep)],
sync_dep: Annotated[str, Depends(sync_dep)],
async_cm_dep: Annotated[str, Depends(async_cm_dep)],
sync_cm_dep: Annotated[str, Depends(sync_cm_dep)],
chained_dep: Annotated[str, Depends(chained_dep)],
chained_async_dep: Annotated[str, Depends(chained_async_dep)],
) -> Output:
return Output(
sync_dep=sync_dep,
async_dep=async_dep,
async_cm_dep=async_cm_dep,
sync_cm_dep=sync_cm_dep,
chained_dep=chained_dep,
chained_async_dep=chained_async_dep,
)
```
Important note: Your dependency functions must take two positional arguments:
the workflow input and the `Context` (the same as any other task).
That's it! Now, whenever your task is triggered, its dependencies will be evaluated, and the results will be injected into the task at runtime for you to use as needed.
---
# Dataclass Support
Throughout the docs, we use Pydantic models in virtually all of our Python examples for validating task inputs and outputs. This is the recommended path, as it provides lots of safety guarantees as you're writing tasks. With that said, Hatchet also supports using `dataclasses` as both input and output types to tasks. **Dataclass support was added in SDK version 1.21.0.**
> **Warning:** Dataclasses do not perform any type validation on instantiation like Pydantic
> models do.
### Usage
To use a dataclass instead of a Pydantic model, you'll need to:
1. Provide an `input_validator` as a parameter to your `workflow` or `task` (in the case of a standalone task with `hatchet.task`).
2. Add return type hints for your `tasks`.
### Example Usage
`dataclass` validators work exactly like Pydantic models in Hatchet. First, you create the classes:
```python
@dataclass
class Input:
name: str
@dataclass
class Output:
message: str
```
And then you provide the classes to your workflow or task:
```python
@hatchet.task(input_validator=Input)
def say_hello(input: Input, ctx: Context) -> Output:
return Output(message=f"Hello, {input.name}!")
```
And finally, triggering works the same as well - you just provide the dataclass instance as input:
```python
say_hello.run(input=Input(name="Hatchet"))
```
---
# Hatchet TypeScript SDK Reference
This is the TypeScript SDK reference, documenting methods available for interacting with Hatchet resources.
Check out the [user guide](https://docs.hatchet.run/home/) for an introduction to getting your first tasks running.
### HatchetClient
HatchetV1 implements the main client interface for interacting with the Hatchet workflow engine.
It provides methods for creating and executing workflows, as well as managing workers.
Implements
- `IHatchetClient`
Properties
Property, Type, Description
`tenantId`, `string`, The tenant ID for the Hatchet client
Accessors
##### `api`
Get Signature
Get the API client for making HTTP requests to the Hatchet API
Note: This is not recommended for general use, but is available for advanced scenarios
Returns
`Api`\<`unknown`\>
A API client instance
##### `crons`
Get Signature
Get the cron client for creating and managing cron workflow runs
Returns
[`CronClient`](feature-clients/crons.mdx#cronclient)
A cron client instance
Implementation of
```ts
IHatchetClient.crons;
```
##### `events`
Get Signature
Get the event client for creating and managing event workflow runs
Returns
`EventClient`
A event client instance
Implementation of
```ts
IHatchetClient.events;
```
##### `filters`
Get Signature
Get the filters client for creating and managing filters
Returns
[`FiltersClient`](feature-clients/filters.mdx#filtersclient)
A filters client instance
Implementation of
```ts
IHatchetClient.filters;
```
##### `logs`
Get Signature
Get the logs client for creating and managing logs
Returns
[`LogsClient`](feature-clients/logs.mdx#logsclient)
A logs client instance
Implementation of
```ts
IHatchetClient.logs;
```
##### `metrics`
Get Signature
Get the metrics client for creating and managing metrics
Returns
[`MetricsClient`](feature-clients/metrics.mdx#metricsclient)
A metrics client instance
Implementation of
```ts
IHatchetClient.metrics;
```
##### `ratelimits`
Get Signature
Get the rate limits client for creating and managing rate limits
Returns
[`RatelimitsClient`](feature-clients/ratelimits.mdx#ratelimitsclient)
A rate limits client instance
Implementation of
```ts
IHatchetClient.ratelimits;
```
##### `runs`
Get Signature
Get the runs client for creating and managing runs
Returns
[`RunsClient`](feature-clients/runs.mdx#runsclient)
A runs client instance
Implementation of
```ts
IHatchetClient.runs;
```
##### `scheduled`
Get Signature
Get the schedules client for creating and managing scheduled workflow runs
Returns
[`ScheduleClient`](feature-clients/schedules.mdx#scheduleclient)
A schedules client instance
Implementation of
```ts
IHatchetClient.scheduled;
```
##### `schedules`
Get Signature
Alias
scheduled
Returns
[`ScheduleClient`](feature-clients/schedules.mdx#scheduleclient)
##### `tasks`
Get Signature
Get the tasks client for creating and managing tasks
Returns
[`WorkflowsClient`](feature-clients/workflows.mdx#workflowsclient)
A tasks client instance
##### `tenant`
Get Signature
Get the tenant client for managing tenants
Returns
`TenantClient`
A tenant client instance
Implementation of
```ts
IHatchetClient.tenant;
```
##### `webhooks`
Get Signature
Get the webhooks client for creating and managing webhooks
Returns
[`WebhooksClient`](feature-clients/webhooks.mdx#webhooksclient)
A webhooks client instance
Implementation of
```ts
IHatchetClient.webhooks;
```
##### `workers`
Get Signature
Get the workers client for creating and managing workers
Returns
[`WorkersClient`](feature-clients/workers.mdx#workersclient)
A workers client instance
Implementation of
```ts
IHatchetClient.workers;
```
##### `workflows`
Get Signature
Get the workflows client for creating and managing workflows
Returns
[`WorkflowsClient`](feature-clients/workflows.mdx#workflowsclient)
A workflows client instance
Implementation of
```ts
IHatchetClient.workflows;
```
#### Methods
##### `durableTask()`
Implementation of the durableTask method.
Creates a new durable task workflow.
Types can be explicitly specified as generics or inferred from the function signature.
Parameters
Parameter, Type, Description
`options`, `CreateDurableTaskWorkflowOpts`\<`I` & `Resolved`\<`GlobalInput`, `MiddlewareBefore`\>, `MergeIfNonEmpty`\<`O`, `GlobalOutput`\>\>, Durable task configuration options
Returns
[`TaskWorkflowDeclaration`](Runnables.mdx#taskworkflowdeclaration)\<`I`, `O`, `GlobalInput`, `GlobalOutput`, `MiddlewareBefore`, `MiddlewareAfter`\>
A TaskWorkflowDeclaration instance for a durable task
Creates a new durable task workflow with types inferred from the function parameter.
Parameters
Parameter, Type, Description
`options`, `object` & `Omit`\<`CreateDurableTaskWorkflowOpts`\<`I`, `O`\>, `"fn"`\>, Durable task configuration options with function that defines types
Returns
[`TaskWorkflowDeclaration`](Runnables.mdx#taskworkflowdeclaration)\<`I`, `O`, `GlobalInput`, `GlobalOutput`, `MiddlewareBefore`, `MiddlewareAfter`\>
A TaskWorkflowDeclaration instance with inferred types
##### `run()`
Triggers a workflow run and waits for the result.
Parameters
Parameter, Type, Description
`workflow`, `string` \, `Workflow` \, `BaseWorkflowDeclaration`\<`I`, `O`\>, The workflow to run, either as a Workflow instance or workflow name
`input`, `I`, The input data for the workflow
`options`, `RunOpts`, Configuration options for the workflow run
Returns
`Promise`\<`O`\>
A promise that resolves with the workflow result
##### `runAndWait()`
Triggers a workflow run and waits for the result.
Parameters
Parameter, Type, Description
`workflow`, `string` \, `Workflow` \, `BaseWorkflowDeclaration`\<`I`, `O`\>, The workflow to run, either as a Workflow instance or workflow name
`input`, `I`, The input data for the workflow
`options`, `RunOpts`, Configuration options for the workflow run
Returns
`Promise`\<`O`\>
A promise that resolves with the workflow result
Alias
run
##### `runNoWait()`
Triggers a workflow run without waiting for completion.
Parameters
Parameter, Type, Description
`workflow`, `string` \, `Workflow` \, `BaseWorkflowDeclaration`\<`I`, `O`\>, The workflow to run, either as a Workflow instance or workflow name
`input`, `I`, The input data for the workflow
`options`, `RunOpts`, Configuration options for the workflow run
Returns
`Promise`\<`WorkflowRunRef`\<`O`\>\>
A WorkflowRunRef containing the run ID and methods to interact with the run
##### `task()`
Implementation of the task method.
Creates a new task workflow.
Types can be explicitly specified as generics or inferred from the function signature.
Parameters
Parameter, Type, Description
`options`, `CreateTaskWorkflowOpts`\<`I` & `Resolved`\<`GlobalInput`, `MiddlewareBefore`\>, `MergeIfNonEmpty`\<`O`, `GlobalOutput`\>\>, Task configuration options
Returns
[`TaskWorkflowDeclaration`](Runnables.mdx#taskworkflowdeclaration)\<`I`, `O`, `GlobalInput`, `GlobalOutput`, `MiddlewareBefore`, `MiddlewareAfter`\>
A TaskWorkflowDeclaration instance
Creates a new task workflow with types inferred from the function parameter.
Parameters
Parameter, Type, Description
`options`, `object` & `Omit`\<`CreateTaskWorkflowOpts`\<`I`, `O`\>, `"fn"`\>, Task configuration options with function that defines types
Returns
[`TaskWorkflowDeclaration`](Runnables.mdx#taskworkflowdeclaration)\<`I`, `O`, `GlobalInput`, `GlobalOutput`, `MiddlewareBefore`, `MiddlewareAfter`\>
A TaskWorkflowDeclaration instance with inferred types
##### `worker()`
Creates a new worker instance for processing workflow tasks.
Parameters
Parameter, Type, Description
`name`, `string`, -
`options?`, `number` \, `CreateWorkerOpts`, Configuration options for creating the worker
Returns
`Promise`\<`Worker`\>
A promise that resolves with a new HatchetWorker instance
##### `workflow()`
Creates a new workflow definition.
Parameters
Parameter, Type, Description
`options`, `CreateWorkflowOpts`, Configuration options for creating the workflow
Returns
[`WorkflowDeclaration`](Runnables.mdx#workflowdeclaration)\<`I`, `O`, `Resolved`\<`GlobalInput`, `MiddlewareBefore`\>\>
A new Workflow instance
Note
It is possible to create an orphaned workflow if no client is available using @hatchet/client CreateWorkflow
---
# Context
The Hatchet Context class provides helper methods and useful data to tasks at runtime. It is passed as the second argument to all tasks and durable tasks.
There are two types of context classes you'll encounter:
- Context - The standard context for regular tasks with methods for logging, task output retrieval, cancellation, and more.
- DurableContext - An extended context for durable tasks that includes additional methods for durable execution.
### Context
Extended by
- [`DurableContext`](#durablecontext)
#### Methods
##### `additionalMetadata()`
Retrieves additional metadata associated with the current workflow run.
Returns
`Record`\<`string`, `string`\>
A record of metadata key-value pairs.
##### `bulkRunChildren()`
Runs multiple children workflows in parallel and waits for all results.
Parameters
Parameter, Type, Description
`children`, `object`[], An array of objects containing the workflow name, input data, and options for each workflow.
Returns
`Promise`\<`P`[]\>
A list of results from the children workflows.
##### `bulkRunNoWaitChildren()`
Runs multiple children workflows in parallel without waiting for their results.
Parameters
Parameter, Type, Description
`children`, `object`[], An array of objects containing the workflow name, input data, and options for each workflow.
Returns
`Promise`\<`WorkflowRunRef`\<`P`\>[]\>
A list of workflow run references to the enqueued runs.
##### `childIndex()`
Gets the index of this workflow if it was spawned as part of a bulk operation.
Returns
`number` \| `undefined`
The child index number, or undefined if not set.
##### `childKey()`
Gets the key associated with this workflow if it was spawned as a child workflow.
Returns
`string` \| `undefined`
The child key, or undefined if not set.
##### `errors()`
Returns errors from any task runs in the workflow.
Returns
`Record`\<`string`, `string`\>
A record mapping task names to error messages.
Throws
A warning if no errors are found (this method should be used in on-failure tasks).
##### `filterPayload()`
Gets the payload from the filter that matched when triggering the event.
Returns
`Record`\<`string`, `any`\>
The payload.
##### `parentOutput()`
Retrieves the output of a parent task.
Parameters
Parameter, Type, Description
`parentTask`, \, `string` \, `CreateWorkflowTaskOpts`\<`any`, `L`\> \, `CreateWorkflowDurableTaskOpts`\<`any`, `L`\>, The a CreateTaskOpts or string of the parent task name.
Returns
`Promise`\<`L`\>
The output of the specified parent task.
Throws
An error if the task output is not found.
##### `parentWorkflowRunId()`
Gets the ID of the parent workflow run if this workflow was spawned as a child.
Returns
`string` \| `undefined`
The parent workflow run ID, or undefined if not a child workflow.
##### `putStream()`
Streams data from the current task run.
Parameters
Parameter, Type, Description
`data`, `string` \, `Uint8Array`\<`ArrayBufferLike`\>, The data to stream (string or binary).
Returns
`Promise`\<`void`\>
A promise that resolves when the data has been streamed.
##### `refreshTimeout()`
Refreshes the timeout for the current task.
Parameters
Parameter, Type, Description
`incrementBy`, `Duration`, The interval by which to increment the timeout. The interval should be specified in the format of '10s' for 10 seconds, '1m' for 1 minute, or '1d' for 1 day.
Returns
`Promise`\<`void`\>
##### `releaseSlot()`
Releases a worker slot for a task run such that the worker can pick up another task.
Note: this is an advanced feature that may lead to unexpected behavior if used incorrectly.
Returns
`Promise`\<`void`\>
A promise that resolves when the slot has been released.
##### `rethrowIfCancelled()`
Helper for broad `catch` blocks so cancellation isn't accidentally swallowed.
Example:
```ts
try { ... } catch (e) { ctx.rethrowIfCancelled(e); ... }
```
Parameters
Parameter, Type
`err`, `unknown`
Returns
`void`
##### `retryCount()`
Gets the number of times the current task has been retried.
Returns
`number`
The retry count.
##### `runChild()`
Runs a new workflow and waits for its result.
Parameters
Parameter, Type, Description
`workflow`, \, `string` \, `BaseWorkflowDeclaration`\<`Q`, `P`\> \, [`TaskWorkflowDeclaration`](Runnables.mdx#taskworkflowdeclaration)\<`Q`, `P`, \{ \}, \{ \}, \{ \}, \{ \}\>, The workflow to run (name, Workflow instance, or WorkflowV1 instance).
`input`, `Q`, The input data for the workflow.
`options?`, `ChildRunOpts`, An options object containing key, sticky, priority, and additionalMetadata.
Returns
`Promise`\<`P`\>
The result of the workflow.
##### `runNoWaitChild()`
Enqueues a new workflow without waiting for its result.
Parameters
Parameter, Type, Description
`workflow`, `string` \, `BaseWorkflowDeclaration`\<`Q`, `P`\>, The workflow to enqueue (name, Workflow instance, or WorkflowV1 instance).
`input`, `Q`, The input data for the workflow.
`options?`, `ChildRunOpts`, An options object containing key, sticky, priority, and additionalMetadata.
Returns
`Promise`\<`WorkflowRunRef`\<`P`\>\>
A reference to the spawned workflow run.
##### `taskName()`
Gets the name of the current running task.
Returns
`string`
The name of the task.
##### `taskRunExternalId()`
Gets the ID of the current task run.
Returns
`string`
The task run ID.
##### `triggeredByEvent()`
Determines if the workflow was triggered by an event.
Returns
`boolean`
True if the workflow was triggered by an event, otherwise false.
##### `triggers()`
Gets the dag conditional triggers for the current workflow run.
Returns
`TriggerData`
The triggers for the current workflow.
##### `userData()`
Gets the user data associated with the workflow.
Returns
`K`
The user data.
##### `workflowId()`
Gets the workflow ID of the currently running workflow.
Returns
`string` \| `undefined`
The workflow id.
##### `workflowName()`
Gets the name of the current workflow.
Returns
`string`
The name of the workflow.
##### `workflowRunId()`
Gets the ID of the current workflow run.
Returns
`string`
The workflow run ID.
##### `workflowVersionId()`
Gets the workflow version ID of the currently running workflow.
Returns
`string` \| `undefined`
The workflow version ID.
---
### ContextWorker
ContextWorker is a wrapper around the V1Worker class that provides a more user-friendly interface for the worker from the context of a run.
#### Methods
##### `hasWorkflow()`
Checks if the worker has a registered workflow.
Parameters
Parameter, Type, Description
`workflowName`, `string`, The name of the workflow to check.
Returns
`boolean`
True if the workflow is registered, otherwise false.
##### `id()`
Gets the ID of the worker.
Returns
`string` \| `undefined`
The ID of the worker.
##### `labels()`
Gets the current state of the worker labels.
Returns
`WorkerLabels`
The labels of the worker.
##### `upsertLabels()`
Upserts the a set of labels on the worker.
Parameters
Parameter, Type, Description
`labels`, `WorkerLabels`, The labels to upsert.
Returns
`Promise`\<`WorkerLabels`\>
A promise that resolves when the labels have been upserted.
---
### DurableContext
DurableContext provides helper methods and useful data to durable tasks at runtime.
It extends the Context class and includes additional methods for durable execution like sleepFor and waitFor.
Extends
- [`Context`](#context)\<`T`, `K`\>
Accessors
##### `invocationCount`
Get Signature
The invocation count for the current durable task. Used for deduplication across replays.
Returns
`number`
#### Methods
##### `now()`
Get the current timestamp, memoized across replays. Returns the same Date on every replay of the same task run.
Returns
`Promise`\<`Date`\>
The memoized current timestamp.
##### `sleepFor()`
Pauses execution for the specified duration.
Duration is "global" meaning it will wait in real time regardless of transient failures like worker restarts.
Parameters
Parameter, Type, Description
`duration`, `Duration`, The duration to sleep for.
`readableDataKey?`, `string`, -
Returns
`Promise`\<`SleepResult`\>
A promise that resolves with a SleepResult when the sleep duration has elapsed.
##### `sleepUntil()`
Durably sleep until a specific timestamp.
Uses the memoized `now()` to compute the remaining duration, then delegates to `sleepFor`.
Parameters
Parameter, Type, Description
`wakeAt`, `Date`, The timestamp to sleep until.
Returns
`Promise`\<`SleepResult`\>
A SleepResult containing the actual duration slept.
##### `spawnChild()`
Spawns a child workflow through the durable event log, waits for the child to complete.
Parameters
Parameter, Type, Description
`workflow`, \, `string` \, `BaseWorkflowDeclaration`\<`Q`, `P`\> \, [`TaskWorkflowDeclaration`](Runnables.mdx#taskworkflowdeclaration)\<`Q`, `P`, \{ \}, \{ \}, \{ \}, \{ \}\>, The workflow to spawn.
`input?`, `Q`, The input data for the child workflow.
`options?`, `ChildRunOpts`, Options for spawning the child workflow.
Returns
`Promise`\<`P`\>
The result of the child workflow.
##### `spawnChildren()`
Spawns multiple child workflows through the durable event log, waits for all to complete.
Parameters
Parameter, Type, Description
`children`, `object`[], An array of objects containing the workflow, input, and options for each child.
Returns
`Promise`\<`P`[]\>
A list of results from the child workflows.
##### `waitFor()`
Pauses execution until the specified conditions are met.
Conditions are "global" meaning they will wait in real time regardless of transient failures like worker restarts.
Parameters
Parameter, Type, Description
`conditions`, `Conditions` \, `Conditions`[], The conditions to wait for.
Returns
`Promise`\<`Record`\<`string`, `any`\>\>
A promise that resolves with the event that satisfied the conditions.
##### `waitForEvent()`
Lightweight wrapper for waiting for a user event. Allows for shorthand usage of
`ctx.waitFor` when specifying a user event condition.
For more complicated conditions, use `ctx.waitFor` directly.
Parameters
Parameter, Type, Description
`key`, `string`, The event key to wait for.
`expression?`, `string`, An optional CEL expression to filter events.
`payloadSchema?`, `T`, An optional Zod schema to validate and parse the event payload.
Returns
`Promise`\<`TypeOf`\<`T`\>\>
The event payload, validated against the schema if provided.
Lightweight wrapper for waiting for a user event. Allows for shorthand usage of
`ctx.waitFor` when specifying a user event condition.
For more complicated conditions, use `ctx.waitFor` directly.
Parameters
Parameter, Type, Description
`key`, `string`, The event key to wait for.
`expression?`, `string`, An optional CEL expression to filter events.
Returns
`Promise`\<`Record`\<`string`, `any`\>\>
The event payload, validated against the schema if provided.
---
### Cron Client
The cron client is a client for managing cron workflows within Hatchet.
#### Methods
##### `create()`
Creates a new Cron workflow.
Parameters
Parameter, Type, Description
`workflow`, `string` \, `BaseWorkflowDeclaration`\<`any`, `any`\> \, `Workflow`, The workflow identifier or Workflow object.
`cron`, \{ `additionalMetadata?`: `Record`\<`string`, `string`\>; `expression`: `string`; `input?`: `Record`\<`string`, `any`\>; `name`: `string`; `priority?`: `number`; \}, The input data for creating the Cron Trigger.
`cron.additionalMetadata?`, `Record`\<`string`, `string`\>, -
`cron.expression`, `string`, -
`cron.input?`, `Record`\<`string`, `any`\>, -
`cron.name`, `string`, -
`cron.priority?`, `number`, -
Returns
`Promise`\<`CronWorkflows`\>
A promise that resolves to the created CronWorkflows object.
Throws
Will throw an error if the input is invalid or the API call fails.
##### `delete()`
Deletes an existing Cron Trigger.
Parameters
Parameter, Type, Description
`cron`, `string` \, `CronWorkflows`, The Cron Trigger ID as a string or CronWorkflows object.
Returns
`Promise`\<`void`\>
A promise that resolves when the Cron Trigger is deleted.
##### `get()`
Retrieves a specific Cron Trigger by its ID.
Parameters
Parameter, Type, Description
`cron`, `string` \, `CronWorkflows`, The Cron Trigger ID as a string or CronWorkflows object.
Returns
`Promise`\<`CronWorkflows`\>
A promise that resolves to the CronWorkflows object.
##### `list()`
Lists all Cron Triggers based on the provided query parameters.
Parameters
Parameter, Type, Description
`query`, `object` & `object`, Query parameters for listing Cron Triggers.
Returns
`Promise`\<`CronWorkflowsList`\>
A promise that resolves to a CronWorkflowsList object.
---
### Filters Client
The filters client is a client for interacting with Hatchet's filters API.
#### Methods
##### `create()`
Creates a new filter.
Parameters
Parameter, Type, Description
`opts`, `V1CreateFilterRequest`, The options for the create operation.
Returns
`Promise`\<`V1Filter`\>
A promise that resolves to the created filter.
##### `delete()`
Deletes a filter by its ID.
Parameters
Parameter, Type, Description
`filterId`, `string`, The ID of the filter to delete.
Returns
`Promise`\<`V1Filter`\>
A promise that resolves to the deleted filter.
##### `get()`
Gets a filter by its ID.
Parameters
Parameter, Type, Description
`filterId`, `string`, The ID of the filter to get.
Returns
`Promise`\<`V1Filter`\>
A promise that resolves to the filter.
##### `list()`
Lists all filters.
Parameters
Parameter, Type, Description
`opts?`, \{ `limit?`: `number`; `offset?`: `number`; `scopes?`: ...[]; `workflowIds?`: ...[]; \}, The options for the list operation.
`opts.limit?`, `number`, The number of filters to return.
`opts.offset?`, `number`, The number of filters to skip before returning the result set.
`opts.scopes?`, ...[], A list of scopes to filter by.
`opts.workflowIds?`, ...[], A list of workflow IDs to filter by.
Returns
`Promise`\<`V1FilterList`\>
A promise that resolves to the list of filters.
##### `update()`
Updates a filter by its ID.
Parameters
Parameter, Type, Description
`filterId`, `string`, The ID of the filter to update.
`updates`, `V1UpdateFilterRequest`, The updates to apply to the filter.
Returns
`Promise`\<`V1Filter`\>
A promise that resolves to the updated filter.
---
### Logs Client
The logs client is a client for interacting with Hatchet's logs API.
#### Methods
##### `list()`
Lists the logs for a given task run.
Parameters
Parameter, Type, Description
`taskRunId`, `string`, The ID of the task run to list logs for.
`opts?`, [`ListLogsOpts`](#listlogsopts), The options filter for the list operation.
Returns
`Promise`\<`V1LogLineList`\>
A promise that resolves to the list of logs.
## Type Aliases
### ListLogsOpts
```ts
type ListLogsOpts = object;
```
The options for the list logs operation.
Properties
Property, Type, Description
`attempt?`, `number`, Filter logs by attempt number.
`levels?`, `V1LogLineLevel`[], Filter logs by log level.
`limit?`, `number`, The maximum number of log lines to return.
`orderByDirection?`, `V1LogLineOrderByDirection`, The direction to order the logs by.
`search?`, `string`, Filter logs by a search string.
`since?`, `Date`, Return only logs after this date.
`until?`, `Date`, Return only logs before this date.
---
### Metrics Client
The metrics client is a client for reading metrics out of Hatchet’s metrics API.
#### Methods
##### `getQueueMetrics()`
Returns the queue metrics for the current tenant.
Parameters
Parameter, Type, Description
`opts?`, `RequestParams`, The options for the request.
Returns
`Promise`\<`TenantStepRunQueueMetrics`\>
The queue metrics for the current tenant.
##### `getTaskStats()`
Get task statistics for the tenant.
Returns
`Promise`\<`TaskStats`\>
A record mapping task names to their statistics.
##### `getTaskStatusMetrics()`
Returns aggregate task run counts grouped by status (queued, running, completed, failed, cancelled)
Parameters
Parameter, Type, Description
`query`, \{ `additional_metadata?`: `string`[]; `parent_task_external_id?`: `string`; `since`: `string`; `triggering_event_external_id?`: `string`; `until?`: `string`; `workflow_ids?`: `string`[]; \}, Filters for the metrics query (e.g. `since`, `until`, `workflow_ids`).
`query.additional_metadata?`, `string`[], Additional metadata k-v pairs to filter by
`query.parent_task_external_id?`, `string`, The parent task's external id **Format** uuid **Min Length** 36 **Max Length** 36
`query.since?`, `string`, The start time to get metrics for **Format** date-time
`query.triggering_event_external_id?`, `string`, The id of the event that triggered the task **Format** uuid **Min Length** 36 **Max Length** 36
`query.until?`, `string`, The end time to get metrics for **Format** date-time
`query.workflow_ids?`, `string`[], The workflow id to find runs for
`requestParams?`, `RequestParams`, Optional request-level overrides (headers, signal, etc.).
Returns
`Promise`\<`TaskStatusMetrics`\>
Counts per status for the matched task runs.
##### `scrapePrometheusMetrics()`
Scrape Prometheus metrics for the tenant.
Returns
`Promise`\<`string`\>
The metrics in Prometheus text format.
---
### RatelimitsClient
The rate limits client is a wrapper for Hatchet’s gRPC API that makes it easier to work with rate limits in Hatchet.
#### Methods
##### `list()`
Lists all rate limits for the current tenant.
Parameters
Parameter, Type, Description
`opts`, \, \{ `limit?`: `number`; `offset?`: `number`; `orderByDirection?`: `RateLimitOrderByDirection`; `orderByField?`: `RateLimitOrderByField`; `search?`: `string`; \} \, `undefined`, The options for the list operation.
Returns
`Promise`\<`RateLimitList`\>
A promise that resolves to the list of rate limits.
##### `upsert()`
Upserts a rate limit for the current tenant.
Parameters
Parameter, Type, Description
`opts`, `CreateRateLimitOpts`, The options for the upsert operation.
Returns
`Promise`\<`string`\>
A promise that resolves to the key of the upserted rate limit.
---
### Runs Client
The runs client is a client for interacting with task and workflow runs within Hatchet.
#### Methods
##### `branchDurableTask()`
Fork (reset) a durable task from a specific node, triggering re-execution from that point.
Parameters
Parameter, Type, Default value, Description
`taskExternalId`, `string`, `undefined`, The external ID of the durable task to reset.
`nodeId`, `number`, `undefined`, The node ID to replay from.
`branchId`, `number`, `0`, -
Returns
`Promise`\<`AxiosResponse`\<`V1BranchDurableTaskResponse`, `any`, \{
\}\>\>
##### `cancel()`
Cancels a task or workflow run by its ID.
Parameters
Parameter, Type, Description
`opts`, `CancelRunOpts`, The options for the cancel operation.
Returns
`Promise`\<`AxiosResponse`\<`V1CancelledTasks`, `any`, \{
\}\>\>
A promise that resolves to the cancelled run.
##### `get()`
Gets a task or workflow run by its ID.
Parameters
Parameter, Type, Description
`run`, `string` \, `WorkflowRunRef`\<`T`\>, The ID of the run to get.
Returns
`Promise`\<`V1WorkflowRunDetails`\>
A promise that resolves to the run.
##### `get_status()`
Gets the status of a task or workflow run by its ID.
Parameters
Parameter, Type, Description
`run`, `string` \, `WorkflowRunRef`\<`T`\>, The ID of the run to get the status of.
Returns
`Promise`\<`V1TaskStatus`\>
A promise that resolves to the status of the run.
##### `getTaskExternalId()`
Resolve the task external ID for a workflow run. For runs with multiple tasks,
returns the first task's external ID.
Parameters
Parameter, Type, Description
`workflowRunId`, `string`, The workflow run ID to look up.
Returns
`Promise`\<`string`\>
The task external ID.
##### `list()`
Lists all task and workflow runs for the current tenant.
Parameters
Parameter, Type, Description
`opts?`, `Partial`\<`ListRunsOpts`\>, The options for the list operation.
Returns
`Promise`\<`V1TaskSummaryList`\>
A promise that resolves to the list of runs.
##### `replay()`
Replays a task or workflow run by its ID.
Parameters
Parameter, Type, Description
`opts`, `ReplayRunOpts`, The options for the replay operation.
Returns
`Promise`\<`AxiosResponse`\<`V1ReplayedTasks`, `any`, \{
\}\>\>
A promise that resolves to the replayed run.
##### `restoreTask()`
Restore an evicted durable task so it can resume execution.
Parameters
Parameter, Type, Description
`taskExternalId`, `string`, The external ID of the evicted task.
Returns
`Promise`\<`AxiosResponse`\<`V1RestoreTaskResponse`, `any`, \{
\}\>\>
##### `subscribeToStream()`
Subscribes to a stream of events for a task or workflow run by its ID.
Parameters
Parameter, Type, Description
`workflowRunId`, `string`, The ID of the run to subscribe to.
Returns
`AsyncIterableIterator`\<`string`\>
A promise that resolves to the stream of events.
---
### ScheduleClient
The scheduled client is a client for managing scheduled workflows within Hatchet
#### Methods
##### `bulkDelete()`
Bulk deletes scheduled runs (by explicit IDs and/or a filter).
Parameters
Parameter, Type, Description
`opts`, \{ `filter?`: `ScheduledWorkflowsBulkDeleteFilter`; `scheduledRuns?`: (... \, ...)[]; \}, Either `scheduledRuns` (ids/objects) and/or a server-side filter.
`opts.filter?`, `ScheduledWorkflowsBulkDeleteFilter`, -
`opts.scheduledRuns?`, (... \, ...)[], -
Returns
`Promise`\<`ScheduledWorkflowsBulkDeleteResponse`\>
A promise that resolves to deleted ids + per-id errors.
##### `bulkUpdate()`
Bulk updates (reschedules) scheduled runs.
Parameters
Parameter, Type, Description
`updates`, `object`[], List of id/object + new triggerAt.
Returns
`Promise`\<`ScheduledWorkflowsBulkUpdateResponse`\>
A promise that resolves to updated ids + per-id errors.
##### `create()`
Creates a new Scheduled Run.
Parameters
Parameter, Type, Description
`workflow`, `string` \, `Workflow`, The workflow name or Workflow object.
`cron`, \{ `additionalMetadata?`: `Record`\<`string`, `string`\>; `input?`: `Record`\<`string`, `any`\>; `priority?`: `number`; `triggerAt`: `Date`; \}, -
`cron.additionalMetadata?`, `Record`\<`string`, `string`\>, -
`cron.input?`, `Record`\<`string`, `any`\>, -
`cron.priority?`, `number`, -
`cron.triggerAt`, `Date`, -
Returns
`Promise`\<`ScheduledWorkflows`\>
A promise that resolves to the created ScheduledWorkflows object.
Throws
Will throw an error if the input is invalid or the API call fails.
###### Important
This method is instrumented by HatchetInstrumentor.\_patchScheduleCreate.
Keep the signature in sync with the instrumentor wrapper.
##### `delete()`
Deletes an existing Scheduled Run.
Parameters
Parameter, Type, Description
`scheduledRun`, `string` \, `ScheduledWorkflows`, The Scheduled Run ID as a string or ScheduledWorkflows object.
Returns
`Promise`\<`void`\>
A promise that resolves when the Scheduled Run is deleted.
##### `get()`
Retrieves a specific Scheduled Run by its ID.
Parameters
Parameter, Type, Description
`scheduledRun`, `string` \, `ScheduledWorkflows`, The Scheduled Run ID as a string or ScheduledWorkflows object.
Returns
`Promise`\<`ScheduledWorkflows`\>
A promise that resolves to the ScheduledWorkflows object.
##### `list()`
Lists all Cron Triggers based on the provided query parameters.
Parameters
Parameter, Type, Description
`query`, `object` & `object`, Query parameters for listing Scheduled Runs.
Returns
`Promise`\<`ScheduledWorkflowsList`\>
A promise that resolves to a ScheduledWorkflowsList object.
##### `update()`
Updates (reschedules) an existing Scheduled Run.
Parameters
Parameter, Type, Description
`scheduledRun`, `string` \, `ScheduledWorkflows`, The Scheduled Run ID as a string or ScheduledWorkflows object.
`update`, \{ `triggerAt`: `Date`; \}, The update payload (currently only triggerAt).
`update.triggerAt`, `Date`, -
Returns
`Promise`\<`ScheduledWorkflows`\>
A promise that resolves to the updated ScheduledWorkflows object.
---
### Webhooks Client
Client for managing incoming webhooks in Hatchet.
Webhooks allow external systems to trigger Hatchet workflows by sending
HTTP requests to dedicated endpoints. This enables real-time integration
with third-party services like GitHub, Stripe, Slack, or any system that
can send webhook events.
#### Methods
##### `create()`
Creates a new webhook.
Parameters
Parameter, Type, Description
`request`, `CreateWebhookOptions`, The request options for the create operation.
Returns
`Promise`\<`V1Webhook`\>
A promise that resolves to the created webhook.
##### `delete()`
Deletes a webhook by its name.
Parameters
Parameter, Type, Description
`webhookName`, `string`, The name of the webhook to delete.
Returns
`Promise`\<`V1Webhook`\>
A promise that resolves to the deleted webhook.
##### `get()`
Gets a webhook by its name.
Parameters
Parameter, Type, Description
`webhookName`, `string`, The name of the webhook to get.
Returns
`Promise`\<`V1Webhook`\>
A promise that resolves to the webhook.
##### `list()`
Lists all webhooks for the current tenant.
Parameters
Parameter, Type, Description
`options?`, \{ `limit?`: `number`; `offset?`: `number`; `sourceNames?`: ...[]; `webhookNames?`: ...[]; \}, The options for the list operation.
`options.limit?`, `number`, -
`options.offset?`, `number`, -
`options.sourceNames?`, ...[], -
`options.webhookNames?`, ...[], -
Returns
`Promise`\<`V1WebhookList`\>
A promise that resolves to the list of webhooks.
##### `update()`
Updates a webhook by its name.
Parameters
Parameter, Type, Description
`webhookName`, `string`, The name of the webhook to update.
`options`, `Partial`\<`V1UpdateWebhookRequest`\>, The options for the update operation.
Returns
`Promise`\<`V1Webhook`\>
A promise that resolves to the updated webhook.
---
### Workers Client
The workers client is a client for managing workers programmatically within Hatchet.
#### Methods
##### `get()`
Get a worker by its ID.
Parameters
Parameter, Type, Description
`workerId`, `string`, The ID of the worker to get.
Returns
`Promise`\<`Worker`\>
A promise that resolves to the worker.
##### `isPaused()`
Check if a worker is paused.
Parameters
Parameter, Type, Description
`workerId`, `string`, The ID of the worker to check.
Returns
`Promise`\<`boolean`\>
A promise that resolves to true if the worker is paused, false otherwise.
##### `list()`
List all workers in the tenant.
Returns
`Promise`\<`WorkerList`\>
A promise that resolves to the list of workers.
##### `pause()`
Pause a worker.
Parameters
Parameter, Type, Description
`workerId`, `string`, The ID of the worker to pause.
Returns
`Promise`\<`Worker`\>
A promise that resolves to the paused worker.
##### `unpause()`
Unpause a worker.
Parameters
Parameter, Type, Description
`workerId`, `string`, The ID of the worker to unpause.
Returns
`Promise`\<`Worker`\>
A promise that resolves to the unpaused worker.
---
### Workflows Client
The workflows client is a client for managing workflows programmatically within Hatchet.
NOTE: that workflows are the declaration, not the individual runs. If you're looking for runs, use the RunsClient instead.
#### Methods
##### `delete()`
Delete a workflow by its name, ID, or object.
Parameters
Parameter, Type, Description
`workflow`, `string` \, `BaseWorkflowDeclaration`\<`any`, `any`\> \, `Workflow`, The workflow name, ID, or object.
Returns
`Promise`\<`void`\>
A promise that resolves to the deleted workflow.
##### `get()`
Get a workflow by its name, ID, or object.
Parameters
Parameter, Type, Description
`workflow`, `string` \, `BaseWorkflowDeclaration`\<`any`, `any`\> \, `Workflow`, The workflow name, ID, or object.
Returns
`Promise`\<`Workflow`\>
A promise that resolves to the workflow.
##### `getWorkflowIdFromName()`
Gets the workflow ID from a workflow name, ID, or object.
If the input is not a valid UUID, it will look up the workflow by name.
Parameters
Parameter, Type, Description
`workflow`, \, `string` \, `WorkflowDefinition` \, `BaseWorkflowDeclaration`\<`any`, `any`\> \, `Workflow`, The workflow name, ID, or object.
Returns
`Promise`\<`string`\>
The workflow ID as a string.
##### `list()`
List all workflows in the tenant.
Parameters
Parameter, Type, Description
`opts?`, \{ `limit?`: `number`; `name?`: `string`; `offset?`: `number`; \}, The options for the list operation.
`opts.limit?`, `number`, The number to limit by **Format** int **Default** `50`
`opts.name?`, `string`, Search by name
`opts.offset?`, `number`, The number to skip **Format** int **Default** `0`
Returns
`Promise`\<`WorkflowList`\>
A promise that resolves to the list of workflows.
---
# Runnables
`Runnables` in the Hatchet TypeScript SDK are things that can be run, namely tasks and workflows. The two main types of runnables you'll encounter are:
- `WorkflowDeclaration`, returned by `hatchet.workflow(...)`, which lets you define tasks and call `run()`, `schedule()`, `cron()`, etc.
- `TaskWorkflowDeclaration`, returned by `hatchet.task(...)`, which is a single standalone task that exposes the same execution helpers as a workflow.
### TaskWorkflowDeclaration
A standalone task declaration that can be run like a workflow.
`TaskWorkflowDeclaration` is returned by `hatchet.task(...)` and wraps a single
task definition while exposing the same execution helpers as workflows, such as
`run()`, `runNoWait()`, `schedule()`, and `cron()` (inherited from
`BaseWorkflowDeclaration`).
Example:
```typescript
const greet = hatchet.task<{ name: string }, { message: string }>({
name: "greet",
fn: async (input) => ({ message: `Hello, ${input.name}!` }),
});
await greet.run({ name: "World" });
const ref = await greet.runNoWait({ name: "World" });
```
#### Template
Extra fields added to the task fn input by pre-middleware hooks.
#### Methods
##### `cron()`
Creates a cron schedule for the task.
Parameters
Parameter, Type, Description
`name`, `string`, The name of the cron schedule.
`expression`, `string`, The cron expression defining the schedule.
`input`, `I` & `GlobalInput`, The input data for the task, including global input fields.
`options?`, `RunOpts`, Optional configuration for this task run.
Returns
`Promise`\<`CronWorkflows`\>
A promise that resolves with the cron workflow details.
Overrides
```ts
BaseWorkflowDeclaration.cron;
```
##### `delay()`
Schedules the task to run after a specified delay.
Parameters
Parameter, Type, Description
`duration`, `number`, The delay in seconds before the task should run.
`input`, `I` & `GlobalInput`, The input data for the task, including global input fields.
`options?`, `RunOpts`, Optional configuration for this task run.
Returns
`Promise`\<`ScheduledWorkflows`\>
A promise that resolves with the scheduled workflow details.
Overrides
```ts
BaseWorkflowDeclaration.delay;
```
##### `run()`
Triggers a task run and waits for the result.
Parameters
Parameter, Type, Description
`input`, `I` & `GlobalInput`, The input data for the task, including global input fields.
`options?`, `RunOpts`, Optional configuration for this task run.
Returns
`Promise`\<`O` & `Resolved`\<`GlobalOutput`, `MiddlewareAfter`\>\>
A promise that resolves with the task output merged with post-middleware fields.
Overrides
```ts
BaseWorkflowDeclaration.run;
```
##### `runAndWait()`
Triggers a task run and waits for the result.
Parameters
Parameter, Type, Description
`input`, `I` & `GlobalInput`, The input data for the task, including global input fields.
`options?`, `RunOpts`, Optional configuration for this task run.
Returns
`Promise`\<`O` & `Resolved`\<`GlobalOutput`, `MiddlewareAfter`\>\>
A promise that resolves with the task output merged with post-middleware fields.
Overrides
```ts
BaseWorkflowDeclaration.runAndWait;
```
##### `runNoWait()`
Triggers a task run without waiting for completion.
Parameters
Parameter, Type, Description
`input`, `I` & `GlobalInput`, The input data for the task, including global input fields.
`options?`, `RunOpts`, Optional configuration for this task run.
Returns
`Promise`\<`WorkflowRunRef`\<`O` & `Resolved`\<..., ...\>\>\>
A WorkflowRunRef containing the run ID and methods to get results.
Overrides
```ts
BaseWorkflowDeclaration.runNoWait;
```
Triggers a task run without waiting for completion.
Parameters
Parameter, Type, Description
`input`, `I` & `GlobalInput`[], The input data for the task, including global input fields.
`options?`, `RunOpts`, Optional configuration for this task run.
Returns
`Promise`\<`WorkflowRunRef`\<... & ...\>[]\>
A WorkflowRunRef containing the run ID and methods to get results.
Overrides
```ts
BaseWorkflowDeclaration.runNoWait;
```
##### `schedule()`
Schedules the task to run at a specific date and time.
Parameters
Parameter, Type, Description
`enqueueAt`, `Date`, The date when the task should be triggered.
`input`, `I` & `GlobalInput`, The input data for the task, including global input fields.
`options?`, `RunOpts`, Optional configuration for this task run.
Returns
`Promise`\<`ScheduledWorkflows`\>
A promise that resolves with the scheduled workflow details.
Overrides
```ts
BaseWorkflowDeclaration.schedule;
```
---
### WorkflowDeclaration
A Hatchet workflow, which lets you define tasks and perform actions on the workflow.
Workflows in Hatchet represent coordinated units of work that can be triggered,
scheduled, or run on a cron schedule. Each workflow can contain multiple tasks
that can be arranged in dependencies (DAGs), with customized retry behavior,
timeouts, concurrency controls, and more.
Example:
```typescript
import { hatchet } from "./hatchet-client";
type MyInput = { name: string };
const workflow = hatchet.workflow({
name: "my-workflow",
});
workflow.task({
name: "greet",
fn: async (input) => {
return { message: `Hello, ${input.name}!` };
},
});
// Run the workflow
await workflow.run({ name: "World" });
```
Workflows support various execution patterns, including:
- One-time execution with `run()` and `runNoWait()`
- Scheduled execution with `schedule()`
- Cron-based recurring execution with `cron()`
- Bulk execution by passing an array input to `run()` and `runNoWait()`
Tasks within workflows can be defined with `workflow.task()` or
`workflow.durableTask()` and arranged into complex dependency patterns.
#### Methods
##### `durableTask()`
Adds a durable task to the workflow.
The return type will be either the property on O that corresponds to the task name,
or if there is no matching property, the inferred return type of the function.
Parameters
Parameter, Type, Description
`options`, `Omit`\<`CreateWorkflowTaskOpts`\<`I`, `TO`\>, `"fn"`\> & `object`, The task configuration options.
Returns
`CreateWorkflowDurableTaskOpts`\<`I`, `TO`\>
The task options that were added.
##### `onFailure()`
Adds an onFailure task to the workflow.
This will only run if any task in the workflow fails.
Parameters
Parameter, Type, Description
`options`, \, `Omit`\<`CreateOnFailureTaskOpts`\<..., ...\>, `"fn"`\> & `object` \, [`TaskWorkflowDeclaration`](#taskworkflowdeclaration)\<`any`, `any`, \{ \}, \{ \}, \{ \}, \{ \}\>, The task configuration options.
Returns
`CreateWorkflowTaskOpts`\<`I`, `TaskOutputType`\<`O`, `Name`, `L`\>\>
The task options that were added.
##### `onSuccess()`
Adds an onSuccess task to the workflow.
This will only run if all tasks in the workflow complete successfully.
Parameters
Parameter, Type, Description
`options`, \, [`TaskWorkflowDeclaration`](#taskworkflowdeclaration)\<`any`, `any`, \{ \}, \{ \}, \{ \}, \{ \}\> \, `Omit`\<`CreateOnSuccessTaskOpts`\<..., ...\>, `"fn"`\> & `object`, The task configuration options.
Returns
`CreateWorkflowTaskOpts`\<`I`, `TaskOutputType`\<`O`, `Name`, `L`\>\>
The task options that were added.
##### `task()`
Adds a task to the workflow.
The return type will be either the property on O that corresponds to the task name,
or if there is no matching property, the inferred return type of the function.
Parameters
Parameter, Type, Description
`options`, \, `Omit`\<`CreateWorkflowTaskOpts`\<..., ...\>, `"fn"`\> & `object` \, [`TaskWorkflowDeclaration`](#taskworkflowdeclaration)\<`I`, `TO`, \{ \}, \{ \}, \{ \}, \{ \}\>, The task configuration options.
Returns
`CreateWorkflowTaskOpts`\<`I`, `TO`\>
The task options that were added.
## Functions
### `CreateDurableTaskWorkflow()`
Creates a new durable task workflow declaration with types inferred from the function parameter.
Parameters
Parameter, Type, Description
`options`, `object` & `Omit`\<`CreateWorkflowDurableTaskOpts`\<`I`, `O`\>, `"fn"`\>, The durable task configuration options.
`client?`, `IHatchetClient`, Optional Hatchet client instance.
Returns
[`TaskWorkflowDeclaration`](#taskworkflowdeclaration)\<`I`, `O`\>
A new TaskWorkflowDeclaration with inferred types.
---
### `CreateTaskWorkflow()`
Creates a new task workflow declaration with types inferred from the function parameter.
Parameters
Parameter, Type, Description
`options`, `object` & `Omit`\<`CreateTaskWorkflowOpts`\<`I`, `O`\>, `"fn"`\>, The task configuration options.
`client?`, `IHatchetClient`, Optional Hatchet client instance.
Returns
[`TaskWorkflowDeclaration`](#taskworkflowdeclaration)\<`I`, `O`\>
A new TaskWorkflowDeclaration with inferred types.
---
### `CreateWorkflow()`
Creates a new workflow instance.
Parameters
Parameter, Type, Description
`options`, `CreateWorkflowOpts`, The options for creating the workflow. Optionally include a Zod schema via the `input` field to generate a JSON Schema for the backend.
`client?`, `IHatchetClient`, Optional Hatchet client instance.
Returns
[`WorkflowDeclaration`](#workflowdeclaration)\<`I`, `O`\>
A new Workflow instance.
---
# Hatchet CLI Setup & Installation
> **Warning:** The Hatchet CLI is currently in beta and may have breaking changes in future
> releases.
The Hatchet CLI is a command-line tool with utilities for running workers locally, interacting with a running Hatchet deployment, and running a local Hatchet instance for development.
## Features
- **Quickstarts**: the `hatchet quickstart` command sets up a local Hatchet instance with a sample project to help you get started quickly.
- **Local worker reloading**: the [`hatchet worker dev`](/cli/running-workers-locally) command lets you run a worker locally with automatic reloading when code changes are detected.
- **A full built-in TUI**: the [`hatchet tui`](/cli/tui) command lets you interact with your Hatchet deployment through a terminal user interface (TUI) that provides real-time observability into tasks, workflows, workers, and more.
- **Profiles**: the [`hatchet profile`](/cli/profiles) commands allow you to manage multiple Hatchet instances and tenants with named profiles, making it easy to switch between different environments.
## Installation
The recommended way to install the Hatchet CLI is via our install script or Homebrew:
#### Native Install (Recommended)
**MacOS, Linux, WSL**
```sh
curl -fsSL https://install.hatchet.run/install.sh | bash
```
#### Homebrew
**MacOS**
```sh
brew install hatchet-dev/hatchet/hatchet --cask
```
## Verifying Installation
After installation, verify that the Hatchet CLI is installed correctly by checking its version:
```sh
hatchet --version
```
---
# Profiles
The Hatchet CLI supports managing multiple Hatchet instances and tenants using named profiles. This feature makes it easy to switch between different environments, such as development, staging, and production.
## Creating a Profile
You can create a new profile using the `hatchet profile add` command. You will need to provide a Hatchet API token for the profile.
```sh
hatchet profile add
```
This command will prompt you to enter the API token followed by the profile name. You can also provide these as flags:
```sh
hatchet profile add --name [name] --token [token]
```
## Listing Profiles
You can list all the profiles you have configured using the `hatchet profile list` command:
```sh
hatchet profile list
```
This will display all configured profiles, with the default profile marked with `(default)` if one is set.
## Setting a Default Profile
You can set a profile as the default using the `hatchet profile set-default` command. The default profile will be automatically used when no profile is specified with the `--profile` flag.
```sh
# Set default profile interactively (prompts for selection)
hatchet profile set-default
# Set a specific profile as default
hatchet profile set-default --name [name]
```
Once a default profile is set, you can run commands without specifying the `--profile` flag:
```sh
# Uses the default profile
hatchet worker dev
```
To unset the default profile:
```sh
hatchet profile unset-default
```
## Using a Profile
To use a specific profile for your Hatchet CLI commands, you can specify the profile name using the `--profile` flag. This overrides the default profile if one is set.
```sh
hatchet worker dev --profile [name]
```
## Updating a Profile
You can update an existing profile using the `hatchet profile update` command. This allows you to change the API token associated with a profile.
```sh
hatchet profile update
```
## Deleting a Profile
You can delete a profile using the `hatchet profile remove` command:
```sh
hatchet profile remove
```
If you remove a profile that is set as the default, the default profile setting will be automatically cleared.
---
# Running Hatchet Locally
The Hatchet CLI provides the `hatchet server` commands to run a local instance of Hatchet for development and testing purposes. This local instance relies on Docker to run the necessary services.
When `DOCKER_HOST` is not set, Hatchet resolves the Docker host from the current Docker context. This respects `DOCKER_CONTEXT` when it is set, and otherwise uses the active context from your local Docker configuration. If `DOCKER_HOST` is set explicitly, it takes precedence.
## Prerequisites
Before running Hatchet locally, you must have Docker installed on your machine. You can download Docker from [here](https://www.docker.com/get-started).
## Starting Hatchet Locally
To start a local instance of Hatchet, run the following command in your terminal:
```sh
hatchet server start
```
## Stopping Hatchet Locally
To stop the local Hatchet instance, run the following command:
```sh
hatchet server stop
```
## Reference
#### `hatchet server start`
```txt
Start a local Hatchet server environment using Docker containers. This command will start both a PostgreSQL database and a Hatchet server instance, automatically creating a local profile for easy access.
Usage:
hatchet server start [flags]
Examples:
# Start server with default settings (port 8888)
hatchet server start
# Start server with custom dashboard port
hatchet server start --dashboard-port 9000
# Start server with custom ports and project name
hatchet server start --dashboard-port 9000 --grpc-port 8077 --project-name my-hatchet
# Start server with custom profile name
hatchet server start --profile my-local
Flags:
-d, --dashboard-port int Port for the Hatchet dashboard (default: auto-detect starting at 8888)
-g, --grpc-port int Port for the Hatchet gRPC server (default: auto-detect starting at 7077)
-h, --help help for start
-n, --profile string Name for the local profile (default: local) (default "local")
-p, --project-name string Docker project name for containers (default: hatchet-cli)
Global Flags:
-v, --version The version of the hatchet cli.
```
#### `hatchet server stop`
```txt
Stop a local Hatchet server environment that was started using Docker containers with the 'hatchet server start' command.
Usage:
hatchet server stop [flags]
Examples:
# Stop the local Hatchet server
hatchet server stop
# Stop the local Hatchet server with a custom project name
hatchet server stop --project-name my-hatchet
Flags:
-h, --help help for stop
-p, --project-name string Docker project name for containers (default: hatchet-cli)
Global Flags:
-v, --version The version of the hatchet cli.
```
---
# Running Workers Locally
The Hatchet CLI provides the `hatchet worker` commands to run Hatchet workers locally for development and testing purposes.
## Setting up a hatchet.yaml file
> **Info:** If you've set up a project using `hatchet quickstart`, a `hatchet.yaml` file
> is already created for you in the project directory.
The `hatchet worker` commands rely on a `hatchet.yaml` configuration file to define the worker settings. You can create a `hatchet.yaml` file in your project directory which resembles the following (you will need to adjust the `preCmds` and `runCmd` fields to match your project's setup):
#### Python
```yaml
dev:
preCmds: ["poetry install"]
runCmd: "poetry run python src/worker.py"
files:
- "**/*.py"
- "!**/__pycache__/**"
- "!**/.venv/**"
reload: true
```
#### Typescript
```yaml
dev:
preCmds: ["pnpm install"]
runCmd: "pnpm start"
files:
- "**/*.ts"
- "!**/node_modules/**"
reload: true
```
#### Go
```yaml
dev:
preCmds: ["go mod download"]
runCmd: "go run ./cmd/worker"
files:
- "**/*.go"
reload: true
```
## Running a worker
Once you have a `hatchet.yaml` file set up, you can run a worker locally using the following command:
```sh
hatchet worker dev
```
To run a worker with a specific profile, you can run:
```sh
hatchet worker dev --profile
```
### Disabling auto-reload
If you want to run the worker without auto-reloading on file changes, you can set the `dev.reload` field to `false` in your `hatchet.yaml` file:
```yaml
dev:
reload: false
```
Or you can pass the `--no-reload` flag when running the worker:
```sh
hatchet worker dev --no-reload
```
### Overriding the run command
You can override the `runCmd` specified in the `hatchet.yaml` file by using the `--run-cmd` flag:
```sh
hatchet worker dev --run-cmd "npm run dev"
```
---
# Triggering Workflows
You can use the `hatchet trigger` command to trigger workflows locally for testing and development purposes. This command allows you to set up triggers in your `hatchet.yaml` file that define how to run specific workflows.
## Example
In your `hatchet.yaml` file, you can define a trigger for a simple workflow like this:
```yaml
triggers:
- name: "simple"
command: "poetry run python src/run.py"
description: "Trigger a simple workflow"
```
Then, you can select this trigger when running the `hatchet trigger` command:
```sh
hatchet trigger simple
```
Or just `hatchet trigger`, which will prompt you to select a trigger interactively.
---
# Using the Hatchet TUI
The Hatchet CLI includes a built-in terminal user interface (TUI) that you can use to interact with your Hatchet deployment directly from the terminal. The TUI provides real-time observability into tasks, workflows, workers, and more:
```sh
hatchet tui
```
# Features
> **Info:** You can access help documentation by pressing the `h` key within the TUI. This
> will display a list of available commands and their descriptions.
## Runs View
The Runs view provides a similar experience to the Runs page in the Hatchet dashboard. You can filter and list runs based on filters.
## Workflows View
The Workflows view allows you to see the list of workflows defined in your Hatchet deployment. You can view some details about each workflow, along with recent runs:
## Workers View
The Workers view shows the status of workers connected to your Hatchet deployment. You can see which workers are online, their registered workflows, and some other information.
---
# Contributing
> **Note:** this guide### Setup
1. Start the Database and Queue services:
```sh
task start-db
```
2. Install dependencies, run migrations, generate encryption keys, and seed the database:
```sh
task setup
```
### Starting the dev server
Start the Hatchet engine, API server, dashboard, and Prisma studio:
```sh
task start-dev # or task start-dev-tmux if you want to use tmux panes
```
### Creating and testing workflows
To create and test workflows, run the examples in the `./examples` directory.
You will need to add the tenant (output from the `task seed-dev` command) to the `.env` file in each example directory. An example `.env` file for the `./examples/simple` directory can be generated via:
```sh
alias get_token='go run ./cmd/hatchet-admin token create --name local --tenant-id 707d0855-80ab-4e1f-a156-f1c4546cbf52'
cat > ./examples/simple/.env <
# optional
OTEL_EXPORTER_OTLP_HEADERS=
# optional
OTEL_EXPORTER_OTLP_ENDPOINT=
```
### CloudKMS
CloudKMS can be used to generate master encryption keys:
```
gcloud kms keyrings create "development" --location "global"
gcloud kms keys create "development" --location "global" --keyring "development" --purpose "encryption"
gcloud kms keys list --location "global" --keyring "development"
```
From the last step, copy the Key URI and set the following environment variable:
```
SERVER_ENCRYPTION_CLOUDKMS_KEY_URI=gcp-kms://projects//locations/global/keyRings/development/cryptoKeys/development
```
Generate a service account in GCP which can encrypt/decrypt on CloudKMS, then download a service account JSON file and set it via:
```
SERVER_ENCRYPTION_CLOUDKMS_CREDENTIALS_JSON='{...}'
```
## Issues
### Query engine leakage
Sometimes the spawned query engines from Prisma don't get killed when hot reloading. You can run `task kill-query-engines` on OSX to kill the query engines.
Make sure you call `.Disconnect` on the database config object when writing CLI commands which interact with the database. If you don't, and you try to wrap these CLI commands in a new command, it will never exit, for example:
```
export HATCHET_CLIENT_TOKEN="$(go run ./cmd/hatchet-admin token create --tenant-id )"
```
---
## Setup
### Using `ngrok`
You can use `ngrok` to expose a local port to the internet to accept incoming webhooks from Github. To do this, run the following:
```sh
task start-ngrok
```
Make note of the `https` URL as you will need it later.
### Github App Creation
To create a Github app that can read from your repositories, navigate to your organization settings page (alternately, you can navigate to your personal settings page) and select **Developer Settings** in the sidebar. Go to **Github Apps** and select **New Github App**. You should use the following settings:
- Homepage URL: you can set this as https://hatchet.run, or some other domain for your organization.
- Callback URL: `:///api/v1/users/github-app/callback`
- The **Request user authorization (OAuth) during installation** checkbox should be checked.
- Webhook URL: `:///api/v1/github/webhook`
- Webhook secret: generate a random webhook secret for your domain, for example by running `cat /dev/urandom | base64 | head -c 32`. **Make note of this secret, as you will need it later**.
- Permissions:
- **Repository:**
- **Checks (Read & write)**: required to write Github checks for each commit/PR.
- **Contents (Read):** required for Hatchet to read files from the repository.
- **Metadata (Read-only):** mandatory, required for Github apps that integrate with repositories.
- **Pull Requests (Read & write):** required for Hatchet to add comments to Github PRs, and to create PRs.
- **Webhooks (Read & write):** required for Hatchet to create a Github repository webhooks that notify the Hatchet instance when PRs are updated.
- **Account:**
- **Email addresses (read-only)**: required for Hatchet to read your Github email address for authentication.
### Creating a Secret and Private Key
After creating the Github App, create the following:
- In the "Client secrets" section, select **Generate a new client secret**. You will need this secret in the following section.
- In the "Private keys" section, download a new private key for your app. You will need this private key in the following section.
### Private Keys and Environment Variables
After creating the private key, you can place it somewhere in your filesystem and set the `SERVER_VCS_GITHUB_APP_SECRET_PATH` environment variable to the path of the private key.
Make sure the following environment variables are set:
```txt
SERVER_VCS_KIND=github
SERVER_VCS_GITHUB_ENABLED=true
SERVER_VCS_GITHUB_APP_CLIENT_ID=
SERVER_VCS_GITHUB_APP_CLIENT_SECRET=
SERVER_VCS_GITHUB_APP_NAME=
SERVER_VCS_GITHUB_APP_WEBHOOK_SECRET=
SERVER_VCS_GITHUB_APP_WEBHOOK_URL=
SERVER_VCS_GITHUB_APP_ID=
SERVER_VCS_GITHUB_APP_SECRET_PATH=
```
---
# SDKs
This document tracks the feature support of the various SDKs, and aims to consolidate the expected behavior around environment variables and configuration loading.
## Environment Variables
Each SDK should support the following environment variables:
Variable, Description, Required, Default
`HATCHET_CLIENT_TOKEN`, The tenant-scoped API token to use., Yes, N/A
`HATCHET_CLIENT_HOST_PORT`, The host and port of the Hatchet server to connect to, in `host:port` format. SDKs should handle schemes and trailing slashes, i.e. `https://host:port, No, Automatically detected in new tokens.
`HATCHET_CLIENT_TLS_STRATEGY`, The TLS strategy to use. Valid values are `none`, `tls`, and `mtls`., No, `tls`
`HATCHET_CLIENT_TLS_CERT_FILE`, The path to the TLS client certificate file to use., Only if strategy is set to `mtls`, N/A
`HATCHET_CLIENT_TLS_CERT`, The TLS client key file to use., Only if strategy is set to `mtls`, N/A
`HATCHET_CLIENT_TLS_KEY_FILE`, The path to the TLS client key file to use., Only if strategy is set to `mtls`, N/A
`HATCHET_CLIENT_TLS_KEY`, The TLS client key to use., Only if strategy is set to `mtls`, N/A
`HATCHET_CLIENT_TLS_ROOT_CA_FILE`, The path to the TLS root CA file to use., Only if the server certificate is not signed by a public authority that's available to your environment, N/A
`HATCHET_CLIENT_TLS_ROOT_CA`, The TLS root CA to use., Only if the server certificate is not signed by a public authority that's available to your environment, N/A
`HATCHET_CLIENT_TLS_SERVER_NAME`, The TLS server name to use., No, Defaults to the `host` of the `host:port`
The following environment variables are deprecated:
Variable, Description, Explanation
`HATCHET_CLIENT_TENANT_ID`, The tenant ID to use., This is now part of the token.
## Compatibility Matrices
### DAGs
Whether the SDKs support full DAG-style execution.
SDK, DAGs?, Notes
Go SDK, Yes
Python SDK, Yes
Typescript SDK, Yes
### Timeouts
Whether the SDKs support setting timeouts and cancelling after timeouts.
SDK, Timeouts?, Step cancellation?, Notes
Go SDK, Yes, Yes
Python SDK, Yes, Yes, If thread is blocking, this won't be respected
Typescript SDK, Yes, Unknown
### Middleware
Whether the SDKs support setting middleware.
SDK, Middleware?, Notes
Go SDK, Yes
Python SDK, No
Typescript SDK, No
### Separately Registering and Calling Actions
Whether the SDKs support separately registering and calling actions, instead of defining them inline in the workflows.
SDK, Supported?, Notes
Go SDK, Yes
Python SDK, No
Typescript SDK, No
### Custom Services
Whether the SDKs support defining services to logically separate workflows and actions.
SDK, Supported?, Notes
Go SDK, Yes
Python SDK, No
Typescript SDK, No
### Scheduled Workflows
Whether the SDKs support defining scheduled workflows.
SDK, Supported?, Notes
Go SDK, Yes
Python SDK, No
Typescript SDK, No