# Hatchet Documentation

> Hatchet is a distributed task queue and workflow engine for modern applications. It provides durable execution, concurrency control, rate limiting, and observability for background tasks and workflows in Python, TypeScript, and Go.

---

# What is Hatchet?

Hatchet is a modern orchestration platform that helps engineering teams build low-latency and high-throughput data ingestion and agentic AI pipelines. You write simple functions, called [tasks](./home/your-first-task), in Python, TypeScript, and Go and run them on [workers](./home/workers) in your own infrastructure. You can compose these tasks into [parent/child relationships](./home/child-spawning) or predefine them as [Directed Acyclic Graphs (DAGs)](./home/dags) to build more complex pipelines, which we call [workflows](./home/orchestration).

All tasks and workflows are **defined as code**, making them easy to version, test, and deploy. Hatchet handles scheduling, complex task assignment, fault tolerance, and observability so you can focus on building your application as you scale.

## Use-Cases

While Hatchet is a general-purpose orchestration platform, it's particularly well-suited for:

- **Real-time data processing pipelines**: for example, data ingestion, which is crucial for keeping LLM contexts up-to-date, or ETL pipelines that require fast execution and high throughput.
- **AI agents**: a number of Hatchet's core features, like [webhooks](./home/webhooks), [child spawning](./home/child-spawning), and [dynamic workflows](./home/child-spawning), are designed to support AI agents.
- **Event-driven systems**: Hatchet's [eventing features](./home/run-on-event) allow you to build event-driven architectures without requiring additional infrastructure.

## Why Hatchet?

⚑️ **Low-Latency For Real-Time Workloads** - Sub-25ms task dispatch for hot workers with thousands of concurrent tasks. Smart assignment rules handle [rate-limits](./home/rate-limits), [fairness](./home/concurrency), and [priorities](./home/priority) without complex configuration.

πŸͺ¨ **Durability for Long Running Jobs** - Every task invocation is durably logged to PostgreSQL. With [durable execution](./home/durable-execution), when jobs fail, your workflow resumes exactly where it left off β€” no lost work, no duplicate LLM calls, no engineering headaches.

πŸ§˜β€β™‚οΈ **Zen Developer Experience** - Hatchet SDKs (Python, TypeScript, and Go) are built with modern tooling and are designed to be easy to use. Hatchet has built-in observability and debugging tools for things like replays, logs, and alerts.

If you plan on self-hosting or have requirements for an on-premise deployment, there are some additional considerations:

🐘 **Minimal Infra Dependencies** - Hatchet is built on top of PostgreSQL, and for simple workloads, [it's all you need](./self-hosting/hatchet-lite.mdx).

⬆️ **Fully Featured Open Source** - Hatchet is 100% MIT licensed, so you can run the same application code against [Hatchet Cloud](https://cloud.onhatchet.run) to get started quickly or [self-host](./self-hosting.mdx) when you need more control.

## Hatchet vs. Alternatives

Today, developers who need background task processing and workflow orchestration face two main options:

1. Adopt external services like Temporal or Airflow, which are powerful but complex to run or introduce latency, or
2. Use simple task queue libraries like Celery or BullMQ, which lack critical workflow features and become difficult to debug at scale.
| Feature                                     | Hatchet | Celery     | Airflow    | Temporal     |
| ------------------------------------------- | ------- | ---------- | ---------- | ------------ |
| **Task Start Latency**                      | 25ms    | 5-100ms+   | 5-30s      | 25ms         |
| **Concurrent Tasks**                        | 1000s   | Variable\* | Variable\* | 10000        |
| **Code-First Workflows**                    | βœ…      | βœ…         | βœ…         | βœ…           |
| **Cron Jobs and Scheduling**                | βœ…      | βœ…         | βœ…         | βœ…           |
| **Priority Queues**                         | βœ…      | βœ…         | βœ…         | βœ… (beta)    |
| **Durable Sleep/Checkpoints**               | βœ…      | ❌         | ❌         | βœ…           |
| **Sticky Assignment/Complex Routing Logic** | βœ…      | ❌         | ❌         | βœ… (limited) |
| **Event-Based Triggering**                  | βœ…      | ❌         | βœ…         | ❌           |
| **Real-time Streaming**                     | βœ…      | ❌         | ❌         | ❌           |
| **Global Rate Limits**                      | βœ…      | ❌         | ❌         | ❌           |
| **Event Streaming**                         | βœ…      | ❌         | ❌         | ❌           |

\*Requires careful configuration and infrastructure scaling

## Production Readiness

Hatchet has been battle-tested in production environments, processing billions of tasks per month for scale-ups and enterprises across various industries. Our open source offering is deployed over 10k times per month, while Hatchet Cloud supports hundreds of companies running at scale.

> "With Hatchet, we've scaled our indexing workflows effortlessly, reducing failed runs by 50% and doubling our user base in just two weeks!"
> β€” Soohoon, Co-Founder @ Greptile

> "Hatchet enables Aevy to process up to 50,000 documents in under an hour through optimized parallel execution, compared to nearly a week with our previous setup."
> β€” Ymir, CTO @ Aevy

## Quick Starts

We have a number of quick start tutorials for getting up and running quickly with Hatchet:

- [Hatchet Cloud Quickstart](./hatchet-cloud-quickstart.mdx)
- [Hatchet Self-Hosted Quickstarts](./self-hosting.mdx)

We also have guides for getting started with the Hatchet SDKs:

- [Python SDK Quickstart](https://github.com/hatchet-dev/hatchet-python-quickstart)
- [Typescript SDK Quickstart](https://github.com/hatchet-dev/hatchet-typescript-quickstart)
- [Go SDK Quickstart](https://github.com/hatchet-dev/hatchet-go-quickstart)

## Learn More

Ready to dive deeper? Explore these additional resources:

**[Architecture](./architecture.mdx)** - Learn how Hatchet is built and designed for scale.

**[Guarantees & Tradeoffs](./guarantees-and-tradeoffs.mdx)** - Understand Hatchet's guarantees, limitations, and when to use it.

Or get started with the **[Hatchet Cloud Quickstart](./hatchet-cloud-quickstart.mdx)** or **[self-hosting](./self-hosting.mdx)**.

---

# Architecture

## Overview

Hatchet's architecture is designed around simplicity and reliability. At its core, Hatchet consists of three main components: the **Engine**, the **API Server**, and **Workers**. State is managed durably and efficiently, eliminating the need for additional message brokers or distributed systems.

Whether you use [Hatchet Cloud](https://cloud.onhatchet.run) or self-host, the architecture remains consistent, allowing seamless migration between deployment models as your needs evolve.

```mermaid
graph LR
  subgraph "External (Optional)"
    EXT[Webhooks
Events] end subgraph "Your Infrastructure" APP[Your API, App, Service, etc.] W[Workers] end subgraph "Hatchet" API[API Server] ENG[Engine] DB[(Database)] end EXT --> API APP <--> API API --> ENG ENG <--> DB API <--> DB ENG <-.->|gRPC| W classDef userInfra fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#0d47a1 classDef hatchet fill:#f1f8e9,stroke:#388e3c,stroke-width:2px,color:#1b5e20 classDef external fill:#fff8e1,stroke:#f57c00,stroke-width:2px,color:#e65100 class APP,W userInfra class API,ENG,DB hatchet class EXT external ``` ## Core Components ### Engine The **Hatchet Engine** orchestrates the entire workflow execution process. It determines when and where tasks should run based on complex dependencies, concurrency limits, and worker availability. Key responsibilities include: - **Task Scheduling**: Intelligent routing based on worker capacity and constraints - **Queue Management**: Sophisticated priority, rate limiting, and fairness algorithms - **Flow Control**: Enforces concurrency limits, rate limits, and routing rules - **Retry Logic**: Automatic handling of failures, timeouts, and backpressure - **Cron Processing**: Manages scheduled workflow executions Communication with workers are handled through bidirectional gRPC connections that enable real-time task dispatch and status updates with minimal latency and network overhead. The Hatchet engine continuously tracks task execution state and automatically handles retries, timeouts, and failure scenarios without manual intervention. The Engine is designed to be horizontally scalableβ€”multiple engine instances can run simultaneously, coordinating through the persistent storage layer to handle increasing workloads seamlessly. ### API Server The **API Server** serves as the primary interface for viewing Hatchet resources. It exposes REST endpoints that the Hatchet UI and your applications use to: - **Trigger Workflows**: Start new workflow executions with input data - **Query or Subscribe to Status**: Check workflow and task execution status - **Manage Resources**: Configure workflows, schedules, and settings - **Webhook Ingestion**: Receive and process external events Security is handled through multi-tenant authentication with API keys and JWT tokens, or webhook signature verification where applicable, to ensure only authentic requests are processed. The API Server also powers Hatchet's web dashboard through REST endpoints, giving you real-time visibility into your workflows. ### Workers **Workers** are your application processes that execute the actual business logic. They establish secure, bidirectional gRPC connections to the Engine and run your functions when tasks are dispatched. Workers continuously report status updates, including task progress, logs, and results, giving you real-time visibility into execution. When tasks need to be cancelled, workers handle this gracefully with proper cleanup procedures. One of Hatchet's key design goals is deployment flexibility: workers can run anywhere, from containers to VMs or even your local development machine. This flexibility means you can start development locally, deploy to staging in containers, and run production workloads on dedicated infrastructure without changing your worker code. You can run either homogeneous or heterogeneous workers. Homogeneous workers are a single type of worker that is used for all tasks. Heterogeneous workers are a mix of different types of workers that are used for different tasks. 
Heterogeneous workers can also be polyglot, meaning they can run multiple languages. For example, you can run a Python worker, a Go worker, and a TypeScript worker which can all be invoked from the same service application. ### Persistent Storage & Inter-Service Communication The platform maintains durable state for all aspects of workflow execution, including task queue state for queued, running, and completed tasks. Workflow definitions with their dependencies, configuration, and metadata are stored persistently, ensuring your orchestration logic survives system restarts. In [self-hosted deployments](../self-hosting), this can be a single PostgreSQL database, or for high-throughput workloads you can use RabbitMQ for inter-service communication. In [Hatchet Cloud](https://hatchet.run), this is managed for you with enterprise-grade reliability and performance, handling backups, scaling, and maintenance automatically. ## Design Philosophy Hatchet prioritizes simplicity over complexity: - **PostgreSQL foundation** - Built on PostgreSQL with optional RabbitMQ for high-throughput workloads - **Stateless services** - Engine and API scale horizontally - **Worker flexibility** - Deploy anywhere, any language (Python/TypeScript/Go), independent scaling ## Next Steps **[Guarantees & Tradeoffs](./guarantees-and-tradeoffs.mdx)** - Learn about Hatchet's guarantees, limitations, and performance characteristics. **[Quick Start](./setup.mdx)** - Set up your first Hatchet worker. **[Self Hosting](../self-hosting)** - Deploy the Hatchet platform on your own infrastructure. --- # Guarantees & Tradeoffs Hatchet is designed as a modern task orchestration platform that bridges the gap between simple job queues and complex workflow engines. Understanding where it excelsβ€”and where it doesn'tβ€”will help you determine if it's the right fit for your needs. ### Good Fit
βœ… Real-time Requests - Sub-25ms task dispatch for hot workers with thousands of concurrent tasks
βœ… Workflow Orchestration with dependencies and error handling
βœ… Reliable Task Processing where durability matters
βœ… Moderate Throughput (hundreds to low 10,000s of tasks/second)
βœ… Multi-Language Workers or polyglot teams
βœ… Operational Simplicity if your team is already using PostgreSQL
βœ… Cloud or Air-Gapped Environments for flexible deployment options (Hatchet Cloud and self-hosting)
### Not a Good Fit
❌ Extremely High Throughput (consistently 10,000+ tasks/second)
❌ Sub-Millisecond Latency requirements
❌ Memory-Only Queuing where persistence or durability isn't needed
❌ Serverless Environments on cloud providers like AWS Lambda, Google Cloud Functions, or Azure Functions
## Core Reliability Guarantees Hatchet is designed with the following core reliability guarantees: **Every task will execute at least once.** Hatchet ensures that no task gets lost, even during system failures, network outages, or deployments. Failed tasks automatically retry according to your configuration, and all tasks persist through restarts and network issues. **Consistent state management.** All workflow state changes happen within PostgreSQL transactions, ensuring that your workflow dependencies resolve consistently and no tasks are lost during failures or deployments. **Predictable execution order.** The default task assignment strategy is First In First Out (FIFO) which can be modified with [concurrency policies](./concurrency.mdx), [rate limits](./rate-limits.mdx), and [priorities](./priority.mdx). **Operational resilience.** The engine and API servers are stateless, allowing them to restart without losing state and enabling horizontal scaling by simply adding more instances. Workers automatically reconnect after network issues and can be deployed anywhereβ€”containers, VMs, or local development environments. ## Performance Expectations Understanding Hatchet's performance characteristics helps you plan your implementation and set realistic expectations. **Typical time-to-start latency** for task dispatch is sub 50ms with PostgreSQL storage, though this can be optimized to ~25ms P95 for hot workers in optimized setups. Network latency between your workers and the Hatchet engine will directly impact dispatch times, so consider deployment topology when latency matters. **Throughput capacity** varies significantly based on your setup. A single engine instance with PostgreSQL-only storage typically handles hundreds of tasks per second. When you need higher throughput, adding RabbitMQ as a message queue can substantially increase capacity, though your database will eventually become the bottleneck at very high scales. Through tuning and sharding, we can support throughputs of tens of thousands of tasks per second. **Concurrent processing** scales well β€” Hatchet supports thousands of concurrent workers, with worker-level concurrency controlled through slot configuration. The depth of your queues is limited by your database storage capacity rather than memory constraints. **Performance optimization** comes through several strategies: RabbitMQ for high-throughput workloads, read replicas for analytics queries, connection pooling with tools like PgBouncer, and shorter retention periods for execution history. Conversely, performance can be limited by database connection limits, large task payloads (over 1MB), complex dependency graphs, and cross-region network latency. > **Warning:** **Not seeing expected performance?** > > If you're not seeing the performance you expect, please [reach out to us](https://hatchet.run/office-hours) or [join our community](https://hatchet.run/discord) to explore tuning options. ## Ready to Get Started? Now that you understand Hatchet's capabilities and limitations, explore the technical details: **[Quick Start](../setup.mdx)** - Set up your first Hatchet worker. **[Self-Hosting](../self-hosting)** - Learn how to deploy Hatchet on your own infrastructure with appropriate sizing for your needs. --- # Hatchet Cloud Quickstart Welcome to Hatchet! This guide walks you through getting set up on Hatchet Cloud. If you'd like to self-host Hatchet, please see the [self-hosted quickstart](../../self-hosting/) instead. 
## Quickstart ### Sign up If you haven't already signed up for Hatchet Cloud, please register [here](https://cloud.onhatchet.run). ### Set up your tenant In Hatchet Cloud, you'll be shown a screen to create your first tenant. A tenant is a logical separation of your environments (e.g. `dev`, `staging`, `production`). Each tenant has its own set of users who can access it. After creating the tenant, you can simply follow the instructions in the Hatchet Cloud dashboard to set up your first quickstart project and workflow. We have copied the instructions in the following steps. ### Install the Hatchet CLI #### Native Install (Recommended) **MacOS, Linux, WSL** ```sh curl -fsSL https://install.hatchet.run/install.sh | bash ``` #### Homebrew **MacOS** ```sh brew install hatchet-dev/hatchet/hatchet --cask ``` ### Set your Hatchet profile You will need to create a Hatchet CLI profile to connect to your Hatchet Cloud tenant. You can do this using the `hatchet profile add` command: ```sh hatchet profile add ``` Note that the Hatchet Cloud dashboard will provide you with an API token to use when creating your profile. ### Run the quickstart You can run the Hatchet Cloud quickstart using the `hatchet quickstart` command: ```sh hatchet quickstart ``` ### Run your worker After setting up the quickstart project, you can run your worker locally by following the instructions printed after the quickstart command. This will involve using the `hatchet worker dev` command: ```sh hatchet worker dev ``` ### Trigger a workflow Finally, you can trigger your workflow using the `hatchet trigger simple` command: ```sh hatchet trigger simple ``` ### (Optional) Install Hatchet docs MCP Get Hatchet documentation directly in your AI coding assistant (Cursor, Claude Code, Claude Desktop, and more): ```sh copy hatchet docs install ``` See the [full setup guide](./install-docs-mcp.mdx) for manual configuration options. And that's it! You should now have a Hatchet project set up on Hatchet Cloud with a worker running locally. ## Next Steps Once you've completed the quickstart, you can explore a more in-depth walkthrough of Hatchet using the [walkthrough](/home/setup) guide. --- # Advanced Setup > **Info:** This guide is intended for users who want to explore Hatchet in more depth > beyond the quickstart. If you haven't already set up Hatchet, please see the > [Hatchet Cloud Quickstart](./hatchet-cloud-quickstart) or the [Self-Hosting > Quickstart](../../self-hosting/) first. ## Set environment variables All Hatchet SDKs require the `HATCHET_CLIENT_TOKEN` environment variable to be set. This token is automatically created when you run a CLI command like `hatchet worker dev` or `hatchet trigger`, but if you're setting up a project manually, you'll need to set this variable yourself. You can generate an API token from the Hatchet frontend by navigating to the `Settings` tab and clicking on the `API Tokens` tab. Click the `Generate API Token` button to create a new token. Set this environment variable in your project, and **do not share it publicly**. ```bash copy export HATCHET_CLIENT_TOKEN="" ``` Additionally, if you are a self-hosted user provisioning without TLS enabled, you will need to set the `HATCHET_CLIENT_TLS_STRATEGY` environment variable to `none`. If you are on Hatchet Cloud, TLS is enabled by default, so this is not required. 
```bash copy export HATCHET_CLIENT_TLS_STRATEGY=none ``` ## Setup your codebase #### Clone a Quickstart Project #### Python #### Clone a Quickstart Project ```bash copy git clone https://github.com/hatchet-dev/hatchet-python-quickstart.git ``` #### CD into the project ```bash copy cd hatchet-python-quickstart ``` #### Install dependencies #### Typescript #### Clone a Quickstart Project ```bash copy git clone https://github.com/hatchet-dev/hatchet-typescript-quickstart.git ``` #### CD into the project ```bash copy cd hatchet-typescript-quickstart ``` #### Install dependencies #### Go #### Clone a Quickstart Project ```bash copy git clone https://github.com/hatchet-dev/hatchet-go-quickstart.git ``` #### CD into the project ```bash copy cd hatchet-go-quickstart ``` #### Install dependencies #### Create a New Project from Scratch #### Python #### Create a new project directory and cd into it ```bash copy mkdir hatchet-tutorial && cd hatchet-tutorial ``` #### Initialize a new python project with required dependencies #### Create project directories ```bash copy mkdir src && mkdir src/workflows && mkdir src/workers ``` #### Instantiate your Hatchet Client It is recommended to instantiate a shared Hatchet Client in a separate file as a singleton. Create a new file called `hatchet_client.py` ```bash copy touch src/hatchet_client.py ``` Add the following code to the file: ```python from hatchet_sdk import Hatchet hatchet = Hatchet() ``` You can now import the Hatchet Client in any file that needs it. ```python copy from src.hatchet_client import hatchet ``` #### Typescript #### Create a new project directory and cd into it ```bash copy mkdir hatchet-tutorial && cd hatchet-tutorial ``` #### Initialize a new typescript project with required dependencies #### Create project directories ```bash copy mkdir src && mkdir src/workflows && mkdir src/workers ``` #### Instantiate your Hatchet Client It is recommended to instantiate a shared Hatchet Client in a separate file as a singleton. Create a new file called `hatchet-client.ts` ```bash copy touch src/hatchet-client.ts ``` Add the following code to the file: ```typescript import { HatchetClient } from '@hatchet/v1'; export const hatchet = HatchetClient.init(); ``` You can now import the Hatchet Client in any file that needs it. #### Go #### Create a new project directory and cd into it ```bash copy mkdir hatchet-tutorial && cd hatchet-tutorial ``` #### Initialize a new go project with required dependencies ```bash copy go mod init hatchet-tutorial ``` #### Create project directories ```bash copy mkdir workflows && mkdir workers ``` #### Instantiate your Hatchet Client In the `workers` directory, create a Hatchet client ```go copy package main import ( "log" hatchet "github.com/hatchet-dev/hatchet/sdks/go" ) func main() { client, err := hatchet.NewClient() if err != nil { log.Fatalf("failed to create hatchet client: %v", err) } } ``` #### Add to an Existing Project #### Python #### Cd into your project directory ```bash copy cd path-to-your-project ``` #### Install the Hatchet SDK and required dependencies #### Create project directories By convention it is recommended to create your workflows in the `workflows` directory and your workers in the `workers` directory. ```bash copy mkdir workflows && mkdir workers ``` #### Instantiate your Hatchet Client It is recommended to instantiate a shared Hatchet Client in a separate file as a singleton. Create a new file called `hatchet-client.py` in your project root. 
```bash copy touch hatchet-client.py ``` Add the following code to the file: ```python from hatchet_sdk import Hatchet hatchet = Hatchet() ``` You can now import the Hatchet Client in any file that needs it. ```python copy from src.hatchet_client import hatchet ``` #### Typescript #### Cd into your project directory ```bash copy cd path-to-your-project ``` #### Install the Hatchet SDK and required dependencies #### Create project directories By convention it is recommended to create your workflows in the `workflows` directory and your workers in the `workers` directory. ```bash copy mkdir workflows && mkdir workers ``` #### Instantiate your Hatchet Client It is recommended to instantiate a shared Hatchet Client in a separate file as a singleton. Create a new file called `hatchet-client.ts` in your project root. ```bash copy touch hatchet-client.ts ``` Add the following code to the file: ```typescript import { HatchetClient } from '@hatchet/v1'; export const hatchet = HatchetClient.init(); ``` #### Go #### Cd into your project directory ```bash copy cd /path/to/your-project ``` #### Create project directories By convention it is recommended to create your workflows in the `workflows` directory and your workers in the `workers` directory. ```bash copy mkdir workflows && mkdir workers ``` #### Install the Hatchet SDK and required dependencies #### Instantiate your Hatchet Client You can now import the Hatchet Client Factory in any file that needs it. In the `workers` directory, create a Hatchet client ```go copy package main import ( "log" hatchet "github.com/hatchet-dev/hatchet/sdks/go" ) func main() { client, err := hatchet.NewClient() if err != nil { log.Fatalf("failed to create hatchet client: %v", err) } } ``` Continue to the next section to learn how to [create your first task](./your-first-task) --- McpUrl, CursorDeeplinkButton, CursorMcpConfig, ClaudeCodeCommand, CursorTabLabel, ClaudeCodeTabLabel, OtherAgentsTabLabel, } from "@/components/McpSetup"; # Install Docs MCP Hatchet documentation is optimized for LLMs and available as an **MCP (Model Context Protocol) server**, so AI coding assistants like Cursor and Claude Code can search and reference Hatchet docs directly. MCP endpoint: #### Hatchet CLI ```bash copy hatchet docs install claude-code ``` If `claude` is on your PATH, this runs the command automatically. Otherwise it prints it for you to copy. #### Command Run this command in your terminal: For any AI tool that supports [llms.txt](https://llmstxt.org/), Hatchet docs are available at: | Resource | URL | |----------|-----| | **llms.txt** (index) | [docs.hatchet.run/llms.txt](https://docs.hatchet.run/llms.txt) | | **llms-full.txt** (all docs) | [docs.hatchet.run/llms-full.txt](https://docs.hatchet.run/llms-full.txt) | | **Per-page markdown** | `docs.hatchet.run/llms/{section}/{page}.md` | | **MCP endpoint** | | Every documentation page also includes a `` header pointing to its markdown version, and a "View as Markdown" link at the top of the page. --- # Declaring Your First Task In Hatchet, the fundamental unit of invocable work is a [Task](#defining-a-task). Each task is an atomic function. As we continue to build with Hatchet, we'll add additional configuration options to compose tasks into [DAG workflows](./dags.mdx) or [procedural child spawning](./child-spawning.mdx). ## Defining a Task Start by declaring a task with a name. The task object can declare additional task-level configuration options which we'll cover later. 
The returned object is an instance of the `Task` class, which is the primary interface for interacting with the task (i.e. [running](./run-with-results.mdx), [enqueuing](./run-no-wait.mdx), [scheduling](./scheduled-runs.mdx), etc). #### Python ```python class SimpleInput(BaseModel): message: str class SimpleOutput(BaseModel): transformed_message: str # Declare the task to run @hatchet.task(name="first-task", input_validator=SimpleInput) def first_task(input: SimpleInput, ctx: Context) -> SimpleOutput: print("first-task task called") return SimpleOutput(transformed_message=input.message.lower()) ``` #### Typescript ```typescript import { hatchet } from '../hatchet-client'; // (optional) Define the input type for the workflow export type SimpleInput = { Message: string; }; export const simple = hatchet.task({ name: 'simple', retries: 3, fn: async (input: SimpleInput) => { return { TransformedMessage: input.Message.toLowerCase(), }; }, }); ``` #### Go ```go type SimpleInput struct { Message string `json:"message"` } type SimpleOutput struct { Result string `json:"result"` } task := client.NewStandaloneTask("process-message", func(ctx hatchet.Context, input SimpleInput) (SimpleOutput, error) { return SimpleOutput{ Result: "Processed: " + input.Message, }, nil }) ``` #### Ruby ```ruby FIRST_TASK = HATCHET.task(name: "first-task") do |input, ctx| puts "first-task called" { "transformed_message" => input["message"].downcase } end ``` ## Running a Task With your task defined, you can import it wherever you need to use it and invoke it with the `run` method. > **Warning:** NOTE: You must first [register the task on a worker](./workers.mdx) before you > can run it. Calling `your_task.run` will enqueue a task to be executed by a > worker but it will wait indefinitely for the task to be executed. #### Python ```python result = await first_task.aio_run(SimpleInput(message="Hello World!")) ``` #### Typescript ```typescript const res = await parent.run( { Message: 'HeLlO WoRlD', }, { additionalMetadata: { test: 'test', }, } ); // πŸ‘€ Access the results of the Task console.log(res.TransformedMessage); ``` #### Go ```go result, err := task.Run(context.Background(), SimpleInput{Message: "Hello, World!"}) if err != nil { return err } ``` #### Ruby ```ruby result = FIRST_TASK.run({ "message" => "Hello World!" }) puts "Finished running task: #{result['transformed_message']}" ``` There are many ways to run a task, including: - [Running a task with results](./run-with-results.mdx) - [Enqueuing a task](./run-no-wait.mdx) - [Scheduling a task](./scheduled-runs.mdx) - [Scheduling a task with a cron schedule](./cron-runs.mdx) - [Event-driven task execution](./run-on-event.mdx) Now that you have defined a complete task, you can move on to [creating a worker to execute the task](./workers.mdx). --- # Workers Workers are the backbone of Hatchet, responsible for executing the individual tasks. They operate across different nodes in your infrastructure, allowing for distributed and scalable task execution. ## How Workers Operate In Hatchet, workers are long-running processes that wait for instructions from the Hatchet engine to execute specific steps. They communicate with the Hatchet engine to receive tasks, execute them, and report back the results. ## Declaring a Worker Now that we have a [task declared](./your-first-task.mdx) we can create a worker that can execute the task. Declare a worker by calling the `worker` method on the Hatchet client. The `worker` method takes a name and an optional configuration object. 
#### Python ```python def main() -> None: worker = hatchet.worker("dag-worker", workflows=[dag_workflow]) worker.start() ``` > **Warning:** If you are using Windows, attempting to run a worker will result in an error: > > ``` > AttributeError: module 'signal' has no attribute 'SIGQUIT' > ``` > > However you can use the [Windows Subsystem for Linux (WSL)](https://learn.microsoft.com/en-us/windows/wsl/install) to run your workers. After > you install your Python environment (e.g. via `uv` or `poetry`) in WSL, you can then > run your workers inside that environment. You can still run client code (e.g. to > trigger task runs or query the API) in your native Windows environment, but your > workers have to be run in WSL. > > Another option is to run workers in Docker containers. #### Typescript ### Register the Worker ```typescript import { hatchet } from '../hatchet-client'; import { simple } from './workflow'; import { parent, child } from './workflow-with-child'; import { simpleWithZod } from './zod'; async function main() { const worker = await hatchet.worker('simple-worker', { // πŸ‘€ Declare the workflows that the worker can execute workflows: [simple, simpleWithZod, parent, child], // πŸ‘€ Declare the number of concurrent task runs the worker can accept slots: 100, }); await worker.start(); } if (require.main === module) { main(); } ``` ### Add an Entrypoint Script Add a script to your `package.json` to start the worker (changing the file path to the location of your worker file): ```json "scripts": { "start:worker": "ts-node src/v1/examples/simple/worker.ts" } ``` ### Run the Worker Start the worker by running the script you just added to your `package.json`: #### npm ```bash npm run start:worker ``` #### pnpm ```bash pnpm run start:worker ``` #### yarn ```bash yarn start:worker ``` #### Go ```go worker, err := client.NewWorker("simple-worker", hatchet.WithWorkflows(task)) if err != nil { log.Fatalf("failed to create worker: %v", err) } interruptCtx, cancel := cmdutils.NewInterruptContext() defer cancel() err = worker.StartBlocking(interruptCtx) if err != nil { log.Fatalf("failed to start worker: %v", err) } ``` Then start the worker by running: ```bash go run main.go ``` > **Info:** Note there are both `worker.Start` and `worker.StartBlocking` methods. The `StartBlocking` method will block the main thread until the worker is stopped, while the `Start` method will return immediately and you'll need to call `worker.Stop` to stop the worker. #### Ruby ### Add the Hatchet SDK to your Gemfile ```ruby gem "hatchet-sdk" ``` Then install with: ```bash bundle install ``` ### Register the Worker ```ruby def main worker = HATCHET.worker("dag-worker", workflows: [DAG_WORKFLOW]) worker.start end ``` ### Run the Worker Start the worker by running: ```bash bundle exec ruby worker.rb ``` And that's it! Once you run your script to start the worker, you'll see some logs like this, which tell you that your worker is running. > **Info:** For self-hosted users, you may need to set other gRPC configuration options to > ensure your worker can connect to the Hatchet engine. See the > [Self-Hosting](../self-hosting/worker-configuration-options.mdx) docs for more > information. ``` [DEBUG] πŸͺ“ -- 2025-03-24 15:11:32,755 - creating new event loop [INFO] πŸͺ“ -- 2025-03-24 15:11:32,755 - ------------------------------------------ [INFO] πŸͺ“ -- 2025-03-24 15:11:32,755 - STARTING HATCHET... 
[DEBUG] πŸͺ“ -- 2025-03-24 15:11:32,755 - worker runtime starting on PID: 26406
[DEBUG] πŸͺ“ -- 2025-03-24 15:11:32,758 - action listener starting on PID: 26434
[INFO]  πŸͺ“ -- 2025-03-24 15:11:32,760 - starting runner...
[DEBUG] πŸͺ“ -- 2025-03-24 15:11:32,761 - starting action listener health check...
[DEBUG] πŸͺ“ -- 2025-03-24 15:11:32,764 - 'test-worker' waiting for ['simpletask:step1']
[DEBUG] πŸͺ“ -- 2025-03-24 15:11:33,413 - starting action listener: test-worker
[DEBUG] πŸͺ“ -- 2025-03-24 15:11:33,542 - acquired action listener: efc4aaf2-be4a-4964-a578-db6465f9297e
[DEBUG] πŸͺ“ -- 2025-03-24 15:11:33,542 - sending heartbeat
[DEBUG] πŸͺ“ -- 2025-03-24 15:11:37,658 - sending heartbeat
```

> **Info:** Note that many of these logs are `debug` logs, which are only shown if the `debug` option on the Hatchet client is set to `True`.

## Understanding Slots

Slots are the number of concurrent _task_ runs that a worker can execute, and are configured using the `slots` option on the worker. For instance, if you set `slots=5` on your worker, then your worker will be able to run five tasks concurrently before new tasks need to wait in the queue to be picked up.

Increasing the number of `slots` on your worker, or the number of workers you run, will allow you to handle more concurrent work (and thus more throughput, in many cases). An important caveat is that slot-level concurrency is only helpful up to the point where the worker is not bottlenecked by another resource, such as CPU, memory, or network bandwidth. If your worker is bottlenecked by one of these resources, increasing the number of slots will not improve throughput.

## Best Practices for Managing Workers

To ensure a robust and efficient Hatchet implementation, consider the following best practices when managing your workers:

1. **Reliability**: Deploy workers in a stable environment with sufficient resources to avoid resource contention and ensure reliable execution.
2. **Monitoring and Logging**: Implement robust monitoring and logging mechanisms to track worker health, performance, and task execution status.
3. **Error Handling**: Design workers to handle errors gracefully, report execution failures to Hatchet, and retry tasks based on configured policies.
4. **Secure Communication**: Ensure secure communication between workers and the Hatchet engine, especially when distributed across different networks.
5. **Lifecycle Management**: Implement proper lifecycle management for workers, including automatic restarts on critical failures and graceful shutdown procedures.
6. **Scalability**: Plan for scalability by designing your system to easily add or remove workers based on demand, leveraging containerization, orchestration tools, or cloud auto-scaling features.
7. **Consistent Updates**: Keep worker implementations up to date with the latest Hatchet SDKs and ensure compatibility with the Hatchet engine version.

---

# Running Your First Task

With your task defined, you can import it wherever you need to use it and invoke it with the `run` method. A minimal end-to-end sketch in Python is shown below; the per-language examples that follow show the core call in each SDK.
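The sketch assumes the `first_task` task and its `SimpleInput` model from the previous section are importable from your project (the `src.workflows.first_task` path is illustrative) and that a worker with the task registered is already running:

```python
import asyncio

# Illustrative import path -- adjust to wherever your task and input model live
from src.workflows.first_task import SimpleInput, first_task


async def main() -> None:
    # Enqueues the task and waits for a worker to execute it and return the output
    result = await first_task.aio_run(SimpleInput(message="Hello World!"))
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```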
#### Python

```python
result = await first_task.aio_run(SimpleInput(message="Hello World!"))
```

#### Typescript

```typescript
const res = await parent.run(
  {
    Message: 'HeLlO WoRlD',
  },
  {
    additionalMetadata: {
      test: 'test',
    },
  }
);

// πŸ‘€ Access the results of the Task
console.log(res.TransformedMessage);
```

#### Go

```go
result, err := task.Run(context.Background(), SimpleInput{Message: "Hello, World!"})
if err != nil {
	return err
}
```

#### Ruby

```ruby
result = FIRST_TASK.run({ "message" => "Hello World!" })

puts "Finished running task: #{result['transformed_message']}"
```

There are many ways to run a task, including:

- [Running a task with results](./run-with-results.mdx)
- [Enqueuing a task](./run-no-wait.mdx)
- [Scheduling a task](./scheduled-runs.mdx)
- [Scheduling a task with a cron schedule](./cron-runs.mdx)

---

# Managing Environments with Hatchet

## Multiple Developers, One Orchestrator

When multiple developers share a single Hatchet orchestrator, conflicts can arise as workflow runs and events intermingle. Without proper isolation, one developer's workflows might interfere with another's testing or development work.

Hatchet provides two key solutions for managing this challenge: multi-tenancy and a local Hatchet instance.

### Solution 1: Multi-Tenancy

The easiest way to isolate environments for different developers or teams is to use Hatchet's multi-tenancy feature. Each tenant represents a separate environment with its own set of workflows and workers.

To add a new tenant for each developer, create an organization and follow these steps:

1. Access the organization dropdown in the dashboard (top right)
2. Select the `+` icon next to your organization's name
3. Generate a new token for that tenant
4. Each developer configures their environment with their designated tenant token

### Solution 2: Local Hatchet Instance

If you are using Hatchet locally, you can create a local instance of Hatchet to manage your isolated local development environment. Follow instructions [here](../self-hosting/hatchet-lite.mdx) to get started.

---

# Running Tasks

Once you have a running worker, you'll want to run your tasks. Hatchet provides a number of ways of triggering task runs, from which you should select the one(s) that best suit(s) your use case.

1. Tasks can be [run, and have their results waited on](./run-with-results.mdx).
2. Tasks can be [enqueued without waiting for their results ("fire and forget")](./run-no-wait.mdx).
3. Tasks can be run on [cron schedules](./cron-runs.mdx).
4. Tasks can be [triggered by events](./run-on-event.mdx).
5. Tasks can be [scheduled for a later time](./scheduled-runs.mdx).

Each of these methods for triggering tasks has its own uses in different scenarios, and the next few sections will give some examples of each. These methods can be invoked directly from the workflow definition, or from other services.

---

# Running with Results

> This example assumes we have a [task](./your-first-task.mdx) registered on a running [worker](./workers.mdx).

One method for running a task in Hatchet is to run it and wait for its result. Some example use cases for this type of task trigger include:

1. Fanout patterns, where a parent fans out work to a number of children, and wants to receive the results of those child tasks and make some decision based on them. For example, each child run might flip a coin, and the parent counts up how many heads there were and does something with that information.
2.
Waiting for long-running API calls to complete, such as if calling an LLM. For instance, if you had a part of your product that writes a poem for a user, your backend might run a `write_poem` task, which in turn calls an LLM, and then your backend would wait for that task to complete and return its result (the poem). #### Python You can use your `Task` object to run a task and wait for it to complete by calling the `run` method. This method will block until the task completes and return the result. ```python from examples.child.worker import SimpleInput, child_task child_task.run(SimpleInput(message="Hello, World!")) ``` You can also `await` the result of `aio_run`: ```python result = await child_task.aio_run(SimpleInput(message="Hello, World!")) ``` Note that the type of `input` here is a Pydantic model that matches the input schema of your workflow. #### Typescript You can use your `Task` object to run a task and wait for it to complete by calling the `run` method. This method will return a promise that resolves when the task completes and returns the result. ```typescript const res = await parent.run( { Message: 'HeLlO WoRlD', }, { additionalMetadata: { test: 'test', }, } ); // πŸ‘€ Access the results of the Task console.log(res.TransformedMessage); ``` #### Go You can use your `Task` object to run a task and wait for it to complete by calling the `Run` method. This method will block until the task completes and return the result. ```go result, err := task.Run(context.Background(), SimpleInput{Message: "Hello, World!"}) if err != nil { return err } ``` #### Ruby ```ruby result = CHILD_TASK_WF.run({ "message" => "Hello, World!" }) ``` ```ruby # In Ruby, run is synchronous result = CHILD_TASK_WF.run({ "message" => "Hello, World!" }) ``` ## Spawning Tasks from within a Task You can also spawn tasks from within a task. This is useful for composing tasks together to create more complex workflows, fanning out batched tasks, or creating conditional workflows. #### Python You can run a task from within a task by calling the `aio_run` method on the task object from within a task function. This will associate the runs in the dashboard for easier debugging. ```python @hatchet.task(name="SpawnTask") async def spawn(input: EmptyModel, ctx: Context) -> dict[str, Any]: # Simply run the task with the input we received result = await child_task.aio_run( input=SimpleInput(message="Hello, World!"), ) return {"results": result} ``` And that's it! The parent task will run and spawn the child task, and then will collect the results from its tasks. #### Typescript You can run a task from within a task by calling the `runChild` method on the `ctx` parameter of the task function. This will associate the runs in the dashboard for easier debugging. ```typescript const parentTask = hatchet.task({ name: 'parent', fn: async (input, ctx) => { // Simply the task and it will be spawned from the parent task const child = await simple.run({ Message: 'HeLlO WoRlD', }); return { result: child.TransformedMessage, }; }, }); ``` #### Go You can run a task from within a task by calling the `Run` method on the task object from within a task function. This will associate the runs in the dashboard for easier debugging. 
```go parent := workflow.NewTask("parent-task", func(ctx hatchet.Context, input SimpleInput) (*SimpleOutput, error) { // Run the child task _, err := task.Run(ctx, SimpleInput{Message: input.Message}) if err != nil { return nil, err } return &SimpleOutput{ Result: "Processed: " + input.Message, }, nil }) ``` #### Ruby ```ruby SPAWN_TASK = hatchet.task(name: "SpawnTask") do |input, ctx| result = CHILD_TASK_WF.run({ "message" => "Hello, World!" }) { "results" => result } end ``` ## Running Tasks in Parallel Sometimes you may want to run multiple tasks concurrently. Here's how to do that in each language: #### Python Since the `aio_run` method returns a coroutine, you can spawn multiple tasks in parallel and await using `asyncio.gather`. ```python result1 = child_task.aio_run(SimpleInput(message="Hello, World!")) result2 = child_task.aio_run(SimpleInput(message="Hello, Moon!")) # gather the results of the two tasks results = await asyncio.gather(result1, result2) # print the results of the two tasks print(results[0]["transformed_message"]) print(results[1]["transformed_message"]) ``` #### Typescript Since the `run` method returns a promise, you can spawn multiple tasks in parallel and await using `Promise.all`. ```typescript const res1 = simple.run({ Message: 'HeLlO WoRlD', }); const res2 = simple.run({ Message: 'Hello MoOn', }); const results = await Promise.all([res1, res2]); console.log(results[0].TransformedMessage); console.log(results[1].TransformedMessage); ``` #### Go You can run multiple tasks in parallel by calling `Run` multiple times in goroutines and using a `sync.WaitGroup` to wait for them to complete. ```go var results []string var resultsMutex sync.Mutex var errs []error var errsMutex sync.Mutex wg := sync.WaitGroup{} wg.Add(2) go func() { defer wg.Done() result, err := task.Run(context.Background(), SimpleInput{ Message: "Hello, World!", }) if err != nil { errsMutex.Lock() errs = append(errs, err) errsMutex.Unlock() return } resultsMutex.Lock() var resultOutput SimpleOutput err = result.Into(&resultOutput) if err != nil { return } results = append(results, resultOutput.Result) resultsMutex.Unlock() }() go func() { defer wg.Done() result, err := task.Run(context.Background(), SimpleInput{ Message: "Hello, Moon!", }) if err != nil { errsMutex.Lock() errs = append(errs, err) errsMutex.Unlock() return } resultsMutex.Lock() var resultOutput SimpleOutput err = result.Into(&resultOutput) if err != nil { return } results = append(results, resultOutput.Result) resultsMutex.Unlock() }() wg.Wait() ``` #### Ruby ```ruby results = CHILD_TASK_WF.run_many( [ CHILD_TASK_WF.create_bulk_run_item(input: { "message" => "Hello, World!" }), CHILD_TASK_WF.create_bulk_run_item(input: { "message" => "Hello, Moon!" }) ] ) puts results ``` > **Info:** While you can run multiple tasks in parallel using the `Run` method, this is > not recommended for large numbers of tasks. Instead, we recommend using [bulk > run methods](./bulk-run.mdx) for large parallel task execution. --- # Enqueuing a Task Run (Fire and Forget) > This example assumes we have a [task](./your-first-task.mdx) registered on a running [worker](./workers.mdx). Another method of triggering a task in Hatchet is to _enqueue_ the task without waiting for it to complete, sometimes known as "fire and forget". This pattern is useful for tasks that take a long time to complete or are not critical to the immediate operation of your application. Some example use cases for fire-and-forget style tasks might be: 1. 
Sending a shipping confirmation email to a user once their order has shipped. This is a truly async task, in the sense that the user is not necessarily using your application when it happens, and the part of the application triggering the task does not need to know the result of the work, just that it has been enqueued (assuming that it will complete, of course).
2. Triggering a machine learning model training job that can take minutes, hours, or even days to complete. As above, it's likely that no part of the application needs to wait on the result of this work; it just needs to "fire and forget" it, kicking it off and letting it complete whenever it completes.

#### Python

If we have the following workflow:

```python
class HelloInput(BaseModel):
    name: str

class HelloOutput(BaseModel):
    greeting: str

@hatchet.task(input_validator=HelloInput)
async def say_hello(input: HelloInput, ctx: Context) -> HelloOutput:
    return HelloOutput(greeting=f"Hello, {input.name}!")
```

You can use your `Workflow` object to run a task and "forget" it by calling the `run_no_wait` method. This method enqueues a task run and returns a `WorkflowRunRef`, a reference to that run, without waiting for the result.

```python
ref = say_hello.run_no_wait(input=HelloInput(name="World"))
```

You can also `await` the result of `aio_run_no_wait`:

```python
ref = await say_hello.aio_run_no_wait(input=HelloInput(name="Async World"))
```

Note that the type of `input` here is a Pydantic model that matches the input schema of your task.

#### Typescript

You can use your `Workflow` object to run a workflow and "forget" it by calling the `runNoWait` method. This method enqueues a workflow run and returns a `WorkflowRunRef`, a reference to that run, without waiting for the result.

```typescript
import { simple } from './workflow';
// ...
async function main() {
  // πŸ‘€ Enqueue the workflow
  const run = await simple.runNoWait({
    Message: 'hello',
  });

  // πŸ‘€ Get the run ID of the workflow
  const runId = await run.getWorkflowRunId();

  // It may be helpful to store the run ID of the workflow
  // in a database or other persistent storage for later use
  console.log(runId);
```

#### Go

You can use your `Workflow` object to run a workflow and "forget" it by calling the `RunNoWait` method. This method enqueues a workflow run and returns a `WorkflowRunRef`, a reference to that run, without waiting for the result.

```go
runRef, err := task.RunNoWait(context.Background(), SimpleInput{Message: "Hello, World!"})
if err != nil {
	return err
}

fmt.Println(runRef.RunId)
```

#### Ruby

```ruby
SAY_HELLO = hatchet.task(name: "say_hello") do |input, ctx|
  { "greeting" => "Hello, #{input['name']}!" }
end
```

```ruby
# In Ruby, run_no_wait is the equivalent of async enqueuing
ref = SAY_HELLO.run_no_wait({ "name" => "World" })
```

## Subscribing to results from an enqueued task

Often it is useful to subscribe to the results of a task at a later time. The `run_no_wait` method returns a `WorkflowRunRef` object which includes a listener for the result of the task.

#### Python

Use `ref.result()` to block until the result is available:

```python
result = ref.result()
```

or await `aio_result`:

```python
result = await ref.aio_result()
```

#### Typescript

The `WorkflowRunRef` returned by `runNoWait` includes a listener for the result of the run. You can await its `result()` method, or subscribe at a later time by reconstructing a reference from a stored run ID:

```typescript
// the return object of the enqueue method is a WorkflowRunRef which includes a listener for the result of the workflow
const result = await run.result();
console.log(result);

// if you need to subscribe to the result of the workflow at a later time, you can use the runRef method and the stored runId
const ref = hatchet.runRef(runId);
const result2 = await ref.result();
console.log(result2);
```

#### Go

The `WorkflowRunRef` returned by `RunNoWait` lets you wait for the result and decode the output of a specific task:

```go
result, err := runRef.Result()
if err != nil {
	return err
}

var resultOutput SimpleOutput
err = result.TaskOutput("process-message").Into(&resultOutput)
if err != nil {
	return err
}

fmt.Println(resultOutput.Result)
```

#### Ruby

```ruby
# In Ruby, result is synchronous - use poll for async-like behavior
result = ref.result
```

## Triggering Runs in the Hatchet Dashboard

In the Hatchet Dashboard, you can trigger and view runs for your tasks. Navigate to "Task Runs" in the left sidebar and click "Trigger Run" at the top right.

You can specify run parameters such as Input, Additional Metadata, and the Scheduled Time.

![Create Scheduled Run](../../public/schedule-dash.gif)

---

# Scheduled Runs

> This example assumes we have a [task](./your-first-task.mdx) registered on a running [worker](./workers.mdx).

Scheduled runs allow you to trigger a task at a specific time in the future. Some example use cases of scheduling runs might include:

- Sending a reminder email at a specific time after a user took an action.
- Running a one-time maintenance task at a predetermined time. For instance, you might want to run a database vacuum during a maintenance window whenever a task meets certain criteria.
- Allowing a customer to decide when they want your application to perform a specific task. For instance, if your application is a simple alarm app that sends a customer a notification at a time that they specify, you might create a scheduled run for each alarm that the customer sets.

Hatchet supports defining scheduled runs in a few different ways:

- [Programmatically](./scheduled-runs.mdx#programmatically-creating-scheduled-runs): Use the Hatchet SDKs to dynamically set the schedule of a task.
- [Hatchet Dashboard](./scheduled-runs.mdx#managing-scheduled-runs-in-the-hatchet-dashboard): Manually create scheduled runs from the Hatchet Dashboard.

> **Warning:** The scheduled time is when Hatchet **enqueues** the task, not when the run
> starts. Scheduling constraints like concurrency limits, rate limits, and retry
> policies can affect run start times.

## Programmatically Creating Scheduled Runs

### Create a Scheduled Run

You can create dynamic scheduled runs programmatically via the API to run tasks at a specific time in the future.
Here's an example of creating a scheduled run to trigger a task tomorrow at noon: #### Python ```python from datetime import datetime from examples.simple.worker import simple schedule = simple.schedule(datetime(2025, 3, 14, 15, 9, 26)) ## πŸ‘€ do something with the id print(schedule.id) ``` #### Typescript ```typescript const runAt = new Date(new Date().setHours(12, 0, 0, 0) + 24 * 60 * 60 * 1000); const scheduled = await simple.schedule(runAt, { Message: 'hello', }); // πŸ‘€ Get the scheduled run ID of the workflow // it may be helpful to store the scheduled run ID of the workflow // in a database or other persistent storage for later use const scheduledRunId = scheduled.metadata.id; console.log(scheduledRunId); ``` #### Go ```go scheduledRun, err := client.Schedules().Create( context.Background(), "scheduled", features.CreateScheduledRunTrigger{ TriggerAt: time.Now().Add(1 * time.Minute), Input: map[string]interface{}{"message": "Hello, World!"}, }, ) if err != nil { log.Fatalf("failed to create scheduled run: %v", err) } ``` #### Ruby ```ruby schedule = SIMPLE.schedule(Time.now + 86_400, input: { "message" => "Hello, World!" }) ## do something with the id puts schedule.metadata.id ``` In this example you can have different scheduled times for different customers, or dynamically set the scheduled time based on some other business logic. When creating a scheduled run via the API, you will receive a scheduled run object with a metadata property containing the id of the scheduled run. This id can be used to reference the scheduled run when deleting the scheduled run and is often stored in a database or other persistence layer. > **Info:** Note: Be mindful of the time zone of the scheduled run. Scheduled runs are > **always** stored and returned in UTC. ### Deleting a Scheduled Run You can delete a scheduled run by calling the `delete` method on the scheduled client. #### Python ```python hatchet.scheduled.delete(scheduled_id=scheduled_run.metadata.id) ``` #### Typescript ```typescript await hatchet.scheduled.delete(scheduled); ``` #### Go ```go err = client.Schedules().Delete( context.Background(), scheduledRun.Metadata.Id, ) if err != nil { log.Fatalf("failed to delete scheduled run: %v", err) } ``` #### Ruby ```ruby hatchet.scheduled.delete(scheduled_run.metadata.id) ``` ### Listing Scheduled Runs You can list all scheduled runs for a task by calling the `list` method on the scheduled client. #### Python ```python scheduled_runs = hatchet.scheduled.list() ``` #### Typescript ```typescript const scheduledRuns = await hatchet.scheduled.list({ workflow: simple, }); console.log(scheduledRuns); ``` #### Go ```go scheduledRuns, err := client.Schedules().List( context.Background(), rest.WorkflowScheduledListParams{}, ) if err != nil { log.Fatalf("failed to list scheduled runs: %v", err) } ``` #### Ruby ```ruby scheduled_runs = hatchet.scheduled.list ``` ### Rescheduling a Scheduled Run If you need to change the trigger time for an existing scheduled run, you can reschedule it by updating its `triggerAt`. 
#### Python ```python hatchet.scheduled.update( scheduled_id=scheduled_run.metadata.id, trigger_at=datetime.now(tz=timezone.utc) + timedelta(hours=1), ) ``` #### Typescript ```typescript await hatchet.scheduled.update(scheduledRunId, { triggerAt: new Date(Date.now() + 60 * 60 * 1000), }); ``` #### Ruby ```ruby hatchet.scheduled.update( scheduled_run.metadata.id, trigger_at: Time.now + 3600 ) ``` > **Warning:** You can only reschedule scheduled runs created via the API (not runs created > via a code-defined schedule), and Hatchet may reject rescheduling if the run > has already triggered. ### Bulk operations (delete / reschedule) Hatchet supports bulk operations for scheduled runs. You can bulk delete scheduled runs, and you can bulk reschedule scheduled runs by providing a list of updates. #### Python ```python hatchet.scheduled.bulk_delete(scheduled_ids=[id]) hatchet.scheduled.bulk_delete( workflow_id="workflow_id", statuses=[ScheduledRunStatus.SCHEDULED], additional_metadata={"customer_id": "customer-a"}, ) ``` ```python hatchet.scheduled.bulk_update( [ (id, datetime.now(tz=timezone.utc) + timedelta(hours=2)), ] ) ``` #### Typescript ```typescript await hatchet.scheduled.bulkDelete({ scheduledRuns: [scheduledRunId], }); ``` ```typescript await hatchet.scheduled.bulkUpdate([ { scheduledRun: scheduledRunId, triggerAt: new Date(Date.now() + 2 * 60 * 60 * 1000) }, ]); ``` #### Ruby ```ruby hatchet.scheduled.bulk_delete(scheduled_ids: [id]) ``` ```ruby hatchet.scheduled.bulk_update( [[id, Time.now + 7200]] ) ``` ## Managing Scheduled Runs in the Hatchet Dashboard In the Hatchet Dashboard, you can view and manage scheduled runs for your tasks. Navigate to "Triggers" > "Scheduled Runs" in the left sidebar and click "Create Scheduled Run" at the top right. You can specify run parameters such as Input, Additional Metadata, and the Scheduled Time. ![Create Scheduled Run](../../public/schedule-dash.gif) You can also manage existing scheduled runs: - **Single-run actions**: Use the per-row actions menu to **Reschedule** or **Delete** an individual scheduled run. - **Bulk actions**: Use the **Actions** menu to bulk **Delete** or **Reschedule** either: - The selected rows, or - All rows matching the current filters (including β€œall” if no filters are set). > **Info:** In the dashboard, reschedule/delete actions may be disabled for runs that were > created via a code-defined schedule, and rescheduling may be disabled for runs > that have already triggered. ## Scheduled Run Considerations When using scheduled runs, there are a few considerations to keep in mind: 1. **Time Zone**: Scheduled runs are stored and returned in UTC. Make sure to consider the time zone when defining your scheduled time. 2. **Execution Time**: The actual execution time of a scheduled run may vary slightly from the scheduled time. Hatchet makes a best-effort attempt to enqueue the task as close to the scheduled time as possible, but there may be slight delays due to system load or other factors. 3. **Missed Schedules**: If a scheduled task is missed (e.g., due to system downtime), Hatchet will not automatically run the missed instances when the service comes back online. 4. **Overlapping Schedules**: If a task is still running when a second scheduled run is scheduled to start, Hatchet will start a new instance of the task or respect [concurrency](./concurrency.mdx) policy. --- # Recurring Runs with Cron > This example assumes we have a [task](./your-first-task.mdx) registered on a running [worker](./workers.mdx). 
A [Cron](https://en.wikipedia.org/wiki/Cron) is a time-based job scheduler that allows you to define when a task should be executed automatically on a pre-determined schedule. Some example use cases for cron-style tasks might include:

1. Running a daily report at a specific time.
2. Sending weekly digest emails to users about their activity from the past week.
3. Running a monthly billing process to generate invoices for customers.

Hatchet supports cron triggers, which can be defined in a few different ways:

- [Task Definitions](./cron-runs.mdx#defining-a-cron-in-your-task-definition): Define a cron expression in your task definition to trigger the task on a predefined schedule.
- [Programmatically (Dynamic)](./cron-runs.mdx#programmatically-creating-cron-triggers): Use the Hatchet SDKs to dynamically set the cron schedule of a task.
- [Hatchet Dashboard](./cron-runs.mdx#managing-cron-jobs-in-the-hatchet-dashboard): Manually create cron triggers from the Hatchet Dashboard.

> **Warning:** The cron expression determines when Hatchet **enqueues** the task, not when the run
> starts. Scheduling constraints like concurrency limits, rate limits, and retry
> policies can affect run start times.

### Cron Expression Syntax

Cron expressions in Hatchet follow the standard cron syntax. A cron expression consists of five fields separated by spaces:

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ minute (0 - 59)
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ hour (0 - 23)
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ day of the month (1 - 31)
β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ month (1 - 12)
β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ day of the week (0 - 6) (Sunday to Saturday)
* * * * *
```

Each field can contain a specific value, an asterisk (`*`) to represent all possible values, or a range of values.
Here are some examples of cron expressions: - `0 0 * * *`: Run every day at midnight - `*/15 * * * *`: Run every 15 minutes - `0 9 * * 1`: Run every Monday at 9 AM - `0 0 1 * *`: Run on the first day of every month at midnight ## Defining a Cron in Your Task Definition You can define a task with a cron schedule by configuring the cron expression as part of the task definition: #### Python-Sync ```python # Adding a cron trigger to a workflow is as simple # as adding a `cron expression` to the `on_cron` # prop of the workflow definition cron_workflow = hatchet.workflow(name="CronWorkflow", on_crons=["* * * * *"]) @cron_workflow.task() def step1(input: EmptyModel, ctx: Context) -> dict[str, str]: return { "time": "step1", } ``` #### Python-Async ```python # Adding a cron trigger to a workflow is as simple # as adding a `cron expression` to the `on_cron` # prop of the workflow definition cron_workflow = hatchet.workflow(name="CronWorkflow", on_crons=["* * * * *"]) @cron_workflow.task() def step1(input: EmptyModel, ctx: Context) -> dict[str, str]: return { "time": "step1", } ``` #### Typescript ```typescript export const onCron = hatchet.workflow({ name: 'on-cron-workflow', on: { // πŸ‘€ add a cron expression to run the workflow every 15 minutes cron: '*/15 * * * *', }, }); ``` #### Go ```go dailyCleanup := client.NewStandaloneTask("cleanup-temp-files", func(ctx hatchet.Context, input CronInput) (CronOutput, error) { log.Printf("Running daily cleanup at %s", input.Timestamp) time.Sleep(2 * time.Second) return CronOutput{ JobName: "daily-cleanup", ExecutedAt: time.Now().Format(time.RFC3339), NextRun: "Next run: tomorrow at 2 AM", }, nil }, hatchet.WithWorkflowCron("0 2 * * *"), hatchet.WithWorkflowCronInput(CronInput{ Timestamp: time.Now().Format(time.RFC3339), }), hatchet.WithWorkflowDescription("Daily cleanup and maintenance tasks"), ) ``` #### Ruby ```ruby CRON_WORKFLOW = HATCHET.workflow( name: "CronWorkflow", on_crons: ["*/5 * * * *"] ) CRON_WORKFLOW.task(:cron_task) do |input, ctx| puts "Cron task executed at #{Time.now}" { "status" => "success" } end ``` In the examples above, we set the `on cron` property of the task. The property specifies the cron expression that determines when the task should be triggered. Note: When modifying a cron in your task definition, it will override any cron schedule for previous crons defined in previous task definitions, but crons created via the API or Dashboard will still be respected. ## Programmatically Creating Cron Triggers ### Create a Cron Trigger You can create dynamic cron triggers programmatically via the API. 
This is useful if you want to create a cron trigger that is not known at the time of task definition. Here's an example of creating a cron to trigger a report for a specific customer every day at noon:

#### Python-Sync

```python
cron_trigger = dynamic_cron_workflow.create_cron(
    cron_name="customer-a-daily-report",
    expression="0 12 * * *",
    input=DynamicCronInput(name="John Doe"),
    additional_metadata={
        "customer_id": "customer-a",
    },
)

id = cron_trigger.metadata.id  # the id of the cron trigger
```

#### Python-Async

```python
cron_trigger = await dynamic_cron_workflow.aio_create_cron(
    cron_name="customer-a-daily-report",
    expression="0 12 * * *",
    input=DynamicCronInput(name="John Doe"),
    additional_metadata={
        "customer_id": "customer-a",
    },
)

cron_trigger.metadata.id  # the id of the cron trigger
```

#### Typescript

```typescript
const cron = await simple.cron('simple-daily', '0 0 * * *', {
  Message: 'hello',
});

// it may be useful to save the cron id for later
const cronId = cron.metadata.id;
```

#### Go

```go
createdCron, err := client.Crons().Create(context.Background(), "cleanup-temp-files", features.CreateCronTrigger{
    Name:       "daily-cleanup",
    Expression: "0 0 * * *",
    Input: map[string]interface{}{
        "timestamp": time.Now().Format(time.RFC3339),
    },
    AdditionalMetadata: map[string]interface{}{
        "description": "Daily cleanup and maintenance tasks",
    },
})
if err != nil {
    return err
}
```

#### Ruby

```ruby
cron_trigger = dynamic_cron_workflow.create_cron(
  "customer-a-daily-report",
  "0 12 * * *",
  input: { "name" => "John Doe" }
)

id = cron_trigger.metadata.id
```

In this example you can have different expressions for different customers, or dynamically set the expression based on some other business logic.

When creating a cron via the API, you will receive a cron trigger object with a metadata property containing the id of the cron trigger. This id can be used to reference the cron trigger later (for example, when deleting it) and is often stored in a database or other persistence layer.

Note: Cron Name and Expression are required fields when creating a cron trigger, and we enforce a unique constraint on the pair.

### Delete a Cron Trigger

You can delete a cron trigger by passing the cron object or a cron trigger id to the delete method.

#### Python-Sync

```python
hatchet.cron.delete(cron_id=cron_trigger.metadata.id)
```

#### Python-Async

```python
await hatchet.cron.aio_delete(cron_id=cron_trigger.metadata.id)
```

#### Typescript

```typescript
await hatchet.crons.delete(cronId);
```

#### Go

```go
err = client.Crons().Delete(context.Background(), createdCron.Metadata.Id)
if err != nil {
    return err
}
```

#### Ruby

```ruby
hatchet.cron.delete(cron_trigger.metadata.id)
```

Note: Deleting a cron trigger will not cancel any currently running instances of the task. It will simply stop the cron trigger from triggering the task again.

### List Cron Triggers

Retrieves a list of all task cron triggers matching the criteria.
#### Python-Sync ```python cron_triggers = hatchet.cron.list() ``` #### Python-Async ```python await hatchet.cron.aio_list() ``` #### Typescript ```typescript const crons = await hatchet.crons.list({ workflow: simple, }); ``` #### Go ```go cronList, err := client.Crons().List(context.Background(), rest.CronWorkflowListParams{ AdditionalMetadata: &[]string{"description:Daily cleanup and maintenance tasks"}, }) if err != nil { return err } ``` #### Ruby ```ruby cron_triggers = hatchet.cron.list ``` ## Managing Cron Triggers in the Hatchet Dashboard In the Hatchet Dashboard, you can view and manage cron triggers for your tasks. Navigate to "Triggers" > "Cron Jobs" in the left sidebar and click "Create Cron Job" at the top right. You can specify run parameters such as Input, Additional Metadata, and the Expression. ![Create Cron Job](../../public/cron-dash.gif) ## Cron Considerations When using cron triggers, there are a few considerations to keep in mind: 1. **Time Zone**: Cron schedules are UTC. Make sure to consider the time zone when defining your cron expressions. 2. **Execution Time**: The actual execution time of a cron-triggered task may vary slightly from the scheduled time. Hatchet makes a best-effort attempt to enqueue the task as close to the scheduled time as possible, but there may be slight delays due to system load or other factors. 3. **Missed Schedules**: If a scheduled task is missed (e.g., due to system downtime), Hatchet will **not** automatically run the missed instances. It will wait for the next scheduled time to trigger the task. 4. **Overlapping Schedules**: If a task is still running when the next scheduled time arrives, Hatchet will start a new instance of the task or respect the [concurrency](./concurrency.mdx) policy. --- # Run on Event > This example assumes we have a [task](./your-first-task.mdx) registered on a running [worker](./workers.mdx). Run-on-event allows you to trigger one or more tasks when a specific event occurs. This is useful when you need to execute a task in response to an ephemeral event where the result is not important. A few common use cases for event-triggered task runs are: 1. Running a task when an ephemeral event is received, such as a webhook or a message from a queue. 2. When you want to run multiple independent tasks in response to a single event. For instance, if you wanted to run a `send_welcome_email` task, and you also wanted to run a `grant_new_user_credits` task, and a `reward_referral` task, all triggered by the signup. In this case, you might declare all three of those tasks with an event trigger for `user:signup`, and then have them all kick off when that event happens. > **Info:** Event triggers evaluate tasks to run at the time of the event. If an event is > received before the task is registered, the task will not be run. ## Declaring Event Triggers To run a task on an event, you need to declare the event that will trigger the task. This is done by declaring the `on_events` property in the task declaration. 
#### Python

```python
EVENT_KEY = "user:create"
SECONDARY_KEY = "foobarbaz"
WILDCARD_KEY = "subscription:*"


class EventWorkflowInput(BaseModel):
    should_skip: bool


event_workflow = hatchet.workflow(
    name="EventWorkflow",
    on_events=[EVENT_KEY, SECONDARY_KEY, WILDCARD_KEY],
    input_validator=EventWorkflowInput,
)
```

#### Typescript

```typescript
export const lower = hatchet.workflow({
  name: 'lower',
  // πŸ‘€ Declare the event that will trigger the workflow
  onEvents: ['simple-event:create'],
});
```

#### Go

```go
const SimpleEvent = "simple-event:create"

func Lower(client *hatchet.Client) *hatchet.StandaloneTask {
    return client.NewStandaloneTask(
        "lower",
        func(ctx hatchet.Context, input EventInput) (*LowerTaskOutput, error) {
            return &LowerTaskOutput{
                TransformedMessage: strings.ToLower(input.Message),
            }, nil
        },
        hatchet.WithWorkflowEvents(SimpleEvent),
    )
}
```

#### Ruby

```ruby
EVENT_KEY = "user:create"
SECONDARY_KEY = "foobarbaz"
WILDCARD_KEY = "subscription:*"

EVENT_WORKFLOW = HATCHET.workflow(
  name: "EventWorkflow",
  on_events: [EVENT_KEY, SECONDARY_KEY, WILDCARD_KEY]
)
```

> **Info:** Note: Multiple tasks can be triggered by the same event.

> **Info:** As of engine version 0.65.0, Hatchet supports wildcard event triggers using
> the `*` wildcard pattern. For example, you could register `subscription:*` as
> your event key, which would match incoming events like `subscription:create`,
> `subscription:renew`, `subscription:cancel`, and so on.

### Pushing an Event

You can push an event to the event queue by calling the `push` method on the Hatchet event client and providing the event name and payload.

#### Python

```python
hatchet.event.push("user:create", {"should_skip": False})
```

#### Typescript

```typescript
const res = await hatchet.events.push('simple-event:create', {
  Message: 'hello',
  ShouldSkip: false,
});
```

#### Go

```go
err := client.Events().Push(
    context.Background(),
    "simple-event:create",
    EventInput{
        Message: "Hello, World!",
    },
)
if err != nil {
    return err
}
```

#### Ruby

```ruby
HATCHET.event.push("user:create", { "should_skip" => false })
```

## Event Filtering

Events can also be _filtered_ in Hatchet, which allows you to push events to Hatchet and only trigger task runs from them in certain cases. **If you enable filters on a workflow, your workflow will be triggered once for each matching filter on any incoming event with a matching scope** (more on scopes below).

### Basic Usage

There are two ways to create filters in Hatchet.

#### Default filters on the workflow

The simplest way to create a filter is to register it declaratively with your workflow when it's created.
For example: #### Python ```python event_workflow_with_filter = hatchet.workflow( name="EventWorkflow", on_events=[EVENT_KEY, SECONDARY_KEY, WILDCARD_KEY], input_validator=EventWorkflowInput, default_filters=[ DefaultFilter( expression="true", scope="example-scope", payload={ "main_character": "Anna", "supporting_character": "Stiva", "location": "Moscow", }, ) ], ) ``` #### Typescript ```typescript export const lowerWithFilter = hatchet.workflow({ name: 'lower', // πŸ‘€ Declare the event that will trigger the workflow onEvents: ['simple-event:create'], defaultFilters: [ { expression: 'true', scope: 'example-scope', payload: { mainCharacter: 'Anna', supportingCharacter: 'Stiva', location: 'Moscow', }, }, ], }); ``` #### Go ```go func LowerWithFilter(client *hatchet.Client) *hatchet.StandaloneTask { return client.NewStandaloneTask( "lower", accessFilterPayload, hatchet.WithWorkflowEvents(SimpleEvent), hatchet.WithFilters(types.DefaultFilter{ Expression: "true", Scope: "example-scope", Payload: map[string]interface{}{ "main_character": "Anna", "supporting_character": "Stiva", "location": "Moscow"}, }), ) } ``` #### Ruby ```ruby EVENT_WORKFLOW_WITH_FILTER = HATCHET.workflow( name: "EventWorkflow", on_events: [EVENT_KEY, SECONDARY_KEY, WILDCARD_KEY], default_filters: [ Hatchet::DefaultFilter.new( expression: "true", scope: "example-scope", payload: { "main_character" => "Anna", "supporting_character" => "Stiva", "location" => "Moscow" } ) ] ) EVENT_WORKFLOW.task(:task) do |input, ctx| puts "event received" ctx.filter_payload end ``` In each of these cases, we register a filter with the workflow. Note that these "declarative" filters are overwritten each time your workflow is updated, so the ids associated with them will not be stable over time. This allows you to modify a filter in-place or remove a filter, and not need to manually delete it over the API. #### Filters feature client You also can create event filters by using the `filters` clients on the SDKs: #### Python ```python hatchet.filters.create( workflow_id=event_workflow.id, expression="input.should_skip == false", scope="foobarbaz", payload={ "main_character": "Anna", "supporting_character": "Stiva", "location": "Moscow", }, ) ``` #### Typescript ```typescript hatchet.filters.create({ workflowId: lower.id, expression: 'input.ShouldSkip == false', scope: 'foobarbaz', payload: { main_character: 'Anna', supporting_character: 'Stiva', location: 'Moscow', }, }); ``` #### Go ```go _, err = client.Filters().Create( context.Background(), rest.V1CreateFilterRequest{ WorkflowId: uuid.MustParse("bb866b59-5a86-451b-8023-10d451db11d3"), Expression: "true", Scope: "example-scope", }, ) if err != nil { return err } ``` #### Ruby ```ruby HATCHET_CLIENT.filters.create( workflow_id: EVENT_WORKFLOW.id, expression: "input.should_skip == false", scope: "foobarbaz", payload: { "main_character" => "Anna", "supporting_character" => "Stiva", "location" => "Moscow" } ) ``` > **Warning:** Note the `scope` argument to the filter. When you create a filter, it must be > given a `scope` which will be used by Hatchet internally to look it up. When > you push events that you want filtered, you **must provide a `scope` with > those events that matches the scope sent with the filter**. If you do not, the > filter will not apply. Then, push an event that uses the filter to determine whether or not to run. 
For instance, this run will be skipped, since the payload does not match the expression: #### Python ```python hatchet.event.push( event_key=EVENT_KEY, payload={ "should_skip": True, }, options=PushEventOptions( scope="foobarbaz", ), ) ``` #### Typescript ```typescript hatchet.events.push( SIMPLE_EVENT, { Message: 'hello', ShouldSkip: true, }, { scope: 'foobarbaz', } ); ``` #### Go ```go skipPayload := map[string]interface{}{ "shouldSkip": true, } skipScope := "foobarbaz" err = client.Events().Push( context.Background(), "simple-event:create", skipPayload, v0Client.WithFilterScope(&skipScope), ) if err != nil { return err } ``` #### Ruby ```ruby HATCHET_CLIENT.event.push( EVENT_KEY, { "should_skip" => true }, scope: "foobarbaz" ) ``` But this one will be triggered since the payload _does_ match the expression: #### Python ```python hatchet.event.push( event_key=EVENT_KEY, payload={ "should_skip": False, }, options=PushEventOptions( scope="foobarbaz", ), ) ``` #### Typescript ```typescript hatchet.events.push( SIMPLE_EVENT, { Message: 'hello', ShouldSkip: false, }, { scope: 'foobarbaz', } ); ``` #### Go ```go triggerPayload := map[string]interface{}{ "shouldSkip": false, } triggerScope := "foobarbaz" err = client.Events().Push( context.Background(), "simple-event:create", triggerPayload, v0Client.WithFilterScope(&triggerScope), ) if err != nil { return err } ``` #### Ruby ```ruby HATCHET_CLIENT.event.push( EVENT_KEY, { "should_skip" => false }, scope: "foobarbaz" ) ``` > **Info:** In Hatchet, filters are "positive", meaning that we look for _matches_ to the > filter to determine which tasks to trigger. ### Accessing the filter payload You can access the filter payload by using the `Context` in the task that was triggered by your event: #### Python ```python @event_workflow_with_filter.task() def filtered_task(input: EventWorkflowInput, ctx: Context) -> None: print(ctx.filter_payload) ``` #### Typescript ```typescript lowerWithFilter.task({ name: 'lowerWithFilter', fn: (input, ctx) => { console.log(ctx.filterPayload()); }, }); ``` #### Go ```go func accessFilterPayload(ctx hatchet.Context, input EventInput) (*LowerTaskOutput, error) { fmt.Println(ctx.FilterPayload()) return &LowerTaskOutput{ TransformedMessage: strings.ToLower(input.Message), }, nil } ``` #### Ruby ```ruby EVENT_WORKFLOW_WITH_FILTER.task(:filtered_task) do |input, ctx| puts ctx.filter_payload.inspect end ``` ### Advanced Usage In addition to referencing `input` in the expression (which corresponds to the _event_ payload), you can also reference the following fields: 1. `payload` corresponds to the _filter_ payload (which was part of the request when the filter was created). 2. `additional_metadata` allows for filtering based on `additional_metadata` sent with the event. 3. `event_key` allows for filtering based on the key of the event, such as `user:created`. --- # Bulk Run Many Tasks Often you may want to run a task multiple times with different inputs. There is significant overhead (i.e. network roundtrips) to write the task, so if you're running multiple tasks, it's best to use the bulk run methods. #### Python You can use the `aio_run_many` method to bulk run a task. This will return a list of results. 
```python greetings = ["Hello, World!", "Hello, Moon!", "Hello, Mars!"] results = await child_task.aio_run_many( [ # run each greeting as a task in parallel child_task.create_bulk_run_item( input=SimpleInput(message=greeting), ) for greeting in greetings ] ) # this will await all results and return a list of results print(results) ``` > **Info:** `Workflow.create_bulk_run_item` is a typed helper to create the inputs for > each task. There are additional bulk methods available on the `Workflow` object. - `aio_run_many` - `aio_run_many_no_wait` And blocking variants: - `run_many` - `run_many_no_wait` As with the run methods, you can call bulk methods from within a task and the runs will be associated with the parent task in the dashboard. #### Typescript You can use the `run` method directly to bulk run tasks by passing an array of inputs. This will return a list of results. ```typescript const res = await simple.run([ { Message: 'HeLlO WoRlD', }, { Message: 'Hello MoOn', }, ]); // πŸ‘€ Access the results of the Task console.log(res[0].TransformedMessage); console.log(res[1].TransformedMessage); ``` There are additional bulk methods available on the `Task` object. - `run` - `runNoWait` As with the run methods, you can call bulk methods on the task fn context parameter within a task and the runs will be associated with the parent task in the dashboard. ```typescript const parent = hatchet.task({ name: 'simple', fn: async (input: SimpleInput, ctx) => { // Bulk run two tasks in parallel const child = await ctx.bulkRunChildren([ { workflow: simple, input: { Message: 'Hello, World!', }, }, { workflow: simple, input: { Message: 'Hello, Moon!', }, }, ]); return { TransformedMessage: `${child[0].TransformedMessage} ${child[1].TransformedMessage}`, }; }, }); ``` Available bulk methods on the `Context` object are: - `bulkRunChildren` - `bulkRunChildrenNoWait` #### Go You can use the `RunMany` method directly on the `Workflow` or `StandaloneTask` instance to bulk run tasks by passing an array of inputs. This will return a list of run IDs. ```go // Prepare inputs as []RunManyOpt for bulk run inputs := make([]hatchet.RunManyOpt, len(bulkInputs)) for i, input := range bulkInputs { inputs[i] = hatchet.RunManyOpt{ Input: input, } } // Run workflows in bulk ctx := context.Background() runRefs, err := workflow.RunMany(ctx, inputs) if err != nil { log.Fatalf("failed to run bulk workflows: %v", err) } ``` Additional bulk methods are coming soon for the Go SDK. Join our [Discord](https://hatchet.run/discord) to stay up to date. #### Ruby ```ruby greetings = ["Hello, World!", "Hello, Moon!", "Hello, Mars!"] results = CHILD_TASK_WF.run_many( greetings.map do |greeting| CHILD_TASK_WF.create_bulk_run_item( input: { "message" => greeting } ) end ) puts results ``` --- # Webhooks This feature is currently in development and might change. Reach out for feedback or if you encounter any problems registering any external webhooks. Webhooks allow external systems to trigger Hatchet workflows by sending HTTP requests to dedicated endpoints. This enables real-time integration with third-party services like GitHub, Stripe, Slack, or any system that can send webhook events. ## Creating a webhook To create a webhook, you'll need to fill out some fields that tell Hatchet how to determine which workflows to trigger from your webhook, and how to validate it when it arrives from the sender. 
In particular, you'll need to provide the following fields: #### Name The **Webhook Name** is tenant-unique (meaning a single tenant can only use each name once), and is used to create the URL for where the incoming webhook request should be sent. For instance, if your tenant id was `d60181b7-da6c-4d4c-92ec-8aa0fc74b3e5` and your webhook name was `my-webhook`, then the URL might look like `https://cloud.onhatchet.run/api/v1/stable/tenants/d60181b7-da6c-4d4c-92ec-8aa0fc74b3e5/webhooks/my-webhook`. Note that you can copy this URL in the dashboard. #### Source The **Source** indicates the source of the webhook, which can be a pre-provided one for easy setup like Stripe or Github, or a "generic" one, which lets you configure all of the necessary fields for your webhook integration based on what the webhook sender provides. #### Event Key Expression The **Event Key Expression** is a [CEL](https://cel.dev/) expression that you can use to create a dynamic event key from the payload and headers of the incoming webhook. You can either set this to a constant value, like `webhook`, or you could set it to something dynamic using those two options. Some examples: 1. `'stripe:' + input.type` would create event keys where `'stripe:'` is a prefix for all keys indicating the webhook came from Stripe, and `input.type` selects the `type` field off of the webhook payload and uses it to create the final event key. The result might look something like `stripe:payment_intent.created`. 2. `'github:' + headers['x-github-event'] + ':' + input.action` could create a key like `github:star:created` > **Info:** The result of the event key expression is what Hatchet will use as the event > key, so you'd need to set a matching event key as a trigger on your workflows > in order to trigger them from the webhooks you create. For instance, you might > add `on_events=["stripe:payment_intent.created"]` to listen for payment intent > created events in the previous example. #### Scope Expression (Optional) The **Scope Expression** is an optional [CEL](https://cel.dev/) expression that evaluates to a string used to filter which workflows to trigger. This is useful when you have multiple workflows listening to the same event key but want to route to specific workflows based on the webhook content. Like the event key expression, you have access to `input` (the webhook payload) and `headers` (the request headers). Some examples: 1. `input.customer_id` would use the customer ID from the payload as the scope 2. `headers['x-organization-id']` would use a header value as the scope 3. `input.metadata.environment` could route to different workflows based on environment #### Static Payload (Optional) The **Static Payload** is an optional JSON object that gets merged with the incoming webhook payload before it's passed to your workflows. This is useful for: - Adding constant metadata to all events from this webhook - Injecting configuration values that aren't in the original payload - Overriding specific fields from the incoming payload > **Info:** When there's a key collision between the incoming webhook payload and the > static payload, the static payload values take precedence. For example, if you set a static payload of `{"source": "stripe", "environment": "production"}` and receive a webhook with `{"type": "payment_intent.created", "source": "api"}`, the final payload passed to your workflow would be `{"type": "payment_intent.created", "source": "stripe", "environment": "production"}`. 
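To make the precedence concrete: conceptually, the effective payload is the incoming webhook body merged with the static payload, with static payload values winning on key collisions. Here's a rough sketch of that merge using the example values above (illustrative only; Hatchet performs this merge server-side, not in your code):

```python
# Rough sketch of the merge semantics described above (illustrative only).
incoming = {"type": "payment_intent.created", "source": "api"}
static_payload = {"source": "stripe", "environment": "production"}

# Static payload values take precedence on key collisions.
effective = {**incoming, **static_payload}

print(effective)
# {'type': 'payment_intent.created', 'source': 'stripe', 'environment': 'production'}
```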
#### Authentication Finally, you'll need to specify how Hatchet should authenticate incoming webhook requests. For non-generic sources like Stripe and Github, Hatchet has presets for most of the fields, so in most cases you'd only need to provide a secret. If you're using a generic source, then you'll need to specify an authentication method (either basic auth, an API key, HMAC-based auth), and provide the required fields (such as a username and password in the basic auth case). > **Warning:** Hatchet encrypts any secrets you provide for validating incoming webhooks. The different authentication methods require different fields to be provided: - **Pre-configured sources** (Stripe, GitHub, Slack): Only require a webhook secret - **Generic sources** require different fields depending on the selected authentication method: - **Basic Auth**: Requires a username and password - **API Key**: Requires header name containing the key on incoming requests, and secret key itself - **HMAC**: Requires a header name containing the secret on incoming requests, the secret itself, an encoding method (e.g. hex, base64), and an algorithm (e.g. `SHA256`, `SHA1`, etc.). ## Usage While you're creating your webhook (and also after you've created it), you can copy the webhook URL, which is what you'll provide to the webhook _sender_. Once you've done that, the last thing to do is register the event keys you want your workers to listen for so that they can be triggered by incoming webhooks. For examples on how to do this, see the [documentation on event triggers](./run-on-event.mdx). --- ## Invoking Tasks From Other Services While Hatchet recommends importing your workflows and standalone tasks directly to use for triggering runs, this only works in a monorepo or similar setups where you have access to those objects. However, it's common to have a polyrepo, have code written in multiple languages, or otherwise not be able to import your workflows and standalone tasks directly. Hatchet provides first-class, type-safe support for handling these cases as well, with only minor code duplication, to allow you to trigger your tasks from anywhere in a type-safe way. ### Creating a "Stub" Task on your External Service (Recommended) The recommended way to trigger a run from a service where you _cannot_ import the workflow or standalone task definition directly is to create a "stub" task or workflow on your external service. This is a Hatchet task or workflow that has the same name and input/output types as the task you want to trigger on your Hatchet worker, but without the function or other configuration. This allows you to have a polyglot, fully typed interface with full SDK support. #### Typescript ```typescript import { hatchet } from '../hatchet-client'; // (optional) Define the input type for the workflow export type SimpleInput = { Message: string; }; // (optional) Define the output type for the workflow export type SimpleOutput = { 'to-lower': { TransformedMessage: string; }; }; // declare the workflow with the same name as the // workflow name on the worker export const simple = hatchet.workflow({ name: 'simple', }); // you can use all the same run methods on the stub // with full type-safety simple.run({ Message: 'Hello, World!' }); simple.runNoWait({ Message: 'Hello, World!' }); simple.schedule(new Date(), { Message: 'Hello, World!' }); simple.cron('my-cron', '0 0 * * *', { Message: 'Hello, World!' 
}); ``` #### Python Consider a task with an implementation like this: ```python from pydantic import BaseModel from hatchet_sdk import Context, Hatchet class TaskInput(BaseModel): user_id: int class TaskOutput(BaseModel): ok: bool hatchet = Hatchet() @hatchet.task(name="externally-triggered-task", input_validator=TaskInput) async def externally_triggered_task(input: TaskInput, ctx: Context) -> TaskOutput: return TaskOutput(ok=True) ``` To trigger this task from a separate service, for instance, in a microservices architecture, where the code is not shared, start by defining models that match the input and output types of the task defined above. ```python class TaskInput(BaseModel): user_id: int class TaskOutput(BaseModel): ok: bool ``` Next, create the stub task. ```python stub = hatchet.stubs.task( # make sure the name and schemas exactly match the implementation name="externally-triggered-task", input_validator=TaskInput, output_validator=TaskOutput, ) ``` Finally, use the stub to trigger the underlying task, and (optionally) retrieve the result. ```python # input type checks properly result = await stub.aio_run(input=TaskInput(user_id=1234)) # `result.ok` type checks properly print("Is successful:", result.ok) ``` #### Go ```go package main import ( "context" "fmt" "log" hatchet "github.com/hatchet-dev/hatchet/sdks/go" ) type StubInput struct { Message string `json:"message"` } type StubOutput struct { Ok bool `json:"ok"` } func StubWorkflow(client *hatchet.Client) *hatchet.StandaloneTask { return client.NewStandaloneTask("stub-workflow", func(ctx hatchet.Context, input StubInput) (StubOutput, error) { return StubOutput{ Ok: true, }, nil }) } func main() { client, err := hatchet.NewClient() if err != nil { log.Fatalf("failed to create hatchet client: %v", err) } task := StubWorkflow(client) // we are simply running the task here, but it can be implemented in another service / worker // and in another language with the same name and input-output types result, err := task.Run(context.Background(), StubInput{Message: "Hello, World!"}) if err != nil { log.Fatalf("failed to run task: %v", err) } fmt.Println(result) } ``` #### Ruby Note that this approach requires code duplication, which can break type safety. For instance, if the input type to your workflow changes, you need to remember to also change the type passed to the stub. Some ways to mitigate risks here are helpful comments reminding developers to keep these types in sync, code generation tools, and end-to-end tests. --- # Dockerizing Hatchet Applications This guide explains how to create Dockerfiles for Hatchet applications. There are examples for Python, TypeScript, Go, and Ruby applications here. ## Entrypoint Configuration for Hatchet Before creating your Dockerfile, understand that Hatchet workers require specific entry point configuration: 1. The entry point must run code that runs the Hatchet worker. This can be done by calling the `worker.start()` method in your respective SDK. 2. Proper environment variables must be set for Hatchet SDK 3. The worker should be configured to handle your workflows using the `worker.register` method or by passing workflows into the worker constructor or factory. 
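For reference, here's a minimal sketch of what such an entrypoint might look like in Python. The file, module, and task names (`worker.py`, `my_tasks`, `my_task`, `"my-worker"`) are placeholders; the key points are constructing the worker with your workflows and calling `worker.start()`:

```python
# worker.py -- minimal sketch of a Hatchet worker entrypoint (names are placeholders)
from hatchet_sdk import Hatchet

from my_tasks import my_task  # the task(s)/workflow(s) this worker should run

hatchet = Hatchet()  # reads HATCHET_CLIENT_TOKEN and related env vars


def main() -> None:
    # Register workflows by passing them into the worker constructor
    worker = hatchet.worker("my-worker", workflows=[my_task])
    worker.start()


if __name__ == "__main__":
    main()
```

The Dockerfiles below assume an entrypoint along these lines (e.g. `python worker.py` or `node dist/worker.js`).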
## Example Dockerfiles

#### Python - Poetry

```dockerfile
FROM python:3.13-slim

ENV PYTHONUNBUFFERED=1 \
    POETRY_VERSION=1.4.2 \
    HATCHET_ENV=production

# Install system dependencies and Poetry
RUN apt-get update && \
    apt-get install -y curl && \
    curl -sSL https://install.python-poetry.org | python3 - && \
    ln -s /root/.local/bin/poetry /usr/local/bin/poetry && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY pyproject.toml poetry.lock* /app/

RUN poetry config virtualenvs.create false && \
    poetry install --no-interaction --no-ansi

COPY . /app

CMD ["poetry", "run", "python", "worker.py"]
```

> **Info:** If you're using a Poetry script to run your worker, you can replace `poetry run python worker.py` with `poetry run <your-script-name>` in the CMD.

#### Python - pip

```dockerfile
FROM python:3.13-slim

ENV PYTHONUNBUFFERED=1 \
    HATCHET_ENV=production

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . /app

CMD ["python", "worker.py"]
```

#### JavaScript - npm

```dockerfile
# Stage 1: Build
FROM node:18 AS builder

WORKDIR /app

COPY package*.json ./

RUN npm ci

COPY . .

RUN npm run build

# Stage 2: Production
FROM node:22-alpine

WORKDIR /app

COPY package*.json ./

RUN npm ci --omit=dev

COPY --from=builder /app/dist ./dist

ENV NODE_ENV=production

CMD ["node", "dist/worker.js"]
```

> **Info:** Use `npm ci` instead of `npm install` for more reliable builds. It's faster and ensures consistent installs across environments.

#### JavaScript - pnpm

```dockerfile
# Stage 1: Build
FROM node:18 AS builder

WORKDIR /app

# Install pnpm
RUN npm install -g pnpm

COPY pnpm-lock.yaml package.json ./

RUN pnpm install --frozen-lockfile

COPY . .

RUN pnpm build

# Stage 2: Production
FROM node:22-alpine

WORKDIR /app

RUN npm install -g pnpm

COPY pnpm-lock.yaml package.json ./

RUN pnpm install --frozen-lockfile --prod

COPY --from=builder /app/dist ./dist

ENV NODE_ENV=production

CMD ["node", "dist/worker.js"]
```

> **Info:** PNPM's `--frozen-lockfile` flag ensures consistent installs and fails if an update is needed.

#### JavaScript - yarn

```dockerfile
# Stage 1: Build
FROM node:18 AS builder

WORKDIR /app

COPY package.json yarn.lock ./

RUN yarn install --frozen-lockfile

COPY . .

RUN yarn build

# Stage 2: Production
FROM node:22-alpine

WORKDIR /app

COPY package.json yarn.lock ./

RUN yarn install --frozen-lockfile --production

COPY --from=builder /app/dist ./dist

ENV NODE_ENV=production

CMD ["node", "dist/worker.js"]
```

> **Info:** Yarn's `--frozen-lockfile` ensures your dependencies match the lock file exactly.

#### Go

```dockerfile
# Stage 1: Build
FROM golang:1.25-alpine3.21 AS builder

WORKDIR /app

COPY . .

RUN go mod download

RUN go build -o hatchet-worker .

# Stage 2: Production
FROM golang:1.25-alpine3.21

WORKDIR /app

COPY --from=builder /app/hatchet-worker .

CMD ["/app/hatchet-worker"]
```

#### Ruby

```dockerfile
FROM ruby:3.3-slim

ENV HATCHET_ENV=production

# Install system dependencies for native gems
RUN apt-get update && \
    apt-get install -y build-essential && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY Gemfile Gemfile.lock ./

RUN bundle config set --local without 'development test' && \
    bundle install

COPY . /app

CMD ["bundle", "exec", "ruby", "worker.rb"]
```

> **Info:** If you're using a Rake task or binstub to start your worker, replace the CMD with the appropriate command, e.g. `CMD ["bundle", "exec", "rake", "hatchet:worker"]`.

---

# Troubleshooting Hatchet Workers

This guide aims to document common issues when deploying Hatchet workers.
## Could not send task to worker If you see this error in the event history of a task, it could mean several things: 1. The worker is closing its network connection while the task is being sent. This could be caused by the worker crashing or going offline. 2. The payload is too large for the worker to accept or the Hatchet engine to send. The default maximum payload size is 4MB. Consider reducing the size of the input data or output data of your tasks. 3. The worker has a large backlog of tasks in-flight on the network connection and is rejecting new tasks. This can occur if workers are geographically distant from the Hatchet engine or if there are network issues causing delays. Hatchet Cloud runs by default in `us-west-2` (Oregon, USA), so consider deploying your workers in a region close to that for the best performance. If you are self-hosting, you can increase the maximum backlog size via the `SERVER_GRPC_WORKER_STREAM_MAX_BACKLOG_SIZE` environment variable in your Hatchet engine configuration. The default is 20. ## No workers visible in dashboard If you have deployed workers but they are not visible in the Hatchet dashboard, it is likely that: 1. Your API token is invalid or incorrect. Ensure that the token you are using to start the worker matches the token generated in the Hatchet dashboard for your tenant. 2. Worker heartbeats are not reaching the Hatchet engine. You will see noisy logs in the worker output if this is the case. ## Phantom workers active in dashboard This is often due to workers still running in your deployed environment. We see this most often with very long termination periods for workers, or in local development environments where worker processes are leaking. If you are in a local development environment, you can usually view running Hatchet worker processes via `ps -a | grep worker` (or whatever your entrypoint binary is called) and kill them manually. --- # Worker Health Checks The Python SDK allows you to enable and ping a healthcheck to check on the status of your worker. ### Usage First, set the `HATCHET_CLIENT_WORKER_HEALTHCHECK_ENABLED` environment variable to `True`. Once that flag is set, two health check endpoints will be available (on port `8001` by default): 1. `/health` - Returns **200** when the worker listener is healthy, otherwise **503** with body `{"status":"HEALTHY"}` or `{"status":"UNHEALTHY"}`. 2. `/metrics` - A metrics endpoint intended to be used by a monitoring system like Prometheus. ### Custom Port You can set a custom port with the `HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT` environment variable, e.g. `HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT=8002`. ### Event loop blocked threshold If the worker listener process event loop becomes blocked for longer than a threshold, `/health` will return **503**. 
You can configure this threshold (in seconds) with: - `HATCHET_CLIENT_WORKER_HEALTHCHECK_EVENT_LOOP_BLOCK_THRESHOLD_SECONDS` (default: `5.0`) #### Example request to `/health`: ```bash curl localhost:8001/health {"status":"HEALTHY"} ``` #### Example request to `/metrics`: ```bash curl localhost:8001/metrics # HELP python_gc_objects_collected_total Objects collected during gc # TYPE python_gc_objects_collected_total counter python_gc_objects_collected_total{generation="0"} 18782.0 python_gc_objects_collected_total{generation="1"} 4907.0 python_gc_objects_collected_total{generation="2"} 244.0 # HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC # TYPE python_gc_objects_uncollectable_total counter python_gc_objects_uncollectable_total{generation="0"} 0.0 python_gc_objects_uncollectable_total{generation="1"} 0.0 python_gc_objects_uncollectable_total{generation="2"} 0.0 # HELP python_gc_collections_total Number of times this generation was collected # TYPE python_gc_collections_total counter python_gc_collections_total{generation="0"} 308.0 python_gc_collections_total{generation="1"} 27.0 python_gc_collections_total{generation="2"} 2.0 # HELP python_info Python platform information # TYPE python_info gauge python_info{implementation="CPython",major="3",minor="10",patchlevel="15",version="3.10.15"} 1.0 # HELP hatchet_worker_listener_health_my_worker Listener health (1 healthy, 0 unhealthy) # TYPE hatchet_worker_listener_health_my_worker gauge hatchet_worker_listener_health_my_worker 1.0 # HELP hatchet_worker_event_loop_lag_seconds_my_worker Event loop lag in seconds (listener process) # TYPE hatchet_worker_event_loop_lag_seconds_my_worker gauge hatchet_worker_event_loop_lag_seconds_my_worker 0.0 ``` #### Example Prometheus Configuration for `/metrics`: ```yaml scrape_configs: - job_name: "hatchet" scrape_interval: 5s static_configs: - targets: ["localhost:8001"] ``` #### Example Prometheus Query An example query to check if the worker is healthy might look something like: ``` (hatchet_worker_listener_health_my_worker{instance="localhost:8001", job="hatchet"}) or vector(0) ``` --- # Autoscaling Workers Hatchet provides a Task Stats API that enables you to implement autoscaling for your worker pools. By querying real-time queue depths and task distribution, you can dynamically scale workers based on actual workload demand. ## Task Stats API The Task Stats endpoint returns current statistics for queued and running tasks across your tenant, broken down by task name, queue, and concurrency group. 
### Endpoint ``` GET /api/v1/tenants/{tenantId}/task-stats ``` ### Authentication The endpoint requires Bearer token authentication using a valid API token: ``` Authorization: Bearer ``` ### Response Format The response is a JSON object keyed by task name, with each task containing statistics for queued and running states: ```json { "my-task": { "queued": { "total": 150, "queues": { "my-task:default": 100, "my-task:priority": 50 }, "concurrency": [ { "expression": "input.user_id", "type": "GROUP_ROUND_ROBIN", "keys": { "user-123": 10, "user-456": 15 } } ], "oldest": "2024-01-15T10:30:00Z" }, "running": { "total": 25, "oldest": "2024-01-15T10:25:00Z", "concurrency": [] } } } ``` Each task stat includes: - **total**: The total count of tasks in this state - **concurrency**: Distribution across concurrency groups (if concurrency limits are configured) - **oldest**: Timestamp of the oldest task in the specified state These are available only for `queued` tasks: - **queues**: A breakdown of task counts by queue name ### Example Usage ```bash curl -H "Authorization: Bearer your-api-token-here" \ https://cloud.onhatchet.run/api/v1/tenants/707d0855-80ab-4e1f-a156-f1c4546cbf52/task-stats ``` ## Autoscaling with KEDA [KEDA](https://keda.sh) (Kubernetes Event-driven Autoscaling) can use the Task Stats API to automatically scale your worker deployments based on queue depth. ### Setting Up a KEDA ScaledObject Create a `ScaledObject` that queries the Hatchet Task Stats API and scales your worker deployment based on the number of queued tasks: ```yaml apiVersion: keda.sh/v1alpha1 kind: ScaledObject metadata: name: hatchet-worker-scaler spec: scaleTargetRef: name: hatchet-worker minReplicaCount: 1 maxReplicaCount: 10 triggers: - type: metrics-api metadata: targetValue: "100" url: "https://cloud.onhatchet.run/api/v1/tenants/YOUR_TENANT_ID/task-stats" valueLocation: "my-task.queued.total" authMode: "bearer" authenticationRef: name: hatchet-api-token --- apiVersion: v1 kind: Secret metadata: name: hatchet-api-token type: Opaque stringData: token: "your-api-token-here" --- apiVersion: keda.sh/v1alpha1 kind: TriggerAuthentication metadata: name: hatchet-api-token spec: secretTargetRef: - parameter: token name: hatchet-api-token key: token ``` > **Info:** The `valueLocation` field uses JSONPath-style notation to extract a specific > value from the response. Adjust `my-task` to match your actual task name. ### Scaling Based on Multiple Tasks If you have multiple task types handled by the same worker, you can create multiple triggers or use a custom metrics endpoint that aggregates the totals: ```yaml triggers: - type: metrics-api metadata: targetValue: "50" url: "https://cloud.onhatchet.run/api/v1/tenants/YOUR_TENANT_ID/task-stats" valueLocation: "task-a.queued.total" authMode: "bearer" authenticationRef: name: hatchet-api-token - type: metrics-api metadata: targetValue: "50" url: "https://cloud.onhatchet.run/api/v1/tenants/YOUR_TENANT_ID/task-stats" valueLocation: "task-b.queued.total" authMode: "bearer" authenticationRef: name: hatchet-api-token ``` --- # Concurrency Control in Hatchet Tasks Hatchet provides powerful concurrency control features to help you manage the execution of your tasks. This is particularly useful when you have tasks that may be triggered frequently or have long-running steps, and you want to limit the number of concurrent executions to prevent overloading your system, ensure fairness, or avoid race conditions. 
> **Info:** Concurrency strategies can be added to both `Tasks` and `Workflows`. ### Why use concurrency control? There are several reasons why you might want to use concurrency control in your Hatchet tasks: 1. **Fairness**: When you have multiple clients or users triggering tasks, concurrency control can help ensure fair access to resources. By limiting the number of concurrent runs per client or user, you can prevent a single client from monopolizing the system and ensure that all clients get a fair share of the available resources. 2. **Resource management**: If your task steps are resource-intensive (e.g., they make external API calls or perform heavy computations), running too many instances concurrently can overload your system. By limiting concurrency, you can ensure your system remains stable and responsive. 3. **Avoiding race conditions**: If your task steps modify shared resources, running multiple instances concurrently can lead to race conditions and inconsistent data. Concurrency control helps you avoid these issues by ensuring only a limited number of instances run at a time. 4. **Compliance with external service limits**: If your task steps interact with external services that have rate limits, concurrency control can help you stay within those limits and avoid being throttled or blocked. 5. **Spike Protection**: When you have tasks that are triggered by external events, such as webhooks or user actions, you may experience spikes in traffic that can overwhelm your system. Concurrency control can help you manage these spikes by limiting the number of concurrent runs and queuing new runs until resources become available. ### Available Strategies: - [`GROUP_ROUND_ROBIN`](#group-round-robin): Distribute task instances across available slots in a round-robin fashion based on the `key` function. - [`CANCEL_IN_PROGRESS`](#cancel-in-progress): Cancel the currently running task instances for the same concurrency key to free up slots for the new instance. - [`CANCEL_NEWEST`](#cancel-newest): Cancel the newest task instance for the same concurrency key to free up slots for the new instance. > We're always open to adding more strategies to fit your needs. Join our [discord](https://hatchet.run/discord) to let us know. 
### Setting concurrency on workers In addition to setting concurrency limits at the task level, you can also control concurrency at the worker level by passing the `slots` option when creating a new `Worker` instance: #### Python ```python class WorkflowInput(BaseModel): group: str concurrency_limit_rr_workflow = hatchet.workflow( name="ConcurrencyDemoWorkflowRR", concurrency=ConcurrencyExpression( expression="input.group", max_runs=1, limit_strategy=ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN, ), input_validator=WorkflowInput, ) ``` #### Typescript ```typescript export const simpleConcurrency = hatchet.workflow({ name: 'simple-concurrency', concurrency: { maxRuns: 1, limitStrategy: ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN, expression: 'input.GroupKey', }, }); ``` #### Go ```go var maxRuns int32 = 1 strategy := types.GroupRoundRobin return client.NewStandaloneTask("simple-concurrency", func(ctx worker.HatchetContext, input ConcurrencyInput) (*TransformedOutput, error) { // Random sleep between 200ms and 1000ms time.Sleep(time.Duration(200+rand.Intn(800)) * time.Millisecond) return &TransformedOutput{ TransformedMessage: input.Message, }, nil }, hatchet.WithWorkflowConcurrency(types.Concurrency{ Expression: "input.GroupKey", MaxRuns: &maxRuns, LimitStrategy: &strategy, }), ) ``` #### Ruby ```ruby CONCURRENCY_LIMIT_RR_WORKFLOW = HATCHET.workflow( name: "ConcurrencyDemoWorkflowRR", concurrency: Hatchet::ConcurrencyExpression.new( expression: "input.group", max_runs: 1, limit_strategy: :group_round_robin ) ) CONCURRENCY_LIMIT_RR_WORKFLOW.task(:step1) do |input, ctx| puts "starting step1" sleep 2 puts "finished step1" end ``` This example will only let 1 run in each group run at a given time to fairly distribute the load across the workers. ## Group Round Robin ### How it works When a new task instance is triggered, the `GROUP_ROUND_ROBIN` strategy will: 1. Determine the group that the instance belongs to based on the `key` function defined in the task's concurrency configuration. 2. Check if there are any available slots for the instance's group based on the `slots` limit of available workers. 3. If a slot is available, the new task instance starts executing immediately. 4. If no slots are available, the new task instance is added to a queue for its group. 5. When a running task instance completes and a slot becomes available for a group, the next queued instance for that group (in round-robin order) is dequeued and starts executing. This strategy ensures that task instances are processed fairly across different groups, preventing any one group from monopolizing the available resources. It also helps to reduce latency for instances within each group, as they are processed in a round-robin fashion rather than strictly in the order they were triggered. ### When to use `GROUP_ROUND_ROBIN` The `GROUP_ROUND_ROBIN` strategy is particularly useful in scenarios where: - You have multiple clients or users triggering task instances, and you want to ensure fair resource allocation among them. - You want to process instances within each group in a round-robin fashion to minimize latency and ensure that no single instance within a group is starved for resources. - You have long-running task instances and want to avoid one group's instances monopolizing the available slots. Keep in mind that the `GROUP_ROUND_ROBIN` strategy may not be suitable for all use cases, especially those that require strict ordering or prioritization of the most recent events. 
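To illustrate the round-robin behavior, here's a rough sketch of enqueueing runs against the Python workflow defined above (assuming the `WorkflowInput` model shown earlier and the SDK's `run_no_wait` method). With `max_runs=1` and `GROUP_ROUND_ROBIN` on `input.group`, Hatchet runs at most one task per group at a time and alternates between groups when dequeuing, instead of draining one group's queue first:

```python
# Illustrative sketch: enqueue several runs across two groups ("a" and "b").
# Queued runs are picked up one per group in round-robin order, so group "b"
# is not starved while group "a" still has work queued.
for group in ["a", "a", "a", "b", "b", "b"]:
    concurrency_limit_rr_workflow.run_no_wait(WorkflowInput(group=group))
```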
## Cancel In Progress

### How it works

When a new task instance is triggered, the `CANCEL_IN_PROGRESS` strategy will:

1. Determine the group that the instance belongs to based on the `key` function defined in the task's concurrency configuration.
2. Check if there are any available slots for the instance's group based on the `maxRuns` limit of available workers.
3. If a slot is available, the new task instance starts executing immediately.
4. If there are no available slots, currently running task instances for the same concurrency key are cancelled to free up slots for the new instance.
5. The new task instance starts executing immediately.

### When to use `CANCEL_IN_PROGRESS`

The `CANCEL_IN_PROGRESS` strategy is particularly useful in scenarios where:

- You have long-running task instances that may become stale or irrelevant if newer instances are triggered.
- You want to prioritize processing the most recent data or events, even if it means canceling older task instances.
- You have resource-intensive tasks where it's more efficient to cancel an in-progress instance and start a new one than to wait for the old instance to complete.
- Your UI allows for multiple inputs, but only the most recent is relevant (e.g. chat messages, form submissions, etc.).

## Cancel Newest

### How it works

The `CANCEL_NEWEST` strategy is similar to `CANCEL_IN_PROGRESS`, but it cancels the newly enqueued run instead of the oldest.

### When to use `CANCEL_NEWEST`

The `CANCEL_NEWEST` strategy is particularly useful in scenarios where:

- You want to allow in-progress runs to complete before starting new work.
- You have long-running task instances and want to avoid one group's instances monopolizing the available slots.

## Multiple Concurrency Strategies

You can also combine multiple concurrency strategies to create a more complex concurrency control system. For example, you can use one group key to represent a specific team, and another group key to represent a specific resource within that team, giving you more control over the rate at which tasks are executed.
#### Python ```python class WorkflowInput(BaseModel): name: str digit: str concurrency_workflow_level_workflow = hatchet.workflow( name="ConcurrencyWorkflowLevel", input_validator=WorkflowInput, concurrency=[ ConcurrencyExpression( expression="input.digit", max_runs=DIGIT_MAX_RUNS, limit_strategy=ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN, ), ConcurrencyExpression( expression="input.name", max_runs=NAME_MAX_RUNS, limit_strategy=ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN, ), ], ) ``` #### Typescript ```typescript export const multipleConcurrencyKeys = hatchet.workflow({ name: 'simple-concurrency', concurrency: [ { maxRuns: 1, limitStrategy: ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN, expression: 'input.Tier', }, { maxRuns: 1, limitStrategy: ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN, expression: 'input.Account', }, ], }); ``` #### Go ```go strategy := types.GroupRoundRobin var maxRuns int32 = 20 return client.NewStandaloneTask("multi-concurrency", func(ctx worker.HatchetContext, input ConcurrencyInput) (*TransformedOutput, error) { // Random sleep between 200ms and 1000ms time.Sleep(time.Duration(200+rand.Intn(800)) * time.Millisecond) return &TransformedOutput{ TransformedMessage: input.Message, }, nil }, hatchet.WithWorkflowConcurrency( types.Concurrency{ Expression: "input.Tier", MaxRuns: &maxRuns, LimitStrategy: &strategy, }, types.Concurrency{ Expression: "input.Account", MaxRuns: &maxRuns, LimitStrategy: &strategy, }, ), ) ``` #### Ruby ```ruby CONCURRENCY_WORKFLOW_LEVEL_WORKFLOW = HATCHET.workflow( name: "ConcurrencyWorkflowLevel", concurrency: [ Hatchet::ConcurrencyExpression.new( expression: "input.digit", max_runs: DIGIT_MAX_RUNS_WL, limit_strategy: :group_round_robin ), Hatchet::ConcurrencyExpression.new( expression: "input.name", max_runs: NAME_MAX_RUNS_WL, limit_strategy: :group_round_robin ) ] ) CONCURRENCY_WORKFLOW_LEVEL_WORKFLOW.task(:task_1) do |input, ctx| sleep SLEEP_TIME_WL end CONCURRENCY_WORKFLOW_LEVEL_WORKFLOW.task(:task_2) do |input, ctx| sleep SLEEP_TIME_WL end ``` --- # Rate Limiting Step Runs in Hatchet Hatchet allows you to enforce rate limits on task runs, enabling you to control the rate at which your service runs consume resources, such as external API calls, database queries, or other services. By defining rate limits, you can prevent task runs from exceeding a certain number of requests per time window (e.g., per second, minute, or hour), ensuring efficient resource utilization and avoiding overloading external services. The state of active rate limits can be viewed in the dashboard in the `Rate Limit` resource tab. ## Dynamic vs Static Rate Limits Hatchet offers two patterns for Rate Limiting task runs: 1. [Dynamic Rate Limits](#dynamic-rate-limits): Allows for complex rate limiting scenarios, such as per-user limits, by using `input` or `additional_metadata` keys to upsert a limit at runtime. 2. [Static Rate Limits](#static-rate-limits): Allows for simple rate limiting for resources known prior to runtime (e.g., external APIs). ## Dynamic Rate Limits Dynamic rate limits are ideal for complex scenarios where rate limits need to be partitioned by resources that are only known at runtime. This pattern is especially useful for: 1. Rate limiting individual users or tenants 2. Implementing variable rate limits based on subscription tiers or user roles 3. Dynamically adjusting limits based on real-time system load or other factors ### How It Works 1. 
Define the dynamic rate limit key with a CEL (Common Expression Language) expression referencing either `input` or `additional_metadata`.
2. Provide this key as part of the workflow trigger or event `input` or `additional_metadata` at runtime.
3. Hatchet will create or update the rate limit based on the provided key and enforce it for the step run.

> **Info:** Dynamic keys are a shared resource: the same rendered CEL expression on
> multiple steps will be treated as one global rate limit.

### Declaring and Consuming Dynamic Rate Limits

#### Python

> Note: `dynamic_key` must be a CEL expression. `units` and `limit` can be either an integer or a CEL expression.

We can add one or more rate limits to a task by adding the `rate_limits` configuration to the task definition.

```python
@rate_limit_workflow.task(
    rate_limits=[
        RateLimit(
            dynamic_key="input.user_id",
            units=1,
            limit=10,
            duration=RateLimitDuration.MINUTE,
        )
    ]
)
def step_2(input: RateLimitInput, ctx: Context) -> None:
    print("executed step_2")
```

#### Typescript

> Note: `dynamicKey` must be a CEL expression. `units` and `limit` can be either an integer or a CEL expression.

We can add one or more rate limits to a task by adding the `rateLimits` configuration to the task definition.

```typescript
const task2 = hatchet.task({
  name: 'task2',
  fn: (input: { userId: string }) => {
    console.log('executed task2 for user: ', input.userId);
  },
  rateLimits: [
    {
      dynamicKey: 'input.userId',
      units: 1,
      limit: 10,
      duration: RateLimitDuration.MINUTE,
    },
  ],
});
```

#### Go

> Note: Go requires both a key and `KeyExpr` to be set, and `LimitValueExpr` must be a CEL expression.

```go
userUnits := 1
userLimit := "10"
duration := types.Minute

dynamicTask := client.NewStandaloneTask("task2", func(ctx hatchet.Context, input APIRequest) (string, error) {
    log.Printf("executed task2 for user: %s", input.UserID)
    return "completed", nil
},
    hatchet.WithRateLimits(&types.RateLimit{
        Key:            "input.userId",
        Units:          &userUnits,
        LimitValueExpr: &userLimit,
        Duration:       &duration,
    }),
)
```

#### Ruby

```ruby
RATE_LIMIT_WORKFLOW.task(
  :step_2,
  rate_limits: [
    Hatchet::RateLimit.new(
      dynamic_key: "input.user_id",
      units: 1,
      limit: 10,
      duration: :minute
    )
  ]
) do |input, ctx|
  puts "executed step_2"
end
```

## Static Rate Limits

Static Rate Limits (formerly known as Global Rate Limits) are defined as part of your worker startup lifecycle, prior to runtime. This model provides a single "source of truth" for pre-defined resources such as:

1. External API resources that have a rate limit across all users or tenants
2. Database connection pools with a maximum number of concurrent connections
3. Shared computing resources with limited capacity

### How It Works

1. Declare static rate limits using the `put_rate_limit` method in the `Admin` client before starting your worker.
2. Specify the units of consumption for a specific rate limit key in each step definition using the `rate_limits` configuration.
3. Hatchet enforces the defined rate limits by tracking the number of units consumed by each step run across all workflow runs. If a step run would exceed the rate limit, Hatchet re-queues it until the rate limit is no longer exceeded.

### Declaring Static Limits

Define the static rate limits that can be consumed by any step run across all workflow runs using the `put_rate_limit` method in the `Admin` client within your code.
#### Python

```python
RATE_LIMIT_KEY = "test-limit"

hatchet.rate_limits.put(RATE_LIMIT_KEY, 2, RateLimitDuration.SECOND)
```

#### Typescript

```typescript
hatchet.ratelimits.upsert({
  key: 'api-service-rate-limit',
  limit: 10,
  duration: RateLimitDuration.SECOND,
});
```

#### Go

```go
err = client.RateLimits().Upsert(features.CreateRatelimitOpts{
    Key:      RATE_LIMIT_KEY,
    Limit:    10,
    Duration: types.Second,
})
if err != nil {
    log.Fatalf("failed to create rate limit: %v", err)
}
```

#### Ruby

```ruby
def main
  HATCHET.rate_limits.put(RATE_LIMIT_KEY, 2, :second)

  worker = HATCHET.worker(
    "rate-limit-worker",
    slots: 10,
    workflows: [RATE_LIMIT_WORKFLOW]
  )

  worker.start
end
```

### Consuming Static Rate Limits

With your rate limit key defined, specify the units of consumption for a specific key in each step definition by adding the `rate_limits` configuration to the step definition in your workflow.

#### Python

```python
RATE_LIMIT_KEY = "test-limit"


@rate_limit_workflow.task(rate_limits=[RateLimit(static_key=RATE_LIMIT_KEY, units=1)])
def step_1(input: RateLimitInput, ctx: Context) -> None:
    print("executed step_1")
```

#### Typescript

```typescript
const RATE_LIMIT_KEY = 'api-service-rate-limit';

const task1 = hatchet.task({
  name: 'task1',
  rateLimits: [
    {
      staticKey: RATE_LIMIT_KEY,
      units: 1,
    },
  ],
  fn: (input) => {
    console.log('executed task1');
  },
});
```

#### Go

```go
units := 1

staticTask := client.NewStandaloneTask("task1", func(ctx hatchet.Context, input APIRequest) (string, error) {
    log.Println("executed task1")
    return "completed", nil
},
    hatchet.WithRateLimits(&types.RateLimit{
        Key:   RATE_LIMIT_KEY,
        Units: &units,
    }),
)
```

#### Ruby

```ruby
RATE_LIMIT_KEY = "test-limit"

RATE_LIMIT_WORKFLOW.task(
  :step_1,
  rate_limits: [Hatchet::RateLimit.new(static_key: RATE_LIMIT_KEY, units: 1)]
) do |input, ctx|
  puts "executed step_1"
end
```

### Limiting Workflow Runs

To rate limit an entire workflow run, it's recommended to specify the rate limit configuration on the entry step (i.e., the first step in the workflow). This will gate the execution of all downstream steps in the workflow.

---

# Assigning priority to tasks in Hatchet

Hatchet allows you to assign different `priority` values to your tasks depending on how soon you want them to run. `priority` can be set to `1`, `2`, or `3` (`low`, `medium`, and `high`, respectively), with higher values resulting in that task being picked up before others of the same type.

**By default, runs in Hatchet have a priority of 1 (low) unless otherwise specified.**

Priority only affects multiple runs of a _single_ workflow. If you have two different workflows (A and B) and set A to globally have a priority of 3 and B to globally have a priority of 1, this does _not_ guarantee that A's task will run first when one task from each is in the queue. However, _within_ A, if you enqueue one task with priority 3 and one with priority 1, the priority 3 task will be run first.

A couple of common use cases for assigning priorities:

1. Having high-priority (e.g. paying, new, etc.) customers be prioritized over lower-priority ones, allowing them to get faster turnaround times on their tasks.
2. Having tasks triggered via your API run with higher priority than the same tasks triggered by a cron.

## Setting priority for a task or workflow

There are a few different ways to set priorities for tasks or workflows in Hatchet.
### Workflow-level default priority

First, you can set a default priority at the workflow level:

#### Python

```python
DEFAULT_PRIORITY = 1
SLEEP_TIME = 0.25

priority_workflow = hatchet.workflow(
    name="PriorityWorkflow",
    default_priority=DEFAULT_PRIORITY,
)
```

#### Typescript

```typescript
export const priorityWf = hatchet.workflow({
  name: 'priorityWf',
  defaultPriority: Priority.LOW,
});
```

#### Go

```go
workflow := client.NewWorkflow(
    "priority",
    hatchet.WithWorkflowDefaultPriority(features.RunPriorityLow),
)
```

#### Ruby

```ruby
DEFAULT_PRIORITY = 1
SLEEP_TIME = 0.25

PRIORITY_WORKFLOW = HATCHET.workflow(
  name: "PriorityWorkflow",
  default_priority: DEFAULT_PRIORITY
)

PRIORITY_WORKFLOW.task(:priority_task) do |input, ctx|
  puts "Priority: #{ctx.priority}"
  sleep SLEEP_TIME
end
```

This will assign the same default priority to all runs of this workflow (and all of the workflow's corresponding tasks), but it will have no effect without also setting run-level priorities, since every run will use the same default.

### Priority-on-trigger

When you trigger a run, you can set the priority of the triggered run to override its default priority.

#### Python

```python
low_prio = priority_workflow.run_no_wait(
    options=TriggerWorkflowOptions(
        ## πŸ‘€ Adding priority and key to metadata to show them in the dashboard
        priority=1,
        additional_metadata={"priority": "low", "key": 1},
    )
)
high_prio = priority_workflow.run_no_wait(
    options=TriggerWorkflowOptions(
        ## πŸ‘€ Adding priority and key to metadata to show them in the dashboard
        priority=3,
        additional_metadata={"priority": "high", "key": 1},
    )
)
```

#### Typescript

```typescript
const run = priority.run({}, { priority: Priority.HIGH });
```

#### Go

```go
ref, err := client.RunNoWait(
    context.Background(),
    workflow.GetName(),
    PriorityInput{},
    hatchet.WithRunPriority(features.RunPriorityLow),
)
if err != nil {
    return err
}
```

#### Ruby

```ruby
low_prio = PRIORITY_WORKFLOW.run_no_wait(
  {},
  options: Hatchet::TriggerWorkflowOptions.new(
    priority: 1,
    additional_metadata: { "priority" => "low", "key" => 1 }
  )
)

high_prio = PRIORITY_WORKFLOW.run_no_wait(
  {},
  options: Hatchet::TriggerWorkflowOptions.new(
    priority: 3,
    additional_metadata: { "priority" => "high", "key" => 1 }
  )
)
```

Similarly, you can also assign a priority to scheduled and cron workflows.
#### Python

```python
schedule = priority_workflow.schedule(
    run_at=datetime.now(tz=timezone.utc) + timedelta(minutes=1),
    options=ScheduleTriggerWorkflowOptions(priority=3),
)

cron = priority_workflow.create_cron(
    cron_name="my-scheduled-cron",
    expression="0 * * * *",
    priority=3,
)
```

#### Typescript

```typescript
const scheduled = priority.schedule(
  new Date(Date.now() + 60 * 60 * 1000),
  {},
  { priority: Priority.HIGH }
);

const delayed = priority.delay(60 * 60 * 1000, {}, { priority: Priority.HIGH });

const cron = priority.cron(
  `daily-cron-${Math.random()}`,
  '0 0 * * *',
  {},
  { priority: Priority.HIGH }
);
```

#### Go

```go
priority := features.RunPriorityHigh

schedule, err := client.Schedules().Create(
    context.Background(),
    workflow.GetName(),
    features.CreateScheduledRunTrigger{
        Priority: &priority,
    },
)
if err != nil {
    return err
}

cron, err := client.Crons().Create(
    context.Background(),
    workflow.GetName(),
    features.CreateCronTrigger{
        Priority: &priority,
    },
)
if err != nil {
    return err
}
```

#### Ruby

```ruby
schedule = PRIORITY_WORKFLOW.schedule(
  Time.now + 60,
  options: Hatchet::TriggerWorkflowOptions.new(priority: 3)
)

cron = PRIORITY_WORKFLOW.create_cron(
  "my-scheduled-cron",
  "0 * * * *",
  input: {},
)
```

In these cases, the priority set on the trigger will override the default priority, so these runs will be processed ahead of lower-priority ones.

---

# Task Orchestration

Not only can you run a single task in Hatchet, but you can also orchestrate multiple tasks together based on a shape that you define. For example, you can run a task that depends on the output of another task, or you can run a task that waits for a certain condition to be met before running.

1. [Declarative Workflow Design (DAGs)](./dags.mdx) -- a way to declaratively define the sequence and dependencies of tasks in a workflow when you know the dependencies ahead of time.
2. [Procedural Child Spawning](./child-spawning.mdx) -- a way to orchestrate tasks in a workflow when you don't know the dependencies ahead of time or when the dependencies are dynamic.

## Flow Controls

In addition to coordinating the execution of tasks, Hatchet also provides a set of flow control primitives that allow you to orchestrate tasks in a workflow. This allows you to run only what your service can handle at any given time.

1. [Worker Slots](./workers.mdx#understanding-slots) -- a way to control the number of tasks that can be executed concurrently on a given compute process.
2. [Concurrency Control](./concurrency.mdx) -- a global way to control the concurrent execution of tasks based on a specific key.
3. [Rate Limiting](./rate-limits.mdx) -- a global way to control the rate of task execution based on a time period.

---

# Declarative Workflow Design (DAGs)

Hatchet workflows are designed in a **Directed Acyclic Graph (DAG)** format, where each task is a node in the graph, and the dependencies between tasks are the edges. This structure ensures that workflows are organized, predictable, and free from circular dependencies. By defining the sequence and dependencies of tasks upfront, you can easily compare the actual runtime state against the expected state when debugging or troubleshooting.

## Defining a Workflow

Start by declaring a workflow with a name. The workflow object can declare additional workflow-level configuration options, which we'll cover later.

The returned object is an instance of the `Workflow` class, which is the primary interface for interacting with the workflow (i.e.
[running](./run-with-results.mdx), [enqueuing](./run-no-wait.mdx), [scheduling](./scheduled-runs.mdx), etc). #### Python ```python dag_workflow = hatchet.workflow(name="DAGWorkflow") ``` #### Typescript ```typescript // First, we declare the workflow export const dag = hatchet.workflow({ name: 'simple', }); ``` #### Go ```go workflow := client.NewWorkflow("dag-workflow") ``` #### Ruby ```ruby DAG_WORKFLOW = HATCHET.workflow(name: "DAGWorkflow") ``` The Workflow return object can be interacted with in the same way as a [task](./your-first-task.mdx), however, it can only take a subset of options which are applied at the task level. ## Defining a Task Now that we have a workflow, we can define a task to be executed as part of the workflow. Tasks are defined by calling the `task` method on the workflow object. The `task` method takes a name and a function that defines the task's behavior. The function will receive the workflow's input and return the task's output. Tasks also accept a number of other configuration options, which are covered elsewhere in our documentation. #### Python In Python, the `task` method is a decorator, which is used like this to wrap a function: ```python @dag_workflow.task(execution_timeout=timedelta(seconds=5)) def step1(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) ``` The function takes two arguments: `input`, which is a Pydantic model, and `ctx`, which is the Hatchet `Context` object. We'll discuss both of these more later. > **Info:** In the internals of Hatchet, the task is called using _positional arguments_, meaning that you can name `input` and `ctx` whatever you like. > > For instance, `def task_1(foo: EmptyModel, bar: Context) -> None:` is perfectly valid. #### Typescript ```typescript // Next, we declare the tasks bound to the workflow const toLower = dag.task({ name: 'to-lower', fn: (input) => { return { TransformedMessage: input.Message.toLowerCase(), }; }, }); ``` The `fn` argument is a function that takes the workflow's input and a context object. The context object contains information about the workflow run (e.g. the run ID, the workflow's input, etc). It can be synchronous or asynchronous. #### Go ```go step1 := workflow.NewTask("step-1", func(ctx hatchet.Context, input Input) (StepOutput, error) { return StepOutput{ Step: 1, Result: input.Value * 2, }, nil }) ``` #### Ruby ```ruby STEP1 = DAG_WORKFLOW.task(:step1, execution_timeout: 5) do |input, ctx| { "random_number" => rand(1..100) } end STEP2 = DAG_WORKFLOW.task(:step2, execution_timeout: 5) do |input, ctx| { "random_number" => rand(1..100) } end ``` ## Building a DAG with Task Dependencies The power of Hatchet's workflow design comes from connecting tasks into a DAG structure. Tasks can specify dependencies (parents) which must complete successfully before the task can start. 
#### Python ```python @dag_workflow.task(execution_timeout=timedelta(seconds=5)) async def step2(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) @dag_workflow.task(parents=[step1, step2]) async def step3(input: EmptyModel, ctx: Context) -> RandomSum: one = ctx.task_output(step1).random_number two = ctx.task_output(step2).random_number return RandomSum(sum=one + two) ``` #### Typescript ```typescript dag.task({ name: 'reverse', parents: [toLower], fn: async (input, ctx) => { const lower = await ctx.parentOutput(toLower); return { Original: input.Message, Transformed: lower.TransformedMessage.split('').reverse().join(''), }; }, }); ``` #### Go ```go step2 := workflow.NewTask("step-2", func(ctx hatchet.Context, input Input) (StepOutput, error) { // Get output from step 1 var step1Output StepOutput if err := ctx.ParentOutput(step1, &step1Output); err != nil { return StepOutput{}, err } return StepOutput{ Step: 2, Result: step1Output.Result + 10, }, nil }, hatchet.WithParents(step1)) ``` #### Ruby ```ruby DAG_WORKFLOW.task(:step3, parents: [STEP1, STEP2]) do |input, ctx| one = ctx.task_output(STEP1)["random_number"] two = ctx.task_output(STEP2)["random_number"] { "sum" => one + two } end DAG_WORKFLOW.task(:step4, parents: [STEP1, :step3]) do |input, ctx| puts( "executed step4", Time.now.strftime("%H:%M:%S"), input.inspect, ctx.task_output(STEP1).inspect, ctx.task_output(:step3).inspect ) { "step4" => "step4" } end ``` ## Accessing Parent Task Outputs As shown in the examples above, tasks can access outputs from their parent tasks using the context object: #### Python ```python @dag_workflow.task(execution_timeout=timedelta(seconds=5)) async def step2(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) @dag_workflow.task(parents=[step1, step2]) async def step3(input: EmptyModel, ctx: Context) -> RandomSum: one = ctx.task_output(step1).random_number two = ctx.task_output(step2).random_number return RandomSum(sum=one + two) ``` #### Typescript ```typescript dag.task({ name: 'task-with-parent-output', parents: [toLower], fn: async (input, ctx) => { const lower = await ctx.parentOutput(toLower); return { Original: input.Message, Transformed: lower.TransformedMessage.split('').reverse().join(''), }; }, }); ``` #### Go ```go // Inside a task with parent dependencies var parentOutput ParentOutputType err := ctx.ParentOutput(parentTask, &parentOutput) if err != nil { return nil, err } ``` #### Ruby ```ruby DAG_WORKFLOW.task(:step3, parents: [STEP1, STEP2]) do |input, ctx| one = ctx.task_output(STEP1)["random_number"] two = ctx.task_output(STEP2)["random_number"] { "sum" => one + two } end DAG_WORKFLOW.task(:step4, parents: [STEP1, :step3]) do |input, ctx| puts( "executed step4", Time.now.strftime("%H:%M:%S"), input.inspect, ctx.task_output(STEP1).inspect, ctx.task_output(:step3).inspect ) { "step4" => "step4" } end ``` ## Running a Workflow You can run workflows directly or enqueue them for asynchronous execution. All the same methods for running a task are available for workflows! #### Python ```python dag_workflow.run() ``` #### Typescript ```typescript const input = { Message: 'Hello, World!' 
};

// Run workflow and wait for the result
const result = await simple.run(input);

// Enqueue workflow to be executed asynchronously
const runReference = await simple.runNoWait(input);
```

#### Go

```go
// Run workflow and wait for the result
result, err := simple.Run(ctx, input)

// Enqueue workflow to be executed asynchronously
runID, err := simple.RunNoWait(ctx, input)
```

#### Ruby

```ruby
result = DAG_WORKFLOW.run
puts result
```

---

## Introduction

Hatchet V1 introduces the ability to add conditions to tasks in your workflows, which determine whether or not a task should be run. Conditions unlock a number of new ways to solve problems with Hatchet, such as:

1. A workflow that reads a feature flag, and then decides how to progress based on its value. In this case, you'd have two tasks that use parent conditions, where one task runs if the flag value is e.g. `True`, while the other runs if it's `False`.
2. Any type of human-in-the-loop workflow, where you want to wait for a human to e.g. approve something before continuing the DAG.

## Types of Conditions

There are three types of `Condition`s in Hatchet V1:

1. Sleep conditions, which sleep for a specified duration before continuing
2. Event conditions, which wait for an event (and optionally a CEL expression evaluated on the payload of that event) before deciding how to continue
3. Parent conditions, which wait for a parent task to complete and then decide how to progress based on its output

## Or Groups

Conditions can also be combined using an `Or` operator into groups of conditions (called "or groups"). An "or group" behaves like a boolean `OR` operator: the group evaluates to `True` if at least one of its conditions is `True`.

Or groups are an extremely powerful feature because they let you express arbitrarily complex sets of conditions in [conjunctive normal form](https://en.wikipedia.org/wiki/Conjunctive_normal_form) (CNF) for determining when your tasks should run and when they should not.

As a simple example, consider the following conditions:

- **Condition A**: Checking if the output of a parent task is greater than 50
- **Condition B**: Sleeping for 30 seconds
- **Condition C**: Receiving the `payment:processed` event

You might want to progress in your workflow if A _or_ (B _and_ C). In this case, we can express this set of conditions in CNF as `A or B` AND `A or C`, where both `A or B` and `A or C` are or groups.

## Usage

Conditions can be used at task _declaration_ time in three ways:

1. They can be used in a `wait_for` fashion, where a task will wait for the conditions to evaluate to `True` before being run.
2. They can be used in a `skip_if` fashion, where a task will be skipped if the conditions evaluate to `True`.
3. They can be used in a `cancel_if` fashion, where a task will be cancelled if the conditions evaluate to `True`.

### `wait_for`

Declaring a task with conditions to `wait_for` will cause the task to wait to start until its conditions evaluate to `True`. For instance, if you use `wait_for` with a 60 second sleep, the workflow will wait for 60 seconds before triggering the task. Similarly, if the task is waiting for an event, it will wait until the event is fired before continuing.

### `skip_if`

Declaring a task with conditions to `skip_if` will cause the task to be skipped if the conditions evaluate to `True`.
For instance, if you use a parent condition to check if the output of a parent task is equal to some value, the task will be skipped if that condition evaluates to `True`. ### `cancel_if` Declaring a task with conditions to `cancel_if` will cause the task to be cancelled if the conditions evaluate to `True`. For instance, if you use a parent condition to check if the output of a parent task is equal to some value, the task will be cancelled if that condition evaluates to `True`. > **Warning:** A task cancelled by a `cancel_if` operator will behave the same as any other > cancellation in Hatchet, meaning that downstream tasks will be cancelled as > well. ## Example Workflow In this example, we're going to build the following workflow: ![Branching DAG Workflow](/branching-dag.png) Note the branching logic (`left_branch` and `right_branch`), as well as the use of skips and waits. To get started, let's declare the workflow. #### Python ```python import random from datetime import timedelta from pydantic import BaseModel from hatchet_sdk import ( Context, EmptyModel, Hatchet, ParentCondition, SleepCondition, UserEventCondition, or_, ) hatchet = Hatchet(debug=True) class StepOutput(BaseModel): random_number: int class RandomSum(BaseModel): sum: int task_condition_workflow = hatchet.workflow(name="TaskConditionWorkflow") ``` #### Typescript ```typescript import { Or, SleepCondition, UserEventCondition } from '@hatchet/v1/conditions'; import { ParentCondition } from '@hatchet/v1/conditions/parent-condition'; import { Context } from '@hatchet/v1/client/worker/context'; import { hatchet } from '../hatchet-client'; export const taskConditionWorkflow = hatchet.workflow({ name: 'TaskConditionWorkflow', }); ``` #### Go ```go workflow := client.NewWorkflow("conditional-workflow") ``` #### Ruby ```ruby require "hatchet-sdk" HATCHET = Hatchet::Client.new(debug: true) unless defined?(HATCHET) TASK_CONDITION_WORKFLOW = HATCHET.workflow(name: "TaskConditionWorkflow") ``` Next, we'll start adding tasks to our workflow. First, we'll add a basic task that outputs a random number: #### Python ```python @task_condition_workflow.task() def start(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) ``` #### Typescript ```typescript const start = taskConditionWorkflow.task({ name: 'start', fn: () => { return { randomNumber: Math.floor(Math.random() * 100) + 1, }; }, }); ``` #### Go ```go start := workflow.NewTask("start", func(ctx hatchet.Context, input WorkflowInput) (StepOutput, error) { randomNum := rand.Intn(100) + 1 //nolint:gosec // This is a demo log.Printf("Starting workflow for process %s with random number: %d", input.ProcessID, randomNum) return StepOutput{ StepName: "start", RandomNumber: randomNum, ProcessedAt: time.Now().Format(time.RFC3339), }, nil }) ``` #### Ruby ```ruby COND_START = TASK_CONDITION_WORKFLOW.task(:start) do |input, ctx| { "random_number" => rand(1..100) } end ``` Next, we'll add a task to the workflow that's a child of the first task, but it has a `wait_for` condition that sleeps for 10 seconds. 
#### Python ```python @task_condition_workflow.task( parents=[start], wait_for=[SleepCondition(timedelta(seconds=10))] ) def wait_for_sleep(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) ``` #### Typescript ```typescript const waitForSleep = taskConditionWorkflow.task({ name: 'waitForSleep', parents: [start], waitFor: [new SleepCondition('10s')], fn: () => { return { randomNumber: Math.floor(Math.random() * 100) + 1, }; }, }); ``` #### Go ```go waitForSleep := workflow.NewTask("wait-for-sleep", func(ctx hatchet.Context, input WorkflowInput) (StepOutput, error) { return StepOutput{ RandomNumber: rand.Intn(100) + 1, }, nil }, hatchet.WithParents(start), hatchet.WithWaitFor(hatchet.SleepCondition(10*time.Second)), ) ``` #### Ruby ```ruby WAIT_FOR_SLEEP = TASK_CONDITION_WORKFLOW.task( :wait_for_sleep, parents: [COND_START], wait_for: [Hatchet::SleepCondition.new(10)] ) do |input, ctx| { "random_number" => rand(1..100) } end ``` This task will first wait for the parent task to complete, and then it'll sleep for 10 seconds before executing and returning another random number. Next, we'll add a task that will be skipped on an event: #### Python ```python @task_condition_workflow.task( parents=[start], wait_for=[SleepCondition(timedelta(seconds=30))], skip_if=[UserEventCondition(event_key="skip_on_event:skip")], ) def skip_on_event(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) ``` #### Typescript ```typescript const skipOnEvent = taskConditionWorkflow.task({ name: 'skipOnEvent', parents: [start], waitFor: [new SleepCondition('10s')], skipIf: [new UserEventCondition('skip_on_event:skip', 'true')], fn: () => { return { randomNumber: Math.floor(Math.random() * 100) + 1, }; }, }); ``` #### Go ```go // Task that waits for either 10 seconds or a user event skipOnEvent := workflow.NewTask("skip-on-event", func(ctx hatchet.Context, input WorkflowInput) (StepOutput, error) { log.Printf("Skip on event task completed for process %s", input.ProcessID) return StepOutput{ StepName: "skip-on-event", RandomNumber: rand.Intn(50) + 1, //nolint:gosec // This is a demo ProcessedAt: time.Now().Format(time.RFC3339), }, nil }, hatchet.WithParents(start), hatchet.WithWaitFor(hatchet.SleepCondition(10*time.Second)), hatchet.WithSkipIf(hatchet.UserEventCondition("process:skip", "true")), ) ``` #### Ruby ```ruby SKIP_ON_EVENT = TASK_CONDITION_WORKFLOW.task( :skip_on_event, parents: [COND_START], wait_for: [Hatchet::SleepCondition.new(30)], skip_if: [Hatchet::UserEventCondition.new(event_key: "skip_on_event:skip")] ) do |input, ctx| { "random_number" => rand(1..100) } end ``` In this case, our task will wait for a 30 second sleep, and then it will be skipped if the `skip_on_event:skip` is fired. Next, let's add some branching logic. Here we'll add two more tasks, a left and right branch. 
#### Python ```python @task_condition_workflow.task( parents=[wait_for_sleep], skip_if=[ ParentCondition( parent=wait_for_sleep, expression="output.random_number > 50", ) ], ) def left_branch(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) @task_condition_workflow.task( parents=[wait_for_sleep], skip_if=[ ParentCondition( parent=wait_for_sleep, expression="output.random_number <= 50", ) ], ) def right_branch(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) ``` #### Typescript ```typescript const leftBranch = taskConditionWorkflow.task({ name: 'leftBranch', parents: [waitForSleep], skipIf: [new ParentCondition(waitForSleep, 'output.randomNumber > 50')], fn: () => { return { randomNumber: Math.floor(Math.random() * 100) + 1, }; }, }); const rightBranch = taskConditionWorkflow.task({ name: 'rightBranch', parents: [waitForSleep], skipIf: [new ParentCondition(waitForSleep, 'output.randomNumber <= 50')], fn: () => { return { randomNumber: Math.floor(Math.random() * 100) + 1, }; }, }); ``` #### Go ```go // Left branch - only runs if start's random number <= 50 leftBranch := workflow.NewTask("left-branch", func(ctx hatchet.Context, input WorkflowInput) (StepOutput, error) { log.Printf("Left branch executing for process %s", input.ProcessID) return StepOutput{ StepName: "left-branch", RandomNumber: rand.Intn(25) + 1, //nolint:gosec // This is a demo ProcessedAt: time.Now().Format(time.RFC3339), }, nil }, hatchet.WithParents(waitForSleep), hatchet.WithSkipIf(hatchet.ParentCondition(start, "output.randomNumber > 50")), ) // Right branch - only runs if start's random number > 50 rightBranch := workflow.NewTask("right-branch", func(ctx hatchet.Context, input WorkflowInput) (StepOutput, error) { log.Printf("Right branch executing for process %s", input.ProcessID) return StepOutput{ StepName: "right-branch", RandomNumber: rand.Intn(25) + 26, //nolint:gosec // This is a demo ProcessedAt: time.Now().Format(time.RFC3339), }, nil }, hatchet.WithParents(waitForSleep), hatchet.WithSkipIf(hatchet.ParentCondition(start, "output.randomNumber <= 50")), ) ``` #### Ruby ```ruby LEFT_BRANCH = TASK_CONDITION_WORKFLOW.task( :left_branch, parents: [WAIT_FOR_SLEEP], skip_if: [ Hatchet::ParentCondition.new( parent: WAIT_FOR_SLEEP, expression: "output.random_number > 50" ) ] ) do |input, ctx| { "random_number" => rand(1..100) } end RIGHT_BRANCH = TASK_CONDITION_WORKFLOW.task( :right_branch, parents: [WAIT_FOR_SLEEP], skip_if: [ Hatchet::ParentCondition.new( parent: WAIT_FOR_SLEEP, expression: "output.random_number <= 50" ) ] ) do |input, ctx| { "random_number" => rand(1..100) } end ``` These two tasks use the `ParentCondition` and `skip_if` together to check if the output of an upstream task was greater or less than `50`, respectively. Only one of the two tasks will run: whichever one's condition evaluates to `True`. 
Next, we'll add a task that waits for an event: #### Python ```python @task_condition_workflow.task( parents=[start], wait_for=[ or_( SleepCondition(duration=timedelta(minutes=1)), UserEventCondition(event_key="wait_for_event:start"), ) ], ) def wait_for_event(input: EmptyModel, ctx: Context) -> StepOutput: return StepOutput(random_number=random.randint(1, 100)) ``` #### Typescript ```typescript const waitForEvent = taskConditionWorkflow.task({ name: 'waitForEvent', parents: [start], waitFor: [Or(new SleepCondition('1m'), new UserEventCondition('wait_for_event:start', 'true'))], fn: () => { return { randomNumber: Math.floor(Math.random() * 100) + 1, }; }, }); ``` #### Go ```go // Task that might be skipped based on external event skipableTask := workflow.NewTask("skipable-task", func(ctx hatchet.Context, input WorkflowInput) (StepOutput, error) { log.Printf("Skipable task executing for process %s", input.ProcessID) return StepOutput{ StepName: "skipable-task", RandomNumber: rand.Intn(10) + 1, //nolint:gosec // This is a demo ProcessedAt: time.Now().Format(time.RFC3339), }, nil }, hatchet.WithParents(start), hatchet.WithWaitFor(hatchet.SleepCondition(3*time.Second)), hatchet.WithSkipIf(hatchet.UserEventCondition("process:skip", "true")), ) ``` #### Ruby ```ruby WAIT_FOR_EVENT = TASK_CONDITION_WORKFLOW.task( :wait_for_event, parents: [COND_START], wait_for: [ Hatchet.or_( Hatchet::SleepCondition.new(60), Hatchet::UserEventCondition.new(event_key: "wait_for_event:start") ) ] ) do |input, ctx| { "random_number" => rand(1..100) } end ``` And finally, we'll add the last task, which collects all of its parents and sums them up. #### Python ```python @task_condition_workflow.task( parents=[ start, wait_for_sleep, wait_for_event, skip_on_event, left_branch, right_branch, ], ) def sum(input: EmptyModel, ctx: Context) -> RandomSum: one = ctx.task_output(start).random_number two = ctx.task_output(wait_for_event).random_number three = ctx.task_output(wait_for_sleep).random_number four = ( ctx.task_output(skip_on_event).random_number if not ctx.was_skipped(skip_on_event) else 0 ) five = ( ctx.task_output(left_branch).random_number if not ctx.was_skipped(left_branch) else 0 ) six = ( ctx.task_output(right_branch).random_number if not ctx.was_skipped(right_branch) else 0 ) return RandomSum(sum=one + two + three + four + five + six) ``` Note that in this task, we rely on `ctx.was_skipped` to determine if a task was skipped. 
#### Typescript ```typescript taskConditionWorkflow.task({ name: 'sum', parents: [start, waitForSleep, waitForEvent, skipOnEvent, leftBranch, rightBranch], fn: async (_, ctx: Context) => { const one = (await ctx.parentOutput(start)).randomNumber; const two = (await ctx.parentOutput(waitForEvent)).randomNumber; const three = (await ctx.parentOutput(waitForSleep)).randomNumber; const four = (await ctx.parentOutput(skipOnEvent))?.randomNumber || 0; const five = (await ctx.parentOutput(leftBranch))?.randomNumber || 0; const six = (await ctx.parentOutput(rightBranch))?.randomNumber || 0; return { sum: one + two + three + four + five + six, }; }, }); ``` #### Go ```go _ = workflow.NewTask("summarize", func(ctx hatchet.Context, input WorkflowInput) (SumOutput, error) { var total int var summary string // Get start output var startOutput StepOutput if err := ctx.ParentOutput(start, &startOutput); err != nil { return SumOutput{}, err } total += startOutput.RandomNumber summary = fmt.Sprintf("Start: %d", startOutput.RandomNumber) // Get wait for sleep output var waitForSleepOutput StepOutput if err := ctx.ParentOutput(waitForSleep, &waitForSleepOutput); err != nil { return SumOutput{}, err } total += waitForSleepOutput.RandomNumber summary += fmt.Sprintf(", Wait for sleep: %d", waitForSleepOutput.RandomNumber) // Get skip on event output var skipOnEventOutput StepOutput if err := ctx.ParentOutput(skipOnEvent, &skipOnEventOutput); err != nil { return SumOutput{}, err } total += skipOnEventOutput.RandomNumber summary += fmt.Sprintf(", Skip on event: %d", skipOnEventOutput.RandomNumber) // Try to get left branch output (might be skipped) var leftOutput StepOutput if err := ctx.ParentOutput(leftBranch, &leftOutput); err == nil { total += leftOutput.RandomNumber summary += fmt.Sprintf(", Left: %d", leftOutput.RandomNumber) } else { summary += ", Left: skipped" } // Try to get right branch output (might be skipped) var rightOutput StepOutput if err := ctx.ParentOutput(rightBranch, &rightOutput); err == nil { total += rightOutput.RandomNumber summary += fmt.Sprintf(", Right: %d", rightOutput.RandomNumber) } else { summary += ", Right: skipped" } // Try to get skipable task output (might be skipped) var skipableOutput StepOutput if err := ctx.ParentOutput(skipableTask, &skipableOutput); err == nil { total += skipableOutput.RandomNumber summary += fmt.Sprintf(", Skipable: %d", skipableOutput.RandomNumber) } else { summary += ", Skipable: skipped" } log.Printf("Final summary for process %s: total=%d, %s", input.ProcessID, total, summary) return SumOutput{ Total: total, Summary: summary, }, nil }, hatchet.WithParents( start, waitForSleep, skipOnEvent, leftBranch, rightBranch, skipableTask, )) ``` #### Ruby ```ruby TASK_CONDITION_WORKFLOW.task( :sum, parents: [COND_START, WAIT_FOR_SLEEP, WAIT_FOR_EVENT, SKIP_ON_EVENT, LEFT_BRANCH, RIGHT_BRANCH] ) do |input, ctx| one = ctx.task_output(COND_START)["random_number"] two = ctx.task_output(WAIT_FOR_EVENT)["random_number"] three = ctx.task_output(WAIT_FOR_SLEEP)["random_number"] four = ctx.was_skipped?(SKIP_ON_EVENT) ? 0 : ctx.task_output(SKIP_ON_EVENT)["random_number"] five = ctx.was_skipped?(LEFT_BRANCH) ? 0 : ctx.task_output(LEFT_BRANCH)["random_number"] six = ctx.was_skipped?(RIGHT_BRANCH) ? 0 : ctx.task_output(RIGHT_BRANCH)["random_number"] { "sum" => one + two + three + four + five + six } end ``` This workflow demonstrates the power of the new conditional logic in Hatchet V1. 
You can now create complex workflows that are much more dynamic than workflows in the previous version of Hatchet, and do all of it declaratively (rather than, for example, by dynamically spawning child workflows based on conditions in the parent).

---

# On-Failure Tasks

The on-failure task is a special type of task in Hatchet that allows you to define a function to be executed in the event that any task in the main workflow fails. This feature enables you to handle errors, perform cleanup tasks, or trigger notifications in case of task failure within a workflow.

## Defining an on-failure task

You can define an on-failure task on your workflow the same way you'd define any other task:

#### Python

```python
# This workflow will fail because the step will throw an error
# we define an onFailure step to handle this case
on_failure_wf = hatchet.workflow(name="OnFailureWorkflow")


@on_failure_wf.task(execution_timeout=timedelta(seconds=1))
def step1(input: EmptyModel, ctx: Context) -> None:
    # πŸ‘€ this step will always raise an exception
    raise Exception(ERROR_TEXT)


# πŸ‘€ After the workflow fails, this special step will run
@on_failure_wf.on_failure_task()
def on_failure(input: EmptyModel, ctx: Context) -> dict[str, str]:
    # πŸ‘€ we can do things like perform cleanup logic
    # or notify a user here

    # πŸ‘€ Fetch the errors from upstream step runs from the context
    print(ctx.task_run_errors)

    return {"status": "success"}
```

Note: Only one on-failure task can be defined per workflow.

#### Typescript

```typescript
export const failureWorkflow = hatchet.workflow({
  name: 'always-fail',
});

failureWorkflow.task({
  name: 'always-fail',
  fn: async () => {
    throw new Error('intentional failure');
  },
});

failureWorkflow.onFailure({
  name: 'on-failure',
  fn: async (input, ctx) => {
    console.log('onFailure for run:', ctx.workflowRunId());
    return {
      'on-failure': 'success',
    };
  },
});
```

#### Go

```go
multiStepWorkflow.OnFailure(func(ctx hatchet.Context, input FailureInput) (FailureHandlerOutput, error) {
    log.Printf("Multi-step failure handler called for input: %s", input.Message)

    stepErrors := ctx.StepRunErrors()

    var errorDetails string
    for stepName, errorMsg := range stepErrors {
        log.Printf("Multi-step: Step '%s' failed with error: %s", stepName, errorMsg)
        errorDetails += stepName + ": " + errorMsg + "; "
    }

    // Access successful step outputs for cleanup
    var step1Output TaskOutput
    if err := ctx.StepOutput("first-step", &step1Output); err == nil {
        log.Printf("First step completed successfully with: %s", step1Output.Message)
    }

    return FailureHandlerOutput{
        FailureHandled: true,
        ErrorDetails:   "Multi-step workflow failed: " + errorDetails,
        OriginalInput:  input.Message,
    }, nil
})
```

#### Ruby

```ruby
# This workflow will fail because the step will throw an error
# we define an onFailure step to handle this case
ON_FAILURE_WF = HATCHET.workflow(name: "OnFailureWorkflow")

ON_FAILURE_WF.task(:step1, execution_timeout: 1) do |input, ctx|
  # This step will always raise an exception
  raise ERROR_TEXT
end

# After the workflow fails, this special step will run
ON_FAILURE_WF.on_failure_task do |input, ctx|
  # We can do things like perform cleanup logic
  # or notify a user here

  # Fetch the errors from upstream step runs from the context
  puts ctx.task_run_errors.inspect

  { "status" => "success" }
end
```

In the examples above, the on-failure task will be executed only if any of the main tasks in the workflow fail.
## Use Cases Some common use cases for the on-failure task include: - Performing cleanup tasks after a task failure in a workflow - Sending notifications or alerts about the failure - Logging additional information for debugging purposes - Triggering a compensating action or a fallback task By utilizing the on-failure task, you can handle workflow failures gracefully and ensure that necessary actions are taken in case of errors. --- # Procedural Child Task Spawning Hatchet supports the dynamic creation of child tasks during a parent task's execution. This powerful feature enables: - **Complex, reusable task hierarchies** - Break down complex tasks into simpler, reusable components - **Fan-out parallelism** - Scale out to multiple parallel tasks dynamically - **Dynamic task behavior** - Create loops and conditional branches at runtime - **Agent-based tasks** - Support AI agents that can create new tasks based on analysis results or loop until a condition is met ## Creating Parent and Child Tasks To implement child task spawning, you first need to create both parent and child task definitions. #### Python First, we'll declare a couple of tasks for the parent and child: ```python class ParentInput(BaseModel): n: int = 100 class ChildInput(BaseModel): a: str parent_wf = hatchet.workflow(name="FanoutParent", input_validator=ParentInput) child_wf = hatchet.workflow(name="FanoutChild", input_validator=ChildInput) @parent_wf.task(execution_timeout=timedelta(minutes=5)) async def spawn(input: ParentInput, ctx: Context) -> dict[str, Any]: print("spawning child") result = await child_wf.aio_run_many( [ child_wf.create_bulk_run_item( input=ChildInput(a=str(i)), options=TriggerWorkflowOptions( additional_metadata={"hello": "earth"}, key=f"child{i}" ), ) for i in range(input.n) ], ) print(f"results {result}") return {"results": result} ``` We also created a step on the parent task that spawns the child tasks. Now, we'll add a couple of steps to the child task: ```python @child_wf.task() async def process(input: ChildInput, ctx: Context) -> dict[str, str]: print(f"child process {input.a}") return {"status": input.a} @child_wf.task(parents=[process]) async def process2(input: ChildInput, ctx: Context) -> dict[str, str]: process_output = ctx.task_output(process) a = process_output["status"] return {"status2": a + "2"} ``` And that's it! The fanout parent will run and spawn the child, and then will collect the results from its steps. 
#### Typescript ```typescript import { hatchet } from '../hatchet-client'; // (optional) Define the input type for the workflow export type ChildInput = { Message: string; }; export type ParentInput = { Message: string; }; export const child = hatchet.workflow({ name: 'child', }); export const child1 = child.task({ name: 'child1', fn: (input: ChildInput, ctx) => { ctx.logger.info('hello from the child1', { hello: 'moon' }); return { TransformedMessage: input.Message.toLowerCase(), }; }, }); export const child2 = child.task({ name: 'child2', fn: (input: ChildInput, ctx) => { ctx.logger.info('hello from the child2'); return { TransformedMessage: input.Message.toLowerCase(), }; }, }); export const child3 = child.task({ name: 'child3', parents: [child1, child2], fn: (input: ChildInput, ctx) => { ctx.logger.info('hello from the child3'); return { TransformedMessage: input.Message.toLowerCase(), }; }, }); export const parent = hatchet.task({ name: 'parent', fn: async (input: ParentInput, ctx) => { const c = await ctx.runChild(child, { Message: input.Message, }); return { TransformedMessage: 'not implemented', }; }, }); ``` #### Go ```go type ParentInput struct { Count int `json:"count"` } type ParentOutput struct { Sum int `json:"sum"` } func Parent(client *hatchet.Client) *hatchet.StandaloneTask { return client.NewStandaloneTask("parent-task", func(ctx hatchet.Context, input ParentInput) (ParentOutput, error) { log.Printf("Parent workflow spawning %d child workflows", input.Count) // Spawn multiple child workflows and collect results sum := 0 for i := 0; i < input.Count; i++ { log.Printf("Spawning child workflow %d/%d", i+1, input.Count) // Spawn child workflow and wait for result childResult, err := Child(client).Run(ctx, ChildInput{ Value: i + 1, }) if err != nil { return ParentOutput{}, fmt.Errorf("failed to spawn child workflow %d: %w", i, err) } var childOutput ChildOutput err = childResult.Into(&childOutput) if err != nil { return ParentOutput{}, fmt.Errorf("failed to get child workflow result: %w", err) } sum += childOutput.Result log.Printf("Child workflow %d completed with result: %d", i+1, childOutput.Result) } log.Printf("All child workflows completed. 
Total sum: %d", sum) return ParentOutput{ Sum: sum, }, nil }, ) } type ChildInput struct { Value int `json:"value"` } type ChildOutput struct { Result int `json:"result"` } func Child(client *hatchet.Client) *hatchet.StandaloneTask { return client.NewStandaloneTask("child-task", func(ctx hatchet.Context, input ChildInput) (ChildOutput, error) { return ChildOutput{ Result: input.Value * 2, }, nil }, ) } ``` #### Ruby ```ruby FANOUT_PARENT_WF = HATCHET.workflow(name: "FanoutParent") FANOUT_CHILD_WF = HATCHET.workflow(name: "FanoutChild") FANOUT_PARENT_WF.task(:spawn, execution_timeout: 300) do |input, ctx| puts "spawning child" n = input["n"] || 100 result = FANOUT_CHILD_WF.run_many( n.times.map do |i| FANOUT_CHILD_WF.create_bulk_run_item( input: { "a" => i.to_s }, options: Hatchet::TriggerWorkflowOptions.new( additional_metadata: { "hello" => "earth" }, key: "child#{i}" ) ) end ) puts "results #{result}" { "results" => result } end ``` ```ruby FANOUT_CHILD_PROCESS = FANOUT_CHILD_WF.task(:process) do |input, ctx| puts "child process #{input['a']}" { "status" => input["a"] } end FANOUT_CHILD_WF.task(:process2, parents: [FANOUT_CHILD_PROCESS]) do |input, ctx| process_output = ctx.task_output(FANOUT_CHILD_PROCESS) a = process_output["status"] { "status2" => "#{a}2" } end ``` ## Running Child Tasks To spawn and run a child task from a parent task, use the appropriate method for your language: #### Python ```python from examples.fanout.worker import ChildInput, child_wf # πŸ‘€ example: run this inside of a parent task to spawn a child child_wf.run( ChildInput(a="b"), ) ``` #### Typescript ```typescript export const parentSingleChild = hatchet.task({ name: 'parent-single-child', fn: async () => { const childRes = await child.run({ N: 1 }); return { Result: childRes.Value, }; }, }); ``` #### Go ```go // Inside a parent task childResult, err := childWorkflow.Run(hCtx, ChildInput{ Value: 1, }) if err != nil { return err } ``` #### Ruby ```ruby FANOUT_CHILD_WF.run({ "a" => "b" }) ``` ## Parallel Child Task Execution As shown in the examples above, you can spawn multiple child tasks in parallel: #### Python ```python async def run_child_workflows(n: int) -> list[dict[str, Any]]: return await child_wf.aio_run_many( [ child_wf.create_bulk_run_item( input=ChildInput(a=str(i)), ) for i in range(n) ] ) ``` #### Typescript ```typescript type ParentInput = { N: number; }; export const parent = hatchet.task({ name: 'parent', fn: async (input: ParentInput, ctx) => { const n = input.N; const promises = []; for (let i = 0; i < n; i++) { promises.push(child.run({ N: i })); } const childRes = await Promise.all(promises); const sum = childRes.reduce((acc, curr) => acc + curr.Value, 0); return { Result: sum, }; }, }); ``` #### Go ```go // Run multiple child tasks in parallel using goroutines var wg sync.WaitGroup var mu sync.Mutex results := make([]*ChildOutput, 0, n) wg.Add(n) for i := 0; i < n; i++ { go func(index int) { defer wg.Done() result, err := childWorkflow.Run(hCtx, ChildInput{Value: index}) if err != nil { return } var childOutput ChildOutput err = result.Into(&childOutput) if err != nil { return } mu.Lock() results = append(results, &childOutput) mu.Unlock() }(i) } wg.Wait() ``` #### Ruby ```ruby def run_child_workflows(n) FANOUT_CHILD_WF.run_many( n.times.map do |i| FANOUT_CHILD_WF.create_bulk_run_item( input: { "a" => i.to_s } ) end ) end ``` ## Use Cases for Child Workflows Child workflows are ideal for: 1. **Dynamic fan-out processing** - When the number of parallel tasks is determined at runtime 2. 
**Reusable workflow components** - Create modular workflows that can be reused across different parent workflows 3. **Resource-intensive operations** - Spread computation across multiple workers 4. **Agent-based systems** - Allow AI agents to spawn new workflows based on their reasoning 5. **Long-running operations** - Break down long operations into smaller, trackable units of work ## Error Handling with Child Workflows When working with child workflows, it's important to properly handle errors. Here are patterns for different languages: #### Python ```python try: child_wf.run( ChildInput(a="b"), ) except Exception as e: print(f"Child workflow failed: {e}") ``` #### Typescript ```typescript export const withErrorHandling = hatchet.task({ name: 'parent-error-handling', fn: async () => { try { const childRes = await child.run({ N: 1 }); return { Result: childRes.Value, }; } catch (error) { // decide how to proceed here return { Result: -1, }; } }, }); ``` #### Go ```go result, err := childWorkflow.Run(hCtx, ChildInput{Value: 1}) if err != nil { // Handle error from child workflow fmt.Printf("Child workflow failed: %v\n", err) // Decide how to proceed - retry, skip, or fail the parent } ``` #### Ruby ```ruby begin FANOUT_CHILD_WF.run({ "a" => "b" }) rescue StandardError => e puts "Child workflow failed: #{e.message}" end ``` --- # Additional Metadata Hatchet allows you to attach arbitrary key-value string pairs to events and task runs, which can be used for filtering, searching, or any other lookup purposes. This additional metadata is not part of the event payload or task input data but provides supplementary information for better organization and discoverability. > **Info:** Additional metadata can be added to `Runs`, `Scheduled Runs`, `Cron Runs`, and > `Events`. The data is propagated from parents to children or from events to > runs. 
You can attach additional metadata when pushing events or triggering task runs using the Hatchet client libraries:

#### Event Push

#### Python

```python
hatchet.event.push(
    "user:create",
    {"userId": "1234", "should_skip": False},
    options=PushEventOptions(
        additional_metadata={"source": "api"}  # Arbitrary key-value pair
    ),
)
```

#### Typescript

```typescript
const withMetadata = await hatchet.events.push(
  'user:create',
  {
    test: 'test',
  },
  {
    additionalMetadata: {
      source: 'api', // Arbitrary key-value pair
    },
  }
);
```

#### Go

```go
err = client.Events().Push(
    context.Background(),
    "user:create",
    Input{Message: "hello"},
    v0Client.WithEventMetadata(
        map[string]string{"version": "1.0.0"},
    ),
)
if err != nil {
    log.Fatalf("failed to push event: %v", err)
}
```

#### Ruby

```ruby
HATCHET.event.push(
  "user:create",
  { "userId" => "1234", "should_skip" => false },
  additional_metadata: { "source" => "api" }
)
```

#### Task Run Trigger

#### Python

```python
simple.run(
    options=TriggerWorkflowOptions(
        additional_metadata={"source": "api"}  # Arbitrary key-value pair
    )
)
```

#### Typescript

```typescript
const withMetadata = simple.run(
  {
    Message: 'HeLlO WoRlD',
  },
  {
    additionalMetadata: {
      source: 'api', // Arbitrary key-value pair
    },
  }
);
```

#### Go

```go
_, err = client.Run(
    context.Background(),
    "my-workflow",
    Input{Message: "hello"},
    hatchet.WithRunMetadata(
        map[string]string{"version": "1.0.0"},
    ),
)
if err != nil {
    log.Fatalf("failed to run workflow: %v", err)
}
```

#### Ruby

```ruby
SIMPLE.run(
  {},
  options: Hatchet::TriggerWorkflowOptions.new(
    additional_metadata: { "source" => "api" }
  )
)
```

## Filtering in the Dashboard

Once you've attached additional metadata to events or task runs, this data will be available in the Event and Task Run list views in the Hatchet dashboard. You can use the filter input field to search for events or task runs based on the additional metadata key-value pairs you've attached.

For example, you can filter events by the `source` metadata key to quickly find events originating from a specific source or environment.

![Blocks](/addl-meta.gif)

## Use Cases

Some common use cases for additional metadata include:

- Tagging events or task runs with environment information (e.g., `production`, `staging`, `development`)
- Specifying the source or origin of events (e.g., `api`, `webhook`, `manual`)
- Categorizing events or task runs based on business-specific criteria (e.g., `priority`, `region`, `product`)

By leveraging additional metadata, you can enhance the organization, searchability, and discoverability of your events and task runs within Hatchet.

---

# Durable Execution

## Introduction

**Durable execution** refers to the ability of a function to easily recover from failures or interruptions. In Hatchet, we refer to a function with this ability as a **durable task**. Durable tasks are essentially tasks that store intermediate results in a durable event log - in other words, they're a fancy cache.

For an in-depth look at how durable execution works, have a look at [this blog post](https://hatchet.run/blog/durable-execution).

This is especially useful in cases such as:

1. Tasks which need to always run to completion, even if the underlying machine crashes or the task is interrupted.
2. Situations where a task needs to wait a very long time for something to complete before continuing. Running a durable task will not take up a slot on the main worker, so it is a strong candidate for e.g. fanout tasks that spawn a large number of children and then wait for their results.
3.
Waiting for a potentially long time for an event, such as human-in-the-loop tasks where we might not get human feedback for hours or days.

## How Hatchet Runs Durable Tasks

When you register a durable task, Hatchet will start a second worker in the background for running durable tasks. If you don't register any durable workflows, the durable worker will not be started. Similarly, if you start a worker with _only_ durable workflows, the "main" worker will not start, and _only_ the durable worker will run. The durable worker will show up as a second worker in the Hatchet Dashboard.

Tasks that are declared as durable (using `durable_task` instead of `task`) receive a `DurableContext` object instead of a normal `Context`. The `DurableContext` extends the `Context` with some additional tools for working with durable execution features.

## Example Task

Now that we know a bit about how Hatchet handles durable execution, let's build a task. We'll start by declaring a workflow whose durable tasks will run on the "durable worker".

```python
durable_workflow = hatchet.workflow(name="DurableWorkflow")
```

Here, we've declared a Hatchet workflow just like any other. Now, we can add some tasks to it:

```python
EVENT_KEY = "durable-example:event"
SLEEP_TIME = 5


@durable_workflow.task()
async def ephemeral_task(input: EmptyModel, ctx: Context) -> None:
    print("Running non-durable task")


@durable_workflow.durable_task()
async def durable_task(input: EmptyModel, ctx: DurableContext) -> dict[str, str]:
    print("Waiting for sleep")
    await ctx.aio_sleep_for(duration=timedelta(seconds=SLEEP_TIME))
    print("Sleep finished")

    print("Waiting for event")
    await ctx.aio_wait_for(
        "event",
        UserEventCondition(event_key=EVENT_KEY, expression="true"),
    )
    print("Event received")

    return {
        "status": "success",
    }
```

We've added two tasks to our workflow. The first is a normal, "ephemeral" task, which does not leverage any of Hatchet's durable features. Second, we've added a durable task, which we've created by using the `durable_task` method of the `Workflow`, as opposed to the `task` method.

Note that the `durable_task` we've defined takes a `DurableContext`, as opposed to a regular `Context`, as its second argument. The `DurableContext` is a subclass of the regular `Context` that adds some additional methods for working with durable tasks.

The durable task first waits for a sleep condition. Once the sleep has completed, it continues processing until it hits the second wait. At this point, it needs to wait for an event condition. Once it receives the event, the task prints `Event received` and completes.

If this task is interrupted at any time, it will continue from where it left off. But more importantly, if an event comes into the system while the task is waiting, the task will immediately process the event. And if the task is interrupted while in a sleeping state, it will respect the original sleep duration on restart -- that is, if the task calls `ctx.aio_sleep_for` for 24 hours and is interrupted after 23 hours, it will only sleep for 1 more hour on restart.

### Or Groups

Similarly to [conditional workflows](./conditional-workflows.mdx#or-groups), durable tasks can also use or groups in their wait conditions.
For example, you could wait for either an event or a sleep (whichever comes first) like this:

```python
@durable_workflow.durable_task()
async def wait_for_or_group_1(
    _i: EmptyModel, ctx: DurableContext
) -> dict[str, str | int]:
    start = time.time()

    wait_result = await ctx.aio_wait_for(
        uuid4().hex,
        or_(
            SleepCondition(timedelta(seconds=SLEEP_TIME)),
            UserEventCondition(event_key=EVENT_KEY),
        ),
    )

    key = list(wait_result.keys())[0]
    event_id = list(wait_result[key].keys())[0]

    return {
        "runtime": int(time.time() - start),
        "key": key,
        "event_id": event_id,
    }
```

---

## Durable Events

Durable events are a feature of **durable tasks** which allow tasks to wait for an event to occur before continuing. This is useful in cases where a task needs to wait for a long time for an external action.

Durable events are useful because even if your task is interrupted and requeued while waiting for an event, the event will still be processed. When the task is resumed, it will read the event from the durable event log and continue processing.

## Declaring durable events

Durable events are declared using the context method `WaitFor` (or the utility method `WaitForEvent`) on the `DurableContext` object.

#### Python

```python
@hatchet.durable_task(name="DurableEventTask")
async def durable_event_task(input: EmptyModel, ctx: DurableContext) -> None:
    res = await ctx.aio_wait_for(
        "event",
        UserEventCondition(event_key="user:update"),
    )

    print("got event", res)
```

#### Typescript

```typescript
export const durableEvent = hatchet.durableTask({
  name: 'durable-event',
  executionTimeout: '10m',
  fn: async (_, ctx) => {
    const res = ctx.waitFor({
      eventKey: 'user:update',
    });

    console.log('res', res);

    return {
      Value: 'done',
    };
  },
});
```

#### Go

```go
task := client.NewStandaloneDurableTask("long-running-task", func(ctx hatchet.DurableContext, input DurableInput) (DurableOutput, error) {
    log.Printf("Starting task, will sleep for %d seconds", input.Delay)

    if _, err := ctx.WaitForEvent("user:updated", ""); err != nil {
        return DurableOutput{}, err
    }

    log.Printf("Finished waiting for event, processing message: %s", input.Message)

    return DurableOutput{
        ProcessedAt: time.Now().Format(time.RFC3339),
        Message:     "Processed: " + input.Message,
    }, nil
})
```

#### Ruby

```ruby
DURABLE_EVENT_TASK = HATCHET.durable_task(name: "DurableEventTask") do |input, ctx|
  res = ctx.wait_for(
    "event",
    Hatchet::UserEventCondition.new(event_key: "user:update")
  )

  puts "got event #{res}"
end
```

## Durable event filters

Durable events can be filtered using [CEL](https://github.com/google/cel-spec) expressions. For example, to only receive `user:update` events for a specific user, you can use the following filter:

#### Python

```python
res = await ctx.aio_wait_for(
    "event",
    UserEventCondition(
        event_key="user:update", expression="input.user_id == '1234'"
    ),
)
```

#### Typescript

```typescript
const res = ctx.waitFor({
  eventKey: 'user:update',
  expression: "input.userId == '1234'",
});
```

#### Go

```go
if _, err := ctx.WaitForEvent("user:updated", "input.status_code == 200"); err != nil {
    return DurableOutput{}, err
}
```

#### Ruby

```ruby
DURABLE_EVENT_TASK_WITH_FILTER = HATCHET.durable_task(name: "DurableEventWithFilterTask") do |input, ctx|
  res = ctx.wait_for(
    "event",
    Hatchet::UserEventCondition.new(
      event_key: "user:update",
      expression: "input.user_id == '1234'"
    )
  )

  puts "got event #{res}"
end
```

---

## Durable Sleep

Durable sleep is a feature of **durable tasks** which allows tasks to pause execution for a specified amount of time.
Instead of a regular `sleep` call in your task, durable sleep is guaranteed to only sleep for the specified amount of time after the first time it was called. For example, say you'd like to send a notification to a user after 24 hours. With a regular `sleep`, if the task is interrupted after 23 hours, it will restart and call `sleep` for 24 hours again. This means that the task will sleep for 47 hours in total, which is not what you want. With durable sleep, the task will respect the original sleep duration on restart -- that is, if the task calls `ctx.aio_sleep_for` for 24 hours and is interrupted after 23 hours, it will only sleep for 1 more hour on restart. ## Using durable sleep Durable sleep can be used by calling the `SleepFor` method on the `DurableContext` object. This method takes a duration as an argument and will sleep for that duration. #### Python ```python @hatchet.durable_task(name="DurableSleepTask") async def durable_sleep_task(input: EmptyModel, ctx: DurableContext) -> None: res = await ctx.aio_sleep_for(timedelta(seconds=5)) print("got result", res) ``` #### Typescript ```typescript durableSleep.durableTask({ name: 'durable-sleep', executionTimeout: '10m', fn: async (_, ctx) => { console.log('sleeping for 5s'); const sleepRes = await ctx.sleepFor('5s'); console.log('done sleeping for 5s', sleepRes); return { Value: 'done', }; }, }); ``` #### Go ```go task := client.NewStandaloneDurableTask("long-running-task", func(ctx hatchet.DurableContext, input DurableInput) (DurableOutput, error) { log.Printf("Starting task, will sleep for %d seconds", input.Delay) if _, err := ctx.SleepFor(time.Duration(input.Delay) * time.Second); err != nil { return DurableOutput{}, err } log.Printf("Finished sleeping, processing message: %s", input.Message) return DurableOutput{ ProcessedAt: time.Now().Format(time.RFC3339), Message: "Processed: " + input.Message, }, nil }) ``` #### Ruby ```ruby DURABLE_SLEEP_TASK = HATCHET.durable_task(name: "DurableSleepTask") do |input, ctx| res = ctx.sleep_for(duration: 5) puts "got result #{res}" end ``` --- ## Durable Execution Best Practices Durable tasks require a bit of extra work to ensure that they are not misused. An important concept in running a durable task is that the task should be **deterministic**. This means that the task should always perform the same sequence of operations in between retries. The deterministic nature of durable tasks is what allows Hatchet to replay the task from the last checkpoint. If a task is not deterministic, it may produce different results on each retry, which can lead to unexpected behavior. ## Maintaining Determinism By following a few simple rules, you can ensure that your durable tasks are deterministic: 1. **Only call methods available on the `DurableContext`**: a very common way to introduce non-determinism is to call methods within your application code which produces side effects. If you need to call a method in your application code which fetches data from a database, calls any sort of i/o operation, or otherwise interacts with other systems, you should spawn those tasks as a **child task** or **child workflow** using `RunChild`. 2. **When updating durable tasks, always guarantee backwards compatibility**: if you change the order of operations in a durable task, you may break determinism. For example, if you call `SleepFor` followed by `WaitFor`, and then change the order of those calls, Hatchet will not be able to replay the task correctly. 
This is because the task may have already been checkpointed at the first call to `SleepFor`, and if you change the order of operations, the checkpoint is meaningless.

## Using DAGs instead of durable tasks

[DAGs](./dags) are generally a much easier, more intuitive way to run a durable, deterministic workflow. DAGs are inherently deterministic, as their shape of work is predefined, and they cache intermediate results. If you are running simple workflows that can be represented as a DAG, you should use DAGs instead of durable tasks. DAGs also have conditional execution primitives which match the behavior of `SleepFor` and `WaitFor` in durable tasks.

Durable tasks are useful if you need to run a workflow that is not easily represented as a DAG.

---

# Timeouts in Hatchet

Timeouts are an important concept in Hatchet that allow you to control how long a task is allowed to run before it is considered to have failed. This is useful for ensuring that your tasks don't run indefinitely and consume unnecessary resources. Timeouts in Hatchet are treated as failures and the task will be [retried](./retry-policies.mdx) if specified.

There are two types of timeouts in Hatchet:

1. **Scheduling Timeouts** (Default 5m) - the time a task is allowed to wait in the queue before it is cancelled
2. **Execution Timeouts** (Default 60s) - the time a task is allowed to run before it is considered to have failed

## Timeout Format

In Hatchet, timeouts are specified using a string in the format `<number><unit>`, where `<number>` is an integer and `<unit>` is one of:

- `s` for seconds
- `m` for minutes
- `h` for hours

For example:

- `10s` means 10 seconds
- `4m` means 4 minutes
- `1h` means 1 hour

If no unit is specified, seconds are assumed.

> **Info:** In the Python SDK, timeouts can also be specified as a `datetime.timedelta`
> object.

### Task-Level Timeouts

You can specify execution and scheduling timeouts for a task using the `execution_timeout` and `schedule_timeout` parameters when creating a task.

#### Python

```python
# πŸ‘€ Specify an execution timeout on a task
@timeout_wf.task(
    execution_timeout=timedelta(seconds=5), schedule_timeout=timedelta(minutes=10)
)
def timeout_task(input: EmptyModel, ctx: Context) -> dict[str, str]:
    time.sleep(30)
    return {"status": "success"}
```

#### Typescript

```typescript
export const withTimeouts = hatchet.task({
  name: 'with-timeouts',
  // time the task can wait in the queue before it is cancelled
  scheduleTimeout: '10s',
  // time the task can run before it is cancelled
  executionTimeout: '10s',
  fn: async (input: SimpleInput, ctx) => {
    // wait 15 seconds
    await sleep(15000);

    // get the abort controller
    const { abortController } = ctx;

    // if the abort controller is aborted, throw an error
    if (abortController.signal.aborted) {
      throw new Error('cancelled');
    }

    return {
      TransformedMessage: input.Message.toLowerCase(),
    };
  },
});
```

#### Go

```go
// Task that will timeout - sleeps for 10 seconds but has 3 second timeout
_ = timeoutWorkflow.NewTask("timeout-task", func(ctx hatchet.Context, input TimeoutInput) (TimeoutOutput, error) {
    log.Printf("Starting task that will timeout. Message: %s", input.Message)

    // Sleep for 10 seconds (will be interrupted by timeout)
    time.Sleep(10 * time.Second)

    // This should not be reached due to timeout
    log.Println("Task completed successfully (this shouldn't be reached)")
    return TimeoutOutput{
        Status:    "completed",
        Completed: true,
    }, nil
}, hatchet.WithExecutionTimeout(3*time.Second), // 3 second timeout
)
```

#### Ruby

```ruby
# Specify an execution timeout on a task
TIMEOUT_WF.task(:timeout_task, execution_timeout: 5, schedule_timeout: 600) do |input, ctx|
  sleep 30
  { "status" => "success" }
end

REFRESH_TIMEOUT_WF = HATCHET.workflow(name: "RefreshTimeoutWorkflow")
```

In these tasks, both timeouts are specified, meaning:

1. If the task is not scheduled before the `schedule_timeout` is reached, it will be cancelled.
2. If the task does not complete before the `execution_timeout` is reached (after starting), it will be cancelled.

> **Warning:** A timed out step does not guarantee that the step will be stopped immediately.
> The step will be stopped as soon as the worker is able to stop the step. See
> [cancellation](./cancellation.mdx) for more information.

## Refreshing Timeouts

In some cases, you may need to extend the timeout for a step while it is running. This can be done using the `refreshTimeout` method provided by the step context (`ctx`).

For example:

#### Python

```python
@refresh_timeout_wf.task(execution_timeout=timedelta(seconds=4))
def refresh_task(input: EmptyModel, ctx: Context) -> dict[str, str]:
    ctx.refresh_timeout(timedelta(seconds=10))

    time.sleep(5)

    return {"status": "success"}
```

#### Typescript

```typescript
export const refreshTimeout = hatchet.task({
  name: 'refresh-timeout',
  executionTimeout: '10s',
  scheduleTimeout: '10s',
  fn: async (input: SimpleInput, ctx) => {
    // adds 15 seconds to the execution timeout
    ctx.refreshTimeout('15s');
    await sleep(15000);

    // get the abort controller
    const { abortController } = ctx;

    // now this condition will not be met
    // if the abort controller is aborted, throw an error
    if (abortController.signal.aborted) {
      throw new Error('cancelled');
    }

    return {
      TransformedMessage: input.Message.toLowerCase(),
    };
  },
});
```

#### Go

```go
// Create workflow with timeout refresh example
refreshTimeoutWorkflow := client.NewWorkflow("refresh-timeout-demo",
    hatchet.WithWorkflowDescription("Demonstrates timeout refresh functionality"),
    hatchet.WithWorkflowVersion("1.0.0"),
)

// Task that refreshes its timeout to avoid timing out
_ = refreshTimeoutWorkflow.NewTask("refresh-timeout-task", func(ctx hatchet.Context, input TimeoutInput) (TimeoutOutput, error) {
    log.Printf("Starting task with timeout refresh. Message: %s", input.Message)

    // Refresh timeout by 10 seconds
    log.Println("Refreshing timeout by 10 seconds...")
    err := ctx.RefreshTimeout("10s")
    if err != nil {
        log.Printf("Failed to refresh timeout: %v", err)
        return TimeoutOutput{
            Status:    "failed",
            Completed: false,
        }, err
    }

    // Now sleep for 5 seconds (should complete successfully)
    log.Println("Sleeping for 5 seconds...")
    time.Sleep(5 * time.Second)

    log.Println("Task completed successfully after timeout refresh")
    return TimeoutOutput{
        Status:    "completed",
        Completed: true,
    }, nil
}, hatchet.WithExecutionTimeout(3*time.Second), // Initial 3 second timeout
)
```

#### Ruby

```ruby
REFRESH_TIMEOUT_WF.task(:refresh_task, execution_timeout: 4) do |input, ctx|
  ctx.refresh_timeout(10)
  sleep 5
  { "status" => "success" }
end
```

In this example, the step initially would exceed its execution timeout.
But before it does, we call the `refreshTimeout` method, which extends the timeout and allows it to complete. Importantly, refreshing a timeout is an additive operation - the new timeout is added to the existing timeout. So for instance, if the task originally had a timeout of `30s` and we call `refreshTimeout("15s")`, the new timeout will be `45s`. The `refreshTimeout` function can be called multiple times within a step to further extend the timeout as needed. ## Use Cases Timeouts are useful in a variety of scenarios: - Ensuring tasks don't run indefinitely and consume unnecessary resources - Failing tasks early if a critical step takes too long - Keeping tasks responsive by ensuring individual steps complete in a timely manner - Preventing infinite loops or hung processes from blocking the entire system For example, if you have a task that makes an external API call, you may want to set a timeout to ensure the task fails quickly if the API is unresponsive, rather than waiting indefinitely. By carefully considering timeouts for your tasks and steps, you can build more resilient and responsive systems with Hatchet. --- # Simple Task Retries Hatchet provides a simple and effective way to handle failures in your tasks using a retry policy. This feature allows you to specify the number of times a task should be retried if it fails, helping to improve the reliability and resilience of your tasks. > **Info:** Task-level retries can be added to both `Standalone Tasks` and `Workflow > Tasks`. ## How it works When a task fails (i.e. throws an error or returns a non-zero exit code), Hatchet can automatically retry the task based on the `retries` configuration defined in the task object. Here's how it works: 1. If a task fails and `retries` is set to a value greater than 0, Hatchet will catch the error and retry the task. 2. The task will be retried up to the specified number of times, with each retry being executed after a short delay to avoid overwhelming the system. 3. If the task succeeds during any of the retries, the task will continue as normal. 4. If the task continues to fail after exhausting all the specified retries, the task will be marked as failed. This simple retry mechanism can help to mitigate transient failures, such as network issues or temporary unavailability of external services, without requiring complex error handling logic in your task code. ## How to use task-level retries To enable retries for a task, simply add the `retries` property to the task object in your task definition: #### Python ```python @simple_workflow.task(retries=3) def always_fail(input: EmptyModel, ctx: Context) -> dict[str, str]: raise Exception("simple task failed") ``` #### Typescript ```typescript export const retries = hatchet.task({ name: 'retries', retries: 3, fn: async (_, ctx) => { throw new Error('intentional failure'); }, }); ``` #### Go ```go retries := client.NewStandaloneTask("retries-task", func(ctx hatchet.Context, input RetriesInput) (*RetriesResult, error) { return nil, errors.New("intentional failure") }, hatchet.WithRetries(3)) ``` #### Ruby ```ruby SIMPLE_RETRY_WORKFLOW.task(:always_fail, retries: 3) do |input, ctx| raise "simple task failed" end ``` You can add the `retries` property to any task, and Hatchet will handle the retry logic automatically. It's important to note that task-level retries are not suitable for all types of failures. For example, if a task fails due to a programming error or an invalid configuration, retrying the task will likely not resolve the issue. 
In these cases, you should fix the underlying problem in your code or configuration rather than relying on retries. Additionally, if a task interacts with external services or databases, you should ensure that the operation is idempotent (i.e. can be safely repeated without changing the result) before enabling retries. Otherwise, retrying the task could lead to unintended side effects or inconsistencies in your data. ## Accessing the Retry Count in a Running Task If you need to access the current retry count within a task, you can use the `retryCount` method available in the task context: #### Python ```python @simple_workflow.task(retries=3) def fail_twice(input: EmptyModel, ctx: Context) -> dict[str, str]: if ctx.retry_count < 2: raise Exception("simple task failed") return {"status": "success"} ``` #### Typescript ```typescript export const retriesWithCount = hatchet.task({ name: 'retriesWithCount', retries: 3, fn: async (_, ctx) => { // > Get the current retry count const retryCount = ctx.retryCount(); console.log(`Retry count: ${retryCount}`); if (retryCount < 2) { throw new Error('intentional failure'); } return { message: 'success', }; }, }); ``` #### Go ```go retriesWithCount := client.NewStandaloneTask("fail-twice-task", func(ctx hatchet.Context, input RetriesWithCountInput) (*RetriesWithCountResult, error) { // Get the current retry count retryCount := ctx.RetryCount() fmt.Printf("Retry count: %d\n", retryCount) if retryCount < 2 { return nil, errors.New("intentional failure") } return &RetriesWithCountResult{ Message: "success", }, nil }, hatchet.WithRetries(3)) ``` #### Ruby ```ruby SIMPLE_RETRY_WORKFLOW.task(:fail_twice, retries: 3) do |input, ctx| raise "simple task failed" if ctx.retry_count < 2 { "status" => "success" } end ``` ## Exponential Backoff Hatchet also supports exponential backoff for retries, which can be useful for handling failures in a more resilient manner. Exponential backoff increases the delay between retries exponentially, giving the failing service more time to recover before the next retry. #### Python ```python @backoff_workflow.task( retries=10, # πŸ‘€ Maximum number of seconds to wait between retries backoff_max_seconds=10, # πŸ‘€ Factor to increase the wait time between retries. # This sequence will be 2s, 4s, 8s, 10s, 10s, 10s... due to the maxSeconds limit backoff_factor=2.0, ) def backoff_task(input: EmptyModel, ctx: Context) -> dict[str, str]: if ctx.retry_count < 3: raise Exception("backoff task failed") return {"status": "success"} ``` #### Typescript ```typescript export const withBackoff = hatchet.task({ name: 'withBackoff', retries: 10, backoff: { // πŸ‘€ Maximum number of seconds to wait between retries maxSeconds: 10, // πŸ‘€ Factor to increase the wait time between retries. // This sequence will be 2s, 4s, 8s, 10s, 10s, 10s... due to the maxSeconds limit factor: 2, }, fn: async () => { throw new Error('intentional failure'); }, }); ``` #### Go ```go withBackoff := client.NewStandaloneTask("with-backoff-task", func(ctx hatchet.Context, input BackoffInput) (*BackoffResult, error) { return nil, errors.New("intentional failure") }, hatchet.WithRetries(3), hatchet.WithRetryBackoff(2, 10)) ``` #### Ruby ```ruby BACKOFF_WORKFLOW.task( :backoff_task, retries: 10, # Maximum number of seconds to wait between retries backoff_max_seconds: 10, # Factor to increase the wait time between retries. # This sequence will be 2s, 4s, 8s, 10s, 10s, 10s... 
due to the maxSeconds limit backoff_factor: 2.0 ) do |input, ctx| raise "backoff task failed" if ctx.retry_count < 3 { "status" => "success" } end ``` ## Bypassing Retry logic The Hatchet SDKs each expose a `NonRetryable` exception, which allows you to bypass pre-configured retry logic for the task. **If your task raises this exception, it will not be retried.** This allows you to circumvent the default retry behavior in instances where you don't want to or cannot safely retry. Some examples in which this might be useful include: 1. A task that calls an external API which returns a 4XX response code. 2. A task that contains a single non-idempotent operation that can fail but cannot safely be rerun on failure, such as a billing operation. 3. A failure that requires manual intervention to resolve. #### Python ```python @non_retryable_workflow.task(retries=1) def should_not_retry(input: EmptyModel, ctx: Context) -> None: raise NonRetryableException("This task should not retry") ``` #### Typescript ```typescript const shouldNotRetry = nonRetryableWorkflow.task({ name: 'should-not-retry', fn: () => { throw new NonRetryableError('This task should not retry'); }, retries: 1, }); ``` #### Go ```go retries := client.NewStandaloneTask("non-retryable-task", func(ctx hatchet.Context, input NonRetryableInput) (*NonRetryableResult, error) { return nil, worker.NewNonRetryableError(errors.New("intentional failure")) }, hatchet.WithRetries(3)) ``` #### Ruby ```ruby NON_RETRYABLE_WORKFLOW.task(:should_not_retry, retries: 1) do |input, ctx| raise Hatchet::NonRetryableError, "This task should not retry" end NON_RETRYABLE_WORKFLOW.task(:should_retry_wrong_exception_type, retries: 1) do |input, ctx| raise TypeError, "This task should retry because it's not a NonRetryableError" end NON_RETRYABLE_WORKFLOW.task(:should_not_retry_successful_task, retries: 1) do |input, ctx| # no-op end ``` In these cases, even though `retries` is set to a non-zero number (meaning the task would ordinarily retry), Hatchet will not retry. ## Conclusion Hatchet's task-level retry feature is a simple and effective way to handle transient failures in your tasks, improving the reliability and resilience of your tasks. By specifying the number of retries for each task, you can ensure that your tasks can recover from temporary issues without requiring complex error handling logic. Remember to use retries judiciously and only for tasks that are idempotent and can safely be repeated. For more advanced retry strategies, such as exponential backoff or circuit breaking, stay tuned for future updates to Hatchet's retry capabilities. --- # Bulk Cancellations and Replays V1 adds the ability to cancel or replay task runs in bulk, which you can now do either in the Hatchet Dashboard or programmatically via the SDKs and the REST API. There are two ways of bulk cancelling or replaying tasks in both cases: 1. You can provide a list of task run ids to cancel or replay, which will cancel or replay all of the tasks in the list. 2. You can provide a list of filters, similar to the list of filters on task runs in the Dashboard, and cancel or replay runs matching those filters. For instance, if you wanted to replay all failed runs of a `SimpleTask` from the past fifteen minutes that had the `foo` field in `additional_metadata` set to `bar`, you could apply those filters and replay all of the matching runs. ### Bulk Operations by Run Ids The first way to bulk cancel or replay runs is by providing a list of run ids. 
This is the most straightforward way to cancel or replay runs in bulk. #### Python > **Info:** In the Python SDK, the mechanics of bulk replaying and bulk cancelling tasks > are exactly the same. The only change would be replacing e.g. > `hatchet.runs.bulk_cancel` with `hatchet.runs.bulk_replay`. First, we'll start by fetching a task via the REST API. ```python from datetime import datetime, timedelta, timezone from hatchet_sdk import BulkCancelReplayOpts, Hatchet, RunFilter, V1TaskStatus hatchet = Hatchet() workflows = hatchet.workflows.list() assert workflows.rows workflow = workflows.rows[0] ``` Now that we have a task, we'll get runs for it, so that we can use them to bulk cancel by run id. ```python workflow_runs = hatchet.runs.list(workflow_ids=[workflow.metadata.id]) ``` And finally, we can cancel the runs in bulk. ```python workflow_run_ids = [workflow_run.metadata.id for workflow_run in workflow_runs.rows] bulk_cancel_by_ids = BulkCancelReplayOpts(ids=workflow_run_ids) hatchet.runs.bulk_cancel(bulk_cancel_by_ids) ``` > **Info:** Note that the Python SDK also exposes async versions of each of these methods: > > - `workflows.list` -> `await workflows.aio_list` > - `runs.list` -> `await runs.aio_list` > - `runs.bulk_cancel` -> `await runs.aio_bulk_cancel` #### Go > **Info:** Just like in the Python SDK, the mechanics of bulk replaying and bulk > cancelling tasks are exactly the same. First, we'll start by fetching a task via the REST API. ```python from datetime import datetime, timedelta, timezone from hatchet_sdk import BulkCancelReplayOpts, Hatchet, RunFilter, V1TaskStatus hatchet = Hatchet() workflows = hatchet.workflows.list() assert workflows.rows workflow = workflows.rows[0] ``` Now that we have a task, we'll get runs for it, so that we can use them to bulk cancel by run id. ```python workflow_runs = hatchet.runs.list(workflow_ids=[workflow.metadata.id]) ``` And finally, we can cancel the runs in bulk. ```python workflow_run_ids = [workflow_run.metadata.id for workflow_run in workflow_runs.rows] bulk_cancel_by_ids = BulkCancelReplayOpts(ids=workflow_run_ids) hatchet.runs.bulk_cancel(bulk_cancel_by_ids) ``` #### Ruby ```ruby hatchet = Hatchet::Client.new workflows = hatchet.workflows.list workflow = workflows.rows.first ``` ```ruby workflow_runs = hatchet.runs.list(workflow_ids: [workflow.metadata.id]) ``` ```ruby workflow_run_ids = workflow_runs.rows.map { |run| run.metadata.id } hatchet.runs.bulk_cancel(ids: workflow_run_ids) ``` ### Bulk Operations by Filters The second way to bulk cancel or replay runs is by providing a list of filters. This is the most powerful way to cancel or replay runs in bulk, as it allows you to cancel or replay all runs matching a set of arbitrary filters without needing to provide IDs for the runs in advance. #### Python The example below provides some filters you might use to cancel or replay runs in bulk. Importantly, these filters are very similar to the filters you can use in the Hatchet Dashboard to filter which task runs are displaying. ```python bulk_cancel_by_filters = BulkCancelReplayOpts( filters=RunFilter( since=datetime.today() - timedelta(days=1), until=datetime.now(tz=timezone.utc), statuses=[V1TaskStatus.RUNNING], workflow_ids=[workflow.metadata.id], additional_metadata={"key": "value"}, ) ) hatchet.runs.bulk_cancel(bulk_cancel_by_filters) ``` Running this request will cancel all task runs matching the filters provided. #### Go The example below provides some filters you might use to cancel or replay runs in bulk. 
Importantly, these filters are very similar to the filters you can use in the Hatchet Dashboard to filter which task runs are displaying. ```python bulk_cancel_by_filters = BulkCancelReplayOpts( filters=RunFilter( since=datetime.today() - timedelta(days=1), until=datetime.now(tz=timezone.utc), statuses=[V1TaskStatus.RUNNING], workflow_ids=[workflow.metadata.id], additional_metadata={"key": "value"}, ) ) hatchet.runs.bulk_cancel(bulk_cancel_by_filters) ``` Running this request will cancel all task runs matching the filters provided. #### Ruby ```ruby hatchet.runs.bulk_cancel( since: Time.now - 86_400, until_time: Time.now, statuses: ["RUNNING"], workflow_ids: [workflow.metadata.id], additional_metadata: { "key" => "value" } ) ``` # Manual Retries Hatchet provides a manual retry mechanism that allows you to handle failed task instances flexibly from the Hatchet dashboard. Navigate to the specific task in the Hatchet dashboard and click on the failed run. From there, you can inspect the details of the run, including the input data and the failure reason for each task. To retry a failed task, simply click on the task in the run details view and then click the "Replay" button. This will create a new instance of the task, starting from the failed task, and using the same input data as the original run. Manual retries give you full control over when and how to reprocess failed instances. For example, you may choose to wait until an external service is back online before retrying instances that depend on that service, or you may need to deploy a bug fix to your task code before retrying instances that were affected by the bug. ## A Note on Dead Letter Queues A dead letter queue (DLQ) is a messaging concept used to handle messages that cannot be processed successfully. In the context of task management, a DLQ can be used to store failed task instances that require manual intervention or further analysis. While Hatchet does not have a built-in dead letter queue feature, the persistence of failed task instances in the dashboard serves a similar purpose. By keeping a record of failed instances, Hatchet allows you to track and manage failures, perform root cause analysis, and take appropriate actions, such as modifying input data or updating your task code before manually retrying the failed instances. It's important to note that the term "dead letter queue" is more commonly associated with messaging systems like Apache Kafka or Amazon SQS, where unprocessed messages are automatically moved to a separate queue for manual handling. In Hatchet, the failed instances are not automatically moved to a separate queue but are instead persisted in the dashboard for manual management. --- # Sticky Worker Assignment (Beta) > **Info:** This feature is currently in beta and may be subject to change. Sticky assignment is a task property that allows you to specify that all child tasks should be assigned to the same worker for the duration of its execution. This can be useful in situations like when you need to maintain expensive local memory state across multiple tasks in a workflow or ensure that certain tasks are processed by the same worker for consistency. > **Warning:** This feature is only compatible with long lived workers, and not webhook > workers. ## Setting Sticky Assignment Sticky assignment is set on the task level by adding the `sticky` property to the task definition. When a task is marked as sticky, all steps within that task will be assigned to the same worker for the duration of the task execution. 
> **Warning:** While sticky assignment can be useful in certain scenarios, it can also
> introduce potential bottlenecks if the assigned worker becomes unavailable, or
> if local state is not maintained when the job is picked up. Be sure to
> consider the implications of sticky assignment when designing your tasks and
> have a plan in place to handle local state issues.

There are two strategies for setting sticky assignment for [DAG](./dags.mdx) workflows:

- `SOFT`: All tasks in the workflow will attempt to be assigned to the same worker, but if that worker is unavailable, they will be assigned to another worker.
- `HARD`: All tasks in the workflow will only be assigned to the same worker. If that worker is unavailable, the workflow run will not be assigned to another worker and will remain in a pending state until the original worker becomes available or the timeout is reached. (See [Scheduling Timeouts](./timeouts.mdx#task-level-timeouts))

#### Python

```python
sticky_workflow = hatchet.workflow(
    name="StickyWorkflow",
    # πŸ‘€ Specify a sticky strategy when declaring the workflow
    sticky=StickyStrategy.SOFT,
)


@sticky_workflow.task()
def step1a(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
    return {"worker": ctx.worker.id()}


@sticky_workflow.task()
def step1b(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
    return {"worker": ctx.worker.id()}
```

#### Typescript

```typescript
export const sticky = hatchet.task({
  name: 'sticky',
  retries: 3,
  sticky: StickyStrategy.SOFT,
  fn: async (_, ctx) => {
    // specify a child workflow to run on the same worker
    const result = await child.run(
      {
        N: 1,
      },
      { sticky: true }
    );

    return {
      result,
    };
  },
});
```

#### Go

```go
func StickyDag(client *hatchet.Client) *hatchet.Workflow {
    stickyDag := client.NewWorkflow("sticky-dag",
        hatchet.WithWorkflowStickyStrategy(types.StickyStrategy_SOFT),
    )

    _ = stickyDag.NewTask("sticky-task", func(ctx worker.HatchetContext, input StickyInput) (interface{}, error) {
        workerId := ctx.Worker().ID()
        return &StickyResult{
            Result: workerId,
        }, nil
    },
    )

    _ = stickyDag.NewTask("sticky-task-2", func(ctx worker.HatchetContext, input StickyInput) (interface{}, error) {
        workerId := ctx.Worker().ID()
        return &StickyResult{
            Result: workerId,
        }, nil
    },
    )

    return stickyDag
}
```

#### Ruby

```ruby
STICKY_WORKFLOW = HATCHET.workflow(
  name: "StickyWorkflow",
  # Specify a sticky strategy when declaring the workflow
  sticky: :soft
)

STEP1A = STICKY_WORKFLOW.task(:step1a) do |input, ctx|
  { "worker" => ctx.worker.id }
end

STEP1B = STICKY_WORKFLOW.task(:step1b) do |input, ctx|
  { "worker" => ctx.worker.id }
end
```

In this example, the `sticky` property is set to `SOFT`, which means that the task will attempt to be assigned to the same worker for the duration of its execution. If the original worker is unavailable, the task will be assigned to another worker.

## Sticky Child Tasks

It is possible to spawn child tasks on the same worker as the parent task by setting the `sticky` property to `true` in the `run` method options. This can be useful when you need to maintain local state across multiple tasks or ensure that child tasks are processed by the same worker for consistency.

However, the child task must:

1. Specify a `sticky` strategy in the child task's definition
2. Be registered with the same worker as the parent task

If either condition is not met, an error will be thrown when the child task is spawned.
#### Python

```python
sticky_child_workflow = hatchet.workflow(
    name="StickyChildWorkflow", sticky=StickyStrategy.SOFT
)


@sticky_workflow.task(parents=[step1a, step1b])
async def step2(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
    ref = await sticky_child_workflow.aio_run_no_wait(
        options=TriggerWorkflowOptions(sticky=True)
    )

    await ref.aio_result()

    return {"worker": ctx.worker.id()}


@sticky_child_workflow.task()
def child(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
    return {"worker": ctx.worker.id()}
```

#### Typescript

```typescript
export const sticky = hatchet.task({
  name: 'sticky',
  retries: 3,
  sticky: StickyStrategy.SOFT,
  fn: async (_, ctx) => {
    // specify a child workflow to run on the same worker
    const result = await child.run(
      {
        N: 1,
      },
      { sticky: true }
    );

    return {
      result,
    };
  },
});
```

#### Go

```go
func Sticky(client *hatchet.Client) *hatchet.StandaloneTask {
    sticky := client.NewStandaloneTask("sticky-task", func(ctx worker.HatchetContext, input StickyInput) (*StickyResult, error) {
        // Run a child workflow on the same worker
        childWorkflow := Child(client)
        childResult, err := childWorkflow.Run(ctx, ChildInput{N: 1}, hatchet.WithRunSticky(true))
        if err != nil {
            return nil, err
        }

        var childOutput ChildResult
        err = childResult.Into(&childOutput)
        if err != nil {
            return nil, err
        }

        return &StickyResult{
            Result: fmt.Sprintf("child-result-%s", childOutput.Result),
        }, nil
    },
    )

    return sticky
}
```

#### Ruby

```ruby
STICKY_CHILD_WORKFLOW = HATCHET.workflow(
  name: "StickyChildWorkflow",
  sticky: :soft
)

STICKY_WORKFLOW.task(:step2, parents: [STEP1A, STEP1B]) do |input, ctx|
  ref = STICKY_CHILD_WORKFLOW.run_no_wait(
    options: Hatchet::TriggerWorkflowOptions.new(sticky: true)
  )

  ref.result

  { "worker" => ctx.worker.id }
end

STICKY_CHILD_WORKFLOW.task(:child) do |input, ctx|
  { "worker" => ctx.worker.id }
end
```

---

# Worker Affinity Assignment (Beta)

> **Info:** This feature is currently in beta and may be subject to change.

It is often desirable to assign workflows to specific workers based on certain criteria, such as worker capabilities, resource availability, or location. Worker affinity allows you to specify that a workflow should be assigned to a specific worker based on worker label state.

Labels can be set dynamically on workers to reflect their current state, such as a specific model loaded into memory or specific disk requirements. Specific tasks can then specify desired label state to ensure that workflows are assigned to workers that meet specific criteria.

If no worker meets the specified criteria, the task run will remain in a pending state until a suitable worker becomes available or the task is cancelled. (See [Scheduling Timeouts](./timeouts.mdx#task-level-timeouts))

## Specifying Worker Labels

Labels can be set on workers when they are registered with Hatchet. Labels are key-value pairs that can be used to specify worker capabilities, resource availability, or other criteria that can be used to match workflows to workers. Values can be strings or numbers, and multiple labels can be set on a worker.
#### Ruby ```python worker = hatchet.worker( "affinity-worker", slots=10, labels={ "model": "fancy-ai-model-v2", "memory": 512, }, workflows=[affinity_worker_workflow], ) worker.start() ``` #### Tab 2 ```typescript const workflow = hatchet.workflow({ name: 'affinity-workflow', description: 'test', }); workflow.task({ name: 'step1', fn: async (_, ctx) => { const results: Promise[] = []; // eslint-disable-next-line no-plusplus for (let i = 0; i < 50; i++) { const result = await ctx.spawnWorkflow(childWorkflow.id, {}); results.push(result.output); } console.log('Spawned 50 child workflows'); console.log('Results:', await Promise.all(results)); return { step1: 'step1 results!' }; }, }); ``` #### Tab 3 ```go w, err := worker.NewWorker( worker.WithClient( c, ), worker.WithLabels(map[string]interface{}{ "model": "fancy-ai-model-v2", "memory": 512, }), ) ``` #### Tab 4 ```ruby def main worker = HATCHET.worker( "affinity-worker", slots: 10, labels: { "model" => "fancy-ai-model-v2", "memory" => 512 }, workflows: [AFFINITY_WORKER_WORKFLOW] ) worker.start end ``` ## Specifying Step Desired Labels You can specify desired worker label state for specific tasks in a workflow by setting the `desired_worker_labels` property on the task definition. This property is an object where the keys are the label keys and the values are objects with the following properties: - `value`: The desired value of the label - `comparator` (default: `EQUAL`): The comparison operator to use when matching the label value. - `EQUAL`: The label value must be equal to the desired value - `NOT_EQUAL`: The label value must not be equal to the desired value - `GREATER_THAN`: The label value must be greater than the desired value - `GREATER_THAN_OR_EQUAL`: The label value must be greater than or equal to the desired value - `LESS_THAN`: The label value must be less than the desired value - `LESS_THAN_OR_EQUAL`: The label value must be less than or equal to the desired value - `required` (default: `true`): Whether the label is required for the task to run. If `true`, the task will remain in a pending state until a worker with the desired label state becomes available. If `false`, the worker will be prioritized based on the sum of the highest matching weights. - `weight` (optional, default: `100`): The weight of the label. Higher weights are prioritized over lower weights when selecting a worker for the task. If multiple workers have the same highest weight, the worker with the highest sum of weights will be selected. Ignored if `required` is `true`. #### Ruby ```python affinity_worker_workflow = hatchet.workflow(name="AffinityWorkflow") @affinity_worker_workflow.task( desired_worker_labels={ "model": DesiredWorkerLabel(value="fancy-ai-model-v2", weight=10), "memory": DesiredWorkerLabel( value=256, required=True, comparator=WorkerLabelComparator.LESS_THAN, ), }, ) ``` #### Tab 2 ```typescript const workflow = hatchet.workflow({ name: 'affinity-workflow', description: 'test', }); workflow.task({ name: 'step1', fn: async (_, ctx) => { const results: Promise[] = []; // eslint-disable-next-line no-plusplus for (let i = 0; i < 50; i++) { const result = await ctx.spawnWorkflow(childWorkflow.id, {}); results.push(result.output); } console.log('Spawned 50 child workflows'); console.log('Results:', await Promise.all(results)); return { step1: 'step1 results!' 
}; }, }); ``` #### Tab 3 ```go err = w.RegisterWorkflow( &worker.WorkflowJob{ On: worker.Events("user:create:affinity"), Name: "affinity", Description: "affinity", Steps: []*worker.WorkflowStep{ worker.Fn(func(ctx worker.HatchetContext) (result *taskOneOutput, err error) { return &taskOneOutput{ Message: ctx.Worker().ID(), }, nil }). SetName("task-one"). SetDesiredLabels(map[string]*types.DesiredWorkerLabel{ "model": { Value: "fancy-ai-model-v2", Weight: 10, }, "memory": { Value: 512, Required: true, Comparator: types.ComparatorPtr(types.WorkerLabelComparator_GREATER_THAN), }, }), }, }, ) ``` #### Tab 4 ```ruby AFFINITY_WORKER_WORKFLOW = HATCHET.workflow(name: "AffinityWorkflow") ``` > **Warning:** Use extra care when using worker affinity with [sticky assignment `HARD` > strategy](./sticky-assignment.mdx). In this case, it is recommended to set > desired labels on the first task of the workflow to ensure that the workflow > is assigned to a worker that meets the desired criteria and remains on that > worker for the duration of the workflow. ### Dynamic Worker Labels Labels can also be set dynamically on workers using the `upsertLabels` method. This can be useful when worker state changes over time, such as when a new model is loaded into memory or when a worker's resource availability changes. #### Ruby ```python async def step(input: EmptyModel, ctx: Context) -> dict[str, str | None]: if ctx.worker.labels().get("model") != "fancy-ai-model-v2": ctx.worker.upsert_labels({"model": "unset"}) # DO WORK TO EVICT OLD MODEL / LOAD NEW MODEL ctx.worker.upsert_labels({"model": "fancy-ai-model-v2"}) return {"worker": ctx.worker.id()} ``` #### Tab 2 ```typescript const childWorkflow = hatchet.workflow({ name: 'child-affinity-workflow', description: 'test', }); childWorkflow.task({ name: 'child-step1', desiredWorkerLabels: { model: { value: 'xyz', required: true, }, }, fn: async (ctx) => { return { childStep1: 'childStep1 results!' }; }, }); ``` #### Tab 3 ```go err = w.RegisterWorkflow( &worker.WorkflowJob{ On: worker.Events("user:create:affinity"), Name: "affinity", Description: "affinity", Steps: []*worker.WorkflowStep{ worker.Fn(func(ctx worker.HatchetContext) (result *taskOneOutput, err error) { model := ctx.Worker().GetLabels()["model"] if model != "fancy-vision-model" { ctx.Worker().UpsertLabels(map[string]interface{}{ "model": nil, }) // Do something to load the model evictModel(); loadNewModel("fancy-vision-model"); ctx.Worker().UpsertLabels(map[string]interface{}{ "model": "fancy-vision-model", }) } return &taskOneOutput{ Message: ctx.Worker().ID(), }, nil }). SetName("task-one"). SetDesiredLabels(map[string]*types.DesiredWorkerLabel{ "model": { Value: "fancy-vision-model", Weight: 10, }, "memory": { Value: 512, Required: true, Comparator: types.WorkerLabelComparator_GREATER_THAN, }, }), }, }, ) ``` #### Tab 4 ```ruby AFFINITY_WORKER_WORKFLOW.task( :step, desired_worker_labels: { "model" => Hatchet::DesiredWorkerLabel.new(value: "fancy-ai-model-v2", weight: 10), "memory" => Hatchet::DesiredWorkerLabel.new( value: 256, required: true, comparator: :less_than ) } ) do |input, ctx| if ctx.worker.labels["model"] != "fancy-ai-model-v2" ctx.worker.upsert_labels("model" => "unset") # DO WORK TO EVICT OLD MODEL / LOAD NEW MODEL ctx.worker.upsert_labels("model" => "fancy-ai-model-v2") end { "worker" => ctx.worker.id } end ``` --- # Manual Slot Release The Hatchet execution model sets a number of available slots for running tasks in a workflow. 
When a task is running, it occupies a slot, and if a worker has no available slots, it will not be able to run any more tasks concurrently.

In some cases, you may have a task in your workflow that is resource-intensive and requires exclusive access to a shared resource, such as a database connection or a GPU compute instance. To ensure that other tasks in the workflow can run concurrently, you can manually release the slot after the resource-intensive work has completed but while the task still has non-resource-intensive work to do (e.g. upload or cleanup).

> **Warning:** This is an advanced feature and should be used with caution. Manually
> releasing the slot can have unintended side effects on system performance and
> concurrency. For example, if the worker running the task dies, the task will
> not be reassigned and will remain in a running state until manually
> terminated.

## Using Manual Slot Release

You can manually release a slot from within a running task in your workflow using the Hatchet context method `release_slot`:

#### Python

```python
slot_release_workflow = hatchet.workflow(name="SlotReleaseWorkflow")


@slot_release_workflow.task()
def step1(input: EmptyModel, ctx: Context) -> dict[str, str]:
    print("RESOURCE INTENSIVE PROCESS")
    time.sleep(10)
    # πŸ‘€ Release the slot after the resource-intensive process, so that other steps can run
    ctx.release_slot()
    print("NON RESOURCE INTENSIVE PROCESS")
    return {"status": "success"}
```

#### Go

```go
func StepOne(ctx worker.HatchetContext) (result *taskOneOutput, err error) {
    fmt.Println("RESOURCE INTENSIVE PROCESS")
    time.Sleep(10 * time.Second)
    // Release the slot after the resource-intensive process, so that other tasks can run
    ctx.ReleaseSlot()
    fmt.Println("NON RESOURCE INTENSIVE PROCESS")
    return &taskOneOutput{
        Message: "task1 results",
    }, nil
},
```

#### Ruby

```ruby
SLOT_RELEASE_WORKFLOW = HATCHET.workflow(name: "SlotReleaseWorkflow")

SLOT_RELEASE_WORKFLOW.task(:step1) do |input, ctx|
  puts "RESOURCE INTENSIVE PROCESS"
  sleep 10
  # Release the slot after the resource-intensive process, so that other steps can run
  ctx.release_slot
  puts "NON RESOURCE INTENSIVE PROCESS"
  { "status" => "success" }
end
```

In the above examples, the `release_slot()` method is called after the resource-intensive process has completed. This allows other tasks in the workflow to start executing while the current task continues with non-resource-intensive work.

> **Info:** Manually releasing the slot does not terminate the current task. The task will
> continue executing until it completes or encounters an error.

## Use Cases

Some common use cases for Manual Slot Release include:

- Performing data processing or analysis that requires significant CPU, GPU, or memory resources
- Acquiring locks or semaphores to access shared resources
- Executing long-running tasks that don't need to block other tasks after some initial work is done

By utilizing Manual Slot Release, you can optimize the concurrency and resource utilization of your workflows, allowing multiple tasks to run in parallel when possible.

---

# Logging

Hatchet comes with a built-in logging view where you can push logs from your workflows. This is useful for debugging and monitoring your workflows.

#### Python

You can use either Python's built-in `logging` package, or the `context.log` method for more control over the logs that are sent.

## Using the built-in `logging` package

You can pass a custom logger to the `Hatchet` class when initializing it.
For example:

```python
import logging

from hatchet_sdk import ClientConfig, Hatchet

logging.basicConfig(level=logging.INFO)

root_logger = logging.getLogger()

hatchet = Hatchet(
    debug=True,
    config=ClientConfig(
        logger=root_logger,
    ),
)
```

It's recommended that you pass the root logger to the `Hatchet` class, as this will ensure that all logs are captured by the Hatchet logger.

If you have workflows defined in multiple files, they should be children of the root logger. For example, with the following file structure:

```
workflows/
  workflow.py
client.py
worker.py
```

You should pass the root logger to the `Hatchet` class in `client.py`:

```python
import logging

from hatchet_sdk import ClientConfig, Hatchet

logging.basicConfig(level=logging.INFO)

root_logger = logging.getLogger()

hatchet = Hatchet(
    debug=True,
    config=ClientConfig(
        logger=root_logger,
    ),
)
```

And then in `workflows/workflow.py`, you should create a child logger:

```python
import logging
import time

from examples.logger.client import hatchet
from hatchet_sdk import Context, EmptyModel

logger = logging.getLogger(__name__)

logging_workflow = hatchet.workflow(
    name="LoggingWorkflow",
)


@logging_workflow.task()
def root_logger(input: EmptyModel, ctx: Context) -> dict[str, str]:
    for i in range(12):
        logger.info(f"executed step1 - {i}")
        logger.info({"step1": "step1"})
        time.sleep(0.1)

    return {"status": "success"}
```

## Using the `context.log` method

You can also use the `context.log` method to log messages from your workflows. This method is available on the `Context` object that is passed to each task in your workflow.

For example:

```python
@logging_workflow.task()
def context_logger(input: EmptyModel, ctx: Context) -> dict[str, str]:
    for i in range(12):
        ctx.log(f"executed step1 - {i}")
        ctx.log({"step1": "step1"})
        time.sleep(0.1)

    return {"status": "success"}
```

Each task is currently limited to 1000 log lines.

#### Typescript

In TypeScript, there are two options for logging from your tasks. The first is to use the `ctx.log()` method (from the `Context`) to send logs:

```typescript
const workflow = hatchet.workflow({
  name: 'logger-example',
  description: 'test',
  on: {
    event: 'user:create',
  },
});

workflow.task({
  name: 'logger-step1',
  fn: async (_, ctx) => {
    // log in a for loop
    // eslint-disable-next-line no-plusplus
    for (let i = 0; i < 10; i++) {
      ctx.logger.info(`log message ${i}`);
      await sleep(200);
    }

    return { step1: 'completed step run' };
  },
});
```

This has the benefit of being easy to use out of the box (no setup required!), but it's limited in its flexibility and how pluggable it is with your existing logging setup.
Hatchet also allows you to "bring your own" logger when you define a workflow:

```typescript
const logger = pino();

class PinoLogger implements Logger {
  logLevel: LogLevel;

  context: string;

  constructor(context: string, logLevel: LogLevel = 'DEBUG') {
    this.logLevel = logLevel;
    this.context = context;
  }

  debug(message: string, extra?: JsonObject): void {
    logger.debug(extra, message);
  }

  info(message: string, extra?: JsonObject): void {
    logger.info(extra, message);
  }

  green(message: string, extra?: JsonObject): void {
    logger.info(extra, `%c${message}`);
  }

  warn(message: string, error?: Error, extra?: JsonObject): void {
    logger.warn(extra, `${message} ${error}`);
  }

  error(message: string, error?: Error, extra?: JsonObject): void {
    logger.error(extra, `${message} ${error}`);
  }

  // optional util method
  util(key: string, message: string, extra?: JsonObject): void {
    // for example you may want to expose a trace method
    if (key === 'trace') {
      logger.info(extra, 'trace');
    }
  }
}

const hatchet = Hatchet.init({
  log_level: 'DEBUG',
  logger: (ctx, level) => new PinoLogger(ctx, level),
});
```

In this example, we create a Pino logger that implements Hatchet's `Logger` interface and pass it to the Hatchet client constructor. We can then use that logger in our steps:

```typescript
const workflow = hatchet.workflow({
  name: 'logger-example',
  description: 'test',
  on: {
    event: 'user:create',
  },
});

workflow.task({
  name: 'logger-step1',
  fn: async (_, ctx) => {
    // log in a for loop
    // eslint-disable-next-line no-plusplus
    for (let i = 0; i < 10; i++) {
      ctx.logger.info(`log message ${i}`);
      await sleep(200);
    }

    return { step1: 'completed step run' };
  },
});
```

#### Ruby

```ruby
require "hatchet-sdk"
require "logger"

logger = Logger.new($stdout)
logger.level = Logger::INFO

HATCHET = Hatchet::Client.new(debug: true) unless defined?(HATCHET)

LOGGING_WORKFLOW = HATCHET.workflow(name: "LoggingWorkflow")

LOGGING_WORKFLOW.task(:root_logger) do |input, ctx|
  12.times do |i|
    logger.info("executed step1 - #{i}")
    logger.info({ "step1" => "step1" }.inspect)
    sleep 0.1
  end

  { "status" => "success" }
end
```

```ruby
LOGGING_WORKFLOW.task(:context_logger) do |input, ctx|
  12.times do |i|
    ctx.log("executed step1 - #{i}")
    ctx.log({ "step1" => "step1" }.inspect)
    sleep 0.1
  end

  { "status" => "success" }
end
```

---

# OpenTelemetry

OpenTelemetry support is currently only available for the Python SDK.

Hatchet supports exporting traces from your tasks to an [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) to improve visibility into your Hatchet tasks.

## Usage

### Setup

Hatchet's SDK provides an instrumentor that auto-instruments Hatchet code if you opt in. Setup is straightforward:

First, install the `otel` extra with (e.g.) `pip install hatchet-sdk[otel]`. Then, import the instrumentor:

```python
from hatchet_sdk.opentelemetry.instrumentor import HatchetInstrumentor

HatchetInstrumentor(
    tracer_provider=trace_provider,
).instrument()
```

You bring your own trace provider and plug it into the `HatchetInstrumentor`, call `instrument`, and that's it! Check out the [OpenTelemetry documentation](https://opentelemetry.io/docs/languages/python/instrumentation/) for more information on how to set up a trace provider.

### Spans

By default, Hatchet creates spans at the following points in the lifecycle of a task run:

1. When a trigger is run on the client side, e.g. `run()` or `push()` is called.
2. When a worker handles a task event, such as starting to run the task or cancelling the task.

In addition, you'll get a handful of attributes set (prefixed by `hatchet.`) on the task run events, such as the task name and the worker ID, as well as success/failure states, and so on.

Some other important notes:

1. The instrumentor will automatically propagate the trace context between task runs, so if you spawn a task from another task, the child will correctly show up as a child of its parent in the trace waterfall.
2. You can exclude specific attributes from being attached to spans by providing the `otel` configuration option on the `ClientConfig` and passing a list of `excluded_attributes`, which come from [this list](https://github.com/hatchet-dev/hatchet/blob/main/sdks/python/hatchet_sdk/utils/opentelemetry.py).

## Integrations

Hatchet's instrumentor is easy to integrate with a number of third-party tracing tools.

### Langfuse

For example, you might be interested in using [Langfuse](https://langfuse.com/) for tracing an LLM-intensive application. Note that this example uses Langfuse's [V3 (OTel-based) SDK](https://langfuse.com/docs/sdk/python/sdk-v3). See their docs for more information.

First, configure the Langfuse client [as described by their documentation](https://langfuse.com/docs/opentelemetry/example-python-sdk):

```python
LANGFUSE_AUTH = base64.b64encode(
    f"{os.getenv('LANGFUSE_PUBLIC_KEY')}:{os.getenv('LANGFUSE_SECRET_KEY')}".encode()
).decode()

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = (
    os.getenv("LANGFUSE_HOST", "https://us.cloud.langfuse.com") + "/api/public/otel"
)
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"

## Note: Langfuse sets the global tracer provider, so you don't need to worry about it
lf = Langfuse(
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    host=os.getenv("LANGFUSE_HOST", "https://app.langfuse.com"),
)
```

Langfuse will set the global tracer provider, so you don't have to do it manually.

Next, create an OpenAI client [using Langfuse's OpenAI wrapper `langfuse.openai` as a drop-in replacement for the default OpenAI](https://langfuse.com/docs/integrations/openai/python/get-started) client:

```python
openai = AsyncOpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
)
```

And that's it! Now you're ready to instrument your Hatchet workers with Langfuse. For example, create a task like this:

```python
HatchetInstrumentor(
    ## Langfuse sets the global tracer provider
    tracer_provider=get_tracer_provider(),
).instrument()

hatchet = Hatchet()


@hatchet.task()
async def langfuse_task(input: EmptyModel, ctx: Context) -> dict[str, str | None]:
    ## Usage, cost, etc. of this call will be sent to Langfuse
    generation = await openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Where does Anna Karenina take place?"},
        ],
    )

    location = generation.choices[0].message.content

    return {
        "location": location,
    }
```

And finally, run the task to view the Langfuse traces (cost, usage, etc.) interspersed with Hatchet's traces, in addition to any other traces you may have:

```python
tracer = get_client()


async def main() -> None:
    # Traces will be sent to Langfuse
    # Use `_otel_tracer` to access the OpenTelemetry tracer if you need
    # to e.g. log statuses or attributes manually.
    with tracer._otel_tracer.start_as_current_span(name="trigger") as span:
        result = await langfuse_task.aio_run()
        location = result.get("location")

        if not location:
            span.set_status(StatusCode.ERROR)
            return

        span.set_attribute("location", location)
```

When you run this task, you'll see a trace like this in Langfuse!

---

# Prometheus Metrics

> **Info:** Only available in the Growth tier and above on Hatchet Cloud

Hatchet exports Prometheus metrics for your tenant, which can be scraped by services like Grafana and Datadog.

## Tenant Metrics

> **Warning:** Only works with v1 tenants

Metrics for individual tenants are available in Prometheus Text Format via a REST API endpoint.

### Endpoint

```
GET /api/v1/tenants/{tenantId}/prometheus-metrics
```

### Authentication

The endpoint requires Bearer token authentication using a valid API token:

```
Authorization: Bearer <token>
```

### Response Format

The response is returned in standard Prometheus Text Format, including:

- HELP comments describing each metric
- TYPE declarations (counter, gauge, etc.)
- Metric samples with labels and values

### Example Usage

```bash
curl -H "Authorization: Bearer your-api-token-here" \
  https://cloud.onhatchet.run/api/v1/tenants/707d0855-80ab-4e1f-a156-f1c4546cbf52/prometheus-metrics
```

---

# Cancellation in Hatchet Tasks

Hatchet provides a mechanism for canceling task executions gracefully, allowing you to signal to running tasks that they should stop running. Cancellation can be triggered on graceful termination of a worker or automatically through concurrency control strategies like [`CANCEL_IN_PROGRESS`](./concurrency.mdx#cancel_in_progress), which cancels currently running task instances to free up slots for new instances when the concurrency limit is reached.

When a task is canceled, Hatchet sends a cancellation signal to the task. The task can then check for the cancellation signal and take appropriate action, such as cleaning up resources, aborting network requests, or gracefully terminating its execution.

## Cancellation Mechanisms

#### Python

```python
@cancellation_workflow.task()
def check_flag(input: EmptyModel, ctx: Context) -> dict[str, str]:
    for i in range(3):
        time.sleep(1)

        # Note: Checking the status of the exit flag is mostly useful for cancelling
        # sync tasks without needing to forcibly kill the thread they're running on.
if ctx.exit_flag: print("Task has been cancelled") raise ValueError("Task has been cancelled") return {"error": "Task should have been cancelled"} ``` ```python @cancellation_workflow.task() async def self_cancel(input: EmptyModel, ctx: Context) -> dict[str, str]: await asyncio.sleep(2) ## Cancel the task await ctx.aio_cancel() await asyncio.sleep(10) return {"error": "Task should have been cancelled"} ``` #### Typescript ```typescript export const cancellation = hatchet.task({ name: 'cancellation', fn: async (_, ctx) => { await sleep(10 * 1000); if (ctx.cancelled) { throw new Error('Task was cancelled'); } return { Completed: true, }; }, }); ``` ```typescript export const abortSignal = hatchet.task({ name: 'abort-signal', fn: async (_, { abortController }) => { try { const response = await axios.get('https://api.example.com/data', { signal: abortController.signal, }); // Handle the response } catch (error) { if (axios.isCancel(error)) { // Request was canceled console.log('Request canceled'); } else { // Handle other errors } } }, }); ``` #### Go ```go // Add a long-running task that can be cancelled _ = workflow.NewTask("long-running-task", func(ctx hatchet.Context, input CancellationInput) (CancellationOutput, error) { log.Printf("Starting long-running task with message: %s", input.Message) // Simulate long-running work with cancellation checking for i := 0; i < 10; i++ { select { case <-ctx.Done(): log.Printf("Task cancelled after %d steps", i) return CancellationOutput{ Status: "cancelled", Completed: false, }, nil default: log.Printf("Working... step %d/10", i+1) time.Sleep(1 * time.Second) } } log.Println("Task completed successfully") return CancellationOutput{ Status: "completed", Completed: true, }, nil }, hatchet.WithExecutionTimeout(30*time.Second)) ``` #### Ruby ```ruby CANCELLATION_WORKFLOW.task(:check_flag) do |input, ctx| 3.times do sleep 1 # Note: Checking the status of the exit flag is mostly useful for cancelling # sync tasks without needing to forcibly kill the thread they're running on. if ctx.cancelled? puts "Task has been cancelled" raise "Task has been cancelled" end end { "error" => "Task should have been cancelled" } end ``` ```ruby CANCELLATION_WORKFLOW.task(:self_cancel) do |input, ctx| sleep 2 ## Cancel the task ctx.cancel sleep 10 { "error" => "Task should have been cancelled" } end ``` ## Cancellation Best Practices When working with cancellation in Hatchet tasks, consider the following best practices: 1. **Graceful Termination**: When a task receives a cancellation signal, aim to terminate its execution gracefully. Clean up any resources, abort pending operations, and perform any necessary cleanup tasks before returning from the task function. 2. **Cancellation Checks**: Regularly check for cancellation signals within long-running tasks or loops. This allows the task to respond to cancellation in a timely manner and avoid unnecessary processing. 3. **Asynchronous Operations**: If a task performs asynchronous operations, such as network requests or file I/O, consider passing the cancellation signal to those operations. Many libraries and APIs support cancellation through the `AbortSignal` interface. 4. **Error Handling**: Handle cancellation errors appropriately. Distinguish between cancellation errors and other types of errors to provide meaningful error messages and take appropriate actions. 5. **Cancellation Propagation**: If a task invokes other functions or libraries, consider propagating the cancellation signal to those dependencies. 
This ensures that cancellation is handled consistently throughout the task. ## Additional Features In addition to the methods of cancellation listed here, Hatchet also supports [bulk cancellation](./bulk-retries-and-cancellations.mdx), which allows you to cancel many tasks in bulk using either their IDs or a set of filters, which is often the easiest way to cancel many things at once. ## Conclusion Cancellation is a powerful feature in Hatchet that allows you to gracefully stop task executions when needed. Remember to follow best practices when implementing cancellation in your tasks, such as graceful termination, regular cancellation checks, handling asynchronous operations, proper error handling, and cancellation propagation. By incorporating cancellation into your Hatchet tasks and workflows, you can build more resilient and responsive systems that can adapt to changing circumstances and user needs. --- # Streaming in Hatchet Hatchet tasks can stream data back to a consumer in real-time. This has a number of valuable uses, such as streaming the results of an LLM call back from a Hatchet worker to a frontend or sending progress updates as a task chugs along. ## Publishing Stream Events You can stream data out of a task run by using the `put_stream` (or equivalent) method on the `Context`. #### Python ```python anna_karenina = """ Happy families are all alike; every unhappy family is unhappy in its own way. Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him. """ def create_chunks(content: str, n: int) -> Generator[str, None, None]: for i in range(0, len(content), n): yield content[i : i + n] chunks = list(create_chunks(anna_karenina, 10)) @hatchet.task() async def stream_task(input: EmptyModel, ctx: Context) -> None: # πŸ‘€ Sleeping to avoid race conditions await asyncio.sleep(2) for chunk in chunks: await ctx.aio_put_stream(chunk) await asyncio.sleep(0.20) ``` #### Typescript ```typescript const annaKarenina = ` Happy families are all alike; every unhappy family is unhappy in its own way. Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him. `; function* createChunks(content: string, n: number): Generator { for (let i = 0; i < content.length; i += n) { yield content.slice(i, i + n); } } export const streamingTask = hatchet.task({ name: 'stream-example', fn: async (_, ctx) => { await sleep(2000); for (const chunk of createChunks(annaKarenina, 10)) { ctx.putStream(chunk); await sleep(200); } }, }); ``` #### Go ```go const annaKarenina = ` Happy families are all alike; every unhappy family is unhappy in its own way. Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him. 
` func createChunks(content string, n int) []string { var chunks []string for i := 0; i < len(content); i += n { end := i + n if end > len(content) { end = len(content) } chunks = append(chunks, content[i:end]) } return chunks } func StreamTask(ctx hatchet.Context, input StreamTaskInput) (*StreamTaskOutput, error) { time.Sleep(2 * time.Second) chunks := createChunks(annaKarenina, 10) for _, chunk := range chunks { ctx.PutStream(chunk) time.Sleep(200 * time.Millisecond) } return &StreamTaskOutput{ Message: "Streaming completed", }, nil } ``` #### Ruby ```ruby ANNA_KARENINA = <<~TEXT Happy families are all alike; every unhappy family is unhappy in its own way. Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him. TEXT STREAM_CHUNKS = ANNA_KARENINA.scan(/.{1,10}/) STREAM_TASK = HATCHET.task(name: "stream_task") do |input, ctx| # Sleeping to avoid race conditions sleep 2 STREAM_CHUNKS.each do |chunk| ctx.put_stream(chunk) sleep 0.20 end end ``` This task will stream small chunks of content through Hatchet, which can then be consumed elsewhere. Here we use some text as an example, but this is intended to replicate streaming the results of an LLM call back to a consumer. ## Consuming Streams You can easily consume stream events by using the stream method on the workflow run ref that the various [fire-and-forget](./run-no-wait.mdx) methods return. #### Python ```python ref = await stream_task.aio_run_no_wait() async for chunk in hatchet.runs.subscribe_to_stream(ref.workflow_run_id): print(chunk, flush=True, end="") ``` #### Typescript ```typescript const ref = await streamingTask.runNoWait({}); const id = await ref.getWorkflowRunId(); for await (const content of hatchet.runs.subscribeToStream(id)) { process.stdout.write(content); } ``` #### Go ```go func main() { client, err := hatchet.NewClient() if err != nil { log.Fatalf("Failed to create Hatchet client: %v", err) } ctx := context.Background() streamingWorkflow := shared.StreamingWorkflow(client) workflowRun, err := streamingWorkflow.RunNoWait(ctx, shared.StreamTaskInput{}) if err != nil { log.Fatalf("Failed to run workflow: %v", err) } id := workflowRun.RunId stream := client.Runs().SubscribeToStream(ctx, id) for content := range stream { fmt.Print(content) } fmt.Println("\nStreaming completed!") } ``` #### Ruby ```ruby ref = STREAM_TASK.run_no_wait HATCHET.runs.subscribe_to_stream(ref.workflow_run_id) do |chunk| print chunk end ``` In the examples above, this will result in the famous text below being gradually printed to the console, bit by bit. ``` Happy families are all alike; every unhappy family is unhappy in its own way. Everything was in confusion in the Oblonskys' house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him. ``` You must begin consuming the stream before any events are published. Any events published before a consumer is initialized will be dropped. In practice, this will not be an issue in most cases, but adding a short sleep before beginning streaming results back can help. 
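If you want the complete streamed payload rather than printing chunks as they arrive, you can collect the chunks while consuming the stream. Below is a minimal Python sketch (written to run inside an async function) that reuses the `stream_task` and `hatchet` client names from the examples above; because the example task sleeps briefly before publishing, subscribing immediately after triggering avoids dropping the first chunks:

```python
ref = await stream_task.aio_run_no_wait()

# Subscribe right away so no chunks are missed, then assemble the full text
chunks: list[str] = []
async for chunk in hatchet.runs.subscribe_to_stream(ref.workflow_run_id):
    chunks.append(chunk)

full_text = "".join(chunks)
print(full_text)
```
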
## Streaming to a Web Application

It's common to want to stream events out of a Hatchet task and back to the frontend of your application, for consumption by an end user. As mentioned before, some clear cases where this is useful would be for streaming back progress of some long-running task for a customer to monitor, or streaming back the results of an LLM call.

In both cases, we recommend using your application's backend as a proxy for the stream, where you would subscribe to the stream of events from Hatchet, and then stream events through to the frontend as they're received by the backend.

#### Python

For example, with FastAPI, you'd do the following:

```python
hatchet = Hatchet()

app = FastAPI()


@app.get("/stream")
async def stream() -> StreamingResponse:
    ref = await stream_task.aio_run_no_wait()

    return StreamingResponse(
        hatchet.runs.subscribe_to_stream(ref.workflow_run_id), media_type="text/plain"
    )
```

#### Typescript

For example, with NextJS backend-as-frontend, you'd do the following:

```typescript
export async function GET() {
  try {
    const ref = await streamingTask.runNoWait({});
    const workflowRunId = await ref.getWorkflowRunId();

    const stream = Readable.from(hatchet.runs.subscribeToStream(workflowRunId));

    // @ts-ignore
    return new Response(Readable.toWeb(stream), {
      headers: {
        'Content-Type': 'text/plain',
        'Cache-Control': 'no-cache',
        Connection: 'keep-alive',
      },
    });
  } catch (error) {
    return new Response('Internal Server Error', { status: 500 });
  }
}
```

#### Go

For example, with Go's built-in HTTP server, you'd do the following:

```go
func main() {
  client, err := hatchet.NewClient()
  if err != nil {
    log.Fatalf("Failed to create Hatchet client: %v", err)
  }

  streamingWorkflow := shared.StreamingWorkflow(client)

  http.HandleFunc("/stream", func(w http.ResponseWriter, r *http.Request) {
    ctx := context.Background()

    w.Header().Set("Content-Type", "text/plain")
    w.Header().Set("Cache-Control", "no-cache")
    w.Header().Set("Connection", "keep-alive")

    workflowRun, err := streamingWorkflow.RunNoWait(ctx, shared.StreamTaskInput{})
    if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
    }

    stream := client.Runs().SubscribeToStream(ctx, workflowRun.RunId)

    flusher, _ := w.(http.Flusher)

    for content := range stream {
      fmt.Fprint(w, content)

      if flusher != nil {
        flusher.Flush()
      }
    }
  })

  server := &http.Server{
    Addr:         ":8000",
    ReadTimeout:  5 * time.Second,
    WriteTimeout: 10 * time.Second,
  }

  if err := server.ListenAndServe(); err != nil {
    log.Println("Failed to start server:", err)
  }
}
```

Then, assuming you run the server on port `8000`, running `curl -N http://localhost:8000/stream` would result in the text streaming back to your console from Hatchet through your backend proxy.

---

## SDK Improvements in V1

The Hatchet SDKs have seen considerable improvements with the V1 release. The examples in our documentation now use the V1 SDKs, so following individual examples will help you get familiar with the new SDKs and understand how to migrate from V0.

#### Python

### Highlights

The Python SDK has a number of notable highlights to showcase for V1. Many of them have been highlighted elsewhere, such as [in the migration guide](./migration-guide-python.mdx), on the [Pydantic page](./pydantic.mdx), and in various examples. Here, we'll list out each of them, along with their motivations and benefits.

First and foremost: Many of the changes in the V1 Python SDK are motivated by improved support for type checking and validation across large codebases and in production use-cases.
With that in mind, the main highlights in the V1 Python SDK are: 1. Workflows are now declared with `hatchet.workflow`, which returns a `Workflow` object, or `hatchet.task` (for simple cases) which returns a `Standalone` object. Workflows then have their corresponding tasks registered with `Workflow.task`. The `Workflow` object (and the `Standalone` object) can be reused easily across the codebase, and has wrapper methods like `run` and `schedule` that make it easy to run workflows. In these wrapper methods, inputs to the workflow are type checked, and you no longer need to specify the name of the workflow to run as a magic string. 2. Tasks have their inputs type checked, and inputs are now Pydantic models. The `input` field is either the model you provide to the workflow as the `input_validator`, or is an `EmptyModel`, which is a helper Pydantic model Hatchet provides and uses as a default. 3. In the new SDK, we define the `parents` of a task as a list of `Task` objects as opposed to as a list of strings. This also allows us to use `ctx.task_output(my_task)` to access the output of the `my_task` task in the a downstream task, while allowing that output to be type checked correctly. 4. In the new SDK, inputs are injected directly into the task as the first positional argument, so the signature of a task now will be `Callable[[YourWorkflowInputType, Context]]`. This replaces the old method of accessing workflow inputs via `context.workflow_input()`. #### Other Breaking Changes There have been a number of other breaking changes throughout the SDK in V1. Typing improvements: 1. External-facing protobuf objects, such as `StickyStrategy` and `ConcurrencyLimitStrategy`, have been replaced by native Python enums to make working with them easier. 2. All external-facing types that are used for triggering workflows, scheduling workflows, etc. are now Pydantic objects, as opposed to being `TypedDict`s. 3. The return type of each `Task` is restricted to a `JSONSerializableMapping` or a Pydantic model, to better align with what the Hatchet Engine expects. 4. The `ClientConfig` now uses Pydantic Settings, and we've removed the static methods on the Client for `from_environment` and `from_config` in favor of passing configuration in correctly. 5. The REST API wrappers, which previously were under `hatchet.rest`, have been completely overhauled. Naming changes: 1. We no longer have nested `aio` clients for async methods. Instead, async methods throughout the entire SDK are prefixed by `aio_`, similar to [Langchain's use of the `a` prefix](https://python.langchain.com/docs/concepts/streaming/#stream-and-astream) to indicate async. For example, to run a workflow, you may now either use `workflow.run()` or `workflow.aio_run()`. 2. All functions on Hatchet clients are now _verbs_. For instance, if something was named `hatchet.nounVerb` before, it now will be something more like `hatchet.verb_noun`. For example, `hatchet.runs.get_result` gets the result of a workflow run. 3. `timeout`, the execution timeout of a task, has been renamed to `execution_timeout` for clarity. Removals: 1. `sync_to_async` has been removed. We recommend reading [our asyncio documentation](./asyncio.mdx) for our recommendations on handling blocking work in otherwise async tasks. 2. The `AdminClient` has been removed, and refactored into individual clients. For example, if you absolutely need to create a workflow run manually without using `Workflow.run` or `Standalone.run`, you can use `hatchet.runs.create`. 
This replaces the old `hatchet.admin.run_workflow`. Other miscellaneous changes: 1. As shown in the Pydantic example above, there is no longer a `spawn_workflow(s)` method on the `Context`. `run` is now the preferred method for spawning workflows, which will automatically propagate the parent's metadata to the child workflow. 2. All times and durations, such as `execution_timeout` and `schedule_timeout`, now allow `datetime.timedelta` objects instead of only allowing strings (e.g. `"10s"` can be `timedelta(seconds=10)`). #### Other New features There are a handful of other new features that will make interfacing with the SDK easier, which are listed below. 1. Concurrency keys using the `input` to a workflow are now checked for validity at runtime. If the workflow's `input_validator` does not contain a field that's used in a key, Hatchet will reject the workflow when it's created. For example, if the key is `input.user_id`, the `input_validator` Pydantic model _must_ contain a `user_id` field. 2. There is now an `on_success_task` on the `Workflow` object, which works just like an on-failure task, but it runs after all upstream tasks in the workflow have _succeeded_. 3. We've exposed feature clients on the Hatchet client to make it easier to interact with and control your environment. For example, you can write scripts to find all runs that match certain criteria, and replay or cancel them. First, fetch some run ids: ```python workflow_runs = hatchet.runs.list(workflow_ids=[workflow.metadata.id]) ``` Then, use those ids to bulk cancel: ```python workflow_run_ids = [workflow_run.metadata.id for workflow_run in workflow_runs.rows] bulk_cancel_by_ids = BulkCancelReplayOpts(ids=workflow_run_ids) hatchet.runs.bulk_cancel(bulk_cancel_by_ids) ``` Or cancel directly by using filters: ```python bulk_cancel_by_filters = BulkCancelReplayOpts( filters=RunFilter( since=datetime.today() - timedelta(days=1), until=datetime.now(tz=timezone.utc), statuses=[V1TaskStatus.RUNNING], workflow_ids=[workflow.metadata.id], additional_metadata={"key": "value"}, ) ) hatchet.runs.bulk_cancel(bulk_cancel_by_filters) ``` The `hatchet` client also has clients for `workflows` (declarations), `schedules`, `crons`, `metrics` (i.e. queue depth), `events`, and `workers`. #### Typescript ### Highlights The Typescript SDK has a number of notable highlights to showcase for V1. Many of them have been highlighted elsewhere, such as [in the migration guide](./migration-guide-typescript.mdx), an in various examples. Here, we'll list out each of them, along with their motivations and benefits. First and foremost: Many of the changes in the V1 Typescript SDK are motivated by improved support for type checking and inference across large codebases and in production use-cases. With that in mind, here are the main highlights: 1. We've moved away from a pure object-based pattern to a factory pattern for creating your workflows and tasks. This allows for much more flexibility and type safety. The simplest way to declare a workflow is with `hatchet.task`. ```typescript import { hatchet } from '../hatchet-client'; // (optional) Define the input type for the workflow export type SimpleInput = { Message: string; }; export const simple = hatchet.task({ name: 'simple', retries: 3, fn: async (input: SimpleInput) => { return { TransformedMessage: input.Message.toLowerCase(), }; }, }); ``` This returns an object that you can use to run the task with fully inferred types! ```typescript const input = { Message: 'Hello, World!' 
};

// run now
const result = await simple.run(input);
const runReference = await simple.runNoWait(input);

// or in the future
const runAt = new Date(new Date().setHours(12, 0, 0, 0) + 24 * 60 * 60 * 1000);
const scheduled = await simple.schedule(runAt, input);
const cron = await simple.cron('simple-daily', '0 0 * * *', input);
```

2. DAGs got a similar treatment and can be run the same way. DAGs are now a collection of tasks that are composed by calling `.task` on the `Workflow` object. You can declare your types for DAGs. Output types are checked if there is a corresponding task name as a key in the output type.

First, declare the types:

```typescript
type DagInput = {
  Message: string;
};

type DagOutput = {
  reverse: {
    Original: string;
    Transformed: string;
  };
};
```

Then, create the workflow:

```typescript
// First, we declare the workflow
export const dag = hatchet.workflow({
  name: 'simple',
});
```

And bind tasks to it:

```typescript
// Next, we declare the tasks bound to the workflow
const toLower = dag.task({
  name: 'to-lower',
  fn: (input) => {
    return {
      TransformedMessage: input.Message.toLowerCase(),
    };
  },
});
```

3. Logical organization of SDK features to make it easier to understand and use. We've exposed feature clients on the Hatchet client to make it easier to interact with and control your environment. For example, you can write scripts to find all runs that match certain criteria, and replay or cancel them.

```typescript
// list all failed runs
const allFailedRuns = await runs.list({
  statuses: [V1TaskStatus.FAILED],
});

// replay by ids
await runs.replay({ ids: allFailedRuns.rows?.map((r) => r.metadata.id) });

// or you can run bulk operations with filters directly
await runs.cancel({
  filters: {
    since: new Date('2025-03-27'),
    additionalMetadata: { user: '123' },
  },
});
```

The `hatchet` client also has clients for `workflows` (declarations), `schedules`, `crons`, `metrics` (i.e. queue depth), `events`, and `workers`.

#### Go

### Highlights

The Go SDK has a number of notable highlights to showcase for V1. Many of them have been highlighted elsewhere, such as [in the migration guide](./migration-guide-go.mdx), and in various examples. Here, we'll list out each of them, along with their motivations and benefits.

1. Workflows and tasks are now instantiated via a factory pattern which makes it easier to define and run workflows. For example:

```go
type SimpleInput struct {
  Message string
}
type SimpleResult struct {
  TransformedMessage string
}

simple := factory.NewTask(
  create.StandaloneTask{
    Name: "simple-task",
  }, func(ctx worker.HatchetContext, input SimpleInput) (*SimpleResult, error) {
    return &SimpleResult{
      TransformedMessage: strings.ToLower(input.Message),
    }, nil
  },
  hatchet, // a Hatchet client instance
)

// somewhere else in your code
result, err := simple.Run(ctx, SimpleInput{
  Message: "Hello, World!",
})

// result is fully typed!
```

2. Instead of passing parent references via `[]string`, you can simply pass task references directly to other tasks in a workflow, reducing the fragility of your code. For example:

```go
simple := factory.NewWorkflow[DagInput, DagResult](
  create.WorkflowCreateOpts[DagInput]{
    Name: "simple-dag",
  },
  hatchet,
)

step1 := simple.Task(
  create.WorkflowTask[DagInput, DagResult]{
    Name: "Step1",
  }, func(ctx worker.HatchetContext, input DagInput) (interface{}, error) {
    // ...
}, ) simple.Task( create.WorkflowTask[DagInput, DagResult]{ Name: "Step2", Parents: []create.NamedTask{ step1, }, }, func(ctx worker.HatchetContext, input DagInput) (interface{}, error) { // getting parent input also uses the task reference, for example: var step1Output SimpleOutput ctx.ParentOutput(step1, &step1Output) // ... }, ) ``` 3. Configuring workflows and tasks is much easier, with all configuration options flattened into a single struct. 4. We've exposed feature clients on the Hatchet client to make it easier to interact with and control your environment. For example, you can write scripts to find all runs that match certain criteria, and replay or cancel them. ```go hatchet, err := v1.NewHatchetClient() if err != nil { panic(err) } ctx := context.Background() runs, err := hatchet.Runs().List(ctx, rest.V1WorkflowRunListParams{ Statuses: &[]rest.V1TaskStatus{rest.V1TaskStatusFAILED}, }) if err != nil { panic(err) } replayIds := []types.UUID{} for _, run := range runs.JSON200.Rows { replayIds = append(replayIds, uuid.MustParse(run.Metadata.Id)) } // Replay the runs hatchet.Runs().Replay(ctx, rest.V1ReplayTaskRequest{ ExternalIds: &replayIds, }) // Or run bulk operations with filters directly hatchet.Runs().Cancel(ctx, rest.V1CancelTaskRequest{ Filter: &rest.V1TaskFilter{ Since: time.Now().Add(-time.Hour * 24), AdditionalMetadata: &[]string{"user:123"}, }, }) ``` The `hatchet` client also has clients for `workflows` (declarations), `schedules`, `crons`, `metrics` (i.e. queue depth), `events`, and `workers`. --- ## Engine Migration Guide Please read this document in its entirety before upgrading to a v1 engine. ### How to upgrade The latest Hatchet engine is available on Hatchet Cloud and in self-hosted Hatchet versions `>0.55.11`. To upgrade, navigate to the Tenant Settings page and click the "Upgrade to v1" button: ![Upgrade to v1](https://github.com/user-attachments/assets/55f6b343-e9a9-4d6b-9c03-43eba105357d) ### How to downgrade If you navigate to the Tenant Settings page and click the "Downgrade to v0" button, you will be able to downgrade to the v0 engine. Please note that this will not migrate any workflow runs that were created on the v1 engine, see the section on Workflow Run Migration below for more information. ### Viewing v0 Workflow Runs After upgrading, you will be able to view your old workflows by clicking on your account icon (in the top right) and selecting **View Legacy V0 Data**. ### Setting v1 as the default engine For self-hosted instances, you can set v1 to be the default engine for all new tenants by setting the following environment variable: ``` SERVER_DEFAULT_ENGINE_VERSION=V1 ``` ### Workflow Run Migration Please note that upgrading to the v1 engine will **not migrate your existing workflow runs**, including runs which are in a Running or Queued state. If you have a large number of runs in a Running or Queued state, we recommend the following course of action: 1. Create a new tenant and upgrade it to v1. 2. Generate a new API token, and create a new set of workers which use this API token. These workers will connect to the tenant with the v1 runtime. 3. After your workers are connected, change the API token of the service which is publishing events or triggering workflow runs (typically your backend/API) to use the new API token, which is connected to the v1 tenant. The new workers will start processing new work that enters your system. 4. 
After the old v0 tenant has processed all of its existing workflow runs, spin down the workers which are connected to the v0 tenant. Note that if you are dependent on strict FIFO ordering in your queues, this strategy will not work, as new work may be processed on the new tenant before old work is processed on the old tenant. Please reach out to us over Discord or Slack if you need help with migrating this type of workload. ### SDK Compatibility First, a quick note β€” we are **not removing any features from our existing feature set**. Our goal is for all features to work with minimal changes in Hatchet v1. Up to now, we've attempted to be as backwards-compatible as possible while keeping up with feature velocity, and we'd like to support users on v0 for as long as possible while providing an easy path for upgrading. Generally, we'd recommend upgrading the Hatchet engine first, followed by the SDK version. Here's our compatibility matrix between v0 and v1: | | Engine v0 | Engine v1 | | ------ | ------------------- | ----------- | | SDK v0 | Supported | Supported\* | | SDK v1 | Limited support\*\* | Supported | \* Some features will behave slightly differently on the v1 engine, but all workflows defined in v0 can be registered in v1 \*\* It will not be possible to register a v1 workflow against the v0 engine, but each SDK will continue to bundle the v0 version until September 30th, 2025. For instructions on upgrading to the latest SDKs, please refer to the following guides: - [Python SDK Migration Guide](./migration-guide-python.mdx) - [Typescript SDK Migration Guide](./migration-guide-typescript.mdx) - [Go SDK Migration Guide](./migration-guide-go.mdx) ### List of breaking changes While we'd prefer to avoid any breaking changes, v1 is architecturally very different from v0, which means that the following APIs will be modified/replaced: - While we haven't published an official REST API doc, we have often recommended usage of the REST API in our SDKs to implement replays, retrieving task status, and dead-letter queueing. The current API for listing, cancelling and replaying workflow runs will not work against a v1 engine. We will be providing an upgrade path using new endpoints which are more conducive to bulk replays and cancellations. - We will only be supporting [CEL-based concurrency keys](./concurrency.mdx), and we will not be supporting custom concurrency methods defined on the client. If you require custom logic to compute the concurrency key that can't be captured in a CEL expression, we recommend computing the key ahead of time and passing it as part of the input to the workflow. **Workflows registered against a v1 engine with a custom concurrency method (instead of an expression) will not use a concurrency queue.** - Concurrency queues previously did not respect the `ScheduleTimeout` value set on the workflow level, so concurrency queues had no timeouts. In v1, concurrency queues will respect the schedule timeout value as well. _These are the most important breaking changes, but we will add any small modifications to queueing/workflow behavior ahead of March 24th._ --- ## Hatchet Python V1 Migration Guide This guide will help you migrate Hatchet workflows from the V0 SDK to the V1 SDK. 
#### Introductory Example First, a simple example of how to define a task with the V1 SDK: ```python from hatchet_sdk import Context, EmptyModel, Hatchet hatchet = Hatchet(debug=True) @hatchet.task() def simple(input: EmptyModel, ctx: Context) -> dict[str, str]: return {"result": "Hello, world!"} @hatchet.durable_task() def simple_durable(input: EmptyModel, ctx: Context) -> dict[str, str]: return {"result": "Hello, world!"} def main() -> None: worker = hatchet.worker("test-worker", workflows=[simple, simple_durable]) worker.start() ``` The API has changed significantly in the V1 SDK. Even in this simple example, there are some notable highlights: 1. Tasks can now be declared with `hatchet.task`, meaning you no longer _have_ to create a workflow explicitly to define a task. This should feel similar to how e.g. Celery handles task definition. Note that we recommend declaring a workflow in many cases, but the simplest possible way to get set up is to use `hatchet.task`. 2. Tasks have a new signature. They now take two arguments: `input` and `context`. The `input` is either of type `input_validator` (a Pydantic model you provide to the workflow), or is an `EmptyModel`, which is a helper Pydantic model Hatchet provides and uses as a default. The `context` is once again the Hatchet `Context` object. 3. Workflows can now be registered on a worker by using the `workflows` keyword argument to the `worker` method, although the old `register_workflows` method is still available. #### Pydantic Hatchet's V1 SDK makes heavy use of Pydantic models, and recommends you do too! Let's dive into a more involved example using Pydantic in a fanout example. ```python class ParentInput(BaseModel): n: int = 100 class ChildInput(BaseModel): a: str parent_wf = hatchet.workflow(name="FanoutParent", input_validator=ParentInput) child_wf = hatchet.workflow(name="FanoutChild", input_validator=ChildInput) @parent_wf.task(execution_timeout=timedelta(minutes=5)) async def spawn(input: ParentInput, ctx: Context) -> dict[str, Any]: print("spawning child") result = await child_wf.aio_run_many( [ child_wf.create_bulk_run_item( input=ChildInput(a=str(i)), options=TriggerWorkflowOptions( additional_metadata={"hello": "earth"}, key=f"child{i}" ), ) for i in range(input.n) ], ) print(f"results {result}") return {"results": result} ``` In this example, we use a few more new SDK features: 1. Workflows are now declared with `hatchet.workflow`, and then have their corresponding tasks registered with `workflow.task`. 2. We define two Pydantic models, `ParentInput` and `ChildInput`, and pass them to the parent and child workflows as `input_validator`s. Note that now, the `input` parameters for the tasks in those two workflows are Pydantic models of those types, and we can treat them as such. This replaces the old `context.workflow_input` for accessing the input to the workflow / task - now, we just can access the input directly. 3. When we want to spawn the child workflow, we can use the `run` methods on the `child_workflow` object, which is a Hatchet `Workflow`, instead of needing to refer to the workflow by its name (a string). The `input` field to `run()` is now also properly typed as `ChildInput`. 4. The child workflow (see below) makes use of some of Hatchet's DAG features, such as defining parent tasks. In the new SDK, `parents` of a task are defined as a list of `Task` objects as opposed to as a list of strings, so now, `process2` has `process` (the `Task`) as its parent, as opposed to `"process"` (the string). 
This also allows us to use `ctx.task_output(process)` to access the output of the `process` task in the `process2` task, and know the type of that output at type checking time. ```python @child_wf.task() async def process(input: ChildInput, ctx: Context) -> dict[str, str]: print(f"child process {input.a}") return {"status": input.a} @child_wf.task(parents=[process]) async def process2(input: ChildInput, ctx: Context) -> dict[str, str]: process_output = ctx.task_output(process) a = process_output["status"] return {"status2": a + "2"} ``` See our [Pydantic documentation](./pydantic.mdx) for more. #### Other Breaking Changes There have been a number of other breaking changes throughout the SDK in V1. Typing improvements: 1. All times and durations, such as `timeout` and `schedule_timeout` fields are now `datetime.timedelta` objects instead of strings (e.g. `"10s"` becomes `timedelta(seconds=10)`). 2. External-facing protobuf objects, such as `StickyStrategy` and `ConcurrencyLimitStrategy`, have been replaced by native Python enums to make working with them easier. 3. All interactions with the `Task` and `Workflow` objects are now typed, so you know e.g. what the type of the input to the task needs to be at type checking time (we see this in the Pydantic example above). 4. All external-facing types that are used for triggering tasks, scheduling tasks, etc. are now Pydantic objects, as opposed to being `TypedDict`s. 5. The return type of each `Task` is restricted to a `JSONSerializableMapping` or a Pydantic model, to better align with what the Hatchet Engine expects. 6. The `ClientConfig` now uses Pydantic Settings, and we've removed the static methods on the Client for `from_environment` and `from_config` in favor of passing configuration in correctly. 7. The REST API wrappers, which previously were under `hatchet.rest`, have been completely overhauled. Naming changes: 1. We no longer have nested `aio` clients for async methods. Instead, async methods throughout the entire SDK are prefixed by `aio_`, similar to [Langchain's use of the `a` prefix](https://python.langchain.com/docs/concepts/streaming/#stream-and-astream) to indicate async. For example, to run a task, you may now either use `workflow.run()` or `workflow.aio_run()`. 2. All functions on Hatchet clients are now _verbs_. For instance the way to list workflow runs is via `hatchet.runs.list()`. 3. `max_runs` on the worker has been renamed to `slots`. Removals: 1. `sync_to_async` has been removed. We recommend reading [our asyncio documentation](./asyncio.mdx) for our recommendations on handling blocking work in otherwise async tasks. Other miscellaneous changes: 1. As shown in the Pydantic example above, there is no longer a `spawn_workflow(s)` method on the `Context`. `run` is now the preferred method for spawning workflows, which will automatically propagate the parent's metadata to the child workflow. #### Other New features There are a handful of other new features that will make interfacing with the SDK easier, which are listed below. 1. Concurrency keys using the `input` to a task are now checked for validity at runtime. If the task's `input_validator` does not contain a field that's used in a key, Hatchet will reject the task when it's created. For example, if the key is `input.user_id`, the `input_validator` Pydantic model _must_ contain a `user_id` field. 2. There is now an `on_success_task` on the `Workflow` object, which works just like an on-failure task, but it runs after all upstream tasks have _succeeded_. 
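As a rough sketch of what an on-success task might look like, here is a minimal example; it assumes `on_success_task` can be registered with a decorator analogous to `Workflow.task`, so check the SDK reference for the exact signature:

```python
from hatchet_sdk import Context, EmptyModel, Hatchet

hatchet = Hatchet()

workflow = hatchet.workflow(name="OnSuccessExample")


@workflow.task()
def step_one(input: EmptyModel, ctx: Context) -> dict[str, str]:
    return {"status": "ok"}


# Hypothetical usage: runs only once every upstream task in the workflow has succeeded
@workflow.on_success_task()
def notify(input: EmptyModel, ctx: Context) -> dict[str, str]:
    return {"notified": "true"}
```
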
--- ## Hatchet TypeScript V1 Migration Guide This guide will help you migrate Hatchet workflows from the V0 SDK to the V1 SDK. #### Introductory Example First, we've exposed a new `hatchet.task` method in the V1 SDK for single function tasks. ```typescript import { hatchet } from '../hatchet-client'; // (optional) Define the input type for the workflow export type SimpleInput = { Message: string; }; export const simple = hatchet.task({ name: 'simple', retries: 3, fn: async (input: SimpleInput) => { return { TransformedMessage: input.Message.toLowerCase(), }; }, }); ``` DAGs are still defined as workflows, but they can now be declared using the `hatchet.workflow` method. ```typescript // First, we declare the workflow export const dag = hatchet.workflow({ name: 'simple', }); ``` And you can bind tasks to workflows as follows: ```typescript // Next, we declare the tasks bound to the workflow const toLower = dag.task({ name: 'to-lower', fn: (input) => { return { TransformedMessage: input.Message.toLowerCase(), }; }, }); ``` You can now run work for tasks and workflows by directly interacting with the returned object. ```typescript const res = await dag.run({ Message: 'hello world', }); ``` There are a few important things to note when migrating to the new SDK: 1. The new SDK uses a factory pattern (shown above) for creating tasks and workflows, which we've found to be more ergonomic than the previous SDK. 2. The old method of defining tasks will still work in the new SDK, but we recommend migrating over to the new method shown above for improved type checking and for access to new features. 3. New features of the SDK, such as the new durable execution features rolled out in V1, will only be accessible from the new `TaskDeclaration` object in the new SDK. Since the old pattern for declaring tasks will still work in the new SDK, we recommend migrating existing tasks to the new patterns in V1 gradually. #### Fanout Example The new SDK also provides improved type support for spawning child tasks from around the codebase. Consider the following example: First, we declare a child task: ```typescript import { hatchet } from '../hatchet-client'; type ChildInput = { N: number; }; export const child = hatchet.task({ name: 'child', fn: (input: ChildInput) => { return { Value: input.N, }; }, }); ``` Next, we spawn that child from a parent task: ```typescript type ParentInput = { N: number; }; export const parent = hatchet.task({ name: 'parent', fn: async (input: ParentInput, ctx) => { const n = input.N; const promises = []; for (let i = 0; i < n; i++) { promises.push(child.run({ N: i })); } const childRes = await Promise.all(promises); const sum = childRes.reduce((acc, curr) => acc + curr.Value, 0); return { Result: sum, }; }, }); ``` In this example, the compiler knows what to expect for the types of `input` and `ctx` for each of the tasks, as well as the type of the input of the `child` task when spawning it from the `parent` task. --- ## Hatchet Go SDK Migration Guide This comprehensive guide covers migration paths between all three major versions of the Hatchet Go SDK: - **V0 SDK** (`github.com/hatchet-dev/hatchet/pkg/client`) - Original SDK - **V1 Generics SDK** (`github.com/hatchet-dev/hatchet/pkg/v1`) - Type-safe SDK with Go generics (deprecated) - **V1 Reflection SDK** (`github.com/hatchet-dev/hatchet/sdks/go`) - Current SDK with reflection-based API > **Info:** The V1 engine will continue to support V0 tasks until September 30th, 2025. 
## Quick Start with V1 Reflection SDK (Current) The current V1 SDK provides the cleanest API using reflection for type safety: ```go package main import ( "context" "log" hatchet "github.com/hatchet-dev/hatchet/sdks/go" ) func main() { client, err := hatchet.NewClient() if err != nil { log.Fatal(err) } // Define input/output types type Input struct { Message string `json:"message"` } type Output struct { Result string `json:"result"` } // Create a simple task task := client.NewStandaloneTask("simple-task", func(ctx hatchet.Context, input Input) (Output, error) { return Output{Result: "Processed: " + input.Message}, nil }) // Start worker worker, err := client.NewWorker("worker", hatchet.WithWorkflows(task)) if err != nil { log.Fatal(err) } if err := worker.StartBlocking(context.Background()); err != nil { log.Fatal(err) } } ``` ## Migration Paths ### From V0 SDK to V1 Reflection SDK **V0 SDK (Legacy)**: ```go package main import ( "log" "github.com/hatchet-dev/hatchet/pkg/client" v0Worker "github.com/hatchet-dev/hatchet/pkg/worker" ) func V0() { c, err := client.New() if err != nil { log.Fatal(err) } worker, err := v0Worker.NewWorker( v0Worker.WithClient(c), v0Worker.WithName("worker"), ) if err != nil { log.Fatal(err) } err = worker.RegisterWorkflow( &v0Worker.WorkflowJob{ On: v0Worker.Event("user:create"), Name: "simple-workflow", Steps: []*v0Worker.WorkflowStep{ { Name: "step1", Function: func(ctx v0Worker.HatchetContext) error { log.Println("executed step1") return nil }, }, }, }, ) if err != nil { log.Fatal(err) } } ``` **V1 Reflection SDK (Current)**: ```go package main import ( "log" "strings" hatchet "github.com/hatchet-dev/hatchet/sdks/go" ) type SimpleInput struct { Message string `json:"message"` } type SimpleResult struct { TransformedMessage string `json:"result"` } func V1() { client, err := hatchet.NewClient() if err != nil { log.Fatal(err) } workflow := client.NewStandaloneTask("simple-workflow", func(ctx hatchet.Context, input SimpleInput) (SimpleResult, error) { log.Println("executed step1") return SimpleResult{TransformedMessage: strings.ToLower(input.Message)}, nil }, hatchet.WithWorkflowEvents("user:create")) _, err = client.NewWorker( "worker", hatchet.WithWorkflows(workflow), ) if err != nil { log.Fatal(err) } } ``` ### From V1 Generics SDK to V1 Reflection SDK **V1 Generics SDK (Deprecated)**: ```go package main import ( "log" "strings" "github.com/hatchet-dev/hatchet/pkg/client/create" v1 "github.com/hatchet-dev/hatchet/pkg/v1" "github.com/hatchet-dev/hatchet/pkg/v1/factory" "github.com/hatchet-dev/hatchet/pkg/v1/worker" "github.com/hatchet-dev/hatchet/pkg/v1/workflow" v0Worker "github.com/hatchet-dev/hatchet/pkg/worker" ) func V1Old() { hatchet, err := v1.NewHatchetClient() if err != nil { log.Fatal(err) } simple := factory.NewTask( create.StandaloneTask{Name: "simple-task", OnEvents: []string{"user:create"}}, func(ctx v0Worker.HatchetContext, input SimpleInput) (*SimpleResult, error) { return &SimpleResult{TransformedMessage: strings.ToLower(input.Message)}, nil }, hatchet, ) _, err = hatchet.Worker(worker.WorkerOpts{ Name: "worker", Workflows: []workflow.WorkflowBase{simple}, }) if err != nil { log.Fatal(err) } } ``` **V1 Reflection SDK (Current)**: ```go package main import ( "log" "strings" hatchet "github.com/hatchet-dev/hatchet/sdks/go" ) type SimpleInput struct { Message string `json:"message"` } type SimpleResult struct { TransformedMessage string `json:"result"` } func V1() { client, err := hatchet.NewClient() if err != nil { log.Fatal(err) } workflow := 
client.NewStandaloneTask("simple-workflow", func(ctx hatchet.Context, input SimpleInput) (SimpleResult, error) { log.Println("executed step1") return SimpleResult{TransformedMessage: strings.ToLower(input.Message)}, nil }, hatchet.WithWorkflowEvents("user:create")) _, err = client.NewWorker( "worker", hatchet.WithWorkflows(workflow), ) if err != nil { log.Fatal(err) } } ``` ## Migration Checklist ### From V0 to V1 Reflection SDK - [ ] Update import: `github.com/hatchet-dev/hatchet/pkg/client` β†’ `github.com/hatchet-dev/hatchet/sdks/go` - [ ] Change client creation: `client.New()` β†’ `hatchet.NewClient()` - [ ] Convert `WorkflowJob` to `NewWorkflow()` with tasks - [ ] Replace `RegisterWorkflow()` with `WithWorkflows()` option - [ ] Update function signatures to use typed inputs/outputs - [ ] Replace `worker.HatchetContext` with `hatchet.Context` ### From V1 Generics to V1 Reflection SDK - [ ] Update import: `github.com/hatchet-dev/hatchet/pkg/v1` β†’ `github.com/hatchet-dev/hatchet/sdks/go` - [ ] Change client creation: `v1.NewHatchetClient()` β†’ `hatchet.NewClient()` - [ ] Remove factory imports and usage - [ ] Convert `factory.NewTask()` to `NewStandaloneTask()` or workflow tasks - [ ] Remove explicit type parameters (generics) - [ ] Update function return types (remove pointers where appropriate) - [ ] Replace `create.StandaloneTask{}` structs with option functions ## Common Patterns ### Error Handling and Retries ```go task := workflow.NewTask("resilient-task", func(ctx hatchet.Context, input any) (any, error) { // Your task logic here return result, nil }, hatchet.WithRetries(5), hatchet.WithRetryBackoff(2.0, time.Second*30), // 2x backoff, max 30s ) ``` ### Child Workflows ```go parentTask := workflow.NewTask("parent", func(ctx hatchet.Context, input any) (any, error) { // Spawn child workflow result, err := childWorkflow.Run(ctx, input, hatchet.WithRunKey("key")) if err != nil { return nil, err } return result, nil }) ``` ### Bulk Operations ```go // Run multiple instances of a workflow runInputs := []hatchet.RunManyOpt{ {Input: map[string]string{"user": "alice"}}, {Input: map[string]string{"user": "bob"}}, {Input: map[string]string{"user": "charlie"}}, } renRefs, err := client.RunMany(ctx, "bulk-workflow", runInputs) ``` This guide should cover all major migration scenarios between the three Go SDK versions. The V1 Reflection SDK provides the most ergonomic API while maintaining full compatibility with the Hatchet platform. --- # Working with `asyncio` Hatchet's Python SDK, similarly to other popular libraries like FastAPI, Langchain, etc., makes heavy use of `asyncio`, and recommends that you do as well! To learn the basics of `asyncio`, check out [this introduction from FastAPI](https://fastapi.tiangolo.com/async/). However, as is the case in FastAPI, when using `asyncio` in Hatchet, you need to be careful to not have any blocking logic in the functions you define as tasks, as this will block the asyncio event loop and prevent additional work from executing until the blocking operation has completed. For example, this is async-safe: ```python async def async_safe() -> int: await asyncio.sleep(5) return 42 ``` But this is not: ```python async def blocking() -> int: time.sleep(5) return 42 ``` In the second case, your worker will not be able to process any other work that's defined as async until the five-second sleep has finished. 
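The same rule applies inside task definitions. As a minimal sketch (the task names here are illustrative), the first task below yields to the event loop while it waits, while the second stalls every other async task on the worker:

```python
import asyncio
import time

from hatchet_sdk import Context, EmptyModel, Hatchet

hatchet = Hatchet()


@hatchet.task()
async def async_safe_task(input: EmptyModel, ctx: Context) -> dict[str, int]:
    # Yields control to the event loop, so other async tasks keep running
    await asyncio.sleep(5)
    return {"result": 42}


@hatchet.task()
async def blocking_task(input: EmptyModel, ctx: Context) -> dict[str, int]:
    # Blocks the event loop: no other async work runs on this worker for 5 seconds
    time.sleep(5)
    return {"result": 42}
```

The sections below show how to offload blocking calls like `time.sleep` so they don't stall the loop.
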
### Using `asyncio.to_thread` and `loop.run_in_executor` To avoid problems caused by blocking code, you can run your blocking code in an executor with `asyncio.to_thread` or, more verbosely, `loop.run_in_executor`. The two examples below are async-safe and will no longer block the event loop. ```python async def to_thread() -> int: await asyncio.to_thread(time.sleep, 5) return 42 ``` ```python async def run_in_executor() -> int: loop = asyncio.get_event_loop() await loop.run_in_executor(None, time.sleep, 5) return 42 ``` ### More Resources for working with `asyncio` If you're looking for more info on developing with AsyncIO more broadly, we highly recommend the following resources: - Python's Documentation on [Developing with AsyncIO](https://docs.python.org/3/library/asyncio-dev.html) - Tusamma's Medium post about [How AsyncIO works](https://medium.com/@tssovi/how-does-asyncio-works-f5386316b7fa) - Zac Hatfield-Dodds's PyCon 2023 talk on [Async: scaling structured concurrency with static and dynamic analysis](https://www.youtube.com/watch?v=FrpUb6OEbcc) --- # Pydantic Support The V1 Hatchet SDK leans heavily on [Pydantic](https://docs.pydantic.dev/latest/) (both internally and externally) for handling validation of workflow inputs and outputs, method inputs, and more. ### Usage To enable Pydantic for validation, you'll need to: 1. Provide an `input_validator` as a parameter to your `workflow`. 2. Add return type hints for your `tasks`. ### Default Behavior By default, if no `input_validator` is provided, the `EmptyModel` is used, which is a Pydantic model that accepts any input. For example: ```python from hatchet_sdk import Context, EmptyModel, Hatchet hatchet = Hatchet(debug=True) @hatchet.task() def simple(input: EmptyModel, ctx: Context) -> dict[str, str]: return {"result": "Hello, world!"} @hatchet.durable_task() def simple_durable(input: EmptyModel, ctx: Context) -> dict[str, str]: return {"result": "Hello, world!"} def main() -> None: worker = hatchet.worker("test-worker", workflows=[simple, simple_durable]) worker.start() ``` In this simple example, the `input` that's injected into the task accepts an argument `input`, which is of type `EmptyModel`. The `EmptyModel` can be imported directly from Hatchet, and is an alias for: ```python from pydantic import BaseModel, ConfigDict class EmptyModel(BaseModel): model_config = ConfigDict(extra="allow") ``` Note that since `extra="allow"` is set, workflows will not fail with validation errors if an extra field is provided. ### Example Usage We highly recommend creating Pydantic models to represent your workflow inputs and outputs. This will help you catch errors early and ensure that your workflows are well-typed. For example, consider a fanout workflow like this: ```python class ParentInput(BaseModel): n: int = 100 class ChildInput(BaseModel): a: str parent_wf = hatchet.workflow(name="FanoutParent", input_validator=ParentInput) child_wf = hatchet.workflow(name="FanoutChild", input_validator=ChildInput) @parent_wf.task(execution_timeout=timedelta(minutes=5)) async def spawn(input: ParentInput, ctx: Context) -> dict[str, Any]: print("spawning child") result = await child_wf.aio_run_many( [ child_wf.create_bulk_run_item( input=ChildInput(a=str(i)), options=TriggerWorkflowOptions( additional_metadata={"hello": "earth"}, key=f"child{i}" ), ) for i in range(input.n) ], ) print(f"results {result}") return {"results": result} ``` In this case, we've defined two workflows: a parent and a child. 
They both have their inputs typed, and the parent spawns the child. Note that `child_wf.create_bulk_run_item` is typed, so the type checker (and your IDE) know the type of the input to the child workflow.

Then, the child tasks are defined as follows:

```python
@child_wf.task()
async def process(input: ChildInput, ctx: Context) -> dict[str, str]:
    print(f"child process {input.a}")
    return {"status": input.a}


@child_wf.task(parents=[process])
async def process2(input: ChildInput, ctx: Context) -> dict[str, str]:
    process_output = ctx.task_output(process)
    a = process_output["status"]

    return {"status2": a + "2"}
```

In the children, the inputs are validated by Pydantic, so you can access their attributes directly without needing a type cast or having to parse a dictionary of inputs.

---

# Lifespans

Lifespans are an **experimental feature** in Hatchet, and are subject to change.

Hatchet's Python SDK allows you to define a **_lifespan_**: an async generator that runs when your worker starts up, cleans up when it exits, and lets you share state across all of the tasks running on the worker. This behaves almost identically to [FastAPI's lifespans](https://fastapi.tiangolo.com/advanced/events/), and is intended to be used in the same way.

Lifespans are useful for sharing state like connection pools across all tasks on a single worker. They also work great for loading expensive machine learning models into memory before the worker starts.

We recommend only using lifespans for storing **_immutable_** state to share between tasks running on your worker. The intention is not to e.g. store a counter of the number of tasks that a worker has run and increment that counter on each task run. This is prone to unexpected behavior due to concurrency in Hatchet.

## Usage

To use Hatchet's `lifespan` feature, define an async generator and pass it into your `worker`:

```python
class Lifespan(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    foo: str
    pool: ConnectionPool


async def lifespan() -> AsyncGenerator[Lifespan, None]:
    print("Running lifespan!")
    with ConnectionPool("postgres://hatchet:hatchet@localhost:5431/hatchet") as pool:
        yield Lifespan(
            foo="bar",
            pool=pool,
        )

    print("Cleaning up lifespan!")


worker = hatchet.worker(
    "test-worker", slots=1, workflows=[lifespan_workflow], lifespan=lifespan
)
```

When the worker starts, it will run the lifespan up to the `yield`. Then, on worker shutdown, it will clean up by running everything after the `yield` (the same as with any other generator). Your lifespan must only `yield` **_once_**.

Then, to use your lifespan in a task, you can extract it from the context with `Context.lifespan`.

```python
class TaskOutput(BaseModel):
    num_rows: int
    external_ids: list[UUID]


lifespan_workflow = hatchet.workflow(name="LifespanWorkflow")


@lifespan_workflow.task()
def sync_lifespan_task(input: EmptyModel, ctx: Context) -> TaskOutput:
    pool = cast(Lifespan, ctx.lifespan).pool

    with pool.connection() as conn:
        query = conn.execute("SELECT * FROM v1_lookup_table_olap LIMIT 5;")
        rows = query.fetchall()

        for row in rows:
            print(row)

        print("executed sync task with lifespan", ctx.lifespan)

        return TaskOutput(
            num_rows=len(rows),
            external_ids=[cast(UUID, row[0]) for row in rows],
        )
```

For type checking, cast the `Context.lifespan` to whatever type your lifespan generator yields.

And that's it! Now, any task running on the worker with the lifespan provided will have access to the lifespan data.
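To avoid sprinkling `cast` calls throughout your tasks, one option is a small typed accessor. This is a sketch that reuses the `Lifespan` model and `lifespan_workflow` from the example above; the helper name is our own, not part of the SDK:

```python
from typing import cast

from hatchet_sdk import Context, EmptyModel


def get_lifespan(ctx: Context) -> Lifespan:
    # Narrow ctx.lifespan to the type yielded by the lifespan generator
    return cast(Lifespan, ctx.lifespan)


@lifespan_workflow.task()
def typed_lifespan_task(input: EmptyModel, ctx: Context) -> dict[str, str]:
    lifespan = get_lifespan(ctx)  # fully typed for the type checker
    return {"foo": lifespan.foo}
```
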
---

# Dependency Injection

Dependency injection is an **experimental feature** in Hatchet, and is subject to change.

Hatchet's Python SDK allows you to inject **_dependencies_** into your tasks, FastAPI style. These dependencies can be either synchronous or asynchronous functions. They are executed before the task is triggered, and their results are injected into the task as parameters. This behaves almost identically to [FastAPI's dependency injection](https://fastapi.tiangolo.com/tutorial/dependencies/), and is intended to be used in the same way.

Dependencies are useful for sharing logic between tasks that you'd like to avoid repeating, or would like to factor out of the task logic itself (e.g. to make testing easier).

Since dependencies are run before tasks are executed, having many dependencies (or any that take a long time to evaluate) can cause tasks to experience significantly delayed start times, as they must wait for all dependencies to finish evaluating.

## Usage

To add dependencies to your tasks, import `Depends` from the `hatchet_sdk`. Then:

```python
async def async_dep(input: EmptyModel, ctx: Context) -> str:
    return ASYNC_DEPENDENCY_VALUE


def sync_dep(input: EmptyModel, ctx: Context) -> str:
    return SYNC_DEPENDENCY_VALUE


@asynccontextmanager
async def async_cm_dep(
    input: EmptyModel, ctx: Context, async_dep: Annotated[str, Depends(async_dep)]
) -> AsyncGenerator[str, None]:
    try:
        yield ASYNC_CM_DEPENDENCY_VALUE + "_" + async_dep
    finally:
        pass


@contextmanager
def sync_cm_dep(
    input: EmptyModel, ctx: Context, sync_dep: Annotated[str, Depends(sync_dep)]
) -> Generator[str, None, None]:
    try:
        yield SYNC_CM_DEPENDENCY_VALUE + "_" + sync_dep
    finally:
        pass


@contextmanager
def base_cm_dep(input: EmptyModel, ctx: Context) -> Generator[str, None, None]:
    try:
        yield CHAINED_CM_VALUE
    finally:
        pass


def chained_dep(
    input: EmptyModel, ctx: Context, base_cm: Annotated[str, Depends(base_cm_dep)]
) -> str:
    return "chained_" + base_cm


@asynccontextmanager
async def base_async_cm_dep(
    input: EmptyModel, ctx: Context
) -> AsyncGenerator[str, None]:
    try:
        yield CHAINED_ASYNC_CM_VALUE
    finally:
        pass


async def chained_async_dep(
    input: EmptyModel,
    ctx: Context,
    base_async_cm: Annotated[str, Depends(base_async_cm_dep)],
) -> str:
    return "chained_" + base_async_cm
```

In this example, we've declared a mix of dependencies: plain synchronous and asynchronous functions, sync and async context managers, and "chained" dependencies that themselves depend on other dependencies. You can do anything you like in your dependencies, such as creating database sessions, managing configuration, sharing instances of service-layer logic, and more.

Once you've defined your dependency functions, inject them into your tasks as follows:

```python
@hatchet.task()
async def async_task_with_dependencies(
    _i: EmptyModel,
    ctx: Context,
    async_dep: Annotated[str, Depends(async_dep)],
    sync_dep: Annotated[str, Depends(sync_dep)],
    async_cm_dep: Annotated[str, Depends(async_cm_dep)],
    sync_cm_dep: Annotated[str, Depends(sync_cm_dep)],
    chained_dep: Annotated[str, Depends(chained_dep)],
    chained_async_dep: Annotated[str, Depends(chained_async_dep)],
) -> Output:
    return Output(
        sync_dep=sync_dep,
        async_dep=async_dep,
        async_cm_dep=async_cm_dep,
        sync_cm_dep=sync_cm_dep,
        chained_dep=chained_dep,
        chained_async_dep=chained_async_dep,
    )
```

Important note: your dependency functions must take two positional arguments: the workflow input and the `Context` (the same as any other task).

That's it! Now, whenever your task is triggered, its dependencies will be evaluated, and the results will be injected into the task at runtime for you to use as needed.
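For contrast with the kitchen-sink example above, here is a minimal sketch with a single dependency. The `settings_dep` function and its values are hypothetical; only `Depends`, `Annotated`, and the two required positional arguments come from the pattern described above:

```python
from typing import Annotated

from hatchet_sdk import Context, Depends, EmptyModel, Hatchet

hatchet = Hatchet()


def settings_dep(input: EmptyModel, ctx: Context) -> dict[str, str]:
    # Hypothetical dependency: resolve configuration once, before the task runs.
    return {"region": "us-east-1", "bucket": "example-reports"}


@hatchet.task()
def upload_report(
    input: EmptyModel,
    ctx: Context,
    settings: Annotated[dict[str, str], Depends(settings_dep)],
) -> dict[str, str]:
    # The evaluated dependency is injected as a regular parameter.
    return {"uploaded_to": settings["bucket"]}
```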
--- # Dataclass Support Throughout the docs, we use Pydantic models in virtually all of our Python examples for validating task inputs and outputs. This is the recommended path, as it provides lots of safety guarantees as you're writing tasks. With that said, Hatchet also supports using `dataclasses` as both input and output types to tasks. **Dataclass support was added in SDK version 1.21.0.** > **Warning:** Dataclasses do not perform any type validation on instantiation like Pydantic > models do. ### Usage To use a dataclass instead of a Pydantic model, you'll need to: 1. Provide an `input_validator` as a parameter to your `workflow` or `task` (in the case of a standalone task with `hatchet.task`). 2. Add return type hints for your `tasks`. ### Example Usage `dataclass` validators work exactly like Pydantic models in Hatchet. First, you create the classes: ```python @dataclass class Input: name: str @dataclass class Output: message: str ``` And then you provide the classes to your workflow or task: ```python @hatchet.task(input_validator=Input) def say_hello(input: Input, ctx: Context) -> Output: return Output(message=f"Hello, {input.name}!") ``` And finally, triggering works the same as well - you just provide the dataclass instance as input: ```python say_hello.run(input=Input(name="Hatchet")) ``` --- # Self-Hosting the Hatchet Control Plane Self-hosting Hatchet means running your own instance of the **Hatchet Control Plane** - the central orchestration system that manages workflows, schedules tasks, and coordinates worker execution. This is different from running workers, which can connect to any Hatchet instance (self-hosted or Hatchet Cloud). ## What You're Self-Hosting When you self-host Hatchet, you're deploying: - **API Server** - REST APIs for workflow management - **Engine** - gRPC API for core workflow orchestration and task scheduling - **Database** - PostgreSQL for storing workflow state and metadata - **Message Queue (optional)** - RabbitMQ for inter-service communication and high-throughput real-time updates - **Dashboard** - Web UI for monitoring workflows and debugging Your **workers** (the processes that execute your workflow steps) will connect to your self-hosted control plane and execute tasks. ## Deployment Options The fastest way to get a Hatchet instance running locally is with the [Hatchet CLI](/cli) (which wraps Hatchet Lite): ```sh hatchet server start ``` There are currently three supported ways to self-host the Hatchet Control Plane: Docker: 1. [Hatchet Lite](./self-hosting/hatchet-lite.mdx) - Single docker image with bundled engine and API (development, testing, or low-throughput production) 2. [Docker Compose](./self-hosting/docker-compose.mdx) - Multi-container setup with PostgreSQL and RabbitMQ (production) Kubernetes: 1. [Quickstart with Helm](./self-hosting/kubernetes-quickstart.mdx) - Production-ready Helm charts (production) --- # Hatchet Lite Deployment To get up and running quickly, you can deploy via the `hatchet-lite` image. This image is designed for development and low-volume use-cases. ### Prerequisites This deployment requires [Docker](https://docs.docker.com/engine/install/) installed locally to work. ### Getting Hatchet Lite Running #### Hatchet CLI The easiest way to get Hatchet Lite running is via the Hatchet CLI. 
Simply run the following command: ```sh hatchet server start ``` #### With Postgres (Default) To use Postgres as both your DB and message queue, copy the following `docker-compose.hatchet.yml` file to the root of your repository: > **Info:** If you have an existing Postgres instance already running, you can simply > point `DATABASE_URL` to that instance and ignore the `postgres` service > deployment in the following docker-compose file. ```yaml filename="docker-compose.hatchet.yml" copy version: "3.8" name: hatchet-lite services: postgres: image: postgres:15.6 command: postgres -c 'max_connections=200' restart: always environment: - POSTGRES_USER=hatchet - POSTGRES_PASSWORD=hatchet - POSTGRES_DB=hatchet volumes: - hatchet_lite_postgres_data:/var/lib/postgresql/data healthcheck: test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"] interval: 10s timeout: 10s retries: 5 start_period: 10s hatchet-lite: image: ghcr.io/hatchet-dev/hatchet/hatchet-lite:latest ports: - "8888:8888" - "7077:7077" depends_on: postgres: condition: service_healthy environment: # Refer to https://docs.hatchet.run/self-hosting/configuration-options # for a list of all supported environment variables DATABASE_URL: "postgresql://hatchet:hatchet@postgres:5432/hatchet?sslmode=disable" SERVER_AUTH_COOKIE_DOMAIN: localhost SERVER_AUTH_COOKIE_INSECURE: "t" SERVER_GRPC_BIND_ADDRESS: "0.0.0.0" SERVER_GRPC_INSECURE: "t" SERVER_GRPC_BROADCAST_ADDRESS: localhost:7077 SERVER_GRPC_PORT: "7077" SERVER_URL: http://localhost:8888 SERVER_AUTH_SET_EMAIL_VERIFIED: "t" SERVER_DEFAULT_ENGINE_VERSION: "V1" SERVER_INTERNAL_CLIENT_INTERNAL_GRPC_BROADCAST_ADDRESS: localhost:7077 volumes: - "hatchet_lite_config:/config" volumes: hatchet_lite_postgres_data: hatchet_lite_config: ``` Then run `docker-compose -f docker-compose.hatchet.yml up` to get the Hatchet Lite instance running. #### With Postgres + RabbitMQ To use Postgres as your DB and RabbitMQ as the message queue, copy the following `docker-compose.hatchet.yml` file to the root of your repository: > **Info:** If you have an existing Postgres instance already running, you can simply > point `DATABASE_URL` to that instance and ignore the `postgres` service > deployment in the following docker-compose file. 
```yaml filename="docker-compose.hatchet.yml" copy version: "3.8" name: hatchet-lite services: postgres: image: postgres:15.6 command: postgres -c 'max_connections=200' restart: always environment: - POSTGRES_USER=hatchet - POSTGRES_PASSWORD=hatchet - POSTGRES_DB=hatchet volumes: - hatchet_lite_postgres_data:/var/lib/postgresql/data healthcheck: test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"] interval: 10s timeout: 10s retries: 5 start_period: 10s rabbitmq: image: "rabbitmq:3-management" hostname: "rabbitmq" ports: - "5672:5672" - "15672:15672" environment: RABBITMQ_DEFAULT_USER: "user" RABBITMQ_DEFAULT_PASS: "password" volumes: - "hatchet_rabbitmq_data:/var/lib/rabbitmq" - "hatchet_rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf" healthcheck: test: ["CMD", "rabbitmqctl", "status"] interval: 30s timeout: 10s retries: 5 hatchet-lite: image: ghcr.io/hatchet-dev/hatchet/hatchet-lite:latest ports: - "8888:8888" - "7077:7077" depends_on: postgres: condition: service_healthy environment: SERVER_MSGQUEUE_KIND: rabbitmq SERVER_MSGQUEUE_RABBITMQ_URL: "amqp://user:password@rabbitmq:5672/" # Refer to https://docs.hatchet.run/self-hosting/configuration-options # for a list of all supported environment variables DATABASE_URL: "postgresql://hatchet:hatchet@postgres:5432/hatchet?sslmode=disable" SERVER_AUTH_COOKIE_DOMAIN: localhost SERVER_AUTH_COOKIE_INSECURE: "t" SERVER_GRPC_BIND_ADDRESS: "0.0.0.0" SERVER_GRPC_INSECURE: "t" SERVER_GRPC_BROADCAST_ADDRESS: localhost:7077 SERVER_GRPC_PORT: "7077" SERVER_URL: http://localhost:8888 SERVER_AUTH_SET_EMAIL_VERIFIED: "t" SERVER_DEFAULT_ENGINE_VERSION: "V1" volumes: - "hatchet_lite_config:/config" volumes: hatchet_lite_postgres_data: hatchet_lite_config: hatchet_rabbitmq_data: hatchet_rabbitmq.conf: ``` Then run `docker-compose -f docker-compose.hatchet.yml up` to get the Hatchet Lite instance running. ### Accessing Hatchet Lite Once the Hatchet Lite instance is running, you can access the Hatchet Lite UI at [http://localhost:8888](http://localhost:8888). By default, a user is created with the following credentials: ``` Email: admin@example.com Password: Admin123!! ``` After logging in, follow the steps in the UI to create your first tenant and run your first workflow! --- # Docker Compose Deployment This guide shows how to deploy Hatchet using Docker Compose for a production-ready deployment. If you'd like to get up and running quickly, you can also deploy Hatchet using the `hatchet-lite` image following the tutorial here: [Hatchet Lite Deployment](/self-hosting/hatchet-lite). This guide uses RabbitMQ as a message broker for Hatchet. This is optional: if you'd like to use Postgres as a message broker, modify the `setup-config` service in the `docker-compose.yml` file with the following env var, and delete all RabbitMQ references: ```sh SERVER_MSGQUEUE_KIND=postgres ``` ## Quickstart ### Prerequisites This deployment requires [Docker](https://docs.docker.com/engine/install/) installed locally to work. 
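If you'd like to confirm that Docker and the Compose plugin are available before continuing, you can check the installed versions. This assumes a recent Docker Engine installation that bundles Compose v2:

```sh
docker --version
docker compose version
```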
### Create files

We will be creating a `docker-compose.yml` file in the root of your repository:

```
root/
  docker-compose.yml
```

```yaml filename="docker-compose.yml" copy
version: "3.8"
services:
  postgres:
    image: postgres:15.6
    command: postgres -c 'max_connections=1000'
    restart: always
    hostname: "postgres"
    environment:
      - POSTGRES_USER=hatchet
      - POSTGRES_PASSWORD=hatchet
      - POSTGRES_DB=hatchet
    ports:
      - "5435:5432"
    volumes:
      - hatchet_postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"]
      interval: 10s
      timeout: 10s
      retries: 5
      start_period: 10s

  rabbitmq:
    image: "rabbitmq:3-management"
    hostname: "rabbitmq"
    ports:
      - "5673:5672" # RabbitMQ
      - "15673:15672" # Management UI
    environment:
      RABBITMQ_DEFAULT_USER: "user"
      RABBITMQ_DEFAULT_PASS: "password"
    volumes:
      - "hatchet_rabbitmq_data:/var/lib/rabbitmq"
      - "hatchet_rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf" # Configuration file mount
    healthcheck:
      test: ["CMD", "rabbitmqctl", "status"]
      interval: 10s
      timeout: 10s
      retries: 5

  migration:
    image: ghcr.io/hatchet-dev/hatchet/hatchet-migrate:latest
    command: /hatchet/hatchet-migrate
    environment:
      DATABASE_URL: "postgres://hatchet:hatchet@postgres:5432/hatchet"
    depends_on:
      postgres:
        condition: service_healthy

  setup-config:
    image: ghcr.io/hatchet-dev/hatchet/hatchet-admin:latest
    command: /hatchet/hatchet-admin quickstart --skip certs --generated-config-dir /hatchet/config --overwrite=false
    environment:
      DATABASE_URL: "postgres://hatchet:hatchet@postgres:5432/hatchet"
      SERVER_MSGQUEUE_RABBITMQ_URL: amqp://user:password@rabbitmq:5672/
      SERVER_AUTH_COOKIE_DOMAIN: localhost:8080
      SERVER_AUTH_COOKIE_INSECURE: "t"
      SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
      SERVER_GRPC_INSECURE: "t"
      SERVER_GRPC_BROADCAST_ADDRESS: localhost:7077
      SERVER_DEFAULT_ENGINE_VERSION: "V1"
      SERVER_INTERNAL_CLIENT_INTERNAL_GRPC_BROADCAST_ADDRESS: hatchet-engine:7070
    volumes:
      - hatchet_certs:/hatchet/certs
      - hatchet_config:/hatchet/config
    depends_on:
      migration:
        condition: service_completed_successfully
      rabbitmq:
        condition: service_healthy
      postgres:
        condition: service_healthy

  hatchet-engine:
    image: ghcr.io/hatchet-dev/hatchet/hatchet-engine:latest
    command: /hatchet/hatchet-engine --config /hatchet/config
    restart: on-failure
    depends_on:
      setup-config:
        condition: service_completed_successfully
      migration:
        condition: service_completed_successfully
    ports:
      - "7077:7070"
    environment:
      DATABASE_URL: "postgres://hatchet:hatchet@postgres:5432/hatchet"
      SERVER_GRPC_BIND_ADDRESS: "0.0.0.0"
      SERVER_GRPC_INSECURE: "t"
    volumes:
      - hatchet_certs:/hatchet/certs
      - hatchet_config:/hatchet/config

  hatchet-dashboard:
    image: ghcr.io/hatchet-dev/hatchet/hatchet-dashboard:latest
    command: sh ./entrypoint.sh --config /hatchet/config
    ports:
      - 8080:80
    restart: on-failure
    depends_on:
      setup-config:
        condition: service_completed_successfully
      migration:
        condition: service_completed_successfully
    environment:
      DATABASE_URL: "postgres://hatchet:hatchet@postgres:5432/hatchet"
    volumes:
      - hatchet_certs:/hatchet/certs
      - hatchet_config:/hatchet/config

volumes:
  hatchet_postgres_data:
  hatchet_rabbitmq_data:
  hatchet_rabbitmq.conf:
  hatchet_config:
  hatchet_certs:
```

### Get Hatchet up and running

To start the services, run the following command in the root of your repository:

```bash
docker compose up
```

Wait for the `hatchet-engine` and `hatchet-dashboard` services to start.

### Accessing Hatchet

Once the Hatchet instance is running, you can access the Hatchet UI at [http://localhost:8080](http://localhost:8080).
By default, a user is created with the following credentials:

```
Email: admin@example.com
Password: Admin123!!
```

## Run tasks against the Hatchet instance

To run tasks against this instance, you will first need to create an API token for your worker. There are two ways to do this:

1. **Using a CLI command**: You can run the following command to create a token:

```sh
docker compose run --no-deps setup-config /hatchet/hatchet-admin token create --config /hatchet/config --tenant-id 707d0855-80ab-4e1f-a156-f1c4546cbf52
```

2. **Using the Hatchet dashboard**:
   - Log in to the Hatchet dashboard.
   - Navigate to the "Settings" page.
   - Click on the "API Tokens" tab.
   - Click on "Create API Token".

Now that you have an API token, see the guide [here](https://docs.hatchet.run/home/setup) for how to run your first task.

## Repulling images

The docker compose file above uses the `latest` tag for all images. If you want to pull the latest version of the images, you can run the following command:

```bash
docker compose pull
```

## Connecting to the engine from within Docker

If you're also running your worker application inside of `docker-compose`, you should modify the `SERVER_GRPC_BROADCAST_ADDRESS` environment variable in the `setup-config` service to use `hatchet-engine` as the hostname, along with the engine's container port (7070 in the compose file above). For example:

```yaml
SERVER_GRPC_BROADCAST_ADDRESS: "hatchet-engine:7070"
```

Make sure your worker depends on hatchet-engine:

```yaml
worker:
  depends_on:
    hatchet-engine:
      condition: service_started
```

> **Info:** Modifying the gRPC broadcast address or server URL will require re-issuing an API token.

## Additional Docker configuration

### Increase Postgres shared memory

By default, containers have a 64 MB shared memory segment (`/dev/shm`). For larger Hatchet deployments this can be too small and may lead to slow queries or an unresponsive dashboard. Increase the shared memory size for the `postgres` service:

```yaml filename="docker-compose.yml" copy
# ...
services:
  postgres:
    image: postgres:15.6
    shm_size: 1g # Increase shared memory (adjust as needed)
    command: postgres -c 'max_connections=1000'
    restart: always
    hostname: "postgres"
    environment:
      - POSTGRES_USER=hatchet
      - POSTGRES_PASSWORD=hatchet
      - POSTGRES_DB=hatchet
    ports:
      - "5435:5432"
    volumes:
      - hatchet_postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d hatchet -U hatchet"]
      interval: 10s
      timeout: 10s
      retries: 5
      start_period: 10s
# ...
```

---

# Kubernetes Quickstart

## Prerequisites

- A Kubernetes cluster currently set as the current context in `kubectl`
- `kubectl` and `helm` installed

## Quickstart

### Get Hatchet Running

To deploy `hatchet-stack`, run the following commands:

```sh
helm repo add hatchet https://hatchet-dev.github.io/hatchet-charts
helm install hatchet-stack hatchet/hatchet-stack --set caddy.enabled=true
```

This default installation will run the Hatchet server as an internal service in the cluster and spin up a reverse proxy via `Caddy` to get local access. To view the Hatchet server, run the following command:

```sh
export NAMESPACE=default # TODO: replace with your namespace
export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app=caddy" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace $NAMESPACE $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace $NAMESPACE port-forward $POD_NAME 8080:$CONTAINER_PORT
```

And then navigate to `http://localhost:8080` to see the Hatchet frontend running.
You can log into Hatchet with the following credentials: ``` Email: admin@example.com Password: Admin123!! ``` ### Port forward to the Hatchet engine ```sh export NAMESPACE=default # TODO: replace with your namespace export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app.kubernetes.io/name=engine" -o jsonpath="{.items[0].metadata.name}") export CONTAINER_PORT=$(kubectl get pod --namespace $NAMESPACE $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}") kubectl --namespace $NAMESPACE port-forward $POD_NAME 7070:$CONTAINER_PORT ``` This will spin up the Hatchet engine service on `localhost:7070` which you can then connect to from the examples. ### Generate an API token To generate an API token, navigate to the `Settings` tab in the Hatchet frontend and click on the `API Tokens` tab. Click the `Generate API Token` button to create a new token. Store this token somewhere safe. ### Run your first worker Now that you have an API token, see the guide [here](https://docs.hatchet.run/home/setup) for how to run your first task. --- # Kubernetes Deployment via Glasskube ## Prerequisites - A Kubernetes cluster currently set as the current context in `kubectl` - `docker`, `openssl`, `kubectl` and [`glasskube`](https://glasskube.dev) installed ## What is Glasskube? [Glasskube](https://glasskube.dev) is an alternative package manager for Kubernetes and part of the CNCF landscape. Glasskube is designed as a Cloud Native application and every installed package is represented by a Custom Resource. [`glasskube/glasskube`](https://github.com/glasskube/glasskube/) is in active development, with _good first issues_ available for new contributors. ## Quickstart ### Generate encryption keys There are 4 encryption secrets required for Hatchet to run which can be generated via the following bash script (requires `docker` and `openssl`): ```sh filename=generate.sh copy #!/bin/bash # Define an alias for generating random strings. This needs to be a function in a script. randstring() { openssl rand -base64 69 | tr -d "\n=+/" | cut -c1-$1 } # Create keys directory mkdir -p ./keys # Function to clean up the keys directory cleanup() { rm -rf ./keys } # Register the cleanup function to be called on the EXIT signal trap cleanup EXIT # Check if Docker is installed if ! command -v docker &> /dev/null then echo "Docker could not be found. Please install Docker." exit 1 fi # Generate keysets using Docker docker run --user $(id -u):$(id -g) -v $(pwd)/keys:/hatchet/keys ghcr.io/hatchet-dev/hatchet/hatchet-admin:latest /hatchet/hatchet-admin keyset create-local-keys --key-dir /hatchet/keys # Read keysets from files SERVER_ENCRYPTION_MASTER_KEYSET=$(<./keys/master.key) SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET=$(<./keys/private_ec256.key) SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET=$(<./keys/public_ec256.key) # Generate the random strings for SERVER_AUTH_COOKIE_SECRETS SERVER_AUTH_COOKIE_SECRET1=$(randstring 16) SERVER_AUTH_COOKIE_SECRET2=$(randstring 16) # Create the YAML file cat > hatchet-secret.yaml <" HATCHET_CLIENT_TLS_STRATEGY=none ``` You will need this in the following example. 
### Port forward to the Hatchet engine ```sh export NAMESPACE=hatchet # TODO: change if you modified the namespace export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app.kubernetes.io/name=hatchet-engine,app.kubernetes.io/instance=hatchet" -o jsonpath="{.items[0].metadata.name}") export CONTAINER_PORT=$(kubectl get pod --namespace $NAMESPACE $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}") kubectl --namespace $NAMESPACE port-forward $POD_NAME 7070:$CONTAINER_PORT ``` This will spin up the Hatchet engine service on `localhost:7070` which you can then connect to from the examples. ### Generate an API token To generate an API token, navigate to the `Settings` tab in the Hatchet frontend and click on the `API Tokens` tab. Click the `Generate API Token` button to create a new token. Store this token somewhere safe. ### Run your first worker Now that you have an API token, see the guide [here](https://docs.hatchet.run/home/setup) for how to run your first task. --- # Kubernetes Networking ## Overview By default, the Kubernetes Helm chart does not expose any of the Hatchet services over an ingress. There are three services which can possibly be exposed: 1. `hatchet-engine` 2. `hatchet-stack-api` 3. `hatchet-stack-frontend` To expose these services, you will need to do the following: 1. Configure ingresses for `frontend` and `engine` services (and optionally the `api` service). We recommend configuring the ingress to reverse proxy `/api` endpoints to the `hatchet-stack-api` service, and configuring a separate ingress to proxy to `hatchet-engine`. 2. Update the following configuration variables: ```yaml api: env: SERVER_AUTH_COOKIE_DOMAIN: "hatchet.example.com" # example.com should be replaced with your domain SERVER_URL: "https://hatchet.example.com" # example.com should be replaced with your domain SERVER_GRPC_BIND_ADDRESS: "0.0.0.0" SERVER_GRPC_INSECURE: "false" SERVER_GRPC_BROADCAST_ADDRESS: "hatchet-engine.example.com:443" # example.com should be replaced with your domain engine: env: SERVER_AUTH_COOKIE_DOMAIN: "hatchet.example.com" # example.com should be replaced with your domain SERVER_URL: "https://hatchet.example.com" # example.com should be replaced with your domain SERVER_GRPC_BIND_ADDRESS: "0.0.0.0" SERVER_GRPC_INSECURE: "false" SERVER_GRPC_BROADCAST_ADDRESS: "engine.hatchet.example.com:443" # example.com should be replaced with your domain ``` ## Example: `nginx-ingress` Let's walk through an example of exposing Hatchet over `hatchet.example.com` (for the API and frontend) and `engine.hatchet.example.com` (for the engine). We'll be deploying this with SSL enabled, which requires a valid certificate. We recommend using [cert-manager](https://cert-manager.io/docs/) to manage your certificates. This guide assumes that you have a cert-manager `ClusterIssuer` called `letsencrypt-prod` configured. 
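If you don't already have one, a typical ACME `ClusterIssuer` for this setup might look like the sketch below. The email address is a placeholder to replace, and the HTTP-01 solver's ingress class is an assumption based on the `nginx-ingress` controller used in this example:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production directory
    server: https://acme-v02.api.letsencrypt.org/directory
    # Replace with an address you monitor; used for certificate expiry notices
    email: you@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```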
Here's an example `values.yaml` file for this setup: ```yaml api: env: # TODO: insert these values from the output of the keyset generation command SERVER_AUTH_COOKIE_SECRETS: "$SERVER_AUTH_COOKIE_SECRET1 $SERVER_AUTH_COOKIE_SECRET2" SERVER_ENCRYPTION_MASTER_KEYSET: "$SERVER_ENCRYPTION_MASTER_KEYSET" SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET: "$SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET" SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET: "$SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET" SERVER_AUTH_COOKIE_DOMAIN: "hatchet.example.com" # example.com should be replaced with your domain SERVER_URL: "https://hatchet.example.com" # example.com should be replaced with your domain SERVER_GRPC_BIND_ADDRESS: "0.0.0.0" SERVER_GRPC_INSECURE: "false" SERVER_GRPC_BROADCAST_ADDRESS: "engine.hatchet.example.com:443" # example.com should be replaced with your domain engine: env: # TODO: insert these values from the output of the keyset generation command SERVER_AUTH_COOKIE_SECRETS: "$SERVER_AUTH_COOKIE_SECRET1 $SERVER_AUTH_COOKIE_SECRET2" SERVER_ENCRYPTION_MASTER_KEYSET: "$SERVER_ENCRYPTION_MASTER_KEYSET" SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET: "$SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET" SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET: "$SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET" SERVER_AUTH_COOKIE_DOMAIN: "hatchet.example.com" # example.com should be replaced with your domain SERVER_URL: "https://hatchet.example.com" # example.com should be replaced with your domain SERVER_GRPC_BIND_ADDRESS: "0.0.0.0" SERVER_GRPC_INSECURE: "false" SERVER_GRPC_BROADCAST_ADDRESS: "engine.hatchet.example.com:443" # example.com should be replaced with your domain ingress: enabled: true ingressClassName: nginx labels: {} annotations: cert-manager.io/cluster-issuer: letsencrypt-prod nginx.ingress.kubernetes.io/auth-tls-verify-client: "optional" nginx.ingress.kubernetes.io/auth-tls-secret: "${kubernetes_namespace.cloud.metadata[0].name}/engine-cert" nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1" nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true" nginx.ingress.kubernetes.io/backend-protocol: "GRPC" nginx.ingress.kubernetes.io/ssl-redirect: "true" nginx.ingress.kubernetes.io/grpc-backend: "true" nginx.ingress.kubernetes.io/server-snippet: | grpc_read_timeout 1d; grpc_send_timeout 1h; client_header_timeout 1h; client_body_timeout 1h; hosts: - host: engine.hatchet.example.com paths: - path: / backend: serviceName: hatchet-engine servicePort: 7070 tls: - hosts: - engine.hatchet.example.com secretName: engine-cert servicePort: 7070 frontend: ingress: enabled: true ingressClassName: nginx labels: {} annotations: nginx.ingress.kubernetes.io/proxy-body-size: 50m nginx.ingress.kubernetes.io/proxy-send-timeout: "60" nginx.ingress.kubernetes.io/proxy-read-timeout: "60" nginx.ingress.kubernetes.io/proxy-connect-timeout: "60" cert-manager.io/cluster-issuer: letsencrypt-prod hosts: - host: hatchet.example.com paths: - path: /api backend: serviceName: hatchet-api servicePort: 8080 - path: / backend: serviceName: hatchet-frontend servicePort: 8080 tls: - secretName: hatchet-api hosts: - hatchet.example.com ``` --- # Configuring the Helm Chart ## Shared Config For the `hatchet-stack` and `hatchet-ha` Helm charts, the `sharedConfig` object in the `values.yaml` file allows you to configure shared settings for all backend services. 
The default values are: ```yaml sharedConfig: # you can disable shared config by setting this to false enabled: true # these are the most commonly configured values serverUrl: "http://localhost:8080" serverAuthCookieDomain: "localhost:8080" # the domain for the auth cookie serverAuthCookieInsecure: "t" # allows cookies to be set over http serverAuthSetEmailVerified: "t" # automatically sets email_verified to true for all users serverAuthBasicAuthEnabled: "t" # allows login via basic auth (email/password) grpcBroadcastAddress: "localhost:7070" # the endpoint for the gRPC server, exposed via the `grpc` service grpcInsecure: "true" # allows gRPC to be served over http defaultAdminEmail: "admin@example.com" # in exposed/production environments, change this to a valid email defaultAdminPassword: "Admin123!!" # in exposed/production environments, change this to a secure password # you can set additional environment variables here, which will override any defaults env: {} ``` ### Networking - **`sharedConfig.serverUrl`** (default: `"http://localhost:8080"`): specifies the base URL for the server. This URL should be the public-facing URL of the Hatchet API server (which is typically bundled behind a reverse proxy with the Hatchet frontend). - **`sharedConfig.grpcBroadcastAddress`** (default: `"localhost:7070"`): defines the address for the gRPC server endpoint, which is exposed via the `grpc` service. - **`sharedConfig.grpcInsecure`** (default: `"true"`): when set to `true`, allows the gRPC server to be served over HTTP instead of HTTPS. Use this in non-production environments only. ### Authentication - **`sharedConfig.serverAuthCookieDomain`** (default: `"localhost:8080"`): specifies the domain for the authentication cookie. Should be set to the appropriate domain when deploying to production. - **`sharedConfig.serverAuthCookieInsecure`** (default: `"t"`): if set to `"t"`, allows authentication cookies to be set over HTTP, useful for local development. In production, use a secure setting. - **`sharedConfig.serverAuthSetEmailVerified`** (default: `"t"`): automatically sets `email_verified` to `true` for all users. This is useful for testing environments where email verification is not necessary. - **`sharedConfig.serverAuthBasicAuthEnabled`** (default: `"t"`): enables basic authentication (using email and password) for users. Should be enabled if the system needs to support user logins via email/password. - **`sharedConfig.defaultAdminEmail`** (default: `"admin@example.com"`): specifies the email for the default administrator account. Change this to a valid email when deploying to production environments. - **`sharedConfig.defaultAdminPassword`** (default: `"Admin123!!"`): defines the password for the default administrator account. This should be changed to a strong password for production deployments. ### Additional Env Variables You can set additional environment variables for the backend services using the `env` object. For example: ```yaml sharedConfig: env: MY_ENV_VAR: "my-value" ``` This will set the environment variable `MY_ENV_VAR` to `"my-value"` for all backend services. These values will override any default environment settings for the services. ### Seeding Data The `sharedConfig` object also allows you to seed the database with a default tenant and user. 
The following environment variables are used for seeding:

```yaml
seed:
  defaultAdminEmail: "admin@example.com" # in exposed/production environments, change this to a valid email
  defaultAdminPassword: "Admin123!!" # in exposed/production environments, change this to a secure password
  env:
    ADMIN_NAME: "Admin User"
    DEFAULT_TENANT_NAME: "Default"
    DEFAULT_TENANT_SLUG: "default"
    DEFAULT_TENANT_ID: "707d0855-80ab-4e1f-a156-f1c4546cbf52"
```

---

# Setting up Hatchet with an external database

## Connecting to Postgres

To connect to an external Postgres instance, set `postgres.enabled` to `false` in the `values.yaml` file. This will disable the internal Postgres instance and allow you to connect to an external database.

You should then add the following configuration for the `hatchet-stack` or `hatchet-ha` charts:

> Note: Either `DATABASE_URL` or the `DATABASE_POSTGRES_*` variables are required

```yaml
sharedConfig:
  env:
    DATABASE_URL: "postgres://<user>:<password>@<host>:5432/<db-name>?sslmode=disable"
    DATABASE_POSTGRES_HOST: "<host>"
    DATABASE_POSTGRES_PORT: "5432"
    DATABASE_POSTGRES_USERNAME: "<user>"
    DATABASE_POSTGRES_PASSWORD: "<password>"
    DATABASE_POSTGRES_DB_NAME: "<db-name>"
    DATABASE_POSTGRES_SSL_MODE: "disable"
```

## Mounting environment variables

Environment variables can also be mounted from secrets or configmaps via the `deploymentEnvFrom` field, which corresponds to the `envFrom` field in a Kubernetes deployment.

For example, to mount the `DATABASE_URL` environment variable from a secret, you can use the following configuration:

```yaml
hatchet-api:
  deploymentEnvFrom:
    - secretRef:
        name: hatchet-api-secrets
        key: DATABASE_URL

hatchet-engine:
  deploymentEnvFrom:
    - secretRef:
        name: hatchet-api-secrets
        key: DATABASE_URL
```

For more information on mounting environment variables from secrets, refer to the [Kubernetes documentation](https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables).

## Migrations

In order for migrations to run, the database user requires permissions to write and modify schemas **on a clean database**. It is therefore recommended to create a separate database instance where Hatchet can run and grant permissions on this database to the Hatchet user.

For example, to create a new database and user `hatchet` in Postgres, run the following commands (**warning:** change the username/password for production usage):

```sql
create database hatchet;
create role hatchet with login password 'hatchet';
grant hatchet to postgres;
alter database hatchet owner to hatchet;
```

---

# High Availability

If you are running Hatchet in a high-throughput production environment, you may want to set up an HA (High Availability) configuration to ensure that your system remains available in the event of infrastructure failures or other issues.

There are multiple levels at which you can configure Hatchet for high availability:

- At the **database level** by using a managed Postgres provider like AWS RDS or Google Cloud SQL which supports HA options.
- At the **RabbitMQ level** by configuring the RabbitMQ cluster to have at least 3 replicas across multiple zones within a region.
- At the **Hatchet Engine/API level** by running multiple instances of the Hatchet engine behind a load balancer and splitting the different Hatchet services into separate deployments.

This guide will focus on the last level of high availability.
To view an end-to-end example of configuring Hatchet for high availability on GCP using Terraform, check out the GCP deployment guide [here](https://github.com/hatchet-dev/hatchet-infra-examples/blob/main/self-hosting/gcp).

## HA Helm Chart

Hatchet offers an HA Helm chart that can be used to deploy Hatchet in a high availability configuration. To use this Helm chart:

```sh
helm repo add hatchet https://hatchet-dev.github.io/hatchet-charts
helm install hatchet-ha hatchet/hatchet-ha
```

This chart accepts the same parameters as `hatchet-stack` for the top-level `api`, `frontend`, `postgres` and `rabbitmq` objects, but you can additionally configure the following services:

```yaml
grpc:
  replicaCount: 4

controllers:
  replicaCount: 2

scheduler:
  replicaCount: 2
```

See the [Helm configuration](./kubernetes-helm-configuration) guide for more information on configuring the Hatchet Helm charts.

---

# Configuration Options

The Hatchet server and engine can be configured via environment variables using several prefixes. This document contains a comprehensive list of all 195+ available options organized by component.

## Environment Variable Prefixes

Hatchet uses the following environment variable prefixes:

- **`SERVER_`** (173 variables) - Main server configuration including runtime, authentication, encryption, monitoring, and integrations
- **`DATABASE_`** (13 variables) - PostgreSQL database connection and configuration
- **`READ_REPLICA_`** (4 variables) - Read replica database configuration
- **`ADMIN_`** (3 variables) - Administrator user setup for initial seeding
- **`DEFAULT_`** (3 variables) - Default tenant configuration
- **`SCHEDULER_`** (1 variable) - Scheduler-specific rate limiting
- **`SEED_`** (1 variable) - Development environment seeding
- **`CACHE_`** (1 variable) - Cache duration settings

_Note: This documentation excludes `HATCHET_CLIENT_*` variables, which are specific to Go SDK client configuration._

## Required Environment Variables

The following variables are **absolutely required** for Hatchet to start successfully:

### Encryption Keys (Required - Choose One Strategy)

**Option A: Local Encryption Keys**

```bash
SERVER_ENCRYPTION_MASTER_KEYSET="<base64-encoded master keyset>"
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET="<base64-encoded public JWT keyset>"
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET="<base64-encoded private JWT keyset>"
```

**Option B: File-based Keys**

```bash
SERVER_ENCRYPTION_MASTER_KEYSET_FILE="/path/to/master.keyset"
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET_FILE="/path/to/jwt-public.keyset"
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET_FILE="/path/to/jwt-private.keyset"
```

**Option C: Google Cloud KMS**

```bash
SERVER_ENCRYPTION_CLOUDKMS_ENABLED=true
SERVER_ENCRYPTION_CLOUDKMS_KEY_URI="gcp-kms://your-key-uri"
SERVER_ENCRYPTION_CLOUDKMS_CREDENTIALS_JSON="<service account credentials JSON>"
```

### Authentication Secrets (Required)

```bash
SERVER_AUTH_COOKIE_SECRETS="<secret-1> <secret-2>"
```

### Database Connection (Required)

**Option A: Connection String**

```bash
DATABASE_URL="postgresql://user:password@host:port/dbname"
```

**Option B: Individual Parameters** (uses defaults if not specified)

```bash
DATABASE_POSTGRES_HOST=your-postgres-host
DATABASE_POSTGRES_PASSWORD=your-secure-password
```

## Minimal Configuration Example

```bash
# Database
DATABASE_URL='postgresql://hatchet:hatchet@127.0.0.1:5431/hatchet'

# Encryption (using key files - recommended for development)
SERVER_ENCRYPTION_MASTER_KEYSET_FILE=./keys/master.key
SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET_FILE=./keys/private_ec256.key
SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET_FILE=./keys/public_ec256.key

# Authentication
SERVER_AUTH_COOKIE_SECRETS="your-secret-key-1 your-secret-key-2"
SERVER_AUTH_SET_EMAIL_VERIFIED=true # Basic server config SERVER_PORT=8080 SERVER_URL=http://localhost:8080 # Development settings (optional but recommended) SERVER_GRPC_INSECURE=true SERVER_INTERNAL_CLIENT_BASE_STRATEGY=none SERVER_LOGGER_LEVEL=error SERVER_LOGGER_FORMAT=console DATABASE_LOGGER_LEVEL=error DATABASE_LOGGER_FORMAT=console ``` Generate encryption keys with: ```bash go run ./cmd/hatchet-admin keyset create-local-keys --key-dir ./keys ``` ## Runtime Configuration Variables marked with ⚠️ are conditionally required when specific features are enabled. | Variable | Description | Default Value | | -------------------------------------------- | --------------------------------------- | ----------------------- | | `SERVER_PORT` | Port for the core server | `8080` | | `SERVER_URL` | Full server URL, including protocol | `http://localhost:8080` | | `SERVER_GRPC_PORT` | Port for the GRPC service | `7070` | | `SERVER_GRPC_BIND_ADDRESS` | GRPC server bind address | `127.0.0.1` | | `SERVER_GRPC_BROADCAST_ADDRESS` | GRPC server broadcast address | `127.0.0.1:7070` | | `SERVER_GRPC_INSECURE` | Controls if the GRPC server is insecure | `false` | | `SERVER_SHUTDOWN_WAIT` | Shutdown wait duration | `20s` | | `SERVER_ENFORCE_LIMITS` | Enforce tenant limits | `false` | | `SERVER_ALLOW_SIGNUP` | Allow new tenant signups | `true` | | `SERVER_ALLOW_INVITES` | Allow new invites | `true` | | `SERVER_ALLOW_CREATE_TENANT` | Allow tenant creation | `true` | | `SERVER_ALLOW_CHANGE_PASSWORD` | Allow password changes | `true` | | `SERVER_HEALTHCHECK` | Enable healthcheck endpoint | `true` | | `SERVER_HEALTHCHECK_PORT` | Healthcheck port | `8733` | | `SERVER_GRPC_MAX_MSG_SIZE` | gRPC max message size | `4194304` | | `SERVER_GRPC_RATE_LIMIT` | gRPC rate limit | `1000` | | `SCHEDULER_CONCURRENCY_RATE_LIMIT` | Scheduler concurrency rate limit | `20` | | `SCHEDULER_CONCURRENCY_POLLING_MIN_INTERVAL` | Minimum concurrency polling interval | `500ms` | | `SCHEDULER_CONCURRENCY_POLLING_MAX_INTERVAL` | Maximum concurrency polling interval | `5s` | | `SERVER_SERVICES` | Services to run | `["all"]` | | `SERVER_PAUSED_CONTROLLERS` | Paused controllers | | | `SERVER_ENABLE_DATA_RETENTION` | Enable data retention | `true` | | `SERVER_ENABLE_WORKER_RETENTION` | Enable worker retention | `false` | | `SERVER_MAX_PENDING_INVITES` | Max pending invites | `100` | | `SERVER_DISABLE_TENANT_PUBS` | Disable tenant pubsub | | | `SERVER_MAX_INTERNAL_RETRY_COUNT` | Max internal retry count | `10` | | `SERVER_PREVENT_TENANT_VERSION_UPGRADE` | Prevent tenant version upgrades | `false` | | `SERVER_DEFAULT_ENGINE_VERSION` | Default engine version | `V1` | | `SERVER_REPLAY_ENABLED` | Enable task replay | `true` | ## Database Configuration | Variable | Description | Default Value | | ---------------------------- | ---------------------------- | ------------------- | | `DATABASE_URL` | PostgreSQL connection string | `127.0.0.1` | | `DATABASE_POSTGRES_HOST` | PostgreSQL host | `127.0.0.1` | | `DATABASE_POSTGRES_PORT` | PostgreSQL port | `5431` | | `DATABASE_POSTGRES_USERNAME` | PostgreSQL username | `hatchet` | | `DATABASE_POSTGRES_PASSWORD` | PostgreSQL password | `hatchet` | | `DATABASE_POSTGRES_DB_NAME` | PostgreSQL database name | `hatchet` | | `DATABASE_POSTGRES_SSL_MODE` | PostgreSQL SSL mode | `disable` | | `DATABASE_MAX_CONNS` | Max database connections | `50` | | `DATABASE_MIN_CONNS` | Min database connections | `10` | | `DATABASE_MAX_QUEUE_CONNS` | Max queue connections | `50` | | `DATABASE_MIN_QUEUE_CONNS` | Min queue connections | 
`10` | | `DATABASE_LOG_QUERIES` | Log database queries | `false` | | `CACHE_DURATION` | Cache duration | `5s` | | `ADMIN_EMAIL` | Admin email for seeding | `admin@example.com` | | `ADMIN_PASSWORD` | Admin password for seeding | `Admin123!!` | | `ADMIN_NAME` | Admin name for seeding | `Admin` | | `DEFAULT_TENANT_NAME` | Default tenant name | `Default` | | `DEFAULT_TENANT_SLUG` | Default tenant slug | `default` | | `DEFAULT_TENANT_ID` | Default tenant ID | | | `SEED_DEVELOPMENT` | Development seeding flag | | | `READ_REPLICA_ENABLED` | Enable read replica | `false` | | `READ_REPLICA_DATABASE_URL` | Read replica database URL | | | `READ_REPLICA_MAX_CONNS` | Read replica max connections | `50` | | `READ_REPLICA_MIN_CONNS` | Read replica min connections | `10` | | `DATABASE_LOGGER_LEVEL` | Database logger level | | | `DATABASE_LOGGER_FORMAT` | Database logger format | | ## Security Check Configuration | Variable | Description | Default Value | | -------------------------------- | ----------------------- | ------------------------------ | | `SERVER_SECURITY_CHECK_ENABLED` | Enable security check | `true` | | `SERVER_SECURITY_CHECK_ENDPOINT` | Security check endpoint | `https://security.hatchet.run` | ## Limit Configuration | Variable | Description | Default Value | | ----------------------------------------------- | ------------------------------- | ------------- | | `SERVER_LIMITS_DEFAULT_TENANT_RETENTION_PERIOD` | Default tenant retention period | `720h` | | `SERVER_LIMITS_DEFAULT_WORKER_LIMIT` | Default worker limit | `4` | | `SERVER_LIMITS_DEFAULT_WORKER_ALARM_LIMIT` | Default worker alarm limit | `2` | | `SERVER_LIMITS_DEFAULT_EVENT_LIMIT` | Default event limit | `1000` | | `SERVER_LIMITS_DEFAULT_EVENT_ALARM_LIMIT` | Default event alarm limit | `750` | | `SERVER_LIMITS_DEFAULT_EVENT_WINDOW` | Default event window | `24h` | | `SERVER_LIMITS_DEFAULT_CRON_LIMIT` | Default cron limit | `5` | | `SERVER_LIMITS_DEFAULT_CRON_ALARM_LIMIT` | Default cron alarm limit | `2` | | `SERVER_LIMITS_DEFAULT_SCHEDULE_LIMIT` | Default schedule limit | `1000` | | `SERVER_LIMITS_DEFAULT_SCHEDULE_ALARM_LIMIT` | Default schedule alarm limit | `750` | | `SERVER_LIMITS_DEFAULT_TASK_RUN_LIMIT` | Default task run limit | `2000` | | `SERVER_LIMITS_DEFAULT_TASK_RUN_ALARM_LIMIT` | Default task run alarm limit | `1500` | | `SERVER_LIMITS_DEFAULT_TASK_RUN_WINDOW` | Default task run window | `24h` | | `SERVER_LIMITS_DEFAULT_WORKER_SLOT_LIMIT` | Default worker slot limit | `4000` | | `SERVER_LIMITS_DEFAULT_WORKER_SLOT_ALARM_LIMIT` | Default worker slot alarm limit | `3000` | ## Alerting Configuration | Variable | Description | Default Value | | -------------------------------------- | ---------------------------------------- | ------------- | | `SERVER_ALERTING_SENTRY_ENABLED` | Enable Sentry for alerting | | | `SERVER_ALERTING_SENTRY_DSN` | Sentry DSN | | | `SERVER_ALERTING_SENTRY_ENVIRONMENT` | Sentry environment | `development` | | `SERVER_ALERTING_SENTRY_SAMPLE_RATE` | Sentry sample rate | `1.0` | | `SERVER_ANALYTICS_POSTHOG_ENABLED` | Enable PostHog analytics | | | `SERVER_ANALYTICS_POSTHOG_API_KEY` | PostHog API key | | | `SERVER_ANALYTICS_POSTHOG_ENDPOINT` | PostHog endpoint | | | `SERVER_ANALYTICS_POSTHOG_FE_API_HOST` | PostHog frontend API host | | | `SERVER_ANALYTICS_POSTHOG_FE_API_KEY` | PostHog frontend API key | | | `SERVER_PYLON_ENABLED` | Enable Pylon | | | `SERVER_PYLON_APP_ID` ⚠️ | Pylon app ID (required if Pylon enabled) | | | `SERVER_PYLON_SECRET` | Pylon secret | | ## Encryption Configuration | Variable | 
Description | Default Value | | --------------------------------------------- | ---------------------------------------------- | ------------- | | `SERVER_ENCRYPTION_MASTER_KEYSET` | Raw master keyset, base64-encoded JSON string | | | `SERVER_ENCRYPTION_MASTER_KEYSET_FILE` | Path to the master keyset file | | | `SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET` | Public JWT keyset, base64-encoded JSON string | | | `SERVER_ENCRYPTION_JWT_PUBLIC_KEYSET_FILE` | Path to the public JWT keyset file | | | `SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET` | Private JWT keyset, base64-encoded JSON string | | | `SERVER_ENCRYPTION_JWT_PRIVATE_KEYSET_FILE` | Path to the private JWT keyset file | | | `SERVER_ENCRYPTION_CLOUDKMS_ENABLED` | Whether Google Cloud KMS is enabled | `false` | | `SERVER_ENCRYPTION_CLOUDKMS_KEY_URI` | URI of the key in Google Cloud KMS | | | `SERVER_ENCRYPTION_CLOUDKMS_CREDENTIALS_JSON` | JSON credentials for Google Cloud KMS | | ## Authentication Configuration | Variable | Description | Default Value | | -------------------------------------- | ----------------------------------------------------------- | -------------------------------- | | `SERVER_AUTH_RESTRICTED_EMAIL_DOMAINS` | Restricted email domains | | | `SERVER_AUTH_BASIC_AUTH_ENABLED` | Whether basic auth is enabled | `true` | | `SERVER_AUTH_SET_EMAIL_VERIFIED` | Whether the user's email is set to verified automatically | `false` | | `SERVER_AUTH_COOKIE_NAME` | Name of the cookie | `hatchet` | | `SERVER_AUTH_COOKIE_DOMAIN` | Domain for the cookie | | | `SERVER_AUTH_COOKIE_SECRETS` | Cookie secrets | | | `SERVER_AUTH_COOKIE_INSECURE` | Whether the cookie is insecure | `false` | | `SERVER_AUTH_GOOGLE_ENABLED` | Whether Google auth is enabled | `false` | | `SERVER_AUTH_GOOGLE_CLIENT_ID` ⚠️ | Google auth client ID (required if Google auth enabled) | | | `SERVER_AUTH_GOOGLE_CLIENT_SECRET` ⚠️ | Google auth client secret (required if Google auth enabled) | | | `SERVER_AUTH_GOOGLE_SCOPES` | Google auth scopes | `["openid", "profile", "email"]` | | `SERVER_AUTH_GITHUB_ENABLED` | Whether GitHub auth is enabled | `false` | | `SERVER_AUTH_GITHUB_CLIENT_ID` ⚠️ | GitHub auth client ID (required if GitHub auth enabled) | | | `SERVER_AUTH_GITHUB_CLIENT_SECRET` ⚠️ | GitHub auth client secret (required if GitHub auth enabled) | | | `SERVER_AUTH_GITHUB_SCOPES` | GitHub auth scopes | `["read:user", "user:email"]` | ## Task Queue Configuration | Variable | Description | Default Value | | --------------------------------- | ------------------------ | ------------- | | `SERVER_MSGQUEUE_KIND` | Message queue kind | `rabbitmq` | | `SERVER_MSGQUEUE_RABBITMQ_URL` | RabbitMQ URL | | | `SERVER_MSGQUEUE_RABBITMQ_QOS` | RabbitMQ QoS | `100` | | `SERVER_REQUEUE_LIMIT` | Requeue limit | `100` | | `SERVER_SINGLE_QUEUE_LIMIT` | Single queue limit | `100` | | `SERVER_UPDATE_HASH_FACTOR` | Update hash factor | `100` | | `SERVER_UPDATE_CONCURRENT_FACTOR` | Update concurrent factor | `10` | ## TLS Configuration | Variable | Description | Default Value | | -------------------------------------------------------- | -------------------------------- | ------------- | | `SERVER_TLS_STRATEGY` | TLS strategy | | | `SERVER_TLS_CERT` | TLS certificate | | | `SERVER_TLS_CERT_FILE` | Path to the TLS certificate file | | | `SERVER_TLS_KEY` | TLS key | | | `SERVER_TLS_KEY_FILE` | Path to the TLS key file | | | `SERVER_TLS_ROOT_CA` | TLS root CA | | | `SERVER_TLS_ROOT_CA_FILE` | Path to the TLS root CA file | | | `SERVER_TLS_SERVER_NAME` | TLS server name | | | 
`SERVER_INTERNAL_CLIENT_BASE_STRATEGY` | Internal client TLS strategy | | | `SERVER_INTERNAL_CLIENT_BASE_INHERIT_BASE` | Inherit base TLS config | `true` | | `SERVER_INTERNAL_CLIENT_TLS_BASE_CERT` | Internal client TLS cert | | | `SERVER_INTERNAL_CLIENT_TLS_BASE_CERT_FILE` | Internal client TLS cert file | | | `SERVER_INTERNAL_CLIENT_TLS_BASE_KEY` | Internal client TLS key | | | `SERVER_INTERNAL_CLIENT_TLS_BASE_KEY_FILE` | Internal client TLS key file | | | `SERVER_INTERNAL_CLIENT_TLS_BASE_ROOT_CA` | Internal client TLS root CA | | | `SERVER_INTERNAL_CLIENT_TLS_BASE_ROOT_CA_FILE` | Internal client TLS root CA file | | | `SERVER_INTERNAL_CLIENT_TLS_SERVER_NAME` | Internal client TLS server name | | | `SERVER_INTERNAL_CLIENT_INTERNAL_GRPC_BROADCAST_ADDRESS` | Internal gRPC broadcast address | | ## Logging Configuration | Variable | Description | Default Value | | ------------------------------------------- | ----------------------- | ------------- | | `SERVER_LOGGER_LEVEL` | Logger level | | | `SERVER_LOGGER_FORMAT` | Logger format | | | `SERVER_LOG_INGESTION_ENABLED` | Enable log ingestion | `true` | | `SERVER_ADDITIONAL_LOGGERS_QUEUE_LEVEL` | Queue logger level | | | `SERVER_ADDITIONAL_LOGGERS_QUEUE_FORMAT` | Queue logger format | | | `SERVER_ADDITIONAL_LOGGERS_PGXSTATS_LEVEL` | PGX stats logger level | | | `SERVER_ADDITIONAL_LOGGERS_PGXSTATS_FORMAT` | PGX stats logger format | | ## OpenTelemetry Configuration | Variable | Description | Default Value | | ----------------------------------- | ---------------------------------------------------------- | ------------- | | `SERVER_OTEL_SERVICE_NAME` | Service name for OpenTelemetry | | | `SERVER_OTEL_COLLECTOR_URL` | Collector URL for OpenTelemetry | | | `SERVER_OTEL_INSECURE` | Whether to use an insecure connection to the collector URL | | | `SERVER_OTEL_TRACE_ID_RATIO` | OpenTelemetry trace ID ratio | | | `SERVER_OTEL_COLLECTOR_AUTH` | OpenTelemetry Collector Authorization header value | | | `SERVER_OTEL_METRICS_ENABLED` | Enable OpenTelemetry metrics collection | `false` | | `SERVER_PROMETHEUS_ENABLED` | Enable Prometheus | `false` | | `SERVER_PROMETHEUS_ADDRESS` | Prometheus address | `:9090` | | `SERVER_PROMETHEUS_PATH` | Prometheus metrics path | `/metrics` | | `SERVER_PROMETHEUS_SERVER_URL` | Prometheus server URL | | | `SERVER_PROMETHEUS_SERVER_USERNAME` | Prometheus server username | | | `SERVER_PROMETHEUS_SERVER_PASSWORD` | Prometheus server password | | ## Tenant Alerting Configuration | Variable | Description | Default Value | | -------------------------------------------- | ----------------------------------- | ---------------------- | | `SERVER_TENANT_ALERTING_SLACK_ENABLED` | Enable Slack for tenant alerting | | | `SERVER_TENANT_ALERTING_SLACK_CLIENT_ID` | Slack client ID | | | `SERVER_TENANT_ALERTING_SLACK_CLIENT_SECRET` | Slack client secret | | | `SERVER_TENANT_ALERTING_SLACK_SCOPES` | Slack scopes | `["incoming-webhook"]` | | `SERVER_EMAIL_KIND` | Email integration kind | `postmark` | | `SERVER_EMAIL_POSTMARK_ENABLED` | Enable Postmark | | | `SERVER_EMAIL_POSTMARK_SERVER_KEY` | Postmark server key | | | `SERVER_EMAIL_POSTMARK_FROM_EMAIL` | Postmark from email | | | `SERVER_EMAIL_POSTMARK_FROM_NAME` | Postmark from name | `Hatchet Support` | | `SERVER_EMAIL_POSTMARK_SUPPORT_EMAIL` | Postmark support email | | | `SERVER_EMAIL_SMTP_ENABLED` | Enable SMTP | | | `SERVER_EMAIL_SMTP_SERVER_ADDR` | SMTP server address | | | `SERVER_EMAIL_SMTP_FROM_EMAIL` | SMTP from email | | | `SERVER_EMAIL_SMTP_FROM_NAME` | SMTP from name | 
`Hatchet Support` |
| `SERVER_EMAIL_SMTP_SUPPORT_EMAIL` | SMTP support email | |
| `SERVER_EMAIL_SMTP_AUTH_USERNAME` | SMTP authentication username | |
| `SERVER_EMAIL_SMTP_AUTH_PASSWORD` | SMTP authentication password | |
| `SERVER_MONITORING_ENABLED` | Enable monitoring | `true` |
| `SERVER_MONITORING_PERMITTED_TENANTS` | Permitted tenants for monitoring | |
| `SERVER_MONITORING_PROBE_TIMEOUT` | Monitoring probe timeout | `30s` |
| `SERVER_MONITORING_TLS_ROOT_CA_FILE` | Monitoring TLS root CA file | |
| `SERVER_SAMPLING_ENABLED` | Enable sampling | `false` |
| `SERVER_SAMPLING_RATE` | Sampling rate | `1.0` |
| `SERVER_OPERATIONS_JITTER` | Operations jitter in milliseconds | `0` |
| `SERVER_OPERATIONS_POLL_INTERVAL` | Operations poll interval in seconds | `2` |

## Cron Operations Configuration

| Variable | Description | Default Value |
| --- | --- | --- |
| `SERVER_CRON_OPERATIONS_TASK_ANALYZE_CRON_INTERVAL` | Interval for running ANALYZE on task-related tables | `3h` |
| `SERVER_CRON_OPERATIONS_OLAP_ANALYZE_CRON_INTERVAL` | Interval for running ANALYZE on OLAP/analytics tables | `3h` |
| `SERVER_CRON_OPERATIONS_DB_HEALTH_METRICS_INTERVAL` | Interval for collecting database health metrics (OTel) | `60s` |
| `SERVER_CRON_OPERATIONS_OLAP_METRICS_INTERVAL` | Interval for collecting OLAP metrics (OTel) | `5m` |
| `SERVER_CRON_OPERATIONS_WORKER_METRICS_INTERVAL` | Interval for collecting worker metrics (OTel) | `60s` |
| `SERVER_CRON_OPERATIONS_YESTERDAY_RUN_COUNT_HOUR` | Hour (0-23) at which to collect yesterday's workflow run count (OTel) | `0` |
| `SERVER_CRON_OPERATIONS_YESTERDAY_RUN_COUNT_MINUTE` | Minute (0-59) at which to collect yesterday's workflow run count | `5` |
| `SERVER_WAIT_FOR_FLUSH` | Default wait for flush | `1ms` |
| `SERVER_MAX_CONCURRENT` | Default max concurrent | `50` |
| `SERVER_FLUSH_PERIOD_MILLISECONDS` | Default flush period | `10ms` |
| `SERVER_FLUSH_ITEMS_THRESHOLD` | Default flush threshold | `100` |
| `SERVER_FLUSH_STRATEGY` | Default flush strategy | `DYNAMIC` |

## OLAP Database Configuration

| Variable | Description | Default Value |
| --- | --- | --- |
| `SERVER_OLAP_STATUS_UPDATE_DAG_BATCH_SIZE_LIMIT` | Batch size limit for running DAG status updates | `1000` |
| `SERVER_OLAP_STATUS_UPDATE_TASK_BATCH_SIZE_LIMIT` | Batch size limit for running task status updates | `1000` |

---

## Prometheus Metrics for Hatchet

> **Warning:** Only works with v1 tenants

This document provides an overview of the Prometheus metrics exposed by Hatchet, setup instructions for the metrics endpoint, and example PromQL queries to analyze them.

### Setup

To enable Prometheus metrics for your Hatchet instance, you can set the following environment variables. The corresponding configuration YAML values are mentioned in parentheses. If you are deploying [Hatchet in HA mode](/self-hosting/high-availability), these should be set on both the `controllers` and the `scheduler` deployments.

- Required
  - **`SERVER_PROMETHEUS_ENABLED`** (`prometheus.enabled`)
    - Default: `false`
    - Description: Enables or disables the Prometheus metrics HTTP server.
- Optional
  - **`SERVER_PROMETHEUS_ADDRESS`** (`prometheus.address`)
    - Default: `":9090"`
    - Description: The network address and port to bind the Prometheus metrics server to.
  - **`SERVER_PROMETHEUS_PATH`** (`prometheus.path`)
    - Default: `"/metrics"`
    - Description: The HTTP path at which metrics will be exposed.

Once enabled, you can set up any scraper that supports ingesting Prometheus metrics.

#### Tenant metrics endpoint

> **Info:** This step requires communication with a service that scrapes Hatchet Prometheus metrics.

To enable the [tenant API endpoint](/home/prometheus-metrics), you can set the following environment variables:

- Required
  - **`SERVER_PROMETHEUS_SERVER_URL`** (`prometheus.prometheusServerURL`)
    - Description: The Prometheus server URL.
- Optional
  - **`SERVER_PROMETHEUS_SERVER_USERNAME`** (`prometheus.prometheusServerUsername`)
    - Description: The username to access the Prometheus instance via HTTP basic auth.
  - **`SERVER_PROMETHEUS_SERVER_PASSWORD`** (`prometheus.prometheusServerPassword`)
    - Description: The password to access the Prometheus instance via HTTP basic auth.

**Example environment setup:**

```bash
export SERVER_PROMETHEUS_ENABLED=true
export SERVER_PROMETHEUS_ADDRESS=":9999"
export SERVER_PROMETHEUS_PATH="/custom-metrics"
```

Restart your Hatchet server after setting these variables to apply the changes.

---

### Global Metrics

| Metric Name | Type | Description |
| --- | --- | --- |
| `hatchet_queue_invocations_total` | Counter | The total number of invocations of the queuer function |
| `hatchet_created_tasks_total` | Counter | The total number of tasks created |
| `hatchet_retried_tasks_total` | Counter | The total number of tasks retried |
| `hatchet_succeeded_tasks_total` | Counter | The total number of tasks that succeeded |
| `hatchet_failed_tasks_total` | Counter | The total number of tasks that failed (in a final state, not including retries) |
| `hatchet_skipped_tasks_total` | Counter | The total number of tasks that were skipped |
| `hatchet_cancelled_tasks_total` | Counter | The total number of tasks cancelled |
| `hatchet_assigned_tasks_total` | Counter | The total number of tasks assigned to a worker |
| `hatchet_scheduling_timed_out_total` | Counter | The total number of tasks that timed out while waiting to be scheduled |
| `hatchet_rate_limited_total` | Counter | The total number of tasks that were rate limited |
| `hatchet_queued_to_assigned_total` | Counter | The total number of unique tasks that were queued and later assigned to a worker |
| `hatchet_queued_to_assigned_seconds` | Histogram | Buckets of time (in seconds) spent in the queue before being assigned to a worker |
| `hatchet_reassigned_tasks_total` | Counter | The total number of tasks that were reassigned to a worker |

#### Example PromQL Queries

##### 1. Rate of calls to the queuer method

```promql
rate(hatchet_queue_invocations_total[5m])
```

##### 2. Average queue time in milliseconds

```promql
# Calculates average queue time over the past 5 minutes, converted to ms
rate(hatchet_queued_to_assigned_seconds_sum[5m])
/
rate(hatchet_queued_to_assigned_seconds_count[5m])
* 1e3
```

##### 3. Success and failure rates

```promql
rate(hatchet_succeeded_tasks_total[5m])
rate(hatchet_failed_tasks_total[5m])
```

##### 4. Queue time distribution (histogram)

```promql
sum by (le) (
  rate(hatchet_queued_to_assigned_seconds_bucket[5m])
)
```

##### 5. Rate of tasks created vs. retried

```promql
rate(hatchet_created_tasks_total[5m])
rate(hatchet_retried_tasks_total[5m])
```

##### 6.
```promql
rate(hatchet_assigned_tasks_total[5m])
```

##### 7. Scheduling Timeout Rate

```promql
rate(hatchet_scheduling_timed_out_total[5m])
```

##### 8. Rate Limiting Impact

```promql
rate(hatchet_rate_limited_total[5m])
```

##### 9. Task Completion Ratio (Success vs Total)

```promql
rate(hatchet_succeeded_tasks_total[5m])
/
(rate(hatchet_succeeded_tasks_total[5m]) + rate(hatchet_failed_tasks_total[5m]))
```

##### 10. Task Cancellation Rate

```promql
rate(hatchet_cancelled_tasks_total[5m])
```

##### 11. Task Skip Rate

```promql
rate(hatchet_skipped_tasks_total[5m])
```

##### 12. Queue Processing Efficiency (Assigned vs Created)

```promql
rate(hatchet_assigned_tasks_total[5m])
/
rate(hatchet_created_tasks_total[5m])
```

##### 13. Task Reassignment Rate

```promql
rate(hatchet_reassigned_tasks_total[5m])
```

### Tenant Metrics

| Metric Name | Type | Description |
| --- | --- | --- |
| `hatchet_tenant_workflow_duration_milliseconds` | Histogram | Duration of workflow execution in milliseconds (DAGs and single tasks) |
| `hatchet_tenant_queue_invocations_total` | Counter | The total number of invocations of the queuer function |
| `hatchet_tenant_created_tasks_total` | Counter | The total number of tasks created |
| `hatchet_tenant_retried_tasks_total` | Counter | The total number of tasks retried |
| `hatchet_tenant_succeeded_tasks_total` | Counter | The total number of tasks that succeeded |
| `hatchet_tenant_failed_tasks_total` | Counter | The total number of tasks that failed (in a final state, not including retries) |
| `hatchet_tenant_skipped_tasks_total` | Counter | The total number of tasks that were skipped |
| `hatchet_tenant_cancelled_tasks_total` | Counter | The total number of tasks cancelled |
| `hatchet_tenant_assigned_tasks` | Counter | The total number of tasks assigned to a worker |
| `hatchet_tenant_scheduling_timed_out` | Counter | The total number of tasks that timed out while waiting to be scheduled |
| `hatchet_tenant_rate_limited` | Counter | The total number of tasks that were rate limited |
| `hatchet_tenant_queued_to_assigned` | Counter | The total number of unique tasks that were queued and later got assigned to a worker |
| `hatchet_tenant_queued_to_assigned_time_seconds` | Histogram | Buckets of time in seconds spent in the queue before being assigned to a worker |
| `hatchet_tenant_reassigned_tasks` | Counter | The total number of tasks that were reassigned to a worker |
| `hatchet_tenant_used_worker_slots` | Gauge | The current number of worker slots being used |
| `hatchet_tenant_available_worker_slots` | Gauge | The current number of worker slots available (free) |
| `hatchet_tenant_worker_slots` | Gauge | The total number of worker slots (free + used) |

#### Example PromQL Queries

##### 1. Workflow Duration by Tenant and Status

```promql
sum by (tenant_id, workflow_name, status) (
  rate(hatchet_tenant_workflow_duration_milliseconds_sum[5m])
)
/
sum by (tenant_id, workflow_name, status) (
  rate(hatchet_tenant_workflow_duration_milliseconds_count[5m])
)
```

##### 2. Tenant Queue Performance (95th percentile)

```promql
histogram_quantile(0.95,
  sum by (tenant_id, le) (
    rate(hatchet_tenant_queued_to_assigned_time_seconds_bucket[5m])
  )
)
```

##### 3. Tenant Error Rate by Workflow

```promql
sum by (tenant_id) (rate(hatchet_tenant_failed_tasks_total[5m]))
/
sum by (tenant_id) (rate(hatchet_tenant_created_tasks_total[5m]))
```

##### 4. Tenant Task Throughput
```promql
sum by (tenant_id) (rate(hatchet_tenant_succeeded_tasks_total[5m]))
```

##### 5. Tenant Retry Rate

```promql
sum by (tenant_id) (rate(hatchet_tenant_retried_tasks_total[5m]))
/
sum by (tenant_id) (rate(hatchet_tenant_created_tasks_total[5m]))
```

##### 6. Workflow Duration Distribution by Tenant

```promql
sum by (tenant_id, le) (
  rate(hatchet_tenant_workflow_duration_milliseconds_bucket[5m])
)
```

##### 7. Tenant Rate Limiting Impact

```promql
sum by (tenant_id) (rate(hatchet_tenant_rate_limited[5m]))
```

##### 8. Per-Tenant Queue Utilization

```promql
sum by (tenant_id) (rate(hatchet_tenant_queue_invocations_total[5m]))
```

##### 9. Tenant Scheduling Timeouts

```promql
sum by (tenant_id) (rate(hatchet_tenant_scheduling_timed_out[5m]))
```

##### 10. Tenant Task Assignment Success Rate

```promql
sum by (tenant_id) (rate(hatchet_tenant_assigned_tasks[5m]))
/
sum by (tenant_id) (rate(hatchet_tenant_created_tasks_total[5m]))
```

##### 11. Tenant Task Reassignment Rate

```promql
sum by (tenant_id) (rate(hatchet_tenant_reassigned_tasks[5m]))
```

### Cross-Tenant Analysis

#### Example PromQL Queries

##### 1. Top 5 Tenants by Task Volume

```promql
topk(5,
  sum by (tenant_id) (
    rate(hatchet_tenant_created_tasks_total[1h])
  )
)
```

##### 2. Slowest Workflows Across All Tenants

```promql
topk(10,
  sum by (tenant_id, workflow_name) (
    rate(hatchet_tenant_workflow_duration_milliseconds_sum[5m])
  )
  /
  sum by (tenant_id, workflow_name) (
    rate(hatchet_tenant_workflow_duration_milliseconds_count[5m])
  )
)
```

##### 3. Tenant Resource Consumption Comparison

```promql
sum by (tenant_id) (
  rate(hatchet_tenant_workflow_duration_milliseconds_sum[1h])
) / 1000 / 60 # Convert to minutes
```

### Integration with Prometheus

This endpoint can be used to configure Prometheus to scrape tenant-specific metrics:

```yaml
scrape_configs:
  - job_name: "hatchet-tenant-metrics"
    static_configs:
      - targets: ["cloud.onhatchet.run"]
    metrics_path: "/api/v1/tenants/707d0855-80ab-4e1f-a156-f1c4546cbf52/prometheus-metrics"
    scheme: "https"
    authorization:
      credentials: "your-api-token-here"
```

**Note:** Replace `cloud.onhatchet.run` with the URL where your Hatchet instance is hosted.

This provides tenant-isolated metrics that can be scraped directly by Prometheus or consumed by other monitoring tools that support the Prometheus text format.

---

# Worker Configuration Options

The Hatchet worker can be configured via environment variables and programmatic options. This document contains a list of all available options.

## Basic Configuration

| Variable | Description | Default Value |
| --- | --- | --- |
| `HATCHET_CLIENT_TOKEN` | Authentication token for the worker | |
| `HATCHET_CLIENT_HOST_PORT` | GRPC server host and port | \* Inherited from token |
| `HATCHET_CLIENT_API_URL` (TypeScript SDK) | API server host and port | \* Inherited from token |
| `HATCHET_CLIENT_SERVER_URL` (Go SDK) | API server host and port | \* Inherited from token |
| `HATCHET_CLIENT_NAMESPACE` | Namespace prefix for the worker | \* Inherited from token |

## Worker Runtime Configuration

| Variable | Description | Default Value |
| --- | --- | --- |
| `name` | Friendly name of the worker | |
| `slots` | Maximum number of concurrent runs | `100` |
| `durable_slots` | Maximum number of concurrent durable tasks | `1000` |

## Worker healthcheck server (Python SDK)

These variables enable a local HTTP server that exposes `/health` and `/metrics` for a running worker.

| Variable | Description | Default Value |
| --- | --- | --- |
| `HATCHET_CLIENT_WORKER_HEALTHCHECK_ENABLED` | Enable the local worker healthcheck server | `false` |
| `HATCHET_CLIENT_WORKER_HEALTHCHECK_PORT` | Port for the local worker healthcheck server | `8001` |
| `HATCHET_CLIENT_WORKER_HEALTHCHECK_EVENT_LOOP_BLOCK_THRESHOLD_SECONDS` | If the worker listener process event loop is blocked longer than this threshold, `/health` returns 503 | `5.0` |
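With the healthcheck server enabled, you can quickly verify a locally running worker; a minimal sketch assuming the default port of `8001`:

```bash
export HATCHET_CLIENT_WORKER_HEALTHCHECK_ENABLED=true

# with the worker running:
curl -i localhost:8001/health   # 200 while the worker is healthy, 503 if the event loop is blocked
curl -s localhost:8001/metrics  # worker metrics
```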
## TLS Configuration

| Variable | Description | Default Value |
| --- | --- | --- |
| `HATCHET_CLIENT_TLS_STRATEGY` | TLS strategy (tls, mtls, none) | `tls` |
| `HATCHET_CLIENT_TLS_CERT_FILE` | Path to TLS certificate file | |
| `HATCHET_CLIENT_TLS_KEY_FILE` | Path to TLS key file | |
| `HATCHET_CLIENT_TLS_ROOT_CA_FILE` | Path to TLS root CA file | |
| `HATCHET_CLIENT_TLS_SERVER_NAME` | TLS server name | |

## Logging Configuration

| Variable | Description | Default Value |
| --- | --- | --- |
| `HATCHET_CLIENT_LOG_LEVEL` | Log level for the worker | `INFO` |
| `HATCHET_CLIENT_GRPC_MAX_RECV_MESSAGE_LENGTH` | Maximum gRPC message receive size (Python SDK only) | `4MB` |
| `HATCHET_CLIENT_GRPC_MAX_SEND_MESSAGE_LENGTH` | Maximum gRPC message send size (Python SDK only) | `4MB` |

---

# Downgrading Hatchet Versions

This guide explains how to safely downgrade your Hatchet instance to a previous version.

> **Warning:** Downgrading may result in data loss. Always test downgrades in a non-production environment first.

## Overview

Downgrading Hatchet involves two steps:

1. Running down migrations to revert database schema changes
2. Deploying the older Hatchet version

## Prerequisites

- **Critical:** Backup your database before downgrading
- Ensure the target version supports the current data in your database
- Have access to run the `hatchet-migrate` command
- Verify that all migrations between your current version and target version have down migrations

## Finding the Target Migration Version

To downgrade to a specific Hatchet version, you need to identify the last migration that was included in that version.

Visit the Hatchet GitHub repository for your target version:

```
https://github.com/hatchet-dev/hatchet/tree/{GIT_TAG}/cmd/hatchet-migrate/migrate/migrations
```

Replace `{GIT_TAG}` with your target version (e.g., `v0.71.0`). Find the last migration file in that directory; the timestamp at the beginning of the filename is your target migration version.

**Example:**

- Target Hatchet version: `v0.71.0`
- Last migration file: `20250813183355_v1_0_36.sql`
- Migration version: `20250813183355`

## Running Down Migrations

> **Info:** Use a stable release of the `hatchet-migrate` binary (avoid alpha tags) from the [Hatchet releases page](https://github.com/hatchet-dev/hatchet/tags) to ensure down migrations work correctly.

Once you have identified the target migration version, use the `hatchet-migrate` command with the `--down` flag:

```bash
hatchet-migrate --down 20250813183355
```

This will:

1. Connect to your database using the `DATABASE_URL` environment variable
2. Check the current migration version
3. Run all down migrations from the current version to the target version
4.
Display progress and confirm when complete ## Deploying the Older Version After successfully running the down migrations, deploy the older Hatchet version: ### Docker Compose Update your `docker-compose.yml`: ```yaml services: hatchet-engine: image: ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.71.0 # ... rest of configuration hatchet-dashboard: image: ghcr.io/hatchet-dev/hatchet/hatchet-dashboard:v0.71.0 # ... rest of configuration ``` Then restart: ```bash docker-compose down docker-compose up -d ``` --- # Benchmarking Hatchet This page provides example benchmarks for Hatchet throughput and latency on an 8 CPU database (Amazon RDS, `m7g.2xlarge` instance type). These benchmarks were all run against a v1 Hatchet engine running version `v0.55.26`. For more information on the setup, see the [Setup](#setup) section. Note that on better hardware, there will be significantly better performance: we have tested up to 10k/s on an `m7g.8xlarge` instance. The best way to benchmark Hatchet is to run your own benchmarks in your own environment. The benchmarks below are provided as a reference point for what you might expect to see in a typical setup. To run your own benchmarks, see the [Running your own benchmarks](#running-your-own-benchmarks) section. ## Throughput Below are summarized throughput benchmarks run at different incoming event rates. For each run, we note the database CPU utilization and estimated IOPS, which are the most relevant metrics for tracking performance on the database. | Throughput (runs/s) | Database CPU | Database IOPs | | ------------------- | ------------ | ------------- | | 100 | 15% | 400 | | 500 | 60% | 600 | | 2000 | 83% | 800 | ## Latency Benchmarks run using event-based triggering: this approximately doubles the queueing time of a workflow. The average latency of events in Hatchet can be approximated by two measurements that Hatchet reports: - **Average execution time per executed event**: The time from when the event starts execution to when it completes. - **Average write time per event**: The acknowledgement time for Hatchet to write the event. Below is a table summarizing these latencies: | Throughput (runs/s) | Average Execution Time (ms) | Average Write Time (ms) | | ------------------- | --------------------------- | ----------------------- | | 100 | ~40 | ~2.5 | | 500 | ~48 | ~2.6 | | 2000 | ~220 | ~5.7 | For workloads up to around 100-500 events per second, the latency remains relatively low. As throughput scales toward 2000 events per second, the overall average execution time increases (though the Hatchet engine remained stable throughout the tests). ## Running your own benchmarks Hatchet publishes a public load testing container which can be used for benchmarking. This container is available at `ghcr.io/hatchet-dev/hatchet/hatchet-loadtest`. It acts as a Hatchet worker and event emitter, so it simply expects a `HATCHET_CLIENT_TOKEN` to be set in the environment. For example, to run 100 events/second for 60 seconds, you can use the following command: ```bash docker run -e HATCHET_CLIENT_TOKEN=your-token ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "100" -d "60s" --level "warn" --slots "100" ``` The event emitter which is bundled into the container has difficulty emitting more than 2k events/s. As a result, to test higher throughputs, it is recommended to run multiple containers in parallel. 
Since each container manages its own workflows and worker, it is recommended to use the `HATCHET_CLIENT_NAMESPACE` environment variable to ensure that workflows are not duplicated across containers. For example: ```bash # first container docker run -e HATCHET_CLIENT_TOKEN=your-token -e HATCHET_CLIENT_NAMESPACE=loadtest1 ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "2000" -d "60s" --level "warn" --slots "100" # second container docker run -e HATCHET_CLIENT_TOKEN=your-token -e HATCHET_CLIENT_NAMESPACE=loadtest2 ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "2000" -d "60s" --level "warn" --slots "100" ``` ### Reference This container takes the following arguments: ```sh Usage: loadtest [flags] Flags: -c, --concurrency int concurrency specifies the maximum events to run at the same time -D, --delay duration delay specifies the time to wait in each event to simulate slow tasks -d, --duration duration duration specifies the total time to run the load test (default 10s) -F, --eventFanout int eventFanout specifies the number of events to fanout (default 1) -e, --events int events per second (default 10) -f, --failureRate float32 failureRate specifies the rate of failure for the worker -h, --help help for loadtest -l, --level string logLevel specifies the log level (debug, info, warn, error) (default "info") -P, --payloadSize string payload specifies the size of the payload to send (default "0kb") -s, --slots int slots specifies the number of slots to use in the worker -w, --wait duration wait specifies the total time to wait until events complete (default 10s) -p, --workerDelay duration workerDelay specifies the time to wait before starting the worker ``` ### Running a benchmark on Kubernetes You can use the following Pod manifest to run the load test on Kubernetes (make sure to fill in `HATCHET_CLIENT_TOKEN`): ```yaml apiVersion: v1 kind: Pod metadata: name: loadtest1a namespace: staging spec: restartPolicy: Never containers: - image: ghcr.io/hatchet-dev/hatchet/hatchet-loadtest:v0.56.0 imagePullPolicy: Always name: loadtest command: ["/hatchet/hatchet-load-test"] args: - loadtest - --duration - "60s" - --events - "100" - --slots - "100" - --wait - "10s" - --level - warn env: - name: HATCHET_CLIENT_TOKEN value: "your-token" - name: HATCHET_CLIENT_NAMESPACE value: "loadtest1a" resources: limits: memory: 1Gi requests: cpu: 500m memory: 1Gi ``` ## Setup All tests were run on a Kubernetes cluster on AWS configured with: - **Hatchet engine replicas:** 2 (using `c7i.4xlarge` instances to ensure CPU was not a bottleneck) - **Database:** `m7g.2xlarge` instance type (Amazon RDS) - **Hatchet version:** `v0.55.26` - **AWS region:** `us-west-1` The database configuration was chosen to avoid disk and CPU contention until higher throughputs were reached. We observed that up to around 2000 events/second, the chosen database instance size kept up without major performance degradation. The Hatchet engine was deployed with 2 replicas, and each engine instance had ample CPU headroom on `c7i.4xlarge` nodes. --- # Data Retention In Hatchet engine version `0.36.0` and above, you can configure the default data retention per tenant for workflow runs and events. The default value is set to 30 days, which means that all workflow runs which were created over 30 days ago and are in a final state (i.e. completed or failed), and all events which were created over 30 days ago, will be deleted. 
This can be configured by setting the following environment variable to a Go duration string: ```sh SERVER_LIMITS_DEFAULT_TENANT_RETENTION_PERIOD=720h # 30 days ``` --- # Tuning Hatchet for Performance Generally, with a reasonable database instance (4 CPU, 8GB RAM) and small payload sizes, Hatchet can handle hundreds of events and workflow runs per second. However, as throughput increases, you will start to see performance degradation. The most common causes of performance degradation are listed below. ## Database Connection Pooling The default max connection pool size is 50 per engine instance. If you have a high throughput, you may need to increase this value. This value can be set via the `DATABASE_MAX_CONNS` environment variable on the engine. Note that if you increase this value, you will need to increase the [`max_connections`](https://www.postgresql.org/docs/current/runtime-config-connection.html) value on your Postgres instance as well. ## High Database CPU Due to the nature of Hatchet workloads, the first bottleneck you will typically see on the database is CPU. If you have access to database query performance metrics, it is worth checking the cause of high CPU. If there is high lock contention on a query, please let the Hatchet team know, as we are looking to reduce lock contention in future releases. Otherwise, if you are seeing high CPU usage without any lock contention, you should increase the number of cores on your database instance. If you are performing a high number of inserts, particularly in a short period of time, and this correlates with high CPU usage, you can improve performance in several ways by using bulk endpoints or tuning the buffer settings. ### Using bulk endpoints There are two main ways to initiate workflows, by sending events to Hatchet and by starting workflows directly. In most example workflows, we push a single event or workflow at a time, but it is possible to send multiple events or workflows in one request. #### Events #### Python ```python hatchet.event.bulk_push( events=[ BulkPushEventWithMetadata( key="user:create", payload={"userId": str(i), "should_skip": False}, ) for i in range(10) ] ) ``` #### Typescript ```typescript const events = [ { payload: { test: 'test1' }, additionalMetadata: { user_id: 'user1', source: 'test' }, }, { payload: { test: 'test2' }, additionalMetadata: { user_id: 'user2', source: 'test' }, }, { payload: { test: 'test3' }, additionalMetadata: { user_id: 'user3', source: 'test' }, }, ]; await hatchet.events.bulkPush('user:create', events); ``` #### Go ```go c, err := client.New( client.WithHostPort("127.0.0.1", 7077), ) if err != nil { panic(err) } events := []client.EventWithMetadata{ { Event: &events.TestEvent{ Name: "testing", }, AdditionalMetadata: map[string]string{"hello": "world1"}, Key: "user:create", }, { Event: &events.TestEvent{ Name: "testing2", }, AdditionalMetadata: map[string]string{"hello": "world2"}, Key: "user:create", }, } c.Event().BulkPush( context.Background(), events, ) ``` > **Warning:** There is a maximum limit of 1000 events per request. 
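If you have more events than that limit allows, one option is to chunk the list client-side and call `bulk_push` once per chunk. A minimal Python sketch reusing the `bulk_push` API shown above (the event payloads, batch count, and import paths are illustrative):

```python
from hatchet_sdk import BulkPushEventWithMetadata, Hatchet  # import path may vary by SDK version

hatchet = Hatchet()  # reads HATCHET_CLIENT_TOKEN from the environment

BATCH_SIZE = 1000  # maximum number of events per bulk_push request

events = [
    BulkPushEventWithMetadata(
        key="user:create",
        payload={"userId": str(i), "should_skip": False},
    )
    for i in range(2500)
]

# Push the events in chunks of at most BATCH_SIZE
for start in range(0, len(events), BATCH_SIZE):
    hatchet.event.bulk_push(events=events[start : start + BATCH_SIZE])
```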
#### Workflows

#### Python

```python
@bulk_parent_wf.task(execution_timeout=timedelta(minutes=5))
async def spawn(input: ParentInput, ctx: Context) -> dict[str, list[dict[str, Any]]]:
    # πŸ‘€ Create each workflow run to spawn
    child_workflow_runs = [
        bulk_child_wf.create_bulk_run_item(
            input=ChildInput(a=str(i)),
            key=f"child{i}",
            options=TriggerWorkflowOptions(additional_metadata={"hello": "earth"}),
        )
        for i in range(input.n)
    ]

    # πŸ‘€ Run workflows in bulk to improve performance
    spawn_results = await bulk_child_wf.aio_run_many(child_workflow_runs)

    return {"results": spawn_results}
```

#### Typescript

```typescript
const parent = hatchet.task({
  name: 'simple',
  fn: async (input: SimpleInput, ctx) => {
    // Bulk run two tasks in parallel
    const child = await ctx.bulkRunChildren([
      {
        workflow: simple,
        input: {
          Message: 'Hello, World!',
        },
      },
      {
        workflow: simple,
        input: {
          Message: 'Hello, Moon!',
        },
      },
    ]);

    return {
      TransformedMessage: `${child[0].TransformedMessage} ${child[1].TransformedMessage}`,
    };
  },
});
```

#### Go

```go
w.RegisterWorkflow(
	&worker.WorkflowJob{
		Name:        "parent-workflow",
		On:          worker.Event("fanout:create"),
		Description: "Example workflow for spawning child workflows.",
		Steps: []*worker.WorkflowStep{
			worker.Fn(func(ctx worker.HatchetContext) error {
				// Prepare the batch of workflows to spawn
				childWorkflows := make([]*worker.SpawnWorkflowsOpts, 10)

				for i := 0; i < 10; i++ {
					childInput := "child-input-" + strconv.Itoa(i)
					childWorkflows[i] = &worker.SpawnWorkflowsOpts{
						WorkflowName: "child-workflow",
						Input:        childInput,
						Key:          "child-key-" + strconv.Itoa(i),
					}
				}

				// Spawn all workflows in bulk using SpawnWorkflows
				if _, err := ctx.SpawnWorkflows(childWorkflows); err != nil {
					return err
				}

				return nil
			}),
		},
	},
)
```

> **Warning:** There is a maximum limit of 1000 workflows per bulk request.

### Tuning Buffer Settings

Hatchet has configurable write buffers which enable it to reduce the total number of database queries by batching DB writes. This speeds up throughput dramatically at the expense of a slight increase in latency.

In general, increasing the buffer size and reducing the buffer flush frequency reduces the CPU load on the DB. The two most important configurable settings for the buffers are:

1. **Flush Period:** The number of milliseconds to wait between subsequent writes to the database
2. **Max Buffer Size:** The maximum size of the internal buffer writing to the database.
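As a rough illustration of the tradeoff: with a 10ms flush period, a buffer flushes at most about 100 times per second, so a workload of roughly 5,000 writes per second is grouped into batches of about 50 items each and reaches Postgres as on the order of 100 batched writes per second (subject to the item threshold), in exchange for up to ~10ms of additional latency per write.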
The following environment variables are all configurable: ```sh # Default values if the values below are not set SERVER_FLUSH_PERIOD_MILLISECONDS SERVER_FLUSH_ITEMS_THRESHOLD # Settings for writing workflow runs to the database SERVER_WORKFLOWRUNBUFFER_FLUSH_PERIOD_MILLISECONDS SERVER_WORKFLOWRUNBUFFER_FLUSH_ITEMS_THRESHOLD # Settings for writing events to the database SERVER_EVENTBUFFER_FLUSH_PERIOD_MILLISECONDS SERVER_EVENTBUFFER_FLUSH_ITEMS_THRESHOLD # Settings for releasing slots for workers SERVER_RELEASESEMAPHOREBUFFER_FLUSH_PERIOD_MILLISECONDS SERVER_RELEASESEMAPHOREBUFFER_FLUSH_ITEMS_THRESHOLD # Settings for writing queue items to the database SERVER_QUEUESTEPRUNBUFFER_FLUSH_PERIOD_MILLISECONDS SERVER_QUEUESTEPRUNBUFFER_FLUSH_ITEMS_THRESHOLD ``` A buffer configuration for higher throughput might look like the following: ```sh # Default values if the values below are not set SERVER_FLUSH_PERIOD_MILLISECONDS=250 SERVER_FLUSH_ITEMS_THRESHOLD=1000 # Settings for writing workflow runs to the database SERVER_WORKFLOWRUNBUFFER_FLUSH_PERIOD_MILLISECONDS=100 SERVER_WORKFLOWRUNBUFFER_FLUSH_ITEMS_THRESHOLD=500 # Settings for writing events to the database SERVER_EVENTBUFFER_FLUSH_PERIOD_MILLISECONDS=1000 SERVER_EVENTBUFFER_FLUSH_ITEMS_THRESHOLD=1000 # Settings for releasing slots for workers SERVER_RELEASESEMAPHOREBUFFER_FLUSH_PERIOD_MILLISECONDS=100 SERVER_RELEASESEMAPHOREBUFFER_FLUSH_ITEMS_THRESHOLD=200 # Settings for writing queue items to the database SERVER_QUEUESTEPRUNBUFFER_FLUSH_PERIOD_MILLISECONDS=100 SERVER_QUEUESTEPRUNBUFFER_FLUSH_ITEMS_THRESHOLD=500 ``` Benchmarking and tuning on your own infrastructure is recommended to find the optimal values for your workload and use case. ## Slow Time to Start With higher throughput, you may see a slower time to start for each step run in a workflow. The reason for this is typically that each step run needs to be processed in an internal message queue before getting sent to the worker. You can increase the throughput of this internal queue by setting the following environment variable (default value of `100`): ``` SERVER_MSGQUEUE_RABBITMQ_QOS=200 ``` Note that this refers to the number of messages that can be processed in parallel, and each message typically corresponds to at least one database write, so it will not improve performance if this value is significantly higher than the `DATABASE_MAX_CONNS` value. If you are seeing warnings in the engine logs that you are saturating connections, consider decreasing this value. ## Database Settings and Autovacuum There are several scenarios where Postgres flags may need to be modified to improve performance. By default, every workflow run and step run are stored for 30 days in the Postgres instance. Without tuning autovacuum settings, you may see high table bloat across many tables. If you are storing > 500 GB of workflow run or step run data, we recommend the following autovacuum settings to autovacuum more aggressively: ``` autovacuum_max_workers=10 autovacuum_vacuum_scale_factor=0.1 autovacuum_analyze_scale_factor=0.05 autovacuum_vacuum_threshold=25 autovacuum_analyze_threshold=25 autovacuum_vacuum_cost_delay=10 autovacuum_vacuum_cost_limit=1000 ``` If your database has enough memory capacity, you may need to increase the `work_mem` or `maintenance_work_mem` value. 
For example, on database instances with a large amount of memory available, we typically set the following settings: ``` maintenance_work_mem=2147483647 work_mem=125828 ``` Additionally, if there is enough disk capacity, you may see improved performance setting the following flag: ``` max_wal_size=15040 ``` ## Scaling the Hatchet Engine By default, the Hatchet engine runs all internal services on a single instance. The internal services on the Hatchet engine are as follows: - **grpc-api**: the gRPC endpoint for the Hatchet engine. This is the primary endpoint for Hatchet workers. Not to be confused with the Hatchet REST API, which is a separate service that we typically refer to as `api`. - **controllers**: the internal service that manages the lifecycle of workflow runs, step runs, and events. This service is write-heavy on the database and read-heavy from the message queue. - **scheduler**: the internal service that schedules step runs to workers. This service is both read-heavy and write-heavy on the database. It is possible to horizontally scale the Hatchet engine by running multiple instances of the engine. However, if you are seeing a large number of warnings from the scheduler when running the other services in the same engine instance, we recommend running the scheduler on a separate instance. See the [high availability](./high-availability) documentation for more information on how to run the scheduler on a separate instance. --- # Read Replica Support For high-throughput production deployments, Hatchet supports database read replicas to distribute database load and improve read performance. This feature allows you to direct read queries to a separate database instance while continuing to send write operations to the primary database. **This can significantly improve performance in read-heavy workloads without requiring application changes.** You can enable read replica support by setting the following environment variables: ```bash READ_REPLICA_ENABLED=true READ_REPLICA_DATABASE_URL='postgresql://hatchet:hatchet@127.0.0.1:5432/hatchet' READ_REPLICA_MAX_CONNS=200 READ_REPLICA_MIN_CONNS=50 ``` ## Configuration Options - `READ_REPLICA_ENABLED`: Set to `true` to enable read replica support - `READ_REPLICA_DATABASE_URL`: Connection string for the read replica database - `READ_REPLICA_MAX_CONNS`: Maximum number of connections in the read replica connection pool - `READ_REPLICA_MIN_CONNS`: Minimum number of connections to maintain in the read replica connection pool ## Limitations - Replication lag may result in slightly stale or missing data being returned from read operations - The read replica is only utilized by analytical queries (to load workflow runs, task runs, and metrics in the UI) --- # Trace Sampling For a very high-volume setup, it may be desirable to sample results for the dashboard for the purpose of limiting the amount of data stored in the Hatchet database. **This does not impact the behavior of the Hatchet engine and all tasks will still be processed.** This can be done by setting the following environment variables: ```bash SERVER_SAMPLING_ENABLED=t SERVER_SAMPLING_RATE=0.1 # only 10% of results will be sampled ``` Sampling is done at the workflow run level, so all tasks within the same workflow will be sampled, along with all of their events. Sampling has the following limitations: - Parent tasks which spawn child tasks are not guaranteed to be sampled, even if their children are. 
This means that the child task may be shown in the dashboard without a corresponding parent task, and vice versa.
- There is no way to configure sampling to ensure that failure events are sampled.
- Only tasks which are sampled can be cancelled or replayed via the REST APIs: do not use this feature if you depend on programmatic cancellations and replays.

---

# SMTP Server

Configure email delivery for tenant invites and Hatchet alerts using any standard SMTP provider (Gmail, SendGrid, AWS SES, etc.).

## Prerequisites

- An SMTP provider that supports [PLAIN](https://datatracker.ietf.org/doc/html/rfc4616/) authentication with a username and password.

## Configuration

Set the following environment variables:

```bash
# Enable SMTP
export SERVER_EMAIL_KIND=smtp
export SERVER_EMAIL_SMTP_ENABLED=true

# Connection Settings
export SERVER_EMAIL_SMTP_SERVER_ADDR=smtp.gmail.com:587           # Host and port
export SERVER_EMAIL_SMTP_AUTH_USERNAME=your-email@yourdomain.com  # Username or API Key ID
export SERVER_EMAIL_SMTP_AUTH_PASSWORD=your-password              # Password or API Secret Key

# Sender Identity
export SERVER_EMAIL_SMTP_FROM_EMAIL=noreply@yourdomain.com        # Sender email address
export SERVER_EMAIL_SMTP_SUPPORT_EMAIL=support@yourdomain.com     # Support contact email
export SERVER_EMAIL_SMTP_FROM_NAME="Hatchet"                      # (Optional) Display name
```

## Provider Reference

Common configuration values for major providers:

| Provider | Server Address | Username | Password |
| --- | --- | --- | --- |
| **[Gmail](https://support.google.com/mail/answer/185833?hl=en)** | `smtp.gmail.com:587` | Your Email | [App Password](https://myaccount.google.com/apppasswords) |
| **[SendGrid](https://docs.sendgrid.com/for-developers/sending-email/integrating-with-the-smtp-api)** | `smtp.sendgrid.net:587` | `apikey` | Your API Key |
| **[AWS SES](https://docs.aws.amazon.com/ses/latest/dg/send-email-smtp.html)** | `email-smtp.us-east-1.amazonaws.com:587` | IAM Username | IAM Secret |
| **[Outlook](https://support.microsoft.com/en-us/office/pop-imap-and-smtp-settings-8361e398-8af4-4e97-b147-6c6c4ac95353)** | `smtp.office365.com:587` | Your Email | Your Password |

> **Info:** To request another provider or SMTP authentication protocol, open a feature request on [GitHub](https://github.com/hatchet-dev/hatchet/issues/new?template=feature_request.md).

---

# Contributing

> **Note:** this guide covers setting up a local development environment for working on Hatchet itself.

### Setup

1. Start the Database and Queue services:

```sh
task start-db
```

2. Install dependencies, run migrations, generate encryption keys, and seed the database:

```sh
task setup
```

### Starting the dev server

Start the Hatchet engine, API server, dashboard, and Prisma studio:

```sh
task start-dev
# or task start-dev-tmux if you want to use tmux panes
```

### Creating and testing workflows

To create and test workflows, run the examples in the `./examples` directory.

You will need to add the tenant (output from the `task seed-dev` command) to the `.env` file in each example directory.
An example `.env` file for the `./examples/simple` directory can be generated via:

```sh
alias get_token='go run ./cmd/hatchet-admin token create --name local --tenant-id 707d0855-80ab-4e1f-a156-f1c4546cbf52'

cat > ./examples/simple/.env <<EOF
HATCHET_CLIENT_TOKEN="$(get_token)"

# optional
OTEL_EXPORTER_OTLP_HEADERS=
# optional
OTEL_EXPORTER_OTLP_ENDPOINT=
EOF
```

### CloudKMS

CloudKMS can be used to generate master encryption keys:

```
gcloud kms keyrings create "development" --location "global"
gcloud kms keys create "development" --location "global" --keyring "development" --purpose "encryption"
gcloud kms keys list --location "global" --keyring "development"
```

From the last step, copy the Key URI and set the following environment variable:

```
SERVER_ENCRYPTION_CLOUDKMS_KEY_URI=gcp-kms://projects/<project-id>/locations/global/keyRings/development/cryptoKeys/development
```

Generate a service account in GCP which can encrypt/decrypt on CloudKMS, then download a service account JSON file and set it via:

```
SERVER_ENCRYPTION_CLOUDKMS_CREDENTIALS_JSON='{...}'
```

## Issues

### Query engine leakage

Sometimes the spawned query engines from Prisma don't get killed when hot reloading. You can run `task kill-query-engines` on OSX to kill the query engines.

Make sure you call `.Disconnect` on the database config object when writing CLI commands which interact with the database. If you don't, and you try to wrap these CLI commands in a new command, it will never exit, for example:

```
export HATCHET_CLIENT_TOKEN="$(go run ./cmd/hatchet-admin token create --tenant-id <tenant-id>)"
```

---

## Setup

### Using `ngrok`

You can use `ngrok` to expose a local port to the internet to accept incoming webhooks from Github. To do this, run the following:

```sh
task start-ngrok
```

Make note of the `https` URL as you will need it later.

### Github App Creation

To create a Github app that can read from your repositories, navigate to your organization settings page (alternatively, you can navigate to your personal settings page) and select **Developer Settings** in the sidebar. Go to **Github Apps** and select **New Github App**.

You should use the following settings:

- Homepage URL: you can set this as https://hatchet.run, or some other domain for your organization.
- Callback URL: `<protocol>://<your-domain>/api/v1/users/github-app/callback`
- The **Request user authorization (OAuth) during installation** checkbox should be checked.
- Webhook URL: `<protocol>://<your-domain>/api/v1/github/webhook`
- Webhook secret: generate a random webhook secret for your domain, for example by running `cat /dev/urandom | base64 | head -c 32`. **Make note of this secret, as you will need it later**.
- Permissions:
  - **Repository:**
    - **Checks (Read & write)**: required to write Github checks for each commit/PR.
    - **Contents (Read):** required for Hatchet to read files from the repository.
    - **Metadata (Read-only):** mandatory, required for Github apps that integrate with repositories.
    - **Pull Requests (Read & write):** required for Hatchet to add comments to Github PRs, and to create PRs.
    - **Webhooks (Read & write):** required for Hatchet to create Github repository webhooks that notify the Hatchet instance when PRs are updated.
  - **Account:**
    - **Email addresses (read-only)**: required for Hatchet to read your Github email address for authentication.

### Creating a Secret and Private Key

After creating the Github App, create the following:

- In the "Client secrets" section, select **Generate a new client secret**. You will need this secret in the following section.
- In the "Private keys" section, download a new private key for your app. You will need this private key in the following section. ### Private Keys and Environment Variables After creating the private key, you can place it somewhere in your filesystem and set the `SERVER_VCS_GITHUB_APP_SECRET_PATH` environment variable to the path of the private key. Make sure the following environment variables are set: ```txt SERVER_VCS_KIND=github SERVER_VCS_GITHUB_ENABLED=true SERVER_VCS_GITHUB_APP_CLIENT_ID= SERVER_VCS_GITHUB_APP_CLIENT_SECRET= SERVER_VCS_GITHUB_APP_NAME= SERVER_VCS_GITHUB_APP_WEBHOOK_SECRET= SERVER_VCS_GITHUB_APP_WEBHOOK_URL= SERVER_VCS_GITHUB_APP_ID= SERVER_VCS_GITHUB_APP_SECRET_PATH= ``` --- # SDKs This document tracks the feature support of the various SDKs, and aims to consolidate the expected behavior around environment variables and configuration loading. ## Environment Variables Each SDK should support the following environment variables: | Variable | Description | Required | Default | | --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------- | ----------------------------------------- | | `HATCHET_CLIENT_TOKEN` | The tenant-scoped API token to use. | Yes | N/A | | `HATCHET_CLIENT_HOST_PORT` | The host and port of the Hatchet server to connect to, in `host:port` format. SDKs should handle schemes and trailing slashes, i.e. `https://host:port | No | Automatically detected in new tokens. | | `HATCHET_CLIENT_TLS_STRATEGY` | The TLS strategy to use. Valid values are `none`, `tls`, and `mtls`. | No | `tls` | | `HATCHET_CLIENT_TLS_CERT_FILE` | The path to the TLS client certificate file to use. | Only if strategy is set to `mtls` | N/A | | `HATCHET_CLIENT_TLS_CERT` | The TLS client key file to use. | Only if strategy is set to `mtls` | N/A | | `HATCHET_CLIENT_TLS_KEY_FILE` | The path to the TLS client key file to use. | Only if strategy is set to `mtls` | N/A | | `HATCHET_CLIENT_TLS_KEY` | The TLS client key to use. | Only if strategy is set to `mtls` | N/A | | `HATCHET_CLIENT_TLS_ROOT_CA_FILE` | The path to the TLS root CA file to use. | Only if the server certificate is not signed by a public authority that's available to your environment | N/A | | `HATCHET_CLIENT_TLS_ROOT_CA` | The TLS root CA to use. | Only if the server certificate is not signed by a public authority that's available to your environment | N/A | | `HATCHET_CLIENT_TLS_SERVER_NAME` | The TLS server name to use. | No | Defaults to the `host` of the `host:port` | The following environment variables are deprecated: | Variable | Description | Explanation | | -------------------------- | --------------------- | ------------------------------ | | `HATCHET_CLIENT_TENANT_ID` | The tenant ID to use. | This is now part of the token. | ## Compatibility Matrices ### DAGs Whether the SDKs support full DAG-style execution. | SDK | DAGs? | Notes | | -------------- | ----- | ----- | | Go SDK | Yes | | | Python SDK | Yes | | | Typescript SDK | Yes | | ### Timeouts Whether the SDKs support setting timeouts and cancelling after timeouts. | SDK | Timeouts? | Step cancellation? 
| Notes | | -------------- | --------- | ------------------ | ---------------------------------------------- | | Go SDK | Yes | Yes | | | Python SDK | Yes | Yes | If thread is blocking, this won't be respected | | Typescript SDK | Yes | Unknown | | ### Middleware Whether the SDKs support setting middleware. | SDK | Middleware? | Notes | | -------------- | ----------- | ----- | | Go SDK | Yes | | | Python SDK | No | | | Typescript SDK | No | | ### Separately Registering and Calling Actions Whether the SDKs support separately registering and calling actions, instead of defining them inline in the workflows. | SDK | Supported? | Notes | | -------------- | ---------- | ----- | | Go SDK | Yes | | | Python SDK | No | | | Typescript SDK | No | | ### Custom Services Whether the SDKs support defining services to logically separate workflows and actions. | SDK | Supported? | Notes | | -------------- | ---------- | ----- | | Go SDK | Yes | | | Python SDK | No | | | Typescript SDK | No | | ### Scheduled Workflows Whether the SDKs support defining scheduled workflows. | SDK | Supported? | Notes | | -------------- | ---------- | ----- | | Go SDK | Yes | | | Python SDK | No | | | Typescript SDK | No | | --- # Hatchet CLI Setup & Installation > **Warning:** The Hatchet CLI is currently in beta and may have breaking changes in future > releases. The Hatchet CLI is a command-line tool with utilities for running workers locally, interacting with a running Hatchet deployment, and running a local Hatchet instance for development. ## Features - **Quickstarts**: the `hatchet quickstart` command sets up a local Hatchet instance with a sample project to help you get started quickly. - **Local worker reloading**: the [`hatchet worker dev`](/cli/running-workers-locally) command lets you run a worker locally with automatic reloading when code changes are detected. - **A full built-in TUI**: the [`hatchet tui`](/cli/tui) command lets you interact with your Hatchet deployment through a terminal user interface (TUI) that provides real-time observability into tasks, workflows, workers, and more. - **Profiles**: the [`hatchet profile`](/cli/profiles) commands allow you to manage multiple Hatchet instances and tenants with named profiles, making it easy to switch between different environments. ## Installation The recommended way to install the Hatchet CLI is via our install script or Homebrew: #### Native Install (Recommended) **MacOS, Linux, WSL** ```sh curl -fsSL https://install.hatchet.run/install.sh | bash ``` #### Homebrew **MacOS** ```sh brew install hatchet-dev/hatchet/hatchet --cask ``` ## Verifying Installation After installation, verify that the Hatchet CLI is installed correctly by checking its version: ```sh hatchet --version ``` --- # Profiles The Hatchet CLI supports managing multiple Hatchet instances and tenants using named profiles. This feature makes it easy to switch between different environments, such as development, staging, and production. ## Creating a Profile You can create a new profile using the `hatchet profile add` command. You will need to provide a Hatchet API token for the profile. ```sh hatchet profile add ``` This command will prompt you to enter the API token followed by the profile name. 
You can also provide these as flags: ```sh hatchet profile add --name [name] --token [token] ``` ## Listing Profiles You can list all the profiles you have configured using the `hatchet profile list` command: ```sh hatchet profile list ``` This will display all configured profiles, with the default profile marked with `(default)` if one is set. ## Setting a Default Profile You can set a profile as the default using the `hatchet profile set-default` command. The default profile will be automatically used when no profile is specified with the `--profile` flag. ```sh # Set default profile interactively (prompts for selection) hatchet profile set-default # Set a specific profile as default hatchet profile set-default --name [name] ``` Once a default profile is set, you can run commands without specifying the `--profile` flag: ```sh # Uses the default profile hatchet worker dev ``` To unset the default profile: ```sh hatchet profile unset-default ``` ## Using a Profile To use a specific profile for your Hatchet CLI commands, you can specify the profile name using the `--profile` flag. This overrides the default profile if one is set. ```sh hatchet worker dev --profile [name] ``` ## Updating a Profile You can update an existing profile using the `hatchet profile update` command. This allows you to change the API token associated with a profile. ```sh hatchet profile update ``` ## Deleting a Profile You can delete a profile using the `hatchet profile remove` command: ```sh hatchet profile remove ``` If you remove a profile that is set as the default, the default profile setting will be automatically cleared. --- # Running Hatchet Locally The Hatchet CLI provides the `hatchet server` commands to run a local instance of Hatchet for development and testing purposes. This local instance relies on Docker to run the necessary services. ## Prerequisites Before running Hatchet locally, you must have Docker installed on your machine. You can download Docker from [here](https://www.docker.com/get-started). ## Starting Hatchet Locally To start a local instance of Hatchet, run the following command in your terminal: ```sh hatchet server start ``` ## Stopping Hatchet Locally To stop the local Hatchet instance, run the following command: ```sh hatchet server stop ``` ## Reference #### `hatchet server start` ```txt Start a local Hatchet server environment using Docker containers. This command will start both a PostgreSQL database and a Hatchet server instance, automatically creating a local profile for easy access. Usage: hatchet server start [flags] Examples: # Start server with default settings (port 8888) hatchet server start # Start server with custom dashboard port hatchet server start --dashboard-port 9000 # Start server with custom ports and project name hatchet server start --dashboard-port 9000 --grpc-port 8077 --project-name my-hatchet # Start server with custom profile name hatchet server start --profile my-local Flags: -d, --dashboard-port int Port for the Hatchet dashboard (default: auto-detect starting at 8888) -g, --grpc-port int Port for the Hatchet gRPC server (default: auto-detect starting at 7077) -h, --help help for start -n, --profile string Name for the local profile (default: local) (default "local") -p, --project-name string Docker project name for containers (default: hatchet-cli) Global Flags: -v, --version The version of the hatchet cli. 
``` #### `hatchet server stop` ```txt Stop a local Hatchet server environment that was started using Docker containers with the 'hatchet server start' command. Usage: hatchet server stop [flags] Examples: # Stop the local Hatchet server hatchet server stop # Stop the local Hatchet server with a custom project name hatchet server stop --project-name my-hatchet Flags: -h, --help help for stop -p, --project-name string Docker project name for containers (default: hatchet-cli) Global Flags: -v, --version The version of the hatchet cli. ``` --- # Running Workers Locally The Hatchet CLI provides the `hatchet worker` commands to run Hatchet workers locally for development and testing purposes. ## Setting up a hatchet.yaml file > **Info:** If you've set up a project using `hatchet quickstart`, a `hatchet.yaml` file > is already created for you in the project directory. The `hatchet worker` commands rely on a `hatchet.yaml` configuration file to define the worker settings. You can create a `hatchet.yaml` file in your project directory which resembles the following (you will need to adjust the `preCmds` and `runCmd` fields to match your project's setup): #### Python ```yaml dev: preCmds: ["poetry install"] runCmd: "poetry run python src/worker.py" files: - "**/*.py" - "!**/__pycache__/**" - "!**/.venv/**" reload: true ``` #### Typescript ```yaml dev: preCmds: ["pnpm install"] runCmd: "pnpm start" files: - "**/*.ts" - "!**/node_modules/**" reload: true ``` #### Go ```yaml dev: preCmds: ["go mod download"] runCmd: "go run ./cmd/worker" files: - "**/*.go" reload: true ``` ## Running a worker Once you have a `hatchet.yaml` file set up, you can run a worker locally using the following command: ```sh hatchet worker dev ``` To run a worker with a specific profile, you can run: ```sh hatchet worker dev --profile ``` ### Disabling auto-reload If you want to run the worker without auto-reloading on file changes, you can set the `dev.reload` field to `false` in your `hatchet.yaml` file: ```yaml dev: reload: false ``` Or you can pass the `--no-reload` flag when running the worker: ```sh hatchet worker dev --no-reload ``` ### Overriding the run command You can override the `runCmd` specified in the `hatchet.yaml` file by using the `--run-cmd` flag: ```sh hatchet worker dev --run-cmd "npm run dev" ``` --- # Triggering Workflows You can use the `hatchet trigger` command to trigger workflows locally for testing and development purposes. This command allows you to set up triggers in your `hatchet.yaml` file that define how to run specific workflows. ## Example In your `hatchet.yaml` file, you can define a trigger for a simple workflow like this: ```yaml triggers: - name: "simple" command: "poetry run python src/run.py" description: "Trigger a simple workflow" ``` Then, you can select this trigger when running the `hatchet trigger` command: ```sh hatchet trigger simple ``` Or just `hatchet trigger`, which will prompt you to select a trigger interactively. --- # Using the Hatchet TUI The Hatchet CLI includes a built-in terminal user interface (TUI) that you can use to interact with your Hatchet deployment directly from the terminal. The TUI provides real-time observability into tasks, workflows, workers, and more: ```sh hatchet tui ``` # Features > **Info:** You can access help documentation by pressing the `h` key within the TUI. This > will display a list of available commands and their descriptions. ## Runs View The Runs view provides a similar experience to the Runs page in the Hatchet dashboard. 
You can list runs and narrow them down with filters.

## Workflows View

The Workflows view allows you to see the list of workflows defined in your Hatchet deployment. You can view details about each workflow, along with its recent runs.

## Workers View

The Workers view shows the status of workers connected to your Hatchet deployment. You can see which workers are online, their registered workflows, and other details.