Batch Processing
Batch processing involves running the same operation across a large set of items such as images, documents, records, or API calls. This guide shows how to structure batch workloads in Hatchet using fan-out, retries, and concurrency control.
At its core, batch processing is Fanout applied at scale. If your batch also has fixed stages (e.g., validate → transform → load), you can combine it with Pre-Determined Pipelines.
Step-by-step walkthrough
You’ll build a parent workflow that fans out to one child task per item and aggregates results.
Define the parent workflow
Create a parent workflow that receives a batch of item IDs and spawns one child per item.
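The fan-out shape can be sketched in plain Python before wiring it to Hatchet. The block below is a minimal illustration that uses only `asyncio`, not the Hatchet SDK: `process_item` and `parent` are hypothetical names, and `asyncio.gather` stands in for spawning one child run per item and waiting for the results.

```python
import asyncio


async def process_item(item_id: str) -> dict:
    """Hypothetical child: handles exactly one item."""
    await asyncio.sleep(0)  # placeholder for real work (I/O, API call, ...)
    return {"item_id": item_id, "status": "done"}


async def parent(item_ids: list[str]) -> dict:
    """Hypothetical parent: one child per item, then aggregate."""
    # Fan out: create one child coroutine per item ID.
    children = [process_item(item_id) for item_id in item_ids]
    # Fan in: wait for every child; results keep the input order.
    results = await asyncio.gather(*children)
    return {"processed": len(results), "results": list(results)}


if __name__ == "__main__":
    print(asyncio.run(parent(["a", "b", "c"])))
```

In a real deployment the parent and child would be registered as Hatchet workflows and the children would run on separate worker slots; the sketch only shows the control flow of receiving a list of IDs, spawning per-item work, and aggregating.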
Process each item
Each child task processes a single item independently. Failed items are retried according to your retry policy.
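To make the retry behaviour concrete, here is a small sketch of per-item retries in plain `asyncio`. It is an assumption-laden illustration rather than Hatchet's implementation: `TransientError`, `run_with_retries`, and `make_flaky_task` are hypothetical names, and the hand-written loop only mimics what a retry policy would do for you.

```python
import asyncio


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying."""


async def run_with_retries(task, item_id: str, max_retries: int = 3) -> dict:
    """Call task(item_id), retrying up to max_retries times on TransientError."""
    for attempt in range(max_retries + 1):
        try:
            return {"item_id": item_id, "ok": True, "result": await task(item_id)}
        except TransientError as exc:
            if attempt == max_retries:
                # Retries exhausted: report the failure instead of raising,
                # so one bad item does not sink the rest of the batch.
                return {"item_id": item_id, "ok": False, "error": str(exc)}
            await asyncio.sleep(0.01 * 2**attempt)  # short exponential backoff


def make_flaky_task(failures_before_success: int):
    """Build a demo task that fails a fixed number of times per item."""
    attempts: dict[str, int] = {}

    async def task(item_id: str) -> str:
        attempts[item_id] = attempts.get(item_id, 0) + 1
        if attempts[item_id] <= failures_before_success:
            raise TransientError(f"attempt {attempts[item_id]} failed")
        return item_id.upper()

    return task


if __name__ == "__main__":
    print(asyncio.run(run_with_retries(make_flaky_task(2), "a")))
```

Because each item is retried on its own, a failure in one child leaves the others untouched, which is the main reason to process one item per child rather than looping over the batch inside a single task.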
Run the worker
Register and start the worker with both parent and child workflows.
For batches with thousands of items, use durable workflows so the parent task doesn’t hold a worker slot while waiting for all children to complete. See Durable Workflows for details.
Common Patterns
| Pattern | Description |
|---|---|
| Image processing | Resize, transcode, or analyze images in parallel across workers |
| Data enrichment | Enrich records by calling external APIs (geocoding, company info, email validation) |
| Report generation | Generate per-customer reports in parallel, then aggregate into a summary |
| Database migrations | Process and migrate records in batches with retry and progress tracking |
| Notification delivery | Send emails, SMS, or push notifications to a user list with rate limiting |
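Several of these patterns call external services, so the number of items in flight usually needs a cap. The sketch below shows one way to bound concurrency with an `asyncio.Semaphore`. It is plain Python rather than Hatchet's concurrency or rate-limit features, and `run_bounded`, `send_notification`, and `demo` are hypothetical names. Note that a concurrency cap limits simultaneous calls; it is not the same as a per-second rate limit.

```python
import asyncio


async def run_bounded(items, worker, max_concurrent: int = 5):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:  # waits while max_concurrent workers are active
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def demo() -> tuple[list[str], int]:
    in_flight = 0
    peak = 0

    async def send_notification(user: str) -> str:
        """Hypothetical stand-in for an email, SMS, or push call."""
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # simulated network latency
        in_flight -= 1
        return f"sent:{user}"

    users = [f"user{i}" for i in range(20)]
    sent = await run_bounded(users, send_notification, max_concurrent=5)
    return sent, peak


if __name__ == "__main__":
    sent, peak = asyncio.run(demo())
    print(len(sent), "sent; peak concurrency", peak)
```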
Related Patterns
- Fanout: the core pattern behind batch processing, spawning N children from a parent.
- Pre-Determined Pipelines: chain batch processing with multi-stage transforms in a DAG.
- RAG & Indexing: a specialized batch processing use case for document indexing pipelines.
- Cycles: process paginated results one page at a time with iterative child spawning.
Next Steps
- Child Spawning: learn the fan-out API for batch processing
- Bulk Run: trigger large batches efficiently
- Concurrency Control: limit concurrent item processing
- Rate Limits: protect external APIs during batch operations