

RAG & Data Indexing

RAG and indexing pipelines share a common shape: ingest documents, split them into chunks, generate embeddings, and write to a vector database. Because the stages are known upfront, these pipelines map naturally to a DAG workflow, where each stage is a task and dependencies between stages are declared before execution begins.

You declare the full graph (ingest → chunk → embed → index) and Hatchet executes tasks in order, running independent tasks in parallel automatically. You can add fanout within the chunking stage to process documents in parallel.
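The overall shape can be sketched in plain Python before any Hatchet wiring. This is a minimal illustration with mocked stages; each function below would become a task in the declared DAG, and all names are hypothetical.

```python
# Plain-Python sketch of the four pipeline stages (no Hatchet wiring).
# Each function corresponds to one task in the DAG; names are illustrative.

def ingest(doc_ids: list[str]) -> list[str]:
    # Resolve document references to raw text (mocked here).
    return [f"contents of {doc_id}" for doc_id in doc_ids]

def chunk(text: str, size: int = 20) -> list[str]:
    # Split one document into fixed-size chunks; runs once per document.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text: str) -> list[float]:
    # Mock embedding: a deterministic vector derived from the text.
    return [float(ord(c)) for c in chunk_text[:4]]

def index(vectors: list[list[float]]) -> int:
    # Write vectors to a (mock) vector store; return the count indexed.
    return len(vectors)

docs = ingest(["doc-1", "doc-2"])
chunks = [c for d in docs for c in chunk(d)]
indexed = index([embed(c) for c in chunks])
```

In the real workflow, Hatchet tracks the edges between these stages and schedules each task as its parents complete.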

Step-by-step walkthrough

You’ll define a workflow, then add tasks for ingesting, chunking, embedding, and querying, all using a mock embedding client so you can run it without API keys.

Define the workflow

Define your input type and create an empty DAG workflow. You’ll add tasks to this workflow in the following steps.
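As a sketch of the input type, here is a plain dataclass. Hatchet's Python SDK typically takes a Pydantic model as the workflow's input validator; a dataclass stands in so this runs with no extra dependencies, and `RagInput`/`document_ids` are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical input type for the pipeline: the trigger payload carries
# a list of document references for the ingest task to resolve.
@dataclass
class RagInput:
    document_ids: list[str] = field(default_factory=list)

inp = RagInput(document_ids=["doc-1", "doc-2"])
```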

Define the ingest task

Add a task that ingests documents. A trigger (event, cron, or API call) starts the pipeline with document references.
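A minimal sketch of the ingest step's body, with the document loader mocked as an in-memory dict. In Hatchet this logic would live inside a task on the workflow; the `DOCS` store and function names are illustrative.

```python
# Mock document store standing in for a real loader (S3, database, etc.).
DOCS = {
    "doc-1": "Hatchet runs DAG workflows.",
    "doc-2": "Chunks are embedded and indexed.",
}

def ingest(document_ids: list[str]) -> dict[str, str]:
    # Resolve each document reference from the trigger payload to raw text,
    # returning a mapping of document id -> text for downstream chunking.
    return {doc_id: DOCS[doc_id] for doc_id in document_ids}

result = ingest(["doc-1"])
```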

Chunk the documents

The ingest task (Step 2) fans out to one child per document. Each child splits its document into chunks. Use child spawning for per-document parallelism.
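The fanout pattern can be sketched with one coroutine per document; in Hatchet, each coroutine would instead be a spawned child task. The splitter uses fixed-size chunks with overlap, and all sizes and names here are assumptions.

```python
import asyncio

def split_into_chunks(text: str, size: int = 16, overlap: int = 4) -> list[str]:
    # Fixed-size chunks with a small overlap between neighbors.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

async def chunk_document(doc_id: str, text: str) -> tuple[str, list[str]]:
    # One "child" per document; Hatchet would run this as a spawned child task.
    return doc_id, split_into_chunks(text)

async def fan_out(docs: dict[str, str]) -> dict[str, list[str]]:
    results = await asyncio.gather(
        *(chunk_document(d, t) for d, t in docs.items())
    )
    return dict(results)

chunks = asyncio.run(fan_out({"doc-1": "abcdefghijklmnopqrst"}))
```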

Embed and index

Define a standalone embed-chunk task, then spawn one child task per chunk from the DAG’s chunk-and-embed task. Each child runs on any available worker and is individually retryable, so a single embedding failure does not restart the entire batch. Rate Limits throttle embedding API calls across all workers.
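The mock embedding client can be as simple as a deterministic hash-derived vector, so the pipeline runs without API keys. Each chunk is embedded independently, mirroring the one-child-task-per-chunk fanout; `mock_embed` and the dimension count are illustrative.

```python
import hashlib

def mock_embed(text: str, dims: int = 8) -> list[float]:
    # Deterministic pseudo-embedding: a stable vector derived from a hash
    # of the chunk text, normalized into [0, 1].
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dims]]

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    # One embedding per chunk; each call is the body of one child task.
    return [mock_embed(c) for c in chunks]

vectors = embed_chunks(["alpha", "beta"])
```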

The examples above use a mock embedding client. To use a real provider, swap get_embedding_service() for one of the clients below.

OpenAI’s Embeddings API converts text into high-dimensional vectors. It supports configurable dimensions and is a popular default for semantic search and RAG pipelines.
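For reference, the request body for OpenAI's Embeddings API (`POST https://api.openai.com/v1/embeddings`, authorized with a Bearer token) looks like the sketch below. No network call is made here; the model name and dimension count are example values ("dimensions" is supported by the text-embedding-3 models).

```python
import json

def build_embeddings_request(texts: list[str],
                             model: str = "text-embedding-3-small",
                             dimensions: int = 256) -> str:
    # JSON payload for POST https://api.openai.com/v1/embeddings.
    # Send with an "Authorization: Bearer <API key>" header.
    return json.dumps({"model": model, "input": texts, "dimensions": dimensions})

payload = build_embeddings_request(["hello world"])
```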

Query

Add a query task that reuses the same embed-chunk child task to embed the query, then performs a vector similarity search. In production, replace the empty results with a real vector DB lookup.
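The query side can be sketched as: embed the query with the same mock embedder used for the chunks, then rank stored vectors by cosine similarity. The in-memory `INDEX` dict stands in for a real vector database, and all names are illustrative.

```python
import math

def mock_embed(text: str, dims: int = 8) -> list[float]:
    # Same deterministic mock embedder used at indexing time.
    return [ord(text[i % len(text)]) / 255.0 for i in range(dims)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# In-memory stand-in for a vector database: chunk text -> embedding.
INDEX = {c: mock_embed(c) for c in ["hatchet runs workflows",
                                    "embeddings power search"]}

def query(q: str, top_k: int = 1) -> list[str]:
    # Embed the query, then return the most similar stored chunks.
    qv = mock_embed(q)
    ranked = sorted(INDEX, key=lambda c: cosine(qv, INDEX[c]), reverse=True)
    return ranked[:top_k]

best = query("hatchet runs workflows")
```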

Run the worker

Start the worker and register the DAG workflow, the embed-chunk child task, and the rag-query task.

⚠️

When fanning out to many chunks, ensure your workers have enough slots or use Concurrency Control to limit how many run simultaneously.
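The slot-limiting idea can be illustrated with a semaphore capping how many embeds run at once, analogous to limiting worker slots or applying a concurrency control to the fanout. The limit value and helper names are assumptions.

```python
import asyncio

async def embed_with_limit(chunks: list[str],
                           max_concurrent: int = 2) -> tuple[list[int], int]:
    # Cap in-flight embeds with a semaphore; track the observed peak.
    sem = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def one(chunk: str) -> int:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for the embedding call
            in_flight -= 1
            return len(chunk)

    results = await asyncio.gather(*(one(c) for c in chunks))
    return results, peak

lengths, peak = asyncio.run(embed_with_limit(["aa", "bbb", "c", "dddd", "ee"]))
```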

Multi-Tenant Indexing

For SaaS applications where multiple tenants share the same pipeline:

  • GROUP_ROUND_ROBIN concurrency distributes scheduling fairly so no single tenant monopolizes workers
  • Additional metadata tags each run with a tenant ID for filtering in the dashboard
  • Priority queues allow higher-priority indexing jobs to run ahead of lower-priority ones
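The fairness property of round-robin grouping can be sketched by interleaving queued jobs across tenants, so a tenant with a large backlog never starves the others. This is an illustration of the scheduling behavior, not Hatchet's implementation.

```python
from collections import deque

def round_robin(queues: dict[str, list[str]]) -> list[str]:
    # Take one job per tenant per pass until every queue drains.
    pending = {tenant: deque(jobs) for tenant, jobs in queues.items()}
    order: list[str] = []
    while any(pending.values()):
        for q in pending.values():
            if q:
                order.append(q.popleft())
    return order

schedule = round_robin({
    "tenant-a": ["a1", "a2", "a3"],
    "tenant-b": ["b1"],
    "tenant-c": ["c1", "c2"],
})
```

Note how tenant-a's three jobs are spread out rather than running back-to-back, which is the effect GROUP_ROUND_ROBIN concurrency has on worker scheduling.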

Next Steps