Add to Cursor Add to Claude Copy for LLM View as MD

LLM Pipelines

LLM pipelines chain multiple model calls together with validation, retries, and structured outputs. Hatchet turns each step into a durable task so failures retry individually, rate limits protect provider APIs, and the full pipeline is observable in the dashboard.

Because each LLM call maps to a task and validation steps gate what runs next, these pipelines are a natural fit for DAG Workflows.

Step-by-step walkthrough

You’ll build a three-stage DAG pipeline (prompt, generate, validate) using a mock LLM so you can run it without API keys.

Define the pipeline

Create a workflow with prompt construction, LLM generation, and validation stages.

examples/python/guides/llm_pipelines/worker.py

class PipelineInput(BaseModel):
    prompt: str


llm_wf = hatchet.workflow(name="LLMPipeline", input_validator=PipelineInput)


@llm_wf.task()
async def prompt_task(input: PipelineInput, ctx: Context) -> dict:
    return {"prompt": input.prompt}

examples/typescript/guides/llm-pipelines/workflow.ts

const llmWf = hatchet.workflow<PipelineInput>({ name: 'LLMPipeline' });

const promptTask = llmWf.task({
  name: 'prompt-task',
  fn: async (input) => ({ prompt: input.prompt }),
});

examples/go/guides/llm-pipelines/main.go

workflow := client.NewWorkflow("LLMPipeline")

The Ruby SDK is in early access, and may change. We'd love your feedback!

examples/ruby/guides/llm_pipelines/worker.rb

LLM_WF = HATCHET.workflow(name: 'LLMPipeline')

PROMPT_TASK = LLM_WF.task(:prompt_task) do |input, _ctx|
  { 'prompt' => input['prompt'] }
end

Prompt task

The prompt task depends on the pipeline input (Step 1). Build the prompt from user input and context. This step may include retrieval from a vector database (see RAG & Indexing).

examples/python/guides/llm_pipelines/worker.py

def _build_prompt(user_input: str, context: str = "") -> str:
    return f"Process the following: {user_input}" + (f"\nContext: {context}" if context else "")

The prompt is passed to your LLM service for generation. The examples above use a mock. To use a real provider, swap get_llm_service() with one of these:

OpenAI’s Chat Completions API provides access to GPT models for text generation, function calling, and structured outputs. It’s the most widely adopted LLM API and supports streaming, tool use, and JSON mode.

pip install openai

poetry add openai

uv add openai

examples/python/guides/integrations/llm_openai.py

def complete(messages: list[dict]) -> dict:
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tool_choice="auto",
        tools=[
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather for a location",
                    "parameters": {
                        "type": "object",
                        "properties": {"location": {"type": "string"}},
                        "required": ["location"],
                    },
                },
            }
        ],
    )
    msg = r.choices[0].message
    tool_calls = [
        {"name": tc.function.name, "args": json.loads(tc.function.arguments or "{}")}
        for tc in (msg.tool_calls or [])
    ]
    return {"content": msg.content or "", "tool_calls": tool_calls, "done": not tool_calls}

Anthropic’s Messages API powers the Claude family of models, including Claude Sonnet and Claude Haiku. Claude excels at long-context reasoning, careful instruction following, and tool use with extended thinking support.

pip install anthropic

poetry add anthropic

uv add anthropic

examples/python/guides/integrations/llm_anthropic.py

def complete(messages: list[dict]) -> dict:
    resp = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=1024,
        messages=[{"role": m["role"], "content": m["content"]} for m in messages],
        tools=[{"name": "get_weather", "description": "Get weather", "input_schema": {"type": "object", "properties": {"location": {"type": "string"}}}}],
    )
    for block in resp.content:
        if block.type == "tool_use":
            return {"content": "", "tool_calls": [{"name": block.name, "args": block.input}], "done": False}
    text = "".join(b.text for b in resp.content if hasattr(b, "text"))
    return {"content": text, "tool_calls": [], "done": True}

Groq provides ultra-fast inference for open-source models like Llama and Mixtral using custom LPU hardware. Its OpenAI-compatible API makes it a drop-in replacement when you need low latency.

pip install groq

poetry add groq

uv add groq

examples/python/guides/integrations/llm_groq.py

def complete(messages: list[dict]) -> dict:
    r = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages,
        tool_choice="auto",
        tools=[
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather for a location",
                    "parameters": {
                        "type": "object",
                        "properties": {"location": {"type": "string"}},
                        "required": ["location"],
                    },
                },
            }
        ],
    )
    msg = r.choices[0].message
    tool_calls = [
        {"name": tc.function.name, "args": json.loads(tc.function.arguments or "{}")}
        for tc in (msg.tool_calls or [])
    ]
    return {"content": msg.content or "", "tool_calls": tool_calls, "done": not tool_calls}

The Vercel AI SDK is a TypeScript toolkit that provides a unified interface across providers (OpenAI, Anthropic, Google, and more). It includes helpers for streaming, tool calls, and structured object generation via generateText and streamText.

Vercel AI SDK is JavaScript/TypeScript only. Use OpenAI, Anthropic, or Groq SDK directly.

Ollama runs open-source models locally with no API key required. It supports Llama, Mistral, Gemma, and others through a simple REST API on localhost:11434. Ideal for development, air-gapped environments, or when you want full control over your model.

pip install ollama

poetry add ollama

uv add ollama

examples/python/guides/integrations/llm_ollama.py

def complete(messages: list[dict]) -> dict:
    resp = ollama.chat(model="llama2", messages=messages)
    content = resp.get("message", {}).get("content", "")
    tool_calls = resp.get("message", {}).get("tool_calls") or []
    return {"content": content, "tool_calls": tool_calls, "done": not tool_calls}

Generate and validate

This task takes the prompt from Step 2, calls the LLM, and validates the response. If validation fails, Retry Policies retry just this step with a corrective prompt.

examples/python/guides/llm_pipelines/worker.py

@llm_wf.task(parents=[prompt_task])
async def generate_task(input: PipelineInput, ctx: Context) -> dict:
    prev = ctx.task_output(prompt_task)
    output = get_llm_service().generate(prev["prompt"])
    if not output.get("valid"):
        raise ValueError("Validation failed")
    return output

Run the worker

Start the worker. Configure Rate Limits to stay within LLM provider quotas.

examples/python/guides/llm_pipelines/worker.py

worker = hatchet.worker(
    "llm-pipeline-worker",
    workflows=[llm_wf],
)
worker.start()

⚠️

Always set timeouts on LLM call steps. Model providers can hang or respond slowly under load. See Timeouts for configuration.

Common Patterns

Pattern	Description
Generate → Validate	Call LLM, validate structured output, retry with error context on failure
Chain of thought	Multi-step reasoning where each LLM call refines the previous output
Parallel evaluation	Fan out the same prompt to multiple models, then pick the best response
Translation pipeline	Generate content in one language, translate to others in parallel
Summarize → Classify	Summarize long text, then classify the summary for routing or tagging