
Reasoning Loop

AI agents follow a reason-act-observe loop that can run for minutes or hours, repeating until the LLM determines the task is complete or a deterministic exit condition is met (max iterations, timeout, tool signal).

In Hatchet, this is implemented as a durable task with a loop. At each iteration, the task spawns a child to call the LLM, execute any tool calls, and determine whether additional iterations are required. Each completed iteration is checkpointed, so the agent survives crashes and worker slots are freed between iterations.

When to use

| Scenario | Fit |
| --- | --- |
| Chatbot that picks tools based on user messages | Good: the loop runs until the agent has a final answer |
| Multi-step research that may take minutes | Good: durable execution survives long-running loops |
| Agent that needs human approval mid-loop | Good: combine with Human-in-the-Loop |
| Fixed pipeline (prompt, generate, validate) | Skip: use LLM Pipelines instead |
| One-shot classification or extraction | Skip: a single task is simpler |

Step-by-step walkthrough

You’ll build a durable agent task that streams tokens and survives restarts.

Reasoning loop

Define the core loop. Each iteration calls the LLM, executes any tool calls, and checks whether the task is complete.
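A minimal, framework-agnostic sketch of the loop follows. The `MockLLMService`, `MockToolService`, and the message shapes are illustrative stand-ins, not a real provider SDK; `get_llm_service()` and `get_tool_service()` mirror the helper names used in this guide.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LLMResponse:
    content: str
    tool_calls: list = field(default_factory=list)
    done: bool = False


class MockLLMService:
    """Mock LLM: asks for a tool call on the first turn, then finishes."""

    async def chat(self, messages: list[dict]) -> LLMResponse:
        if not any(m["role"] == "tool" for m in messages):
            return LLMResponse(
                content="",
                tool_calls=[{"name": "search", "args": {"q": "hatchet"}}],
            )
        return LLMResponse(content="Final answer based on tool results.", done=True)


class MockToolService:
    """Tool execution is typically your own APIs, wrapped in one service."""

    async def execute(self, call: dict) -> str:
        return json.dumps({"tool": call["name"], "result": "ok"})


def get_llm_service() -> MockLLMService:  # swap for a real provider client
    return MockLLMService()


def get_tool_service() -> MockToolService:
    return MockToolService()


async def reasoning_loop(user_message: str, max_iterations: int = 10) -> str:
    """Reason-act-observe until the LLM signals completion or a bound is hit.
    (Simplified: a real loop would also append the assistant turn itself.)"""
    llm, tools = get_llm_service(), get_tool_service()
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = await llm.chat(messages)  # reason
        for call in response.tool_calls:  # act
            result = await tools.execute(call)
            messages.append({"role": "tool", "content": result})  # observe
        if response.done:
            return response.content
    raise RuntimeError("max iterations reached without a final answer")
```

In Hatchet, each `llm.chat` / `tools.execute` round would run as a spawned child task so the iteration is checkpointed on completion.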

The examples above use a mock LLM. To call a real provider, swap `get_llm_service()` out for one of the clients below. Tool execution is typically your own APIs; encapsulate them in a service module like the `get_tool_service()` helper shown above.

OpenAI’s Chat Completions API provides access to GPT models for text generation, function calling, and structured outputs. It’s the most widely adopted LLM API and supports streaming, tool use, and JSON mode.
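As a sketch of the swap, the helper below shapes a Chat Completions request body with function-calling tools enabled. `build_chat_request` and its defaults are illustrative, not part of the OpenAI SDK; the resulting dict can be passed as keyword arguments to the official client's `chat.completions.create(...)`.

```python
def build_chat_request(
    messages: list[dict], tools: list[dict], model: str = "gpt-4o"
) -> dict:
    """Shape a Chat Completions request with function-calling tools."""
    return {
        "model": model,
        "messages": messages,
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": t["name"],
                    "description": t.get("description", ""),
                    "parameters": t.get(
                        "parameters", {"type": "object", "properties": {}}
                    ),
                },
            }
            for t in tools
        ],
        # stream tokens so the task can forward them as they arrive
        "stream": True,
    }
```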

Wrap it in a durable task

Create a durable task that invokes the reasoning loop from Step 1. Concurrency is set to CANCEL_IN_PROGRESS so new user messages cancel stale runs.
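The CANCEL_IN_PROGRESS semantics can be sketched framework-agnostically: at most one in-flight run per key, and a new submission cancels the stale one. The `CancelInProgressRunner` below is an illustrative asyncio model of that behavior, not Hatchet's API; Hatchet applies the same policy declaratively via the task's concurrency configuration.

```python
import asyncio


class CancelInProgressRunner:
    """At most one in-flight run per key; a new submission cancels
    the stale run, mirroring CANCEL_IN_PROGRESS concurrency."""

    def __init__(self) -> None:
        self._running: dict[str, asyncio.Task] = {}

    async def submit(self, key: str, coro) -> asyncio.Task:
        stale = self._running.get(key)
        if stale is not None and not stale.done():
            stale.cancel()  # cancel the previous run for this key
        task = asyncio.create_task(coro)
        self._running[key] = task
        return task
```

Keying by `session_id` means a user's new message supersedes the agent run still working on their previous one.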

Stream the response

Emit LLM tokens from the task as they are generated. Clients subscribe to the stream and receive them in real time. See Streaming for the full API.
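Under the hood this is a producer/consumer pattern. The sketch below models it with an asyncio queue, a simulated token producer, and a subscriber; the names and the sentinel are illustrative, and Hatchet's actual streaming API handles the transport for you.

```python
import asyncio

STREAM_DONE = object()  # sentinel marking the end of the stream


async def stream_tokens(tokens, queue: asyncio.Queue) -> None:
    """Producer: emit tokens as the LLM generates them (simulated)."""
    for token in tokens:
        await queue.put(token)
    await queue.put(STREAM_DONE)


async def subscribe(queue: asyncio.Queue):
    """Consumer: yield tokens until the stream closes."""
    while True:
        token = await queue.get()
        if token is STREAM_DONE:
            return
        yield token
```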

Run the worker

Start the worker. The task definitions above use CANCEL_IN_PROGRESS concurrency so new user messages cancel stale runs. Pass session_id in input for per-session grouping.

⚠️

Always set a timeout and max iteration count on agent loops. Without bounds, an agent can loop indefinitely. See Timeouts for configuration.
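Both bounds can be sketched in plain asyncio: an iteration cap inside the loop and a wall-clock timeout around it. `bounded_agent_loop` and the `step` callback are illustrative; in Hatchet the timeout would come from the task's timeout configuration rather than `asyncio.wait_for`.

```python
import asyncio


async def bounded_agent_loop(step, *, max_iterations: int = 20,
                             timeout_s: float = 60.0) -> str:
    """Run `step` until it reports completion, the iteration cap is
    hit, or the wall-clock timeout expires."""

    async def run() -> str:
        for i in range(max_iterations):
            done, answer = await step(i)
            if done:
                return answer
        raise RuntimeError(f"no answer after {max_iterations} iterations")

    return await asyncio.wait_for(run(), timeout=timeout_s)
```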

Variant: Evaluator-Optimizer

The evaluator-optimizer is a specialized reasoning loop that uses two LLM calls per iteration: one to generate a candidate output and one to evaluate it against a rubric. If the score is below a threshold, the evaluator provides feedback and the generator tries again. This trades compute cost for output quality.

| Use case | Generator | Evaluator |
| --- | --- | --- |
| Content writing | Draft post/email/copy | Score clarity, tone, length; provide edit suggestions |
| Code generation | Write function or query | Run tests or linter; feed back errors |
| Data extraction | Extract fields from text | Validate against schema; flag missing fields |
| Translation | Translate text | Back-translate and compare; score fidelity |

Define the generator and evaluator tasks

Create separate tasks for generation and evaluation. The generator takes a topic and optional feedback; the evaluator scores a draft.
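The pair can be sketched as two functions with mock LLM behavior; `generate`, `evaluate`, the `Evaluation` shape, and the scoring rule are all illustrative. In Hatchet each would be its own task so that completed calls are checkpointed.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    score: float   # 0.0-1.0 against the rubric
    feedback: str  # edit suggestions fed back to the generator


async def generate(topic: str, feedback: str = "") -> str:
    """Generator: draft a candidate (mocked; swap in a real LLM call).
    Incorporates evaluator feedback when present."""
    draft = f"Draft about {topic}."
    if feedback:
        draft += " Revised per feedback."
    return draft


async def evaluate(draft: str) -> Evaluation:
    """Evaluator: score the draft against a rubric (mocked)."""
    if "Revised" in draft:
        return Evaluation(score=0.9, feedback="")
    return Evaluation(score=0.4, feedback="Too short; add detail.")
```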

Optimization loop

The evaluator-optimizer task loops: generate, evaluate, check score. Each generator and evaluator call is a spawned child task that is checkpointed on completion.
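A self-contained sketch of that loop follows. It takes the generator and evaluator as plain callables and uses a dict-shaped evaluation; the names, threshold, and best-effort fallback are illustrative choices, and in Hatchet each callable would be a spawned child task.

```python
import asyncio


async def evaluator_optimizer(topic, generate, evaluate, *,
                              threshold: float = 0.8,
                              max_rounds: int = 5) -> str:
    """Generate a candidate, score it, and retry with feedback until
    the score clears the threshold or the rounds run out."""
    feedback = ""
    best_draft, best_score = "", -1.0
    for _ in range(max_rounds):
        draft = await generate(topic, feedback)  # generator child task
        evaluation = await evaluate(draft)       # evaluator child task
        if evaluation["score"] > best_score:
            best_draft, best_score = draft, evaluation["score"]
        if evaluation["score"] >= threshold:
            return draft
        feedback = evaluation["feedback"]        # feed the critique back
    return best_draft  # best effort if the threshold is never reached
```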

Next Steps