# Benchmarking Hatchet
This page provides example benchmarks for Hatchet throughput and latency on an 8 CPU database (Amazon RDS, `m7g.2xlarge` instance type). These benchmarks were all run against a v1 Hatchet engine running version `v0.55.26`. For more information on the setup, see the Setup section. Note that better hardware yields significantly better performance: we have tested up to 10k runs/s on an `m7g.8xlarge` instance.
The best way to benchmark Hatchet is to run your own benchmarks in your own environment. The benchmarks below are provided as a reference point for what you might expect to see in a typical setup. To run your own benchmarks, see the Running your own benchmarks section.
## Throughput
Below are summarized throughput benchmarks run at different incoming event rates. For each run, we note the database CPU utilization and estimated IOPS, which are the most relevant metrics for tracking performance on the database.
| Throughput (runs/s) | Database CPU | Database IOPS |
| --- | --- | --- |
| 100 | 15% | 400 |
| 500 | 60% | 600 |
| 2000 | 83% | 800 |
## Latency
These benchmarks were run using event-based triggering, which approximately doubles the queueing time of a workflow. The average latency of events in Hatchet can be approximated by two measurements that Hatchet reports:
- Average execution time per executed event: The time from when the event starts execution to when it completes.
- Average write time per event: The time for Hatchet to write the event and acknowledge the write.
Below is a table summarizing these latencies:
| Throughput (runs/s) | Average Execution Time (ms) | Average Write Time (ms) |
| --- | --- | --- |
| 100 | ~40 | ~2.5 |
| 500 | ~48 | ~2.6 |
| 2000 | ~220 | ~5.7 |
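Summing the two measurements gives a rough per-event latency: at 2000 runs/s, for example, approximately 220 ms + 5.7 ms ≈ 226 ms.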
For workloads up to around 100-500 events per second, the latency remains relatively low. As throughput scales toward 2000 events per second, the overall average execution time increases (though the Hatchet engine remained stable throughout the tests).
## Running your own benchmarks
Hatchet publishes a public load testing container which can be used for benchmarking. This container is available at `ghcr.io/hatchet-dev/hatchet/hatchet-loadtest`. It acts as a Hatchet worker and event emitter, so it simply expects a `HATCHET_CLIENT_TOKEN` to be set in the environment.
For example, to run 100 events/second for 60 seconds, you can use the following command:
```sh
docker run -e HATCHET_CLIENT_TOKEN=your-token ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "100" -d "60s" --level "warn" --slots "100"
```
The event emitter bundled into the container has difficulty emitting more than 2k events/s, so to test higher throughputs it is recommended to run multiple containers in parallel. Since each container manages its own workflows and worker, use the `HATCHET_CLIENT_NAMESPACE` environment variable to ensure that workflows are not duplicated across containers. For example:
```sh
# first container
docker run -e HATCHET_CLIENT_TOKEN=your-token -e HATCHET_CLIENT_NAMESPACE=loadtest1 ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "2000" -d "60s" --level "warn" --slots "100"

# second container
docker run -e HATCHET_CLIENT_TOKEN=your-token -e HATCHET_CLIENT_NAMESPACE=loadtest2 ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "2000" -d "60s" --level "warn" --slots "100"
```
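To launch several emitters in parallel without repeating the command by hand, a small shell loop works as well. The sketch below is illustrative: the container names and the choice of four containers are assumptions, not part of the loadtest tooling.

```sh
# Run four load-test containers in parallel, each in its own Hatchet
# namespace so that workflows are not duplicated across containers.
for i in 1 2 3 4; do
  docker run -d \
    --name "hatchet-loadtest-$i" \
    -e HATCHET_CLIENT_TOKEN=your-token \
    -e HATCHET_CLIENT_NAMESPACE="loadtest$i" \
    ghcr.io/hatchet-dev/hatchet/hatchet-loadtest \
    -e "2000" -d "60s" --level "warn" --slots "100"
done
```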
### Reference
This container takes the following arguments:
```
Usage:
  loadtest [flags]

Flags:
  -c, --concurrency int        concurrency specifies the maximum events to run at the same time
  -D, --delay duration         delay specifies the time to wait in each event to simulate slow tasks
  -d, --duration duration      duration specifies the total time to run the load test (default 10s)
  -F, --eventFanout int        eventFanout specifies the number of events to fanout (default 1)
  -e, --events int             events per second (default 10)
  -f, --failureRate float32    failureRate specifies the rate of failure for the worker
  -h, --help                   help for loadtest
  -l, --level string           logLevel specifies the log level (debug, info, warn, error) (default "info")
  -P, --payloadSize string     payload specifies the size of the payload to send (default "0kb")
  -s, --slots int              slots specifies the number of slots to use in the worker
  -w, --wait duration          wait specifies the total time to wait until events complete (default 10s)
  -p, --workerDelay duration   workerDelay specifies the time to wait before starting the worker
```
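These flags can be combined to shape the workload. As an illustrative sketch (the specific values here are arbitrary), the following run emits 500 events/s for two minutes with 1kb payloads and a 50ms simulated delay per event:

```sh
# 500 events/s for 120s, 1kb payloads, 50ms of simulated work per event
docker run -e HATCHET_CLIENT_TOKEN=your-token ghcr.io/hatchet-dev/hatchet/hatchet-loadtest \
  -e "500" -d "120s" -D "50ms" -P "1kb" --slots "200" --level "warn"
```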
### Running a benchmark on Kubernetes
You can use the following Pod manifest to run the load test on Kubernetes (make sure to fill in `HATCHET_CLIENT_TOKEN`):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: loadtest1a
  namespace: staging
spec:
  restartPolicy: Never
  containers:
    - image: ghcr.io/hatchet-dev/hatchet/hatchet-loadtest:v0.56.0
      imagePullPolicy: Always
      name: loadtest
      command: ["/hatchet/hatchet-load-test"]
      args:
        - loadtest
        - --duration
        - "60s"
        - --events
        - "100"
        - --slots
        - "100"
        - --wait
        - "10s"
        - --level
        - warn
      env:
        - name: HATCHET_CLIENT_TOKEN
          value: "your-token"
        - name: HATCHET_CLIENT_NAMESPACE
          value: "loadtest1a"
      resources:
        limits:
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 1Gi
```
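Assuming the manifest is saved as `loadtest.yaml` (the filename is arbitrary), it can be applied and followed with standard `kubectl` commands:

```sh
kubectl apply -f loadtest.yaml
kubectl logs -f -n staging loadtest1a
```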
## Setup
All tests were run on a Kubernetes cluster on AWS configured with:
- Hatchet engine replicas: 2 (using `c7i.4xlarge` instances to ensure CPU was not a bottleneck)
- Database: `m7g.2xlarge` instance type (Amazon RDS)
- Hatchet version: `v0.55.26`
- AWS region: `us-west-1`
The database configuration was chosen to avoid disk and CPU contention until higher throughputs were reached. We observed that up to around 2000 events/second, the chosen database instance size kept up without major performance degradation. The Hatchet engine was deployed with 2 replicas, and each engine instance had ample CPU headroom on `c7i.4xlarge` nodes.