Concurrency Control in Hatchet Workflows
Hatchet provides powerful concurrency control features to help you manage the execution of your workflows. This is particularly useful when you have workflows that may be triggered frequently or have long-running steps, and you want to limit the number of concurrent executions to prevent overloading your system, ensure fairness, or avoid race conditions.
Concurrency strategies can be added to both Tasks and Workflows.
Why use concurrency control?
There are several reasons why you might want to use concurrency control in your Hatchet workflows:
- Fairness: When you have multiple clients or users triggering workflows, concurrency control can help ensure fair access to resources. By limiting the number of concurrent runs per client or user, you can prevent a single client from monopolizing the system and ensure that all clients get a fair share of the available resources.
- Resource management: If your workflow steps are resource-intensive (e.g., they make external API calls or perform heavy computations), running too many instances concurrently can overload your system. By limiting concurrency, you can ensure your system remains stable and responsive.
- Avoiding race conditions: If your workflow steps modify shared resources, running multiple instances concurrently can lead to race conditions and inconsistent data. Concurrency control helps you avoid these issues by ensuring only a limited number of instances run at a time.
- Compliance with external service limits: If your workflow steps interact with external services that have rate limits, concurrency control can help you stay within those limits and avoid being throttled or blocked.
- Spike protection: When you have workflows that are triggered by external events, such as webhooks or user actions, you may experience spikes in traffic that can overwhelm your system. Concurrency control can help you manage these spikes by limiting the number of concurrent runs and queuing new runs until resources become available.
Available strategies:
- `GROUP_ROUND_ROBIN`: Distribute workflow instances across available slots in a round-robin fashion based on the `key` function.
- `CANCEL_IN_PROGRESS`: Cancel the currently running workflow instances for the same concurrency key to free up slots for the new instance.
- `CANCEL_NEWEST`: Cancel the newest workflow instance for the same concurrency key to free up slots for the new instance.
We’re always open to adding more strategies to fit your needs. Join our Discord to let us know.
Setting concurrency on workers
In addition to setting concurrency limits at the workflow level, you can also control concurrency at the worker level by passing the `maxRuns` option when creating a new `Worker` instance:
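For instance, a minimal sketch in the TypeScript SDK might look like the following. This assumes the worker factory accepts an options object with `maxRuns`, as described above; the worker name is illustrative, and the exact option shape may differ between SDK versions.

```typescript
import Hatchet from "@hatchet-dev/typescript-sdk";

const hatchet = Hatchet.init();

// Cap the number of runs this worker will execute at once.
const worker = await hatchet.worker("example-worker", {
  maxRuns: 1,
});

await worker.start();
```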
This example allows only one run in each group to execute at a given time, fairly distributing the load across the workers.
Group Round Robin
How it works
When a new workflow instance is triggered, the `GROUP_ROUND_ROBIN` strategy will:
- Determine the group that the instance belongs to based on the `key` function defined in the workflow’s concurrency configuration.
- Check if there are any available slots for the instance’s group based on the `maxRuns` limit of the available workers.
- If a slot is available, the new workflow instance starts executing immediately.
- If no slots are available, the new workflow instance is added to a queue for its group.
- When a running workflow instance completes and a slot becomes available for a group, the next queued instance for that group (in round-robin order) is dequeued and starts executing.
This strategy ensures that workflow instances are processed fairly across different groups, preventing any one group from monopolizing the available resources. It also helps to reduce latency for instances within each group, as they are processed in a round-robin fashion rather than strictly in the order they were triggered.
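The queueing behavior above can be sketched as a small self-contained model. This is an illustration of the scheduling semantics only, not Hatchet’s actual implementation; all names here are invented for the example.

```typescript
// Illustrative model of GROUP_ROUND_ROBIN: one queue per group key,
// plus a cursor that rotates across groups when a slot frees up.
type Run = { id: string; group: string };

class RoundRobinScheduler {
  private queues = new Map<string, Run[]>();
  private order: string[] = []; // group keys in arrival order
  private cursor = 0;

  enqueue(run: Run): void {
    if (!this.queues.has(run.group)) {
      this.queues.set(run.group, []);
      this.order.push(run.group);
    }
    this.queues.get(run.group)!.push(run);
  }

  // Called when a slot becomes available: pick the next non-empty
  // group's oldest run, rotating the cursor for fairness.
  dequeue(): Run | undefined {
    for (let i = 0; i < this.order.length; i++) {
      const group = this.order[(this.cursor + i) % this.order.length];
      const queue = this.queues.get(group)!;
      if (queue.length > 0) {
        this.cursor = (this.cursor + i + 1) % this.order.length;
        return queue.shift();
      }
    }
    return undefined;
  }
}

const sched = new RoundRobinScheduler();
sched.enqueue({ id: "a1", group: "tenant-a" });
sched.enqueue({ id: "a2", group: "tenant-a" });
sched.enqueue({ id: "b1", group: "tenant-b" });

// Groups alternate even though tenant-a enqueued two runs first.
console.log([sched.dequeue()!.id, sched.dequeue()!.id, sched.dequeue()!.id]);
// -> ["a1", "b1", "a2"]
```

Note how `tenant-b`’s run is dequeued before `tenant-a`’s second run: fairness across groups takes precedence over strict arrival order.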
When to use GROUP_ROUND_ROBIN
The `GROUP_ROUND_ROBIN` strategy is particularly useful in scenarios where:
- You have multiple clients or users triggering workflow instances, and you want to ensure fair resource allocation among them.
- You want to process instances within each group in a round-robin fashion to minimize latency and ensure that no single instance within a group is starved for resources.
- You have long-running workflow instances and want to avoid one group’s instances monopolizing the available slots.
Keep in mind that the `GROUP_ROUND_ROBIN` strategy may not be suitable for all use cases, especially those that require strict ordering or prioritization of the most recent events.
CANCEL_IN_PROGRESS
How it works
When a new workflow instance is triggered, the `CANCEL_IN_PROGRESS` strategy will:
- Determine the group that the instance belongs to based on the `key` function defined in the workflow’s concurrency configuration.
- Check if there are any available slots for the instance’s group based on the `maxRuns` limit of the available workers.
- If a slot is available, the new workflow instance starts executing immediately.
- If there are no available slots, the currently running workflow instances for the same concurrency key are cancelled to free up slots, and the new instance starts executing immediately.
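The steps above can be modeled with a small standalone sketch. Again, this illustrates the strategy’s semantics only; the class and method names are invented for the example and are not part of Hatchet’s API.

```typescript
// Illustrative model of CANCEL_IN_PROGRESS: when a new run arrives for a
// concurrency key whose slots are full, the running instances for that
// key are cancelled so the new run can start immediately.
type RunState = "running" | "cancelled";

class CancelInProgressSlots {
  // key -> runId -> state
  private runs = new Map<string, Map<string, RunState>>();
  constructor(private maxRuns: number) {}

  // Starts a run; returns the ids of any runs cancelled to make room.
  start(key: string, runId: string): string[] {
    const group = this.runs.get(key) ?? new Map<string, RunState>();
    const active = [...group.entries()].filter(([, s]) => s === "running");
    const cancelled: string[] = [];
    if (active.length >= this.maxRuns) {
      for (const [id] of active) {
        group.set(id, "cancelled"); // free the slot
        cancelled.push(id);
      }
    }
    group.set(runId, "running"); // the new run always starts
    this.runs.set(key, group);
    return cancelled;
  }

  stateOf(key: string, runId: string): RunState | undefined {
    return this.runs.get(key)?.get(runId);
  }
}

const slots = new CancelInProgressSlots(1);
slots.start("user-42", "run-1");                 // slot free: starts immediately
const cancelled = slots.start("user-42", "run-2"); // full: run-1 is cancelled
console.log(cancelled); // -> ["run-1"]
```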
When to use CANCEL_IN_PROGRESS
The `CANCEL_IN_PROGRESS` strategy is particularly useful in scenarios where:
- You have long-running workflow instances that may become stale or irrelevant if newer instances are triggered.
- You want to prioritize processing the most recent data or events, even if it means canceling older workflow instances.
- You have resource-intensive workflows where it’s more efficient to cancel an in-progress instance and start a new one than to wait for the old instance to complete.
- Your UI allows for multiple inputs, but only the most recent one is relevant (e.g., chat messages, form submissions).
CANCEL_NEWEST
How it works
The `CANCEL_NEWEST` strategy is similar to `CANCEL_IN_PROGRESS`, but it cancels the newly enqueued run instead of the oldest in-progress runs.
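The inverted behavior can be sketched as follows, again purely as an illustration of the semantics (the names are invented, not Hatchet API):

```typescript
// Illustrative model of CANCEL_NEWEST: if the slots for a concurrency key
// are full when a new run arrives, the new run itself is cancelled and the
// in-progress runs continue untouched.
class CancelNewestSlots {
  private active = new Map<string, Set<string>>(); // key -> running run ids
  constructor(private maxRuns: number) {}

  // Returns true if the run was admitted, false if it was cancelled.
  submit(key: string, runId: string): boolean {
    const runs = this.active.get(key) ?? new Set<string>();
    if (runs.size >= this.maxRuns) {
      return false; // newest run is cancelled immediately
    }
    runs.add(runId);
    this.active.set(key, runs);
    return true;
  }

  complete(key: string, runId: string): void {
    this.active.get(key)?.delete(runId); // free the slot
  }
}

const pool = new CancelNewestSlots(1);
console.log(pool.submit("report", "run-1")); // -> true  (admitted)
console.log(pool.submit("report", "run-2")); // -> false (cancelled: slots full)
pool.complete("report", "run-1");
console.log(pool.submit("report", "run-3")); // -> true  (slot freed)
```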
When to use CANCEL_NEWEST
The `CANCEL_NEWEST` strategy is particularly useful in scenarios where:
- You want to allow in-progress runs to complete before starting new work.
- You have long-running workflow instances and want to avoid one group’s instances monopolizing the available slots.