At a Glance:
Hatchet is an orchestration engine that provides durable background task processing, AI agent workflows, and task orchestration with built-in Postgres-backed durability, automatic retries, real-time monitoring, and multi-tenant support for Python, TypeScript, Go, and Ruby applications.
Overview:
Hatchet is an orchestration platform for background tasks, AI agents, and durable workflows designed for systems where correctness, reliability, and horizontal scalability are essential. It provides a complete platform including queuing, configurable retry policies, cron scheduling, event-based triggering, and real-time observability with OpenTelemetry and Prometheus metrics. Hatchet uses Postgres as its durability layer for both task runtime and observability, making it particularly suited for self-hosted deployments. It supports DAG-based workflows and durable execution patterns, allowing teams to centralize async processing in a single platform. The system is multi-tenant by default and includes user roles, worker affinity scheduling, concurrency controls, and rate limiting capabilities.
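To make the DAG-based workflow idea concrete, here is a minimal sketch of dependency-ordered execution using only the Python standard library. It is not Hatchet's API: the task names and the `run_order` helper are hypothetical, and the code only illustrates how a DAG determines which tasks may run together.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def run_order(graph):
    """Return batches of tasks; every task in a batch has all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


batches = run_order(dag)
print(batches)
```

Tasks that share a batch have no dependency on each other, so an orchestrator is free to dispatch them to different workers in parallel.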
Key Decision Points:
Self-hosting or cloud: Available as a managed cloud service with autoscaling, multi-region deployments, and SSO, or self-hosted through a CLI-based setup that requires Docker.
Durability model: Uses Postgres as the persistence layer for both task execution history and observability data, unlike queues built on Redis or RabbitMQ, which typically do not retain task records once execution completes.
Throughput vs durability tradeoff: Load-tested up to 10k tasks/second but consumes more resources than broker-based systems built on Redis or RabbitMQ, which can achieve higher throughput.
Multi-paradigm orchestration: Supports durable execution (as an alternative to Temporal or DBOS), general-purpose queuing, and DAG-based workflows within a single platform.
Observability stack: Includes a real-time web UI with alerting, OpenTelemetry collection, and Prometheus metrics export as built-in components.
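The durability model described in this list can be sketched as a table of task runs that is updated in place rather than deleted. This is an illustration only: in-memory SQLite stands in for Postgres, and the `task_runs` schema and the `enqueue` and `complete` helpers are hypothetical, not Hatchet's actual schema or API.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_runs ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " output TEXT)"
)


def enqueue(name):
    """Record the task durably before any worker picks it up."""
    cur = conn.execute(
        "INSERT INTO task_runs (name, status) VALUES (?, 'queued')", (name,)
    )
    conn.commit()
    return cur.lastrowid


def complete(run_id, output):
    """Mark the run as finished, keeping the row as execution history."""
    conn.execute(
        "UPDATE task_runs SET status = 'succeeded', output = ? WHERE id = ?",
        (output, run_id),
    )
    conn.commit()


run_id = enqueue("send-welcome-email")
complete(run_id, "delivered")

# The finished run is still queryable after completion.
history = conn.execute(
    "SELECT name, status, output FROM task_runs WHERE id = ?", (run_id,)
).fetchone()
print(history)
```

Because the row survives completion, the same store can serve both scheduling and after-the-fact debugging, at the cost of more database writes per task.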
Core Features:
Durable tasks: Fault-tolerant, long-running workflows that persist execution history and can recover from failures with intermediate state preserved.
Configurable retry policies: Flexible retry strategies with optional exponential backoff for background tasks and workflows.
Worker affinity and routing: Task routing based on worker labels and weighted scheduling rules for complex distribution requirements.
Concurrency policies: Fair scheduling with dynamic concurrency limits based on configurable keys to prevent workers from taking on more work than they can handle.
Dynamic rate limiting: Rate limit enforcement for third-party API calls or per-user limits using dynamic rate limit keys.
OpenTelemetry integration: Built-in collector for trace and metric export, with support for external OpenTelemetry destinations.
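As a rough illustration of a retry policy with exponential backoff, the sketch below computes capped delays and retries a failing callable. The function names, defaults, and the `flaky` task are assumptions made for this example and are not taken from Hatchet's SDK.

```python
import time


def backoff_delays(max_retries, base_seconds=1.0, factor=2.0, cap_seconds=60.0):
    """Delay before each retry: base * factor**attempt, capped at cap_seconds."""
    return [
        min(base_seconds * factor**attempt, cap_seconds)
        for attempt in range(max_retries)
    ]


def run_with_retries(task, max_retries, sleep=time.sleep):
    """Call task(); on failure wait and retry, re-raising after the last retry."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])


# Example: a hypothetical task that fails twice before succeeding.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "ok"


waited = []  # record the delays instead of actually sleeping
result = run_with_retries(flaky, max_retries=3, sleep=waited.append)
print(result, waited)
```

Passing `sleep` as a parameter keeps the example fast to run; a real worker would use the default `time.sleep` or a scheduler-side delay.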
Use Cases:
Engineering teams running fault-tolerant, long-running workflows that must recover gracefully from infrastructure failures and require persistent execution history for debugging.
Developers building event-driven distributed systems that combine cron scheduling, event-based triggers, webhook-triggered tasks, and durable execution in a single orchestration layer.
Teams replacing Temporal or DBOS workflows with a Postgres-backed durable execution engine that includes built-in multi-tenancy, user roles, and real-time observability.
Applications requiring centralized async processing across background tasks, DAG-based data pipelines, and AI agent orchestration with rate limiting and concurrency controls.
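The rate limiting with dynamic keys mentioned above can be pictured as a separate budget per key, for example per user or per third-party API. The following fixed-window limiter is a simplified sketch under that assumption; the `KeyedRateLimiter` class is hypothetical and does not reflect how Hatchet implements rate limits.

```python
from collections import defaultdict


class KeyedRateLimiter:
    """Fixed-window limiter: at most `limit` calls per key per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window index) -> calls seen; old windows are never evicted
        # in this sketch, so a long-running process would need cleanup.
        self._counts = defaultdict(int)

    def allow(self, key, now):
        """Return True and count the call if `key` still has budget at `now`."""
        window = int(now // self.window_seconds)
        if self._counts[(key, window)] >= self.limit:
            return False
        self._counts[(key, window)] += 1
        return True


limiter = KeyedRateLimiter(limit=2, window_seconds=60)
decisions = [
    limiter.allow("user-a", now=0),
    limiter.allow("user-a", now=10),
    limiter.allow("user-a", now=20),  # third call in the same window
    limiter.allow("user-b", now=20),  # different key, separate budget
    limiter.allow("user-a", now=61),  # next window, budget resets
]
print(decisions)
```

Taking `now` as an argument rather than reading the clock keeps the behavior deterministic for testing; a production limiter would also need shared state across workers.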
Open-Source Alternative Value:
Hatchet provides a self-hostable orchestration platform that replaces separate queue, DAG, and durable execution tools with a single Postgres-backed system. Its durability model persists full execution history rather than discarding task state after completion, which addresses a common limitation of task queues built on Redis or RabbitMQ, such as Celery or BullMQ. The platform can also serve as an alternative to Temporal or DBOS workflows, adding built-in observability through OpenTelemetry and Prometheus, multi-tenancy, and user role management without requiring additional infrastructure components.




