Durable orchestration platform for managing AI agents, scheduling background tasks, and running mission-critical workflows.

At a Glance:

Hatchet is an orchestration engine that provides durable background task processing, AI agent workflows, and task orchestration with built-in Postgres-backed durability, automatic retries, real-time monitoring, and multi-tenant support for Python, TypeScript, Go, and Ruby applications.

Overview:

Hatchet is an orchestration platform for background tasks, AI agents, and durable workflows designed for systems where correctness, reliability, and horizontal scalability are essential. It provides a complete platform including queuing, configurable retry policies, cron scheduling, event-based triggering, and real-time observability with OpenTelemetry and Prometheus metrics. Hatchet uses Postgres as its durability layer for both task runtime and observability, making it particularly suited for self-hosted deployments. It supports DAG-based workflows and durable execution patterns, allowing teams to centralize async processing in a single platform. The system is multi-tenant by default and includes user roles, worker affinity scheduling, concurrency controls, and rate limiting capabilities.
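
To make the DAG-based workflow idea concrete, here is a minimal sketch in plain Python that does not use the Hatchet SDK. The task names, the `dag` mapping, and the `run_dag` helper are hypothetical and exist only for illustration. It shows the core property of a DAG workflow: a task runs only after all of its parents have finished, and it receives their outputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each key is a task, each value is the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

def run_dag(dag, handlers):
    """Run every task after its parents and return the results keyed by task name."""
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(dag).static_order():
        parent_outputs = {parent: results[parent] for parent in dag[name]}
        results[name] = handlers[name](parent_outputs)
    return results

# Illustrative handlers; each receives a dict of its parents' outputs.
handlers = {
    "extract": lambda parents: [1, 2, 3],
    "transform": lambda parents: [x * 10 for x in parents["extract"]],
    "validate": lambda parents: all(x > 0 for x in parents["extract"]),
    "load": lambda parents: sum(parents["transform"]) if parents["validate"] else 0,
}

results = run_dag(dag, handlers)
```

This sketch runs everything sequentially in one process. An orchestration engine would instead dispatch independent tasks (here `transform` and `validate`) to workers in parallel and persist each result.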

Key Decision Points:

  • Self-hosting or cloud: Available as a managed cloud service with autoscaling, multi-region deployments, and SSO, or self-hosted through a CLI-based setup process that requires Docker.

  • Durability model: Uses Postgres as the persistence layer for both task execution history and observability data, unlike Redis- or RabbitMQ-based queues, which typically do not retain tasks once execution completes.

  • Throughput vs durability tradeoff: Load-tested at up to 10,000 tasks per second, but consumes more resources than broker-based systems built on Redis or RabbitMQ, which can achieve higher throughput.

  • Multi-paradigm orchestration: Supports durable execution (drop-in replacement for Temporal/DBOS), general-purpose queuing, and DAG-based workflows within a single platform.

  • Observability stack: Includes a real-time web UI with alerting, OpenTelemetry collection, and Prometheus metrics export as built-in components.
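
The durability model above can be illustrated with a small sketch. It uses SQLite from the Python standard library purely so the example is self-contained, whereas Hatchet itself uses Postgres. The `task_runs` table, its columns, and the helper functions are assumptions made for illustration and are not Hatchet's schema or API. The point is that a finished task is updated in place rather than deleted, so its history stays queryable.

```python
import json
import sqlite3

def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) task_runs table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS task_runs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT NOT NULL,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'queued',
               result TEXT
           )"""
    )
    return conn

def enqueue(conn, name, payload):
    """Persist a task before it runs, so it survives a process restart."""
    cur = conn.execute(
        "INSERT INTO task_runs (name, payload) VALUES (?, ?)",
        (name, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid

def run_next(conn, handlers):
    """Run the oldest queued task and record its outcome; return its id or None."""
    row = conn.execute(
        "SELECT id, name, payload FROM task_runs "
        "WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    run_id, name, payload = row
    try:
        result, status = handlers[name](json.loads(payload)), "succeeded"
    except Exception as exc:  # record the failure instead of losing it
        result, status = str(exc), "failed"
    # Update the row in place: the run stays in the table as history.
    conn.execute(
        "UPDATE task_runs SET status = ?, result = ? WHERE id = ?",
        (status, json.dumps(result), run_id),
    )
    conn.commit()
    return run_id

conn = open_store()
enqueue(conn, "double", {"n": 21})
run_next(conn, {"double": lambda p: p["n"] * 2})
history = conn.execute("SELECT name, status, result FROM task_runs").fetchall()
```

The sketch assumes a single worker. With several workers, claiming a row would need a lock or an atomic status update so that two workers cannot pick up the same task.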

Core Features:

  • Durable tasks: Fault-tolerant, long-running workflows that persist execution history and can recover from failures with intermediate state preserved.

  • Configurable retry policies: Flexible retry strategies with optional exponential backoff for background tasks and workflows.

  • Worker affinity and routing: Task routing based on worker labels and weighted scheduling rules for complex distribution requirements.

  • Concurrency policies: Fair scheduling with dynamic concurrency limits based on configurable keys to prevent workers from taking on more work than they can handle.

  • Dynamic rate limiting: Rate limit enforcement for third-party API calls or per-user limits using dynamic rate limit keys.

  • OpenTelemetry integration: Built-in collector for trace and metric export, with support for external OpenTelemetry destinations.
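
As an illustration of the retry policies listed above, the following sketch shows one common way to compute exponential backoff delays and apply them around a failing task. It is generic Python, not Hatchet's API: the function names and the parameters `base_seconds`, `factor`, and `max_seconds` are hypothetical.

```python
import time

def backoff_delays(max_retries, base_seconds=1.0, factor=2.0, max_seconds=60.0):
    """Delay before each retry: base * factor**attempt, capped at max_seconds."""
    return [
        min(base_seconds * factor ** attempt, max_seconds)
        for attempt in range(max_retries)
    ]

def run_with_retries(task, max_retries=3, sleep=time.sleep, **backoff):
    """Call task(); on failure wait and retry, re-raising after the last retry."""
    delays = backoff_delays(max_retries, **backoff)
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the original error
            sleep(delays[attempt])
```

The `sleep` parameter is injectable so the policy can be tested without actually waiting. Production systems often add random jitter to each delay so that many failing tasks do not retry at the same instant.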

Use Cases:

  • Engineering teams running fault-tolerant, long-running workflows that must recover gracefully from infrastructure failures and require persistent execution history for debugging.

  • Developers building event-driven distributed systems that combine cron scheduling, event-based triggers, webhook-triggered tasks, and durable execution in a single orchestration layer.

  • Teams replacing Temporal or DBOS workflows with a Postgres-backed durable execution engine that includes built-in multi-tenancy, user roles, and real-time observability.

  • Applications requiring centralized async processing across background tasks, DAG-based data pipelines, and AI agent orchestration with rate limiting and concurrency controls.
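
The per-key rate limiting mentioned in these use cases (for example, one limit per user or per third-party API) can be sketched as a sliding-window limiter keyed by an arbitrary string. This is a generic illustration under stated assumptions and does not reflect Hatchet's implementation: the `KeyedRateLimiter` class and its parameters are hypothetical.

```python
import time
from collections import defaultdict

class KeyedRateLimiter:
    """Allow at most `limit` calls per key within any `per_seconds` window."""

    def __init__(self, limit, per_seconds, clock=time.monotonic):
        self.limit = limit
        self.per_seconds = per_seconds
        self.clock = clock  # injectable so tests can control time
        self.windows = defaultdict(list)  # key -> timestamps of accepted calls

    def allow(self, key):
        now = self.clock()
        # Keep only the accepted calls that are still inside the window.
        recent = [t for t in self.windows[key] if now - t < self.per_seconds]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self.windows[key] = recent
        return allowed
```

Because the key is computed at call time (a user ID, an API name), each caller gets an independent budget. This in-memory version works for one process only; sharing limits across workers would require a shared store.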

Open-Source Alternative Value:

Hatchet provides a self-hostable orchestration platform that replaces separate queue, DAG, and durable execution tools with a single Postgres-backed system. Its durability model persists full execution history rather than discarding task state after completion, which addresses a common limitation of Redis- or RabbitMQ-based task queues such as Celery or BullMQ. The platform also functions as a drop-in replacement for Temporal or DBOS workflows while adding multi-tenancy, user role management, and built-in observability through OpenTelemetry and Prometheus, without requiring additional infrastructure components.

Project Statistics:

  • Stars: 7,393

  • Forks: 424

  • License: MIT

Metadata:

  • Alternative to: Make