At a Glance:
dstack is an open-source unified control plane for GPU provisioning and orchestration that works across any GPU cloud, Kubernetes, and on-prem clusters, streamlining development, training, and inference workflows through a CLI, API, or AI agent skills.
Overview:
dstack is an open-source GPU orchestration control plane designed to provision and manage compute resources across diverse infrastructures, including any GPU cloud provider, Kubernetes clusters, and on-prem servers. It acts as a unified layer for streamlining development, training, and inference tasks. Users interact with dstack by defining fleets, dev environments, tasks, services, and volumes as YAML files within a repository and applying them via its CLI, a programmatic API, or integrated AI agent skills. The platform automates underlying complexities such as provisioning, job queuing, auto-scaling, and networking, making it suitable for developers and teams needing a consistent workflow for hardware-accelerated workloads without being tied to a specific cloud vendor.
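To make the configuration-as-code workflow described above concrete, here is a minimal sketch in Python that renders a task definition as YAML text and builds the CLI call that would apply it. The field names (type, name, commands, resources.gpu), the file name train.dstack.yml, and the dstack apply -f invocation are assumptions based on common dstack usage rather than details stated in this summary, so check them against the official reference before relying on them. The tiny serializer is purely illustrative and handles only nested dicts and lists of plain scalars.

```python
# Hypothetical task definition; the keys are assumed, not taken from this summary.
TASK_CONFIG = {
    "type": "task",
    "name": "train-example",
    "commands": [
        "pip install -r requirements.txt",
        "python train.py",
    ],
    "resources": {"gpu": "24GB"},
}


def to_yaml(value, indent=0):
    """Return block-style YAML lines for nested dicts and lists of scalars.

    Deliberately minimal: no quoting, no lists of dicts, no multi-line strings.
    """
    pad = " " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                # Nested containers go on the following lines, indented.
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(item, indent + 2))
            else:
                lines.append(f"{pad}{key}: {item}")
    elif isinstance(value, list):
        for item in value:
            lines.append(f"{pad}- {item}")
    return lines


def apply_command(config_path):
    """Build the argument list for applying a configuration file.

    Assumes the CLI accepts `dstack apply -f <file>`; nothing is executed here.
    """
    return ["dstack", "apply", "-f", config_path]


if __name__ == "__main__":
    # Print the YAML that would be committed to the repository.
    print("\n".join(to_yaml(TASK_CONFIG)))
    # Print the command a user (or an AI agent) would run against a live server.
    print(" ".join(apply_command("train.dstack.yml")))
```

Keeping the definition as data in the repository means the same file can be reviewed, versioned, and applied by a person, a script, or an agent. The command only succeeds when a dstack server is running and reachable.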
Key Decision Points:
Unified multi-cloud & on-prem provisioning: dstack works with any GPU cloud, Kubernetes, or on-prem clusters, allowing users to manage diverse compute pools from a single control plane.
Workload definition via YAML in-repo: All compute needs, including fleets, dev environments, tasks, and services, are defined as YAML configuration files stored directly in the project repository.
Interaction through CLI, API, and AI agents: Users can apply configurations and manage resources using the dstack CLI, a programmatic API, or by installing agent skills that enable AI tools like Claude and Cursor to act on their behalf.
Server deployed by the user: Users must launch and configure their own dstack server to connect to their backends, with the server supported on Linux, macOS, and Windows.
Online compute broker model: The dstack server must be running for users to provision and orchestrate compute resources; it is not a local-only tool.
Core Features:
Fleets: Management of cloud and on-prem computing clusters.
Dev environments: Interactive development setups intended for use with a desktop IDE.
Tasks: Scheduling of batch or distributed jobs, and running of web apps.
Services: Deployment of models and web apps with auto-scaling and authorization.
Volumes: Management of persistent disk volumes.
AI agent skills: A mechanism to allow AI agents like Claude, Codex, and Cursor to manage fleets and submit workloads by editing configuration files and using the CLI.
Use Cases:
Developers who need to provision GPUs across different cloud providers and on-prem servers using a single, consistent YAML-based workflow.
Practitioners running machine learning training or inference jobs who want to automate provisioning, job queuing, and failure handling across diverse backends.
Teams deploying models and web apps with auto-scaling capabilities without managing each cloud provider's native tooling directly.
Open-Source Alternative Value:
dstack provides an open-source option for GPU orchestration that is not limited to a single cloud provider, allowing developers to configure a unified provisioning layer on their own infrastructure. Its value lies in consolidating multi-cloud and on-prem GPU management into a single server and workflow, with the added ability for AI agents to directly manage resources through its skills system. The configuration-as-code model, where all resource definitions live as YAML files in a repo, gives users a transparent and reproducible way to manage their compute environment.

