Arize Phoenix

An open-source platform for LLM tracing, evaluation, and optimization, with automatic instrumentation, a prompt playground, and real-time monitoring of AI applications.

At a Glance:

Phoenix is an open-source AI observability platform for tracing, evaluating, and experimenting with LLM applications, supporting OpenTelemetry-based instrumentation and running locally, in notebooks, containers, or the cloud.

Overview:

Phoenix is an open-source AI observability platform built for experimentation, evaluation, and troubleshooting of LLM applications. It provides tracing through OpenTelemetry-based instrumentation, LLM-assisted evaluation for responses and retrieval, versioned datasets, and systematic experiment tracking for prompts and models. A built-in playground supports prompt optimization, model comparison, parameter adjustment, and replay of traced calls, while prompt management enables version control and tagging. Phoenix works with popular frameworks and LLM providers, and can run on a local machine, inside a Jupyter notebook, in a containerized deployment, or in the cloud.

Key Decision Points:

Deployment flexibility: Runs locally, in Jupyter notebooks, as a containerized deployment, or in the cloud, fitting different development environments.
OpenTelemetry-native tracing: Uses OpenTelemetry for instrumentation, making it vendor- and framework-agnostic with out-of-the-box support for numerous Python and JavaScript integrations.
Multiple language packages: Available as a full platform or as lightweight Python and TypeScript subpackages covering clients, OpenTelemetry (OTEL) wrappers, evaluations, and an MCP server.
Agent-assisted debugging: Includes an opt-in, permission-gated built-in agent (PXI) for debugging traces and iterating on prompts directly within the platform.
Coding agent skills: Provides skills for Claude Code, Cursor, and compatible tools to fetch traces, build evaluators, and work with instrumentation.

Core Features:

Tracing: Traces LLM application runtime behavior using OpenTelemetry-based instrumentation across frameworks and providers.
Evaluation: Benchmarks application performance with LLM-powered response and retrieval evaluations.
Datasets: Creates versioned datasets of examples for experimentation, evaluation, and fine-tuning workflows.
Experiments: Tracks and evaluates the impact of changes to prompts, LLMs, and retrieval configurations.
Playground: Optimizes prompts, compares models, adjusts parameters, and replays previously traced LLM calls.
Prompt Management: Manages prompt versions systematically through version control, tagging, and experimentation.
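
To make the Tracing feature more concrete, the sketch below shows the kind of data an OpenTelemetry-style trace carries: a tree of spans that share one trace ID, each with a parent link, attributes, and timings. It is a standard-library illustration only. `ToyTracer`, `Span`, and the span names and attributes are hypothetical and are not part of Phoenix or the OpenTelemetry SDK.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed unit of work inside a trace (hypothetical, simplified)."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class ToyTracer:
    """Hypothetical in-memory tracer; not a Phoenix or OpenTelemetry API."""

    def __init__(self):
        self.finished = []  # spans in the order they finish
        self._stack = []    # currently open spans, innermost last

    @contextmanager
    def span(self, name, **attributes):
        parent = self._stack[-1] if self._stack else None
        s = Span(
            name=name,
            # Child spans inherit the trace ID; a root span starts a new trace.
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            attributes=dict(attributes),
            start=time.perf_counter(),
        )
        self._stack.append(s)
        try:
            yield s
        finally:
            s.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(s)


# A stubbed retrieval-augmented request: one root span with two children.
tracer = ToyTracer()
with tracer.span("rag_query", question="What is Phoenix?"):
    with tracer.span("retrieve", top_k=3) as retrieval:
        retrieval.attributes["documents"] = ["doc-1", "doc-2", "doc-3"]
    with tracer.span("llm_call", model="example-model") as llm:
        llm.attributes["output"] = "stubbed answer"

for s in tracer.finished:
    print(s.name, "parent:", s.parent_id, f"{(s.end - s.start) * 1000:.3f} ms")
```

In a real setup, the OpenTelemetry SDK and framework instrumentation would create and export spans like these automatically, and an observability backend such as Phoenix would collect and display them.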

Use Cases:

Developers tracing LLM application behavior across multiple framework and provider integrations during development.
ML engineers and researchers evaluating prompt, model, and retrieval changes through structured experiments and versioned datasets.
Developers and coding agents debugging traces and optimizing prompts using the built-in agent (PXI) or the provided coding agent skills.

Open-Source Alternative Value:

As an open-source AI observability tool, Phoenix allows developers to run tracing and evaluation workloads on local machines, in notebooks, or in self-managed containers. Its OpenTelemetry foundation provides a vendor-neutral approach to instrumentation, while the lightweight Python and TypeScript packages let users integrate observability into existing applications without depending on a hosted service. The source code is published under the Elastic License 2.0, so the platform’s internals can be inspected and adapted within the terms of that license.

Related Tools:

Mem0: 59,029
Langfuse: 29,464
Supermemory: 27,256

Project Statistics:

Stars: 9,496
Forks: 851
License: Other

Metadata:

Alternative to: LangSmith
Category: LLM Application Frameworks