Category: Evaluation

4 products in this category

Langfuse

Featured

Open-source LLM observability, tracing, and evaluation platform

Langfuse is an open-source platform for LLM application observability. It provides traces, spans, scores, and evals for debugging and monitoring LLM pipelines, with SDKs for Python and JavaScript and integrations with LangChain, LlamaIndex, and more.
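A minimal sketch of instrumenting a pipeline with the Langfuse Python SDK. Method names follow the v2-style low-level client and may differ in newer releases; credentials are assumed to be supplied via the LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST environment variables.

```python
from langfuse import Langfuse

# Client reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST
# from the environment
langfuse = Langfuse()

# One trace per pipeline run, with a span for the retrieval step
trace = langfuse.trace(name="rag-query", input={"question": "What is observability?"})
span = trace.span(name="retrieval", input={"query": "observability"})
span.end(output={"documents_returned": 3})

# Attach a score to the trace (e.g. user feedback or an eval result)
trace.score(name="user-feedback", value=1.0)

# Events are batched; flush before the process exits
langfuse.flush()
```

The LangChain and LlamaIndex integrations produce the same trace/span structure automatically, so manual instrumentation like this is only needed for custom pipeline steps.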

Arize Phoenix

Verified

Open-source AI observability and evaluation for LLMs and ML models

Phoenix by Arize is an open-source AI observability platform that provides real-time tracing, evaluation datasets, and retrieval analysis for LLM and ML applications. It runs locally or on cloud infrastructure.
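A minimal sketch of running Phoenix locally from a script or notebook, assuming the arize-phoenix package is installed; exact attributes may vary by version.

```python
import phoenix as px

# Launch the Phoenix app on localhost; the UI shows traces, datasets,
# and evaluation results as they stream in
session = px.launch_app()
print(session.url)
```

Applications are then instrumented through Phoenix's OpenTelemetry/OpenInference integrations, so traces from a running LLM app appear in the local session in real time.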

DeepEval

Verified

Open-source LLM evaluation framework for CI/CD pipelines

DeepEval is an open-source evaluation framework for testing and monitoring LLM applications. It provides 14+ evaluation metrics, integrates with pytest, and enables continuous LLM quality testing in CI/CD workflows.
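A minimal sketch of a pytest-style DeepEval test, assuming the deepeval package is installed; the metric relies on an LLM judge, so an API key such as OPENAI_API_KEY is expected in the environment.

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # One test case: the prompt sent to the app and the answer it returned
    test_case = LLMTestCase(
        input="What are your shipping times?",
        actual_output="Standard orders ship within 3-5 business days.",
    )
    # Fail the test if the judged relevancy score falls below the threshold
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

Because these are ordinary pytest tests, a failing quality check fails the CI build like any other test; DeepEval also ships its own `deepeval test run` command for richer reporting.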

RAGAS

Featured

Automated evaluation framework for RAG pipelines

RAGAS is an open-source framework for evaluating Retrieval-Augmented Generation (RAG) pipelines. It provides reference-free metrics like faithfulness, answer relevance, and context precision using LLM-as-judge methods.
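A minimal sketch of scoring a single RAG interaction with RAGAS, shown against the older (pre-0.2) API; metric and column names have shifted across releases, and the LLM-as-judge calls assume an OpenAI key in the environment.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# One evaluation row: the question, the generated answer, the retrieved
# contexts, and a reference answer (used by context_precision)
data = Dataset.from_dict({
    "question": ["What does RAG stand for?"],
    "answer": ["RAG stands for Retrieval-Augmented Generation."],
    "contexts": [["Retrieval-Augmented Generation (RAG) grounds LLM answers in retrieved documents."]],
    "ground_truth": ["Retrieval-Augmented Generation."],
})

result = evaluate(data, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)  # per-metric scores between 0 and 1
```

Faithfulness and answer relevancy need only the question, answer, and contexts, which is what makes the evaluation reference-free; metrics such as context precision additionally use a ground-truth answer when one is available.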