Category: Evaluation

4 products in this category

Langfuse

Featured

Open-source LLM observability, tracing, and evaluation platform

Langfuse is an open-source platform for LLM application observability. It provides traces, spans, scores, and evals for debugging and monitoring LLM pipelines, with SDKs for Python and JavaScript and integrations with LangChain, LlamaIndex, and more.
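A minimal sketch of instrumenting a pipeline with the Langfuse Python SDK. Method names follow the v2-style low-level client and may differ in newer releases; credentials are assumed to be supplied via the LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST environment variables.

```python
from langfuse import Langfuse

# Client reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST
# from the environment
langfuse = Langfuse()

# One trace per pipeline run, with a span for the retrieval step
trace = langfuse.trace(name="rag-query", input={"question": "What is observability?"})
span = trace.span(name="retrieval", input={"query": "observability"})
span.end(output={"documents_returned": 3})

# Attach a score to the trace (e.g. user feedback or an eval result)
trace.score(name="user-feedback", value=1.0)

# Events are batched; flush before the process exits
langfuse.flush()
```

The LangChain and LlamaIndex integrations produce the same trace/span structure automatically, so manual instrumentation like this is only needed for custom pipeline steps.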

Arize Phoenix

Verified

Open-source AI observability and evaluation for LLMs and ML models

Phoenix by Arize is an open-source AI observability platform that provides real-time tracing, evaluation datasets, and retrieval analysis for LLM and ML applications. It runs locally or on cloud infrastructure.
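A minimal sketch of running Phoenix locally from a script or notebook, assuming the arize-phoenix package is installed; exact attributes may vary by version.

```python
import phoenix as px

# Launch the Phoenix app on localhost; the UI shows traces, datasets,
# and evaluation results as they stream in
session = px.launch_app()
print(session.url)
```

Applications are then instrumented through Phoenix's OpenTelemetry/OpenInference integrations, so traces from a running LLM app appear in the local session in real time.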

DeepEval

Verified

Open-source LLM evaluation framework for CI/CD pipelines

DeepEval is an open-source evaluation framework for testing and monitoring LLM applications. It provides 14+ evaluation metrics, integrates with pytest, and enables continuous LLM quality testing in CI/CD workflows.
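A minimal sketch of a pytest-style DeepEval test, assuming the deepeval package is installed; the metric relies on an LLM judge, so an API key such as OPENAI_API_KEY is expected in the environment.

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # One test case: the prompt sent to the app and the answer it returned
    test_case = LLMTestCase(
        input="What are your shipping times?",
        actual_output="Standard orders ship within 3-5 business days.",
    )
    # Fail the test if the judged relevancy score falls below the threshold
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

Because these are ordinary pytest tests, a failing quality check fails the CI build like any other test; DeepEval also ships its own `deepeval test run` command for richer reporting.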

RAGAS

Featured

Automated evaluation framework for RAG pipelines

RAGAS is an open-source framework for evaluating Retrieval-Augmented Generation (RAG) pipelines. It provides reference-free metrics like faithfulness, answer relevance, and context precision using LLM-as-judge methods.
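A minimal sketch of scoring a single RAG interaction with RAGAS, shown against the older (pre-0.2) API; metric and column names have shifted across releases, and the LLM-as-judge calls assume an OpenAI key in the environment.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# One evaluation row: the question, the generated answer, the retrieved
# contexts, and a reference answer (used by context_precision)
data = Dataset.from_dict({
    "question": ["What does RAG stand for?"],
    "answer": ["RAG stands for Retrieval-Augmented Generation."],
    "contexts": [["Retrieval-Augmented Generation (RAG) grounds LLM answers in retrieved documents."]],
    "ground_truth": ["Retrieval-Augmented Generation."],
})

result = evaluate(data, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)  # per-metric scores between 0 and 1
```

Faithfulness and answer relevancy need only the question, answer, and contexts, which is what makes the evaluation reference-free; metrics such as context precision additionally use a ground-truth answer when one is available.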