vLLM
Featured
High-throughput LLM inference engine with PagedAttention
vLLM is an open-source, high-performance LLM inference and serving engine. It uses PagedAttention for efficient KV-cache management, delivering up to 24× higher serving throughput than Hugging Face Transformers.
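The core idea behind PagedAttention can be sketched in a few lines: instead of reserving one contiguous KV-cache region per sequence, each sequence maps to a table of fixed-size blocks drawn from a shared physical pool, so memory is allocated on demand with no fragmentation between sequences. The sketch below is an illustrative toy (block size, class, and method names are invented for this example), not vLLM's actual implementation.

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative; real engines use larger blocks)

class PagedKVCache:
    """Toy sketch of PagedAttention-style allocation: each sequence's
    KV cache lives in fixed-size blocks that need not be contiguous."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # shared physical block pool
        self.tables = {}                     # seq_id -> list of physical block ids

    def append_token(self, seq_id, position):
        """Record one more token for seq_id; allocate a block only when needed."""
        table = self.tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0:       # current block is full (or first token)
            table.append(self.free.pop())    # grab any free block, order irrelevant
        return table

cache = PagedKVCache(num_blocks=8)
for t in range(6):      # sequence "A" grows to 6 tokens -> ceil(6/4) = 2 blocks
    cache.append_token("A", t)
for t in range(3):      # sequence "B" grows to 3 tokens -> ceil(3/4) = 1 block
    cache.append_token("B", t)
print(cache.tables)
```

Because blocks are allocated lazily and can sit anywhere in the pool, many sequences share GPU memory tightly, which is what enables the large batch sizes behind vLLM's throughput gains.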