Category: LLM Hosting

4 products in this category


vLLM

Featured

High-throughput LLM inference engine with PagedAttention

vLLM is an open-source, high-performance LLM inference and serving engine. It uses PagedAttention for efficient KV-cache management, achieving up to 24× higher throughput than Hugging Face Transformers for production serving.
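The core idea behind PagedAttention is that the KV cache is carved into fixed-size blocks and each sequence holds a list of block IDs rather than one large contiguous slab, which avoids fragmentation. A minimal toy sketch of that allocation scheme (not vLLM's actual implementation, just an illustration of the idea):

```python
class BlockAllocator:
    """Toy paged KV-cache allocator: sequences get lists of fixed-size blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))

    def alloc_for(self, num_tokens):
        # Number of blocks needed to hold num_tokens KV entries (ceil division).
        needed = -(-num_tokens // self.block_size)
        if needed > len(self.free):
            raise MemoryError("out of KV-cache blocks")
        return [self.free.pop() for _ in range(needed)]

    def release(self, blocks):
        # Freed blocks are immediately reusable by any other sequence.
        self.free.extend(blocks)


alloc = BlockAllocator(num_blocks=16, block_size=16)
seq_a = alloc.alloc_for(40)  # 40 tokens -> 3 blocks of 16
seq_b = alloc.alloc_for(10)  # 10 tokens -> 1 block
print(len(seq_a), len(seq_b))
```

Because blocks need not be contiguous, short and long sequences can share the same pool with minimal waste, which is what enables vLLM's high batch throughput.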


LiteLLM

Featured

Unified API for 100+ LLMs: an OpenAI-compatible proxy

LiteLLM provides a unified, OpenAI-compatible API across 100+ LLM providers, including OpenAI, Anthropic, Gemini, Mistral, and self-hosted models. It also ships a proxy server with cost tracking, rate limiting, and load balancing.
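"OpenAI-compatible" means every provider is addressed with the same chat-completions request shape, with the target provider selected by a prefixed model name. A minimal sketch of that request shape (model names here are placeholders, not an endorsement of specific versions):

```python
def build_chat_request(model, user_prompt, system=None):
    """Build an OpenAI-style chat-completions payload."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "messages": messages}


# The same request shape, routed to different providers purely by model name:
req_openai = build_chat_request("openai/gpt-4o", "Summarize this doc.")
req_claude = build_chat_request("anthropic/claude-3-5-sonnet", "Summarize this doc.")
print(req_openai["messages"] == req_claude["messages"])
```

Switching providers changes only the `model` string; the rest of the payload, and the response format, stay the same, which is what makes cross-provider fallback and load balancing practical.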


Ollama

Featured

Run large language models locally with one command

Ollama enables developers to run open-source LLMs locally on macOS, Linux, and Windows with a simple CLI and REST API. It manages model downloads, quantization, and GPU acceleration automatically.
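Ollama's REST API listens on port 11434 by default; generation requests are a POST to `/api/generate` with a model name and prompt. A small sketch that builds such a request (the model name is a placeholder for whatever model you have pulled):

```python
import json


def build_generate_request(model, prompt, stream=False):
    """Build the URL and JSON body for an Ollama /api/generate call."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    return "http://localhost:11434/api/generate", json.dumps(payload)


url, body = build_generate_request("llama3", "Why is the sky blue?")
print(url)
```

With an Ollama server running locally, POSTing `body` to `url` (e.g. via `urllib.request` or `curl`) returns the model's completion as JSON; with `stream=True` the response arrives as newline-delimited JSON chunks.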


Hugging Face Transformers

Featured

The standard library for working with pretrained ML models

Transformers by Hugging Face is among the most widely used open-source libraries for working with pretrained models across NLP, vision, audio, and multimodal tasks. It supports PyTorch, TensorFlow, and JAX, with 500k+ models available on the Hugging Face Hub.