vLLM
Featured
High-throughput LLM inference engine with PagedAttention
vLLM is an open-source, high-performance LLM inference and serving engine. It uses PagedAttention for efficient KV-cache management, delivering up to 24× higher serving throughput than Hugging Face Transformers.
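The core idea behind PagedAttention can be sketched in a few lines: instead of reserving one contiguous KV-cache region per sequence, each sequence maps to a table of fixed-size blocks drawn from a shared physical pool, so memory is allocated on demand with no fragmentation between sequences. The sketch below is an illustrative toy (block size, class, and method names are invented for this example), not vLLM's actual implementation.

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative; real engines use larger blocks)

class PagedKVCache:
    """Toy sketch of PagedAttention-style allocation: each sequence's
    KV cache lives in fixed-size blocks that need not be contiguous."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # shared physical block pool
        self.tables = {}                     # seq_id -> list of physical block ids

    def append_token(self, seq_id, position):
        """Record one more token for seq_id; allocate a block only when needed."""
        table = self.tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0:       # current block is full (or first token)
            table.append(self.free.pop())    # grab any free block, order irrelevant
        return table

cache = PagedKVCache(num_blocks=8)
for t in range(6):      # sequence "A" grows to 6 tokens -> ceil(6/4) = 2 blocks
    cache.append_token("A", t)
for t in range(3):      # sequence "B" grows to 3 tokens -> ceil(3/4) = 1 block
    cache.append_token("B", t)
print(cache.tables)
```

Because blocks are allocated lazily and can sit anywhere in the pool, many sequences share GPU memory tightly, which is what enables the large batch sizes behind vLLM's throughput gains.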