vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs. It delivers up to 24x higher throughput than HuggingFace Transformers through PagedAttention, continuous batching of incoming requests, and optimized CUDA kernels. Supports Llama, Mistral, Qwen, and 40+ other model architectures, and ships an OpenAI-compatible API server. 48k+ GitHub stars.
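As a minimal sketch of offline batch inference with vLLM (assuming `pip install vllm` and a CUDA-capable GPU; the model name is illustrative, any supported architecture works):

```python
from vllm import LLM, SamplingParams

# Prompts are batched together; vLLM's continuous batching
# schedules them onto the GPU as capacity frees up.
prompts = [
    "The capital of France is",
    "Explain PagedAttention in one sentence:",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Model name is an example; swap in any model vLLM supports.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For online serving, `vllm serve <model>` starts the OpenAI-compatible HTTP server (by default at `http://localhost:8000/v1`), which the standard `openai` client can target by setting `base_url`, so existing OpenAI-based code works against a self-hosted model with minimal changes.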