vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs. It delivers up to 24x higher throughput than HuggingFace Transformers through PagedAttention, continuous batching of incoming requests, and optimized CUDA kernels. Supports Llama, Mistral, Qwen, and 40+ other model architectures, and ships an OpenAI-compatible API server. 48k+ GitHub stars.
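As a minimal sketch of offline batch inference with vLLM (assuming `pip install vllm` and a CUDA-capable GPU; the model name is illustrative, any supported architecture works):

```python
from vllm import LLM, SamplingParams

# Prompts are batched together; vLLM's continuous batching
# schedules them onto the GPU as capacity frees up.
prompts = [
    "The capital of France is",
    "Explain PagedAttention in one sentence:",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Model name is an example; swap in any model vLLM supports.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For online serving, `vllm serve <model>` starts the OpenAI-compatible HTTP server (by default at `http://localhost:8000/v1`), which the standard `openai` client can target by setting `base_url`, so existing OpenAI-based code works against a self-hosted model with minimal changes.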