
vLLM

Open Source · Updated this week

Fast LLM inference and serving engine

pip install vllm
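
After installing, a minimal offline-inference sketch using vLLM's Python API. The model name is an illustrative assumption; any supported checkpoint works:

from vllm import LLM, SamplingParams

# Load a supported checkpoint; this model name is an example, not prescribed by the listing.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# generate() accepts a list of prompts; continuous batching schedules them together.
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)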


vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs. It delivers up to 24x higher throughput than HuggingFace Transformers via PagedAttention, continuous batching, and optimized CUDA kernels. It supports Llama, Mistral, Qwen, and 40+ other model families, and ships an OpenAI-compatible API server. 48k+ GitHub stars.
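
A client-side sketch of the OpenAI-compatible server, assuming it was started locally (e.g. with vllm serve <model>) on vLLM's default port 8000. Host, port, and model name here are illustrative assumptions:

from openai import OpenAI

# Assumes a local vLLM server, e.g.: vllm serve mistralai/Mistral-7B-Instruct-v0.2
# A default local server ignores the api_key, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # must match the served model
    messages=[{"role": "user", "content": "What does continuous batching do?"}],
)
print(resp.choices[0].message.content)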

Tags: ai · llm · inference · serving · gpu · cuda · llama · mistral · openai-compatible · vllm · throughput
View on GitHub · ★ 48,000 · Python