vLLM
An open-source project that makes deploying large language models for inference faster and more affordable.
Overview
| Field | Value |
| --- | --- |
| Category | AI |
| Self-Hostable | Yes |
| On-Prem | Yes |
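To make the deployment claim concrete: vLLM exposes an OpenAI-compatible HTTP API when run as a server. The sketch below builds a request payload for its `/v1/completions` endpoint; the endpoint URL and model name are illustrative assumptions, not values from this entry.

```python
import json

# Hypothetical local vLLM server address (vLLM's OpenAI-compatible
# API server listens on port 8000 by default).
VLLM_ENDPOINT = "http://localhost:8000/v1/completions"


def build_completion_request(prompt,
                             model="meta-llama/Llama-3.1-8B-Instruct",
                             max_tokens=64,
                             temperature=0.7):
    """Build the JSON body expected by an OpenAI-style completions API.

    The model name is a placeholder; any model the server was started
    with can be used here.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


payload = build_completion_request("Summarize vLLM in one sentence.")
print(json.dumps(payload, indent=2))
```

Sending this payload with any HTTP client (e.g. `requests.post(VLLM_ENDPOINT, json=payload)`) against a running vLLM server returns a standard OpenAI-style completion response.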