# vLLM (/docs/runtime/components/vllm)



Component Category [#component-category]

Inference / LLM serving engine

Component Description [#component-description]

vLLM is a high-throughput and memory-efficient inference engine designed for serving large language models.

Why It Is Used [#why-it-is-used]

In BullSequana AI Runtime, vLLM powers efficient LLM inference with strong performance characteristics for production workloads, especially where throughput and GPU utilization matter.

Learn More [#learn-more]

* [vLLM documentation](https://docs.vllm.ai/en/stable/)
* [vllm-project/vllm on GitHub](https://github.com/vllm-project/vllm)

Interacts With [#interacts-with]

* `KubeAI`, which uses vLLM as one of its inference backends for model serving.
* `Model Installer` and other serving workflows, which deploy or operate models on top of vLLM-backed runtimes.
