KubeAI

Component Category

Inference / model serving orchestration

Component Description

KubeAI is a Kubernetes-native inference operator for deploying and scaling AI models in production.

Why It Is Used

In BullSequana AI Runtime, KubeAI provides the operational layer for running model-serving workloads on Kubernetes. It handles scaling of model servers, routing of inference requests to them, and integration with the rest of the platform.

Learn More

KubeAI documentation
substratusai/kubeai on GitHub

Interacts With

MinIO, which provides object storage and dedicated credentials for KubeAI.
vLLM and FasterWhisper, the model-serving engines that KubeAI orchestrates.
Model Installer, which targets the KubeAI service endpoint to register and manage models.
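
Example

To make the interactions above more concrete, the sketch below builds two manifests shaped like KubeAI Model resources, one served by vLLM and one by FasterWhisper. It is illustrative only and is not the Model Installer's actual code. The helper build_model_manifest, the model names, the hf:// URLs, and the resource profile names are hypothetical, and the field names follow the public KubeAI Model resource as an assumption that should be checked against the deployed KubeAI version.

```python
import json


def build_model_manifest(name, url, engine, feature, resource_profile,
                         min_replicas=0, max_replicas=1):
    """Return a dict shaped like a KubeAI Model custom resource.

    Hypothetical helper for illustration. Field names are assumed from the
    public KubeAI Model resource and may differ between versions.
    """
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("replica bounds must satisfy 0 <= min <= max")
    return {
        "apiVersion": "kubeai.org/v1",
        "kind": "Model",
        "metadata": {"name": name},
        "spec": {
            "features": [feature],
            "url": url,                    # where the model weights are fetched from
            "engine": engine,              # serving engine that KubeAI starts
            "resourceProfile": resource_profile,
            "minReplicas": min_replicas,   # lower bound used when scaling
            "maxReplicas": max_replicas,   # upper bound used when scaling
        },
    }


# A text-generation model served by vLLM (placeholder name and URL).
llm = build_model_manifest(
    name="example-llm",
    url="hf://example-org/example-llm",
    engine="VLLM",
    feature="TextGeneration",
    resource_profile="nvidia-gpu-l4:1",
)

# A speech-to-text model served by FasterWhisper (placeholder name and URL).
whisper = build_model_manifest(
    name="example-whisper",
    url="hf://example-org/example-whisper",
    engine="FasterWhisper",
    feature="SpeechToText",
    resource_profile="cpu:1",
)

print(json.dumps([llm, whisper], indent=2))
```

The printed JSON can be compared with the Model resources present in a cluster. How the Model Installer actually submits or updates models through the KubeAI service endpoint is not covered here.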