KubeAI

Inference orchestration component for serving AI workloads in BullSequana AI Runtime.

Agentic Friendly

Component Category

Inference / model serving orchestration

Component Description

KubeAI is a Kubernetes-native inference operator for deploying and scaling AI models in production.

Why It Is Used

In BullSequana AI Runtime, KubeAI provides the operational layer for running model-serving workloads on Kubernetes, making scaling, request routing, and platform integration more predictable.

Interacts With

  • MinIO, which provides object storage and dedicated credentials for KubeAI.
  • vLLM and FasterWhisper, which are part of the model-serving runtime KubeAI orchestrates.
  • Model Installer, which targets the KubeAI service endpoint to register and manage models.
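To make the "Kubernetes-native" idea concrete, the sketch below builds the kind of declarative model definition an operator like KubeAI reconciles. It assumes the field names of the upstream open-source KubeAI Model resource (apiVersion kubeai.org/v1, url, engine, minReplicas, maxReplicas); the schema shipped with BullSequana AI Runtime may differ. The helper name build_model_manifest, the model name, and the model URL are hypothetical, and this is not a description of how Model Installer registers models.

```python
import json

# Engines named on this page; other engines are deliberately not accepted here.
SUPPORTED_ENGINES = {"VLLM", "FasterWhisper"}


def build_model_manifest(name, url, engine, min_replicas=0, max_replicas=1):
    """Return a dict shaped like a KubeAI Model custom resource.

    Assumption: field names follow the upstream KubeAI Model schema.
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("need 0 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "kubeai.org/v1",  # assumed API group/version
        "kind": "Model",
        "metadata": {"name": name},
        "spec": {
            "url": url,                   # where the model weights come from
            "engine": engine,             # serving engine to run the model
            "minReplicas": min_replicas,  # lower bound for autoscaling
            "maxReplicas": max_replicas,  # upper bound for autoscaling
        },
    }


if __name__ == "__main__":
    manifest = build_model_manifest(
        name="example-chat-model",             # hypothetical model name
        url="hf://example-org/example-model",  # hypothetical model source
        engine="VLLM",
        max_replicas=2,
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be applied to a cluster with standard Kubernetes tooling. The replica bounds are the part of the definition that the scaling behavior described above would act on.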
