Model Installer API

The Model Installer API is the CoreAI service used to turn model locations and model artifacts into actual serving deployments.

It sits between model sources such as Hugging Face, S3-compatible storage, and MLflow on one side, and the Runtime serving path on the other side.

In practice, this is the service that lets teams do two kinds of work:

import models into the local MLflow repository
register a model for inference so it becomes available through the platform serving layer

What The Service Actually Does

When a model is registered for inference, the service does more than just store metadata.

It:

creates a kubeai.org/v1 Model resource in Kubernetes
registers the same model in the internal proxy layer used for OpenAI-compatible access
keeps long-running import flows asynchronous when artifact download or upload takes time
protects the API with Keycloak token validation

For direct inference registration, the request goes through POST /register_model.

For artifact import and repository management, the service also exposes endpoints for:

POST /download_hf_model
POST /download_s3_model
POST /register_s3_model
DELETE /delete_model
DELETE /unregister_model/{name}/{namespace}

Choose The Right Path

Use these rules of thumb:

If you already have a model URL that the serving layer can use, register it directly with POST /register_model.
If you want the model stored in the platform's local repository first, import it into MLflow, then deploy it from the repository.
If the weights already exist in S3 or MinIO and you do not want to copy them again, use POST /register_s3_model to create the MLflow entry in place.
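The rules of thumb can be written as a small decision function. This is a hypothetical Python sketch: the parameter names are illustrative, and it assumes that a model with no servable URL and no weights in object storage comes from Hugging Face, the only other import source this API exposes.

```python
def choose_endpoint(
    has_servable_url: bool,
    want_repository_entry: bool,
    weights_in_s3: bool,
    copy_allowed: bool = True,
) -> str:
    """Return the first endpoint to call, following the rules of thumb.

    Hypothetical helper; parameter names are illustrative only.
    """
    # A URL the serving layer can already use, and no repository step wanted.
    if has_servable_url and not want_repository_entry:
        return "POST /register_model"
    # Weights already in S3 or MinIO: copy them, or register them in place.
    if weights_in_s3:
        return "POST /download_s3_model" if copy_allowed else "POST /register_s3_model"
    # Assumption: anything else is imported from Hugging Face into MLflow.
    return "POST /download_hf_model"
```

Repository imports are only the first step; deployment from the repository still goes through POST /register_model afterwards.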

Direct Deployment For Inference

The main inference deployment endpoint is:

POST /register_model

The backend schema requires these core fields:

name
engine
features
url

Common deployment fields include:

namespace, defaulting to kubeai
resourceProfile, in the format <profile>:<count> such as nvidia-gpu-l4:1
replicas, minReplicas, and maxReplicas
timeout, stream_timeout, and max_retries
model_mode and model metadata such as token limits or embedding dimensions
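A hypothetical client-side check for the resourceProfile format and the scaling fields. The accepted profile characters and the ordering minReplicas <= replicas <= maxReplicas are assumptions made for illustration, not rules documented for this API; the service remains the authority on what it accepts.

```python
import re
from typing import Tuple

# "<profile>:<count>", e.g. "nvidia-gpu-l4:1". The allowed profile characters
# (lowercase letters, digits, hyphens) are an assumption.
_RESOURCE_PROFILE = re.compile(r"([a-z0-9][a-z0-9-]*):([1-9][0-9]*)")


def parse_resource_profile(value: str) -> Tuple[str, int]:
    """Split a resourceProfile string into (profile, count)."""
    match = _RESOURCE_PROFILE.fullmatch(value)
    if match is None:
        raise ValueError(f"expected <profile>:<count>, got {value!r}")
    return match.group(1), int(match.group(2))


def check_scaling(replicas: int, min_replicas: int, max_replicas: int) -> None:
    """Assumed invariant: 0 <= minReplicas <= replicas <= maxReplicas."""
    if not 0 <= min_replicas <= replicas <= max_replicas:
        raise ValueError("expected 0 <= minReplicas <= replicas <= maxReplicas")
```

For instance, parse_resource_profile("nvidia-gpu-l4:1") returns ("nvidia-gpu-l4", 1).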

The service supports these serving engines at the API level:

VLLM
OLlama
FasterWhisper
Infinity

The supported URL patterns depend on the engine and storage path:

hf://<org>/<model>
pvc://<pvcName> or pvc://<pvcName>/<subpath>
s3://<bucket>/<path>
gs://<bucket>/<path>
oss://<bucket>/<path>
ollama://<model> for OLlama

For object storage URLs such as s3://, gs://, and oss://, the API supports cache-based serving with cacheProfile.
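A hedged pre-flight check, in Python, built only from the engine names, URL patterns, and cacheProfile note above. It returns a list of likely problems instead of raising, because the per-engine URL rules beyond ollama:// are not spelled out here; treat it as a sketch, not the service's validation logic.

```python
from typing import List, Optional
from urllib.parse import urlparse

ENGINES = {"VLLM", "OLlama", "FasterWhisper", "Infinity"}
URL_SCHEMES = {"hf", "pvc", "s3", "gs", "oss", "ollama"}
# Object storage schemes for which this page describes cacheProfile support.
CACHE_SCHEMES = {"s3", "gs", "oss"}


def check_model_source(
    engine: str, url: str, cache_profile: Optional[str] = None
) -> List[str]:
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    if engine not in ENGINES:
        problems.append(f"unsupported engine {engine!r}")
    scheme = urlparse(url).scheme
    if scheme not in URL_SCHEMES:
        problems.append(f"unsupported URL scheme {scheme!r}")
    if scheme == "ollama" and engine != "OLlama":
        problems.append("ollama:// URLs are meant for the OLlama engine")
    if cache_profile and scheme not in CACHE_SCHEMES:
        problems.append("cacheProfile is only described for s3://, gs:// and oss://")
    return problems
```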

Example: Deploy A Model Directly

```json
{
  "name": "mistral-small",
  "namespace": "kubeai",
  "engine": "VLLM",
  "features": ["TextGeneration"],
  "url": "hf://mistralai/Mistral-Small-3.2-24B-Instruct-2506",
  "resourceProfile": "nvidia-gpu-l4:1",
  "replicas": 1,
  "minReplicas": 0,
  "maxReplicas": 2,
  "timeout": 30,
  "stream_timeout": 1,
  "max_retries": 5,
  "model_mode": "chat",
  "max_input_tokens": 8192,
  "max_output_tokens": 2048
}
```
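A minimal sketch of turning a payload like the one above into an HTTP request with only the Python standard library. The base URL and token are placeholders you supply; the required-field check mirrors the core fields listed earlier, and the namespace default mirrors the documented kubeai default. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Dict

REQUIRED_FIELDS = ("name", "engine", "features", "url")


def build_register_request(
    payload: Dict[str, Any], token: str, base_url: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /register_model request."""
    missing = [field for field in REQUIRED_FIELDS if not payload.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    body = {"namespace": "kubeai", **payload}  # documented default namespace
    return urllib.request.Request(
        base_url.rstrip("/") + "/register_model",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # Keycloak-issued access token
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a matter of urllib.request.urlopen(req); building and sending are kept apart so the payload can be checked before anything reaches the cluster.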

After a successful registration, the service creates the Kubernetes Model resource and registers the model in the proxy layer, so it is exposed through the platform's OpenAI-compatible path.

Importing Models Into The Local Repository

The local repository path is built around MLflow.

This is useful when you want a platform-managed artifact location and a reusable repository entry before deployment.

Hugging Face Import

POST /download_hf_model

This flow:

downloads the model from the Hugging Face Hub
uploads the artifacts to MLflow
registers a model version in MLflow
optionally registers the imported model for inference if register_inference is set to true
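A sketch of a request body for this flow. Only register_inference is named on this page; the other field names (model_name, revision, experiment_name, artifact_path) and the experiment-naming rule are hypothetical. The defaults follow the portal downloader behavior described later: an artifact path of "model" and no automatic inference registration. The "main" revision default is an assumption.

```python
from typing import Optional


def build_hf_import_payload(
    hf_model: str,
    revision: str = "main",
    experiment_name: Optional[str] = None,
    artifact_path: str = "model",
    register_inference: bool = False,
) -> dict:
    """Body for POST /download_hf_model.

    Field names other than register_inference are hypothetical.
    """
    return {
        "model_name": hf_model,  # hypothetical field name
        "revision": revision,  # hypothetical field name
        # Hypothetical naming rule: derive the experiment from the model id.
        "experiment_name": experiment_name or hf_model.replace("/", "-"),
        "artifact_path": artifact_path,  # hypothetical field name
        "register_inference": register_inference,
    }
```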

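These jobs run in the background, so the request returns before the download and upload finish; completion shows up as a READY model version in MLflow.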

S3 Import With Copy

POST /download_s3_model

This flow downloads model artifacts from S3-compatible storage and then uploads them into MLflow.

The API accepts direct credentials or an OpenBao secret reference for S3 access.
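A hypothetical body builder that keeps the two credential styles apart. Every field name here is an assumption, as is the rule that direct credentials and an OpenBao reference are mutually exclusive; the endpoint_url field stands in for a MinIO or other S3-compatible endpoint.

```python
from typing import Optional


def build_s3_import_payload(
    s3_uri: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    openbao_secret_path: Optional[str] = None,
) -> dict:
    """Body for POST /download_s3_model; every field name is hypothetical."""
    direct = access_key is not None or secret_key is not None
    if direct and openbao_secret_path:
        raise ValueError("use direct credentials or an OpenBao reference, not both")
    if direct and not (access_key and secret_key):
        raise ValueError("direct credentials need both an access key and a secret key")
    payload = {"s3_uri": s3_uri}
    if endpoint_url:
        # e.g. a MinIO endpoint for S3-compatible storage
        payload["endpoint_url"] = endpoint_url
    if direct:
        payload["credentials"] = {"access_key": access_key, "secret_key": secret_key}
    elif openbao_secret_path:
        payload["openbao_secret"] = openbao_secret_path
    return payload
```

Referencing an OpenBao secret keeps raw keys out of request bodies and client logs.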

S3 Registration Without Copy

POST /register_s3_model

This is the lighter-weight path when the model is already present in object storage.

Instead of downloading and re-uploading the weights, the service registers the S3 URI directly as the model source in MLflow.

That makes it the better choice when:

artifacts are already in a stable bucket
duplicate storage is not wanted
you still want an MLflow model entry and version history
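Since the S3 URI itself becomes the model source, it is worth validating before registration. A minimal sketch; the requirement that the URI point below the bucket root (at a model directory) and the payload field names are assumptions.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Return (bucket, key prefix) for an s3://bucket/prefix URI."""
    parsed = urlparse(uri)
    prefix = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not prefix:
        raise ValueError(f"expected s3://<bucket>/<path>, got {uri!r}")
    return parsed.netloc, prefix


def build_register_s3_payload(model_name: str, s3_uri: str) -> dict:
    """Body for POST /register_s3_model; field names are hypothetical."""
    split_s3_uri(s3_uri)  # validate only; the URI itself is sent unchanged
    return {"model_name": model_name, "s3_uri": s3_uri}
```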

How The Portal Uses It

The CoreAI Portal uses the service in two main ways: model deployment and model repository import.

Portal Deployment Flow

For deployment, the portal ultimately sends a POST /register_model request.

The user-facing flow looks like this:

Open the model deployment flow from the Models area.
Choose either an easy preset-driven path or the advanced deployment form.
Fill in the model source, deployment name, engine, resource profile, scaling, and optional advanced settings.
Submit the form so the portal sends the deployment payload to the Model Installer service.

Easy Setup In The Portal

The easy setup path is preset-based.

It:

lets users browse curated model presets by category such as llm, embedding, audio, and multimodal
pre-fills deployment defaults such as engine, resource profile, token limits, tags, and features
keeps most preset fields read-only in simple mode
allows switching to Customize Deployment when teams need to tune the deployment

In easy mode, the portal also pre-fills:

namespace as kubeai
owner from the signed-in user email
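The easy-mode defaults can be modeled as a preset merged with a few pre-filled fields. This is a hypothetical sketch, not the portal's code; as a simplification it treats every preset field as read-only unless Customize Deployment is enabled, whereas the portal keeps only most of them read-only.

```python
from typing import Dict, Optional


def build_easy_form(
    preset: Dict[str, object],
    user_email: str,
    overrides: Optional[Dict[str, object]] = None,
    customize: bool = False,
) -> Dict[str, object]:
    """Hypothetical model of easy-mode defaults; not the portal's code."""
    overrides = overrides or {}
    # Simplification: treat every preset field as read-only in simple mode.
    locked = set(preset) & set(overrides)
    if locked and not customize:
        raise ValueError(f"read-only in simple mode: {sorted(locked)}")
    form = dict(preset)
    form["namespace"] = "kubeai"  # pre-filled by the portal
    form["owner"] = user_email  # taken from the signed-in user
    form.update(overrides)
    return form
```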

Advanced Setup In The Portal

The advanced form is closer to the raw API.

It has separate steps for:

model information
deployment configuration
advanced configuration

The portal exposes a Model URL selector with two sources:

Manual URL
From Repository

When From Repository is used, the portal:

fetches registered MLflow models
allows only READY versions to be selected
resolves the artifact URI for the latest usable version
pre-fills the deployment form with that artifact-backed URL
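A sketch of the selection step, over records shaped like MLflow model versions (which carry version, status, and source fields). Interpreting "latest usable version" as the highest-numbered READY version is an assumption, and fetching the records from MLflow is left out.

```python
from typing import Dict, List, Optional


def latest_ready_source(versions: List[Dict[str, str]]) -> Optional[str]:
    """Return the artifact URI (source) of the newest READY version, if any."""
    ready = [v for v in versions if v.get("status") == "READY"]
    if not ready:
        return None
    # MLflow reports version numbers as strings; compare them numerically.
    latest = max(ready, key=lambda v: int(v["version"]))
    return latest["source"]
```

Comparing versions as integers matters once a model reaches version 10, which would sort before "9" as a string.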

The portal also auto-generates the deployment name from the chosen URL when possible.
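The portal's naming rule is not documented here; the following is a hypothetical rule in the same spirit. It takes the last URL path segment, lowercases it, collapses other characters into hyphens, and caps the length at 63 characters as a conservative Kubernetes name limit. It returns None when nothing usable remains, which is the "when possible" case.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def name_from_url(url: str, max_len: int = 63) -> Optional[str]:
    """Derive a Kubernetes-friendly deployment name from a model URL.

    Hypothetical rule: last path segment, lowercased, non-alphanumeric runs
    collapsed into hyphens, length capped at max_len.
    """
    parsed = urlparse(url)
    last = (parsed.netloc + parsed.path).rstrip("/").split("/")[-1]
    name = re.sub(r"[^a-z0-9]+", "-", last.lower()).strip("-")
    name = name[:max_len].rstrip("-")
    return name or None
```

With this rule, hf://mistralai/Mistral-Small-3.2-24B-Instruct-2506 becomes mistral-small-3-2-24b-instruct-2506.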

Portal Field Mapping

The portal does a small amount of transformation before calling the backend.

It combines the selected resource profile and the instance count into the API's <profile>:<count> resourceProfile value, such as nvidia-gpu-l4:2.
It maps model mode choices into serving features.

Current portal mappings include:

completion -> TextGeneration
embedding -> TextEmbedding
audio-transcription -> SpeechToText

The portal deployment UI currently exposes VLLM and FasterWhisper as engine choices, even though the backend API supports additional engines.
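The mapping above can be sketched as a pure function. The mode-to-feature table and the two engines come from this page; the function itself, its single-mode input, and its error handling are illustrative assumptions rather than the portal's implementation.

```python
MODE_TO_FEATURE = {
    "completion": "TextGeneration",
    "embedding": "TextEmbedding",
    "audio-transcription": "SpeechToText",
}
PORTAL_ENGINES = {"VLLM", "FasterWhisper"}


def to_api_fields(profile: str, instances: int, model_mode: str, engine: str) -> dict:
    """Map portal form values to register_model fields (illustrative only)."""
    if engine not in PORTAL_ENGINES:
        raise ValueError(f"{engine!r} is not offered by the portal UI")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    if model_mode not in MODE_TO_FEATURE:
        raise ValueError(f"no feature mapping for model mode {model_mode!r}")
    return {
        "engine": engine,
        "resourceProfile": f"{profile}:{instances}",
        "features": [MODE_TO_FEATURE[model_mode]],
    }
```

Callers that talk to the API directly are not limited to these two engines.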

Portal Repository Flow

The portal also exposes an MLflow repository view.

From there, users can:

browse registered models and versions
inspect the latest version status
open a model detail page
click Deploy Model to jump into the advanced deployment page with the artifact URI pre-filled

This is the cleanest user path when a model is already in the local repository and the only remaining step is inference deployment.

Portal Downloader Flow

The portal has a separate Model Downloader page for importing Hugging Face models into the local repository.

That page:

collects the Hugging Face model name and revision
auto-generates an MLflow experiment name from the model name unless the user overrides it
defaults the artifact path to "model"
starts a background download_hf_model request
polls MLflow until the imported model becomes READY
checks Model Installer pod logs to detect download failures
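The polling step can be sketched as a loop over an injected status getter. get_status is a hypothetical callable that would return the MLflow status of the imported version (None while no version exists yet); READY and FAILED_REGISTRATION are MLflow model-version statuses. The pod-log check the portal also performs is not modeled, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    get_status: Callable[[], Optional[str]],
    timeout_s: float = 3600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a model-version status until it is READY, fails, or times out."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()  # None: no model version registered yet
        if status == "READY":
            return status
        if status == "FAILED_REGISTRATION":
            raise RuntimeError("MLflow reported FAILED_REGISTRATION")
        if clock() >= deadline:
            raise TimeoutError(f"model not READY after {timeout_s} s (last status: {status})")
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop testable without real waiting.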

In the current portal flow, the downloader imports the model into MLflow but does not deploy it for inference.

The portal sends register_inference: false for that path, so deployment still happens as a separate step afterward.

Endpoint Summary

| Endpoint | Purpose | Used in portal today |
| --- | --- | --- |
| `POST /register_model` | Register a serving deployment in Kubernetes and the proxy layer | Yes |
| `DELETE /unregister_model/{name}/{namespace}` | Remove a serving deployment | Yes |
| `POST /download_hf_model` | Import a Hugging Face model into MLflow | Yes |
| `POST /download_s3_model` | Import a model from S3 into MLflow | Not in the current portal flow |
| `POST /register_s3_model` | Register an existing S3 URI directly in MLflow | Not in the current portal flow |
| `DELETE /delete_model` | Delete model records and optionally artifacts from MLflow and storage | Yes |
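The unregister endpoint carries its identifiers in the path, so each segment should be URL-quoted. A minimal standard-library sketch that builds the request without sending it; the base URL and token are placeholders.

```python
import urllib.request
from urllib.parse import quote


def build_unregister_request(
    name: str, namespace: str, token: str, base_url: str
) -> urllib.request.Request:
    """DELETE /unregister_model/{name}/{namespace}, built but not sent."""
    # Quote each path segment so unexpected characters cannot change the path.
    path = f"/unregister_model/{quote(name, safe='')}/{quote(namespace, safe='')}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
```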

Authentication And Access

The REST API is protected with OAuth2 bearer tokens and validates them through Keycloak token introspection.

In the portal, model management actions are gated by the can_manage_models permission. The portal then acquires a backend client-credentials token before calling the Model Installer service.
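A sketch of the client-credentials step against Keycloak's standard OpenID Connect token endpoint. The Keycloak URL, realm, client ID, and secret are placeholders; how the portal's backend client is actually configured is not documented here. The path shown is the one used by current Keycloak releases (older ones prefix it with /auth).

```python
import urllib.parse
import urllib.request
from typing import Dict


def build_token_request(
    keycloak_url: str, realm: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """OAuth2 client-credentials request to a Keycloak realm's token endpoint."""
    # Path used by current Keycloak releases; older ones prefix it with /auth.
    token_url = (
        f"{keycloak_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    )
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def bearer_header(token_response: Dict[str, object]) -> Dict[str, str]:
    """Turn a parsed token response into an OAuth2 bearer Authorization header."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}
```

The access_token from the parsed JSON response goes into the Authorization header of each Model Installer call, where the service validates it through Keycloak introspection.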

Direct Deploy Vs Repository Import

Use direct deploy when:

you already know the model URL to serve
the main goal is to expose the model quickly in the cluster
you do not need a separate local repository onboarding step first

Use repository import first when:

you want a managed MLflow entry and version history
you want a reusable local artifact source for later deployments
you want operators to deploy from repository entries instead of raw model URLs
