# Model Installer API (/docs/coreai/models/model-installer-api)


The `Model Installer API` is the CoreAI service that turns model locations and model artifacts into running serving deployments.

It sits between model sources such as `Hugging Face`, `S3`-compatible storage, and `MLflow` on one side and the Runtime serving path on the other.

In practice, this is the service that lets teams do two kinds of work:

* import models into the local `MLflow` repository
* register a model for inference so it becomes available through the platform serving layer

What The Service Actually Does [#what-the-service-actually-does]

When a model is registered for inference, the service does more than just store metadata.

It:

* creates a `kubeai.org/v1` `Model` resource in Kubernetes
* registers the same model in the internal proxy layer used for OpenAI-compatible access
* keeps long-running import flows asynchronous when artifact download or upload takes time
* protects the API with `Keycloak` token validation

For direct inference registration, the request goes through `POST /register_model`.

For artifact import and repository management, the service also exposes endpoints for:

* `POST /download_hf_model`
* `POST /download_s3_model`
* `POST /register_s3_model`
* `DELETE /delete_model`
* `DELETE /unregister_model/{name}/{namespace}`

Choose The Right Path [#choose-the-right-path]

Use these rules of thumb:

* If you already have a model URL that the serving layer can use, register it directly with `register_model`.
* If you want the model stored in the platform's local repository first, import it into `MLflow`, then deploy it from the repository.
* If weights already exist in `S3` or `MinIO` and you do not want to copy them again, use `register_s3_model`.
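
The rules of thumb above can be condensed into a small decision helper. This is a hypothetical sketch: `choose_endpoint` and its parameters are illustrative names, not part of the Model Installer API.

```python
def choose_endpoint(want_repository_entry: bool, source: str,
                    avoid_copy: bool = False) -> str:
    """Pick a Model Installer endpoint (hypothetical helper).

    source: "hf" for the Hugging Face Hub, "s3" for S3/MinIO, or any other
    value for a URL the serving layer can already use.
    """
    if not want_repository_entry:
        # The serving layer can use the URL as-is: deploy directly.
        return "POST /register_model"
    if source == "hf":
        # Download from the Hub and upload the artifacts into MLflow.
        return "POST /download_hf_model"
    if source == "s3":
        # Reference the existing S3 URI, or copy the weights into MLflow.
        return "POST /register_s3_model" if avoid_copy else "POST /download_s3_model"
    raise ValueError(f"no repository import path for source {source!r}")
```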

Direct Deployment For Inference [#direct-deployment-for-inference]

The main inference deployment endpoint is:

```text
POST /register_model
```

The backend schema requires these core fields:

* `name`
* `engine`
* `features`
* `url`

Common deployment fields include:

* `namespace`, defaulting to `kubeai`
* `resourceProfile`, in the format `<profile>:<count>` such as `nvidia-gpu-l4:1`
* `replicas`, `minReplicas`, and `maxReplicas`
* `timeout`, `stream_timeout`, and `max_retries`
* `model_mode` and model metadata such as token limits or embedding dimensions
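
Before calling `POST /register_model`, a client can pre-check a payload against the fields listed above. The sketch below is a hypothetical client-side helper, not the service's own validation: it assumes profile names consist of lowercase letters, digits, and hyphens and that the count is a positive integer, and it fills in the documented `kubeai` namespace default explicitly.

```python
import re

REQUIRED_FIELDS = ("name", "engine", "features", "url")
# Assumed shape of "<profile>:<count>", e.g. "nvidia-gpu-l4:1".
RESOURCE_PROFILE_RE = re.compile(r"^[a-z0-9][a-z0-9-]*:[1-9][0-9]*$")


def prepare_register_payload(payload: dict) -> dict:
    """Client-side pre-check for POST /register_model (hypothetical helper)."""
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Make the documented default explicit; a caller-supplied namespace wins.
    body = {"namespace": "kubeai", **payload}

    profile = body.get("resourceProfile")
    if profile is not None and not RESOURCE_PROFILE_RE.fullmatch(profile):
        raise ValueError(f"resourceProfile must look like '<profile>:<count>', got {profile!r}")

    low, high = body.get("minReplicas"), body.get("maxReplicas")
    if low is not None and high is not None and low > high:
        raise ValueError("minReplicas must not exceed maxReplicas")
    return body
```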

The service supports these serving engines at the API level:

* `VLLM`
* `OLlama`
* `FasterWhisper`
* `Infinity`

The supported URL patterns depend on the engine and storage path:

* `hf://<org>/<model>`
* `pvc://<pvcName>` or `pvc://<pvcName>/<subpath>`
* `s3://<bucket>/<path>`
* `gs://<bucket>/<path>`
* `oss://<bucket>/<path>`
* `ollama://<model>` for `OLlama`

For object storage URLs such as `s3://`, `gs://`, and `oss://`, the API supports cache-based serving with `cacheProfile`.
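
The URL rules above can be expressed as a scheme check. This is a sketch under assumptions: `check_model_url` is a hypothetical helper, it only enforces that `ollama://` goes with `OLlama`, and it treats `cacheProfile` as valid only for `s3://`, `gs://`, and `oss://` URLs. The server may apply stricter or different per-engine rules.

```python
from typing import Optional
from urllib.parse import urlparse

OBJECT_STORAGE_SCHEMES = {"s3", "gs", "oss"}
KNOWN_SCHEMES = {"hf", "pvc", "ollama"} | OBJECT_STORAGE_SCHEMES


def check_model_url(engine: str, url: str, cache_profile: Optional[str] = None) -> str:
    """Return the URL scheme if it fits the documented patterns (hypothetical helper)."""
    scheme = urlparse(url).scheme
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported URL scheme {scheme!r} in {url!r}")
    if scheme == "ollama" and engine != "OLlama":
        raise ValueError("ollama:// URLs are only used with the OLlama engine")
    if cache_profile is not None and scheme not in OBJECT_STORAGE_SCHEMES:
        # Assumption: cache-based serving only applies to object storage URLs.
        raise ValueError("cacheProfile is only meaningful for s3://, gs://, or oss:// URLs")
    return scheme
```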

Example: Deploy A Model Directly [#example-deploy-a-model-directly]

```json
{
  "name": "mistral-small",
  "namespace": "kubeai",
  "engine": "VLLM",
  "features": ["TextGeneration"],
  "url": "hf://mistralai/Mistral-Small-3.2-24B-Instruct-2506",
  "resourceProfile": "nvidia-gpu-l4:1",
  "replicas": 1,
  "minReplicas": 0,
  "maxReplicas": 2,
  "timeout": 30,
  "stream_timeout": 1,
  "max_retries": 5,
  "model_mode": "chat",
  "max_input_tokens": 8192,
  "max_output_tokens": 2048
}
```
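
To submit a payload like the one above, a client sends it as JSON with an OAuth2 bearer token. The following is a minimal sketch using only the Python standard library; `build_register_request` and the base URL are hypothetical, and the token is assumed to come from the `Keycloak` setup described under authentication.

```python
import json
import urllib.request


def build_register_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /register_model call (hypothetical helper)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/register_model",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The API expects an OAuth2 bearer token validated by Keycloak.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req, timeout=60)`. The response body format is not documented on this page, so inspect it rather than assuming a schema.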

After a successful registration, the service creates the Kubernetes model resource and also registers the model in the proxy layer so it can be surfaced through the platform's OpenAI-compatible path.

Importing Models Into The Local Repository [#importing-models-into-the-local-repository]

The local repository path is built around `MLflow`.

This is useful when you want a platform-managed artifact location and a reusable repository entry before deployment.

Hugging Face Import [#hugging-face-import]

```text
POST /download_hf_model
```

This flow:

* downloads the model from the `Hugging Face Hub`
* uploads the artifacts to `MLflow`
* registers a model version in `MLflow`
* optionally registers the imported model for inference if `register_inference` is set to `true`

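The import runs as a background job, so completion has to be checked separately, for example by watching the model version status in `MLflow`.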
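
Because the import runs in the background, a client has to watch for completion separately. The sketch below polls a caller-supplied status function until it reports `READY`; the status strings are MLflow's model version states (`PENDING_REGISTRATION`, `FAILED_REGISTRATION`, `READY`). In practice `get_status` could wrap `MlflowClient().get_model_version(name, version).status`, but `wait_until_ready` itself is a hypothetical helper, and its default timeout and interval are arbitrary.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str],
                     timeout_s: float = 1800.0,
                     interval_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll an MLflow model version status until READY (hypothetical helper)."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "READY":
            return status
        if status == "FAILED_REGISTRATION":
            raise RuntimeError("MLflow reported FAILED_REGISTRATION")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model still {status!r} after {timeout_s}s")
        # Injectable sleep keeps the loop testable without real waiting.
        sleep(interval_s)
```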

S3 Import With Copy [#s3-import-with-copy]

```text
POST /download_s3_model
```

This flow downloads model artifacts from `S3`-compatible storage and then uploads them into `MLflow`.

The API accepts direct credentials or an `OpenBao` secret reference for S3 access.
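
One way to keep the two S3 access options from being mixed up is to build that part of the request body in one place. This sketch treats direct credentials and an `OpenBao` reference as mutually exclusive, which is an assumption, and every field name in it (`aws_access_key_id`, `aws_secret_access_key`, `openbao_secret_path`) is hypothetical; check the service schema for the real names.

```python
from typing import Optional


def s3_access_fields(access_key: Optional[str] = None,
                     secret_key: Optional[str] = None,
                     openbao_secret_path: Optional[str] = None) -> dict:
    """Return the S3 access part of a request body (hypothetical field names)."""
    has_direct = access_key is not None or secret_key is not None
    if has_direct and openbao_secret_path is not None:
        raise ValueError("use direct credentials or an OpenBao reference, not both")
    if openbao_secret_path is not None:
        # Credentials are resolved from OpenBao by reference.
        return {"openbao_secret_path": openbao_secret_path}
    if access_key and secret_key:
        return {"aws_access_key_id": access_key, "aws_secret_access_key": secret_key}
    raise ValueError("provide both S3 keys or an OpenBao secret reference")
```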

S3 Registration Without Copy [#s3-registration-without-copy]

```text
POST /register_s3_model
```

This is the lighter-weight path when the model is already present in object storage.

Instead of downloading and re-uploading the weights, the service registers the `S3` URI directly as the model source in `MLflow`.

That makes it the better choice when:

* artifacts are already in a stable bucket
* duplicate storage is not wanted
* you still want an `MLflow` model entry and version history

How The Portal Uses It [#how-the-portal-uses-it]

The `CoreAI Portal` uses the service in two main ways: model deployment and model repository import.

Portal Deployment Flow [#portal-deployment-flow]

For deployment, the portal ultimately sends a `POST /register_model` request.

The user-facing flow looks like this:

1. Open the model deployment flow from the `Models` area.
2. Choose either an easy preset-driven path or the advanced deployment form.
3. Fill in the model source, deployment name, engine, resource profile, scaling, and optional advanced settings.
4. Submit the form so the portal sends the deployment payload to the Model Installer service.

Easy Setup In The Portal [#easy-setup-in-the-portal]

The easy setup path is preset-based.

It:

* lets users browse curated model presets by category such as `llm`, `embedding`, `audio`, and `multimodal`
* pre-fills deployment defaults such as engine, resource profile, token limits, tags, and features
* keeps most preset fields read-only in simple mode
* allows switching to `Customize Deployment` when teams need to tune the deployment

In easy mode, the portal also pre-fills:

* `namespace` as `kubeai`
* `owner` from the signed-in user email

Advanced Setup In The Portal [#advanced-setup-in-the-portal]

The advanced form is closer to the raw API.

It has separate steps for:

* model information
* deployment configuration
* advanced configuration

The portal exposes a `Model URL` selector with two sources:

* `Manual URL`
* `From Repository`

When `From Repository` is used, the portal:

* fetches registered `MLflow` models
* allows only `READY` versions to be selected
* resolves the artifact URI for the latest usable version
* pre-fills the deployment form with that artifact-backed URL

The portal also auto-generates the deployment name from the chosen URL when possible.
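
The repository selection above can be sketched as two small functions: one picks the newest `READY` version from a list shaped like MLflow REST `model_versions` entries (`version`, `status`, `source`), and one derives a deployment name from a URL. Both are hypothetical helpers, and the naming rule is illustrative rather than the portal's actual logic: it lowercases the last path segment, collapses other characters to hyphens, and truncates to 63 characters to stay within a Kubernetes DNS-1123 label.

```python
import re
from typing import Optional


def latest_ready_source(versions: list[dict]) -> Optional[str]:
    """Artifact URI of the newest READY version, or None if none is ready."""
    ready = [v for v in versions if v.get("status") == "READY"]
    if not ready:
        return None
    # MLflow's REST API returns version numbers as strings, so compare as ints.
    newest = max(ready, key=lambda v: int(v["version"]))
    return newest["source"]


def name_from_url(url: str, max_len: int = 63) -> str:
    """Hypothetical naming rule: a DNS-1123-style label from the last path segment."""
    last = url.rstrip("/").rsplit("/", 1)[-1]
    name = re.sub(r"[^a-z0-9]+", "-", last.lower()).strip("-")
    return name[:max_len].rstrip("-")
```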

Portal Field Mapping [#portal-field-mapping]

The portal transforms a few fields before calling the backend:

* It combines `resourceProfile` and `instances` into the API's `resourceProfile` format, such as `nvidia-gpu-l4:2`.
* It maps model mode choices into serving features.

Current portal mappings include:

* `completion` -> `TextGeneration`
* `embedding` -> `TextEmbedding`
* `audio-transcription` -> `SpeechToText`

The portal deployment UI currently exposes `VLLM` and `FasterWhisper` as engine choices, even though the backend API supports additional engines.
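
The two portal transformations can be sketched as follows. `map_portal_fields` is a hypothetical helper; its mode table contains only the three mappings listed above, so any other portal mode is rejected rather than guessed.

```python
MODE_TO_FEATURE = {
    "completion": "TextGeneration",
    "embedding": "TextEmbedding",
    "audio-transcription": "SpeechToText",
}


def map_portal_fields(resource_profile: str, instances: int, mode: str) -> dict:
    """Turn portal form values into API fields (hypothetical helper)."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    if mode not in MODE_TO_FEATURE:
        raise ValueError(f"no documented feature mapping for mode {mode!r}")
    return {
        # Combine profile and instance count into "<profile>:<count>".
        "resourceProfile": f"{resource_profile}:{instances}",
        "features": [MODE_TO_FEATURE[mode]],
    }
```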

Portal Repository Flow [#portal-repository-flow]

The portal also exposes an `MLflow` repository view.

From there, users can:

* browse registered models and versions
* inspect the latest version status
* open a model detail page
* click `Deploy Model` to jump into the advanced deployment page with the artifact URI pre-filled

This is the cleanest user path when a model is already present in the local repository and the only remaining step is inference deployment.

Portal Downloader Flow [#portal-downloader-flow]

The portal has a separate `Model Downloader` page for importing `Hugging Face` models into the local repository.

That page:

* collects the Hugging Face model name and revision
* auto-generates an `MLflow` experiment name from the model name unless the user overrides it
* defaults the artifact path to `model`
* starts a background `download_hf_model` request
* polls `MLflow` until the imported model becomes `READY`
* checks Model Installer pod logs to detect download failures

Note that in the current portal flow the downloader only imports into `MLflow`; it does not deploy the model for inference.

The portal sends `register_inference: false` for that path, so deployment still happens as a separate step afterward.
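
A request body shaped like the portal's downloader call might look like this. Only `register_inference` is a documented field; `model_name`, `revision`, `experiment_name`, and `artifact_path` are hypothetical names, the `main` revision default is an assumption, and the experiment-name rule (replace characters other than letters, digits, `_`, `.`, and `-` with hyphens) is illustrative rather than the portal's real logic.

```python
import re
from typing import Optional


def hf_import_payload(model_name: str, revision: str = "main",
                      experiment_name: Optional[str] = None,
                      artifact_path: str = "model") -> dict:
    """Body for POST /download_hf_model (hypothetical field names except register_inference)."""
    if experiment_name is None:
        # Illustrative auto-naming: "org/model" -> "org-model".
        experiment_name = re.sub(r"[^A-Za-z0-9_.-]+", "-", model_name).strip("-")
    return {
        "model_name": model_name,
        "revision": revision,
        "experiment_name": experiment_name,
        "artifact_path": artifact_path,
        # Import only; deployment happens as a separate step.
        "register_inference": False,
    }
```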

Endpoint Summary [#endpoint-summary]

| Endpoint                                      | Purpose                                                               | Used in portal today           |
| --------------------------------------------- | --------------------------------------------------------------------- | ------------------------------ |
| `POST /register_model`                        | Register a serving deployment in Kubernetes and proxy                 | Yes                            |
| `DELETE /unregister_model/{name}/{namespace}` | Remove a serving deployment                                           | Yes                            |
| `POST /download_hf_model`                     | Import a Hugging Face model into MLflow                               | Yes                            |
| `POST /download_s3_model`                     | Import a model from S3 into MLflow                                    | Not in the current portal flow |
| `POST /register_s3_model`                     | Register an existing S3 URI directly in MLflow                        | Not in the current portal flow |
| `DELETE /delete_model`                        | Delete model records and optionally artifacts from MLflow and storage | Yes                            |

Authentication And Access [#authentication-and-access]

The REST API requires OAuth2 bearer tokens, which it validates through `Keycloak` token introspection.

In the portal, model management actions are gated by the `can_manage_models` permission. The portal then acquires a backend client-credentials token before calling the Model Installer service.
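
The client-credentials step uses the standard OAuth2 grant against Keycloak's OpenID Connect token endpoint. This sketch only builds the request; `client_credentials_request` is a hypothetical helper, the Keycloak URL, realm, and client values are placeholders, and older Keycloak deployments may serve the endpoint under an `/auth` prefix.

```python
import urllib.parse
import urllib.request


def client_credentials_request(keycloak_url: str, realm: str,
                               client_id: str, client_secret: str) -> urllib.request.Request:
    """Prepare (but do not send) an OAuth2 client-credentials token request."""
    token_url = f"{keycloak_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The JSON response carries the token in `access_token`, which then goes into the `Authorization: Bearer` header for Model Installer calls.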

Direct Deploy Vs Repository Import [#direct-deploy-vs-repository-import]

Use direct deploy when:

* you already know the model URL to serve
* the main goal is to expose the model quickly in the cluster
* you do not need a separate local repository onboarding step first

Use repository import first when:

* you want a managed `MLflow` entry and version history
* you want a reusable local artifact source for later deployments
* you want operators to deploy from repository entries instead of raw model URLs

Related Pages [#related-pages]

* [Models](/docs/coreai/models)
* [Model Installer](/docs/coreai/components/model-installer)
* [MLflow](/docs/coreai/components/mlflow)
* [CoreAI API](/docs/coreai/components/coreai-api)