# Cloud Models (/docs/coreai/models/cloud-models)



Cloud models are handled differently from locally hosted models.

They are usually not deployed through the `Model Installer API` into `KubeAI`. Instead, they are configured as provider-backed models in `LiteLLM`, which then exposes them through the shared CoreAI model gateway.

This is the right pattern for models coming from services such as:

* `Azure OpenAI` or `Azure AI Foundry`
* `OpenRouter`
* `OpenAI`
* `Anthropic`
* other providers supported by `LiteLLM`

When To Use This Path [#when-to-use-this-path]

Use the LiteLLM cloud-model path when:

* the model is hosted by an external provider
* you do not want to run the weights inside the cluster
* the platform only needs a routed provider connection, not a Kubernetes inference deployment

Use the `Model Installer API` instead when the model should be imported, registered, or served from platform-managed infrastructure.

Important Positioning [#important-positioning]

This page is about platform configuration for operators and administrators.

For application development, the preferred stable integration surface is still the `CoreAI API`, not the internal `LiteLLM` endpoint directly.

What Actually Happens [#what-actually-happens]

The operational flow for cloud models is:

1. A platform operator adds the provider-backed model in the `LiteLLM` admin UI.
2. `LiteLLM` stores the provider configuration and credentials reference.
3. CoreAI services route model calls through `LiteLLM` using the configured model alias.
4. Applications should continue to use the higher-level CoreAI entry points exposed by the platform.

Before You Start [#before-you-start]

Make sure you have:

* the `LiteLLM` admin URL, usually `https://<litellm-host>/ui`
* administrator access to the LiteLLM UI, or the cluster's LiteLLM admin key in `sk-...` form
* the provider endpoint and API key
* the model alias that the consuming service expects

That last point matters for embeddings in particular. Some internal services expect a specific LiteLLM model name, so the alias you register may need to match an existing configuration rather than a name you invent on the spot.

Add A Chat Model In LiteLLM [#add-a-chat-model-in-litellm]

The simplest onboarding path is through the LiteLLM admin UI.

Step-by-Step [#step-by-step]

1. Open the LiteLLM admin UI:

```text
https://<litellm-host>/ui
```

2. Sign in with an admin account or the cluster's LiteLLM admin key.
3. Open `Add model`.
4. Select the provider.
5. Enter a stable model alias.
6. Fill in the provider endpoint, API version if required, and provider API key.
7. Test the connection.
8. Save the model.

Azure Chat Example [#azure-chat-example]

For an Azure-hosted chat model, use placeholder values like these:

| LiteLLM field | Example placeholder                                     |
| ------------- | ------------------------------------------------------- |
| Provider      | `Azure`                                                 |
| Model name    | `<chat-model-alias>`                                    |
| API base      | `https://<azure-resource>.cognitiveservices.azure.com/` |
| API version   | `<azure-api-version>`                                   |
| Azure API key | `<azure-api-key>`                                       |

Good model aliases are stable names that make sense to platform consumers.

Examples:

* `<team-chat-model>`
* `<provider>/<chat-model-alias>`
* `<product-default-chat-model>`

Do not treat the example values above as fixed requirements. Use the alias, provider endpoint, and version that match your own deployment.

Add An Embedding Model In LiteLLM [#add-an-embedding-model-in-litellm]

Embedding models follow the same general flow, but there are two extra concerns:

* the model must be registered in `Embedding` mode
* the output vector size must match what downstream systems expect

Step-by-Step [#step-by-step-1]

1. Open `Add model` in the LiteLLM admin UI.
2. Select the provider.
3. Enter the embedding model alias expected by the consuming workflow.
4. Set the model mode to `Embedding`.
5. Fill in the provider endpoint, version, and credentials.
6. In advanced settings, set the embedding output dimension expected by the workload.
7. Test the connection.
8. Save the model.

Azure Embedding Example [#azure-embedding-example]

For an Azure-hosted embedding model, the fields will often look similar to the chat-model setup:

| LiteLLM field | Example placeholder                                     |
| ------------- | ------------------------------------------------------- |
| Provider      | `Azure`                                                 |
| Model name    | `<embedding-model-alias>`                               |
| Mode          | `Embedding`                                             |
| API base      | `https://<azure-resource>.cognitiveservices.azure.com/` |
| API version   | `<azure-api-version>`                                   |
| Azure API key | `<azure-api-key>`                                       |

Example advanced settings:

```json
{
  "output_vector_size": 1536
}
```

Use the correct dimension for the actual embedding model you are registering.

Note About RAG Pipelines [#note-about-rag-pipelines]

If a retrieval workflow such as `rag-pipeline` already expects a specific embedding alias, register the model under that exact LiteLLM name or update the consuming configuration to match.

For example, a deployment may already expect something like:

```text
<provider>/<embedding-model-name>
```

The important part is consistency between:

* the LiteLLM alias
* the application or pipeline configuration
* the embedding dimension expected by the vector store and retrieval flow

After The Model Is Added [#after-the-model-is-added]

Once the cloud model is registered in LiteLLM:

* it can be routed through the shared LLM gateway
* CoreAI services can target it by the configured alias
* the Portal or other internal services can use it if they are configured with that alias

For application developers, the recommended pattern still remains:

`Application -> CoreAI API -> internal model routing`

not:

`Application -> LiteLLM directly`

Practical Rules For Naming [#practical-rules-for-naming]

Choose model aliases deliberately.

* Use stable aliases rather than provider-generated deployment names where possible.
* If an internal workflow already expects a specific alias, preserve that alias.
* Keep chat and embedding aliases clearly distinct.
* Avoid changing aliases casually after applications or pipelines depend on them.

Same Principle For Other Providers [#same-principle-for-other-providers]

The Azure examples above are only one common case.

The same operational pattern applies to providers such as `OpenRouter` and similar services:

1. choose the provider in LiteLLM
2. define a stable alias
3. enter the provider credentials and endpoint settings
4. set the correct mode for the workload
5. test the connection
6. save and use the alias consistently

Always prefer provider-agnostic naming in your docs and operational runbooks unless a particular consuming workflow requires an exact provider-prefixed alias.

Common Mistakes [#common-mistakes]

* registering an embedding model as a chat model
* forgetting to set the embedding dimension in advanced settings
* using an alias that does not match the consuming application or pipeline
* copying environment-specific URLs, keys, or versions into shared documentation
* exposing LiteLLM directly as the long-term developer integration point

Related Pages [#related-pages]

* [LiteLLM](/docs/coreai/components/litellm)
* [Models](/docs/coreai/models)
* [Model Installer API](/docs/coreai/models/model-installer-api)
* [Use Local Models via API](/docs/development/use-local-models-via-api)
