Cloud Models

Cloud models are handled differently from locally hosted models.

They are usually not deployed through the Model Installer API into KubeAI. Instead, they are configured as provider-backed models in LiteLLM, which then exposes them through the shared CoreAI model gateway.

This is the right pattern for models coming from services such as:

Azure OpenAI or Azure AI Foundry
OpenRouter
OpenAI
Anthropic
other providers supported by LiteLLM

When To Use This Path

Use the LiteLLM cloud-model path when:

the model is hosted by an external provider
you do not want to run the weights inside the cluster
the platform only needs a routed provider connection, not a Kubernetes inference deployment

Use the Model Installer API instead when the model should be imported, registered, or served from platform-managed infrastructure.

Important Positioning

This page is about platform configuration for operators and administrators.

For application development, the preferred stable integration surface is still the CoreAI API, not the internal LiteLLM endpoint directly.

What Actually Happens

The operational flow for cloud models is:

A platform operator adds the provider-backed model in the LiteLLM admin UI.
LiteLLM stores the provider configuration and credentials reference.
CoreAI services route model calls through LiteLLM using the configured model alias.
Applications should continue to use the higher-level CoreAI entry points exposed by the platform.

Before You Start

Make sure you have:

the LiteLLM admin URL, usually https://<litellm-host>/ui
administrator access to the LiteLLM UI, or the cluster's LiteLLM admin key in sk-... form
the provider endpoint and API key
the model alias that the consuming service expects

That last point matters for embeddings in particular. Some internal services expect a specific LiteLLM model name, so the alias you register may need to match an existing configuration rather than a name you invent on the spot.

Add A Chat Model In LiteLLM

The simplest onboarding path is through the LiteLLM admin UI.

Step-by-Step

Open the LiteLLM admin UI:

https://<litellm-host>/ui

Sign in with an admin account or the cluster's LiteLLM admin key.
Open Add model.
Select the provider.
Enter a stable model alias.
Fill in the provider endpoint, API version if required, and provider API key.
Test the connection.
Save the model.

Azure Chat Example

For an Azure-hosted chat model, use placeholder values like these:

LiteLLM field	Example placeholder
Provider	`Azure`
Model name	`<chat-model-alias>`
API base	`https://<azure-resource>.cognitiveservices.azure.com/`
API version	`<azure-api-version>`
Azure API key	`<azure-api-key>`

Good model aliases are stable names that make sense to platform consumers.

Examples:

<team-chat-model>
<provider>/<chat-model-alias>
<product-default-chat-model>

Do not treat the example values above as fixed requirements. Use the alias, provider endpoint, and version that match your own deployment.

Add An Embedding Model In LiteLLM

Embedding models follow the same general flow, but there are two extra concerns:

the model must be registered in Embedding mode
the output vector size must match what downstream systems expect

Step-by-Step

Open Add model in the LiteLLM admin UI.
Select the provider.
Enter the embedding model alias expected by the consuming workflow.
Set the model mode to Embedding.
Fill in the provider endpoint, version, and credentials.
In advanced settings, set the embedding output dimension expected by the workload.
Test the connection.
Save the model.

Azure Embedding Example

For an Azure-hosted embedding model, the fields will often look similar to the chat-model setup:

LiteLLM field	Example placeholder
Provider	`Azure`
Model name	`<embedding-model-alias>`
Mode	`Embedding`
API base	`https://<azure-resource>.cognitiveservices.azure.com/`
API version	`<azure-api-version>`
Azure API key	`<azure-api-key>`

Example advanced settings:

{
  "output_vector_size": 1536
}

Use the correct dimension for the actual embedding model you are registering.

Note About RAG Pipelines

If a retrieval workflow such as rag-pipeline already expects a specific embedding alias, register the model under that exact LiteLLM name or update the consuming configuration to match.

For example, a deployment may already expect something like:

<provider>/<embedding-model-name>

The important part is consistency between:

the LiteLLM alias
the application or pipeline configuration
the embedding dimension expected by the vector store and retrieval flow

After The Model Is Added

Once the cloud model is registered in LiteLLM:

it can be routed through the shared LLM gateway
CoreAI services can target it by the configured alias
the Portal or other internal services can use it if they are configured with that alias

For application developers, the recommended pattern still remains:

Application -> CoreAI API -> internal model routing

not:

Application -> LiteLLM directly

Practical Rules For Naming

Choose model aliases deliberately.

Use stable aliases rather than provider-generated deployment names where possible.
If an internal workflow already expects a specific alias, preserve that alias.
Keep chat and embedding aliases clearly distinct.
Avoid changing aliases casually after applications or pipelines depend on them.

Same Principle For Other Providers

The Azure examples above are only one common case.

The same operational pattern applies to providers such as OpenRouter and similar services:

choose the provider in LiteLLM
define a stable alias
enter the provider credentials and endpoint settings
set the correct mode for the workload
test the connection
save and use the alias consistently

Always prefer provider-agnostic naming in your docs and operational runbooks unless a particular consuming workflow requires an exact provider-prefixed alias.

Common Mistakes

registering an embedding model as a chat model
forgetting to set the embedding dimension in advanced settings
using an alias that does not match the consuming application or pipeline
copying environment-specific URLs, keys, or versions into shared documentation
exposing LiteLLM directly as the long-term developer integration point

Cloud Models

On this page