Cloud Models
How CoreAI works with cloud-hosted models.
Cloud models are handled differently from locally hosted models.
They are usually not deployed through the Model Installer API into KubeAI. Instead, they are configured as provider-backed models in LiteLLM, which then exposes them through the shared CoreAI model gateway.
This is the right pattern for models coming from services such as:
- Azure OpenAI or Azure AI Foundry
- OpenRouter
- OpenAI
- Anthropic
- other providers supported by LiteLLM
When To Use This Path
Use the LiteLLM cloud-model path when:
- the model is hosted by an external provider
- you do not want to run the weights inside the cluster
- the platform only needs a routed provider connection, not a Kubernetes inference deployment
Use the Model Installer API instead when the model should be imported, registered, or served from platform-managed infrastructure.
Important Positioning
This page is about platform configuration for operators and administrators.
For application development, the preferred stable integration surface remains the CoreAI API rather than a direct connection to the internal LiteLLM endpoint.
What Actually Happens
The operational flow for cloud models is:
- A platform operator adds the provider-backed model in the LiteLLM admin UI.
- LiteLLM stores the provider configuration and credentials reference.
- CoreAI services route model calls through LiteLLM using the configured model alias.
- Applications should continue to use the higher-level CoreAI entry points exposed by the platform.
Before You Start
Make sure you have:
- the LiteLLM admin URL, usually `https://<litellm-host>/ui`
- administrator access to the LiteLLM UI, or the cluster's LiteLLM admin key in `sk-...` form
- the provider endpoint and API key
- the model alias that the consuming service expects
That last point matters for embeddings in particular. Some internal services expect a specific LiteLLM model name, so the alias you register may need to match an existing configuration rather than a name you invent on the spot.
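The checklist above can be turned into a small preflight check before opening the admin UI. This is a minimal sketch, not part of the platform: the `OnboardingInputs` class and `preflight_problems` function are hypothetical names, and requiring `https` for both URLs is an assumption about your environment.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class OnboardingInputs:
    """Hypothetical container for the values listed in the checklist."""
    litellm_admin_url: str   # usually https://<litellm-host>/ui
    admin_key: str           # cluster admin key in sk-... form, or "" when using a UI login
    provider_endpoint: str
    provider_api_key: str
    model_alias: str         # the alias the consuming service expects


def preflight_problems(inputs: OnboardingInputs) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is missing."""
    problems = []
    for name in ("litellm_admin_url", "provider_endpoint"):
        parsed = urlparse(getattr(inputs, name))
        # Assumption: both endpoints are served over https.
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{name} should be an https URL")
    if inputs.admin_key and not inputs.admin_key.startswith("sk-"):
        problems.append("admin_key is expected in sk-... form")
    if not inputs.provider_api_key:
        problems.append("provider_api_key is missing")
    if not inputs.model_alias.strip():
        problems.append("model_alias is missing")
    return problems
```

The check only inspects the shape of the inputs. It does not contact LiteLLM or the provider, so the connection test in the UI is still required.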
Add A Chat Model In LiteLLM
The simplest onboarding path is through the LiteLLM admin UI.
Step-by-Step
- Open the LiteLLM admin UI: `https://<litellm-host>/ui`
- Sign in with an admin account or the cluster's LiteLLM admin key.
- Open `Add model`.
- Select the provider.
- Enter a stable model alias.
- Fill in the provider endpoint, API version if required, and provider API key.
- Test the connection.
- Save the model.
Azure Chat Example
For an Azure-hosted chat model, use placeholder values like these:
| LiteLLM field | Example placeholder |
|---|---|
| Provider | Azure |
| Model name | <chat-model-alias> |
| API base | https://<azure-resource>.cognitiveservices.azure.com/ |
| API version | <azure-api-version> |
| Azure API key | <azure-api-key> |
Good model aliases are stable names that make sense to platform consumers.
Examples:
- `<team-chat-model>`
- `<provider>/<chat-model-alias>`
- `<product-default-chat-model>`
Do not treat the example values above as fixed requirements. Use the alias, provider endpoint, and version that match your own deployment.
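For operators who keep a record of what was entered in the UI, the Azure fields from the table can be written down as a LiteLLM-style model entry. The sketch below assumes the `model_name` / `litellm_params` layout used by LiteLLM's configuration file. The function name `build_azure_chat_model`, the `<azure-deployment-name>` placeholder, and the `AZURE_API_KEY` variable name are hypothetical. Whether your deployment resolves an `os.environ/...` key reference for UI-registered models is something to verify.

```python
import json


def build_azure_chat_model(alias, deployment, api_base, api_version, api_key_ref):
    """Map the UI fields from the table to a LiteLLM-style model entry."""
    return {
        "model_name": alias,  # the alias that consumers will request
        "litellm_params": {
            "model": f"azure/{deployment}",  # provider-prefixed model identifier
            "api_base": api_base,
            "api_version": api_version,
            "api_key": api_key_ref,  # a reference, never a literal key in shared docs
        },
    }


entry = build_azure_chat_model(
    alias="<chat-model-alias>",
    deployment="<azure-deployment-name>",  # hypothetical placeholder
    api_base="https://<azure-resource>.cognitiveservices.azure.com/",
    api_version="<azure-api-version>",
    api_key_ref="os.environ/AZURE_API_KEY",  # hypothetical variable name
)
print(json.dumps(entry, indent=2))
```

Keeping the alias separate from the provider-prefixed `model` value is what lets the alias stay stable if the underlying deployment changes.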
Add An Embedding Model In LiteLLM
Embedding models follow the same general flow, but there are two extra concerns:
- the model must be registered in `Embedding` mode
- the output vector size must match what downstream systems expect
Step-by-Step
- Open `Add model` in the LiteLLM admin UI.
- Select the provider.
- Enter the embedding model alias expected by the consuming workflow.
- Set the model mode to `Embedding`.
- Fill in the provider endpoint, version, and credentials.
- In advanced settings, set the embedding output dimension expected by the workload.
- Test the connection.
- Save the model.
Azure Embedding Example
For an Azure-hosted embedding model, the fields will often look similar to the chat-model setup:
| LiteLLM field | Example placeholder |
|---|---|
| Provider | Azure |
| Model name | <embedding-model-alias> |
| Mode | Embedding |
| API base | https://<azure-resource>.cognitiveservices.azure.com/ |
| API version | <azure-api-version> |
| Azure API key | <azure-api-key> |
Example advanced settings:
{
  "output_vector_size": 1536
}

Use the correct dimension for the actual embedding model you are registering.
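As a rough illustration, the embedding mode and the dimension setting can be captured together in one record so that neither is forgotten. This sketch assumes that the UI's mode and advanced settings correspond to a `model_info` section with `mode` and `output_vector_size` keys, which should be checked against your LiteLLM version. The helper name `build_embedding_model` is hypothetical.

```python
def build_embedding_model(alias, provider_model, api_base, api_version,
                          api_key_ref, output_vector_size):
    """Build a LiteLLM-style entry for an embedding model (assumed layout)."""
    # Reject missing or nonsensical dimensions early instead of at query time.
    if (isinstance(output_vector_size, bool)
            or not isinstance(output_vector_size, int)
            or output_vector_size <= 0):
        raise ValueError("output_vector_size must be a positive integer")
    return {
        "model_name": alias,
        "litellm_params": {
            "model": provider_model,
            "api_base": api_base,
            "api_version": api_version,
            "api_key": api_key_ref,
        },
        "model_info": {
            "mode": "embedding",  # assumed counterpart of the UI's Embedding mode
            "output_vector_size": output_vector_size,
        },
    }
```

Passing the dimension as a required argument makes it harder to register an embedding model without deciding on a vector size.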
Note About RAG Pipelines
If a retrieval workflow such as rag-pipeline already expects a specific embedding alias, register the model under that exact LiteLLM name or update the consuming configuration to match.
For example, a deployment may already expect something like:
`<provider>/<embedding-model-name>`
The important part is consistency between:
- the LiteLLM alias
- the application or pipeline configuration
- the embedding dimension expected by the vector store and retrieval flow
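The three items above can be cross-checked mechanically. The following is a minimal sketch under stated assumptions: `registered` is a hand-maintained snapshot mapping each LiteLLM embedding alias to its output dimension, and `check_embedding_consistency` is a hypothetical helper, not a platform API.

```python
def check_embedding_consistency(registered, pipeline_alias, vector_store_dim):
    """Compare the pipeline's expected alias and dimension with what is registered.

    registered: dict mapping LiteLLM alias -> output vector size (assumed snapshot)
    pipeline_alias: alias named in the application or pipeline configuration
    vector_store_dim: dimension the vector store and retrieval flow expect
    """
    problems = []
    if pipeline_alias not in registered:
        problems.append(
            f"pipeline expects alias {pipeline_alias!r}, which is not registered"
        )
    elif registered[pipeline_alias] != vector_store_dim:
        problems.append(
            f"alias {pipeline_alias!r} produces {registered[pipeline_alias]} dimensions, "
            f"but the vector store expects {vector_store_dim}"
        )
    return problems
```

An empty result means the alias and dimension line up. Any message returned points at the side that needs to change, either the LiteLLM registration or the consuming configuration.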
After The Model Is Added
Once the cloud model is registered in LiteLLM:
- it can be routed through the shared LLM gateway
- CoreAI services can target it by the configured alias
- the Portal or other internal services can use it if they are configured with that alias
For application developers, the recommended pattern remains:
Application -> CoreAI API -> internal model routing
not:
Application -> LiteLLM directly
Practical Rules For Naming
Choose model aliases deliberately.
- Use stable aliases rather than provider-generated deployment names where possible.
- If an internal workflow already expects a specific alias, preserve that alias.
- Keep chat and embedding aliases clearly distinct.
- Avoid changing aliases casually after applications or pipelines depend on them.
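These rules can be partly enforced with a small lint step before an alias is registered. The sketch below is illustrative only: the character pattern and the suggestion that embedding aliases contain `embed` are assumed house conventions, not platform or LiteLLM requirements, and `alias_warnings` is a hypothetical helper.

```python
import re

# Assumed convention: lowercase words joined by -, _ or ., with at most one provider/ prefix.
ALIAS_PATTERN = re.compile(
    r"^[a-z0-9]+(?:[-_.][a-z0-9]+)*(?:/[a-z0-9]+(?:[-_.][a-z0-9]+)*)?$"
)


def alias_warnings(new_alias, mode, existing):
    """Return naming warnings for a proposed alias.

    mode: "chat" or "embedding"
    existing: dict mapping already-registered alias -> mode
    """
    warnings = []
    if not ALIAS_PATTERN.match(new_alias):
        warnings.append("alias does not follow the lowercase naming convention")
    # Reusing an alias across modes blurs the chat/embedding distinction.
    if new_alias in existing and existing[new_alias] != mode:
        warnings.append(f"alias is already registered in {existing[new_alias]} mode")
    if mode == "embedding" and "embed" not in new_alias:
        warnings.append("embedding aliases are easier to tell apart when they contain 'embed'")
    return warnings
```

If an internal workflow already expects a specific alias, that requirement takes priority over any warning produced here.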
Same Principle For Other Providers
The Azure examples above are only one common case.
The same operational pattern applies to providers such as OpenRouter and similar services:
- choose the provider in LiteLLM
- define a stable alias
- enter the provider credentials and endpoint settings
- set the correct mode for the workload
- test the connection
- save and use the alias consistently
Always prefer provider-agnostic naming in your docs and operational runbooks unless a particular consuming workflow requires an exact provider-prefixed alias.
Common Mistakes
- registering an embedding model as a chat model
- forgetting to set the embedding dimension in advanced settings
- using an alias that does not match the consuming application or pipeline
- copying environment-specific URLs, keys, or versions into shared documentation
- exposing LiteLLM directly as the long-term developer integration point