# Troubleshooting (/docs/troubleshooting)



Use this page as the main entry point when something is not working as expected on BullSequana AI.

The goal is simple:

1. identify the failure domain quickly
2. run a few high-signal checks
3. jump to the right section of the documentation

Start With Scope [#start-with-scope]

Before looking at a specific component, confirm what is actually failing:

* a single application or use case
* one platform service
* one namespace
* one environment
* the full platform rollout

That distinction avoids spending time in the wrong layer.

Route By Symptom [#route-by-symptom]

Access, login, or permission issues [#access-login-or-permission-issues]

Check these pages first:

* [Security Model](/docs/runtime/runtime-security-model)
* [Keycloak](/docs/runtime/components/keycloak)
* [OpenFGA](/docs/runtime/components/openfga)
* [Use local models via API](/docs/development/use-local-models-via-api)

Typical questions:

* is the user authenticated successfully
* is the token or API key valid
* is the right endpoint being used
* is authorization blocking access after authentication

Deployment, rollout, or upgrade failures [#deployment-rollout-or-upgrade-failures]

Check these pages first:

* [Deployment](/docs/deployment)
* [Configuration Model](/docs/deployment/configuration-model)
* [Deployment Sequence](/docs/deployment/playbooks/deployment-sequence)
* [Harbor Delivery](/docs/deployment/playbooks/harbor-delivery)

Typical questions:

* is the cluster configured with the right storage classes
* are registry credentials valid
* is Git pointing to the correct manifests branch and path
* is the DNS and certificate configuration complete

Runtime service failures [#runtime-service-failures]

Check these pages first:

* [Runtime](/docs/runtime)
* [Observability Model](/docs/runtime/runtime-observability-model)
* [Components](/docs/runtime/components/apisix)

Typical questions:

* is the failing service healthy
* are its dependencies healthy
* is ingress reaching the service
* is storage, database, or secret access available

Developer integration or application issues [#developer-integration-or-application-issues]

Check these pages first:

* [Development](/docs/development)
* [Use local models via API](/docs/development/use-local-models-via-api)
* [Deploy apps on the platform](/docs/development/deploy-apps-on-the-platform)
* [Use Cases](/docs/development/use-cases)

Typical questions:

* is the application using the stable `CoreAI API`
* is the model name valid in the current environment
* is the bearer token valid
* is the application deployed through the expected GitOps path

First Operational Checks [#first-operational-checks]

These checks are useful in almost every incident:

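```bash
kubectl get ns                                   # namespaces and their status
kubectl get pods -A                              # pod status across all namespaces
kubectl get ingress -A                           # ingress hosts and addresses
kubectl get events -A --sort-by=.lastTimestamp   # cluster events, most recent last
```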
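On a busy cluster the full listings above can be long. As a minimal sketch, the two helper functions below narrow them with standard `kubectl` field selectors. The function names `unhealthy_pods` and `warning_events` are illustrative and not part of the platform.

```bash
# List pods in any namespace whose phase is neither Running nor Succeeded.
# A pod in CrashLoopBackOff still reports phase Running, so this filter
# complements `kubectl get pods -A` rather than replacing it.
unhealthy_pods() {
  kubectl get pods -A \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}

# List only events of type Warning, most recent last.
warning_events() {
  kubectl get events -A --field-selector=type=Warning --sort-by=.lastTimestamp
}
```

Run `unhealthy_pods` first, then `warning_events`, and compare the namespaces that appear in both.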

If the problem is isolated to one namespace:

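```bash
kubectl get pods -n <namespace>                  # status, readiness, restart counts
kubectl describe pod <pod-name> -n <namespace>   # conditions, container states, events
kubectl logs <pod-name> -n <namespace>           # logs of the current container
```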
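When a container keeps restarting, the current log is often too short to explain the crash. The sketch below collects restart counts, the log of the previous container instance, and the events for one pod. It assumes the pod has restarted at least once (otherwise `kubectl logs --previous` returns an error), and `pod_triage` is a hypothetical helper name.

```bash
# Usage: pod_triage <namespace> <pod-name>
pod_triage() {
  # Restart count per container.
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
  # Last 100 log lines from the previous container instance.
  kubectl logs "$2" -n "$1" --previous --tail=100
  # Events that reference this pod, most recent last.
  kubectl get events -n "$1" \
    --field-selector=involvedObject.name="$2" --sort-by=.lastTimestamp
}
```

For pods with several containers, add `-c <container-name>` to the `kubectl logs` call to select the failing one.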

Argo CD Tips [#argo-cd-tips]

Argo CD is often the fastest way to understand whether the issue is in desired state, sync, or runtime behavior.

What to look at [#what-to-look-at]

For each affected application, check:

* `Health`
* `Sync Status`
* target namespace
* source path and revision
* recent sync operation state

Quick checks:

```bash
kubectl get applications -n argocd                     # all Argo CD applications with sync and health status
kubectl get application <app-name> -n argocd -o yaml   # full spec and status of one application
```
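The full YAML is verbose. As a sketch, the fields listed above can be read directly from the Application resource. The `argocd` namespace matches the commands on this page; the function names are illustrative.

```bash
# Usage: argocd_app_status <app-name>
# Prints sync status, health status, and the phase of the last sync
# operation (for example Succeeded or Failed), separated by tabs.
argocd_app_status() {
  kubectl get application "$1" -n argocd \
    -o jsonpath='{.status.sync.status}{"\t"}{.status.health.status}{"\t"}{.status.operationState.phase}{"\n"}'
}

# One line per application: name, sync status, health status, revision.
argocd_apps_overview() {
  kubectl get applications -n argocd \
    -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status,REVISION:.status.sync.revision
}
```

If another CRD in the cluster is also named `Application`, the fully qualified resource name `applications.argoproj.io` avoids ambiguity.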

Common Argo CD patterns [#common-argo-cd-patterns]

`OutOfSync` (sync status)

* the live cluster no longer matches Git
* the wrong branch, path, or values may be referenced
* a dependency may have changed without the application being updated

`Progressing` (health status)

* resources are still reconciling
* a dependency is not yet ready
* a hook or sync wave may still be running

`Degraded` (health status)

* the application synced, but some resources are unhealthy
* this usually means the problem has moved from GitOps into workload runtime behavior

Practical Argo CD questions [#practical-argo-cd-questions]

Ask these in order:

1. Is the application present in Argo CD?
2. Is it `Synced`?
3. Is it `Healthy`?
4. If not, which resource is failing?
5. Is the failing resource blocked by secret, database, ingress, or dependency readiness?
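The fourth question can usually be answered from the Application's `status.resources` list. The sketch below prints only the resources that are not `Synced` or that report a health status other than `Healthy`. It assumes Argo CD runs in the `argocd` namespace, as in the commands on this page, and `argocd_unhealthy_resources` is a hypothetical helper name.

```bash
# Usage: argocd_unhealthy_resources <app-name>
# Output columns: kind, namespace/name, sync status, health status.
argocd_unhealthy_resources() {
  kubectl get application "$1" -n argocd \
    -o jsonpath='{range .status.resources[*]}{.kind}{"\t"}{.namespace}{"/"}{.name}{"\t"}{.status}{"\t"}{.health.status}{"\n"}{end}' \
    | awk -F '\t' '$3 != "Synced" || ($4 != "" && $4 != "Healthy")'
}
```

Resources without a health assessment, such as ConfigMaps, have an empty fourth column and are listed only when they are out of sync.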

Sync-order issues [#sync-order-issues]

The platform uses hooks and sync waves in several places. If an application is present but not becoming healthy, check whether:

* prerequisite secrets exist
* the database or storage resource is ready
* the required CRDs are installed
* the application depends on another service that has not finished reconciling
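The first and third checks can be scripted. This is a minimal sketch, not a platform tool: `check_prereqs` is a hypothetical helper, and the secret and CRD names are whatever the affected application expects.

```bash
# Usage: check_prereqs <namespace> <secret-name> <crd-name>
check_prereqs() {
  # Does the prerequisite secret exist in the target namespace?
  if kubectl get secret "$2" -n "$1" >/dev/null 2>&1; then
    echo "secret $2: present"
  else
    echo "secret $2: MISSING"
  fi
  # Is the required CRD installed in the cluster?
  if kubectl get crd "$3" >/dev/null 2>&1; then
    echo "crd $3: present"
  else
    echo "crd $3: MISSING"
  fi
}
```

A `MISSING` result is also printed when `kubectl` fails for another reason, such as insufficient permissions, so rerun the underlying command without the redirection before drawing conclusions.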

Observability Checks [#observability-checks]

If the deployment exists but behavior is unclear, move to the observability stack:

* [Grafana](/docs/runtime/components/grafana)
* [Prometheus](/docs/runtime/components/prometheus)
* [Grafana Loki](/docs/runtime/components/grafana-loki)
* [Grafana Tempo](/docs/runtime/components/grafana-tempo)
* [Observability Model](/docs/runtime/runtime-observability-model)

Use:

* metrics to confirm health and saturation
* logs to identify failing components
* traces to understand cross-service behavior

Authentication And API Checks [#authentication-and-api-checks]

For CoreAI and developer-facing integrations, validate these points early:

* the application is calling the `CoreAI API`, not internal LiteLLM endpoints
* the model exists in `/v1/models`
* the token or `sk-bsq-...` API key is valid
* the failing request is authorized for the current user or key
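A quick way to test several of these points at once is to request the model list and look at the HTTP status code. The sketch below makes some assumptions: `COREAI_BASE_URL` and `COREAI_TOKEN` are placeholder variables introduced here, and the credential is assumed to be sent as an `Authorization: Bearer` header. Only the `/v1/models` path comes from this page.

```bash
# Assumed placeholders: set these to the CoreAI API base URL of the
# current environment and to a token or API key before calling.
# COREAI_BASE_URL=...
# COREAI_TOKEN=...

# Print only the HTTP status code returned by /v1/models.
coreai_models_status() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Bearer ${COREAI_TOKEN}" \
    "${COREAI_BASE_URL}/v1/models"
}

# Usage: coreai_has_model <model-name>
# Succeeds if the quoted model name appears anywhere in the response body.
coreai_has_model() {
  curl -sS -H "Authorization: Bearer ${COREAI_TOKEN}" \
    "${COREAI_BASE_URL}/v1/models" | grep -qF "\"$1\""
}
```

Under conventional HTTP semantics, `401` points to an invalid token or key, `403` to authorization blocking an authenticated caller, and `404` to a wrong endpoint. The exact codes the platform returns are not confirmed here.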

When To Escalate [#when-to-escalate]

Escalate beyond first-line troubleshooting when:

* multiple platform layers fail at once
* Argo CD, ingress, and observability all show inconsistent state
* the issue appears related to cluster infrastructure rather than platform configuration
* the problem is reproducible across environments with the same manifests

Related Pages [#related-pages]

<Cards>
  <Card title="Deployment" href="/docs/deployment" />

  <Card title="Runtime" href="/docs/runtime" />

  <Card title="Development" href="/docs/development" />

  <Card title="Security Model" href="/docs/runtime/runtime-security-model" />

  <Card title="Observability Model" href="/docs/runtime/runtime-observability-model" />
</Cards>
