Observability Model
How Runtime handles metrics, logs, traces, and operational visibility.
Runtime observability is designed as a shared telemetry pipeline rather than a disconnected set of monitoring tools.
Signal Types
Runtime collects and exposes three main signal categories:
| Signal | Main tools | Primary purpose |
|---|---|---|
| Metrics | Prometheus, Grafana | health, performance, alerting, capacity tracking |
| Logs | Grafana Alloy, Loki, Grafana | operational debugging and incident investigation |
| Traces | Tempo, Grafana | distributed request and workflow visibility |
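To make the three categories concrete, the sketch below models one record of each kind as a plain Python dataclass and maps it to the backend named in the table. The field names and sample values (`runtime-api`, `http_requests_total`, the IDs) are illustrative assumptions, not Runtime's actual schema. They only mirror the general shape of a metric sample, a log entry, and a trace span.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricSample:
    """A numeric measurement at a point in time."""
    name: str
    labels: dict
    value: float
    timestamp: float  # seconds since the Unix epoch


@dataclass
class LogEntry:
    """A timestamped line of text plus identifying labels."""
    labels: dict
    timestamp: float
    line: str


@dataclass
class Span:
    """One timed operation inside a distributed trace."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span of a trace
    name: str
    start: float
    duration_ms: float


# Storage backend per signal type, taken from the table above.
BACKEND = {MetricSample: "Prometheus", LogEntry: "Loki", Span: "Tempo"}


def backend_for(record) -> str:
    """Return the name of the backend that stores this kind of record."""
    return BACKEND[type(record)]


# Hypothetical sample records, for illustration only.
sample = MetricSample("http_requests_total", {"service": "runtime-api"}, 1042.0, 1700000000.0)
entry = LogEntry({"service": "runtime-api"}, 1700000000.2, "upstream timeout after 5s")
span = Span("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", None,
            "POST /jobs", 1700000000.1, 5012.0)

for record in (sample, entry, span):
    print(type(record).__name__, "->", backend_for(record))
```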
Stack Roles
- Prometheus collects and stores metrics from Runtime services.
- Grafana provides dashboards and a common exploration interface for metrics, logs, and traces. The current live dashboard inventory is documented on the Grafana component page.
- Grafana Alloy collects and forwards telemetry from workloads and nodes.
- Loki stores and queries logs.
- Tempo stores distributed traces and serves them for querying.
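Each storage backend in this list also exposes its own HTTP query API, which can be useful for scripts that bypass Grafana. The sketch below only builds request URLs for the standard Prometheus, Loki, and Tempo query endpoints. The host names are hypothetical placeholders, the ports are the upstream defaults, and whether Runtime exposes these endpoints directly is an assumption to check against your deployment.

```python
from urllib.parse import urlencode

# Hypothetical addresses; substitute whatever your deployment actually uses.
PROMETHEUS = "http://prometheus.runtime.example:9090"
LOKI = "http://loki.runtime.example:3100"
TEMPO = "http://tempo.runtime.example:3200"


def prometheus_query_url(promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{PROMETHEUS}/api/v1/query?{urlencode({'query': promql})}"


def loki_query_range_url(logql: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Build a range-query URL for the Loki HTTP API (Unix nanosecond timestamps)."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{LOKI}/loki/api/v1/query_range?{urlencode(params)}"


def tempo_trace_url(trace_id: str) -> str:
    """Build a URL that fetches a single trace by ID from the Tempo HTTP API."""
    return f"{TEMPO}/api/traces/{trace_id}"


if __name__ == "__main__":
    print(prometheus_query_url('up{job="runtime"}'))
    print(loki_query_range_url('{service="runtime-api"}', 1700000000000000000, 1700000300000000000))
    print(tempo_trace_url("4bf92f3577b34da6a3ce929d0e0e4736"))
```

In day-to-day use, operators would normally run the same queries through Grafana's data sources rather than calling the backends directly.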
Operating Pattern
The three signals are meant to be read together, each answering a different question:
- Metrics show service health and behavior over time.
- Logs explain what happened inside components.
- Traces show how requests move across services.
Together, these signals make it easier to understand both infrastructure problems and cross-service platform issues.
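One common way to tie the signals together is to stamp every log line with the ID of the trace it belongs to, so an operator can move from a log entry to the matching trace. The sketch below shows that idea with Python's standard `logging` module. The service names, the JSON field names, and the locally generated trace ID are all assumptions for illustration. In a real service the ID would come from the tracing SDK, and Runtime's actual log schema may differ.

```python
import json
import logging
import secrets


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so fields stay machine-readable."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            # 'service' and 'trace_id' are supplied by the caller through `extra`.
            "service": getattr(record, "service", "unknown"),
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        })


def new_trace_id() -> str:
    """Generate a random 16-byte trace ID as 32 hex characters."""
    return secrets.token_hex(16)


def make_logger(name: str = "runtime.example") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = make_logger()
    trace_id = new_trace_id()
    # Two hypothetical services log under the same trace ID, so their lines
    # can be found together and matched to one trace.
    log.info("request accepted", extra={"service": "gateway", "trace_id": trace_id})
    log.info("job started", extra={"service": "worker", "trace_id": trace_id})
```

With logs shaped like this, a spike on a metrics dashboard can lead to the relevant log lines, and the `trace_id` field in those lines identifies the trace to open next.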
Why This Matters
Because Runtime is the common base of the platform, its telemetry model also becomes the common telemetry model for CoreAI and ProAI. That gives operators one place to inspect the platform instead of separate monitoring islands for each layer.