Observability Model

Runtime observability is designed as a shared telemetry pipeline rather than a disconnected set of monitoring tools.

Signal Types

Runtime collects and exposes three main signal categories:

Signal	Main tools	Primary purpose
Metrics	Prometheus, Grafana	health, performance, alerting, capacity tracking
Logs	Grafana Alloy, Loki, Grafana	operational debugging and incident investigation
Traces	Tempo, Grafana	distributed request and workflow visibility

Stack Roles

Prometheus collects and stores metrics from Runtime services.
Grafana provides dashboards and a common exploration interface. The current live dashboard inventory is documented on the Grafana component page.
Grafana Alloy collects and forwards telemetry from workloads and nodes.
Loki stores logs and serves log queries.
Tempo stores distributed traces and serves trace lookups; traces are explored through Grafana.
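
As a rough illustration of how these roles divide up, the sketch below builds the query URL that a client might send to each backend. The endpoint paths are the public HTTP APIs of Prometheus, Loki, and Tempo. The hostnames, ports, and helper names are assumptions made for this example and do not describe the actual Runtime deployment.

```python
from urllib.parse import urlencode

# Hypothetical base URLs. The ports are the conventional defaults for each
# tool; the hostnames are placeholders, not Runtime's real service names.
BACKENDS = {
    "metrics": "http://prometheus.example:9090",
    "logs": "http://loki.example:3100",
    "traces": "http://tempo.example:3200",
}


def metrics_query_url(promql: str) -> str:
    """Build a Prometheus instant-query URL for a PromQL expression."""
    return f"{BACKENDS['metrics']}/api/v1/query?{urlencode({'query': promql})}"


def logs_query_url(logql: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Build a Loki range-query URL; start and end are Unix nanosecond timestamps."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{BACKENDS['logs']}/loki/api/v1/query_range?{urlencode(params)}"


def trace_url(trace_id: str) -> str:
    """Build a Tempo URL that fetches a single trace by its ID."""
    return f"{BACKENDS['traces']}/api/traces/{trace_id}"


if __name__ == "__main__":
    print(metrics_query_url('up{job="runtime"}'))
    print(logs_query_url('{app="runtime"} |= "error"', 0, 60_000_000_000))
    print(trace_url("0123456789abcdef"))
```

In day-to-day use, Grafana issues these requests on the operator's behalf through its data sources, so direct calls like these are mainly useful for scripting or debugging.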

Operating Pattern

The observability model is intentionally integrated:

metrics show service health and behavior over time
logs explain what happened inside components
traces show how requests move across services

Together, these signals make it easier to understand both infrastructure problems and cross-service platform issues.
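
One common way to combine the signals is to join logs and trace spans on a shared trace ID. The following is a minimal in-memory sketch of that idea, assuming each log line carries a `trace_id` field. The record types, field names, and sample data are hypothetical and are not taken from Runtime itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogLine:
    service: str
    trace_id: str
    message: str


@dataclass
class Span:
    service: str
    trace_id: str
    duration_ms: float


def correlate(logs, spans):
    """Group log lines and spans by trace ID so one request can be read end to end."""
    view = defaultdict(lambda: {"logs": [], "spans": []})
    for span in spans:
        view[span.trace_id]["spans"].append(span)
    for line in logs:
        view[line.trace_id]["logs"].append(line)
    return dict(view)


def slowest_trace(view):
    """Return the trace ID with the largest summed span duration."""
    return max(view, key=lambda tid: sum(s.duration_ms for s in view[tid]["spans"]))


if __name__ == "__main__":
    # Illustrative sample data only.
    spans = [
        Span("gateway", "t1", 12.0),
        Span("scheduler", "t1", 340.0),
        Span("gateway", "t2", 9.0),
    ]
    logs = [
        LogLine("scheduler", "t1", "queue wait exceeded threshold"),
        LogLine("gateway", "t2", "request completed"),
    ]
    view = correlate(logs, spans)
    worst = slowest_trace(view)
    print(worst, [line.message for line in view[worst]["logs"]])
```

Here a metric alert would point at the slow service, the trace shows which hop took the time, and the log lines attached to that trace explain why.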

Because Runtime is the common base of the platform, its telemetry model also becomes the common telemetry model for CoreAI and ProAI. That gives operators one place to inspect the platform instead of separate monitoring islands for each layer.
