Observability Model

How Runtime handles metrics, logs, traces, and operational visibility.

Runtime observability is designed as a shared telemetry pipeline rather than a disconnected set of monitoring tools.

Signal Types

Runtime collects and exposes three main signal categories:

Signal    Main tools                     Primary purpose
Metrics   Prometheus, Grafana            health, performance, alerting, capacity tracking
Logs      Grafana Alloy, Loki, Grafana   operational debugging and incident investigation
Traces    Tempo, Grafana                 distributed request and workflow visibility
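
The signal-to-tool mapping can also be written down as a small data structure, which is sometimes handy for scripts or checks that need to know which tools back which signal. The following is only an illustrative sketch: the `Signal` class, the `SIGNALS` tuple, and the `signals_using` helper are hypothetical names, not part of Runtime.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One telemetry signal category and the tools that serve it."""
    name: str
    tools: tuple[str, ...]
    purpose: str


# Mirrors the signal categories listed above (hypothetical representation).
SIGNALS = (
    Signal("metrics", ("Prometheus", "Grafana"),
           "health, performance, alerting, capacity tracking"),
    Signal("logs", ("Grafana Alloy", "Loki", "Grafana"),
           "operational debugging and incident investigation"),
    Signal("traces", ("Tempo", "Grafana"),
           "distributed request and workflow visibility"),
)


def signals_using(tool: str) -> list[str]:
    """Return the names of all signals that list the given tool."""
    return [s.name for s in SIGNALS if tool in s.tools]
```

For example, `signals_using("Grafana")` returns all three signal names, which reflects Grafana's role as the common interface across metrics, logs, and traces.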

Stack Roles

  • Prometheus collects and stores metrics from Runtime services.
  • Grafana provides dashboards and a common exploration interface. The current live dashboard inventory is documented on the Grafana component page.
  • Grafana Alloy collects and forwards telemetry from workloads and nodes.
  • Loki stores logs and serves log queries.
  • Tempo stores distributed traces and serves them for lookup and search.
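
As a rough illustration of how these roles look from a client's point of view, the sketch below builds request URLs for the HTTP query endpoints that Prometheus, Loki, and Tempo expose. The base URLs and helper function names are assumptions made for this example; the actual hostnames and ports in a Runtime deployment may differ, and in practice most queries go through Grafana rather than directly to each backend.

```python
from urllib.parse import quote, urlencode

# Hypothetical in-cluster addresses (assumptions, not Runtime defaults).
PROMETHEUS_URL = "http://prometheus.example.internal:9090"
LOKI_URL = "http://loki.example.internal:3100"
TEMPO_URL = "http://tempo.example.internal:3200"


def prometheus_instant_query(promql: str) -> str:
    """URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': promql})}"


def loki_range_query(logql: str, start_ns: int, end_ns: int,
                     limit: int = 100) -> str:
    """URL for a Loki range query (GET /loki/api/v1/query_range).

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{LOKI_URL}/loki/api/v1/query_range?{urlencode(params)}"


def tempo_trace_lookup(trace_id: str) -> str:
    """URL for fetching a single trace from Tempo (GET /api/traces/<id>)."""
    return f"{TEMPO_URL}/api/traces/{quote(trace_id, safe='')}"
```

Each helper only constructs a URL; sending the request, authentication, and error handling are left out to keep the sketch short.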

Operating Pattern

The observability model is intentionally integrated:

  • metrics show service health and behavior over time
  • logs explain what happened inside components
  • traces show how requests move across services

Together, these signals make it easier to understand both infrastructure problems and cross-service platform issues.
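
One common way to connect logs to traces is a shared trace identifier carried in each log record. The sketch below assumes JSON-formatted log lines with hypothetical `trace_id` and `msg` fields; Runtime's actual log format is not specified here, so treat the field names as placeholders.

```python
import json
from collections import defaultdict


def group_logs_by_trace(lines):
    """Group log messages by trace ID.

    Lines that are not JSON objects, or that carry no trace_id,
    are skipped rather than raising an error.
    """
    grouped = defaultdict(list)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not structured JSON; ignore in this sketch
        if not isinstance(record, dict):
            continue
        trace_id = record.get("trace_id")  # hypothetical field name
        if trace_id:
            grouped[trace_id].append(record.get("msg", ""))
    return dict(grouped)
```

With logs grouped this way, an operator who finds a slow trace can pull up the messages each service emitted while handling that same request.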

Why This Matters

Because Runtime is the common base of the platform, its telemetry model also becomes the common telemetry model for CoreAI and ProAI. That gives operators one place to inspect the platform instead of separate monitoring islands for each layer.
