Telemetry collection in COS HA charms using the Coordinator/Workers pattern

ca-scribner · 22 August 2025 14:43

Many COS HA charms, such as Tempo, Loki, and Mimir, follow a “Coordinator/Workers” pattern. This pattern is described in detail here. The following post extends that description, focusing on how telemetry is collected from Coordinator/Worker charms.

Metrics

Metrics configuration is provided by the Coordinator to a metrics scraper via the metrics-endpoint relation endpoint on the prometheus_scrape interface. The metrics configuration provided covers all workloads (the Coordinator’s nginx and all Worker workloads).
Coordinator:
- uses MetricsEndpointProvider to send alert rules to the scraper
- uses MetricsEndpointProvider to send scrape targets for all workloads in the coordinated-worker solution (nginx and all worker workloads)
- nginx workload’s metrics are scraped by the scraper
Worker:
- no metrics configuration occurs in the Worker charm
- workload metrics are scraped by the scraper via GET requests sent directly to every Worker unit
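
As a rough illustration of the split above, the Coordinator can be thought of as assembling one list of scrape jobs that covers nginx and every Worker unit, which it then hands to MetricsEndpointProvider. The sketch below only builds that list in the Prometheus static_configs shape. The function name build_scrape_jobs, the job names, the hostnames, and the ports are all hypothetical, and the real Coordinator code may build its jobs differently.

```python
from typing import Dict, List


def build_scrape_jobs(
    nginx_target: str,
    worker_targets: List[str],
    metrics_path: str = "/metrics",
) -> List[Dict]:
    """Return one scrape job for nginx plus one job per Worker unit.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    # The Coordinator's own workload (nginx) is always included.
    jobs = [
        {
            "job_name": "nginx",
            "metrics_path": metrics_path,
            "static_configs": [{"targets": [nginx_target]}],
        }
    ]
    # Each Worker unit is listed individually so the scraper reaches it directly.
    for index, target in enumerate(sorted(worker_targets)):
        jobs.append(
            {
                "job_name": f"worker-{index}",
                "metrics_path": metrics_path,
                "static_configs": [{"targets": [target]}],
            }
        )
    return jobs


# Example addresses are made up.
jobs = build_scrape_jobs(
    "coordinator-0.example:9113",
    ["worker-1.example:3100", "worker-0.example:3100"],
)
for job in jobs:
    print(job["job_name"], job["static_configs"][0]["targets"])
```

Listing every Worker unit as its own target reflects the point above: the Worker charm configures nothing, yet the scraper still contacts each Worker unit directly.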

Relation topology:

Data flow:

Logs

Logging configuration is provided to the Coordinator via the logging relation endpoint using the loki_push_api interface.
Coordinator:
- uses LogForwarder to configure pebble to send all coordinator workload (nginx) logs to the logging store
- extracts logging configuration by reading the logging relation data directly (using loki_endpoints_by_unit, rather than a library from Loki) and injects it into the Cluster relation
Worker:
- uses ManualLogForwarder (a local version similar to LogForwarder) to configure pebble to send all Worker workload logs to the logging store, using configuration from the Cluster relation
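
The hand-off described above, where the Coordinator copies the Loki endpoints into the Cluster relation and the Worker reads them back, can be sketched as a simple serialise/deserialise pair. This is only an illustration: the databag key loki_endpoints, the function names, and the assumed shape of the endpoint mapping (unit name to push URL) are hypothetical. The one firm constraint is that Juju relation data values are strings, hence the JSON encoding.

```python
import json
from typing import Dict

# Hypothetical key name used in the Cluster relation databag.
LOKI_ENDPOINTS_KEY = "loki_endpoints"


def publish_loki_endpoints(endpoints_by_unit: Dict[str, str]) -> Dict[str, str]:
    """Coordinator side: encode the endpoints as a string-valued databag entry."""
    return {LOKI_ENDPOINTS_KEY: json.dumps(endpoints_by_unit, sort_keys=True)}


def read_loki_endpoints(cluster_databag: Dict[str, str]) -> Dict[str, str]:
    """Worker side: decode the endpoints, or return {} if none were published."""
    raw = cluster_databag.get(LOKI_ENDPOINTS_KEY)
    return json.loads(raw) if raw else {}


# Example data is made up.
endpoints = {"loki/0": "http://loki-0.example:3100/loki/api/v1/push"}
databag = publish_loki_endpoints(endpoints)
print(read_loki_endpoints(databag))
```

Returning an empty mapping when the key is absent lets a Worker start before the logging relation exists and simply forward nothing until endpoints appear.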

Relation topology:

Data flow:

Traces

Tracing configuration is provided to the Coordinator via the charm-tracing and workload-tracing relation endpoints, both using the tracing interface.
- charm-tracing in the Coordinator is used to configure tracing in all coordinated-worker charms (Coordinator and all related Workers)
Coordinator:
- gets tracing configuration from charm-tracing and workload-tracing by instantiating a TracingEndpointRequirer for each
- sends the protocols required for charm-tracing and workload-tracing back to the related tracing provider using the instantiated TracingEndpointRequirer objects
- uses the charm-tracing configuration to:
  - configure the Coordinator charm to emit traces using ops.tracing machinery
  - forward charm-tracing configuration to the Workers via the Cluster databag
- uses the workload-tracing configuration to:
  - forward workload-tracing configuration to the Workers via the Cluster relation
Worker:
- gets tracing configuration from the Cluster relation
- configures the workload to emit traces to the provided tracing store
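
One way to picture the flow above is a small configuration object that the Coordinator writes to the Cluster relation and each Worker reads back, holding the charm-tracing and workload-tracing endpoints separately. Everything in this sketch is an assumption made for illustration: the class ClusterTracingConfig, its field names, the pick_endpoint helper, the protocol names, and the example URLs. It is not the actual schema of the Cluster relation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional, Sequence


@dataclass
class ClusterTracingConfig:
    """Hypothetical container for tracing endpoints, keyed by protocol name."""

    charm_tracing_receivers: Dict[str, str] = field(default_factory=dict)
    workload_tracing_receivers: Dict[str, str] = field(default_factory=dict)

    def to_databag(self) -> Dict[str, str]:
        """Coordinator side: relation data values must be strings, so use JSON."""
        return {key: json.dumps(value, sort_keys=True) for key, value in asdict(self).items()}

    @classmethod
    def from_databag(cls, databag: Dict[str, str]) -> "ClusterTracingConfig":
        """Worker side: missing keys decode to empty mappings."""
        return cls(
            charm_tracing_receivers=json.loads(databag.get("charm_tracing_receivers", "{}")),
            workload_tracing_receivers=json.loads(databag.get("workload_tracing_receivers", "{}")),
        )


def pick_endpoint(
    receivers: Dict[str, str],
    preferred: Sequence[str] = ("otlp_http", "otlp_grpc"),
) -> Optional[str]:
    """Return the first available endpoint in order of preference, else None."""
    for protocol in preferred:
        if protocol in receivers:
            return receivers[protocol]
    return None


# Example endpoints are made up.
config = ClusterTracingConfig(
    charm_tracing_receivers={"otlp_http": "http://tempo.example:4318"},
    workload_tracing_receivers={"otlp_grpc": "tempo.example:4317"},
)
received = ClusterTracingConfig.from_databag(config.to_databag())
print(pick_endpoint(received.workload_tracing_receivers))
```

Keeping the two sets of endpoints in separate fields mirrors the two relation endpoints on the Coordinator, so a Worker can configure charm tracing and workload tracing independently.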

Relation topology:

Data flow: