Telemetry collection in COS HA charms using the Coordinator/Workers pattern

Many COS HA charms, such as Tempo, Loki, and Mimir, follow a “Coordinator/Workers” pattern. This pattern is described in detail here. This post extends that description, focusing on how telemetry is collected from Coordinator/Worker charms.

Metrics

  • Metrics configuration is provided by the Coordinator to a metrics scraper via the metrics-endpoint relation endpoint, using the prometheus_scrape interface. This configuration covers all workloads (the Coordinator’s nginx and all Worker workloads).
  • Coordinator:
    • uses MetricsEndpointProvider to send alert rules to the scraper
    • uses MetricsEndpointProvider to send scrape targets for all workloads in the coordinated-worker solution (nginx and all worker workloads)
    • nginx workload’s metrics are scraped by the scraper
  • Worker:
    • no metrics configuration occurs in the Worker charm
    • workload metrics are scraped by the scraper via GET requests sent directly to every Worker unit
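
As a rough illustration of the bullets above, the Coordinator can be thought of as assembling one list of scrape jobs that covers both its own nginx and every Worker unit. The sketch below is not the library code: build_scrape_jobs, the hostnames, and both port numbers are hypothetical, and only the job layout (job_name plus static_configs with targets) follows the usual Prometheus scrape configuration shape.

```python
from typing import Dict, List

# Hypothetical ports, chosen only for illustration.
NGINX_EXPORTER_PORT = 9113
WORKER_METRICS_PORT = 8080


def build_scrape_jobs(coordinator_host: str, worker_hosts: List[str]) -> List[Dict]:
    """Return scrape jobs for the nginx workload and for all Worker units.

    Hypothetical helper: the real Coordinator derives its targets from the
    Cluster relation, which is not modelled here.
    """
    nginx_job = {
        "job_name": "nginx",
        "static_configs": [
            {"targets": [f"{coordinator_host}:{NGINX_EXPORTER_PORT}"]}
        ],
    }
    worker_job = {
        "job_name": "worker",
        "static_configs": [
            # One target per Worker unit: the scraper contacts each unit directly.
            {"targets": [f"{host}:{WORKER_METRICS_PORT}" for host in sorted(worker_hosts)]}
        ],
    }
    return [nginx_job, worker_job]


jobs = build_scrape_jobs(
    "coordinator-0.example", ["worker-1.example", "worker-0.example"]
)
```

A list of this shape is what the Coordinator would hand to MetricsEndpointProvider, which is why the Worker charm itself needs no metrics configuration.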

Relation topology:

Data flow:

Logs

  • Logging configuration is provided to the Coordinator via the logging relation endpoint using the loki_push_api interface
  • Coordinator:
    • uses LogForwarder to configure pebble to send all coordinator workload (nginx) logs to the logging store
    • extracts the logging configuration by reading the logging relation data directly (using loki_endpoints_by_unit rather than a Loki charm library) and injects it into the Cluster relation
  • Worker:
    • uses ManualLogForwarder (a local version similar to LogForwarder) to configure pebble to send all Worker workload logs to the logging store, using configuration from the Cluster relation
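
The hand-off described above can be sketched with plain dictionaries standing in for relation databags. This is a minimal sketch under stated assumptions: the function below borrows the name loki_endpoints_by_unit from the text, but its body is a stand-in, and the "endpoint" key holding a JSON object with a "url" field, as well as the "loki_endpoints" key in the Cluster databag, are assumptions rather than a confirmed schema.

```python
import json
from typing import Dict


def loki_endpoints_by_unit(logging_relation_data: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each remote logging unit to its push URL.

    Assumes every remote unit databag may carry an "endpoint" key whose
    value is a JSON object with a "url" field (an assumption for this sketch).
    """
    endpoints: Dict[str, str] = {}
    for unit_name, databag in logging_relation_data.items():
        raw = databag.get("endpoint")
        if not raw:
            continue  # this unit has not published an endpoint yet
        url = json.loads(raw).get("url")
        if url:
            endpoints[unit_name] = url
    return endpoints


def publish_to_cluster(cluster_databag: Dict[str, str], endpoints: Dict[str, str]) -> None:
    """Coordinator side: inject the endpoints into the Cluster databag (hypothetical key)."""
    cluster_databag["loki_endpoints"] = json.dumps(endpoints, sort_keys=True)


def read_from_cluster(cluster_databag: Dict[str, str]) -> Dict[str, str]:
    """Worker side: recover the endpoints from the Cluster databag."""
    return json.loads(cluster_databag.get("loki_endpoints", "{}"))


logging_relation_data = {
    "loki/0": {"endpoint": json.dumps({"url": "http://loki-0:3100/loki/api/v1/push"})},
    "loki/1": {},  # not ready yet, so it is skipped
}
cluster_databag: Dict[str, str] = {}
publish_to_cluster(cluster_databag, loki_endpoints_by_unit(logging_relation_data))
worker_view = read_from_cluster(cluster_databag)
```

On the Worker side, a mapping like worker_view is the kind of input ManualLogForwarder would need in order to point pebble log forwarding at the logging store.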

Relation topology:

Data flow:

Traces

  • Tracing configuration is provided to the Coordinator via the charm-tracing and workload-tracing relation endpoints, both using the tracing interface
    • charm-tracing in the Coordinator is used to configure tracing in all coordinated-worker charms (Coordinator and all related Workers)
  • Coordinator:
    • gets tracing configuration from charm-tracing and workload-tracing by instantiating a TracingEndpointRequirer for each
    • sends the protocols required for charm-tracing and workload-tracing back to the related tracing provider using the instantiated TracingEndpointRequirer objects
    • uses the charm-tracing configuration to:
      • configure the Coordinator charm to emit traces using ops.tracing machinery
      • forward charm-tracing configuration to the Workers via the Cluster databag
    • uses the workload-tracing configuration to:
      • forward workload-tracing configuration to the Workers via the Cluster relation
  • Worker:
    • gets tracing configuration from the Cluster relation
    • configures the workload to emit traces to the provided tracing store
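
The forwarding step above can be sketched as a small serialisable structure that the Coordinator writes to the Cluster relation and each Worker reads back. Everything named here is hypothetical (ClusterTracingData, its two field names, pick_endpoint, and the example URLs); the protocol names otlp_http and otlp_grpc are used only as plausible examples of what a tracing provider might offer.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterTracingData:
    """Tracing configuration carried over the Cluster relation (hypothetical schema)."""

    charm_tracing_receivers: Dict[str, str] = field(default_factory=dict)
    workload_tracing_receivers: Dict[str, str] = field(default_factory=dict)

    def dump(self) -> Dict[str, str]:
        """Coordinator side: serialise to string values, as relation databags require."""
        return {
            "charm_tracing_receivers": json.dumps(self.charm_tracing_receivers, sort_keys=True),
            "workload_tracing_receivers": json.dumps(self.workload_tracing_receivers, sort_keys=True),
        }

    @classmethod
    def load(cls, databag: Dict[str, str]) -> "ClusterTracingData":
        """Worker side: rebuild the structure, tolerating missing keys."""
        return cls(
            charm_tracing_receivers=json.loads(databag.get("charm_tracing_receivers", "{}")),
            workload_tracing_receivers=json.loads(databag.get("workload_tracing_receivers", "{}")),
        )


def pick_endpoint(receivers: Dict[str, str], preferred: List[str]) -> Optional[str]:
    """Return the endpoint of the first preferred protocol that is available."""
    for protocol in preferred:
        if protocol in receivers:
            return receivers[protocol]
    return None


sent = ClusterTracingData(
    charm_tracing_receivers={"otlp_http": "http://tempo.example:4318"},
    workload_tracing_receivers={"otlp_grpc": "tempo.example:4317"},
)
received = ClusterTracingData.load(sent.dump())
workload_endpoint = pick_endpoint(received.workload_tracing_receivers, ["otlp_http", "otlp_grpc"])
```

Keeping the charm-tracing and workload-tracing receivers in separate fields mirrors the two relation endpoints on the Coordinator, so a Worker can configure charm traces and workload traces independently.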

Relation topology:

Data flow:
