Tempo HA docs - Explanation: correlating traces, metrics, logs

Certain advanced data-correlation features of Tempo, such as the Service Map (aka Service Graph) and Trace-to-Logs, require the grafana admin to configure the Tempo datasource and point it to a Prometheus or Loki datasource, respectively.

The Tempo datasource in COS’ grafana, however, is managed by the tempo-coordinator-k8s charm, so this configuration should happen automatically in response to the user integrating tempo with a prometheus/mimir backend and a loki backend, respectively.

The grafana_datasource interface

The grafana_datasource interface is meant for applications such as Mimir to register themselves with grafana as datasources, so that grafana (and dashboards in particular) can query them to surface useful information to the end user.

As soon as it receives a datasource configuration, grafana assigns an internal UID to it. So if we want to configure Tempo to use a certain Mimir datasource for cross-referencing the metrics generated from its traces, we must give Tempo the UID of Mimir’s datasource. The first step for getting this to work, then, is for grafana to send back, over the grafana_datasource interface, the UIDs it has assigned to each datasource.

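If tempo and grafana had a conversation, it would go like this: tempo says "here is my datasource endpoint", and grafana replies "registered; I have assigned it this UID".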

It’s important to understand that this happens on a per-unit basis. Each tempo-coordinator-k8s unit exposes its own datasource endpoint, which grafana registers and queries independently, assigning each one a separate UID. So the correct picture is one datasource, with its own UID, per tempo unit.
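The per-unit registration can be modelled with a small, self-contained sketch. `FakeGrafana`, `register` and `register_units` are hypothetical names for illustration only (this is not grafana's API or the charm library's code), the UUID-based UIDs are an assumption, and the unit URLs are made up. It shows that two tempo units produce two datasources with two different UIDs, all provisioned by the same grafana (identified by its own `grafana_uid`).

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeGrafana:
    """Hypothetical stand-in for a grafana instance; not the real grafana API."""

    # Identifies this grafana instance; shared by every datasource it provisions.
    grafana_uid: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Maps datasource UID -> datasource URL for every registered datasource.
    datasources: Dict[str, str] = field(default_factory=dict)

    def register(self, url: str) -> str:
        """Register one datasource endpoint and return the UID assigned to it."""
        ds_uid = uuid.uuid4().hex
        self.datasources[ds_uid] = url
        return ds_uid


def register_units(grafana: FakeGrafana, unit_urls: Dict[str, str]) -> Dict[str, str]:
    """Register each unit's endpoint separately; returns unit name -> datasource UID."""
    return {unit: grafana.register(url) for unit, url in unit_urls.items()}


grafana = FakeGrafana()
# Hypothetical endpoints, one per tempo-coordinator-k8s unit.
uids = register_units(
    grafana,
    {
        "tempo/0": "http://tempo-0.tempo-endpoints:3200",
        "tempo/1": "http://tempo-1.tempo-endpoints:3200",
    },
)
print(uids)  # one distinct UID per unit
```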

The grafana_datasource_exchange interface

The grafana_datasource_exchange interface is meant for charms that are registered as grafana datasources, such as Tempo and Mimir, to tell each other what their datasource UIDs are, so that, for example, Tempo can configure its datasource in Grafana to cross-reference its data with Mimir’s.
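As an illustration of what might travel over grafana_datasource_exchange, here is a minimal sketch of serializing a list of datasource entries into a JSON string, as one would store in a relation databag. The schema (`type`, `uid`, `grafana_uid` per entry), the `DatasourceEntry` class and the `encode`/`decode` helpers are assumptions for illustration, not the interface's published schema; the UID values are made up.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DatasourceEntry:
    """One datasource a charm shares over grafana_datasource_exchange (hypothetical schema)."""

    type: str         # datasource type, e.g. "prometheus" for Mimir
    uid: str          # datasource UID assigned by grafana
    grafana_uid: str  # UID of the grafana instance that assigned `uid`


def encode(entries: List[DatasourceEntry]) -> str:
    """Serialize entries into a JSON string suitable for a relation databag."""
    return json.dumps([asdict(e) for e in entries])


def decode(raw: str) -> List[DatasourceEntry]:
    """Parse a JSON string written by `encode` back into entries."""
    return [DatasourceEntry(**item) for item in json.loads(raw)]


# Mimir publishes the datasource UIDs grafana assigned to each of its units.
published = [
    DatasourceEntry(type="prometheus", uid="mimir-0-uid", grafana_uid="grafana-a"),
    DatasourceEntry(type="prometheus", uid="mimir-1-uid", grafana_uid="grafana-a"),
]
raw = encode(published)
received = decode(raw)  # what Tempo would read on its side of the relation
```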

Using this data, Tempo can configure itself to reference data stored in Mimir through grafana, that is, without talking directly to the Mimir backend, but by having grafana query it for the data it needs.

So when we relate tempo to grafana over grafana_datasource, Tempo adds to its datasource configuration a reference telling Grafana which Mimir datasource to fetch trace metrics from.
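A rough sketch of what that configuration could look like, in grafana's datasource provisioning format. The `jsonData.serviceMap.datasourceUid` and `jsonData.tracesToLogsV2.datasourceUid` keys are the ones grafana's Tempo datasource provisioning documentation uses; the `tempo_datasource_config` function, its parameters and the example values are hypothetical, and this is not necessarily how tempo-coordinator-k8s builds it.

```python
from typing import Any, Dict, Optional


def tempo_datasource_config(
    name: str,
    url: str,
    mimir_ds_uid: Optional[str] = None,
    loki_ds_uid: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Tempo datasource entry in grafana provisioning format (hypothetical helper).

    Correlation settings are only included when the corresponding datasource UID
    is known, so grafana is never pointed at a datasource that does not exist.
    """
    json_data: Dict[str, Any] = {}
    if mimir_ds_uid:
        # Service Map / Service Graph: where to query the metrics generated from traces.
        json_data["serviceMap"] = {"datasourceUid": mimir_ds_uid}
    if loki_ds_uid:
        # Trace-to-Logs: which logs datasource to jump to from a span.
        json_data["tracesToLogsV2"] = {"datasourceUid": loki_ds_uid}
    return {"name": name, "type": "tempo", "url": url, "jsonData": json_data}


config = tempo_datasource_config(
    name="tempo-0",
    url="http://tempo-0.tempo-endpoints:3200",
    mimir_ds_uid="mimir-0-uid",
)
```

Leaving a key out when no UID is known keeps the datasource valid before the Mimir or Loki integration exists.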

Multiple grafanas

Let’s complicate the picture a bit and suppose Tempo and Mimir are related to two different grafanas.

Now, Tempo is only related to grafana-a, and Mimir only to grafana-b. When Tempo receives Mimir’s datasource UIDs over grafana_datasource_exchange, it must be able to determine whether those UIDs are valid for the grafana instance that Tempo is talking to, that is, whether they were assigned by that same grafana. In this setup, they would not be.

grafana_uid

For this reason, the grafana_datasource interface includes a grafana_uid field that uniquely identifies the grafana instance that provisioned the datasource.

Now, Tempo can verify whether the mimir datasources it is aware of (via grafana_datasource_exchange) are addressable from its own datasource, which is only the case if both are backed by the same grafana instance. By comparing the grafana_uid associated with each datasource UID against that of the grafana Tempo is related to, it can tell whether the two are the same grafana or different ones.
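The check boils down to an equality test on grafana UIDs. The sketch below assumes each exchanged datasource carries the grafana_uid of the grafana that assigned it; `ExchangedDatasource`, `is_addressable` and `partition` are hypothetical names. Unlike the two-grafana picture described earlier, the example deliberately mixes datasources from grafana-a and grafana-b, so both outcomes are visible.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExchangedDatasource:
    """A Mimir datasource as Tempo learns about it (hypothetical shape)."""

    uid: str          # datasource UID assigned by some grafana
    grafana_uid: str  # UID of the grafana that assigned it


def is_addressable(ds: ExchangedDatasource, own_grafana_uid: str) -> bool:
    """A datasource UID is only usable if the grafana Tempo is related to assigned it."""
    return ds.grafana_uid == own_grafana_uid


def partition(
    datasources: List[ExchangedDatasource], own_grafana_uid: str
) -> Tuple[List[ExchangedDatasource], List[ExchangedDatasource]]:
    """Split datasources into (same grafana, other grafana)."""
    same = [ds for ds in datasources if is_addressable(ds, own_grafana_uid)]
    other = [ds for ds in datasources if not is_addressable(ds, own_grafana_uid)]
    return same, other


# Tempo is related to grafana-a; Mimir is (in this example) related to both grafanas.
known = [
    ExchangedDatasource(uid="mimir-0-a", grafana_uid="grafana-a"),
    ExchangedDatasource(uid="mimir-0-b", grafana_uid="grafana-b"),
]
same, other = partition(known, own_grafana_uid="grafana-a")
```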

It is important to keep in mind that, for the purposes of data correlation, it is irrelevant which mimir unit’s datasource UID is used by Tempo, as the data available in each endpoint should be exactly the same.

So Tempo can filter out, from the datasource UIDs it receives from Mimir, all those that are associated with ‘the wrong grafana’, and pick a random one from the rest.
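The filter-then-pick step can be sketched in a few lines. Here exchanged datasources are assumed to arrive as `(datasource_uid, grafana_uid)` pairs, and `pick_datasource_uid` is a hypothetical helper; returning `None` when nothing matches lets the caller omit the Mimir reference instead of pointing at a datasource grafana cannot resolve.

```python
import random
from typing import Optional, Sequence, Tuple


def pick_datasource_uid(
    exchanged: Sequence[Tuple[str, str]],
    own_grafana_uid: str,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Drop datasources from 'the wrong grafana' and pick any one of the rest.

    `exchanged` holds (datasource_uid, grafana_uid) pairs (assumed shape).
    Any remaining candidate works, since every Mimir unit serves the same data.
    """
    candidates = [ds_uid for ds_uid, g_uid in exchanged if g_uid == own_grafana_uid]
    if not candidates:
        return None  # no Mimir datasource is addressable through this grafana
    return (rng or random).choice(candidates)


exchanged = [
    ("mimir-0-a", "grafana-a"),
    ("mimir-1-a", "grafana-a"),
    ("mimir-0-b", "grafana-b"),
]
chosen = pick_datasource_uid(exchanged, own_grafana_uid="grafana-a")
```

Since any candidate is equally valid, a deterministic choice (for example the first UID after sorting) would work just as well and might avoid rewriting the Tempo datasource on every event; the random pick simply follows the description above.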
