Distributed tracing is something that we’ve chatted about a few times as a team. I thought it would be worthwhile to start a discussion to record notes and ideas. I feel that Juju could have a role to play facilitating traces/spans, but am quite murky on what might be best.
The ecosystem seems to have settled on the OpenTelemetry standard.
An excerpt from the standard’s overview about what a distributed trace actually is:
Traces in OpenTelemetry are defined implicitly by their Spans. In particular, a Trace can be thought of as a directed acyclic graph (DAG) of Spans, where the edges between Spans are defined as parent/child relationship.
For example, the following is an example Trace made up of 6 Spans:
Causal relationships between Spans in a single Trace

        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `child` of Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F]
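To make the parent/child structure concrete, here is a minimal sketch in plain Python. It does not use the OpenTelemetry SDK: the `Span` class, the `SPANS` list, and the helpers `children_of` and `render` are hypothetical names made up for illustration, and span names stand in for the span IDs a real trace would carry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a span: a name plus a link to its parent."""
    name: str
    parent: Optional[str] = None  # name of the parent span; None for the root


# The six spans from the diagram, each pointing at its parent.
SPANS = [
    Span("A"),
    Span("B", parent="A"),
    Span("C", parent="A"),
    Span("D", parent="B"),
    Span("E", parent="C"),
    Span("F", parent="C"),
]


def children_of(spans):
    """Map each span name to the names of its direct children."""
    children = {s.name: [] for s in spans}
    for s in spans:
        if s.parent is not None:
            children[s.parent].append(s.name)
    return children


def render(spans):
    """Return the trace as indented lines, starting from the root span."""
    children = children_of(spans)
    lines = []

    def walk(name, depth):
        lines.append("  " * depth + f"Span {name}")
        for child in children[name]:
            walk(child, depth + 1)

    for span in spans:
        if span.parent is None:
            walk(span.name, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render(SPANS)))
```

Note that nothing stores the trace as a separate object: it falls out of the parent links alone, which is what "defined implicitly by their Spans" means in the excerpt.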
Sometimes it’s easier to visualize Traces with a time axis as in the diagram below:
Temporal relationships between Spans in a single Trace

––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··]
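The time-axis view amounts to giving every span a start and an end. The sketch below is plain Python, not the OpenTelemetry SDK. `TimedSpan`, `TIMELINE`, `render_timeline`, and `contains` are hypothetical names, and the start/end values are invented offsets that only roughly follow the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedSpan:
    """Hypothetical stand-in for a span with a start and end time."""
    name: str
    start: int  # illustrative time units, not taken from a real trace
    end: int


# Made-up offsets that roughly follow the diagram, in the same order.
TIMELINE = [
    TimedSpan("A", 0, 60),
    TimedSpan("B", 2, 57),
    TimedSpan("D", 5, 56),
    TimedSpan("C", 3, 51),
    TimedSpan("E", 8, 23),
    TimedSpan("F", 31, 41),
]


def render_timeline(spans, scale=1):
    """Draw one bar per span; `scale` is the number of time units per character."""
    lines = []
    for s in spans:
        offset = s.start // scale
        width = max(1, (s.end - s.start) // scale)
        lines.append(" " * offset + "#" * width + f" {s.name}")
    return lines


def contains(outer, inner):
    """True if `inner` starts and ends within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


if __name__ == "__main__":
    print("\n".join(render_timeline(TIMELINE)))
```

With these numbers each child span sits inside its parent's interval, as it does in the diagram, while the siblings E and F do not overlap at all.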