Tempo HA resource consumption

The purpose of this document is to get an idea of how Tempo HA performs under different loads and to assign sensible CPU and memory values for both the coordinator and the worker.

Environment

  • No resource limit set
  • Microk8s v1.28.12
  • Juju 3.4.5
  • x86_64 8 core CPU
  • 16GB RAM
  • SSD disk

The results were obtained with the following charm versions:

| App | Version | Charm | Channel | Rev |
| --- | --- | --- | --- | --- |
| alertmanager | 0.27.0 | alertmanager-k8s | edge | 129 |
| catalogue | | catalogue-k8s | edge | 59 |
| grafana | 9.5.3 | grafana-k8s | edge | 118 |
| grafana-agent-k8s | 0.40.4 | grafana-agent-k8s | edge | 86 |
| loki | 2.9.6 | loki-k8s | edge | 163 |
| prometheus | 2.52.0 | prometheus-k8s | | |
| traefik | 2.11.0 | traefik-k8s | edge | 203 |
| tempo-coordinator | | tempo-coordinator-k8s | edge | 2 |
| tempo-worker | 2.4.0 | tempo-worker-k8s | edge | 6 |
| minio | | minio | edge | 357 |
| s3-integrator | | s3-integrator | edge | 33 |

Method

We’ll be using Tempo’s workload metrics and Microk8s’ kubelet cAdvisor metrics, scraped by Prometheus, to monitor the workloads and inspect them in Grafana.

Metrics

Tempo metrics

Tempo exposes a set of metrics that would be useful for our testing purposes. Below are the expressions used with those metrics:

  • rate(tempo_distributor_bytes_received_total[5m]) / 1024
  • rate(tempo_distributor_spans_received_total[5m])
  • tempodb_backend_bytes_total / 1024 / 1024 / 1024

cAdvisor metrics

cAdvisor is a component integrated into the Kubernetes kubelet that exposes container-level metrics, including CPU and memory usage, for each container deployed on the cluster. Below are the expressions used with those metrics; a sketch of how such expressions can be queried programmatically follows the list:

  • container_memory_usage_bytes{pod=~"<POD NAME>"} / 1024 / 1024
  • rate(container_cpu_usage_seconds_total{pod=~"<POD NAME>"}[5m]) * 1000
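
As a rough illustration (not part of the original method), these expressions can also be pulled straight out of Prometheus’ HTTP API rather than read off the Grafana panels. The snippet below is a minimal sketch assuming the requests package is installed; the Prometheus address, pod name and time window are all placeholders to adjust to your deployment.

import requests

# Placeholders: point this at your Prometheus unit and adjust the pod regex.
PROM_URL = "http://<prometheus-address>:9090"
QUERY = 'rate(container_cpu_usage_seconds_total{pod=~"tempo-worker-0"}[5m]) * 1000'

# Range query over the load-test window; "step" controls the sample resolution.
resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={
        "query": QUERY,
        "start": "2024-08-01T10:00:00Z",  # placeholder test window
        "end": "2024-08-01T11:00:00Z",
        "step": "30s",
    },
)
resp.raise_for_status()

# Print per-series min/max, similar to the min-max ranges in the Results section.
for series in resp.json()["data"]["result"]:
    values = [float(value) for _, value in series["values"]]
    print(series["metric"].get("pod"), f"min={min(values):.1f}", f"max={max(values):.1f}")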

Ingestion

We’ll ingest spans into the Tempo coordinator using a trace-generating script (a sketch of such a script follows the list) at three different rates of ingested spans per second, and observe how each rate affects CPU and memory usage:

  • 100 spans/sec
  • 500 spans/sec
  • 1000 spans/sec
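
The exact trace-generation script isn’t included in this post; the sketch below just shows the general shape of one, using the OpenTelemetry Python SDK to push empty spans over OTLP/gRPC at a fixed rate. The endpoint is a placeholder for the coordinator’s (or the ingress’) OTLP receiver, and the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-grpc packages are assumed to be installed.

import time

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

RATE = 100  # spans per second: 100, 500 or 1000 in the tests above
ENDPOINT = "http://<tempo-coordinator-address>:4317"  # placeholder OTLP gRPC endpoint

# Wire the SDK to export spans to Tempo over OTLP/gRPC.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("tempo-load-test")

while True:
    second_started = time.time()
    for _ in range(RATE):
        # An empty span is enough to exercise the ingestion path.
        with tracer.start_as_current_span("load-test-span"):
            pass
    # Sleep for the remainder of the second so we stay close to RATE spans/sec.
    time.sleep(max(0.0, 1.0 - (time.time() - second_started)))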

Setup

First, deploy cos-lite and grafana-agent-k8s

juju deploy cos-lite --channel=latest/edge --trust
juju deploy grafana-agent-k8s --channel=latest/edge

Deploy tempo-bundle

Note that there are several modes of operation for the tempo cluster:

  • Monolithic mode
  • Distributed microservices mode

Both modes will be used to run the same set of loads.

Deploy Tempo in Monolithic Mode

git clone https://github.com/canonical/tempo-bundle.git
cd tempo-bundle
tox -e render-bundle
juju deploy ./bundle.yaml --trust

Deploy Tempo in Microservices Mode

git clone https://github.com/canonical/tempo-bundle.git
cd tempo-bundle
tox -e render-bundle -- --mode=recommended-microservices
juju deploy ./bundle.yaml --trust

In order for the Tempo cluster to be fully functional, you need to integrate it with an S3-compatible object storage; in this setup that is MinIO, accessed through the s3-integrator charm.
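
The S3 setup itself is out of scope for this post, but as a rough sketch: with the minio and s3-integrator charms listed above, you create a bucket in MinIO and then hand its endpoint, credentials and bucket name to s3-integrator, which is in turn related to the coordinator. The snippet below is one hypothetical way to create the bucket with boto3; the endpoint, credentials and bucket name are all placeholders.

import boto3

# Placeholders: MinIO endpoint and credentials as configured in your deployment.
s3 = boto3.client(
    "s3",
    endpoint_url="http://<minio-address>:9000",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# Tempo stores its blocks in a single bucket; "tempo" is an arbitrary name here.
s3.create_bucket(Bucket="tempo")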

Scrape Microk8s metrics

Currently, the Prometheus charm does not have a mechanism to scrape cAdvisor metrics. In order to collect them, modify Prometheus’s configuration and add the scrape job below (a quick way to verify the job is being scraped is sketched after the config):

- job_name: kubernetes-cadvisor
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc.cluster.local:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

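As a quick sanity check (an assumed helper, not part of the original method), the snippet below asks Prometheus’ HTTP API whether the new kubernetes-cadvisor job has healthy targets; the Prometheus address is a placeholder.

import requests

PROM_URL = "http://<prometheus-address>:9090"  # placeholder

# List active targets and keep only those belonging to the new scrape job.
targets = requests.get(f"{PROM_URL}/api/v1/targets").json()["data"]["activeTargets"]
for target in targets:
    if target["labels"].get("job") == "kubernetes-cadvisor":
        print(target["scrapeUrl"], target["health"], target.get("lastError", ""))
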
Integrate

jhack imatrix fill

Results

Note that the CPU and memory request values below are for the highest resource-consuming container in each pod, which is the charm container in the coordinator and the tempo container in the worker.

Monolithic mode

100 spans/sec

| Component | CPU (min-max, millicores) | RAM (min-max, MiB) |
| --- | --- | --- |
| Worker | ~10-30 | ~220-497 |
| Coordinator | ~2.4-9.2 | ~107-110 |

500 spans/sec

| Component | CPU (min-max, millicores) | RAM (min-max, MiB) |
| --- | --- | --- |
| Worker | ~25-82 | ~172-584 |
| Coordinator | ~3-10 | ~90-107 |

1000 spans/sec

| Component | CPU (min-max, millicores) | RAM (min-max, MiB) |
| --- | --- | --- |
| Worker | ~34-100 | ~186-684 |
| Coordinator | ~4-12 | ~86-118 |

Microservices mode

100 spans/sec

| Component | CPU (min-max, millicores) | RAM (min-max, MiB) |
| --- | --- | --- |
| Compactor | ~6-48 | ~117-445 |
| Ingester | ~5.6-10.6 | ~145-183 |
| Distributor | ~4-8 | ~99-103 |
| Querier | ~2-4.5 | ~115-121 |
| Metrics Generator | ~1-3.7 | ~70 |
| Query Frontend | ~1.8-4.5 | ~94-99 |
| Coordinator | ~2.6-15.8 | ~94-100 |

500 spans/sec

| Component | CPU (min-max, millicores) | RAM (min-max, MiB) |
| --- | --- | --- |
| Compactor | ~4-60 | ~114-528 |
| Ingester | ~7.5-14 | ~126-187 |
| Distributor | ~6-11 | ~93-99 |
| Querier | ~3-6 | ~112-117 |
| Metrics Generator | ~1-4 | ~64-68 |
| Query Frontend | ~1.7-5 | ~89-98 |
| Coordinator | ~3-16 | ~92-100 |

1000 spans/sec

| Component | CPU (min-max, millicores) | RAM (min-max, MiB) |
| --- | --- | --- |
| Compactor | ~7-66 | ~116-550 |
| Ingester | ~10-17 | ~127-190 |
| Distributor | ~9-12 | ~88-97 |
| Querier | ~1.6-5 | ~107-115 |
| Metrics Generator | ~1.2-4 | ~62-67 |
| Query Frontend | ~4-12 | ~79-93 |
| Coordinator | ~3-20 | ~88-97 |

Observations

  • The most resource-consuming components are the compactor and the ingester. Shifting from monolithic to microservices mode clearly showed the resource spikes of each component separately.
  • Per-pod resource consumption is slightly lower in microservices mode than in monolithic mode; however, the overall resource consumption is higher.
  • The coordinator’s CPU and memory usage is not significantly affected by the rate of spans ingested per second.

Compactor

The compactor is responsible for most of the memory and CPU consumption amongst the Tempo workers. There is a significant gap between the minimum and maximum CPU/memory consumed over a compactor’s lifecycle, which is due to Tempo’s compaction cycle, as shown below:

CPU and memory usage graphs over the compactor’s compaction cycle.

Ingester

The second most CPU-consuming component is the ingester. In microservices mode, there are 3 ingester units with the workload distributed among them, as this is the recommended microservices deployment. The spikes occur when the ingester cuts a block, i.e. when memory-buffered data reaches a predefined size or a time threshold is met and the block is ready to be flushed to the backend.

Conclusions

The CPU and memory values chosen below are meant to be used as requests, not limits, for the pods’ resources.

They are sensible estimates of what the workload is likely to need; they do not guarantee that the workload will never need more resources.

Coordinator

From the above observations, the coordinator is not highly affected by the mode of operation or the rate of ingested spans per second.

From the above results, the CPU consumption varies between 2-20 millicores. A reasonable CPU request for the container might be 50m, since it is a relatively small value and would cover the coordinator’s CPU needs without consuming much of the cluster’s pool.

From the above results, the memory consumption varies between 85-100 MiB. A reasonable memory request for the container might be 100 MiB, to cover the pod’s observed minimum needs.

Worker

From the above results, the CPU consumption varies between 10-100 millicores in monolithic mode and 1-70 millicores in microservices mode. A reasonable CPU request for each workload container might be 50m.

From the above results, the memory consumption varies between 170-700 MiB in monolithic mode and 100-550 MiB in microservices mode. A 500 MiB request for each pod, sized to cover the compactor spikes, seems like overkill, so a reasonable memory request for each workload container might be 200 MiB; this covers the needs of most worker roles, and since it is a request rather than a limit, the compactor can still use more RAM during its spikes (a sketch of applying these requests at the Kubernetes level follows).
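
For reference, the sketch below shows what applying these requests would look like at the Kubernetes level, using the kubernetes Python client to patch the worker’s StatefulSet. The namespace, StatefulSet name and container names are assumptions based on a typical Juju model, and Juju or the charm may manage these fields itself (in which case a manual patch can be reverted), so treat this purely as an illustration of the suggested values.

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
apps = client.AppsV1Api()

# Strategic-merge patch setting the suggested requests on the workload container.
# "cos" (namespace), "tempo-worker" and the container name "tempo" are assumptions;
# for the coordinator pod the relevant container is "charm", with 50m / 100Mi.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "tempo",
                        "resources": {"requests": {"cpu": "50m", "memory": "200Mi"}},
                    }
                ]
            }
        }
    }
}

apps.patch_namespaced_stateful_set(name="tempo-worker", namespace="cos", body=patch)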
