After adding scrape or remote-write relations to Prometheus, Juju may take some time to settle the model. Once the model has settled, some data you expect to have (scrape jobs, alert rules) may still be missing.
Checklist
juju status --relations
includes all the relations you expect to have, and if you have cross-model relations, then the SAAS section has a non-zero count for prometheus relations.
Rules and scrape targets are listed in prometheus
Query a prometheus unit ip to confirm all the rules are configured:
curl 10.1.207.168:9090/api/v1/rules | jq
and scrape jobs are healthy:
curl 10.1.207.168:9090/api/v1/targets \
| jq '.data.activeTargets | .[] | {scrapeUrl, health}'
If something is missing, proceed to the next section.
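The raw API responses can be long. As a rough sketch, the two helpers below condense them with jq: one prints every rule next to its group name, the other prints only the targets whose health is not "up". The function names are made up for illustration, and the field names (data.groups, rules, lastError) are assumed to follow the Prometheus HTTP API response format.

```shell
# Hypothetical helpers; both read a Prometheus API response on stdin.

# Print "<group name><TAB><rule name>" for every loaded rule.
list_rules () {
  jq -r '.data.groups[] | .name as $group | .rules[] | "\($group)\t\(.name)"'
}

# Print only the targets that are not healthy, with the last scrape error.
unhealthy_targets () {
  jq -c '.data.activeTargets[] | select(.health != "up") | {scrapeUrl, health, lastError}'
}

# Usage (same unit address as in the commands above):
#   curl -s 10.1.207.168:9090/api/v1/rules | list_rules
#   curl -s 10.1.207.168:9090/api/v1/targets | unhealthy_targets
```

An empty result from unhealthy_targets means every active target was scraped successfully the last time Prometheus tried.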
Rules are present on prometheus filesystem
Rule files have topology information in their filename. Confirm you have everything you expect to find:
juju ssh --container prometheus prom/0 ls /etc/prometheus/rules
If something is missing, proceed to the next section.
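One way to narrow down where a rule went missing is to compare the files on disk with the files Prometheus reports as loaded. The sketch below assumes that the file field of each rule group in the /api/v1/rules response holds the path of the rule file; the function names and the text file names in the usage comment are hypothetical.

```shell
# Hypothetical helpers for comparing rule files on disk with loaded ones.

# Read an /api/v1/rules response on stdin; print the basename of every
# rule file Prometheus has loaded, one per line, without duplicates.
loaded_rule_files () {
  jq -r '.data.groups[].file | split("/") | last' | sort -u
}

# Print the lines of file $1 (names on disk) that are absent from file $2 (loaded).
not_loaded () {
  grep -vxF -f "$2" "$1"
}

# Usage:
#   juju ssh --container prometheus prom/0 ls /etc/prometheus/rules | tr -d '\r' > on_disk.txt
#   curl -s 10.1.207.168:9090/api/v1/rules | loaded_rule_files > loaded.txt
#   not_loaded on_disk.txt loaded.txt
```

Any file name printed by not_loaded exists on disk but was not picked up by Prometheus, which points at a loading problem rather than a relation data problem.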
Rules and scrape jobs are listed in relation data
To inspect the relation data coming into Prometheus, you can use juju show-unit:
juju show-unit --format json prom/0 \
| jq '."prom/0"."relation-info"'
To filter out all relations except the cross-model relations,
juju show-unit --format json prom/0 \
| jq '."prom/0"."relation-info" | .[] | select(."cross-model" == true)'
To further filter out all relations except the ones related to the “receive-remote-write” relation,
juju show-unit --format json prom/0 \
| jq '."prom/0"."relation-info" | .[] | select(."cross-model" == true) | select(.endpoint == "receive-remote-write")'
To inspect all the scrape jobs coming in via the metrics-endpoint relation,
juju show-unit --format json prom/0 \
| jq -r '."prom/0"."relation-info" | .[] | select(.endpoint == "metrics-endpoint")."application-data"."scrape_jobs"'
Similarly for alert rules:
juju show-unit --format json prom/0 \
| jq -r '."prom/0"."relation-info" | .[] | select(.endpoint == "metrics-endpoint")."application-data"."alert_rules"'
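Relation data values are strings, so the scrape jobs are printed as one long JSON document. A minimal sketch for summarizing them, assuming scrape_jobs decodes to a list of job objects that carry job_name and static_configs keys (as in a Prometheus scrape config); the function name is hypothetical:

```shell
# Hypothetical helper: read `juju show-unit --format json <unit>` on stdin and
# print one line per incoming scrape job: "<job_name><TAB><comma-separated targets>".
# $1 = unit, e.g. prom/0
scrape_job_targets () {
  jq -r --arg unit "$1" '
    .[$unit]."relation-info"[]
    | select(.endpoint == "metrics-endpoint")
    | ."application-data".scrape_jobs // empty
    | fromjson
    | .[]
    | "\(.job_name // "unnamed")\t\([.static_configs[]?.targets[]?] | join(","))"
  '
}

# Usage:
#   juju show-unit --format json prom/0 | scrape_job_targets prom/0
```

Relations that carry no scrape_jobs key are skipped silently, so a missing line here means the job never arrived in the relation data.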
For convenience you could use a function:
app_data () {
  # Print the application relation data a unit received on an endpoint.
  # Usage examples:
  #   app_data prom/0 receive-remote-write alert_rules
  #   app_data prom/0 metrics-endpoint scrape_jobs

  # $1 = unit, e.g. prom/0
  local UNIT="$1"
  # $2 = endpoint, e.g. receive-remote-write
  local ENDPOINT="$2"
  # $3 = app relation data key, e.g. scrape_jobs (optional)
  local KEY="$3"

  if [[ $# -eq 3 ]]; then
    # Unit, endpoint and key given: print only that key.
    juju show-unit --format json "$UNIT" \
      | jq -r ".\"$UNIT\".\"relation-info\" | .[] | select(.endpoint == \"$ENDPOINT\").\"application-data\".\"$KEY\""
  elif [[ $# -eq 2 ]]; then
    # Unit and endpoint given: print the whole application data bag.
    juju show-unit --format json "$UNIT" \
      | jq -r ".\"$UNIT\".\"relation-info\" | .[] | select(.endpoint == \"$ENDPOINT\").\"application-data\""
  else
    echo "Illegal number of parameters. Usage: app_data UNIT ENDPOINT [KEY]" >&2
    return 1
  fi
}
many-to-many matching not allowed: matching labels must be unique on one side
This is a PromQL error message that shows up when a binary operation with a vector matching modifier such as on finds several time series with the same matching labels on the side that is expected to be unique.
For example:
(ceph_pool_bytes_used{}) *on (pool_id) group_left(name)(ceph_pool_metadata{})
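In this example the match fails when more than one ceph_pool_metadata series shares the same pool_id. A hedged sketch for finding the offending label values: count the series per matching label and keep only the counts above one. The helper name is made up, and the unit address is the one used in the earlier curl commands.

```shell
# Hypothetical helper: build a PromQL query that lists every value of
# label $2 shared by more than one series of metric $1.
duplicates_query () {
  printf 'count by (%s) (%s) > 1' "$2" "$1"
}

# Usage: every row in the result is a pool_id with duplicate metadata series.
#   curl -s 10.1.207.168:9090/api/v1/query \
#     --data-urlencode "query=$(duplicates_query ceph_pool_metadata pool_id)" \
#     | jq '.data.result'
```

Querying ceph_pool_metadata for one of the reported pool_id values then shows which labels differ between the duplicates, which usually hints at the source of the second copy.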
Checklist
- No redundant telemetry relations in place. Make sure your application has only one of the following relations:
  - grafana-agent:juju-info
  - grafana-agent:cos-agent
  - cos-proxy:prometheus-target
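A quick way to see which of these relations are in place is to filter the relations table for the three endpoint names. This is only a sketch: the function name is hypothetical, it assumes the applications are deployed under the names grafana-agent and cos-proxy, and it matches on plain text, so it prints the lines for every application in the model that uses these endpoints.

```shell
# Hypothetical helper: read `juju status --relations` on stdin and keep only
# the lines that mention one of the telemetry endpoints listed above.
telemetry_relations () {
  grep -E 'grafana-agent:(juju-info|cos-agent)|cos-proxy:prometheus-target'
}

# Usage: more than one line for the same application suggests a redundant relation.
#   juju status --relations | telemetry_relations
```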