Integrating a charm with COS means:
- having your app’s metrics and corresponding alert rules reach prometheus
- having your app’s logs and corresponding alert rules reach loki
- having your app’s dashboards reach grafana
Responsibility for testing is split: the COS team covers some aspects, and the charms integrating with COS cover the rest.
Tests for the built-in alert rules
Unit tests
- You can use promtool test rules to make sure the rules fire when you expect them to. As part of the test, you hard-code the time series values you are testing for.
- promtool check rules, to check that the rule files are valid.
- cos-tool validate. The advantage of cos-tool is that the same executable can validate both prometheus and loki rules.
- Make sure your alerts manifest matches the output of:
  - juju ssh prometheus/0 curl localhost:9090/api/v1/rules | jq -r '.data.groups | .[] | .rules | .[] | .name'
  - juju ssh loki/0 curl localhost:3100/loki/api/v1/rules
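The manifest comparison for Prometheus can be automated. The following is a minimal sketch, assuming the JSON printed by the curl command above has already been parsed into a dict; the function names and the sample rule names are hypothetical. It only covers the Prometheus response shape; the Loki output would need its own parsing.

```python
def rule_names(rules_response):
    """Collect rule names from a parsed Prometheus GET /api/v1/rules response."""
    return {
        rule["name"]
        for group in rules_response["data"]["groups"]
        for rule in group["rules"]
    }


def missing_rules(expected_names, rules_response):
    """Return the expected rule names that the server did not load, sorted."""
    return sorted(set(expected_names) - rule_names(rules_response))


# Hypothetical sample standing in for the JSON printed by the curl command
sample = {
    "status": "success",
    "data": {
        "groups": [
            {"name": "my_app_alerts", "rules": [{"name": "MyAppDown"}]},
        ]
    },
}

# "MyAppHighLatency" is in the (hypothetical) manifest but was not loaded
print(missing_rules(["MyAppDown", "MyAppHighLatency"], sample))
```

An empty list means every rule in the manifest reached Prometheus.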
 
Integration tests
- A fresh deployment shouldn’t fire alerts, e.g. due to missing past data that is interpreted as 0.
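One way to express this check is to query Prometheus for its current alerts after deploying and assert that none are firing. The sketch below assumes the parsed JSON of GET /api/v1/alerts is passed in; the function name and the sample alerts are hypothetical, and a deployment that intentionally ships an always-firing alert would need to exclude it.

```python
def firing_alerts(alerts_response):
    """Return sorted names of alerts currently in the 'firing' state.

    Expects the parsed JSON of Prometheus GET /api/v1/alerts.
    """
    return sorted(
        alert["labels"].get("alertname", "<unnamed>")
        for alert in alerts_response["data"]["alerts"]
        if alert["state"] == "firing"
    )


# Hypothetical response: one pending alert, one firing alert
sample_alerts = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "MyAppSlow"}, "state": "pending"},
            {"labels": {"alertname": "MyAppDown"}, "state": "firing"},
        ]
    },
}

print(firing_alerts(sample_alerts))
```

In an integration test, the assertion would be that firing_alerts returns an empty list for a fresh deployment.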
Tests for the metrics endpoint and scrape job
Integration tests
- Use promtool check metrics to lint the metrics endpoint, e.g. curl -s http://localhost:8080/metrics | promtool check metrics.
- For scrape targets: when related to prometheus, and after a scrape interval elapses (default: 1m), all prometheus targets listed in GET /api/v1/targets should be "health": "up". Repeat the test with and without ingress and TLS.
- For remote-write (and scrape targets): when related to prometheus, make sure that GET /api/v1/labels and GET /api/v1/label/juju_unit/values have your charm listed.
- Make sure that the metric names in your alert rules have matching metrics in your own /metrics endpoint.
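Two of the checks above lend themselves to small helpers. This is a rough sketch, not a definitive implementation: it assumes the parsed JSON of GET /api/v1/targets and the raw text of your /metrics endpoint are already available, and all function names and sample values are hypothetical. The list of metric names used by your alert rules is assumed to be maintained by hand, since extracting it from PromQL expressions is out of scope here.

```python
def unhealthy_targets(targets_response):
    """Return scrape URLs of active targets whose health is not 'up'."""
    return sorted(
        target["scrapeUrl"]
        for target in targets_response["data"]["activeTargets"]
        if target["health"] != "up"
    )


def exposed_metric_names(exposition_text):
    """Extract metric names from Prometheus text exposition format."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first "{" (labels) or space (value)
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def metrics_missing_from_endpoint(rule_metrics, exposition_text):
    """Return metric names used by alert rules but absent from /metrics."""
    return sorted(set(rule_metrics) - exposed_metric_names(exposition_text))


# Hypothetical /metrics output
sample_metrics = """\
# HELP my_app_requests_total Total requests.
# TYPE my_app_requests_total counter
my_app_requests_total{code="200"} 12
my_app_up 1
"""

print(metrics_missing_from_endpoint(["my_app_up", "my_app_errors_total"], sample_metrics))
```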
Tests for log lines
Integration tests
- When related to loki, make sure your logging sources are listed in:
  - GET /loki/api/v1/label/filename/values
  - GET /loki/api/v1/label/juju_unit/values
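The check against either endpoint can be sketched as a set difference. The example assumes the parsed JSON response of the label values request is passed in, with the values under a "data" key (treated as empty if absent); the function name and unit names are hypothetical.

```python
def missing_label_values(expected, label_values_response):
    """Return expected label values that Loki does not report, sorted.

    Expects the parsed JSON of GET /loki/api/v1/label/<name>/values.
    """
    present = set(label_values_response.get("data") or [])
    return sorted(set(expected) - present)


# Hypothetical response for the juju_unit label
sample_values = {"status": "success", "data": ["my-app/0", "other-app/0"]}

print(missing_label_values(["my-app/0", "my-app/1"], sample_values))
```

An empty result means all expected log sources are present in Loki.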
 
Tests for dashboards
Unit tests
- JSON lint: make sure every dashboard file parses as valid JSON.
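A JSON lint can be as small as attempting to parse each file. The following sketch assumes dashboards are stored as plain .json files (templated dashboards would need rendering first); the function names are hypothetical, and the directory is whatever your charm uses.

```python
import json
from pathlib import Path


def dashboard_json_error(text):
    """Return None if text parses as JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def invalid_dashboards(directory):
    """Map each invalid *.json file name in directory to its parse error."""
    errors = {}
    for path in sorted(Path(directory).glob("*.json")):
        error = dashboard_json_error(path.read_text())
        if error is not None:
            errors[path.name] = error
    return errors
```

A unit test could then assert that invalid_dashboards returns an empty dict for the charm's dashboards directory.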
Integration tests
- Make sure the dashboards manifest you have in the charm matches juju ssh grafana/0 curl http://admin:password@localhost:3000/api/search.
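The comparison can be sketched as follows, assuming the JSON list returned by /api/search has been parsed and that the manifest is a list of dashboard titles. The function name and titles are hypothetical; entries whose type is not "dash-db" (for example folders) are ignored.

```python
def missing_dashboards(expected_titles, search_response):
    """Return expected dashboard titles that Grafana does not list, sorted.

    Expects the parsed JSON list returned by Grafana GET /api/search.
    """
    found = {
        item["title"]
        for item in search_response
        if item.get("type") == "dash-db"
    }
    return sorted(set(expected_titles) - found)


# Hypothetical search result: one folder and one dashboard
sample_search = [
    {"title": "General", "type": "dash-folder"},
    {"title": "My App Overview", "type": "dash-db"},
]

print(missing_dashboards(["My App Overview", "My App Details"], sample_search))
```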
Remove duplications
Multiple grafana-agent apps related to the same principal
Charms should use limit: 1 for the cos-agent relation (example), but this cannot be enforced by grafana-agent itself. You can check whether a deployment respects this with jq:
juju export-bundle | yq -o json '.' | jq -r '
  .applications as $apps |
  .relations as $relations |
  $apps
  | to_entries
  | map(select(.value.charm == "grafana-agent")) | map(.key) as $grafana_agents |
  $apps
  | to_entries
  | map(.key) as $valid_apps |
  $relations
  | map({
      app1: (.[0] | split(":")[0]),
      app2: (.[1] | split(":")[0])
    })
  | map(select(
      ((.app1 | IN($grafana_agents[])) and (.app2 | IN($valid_apps[]))) or
      ((.app2 | IN($grafana_agents[])) and (.app1 | IN($valid_apps[])))
    ))
  | map(if .app1 | IN($grafana_agents[]) then .app2 else .app1 end)
  | group_by(.)
  | map({app: .[0], count: length})
  | map(select(.count > 1))
'
If the same principal has more than one cos-agent relation, you would see output such as:
[
  {
    "app": "openstack-exporter",
    "count": 2
  }
]
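Otherwise, you’d get an empty list:
[]
(which is good). If the bundle has no relations at all, jq fails instead with an error such as:
jq: error (at <stdin>:19): Cannot iterate over null (null)
which also means there is nothing to fix.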
You can also check this using the status YAML. Save the script below to is_multi_agent.py, make it executable, and run it with:
juju status --format=yaml | ./is_multi_agent.py
If there is a problem, you would see output such as:
openstack-exporter/19 is related to more than one grafana-agent subordinate: {'grafana-agent-container', 'grafana-agent-vm'}
#!/usr/bin/env python3
"""Report units that have more than one grafana-agent subordinate."""
import sys

import yaml

status = yaml.safe_load(sys.stdin.read())
apps = status["applications"]

# Names of all grafana-agent apps in the model
agents = {name for name, app in apps.items() if app["charm"] == "grafana-agent"}

# Visit every unit once, so each offending unit is reported only once
for app in apps.values():
    for name, unit in (app.get("units") or {}).items():
        # Subordinate unit names look like "grafana-agent/3"; keep the app part.
        # Units without subordinates have no "subordinates" key.
        subord_apps = {u.split("/")[0] for u in unit.get("subordinates", {})}
        subord_agents = subord_apps & agents
        if len(subord_agents) > 1:
            print(
                f"{name} is related to more than one grafana-agent subordinate: {subord_agents}"
            )
Grafana-agent related to multiple principals on the same machine
The grafana-agent vm charm can only be related to one principal on the same machine.
Save the following script to is_multi.py and run with:
juju status --format=yaml | ./is_multi.py
If there is a problem, you would see output such as:
ga is subordinate to both 'co', 'nc' in the same machines {'24'}
#!/usr/bin/env python3
"""Report grafana-agent apps related to several principals on one machine."""
import sys
from itertools import combinations

import yaml

status = yaml.safe_load(sys.stdin.read())
apps = status["applications"]

# A mapping from grafana-agent app name to the list of apps it's subordinate to
# (an unrelated grafana-agent has no "subordinate-to" key)
agents = {
    name: app.get("subordinate-to", [])
    for name, app in apps.items()
    if app["charm"] == "grafana-agent"
}

for agent, principals in agents.items():
    # A mapping from principal app name to the machines its units run on
    machines = {
        p: {u["machine"] for u in (apps[p].get("units") or {}).values()}
        for p in principals
    }
    # Any two principals sharing a machine indicate a problem
    for p1, p2 in combinations(principals, 2):
        if overlap := machines[p1] & machines[p2]:
            print(
                f"{agent} is subordinate to both '{p1}', '{p2}' in the same machines {overlap}"
            )
Additional thoughts
- A rock’s CI could dump a record of the /metrics endpoint each time the rock is built. This way some integration tests could turn into unit tests.