We are happy to announce a new feature for generic HostHealth (up/absent) alert rules in Prometheus and Grafana Agent!
This relieves charm authors of having to implement their own HostHealth rules in every charm and reduces the risk of implementation errors.
How does it work?
See this explanation doc for further implementation details.
When we relate a metrics provider (e.g. some server) to Prometheus, we expect Prometheus to fire an alert if the server is not responding. With PromQL, this can be expressed universally with `up` and `absent` expressions:
```
up < 1
absent(up)
```
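As a rough illustration, the two expressions could be packaged as a standard Prometheus rule group. The sketch below builds one as a plain Python dict in the Prometheus rule-file layout (`groups` → `rules` → `alert`/`expr`/`for`/`labels`/`annotations`). The group name `HostHealth` comes from this post; the alert names `HostDown` and `HostMetricsMissing`, the `for` durations and the severity labels are assumptions for illustration, not necessarily what the libraries ship.

```python
import json


def host_health_group() -> dict:
    """Return a generic HostHealth rule group in Prometheus rule-file layout.

    Alert names, durations and labels are illustrative assumptions.
    """
    return {
        "name": "HostHealth",
        "rules": [
            {
                # `up` is 0 when the last scrape of a known target failed,
                # so `up < 1` catches targets that exist but do not respond.
                "alert": "HostDown",
                "expr": "up < 1",
                "for": "5m",
                "labels": {"severity": "critical"},
                "annotations": {"summary": "Host {{ $labels.instance }} is down"},
            },
            {
                # `absent(up)` returns a result only when no matching `up`
                # series exists at all, e.g. metrics never arrived.
                "alert": "HostMetricsMissing",
                "expr": "absent(up)",
                "for": "5m",
                "labels": {"severity": "critical"},
                "annotations": {"summary": "No metrics received from host"},
            },
        ],
    }


if __name__ == "__main__":
    # Print the rule-file structure; a real rule file is usually written as YAML.
    print(json.dumps({"groups": [host_health_group()]}, indent=2))
```

The two rules complement each other: `up < 1` needs an `up` series to exist, while `absent(up)` covers the case where there is none.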
Instead of every charm in the ecosystem duplicating the same alert rules, the rules are generated automatically by the `prometheus_scrape`, `prometheus_remote_write`, and `cos_agent` charm libraries.
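The libraries' actual injection code is not shown in this post, so the following is only a minimal sketch of the idea. It assumes the charm's rules are already loaded as a dict with a `groups` list, and it assumes the generic rules are scoped to the related application by adding Juju topology label matchers (such as `juju_model` and `juju_application`) to the `up` selector. Scoping matters because, in a Prometheus shared by many applications, a bare `absent(up)` only fires when no target at all reports `up`. The function names and alert names are hypothetical.

```python
import copy

# Hypothetical rule templates; "%s" is replaced by the topology label matchers.
_GENERIC_RULE_TEMPLATES = {
    "HostDown": "up{%s} < 1",
    "HostMetricsMissing": "absent(up{%s})",
}


def topology_matchers(topology: dict) -> str:
    """Render {"juju_model": "cos"} as 'juju_model="cos"'.

    Label values are assumed not to need escaping.
    """
    return ",".join(f'{key}="{value}"' for key, value in sorted(topology.items()))


def inject_generic_rules(rule_set: dict, topology: dict) -> dict:
    """Return a copy of rule_set with a topology-scoped HostHealth group added.

    The input is not mutated, and the group is added at most once.
    """
    result = copy.deepcopy(rule_set)
    groups = result.setdefault("groups", [])
    if any(group.get("name") == "HostHealth" for group in groups):
        return result  # already injected; keep the operation idempotent
    matchers = topology_matchers(topology)
    groups.append({
        "name": "HostHealth",
        "rules": [
            {"alert": name, "expr": template % matchers}
            for name, template in _GENERIC_RULE_TEMPLATES.items()
        ],
    })
    return result
```

For example, with topology `{"juju_model": "cos", "juju_application": "mysql"}` the injected expressions become `up{juju_application="mysql",juju_model="cos"} < 1` and `absent(up{juju_application="mysql",juju_model="cos"})`, while any rules the charm already defines are left untouched.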
Upgrade Notes
Charm revisions:
By fetching the new libraries, you get a set of new alerts automatically. If a charm already defines its own up/absent alerts, this results in duplicate alerts and rules. Since these alerts are now provided by the `prometheus_scrape` and `prometheus_remote_write` (Prometheus) and `cos_agent` (Grafana Agent) libraries, any custom alerts duplicating this behaviour can be removed.
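To help spot such duplicates, here is a minimal sketch of a hypothetical helper, assuming the charm's custom rules are already loaded into a dict in Prometheus rule-file layout. It flags any alert whose expression, ignoring whitespace, is `up < 1`, `up == 0` or `absent(up)`, optionally with a label selector. The regex matching is a heuristic of this sketch, not something the libraries provide.

```python
import re

# Heuristic patterns for custom rules that duplicate the generic ones.
# An optional `{...}` label selector is allowed; whitespace is stripped first.
_DUPLICATE_PATTERNS = [
    re.compile(r"up(\{[^}]*\})?(<1|==0)"),
    re.compile(r"absent\(up(\{[^}]*\})?\)"),
]


def _is_generic_host_health(expr: str) -> bool:
    compact = re.sub(r"\s+", "", expr)
    return any(pattern.fullmatch(compact) for pattern in _DUPLICATE_PATTERNS)


def strip_duplicate_host_health(rule_file: dict) -> tuple[dict, list[str]]:
    """Return (cleaned rule file, names of removed alerts).

    The input dict is not mutated; groups left with no rules are dropped.
    """
    removed: list[str] = []
    groups = []
    for group in rule_file.get("groups", []):
        kept = []
        for rule in group.get("rules", []):
            if "alert" in rule and _is_generic_host_health(rule.get("expr", "")):
                removed.append(rule["alert"])
            else:
                kept.append(rule)
        if kept:
            groups.append({**group, "rules": kept})
    return {**rule_file, "groups": groups}, removed
```

Matching on the expression rather than the alert name catches duplicates however the charm named them, but it will miss equivalent expressions written differently (e.g. `up != 1`), so the result is worth reviewing by hand.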
References
- Support for adding generic alerts is centralized in `cos-lib`, allowing the Prometheus libraries to consume this functionality (link to PR #115, PR #117).
- The Prometheus `prometheus_scrape` and `prometheus_remote_write` libraries inject generic `up`/`absent` rules into the existing rule set (link to PR #660).
- The Grafana Agent `cos_agent` library injects generic `up`/`absent` rules into the existing rule set (link to PR #232).
- The semantics of the `up` alert rules are formalized in an architecture decision record (ADR) (link to PR #224).