We are happy to announce a new feature for generic HostHealth (up/absent) alert rules in Prometheus and Grafana Agent!
This relieves charm authors of having to implement their own HostHealth rules for each charm and reduces the risk of implementation errors.
How does it work?
See this explanation doc for further implementation details.
When we relate a metrics provider (e.g. some server) to Prometheus, we expect Prometheus to fire an alert if the server is not responding. With Prometheus's PromQL, this can be expressed universally with up and absent expressions:
up < 1
absent(up)
Instead of having every single charm in the ecosystem duplicate the same alert rules, the rules are now generated automatically by the prometheus_scrape, prometheus_remote_write, and cos_agent charm libraries.
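To illustrate the idea, here is a minimal Python sketch of injecting generic rules into a charm's existing rule groups. It is not the code from cos-lib or the charm libraries: the function name `inject_generic_rules`, the group name, the alert names, the `for` durations and the severity labels are all hypothetical. Only the two expressions and the standard Prometheus rule-group layout (`groups`, `name`, `rules`, `alert`, `expr`) are taken as given.

```python
import copy
import json

# Hypothetical generic rule group. Only the two expressions come from the
# post; alert names, durations and labels are illustrative assumptions.
GENERIC_GROUP = {
    "name": "HostHealth",
    "rules": [
        {
            "alert": "HostDown",
            "expr": "up < 1",
            "for": "5m",
            "labels": {"severity": "critical"},
        },
        {
            "alert": "HostMetricsMissing",
            "expr": "absent(up)",
            "for": "5m",
            "labels": {"severity": "critical"},
        },
    ],
}


def inject_generic_rules(rule_file: dict) -> dict:
    """Return a copy of rule_file with the generic group appended once."""
    result = copy.deepcopy(rule_file)
    groups = result.setdefault("groups", [])
    # Skip the injection if a group with the same name is already present,
    # so calling this function repeatedly does not duplicate the rules.
    if not any(g.get("name") == GENERIC_GROUP["name"] for g in groups):
        groups.append(copy.deepcopy(GENERIC_GROUP))
    return result


if __name__ == "__main__":
    # A charm that ships one custom rule of its own (made-up example).
    charm_rules = {
        "groups": [
            {
                "name": "my-charm",
                "rules": [{"alert": "HighLatency", "expr": "latency_seconds > 1"}],
            }
        ]
    }
    print(json.dumps(inject_generic_rules(charm_rules), indent=2))
```

Appending a separate group, rather than editing the charm's own groups, keeps the author's rules untouched and makes the injection easy to apply idempotently.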
Upgrade Notes
Charm revisions:
Fetching the new libraries automatically gives you a set of new alerts. If a charm already had up/absent alerts, this will result in duplicated alerts and rules. These alerts are ubiquitous and are now handled by the prometheus_scrape, prometheus_remote_write, and cos_agent libraries, so any custom alerts duplicating this behaviour can be removed.
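When upgrading, a quick scan can help spot custom rules that now overlap with the generated ones. The following Python sketch is a heuristic under stated assumptions: `find_redundant_rules` is a hypothetical helper, not part of any charm library, and it only recognises the plain forms `up < 1`, `up == 0` and `absent(up)`, optionally with a label selector. Anything it flags should still be reviewed by hand before removal.

```python
import re

# Heuristic patterns for hand-written host health rules (an assumption:
# real charms may phrase these expressions differently).
_SELECTOR = r"(\{[^}]*\})?"
_UP_RE = re.compile(rf"^up{_SELECTOR}\s*(<\s*1|==\s*0)$")
_ABSENT_RE = re.compile(rf"^absent\(\s*up{_SELECTOR}\s*\)$")


def find_redundant_rules(rule_file: dict) -> list[str]:
    """Return names of alerts whose expression looks like an up/absent check."""
    redundant = []
    for group in rule_file.get("groups", []):
        for rule in group.get("rules", []):
            expr = str(rule.get("expr", "")).strip()
            if _UP_RE.match(expr) or _ABSENT_RE.match(expr):
                redundant.append(rule.get("alert", "<unnamed>"))
    return redundant


if __name__ == "__main__":
    # Made-up rule file mixing host health rules with an unrelated one.
    custom = {
        "groups": [
            {
                "name": "my-charm",
                "rules": [
                    {"alert": "MyCharmDown", "expr": "up < 1"},
                    {"alert": "MyCharmAbsent", "expr": "absent(up)"},
                    {"alert": "HighLatency", "expr": "latency_seconds > 1"},
                ],
            }
        ]
    }
    for name in find_redundant_rules(custom):
        print(f"candidate for removal: {name}")
```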
References
- Support for adding generic alerts is centralized in cos-lib, allowing the Prometheus libraries to consume this functionality (link to PR #115, PR #117).
- The Prometheus prometheus_scrape and prometheus_remote_write libraries inject generic up/absent rules to the existing rule set (link to PR #660).
- The Grafana-agent cos_agent library injects generic up/absent rules to the existing rule set (link to PR #232).
- The semantics of the up alert rules is formalized in an architecture decision record (ADR) (link to PR #224).