Hi everyone!
Below are the team’s updates for weeks 11 and 12. First, as always, let me introduce the fantastic team and what we’re building.
The Team
The observability team at Canonical consists of Dylan, Jose, Leon, Luca, Pietro, Ryan, and Simme. Our goal is to provide you with the best open-source observability stack possible, turning your day-2 operations into smooth sailing.
COS Lite
COS Lite is a lightweight, highly integrated observability suite, powered by Python operators and running on Juju. Find more information on Charmhub or go straight to GitHub.
The Work
Who is watching the watchmen?
As some of you may remember, last September I posted an article on my blog about monitoring your observability platform. It received quite a lot of attention, making it all the way to the Hacker News front page. In that article, I discussed the need to monitor the health of the alerting pipeline itself, so that we are notified as soon as there are any hiccups in it.
I’m aware that other projects fill similar needs, like the commercial Dead Man’s Snitch and the open-source Dead Man’s Switch. However, there weren’t really any available projects that were both:
- Fully open source
- Actively maintained
After internal discussions, and after wrapping up work in progress of even higher priority, we’ve finally had the opportunity to start bridging this gap for the COS Lite stack. The result of this effort is something we’ve decided to call the COS Alerter. The approach itself is not novel: it consists of a small, portable watchdog service that expects to be pinged continuously by an always-firing alert rule. If the pings stop arriving, something along the alerting pipeline has broken.
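To make the idea concrete, here is a minimal Python sketch of such a watchdog. It is not the COS Alerter code: the `Watchdog` class, the `check` function, the injectable clock and the five-minute timeout are all hypothetical choices for illustration. In a real deployment, the always-firing rule (for example a Prometheus alert whose expression is simply `vector(1)`) would be routed through Alertmanager to a webhook that ends up calling `ping()`.

```python
import time
from typing import Callable, List, Optional


class Watchdog:
    """Hypothetical dead man's switch: expects a ping at least every `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._last_ping: Optional[float] = None

    def ping(self) -> None:
        # Called whenever the always-firing alert makes it through the pipeline.
        self._last_ping = self._clock()

    def is_healthy(self) -> bool:
        # No ping yet, or the last one is too old: treat the pipeline as broken.
        if self._last_ping is None:
            return False
        return self._clock() - self._last_ping <= self.timeout


def check(watchdog: Watchdog, notify: Callable[[str], None]) -> None:
    # Run periodically; `notify` stands in for an out-of-band channel
    # (hypothetical) that does NOT depend on the alerting pipeline being watched.
    if not watchdog.is_healthy():
        notify("No heartbeat from the alerting pipeline")


if __name__ == "__main__":
    now = [0.0]
    wd = Watchdog(timeout=300, clock=lambda: now[0])
    wd.ping()
    now[0] = 120.0
    print(wd.is_healthy())  # True: last ping was 120 s ago
    now[0] = 500.0
    print(wd.is_healthy())  # False: last ping was 500 s ago
    messages: List[str] = []
    check(wd, messages.append)
    print(messages)
```

The key design point is that the watchdog only ever learns about silence. It never needs to understand the alerts themselves, which is what keeps it small and portable, and the notification it sends must travel over a path independent of the stack it is watching.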
The project is not yet ready for use, but we hope to have it available for initial trials in a couple of weeks. If you have thoughts or opinions on how best to accomplish this, let us know in the project repository!
Observing the Data Platform
Together with the Data Platform team, we’re making strides in how we efficiently and effortlessly monitor data platforms. The first step on this journey was the Grafana Agent machine charm, which makes it possible to observe machine charms in general. The next step has been to integrate this work with the Kafka machine charm.
Our findings so far suggest that this combination is both powerful and convenient to use. Integrating your own charms with the Grafana Agent machine charm is fairly straightforward if you follow this how-to guide written by José.
Dashing LMA Dashboards
Last but not least, we’ve been hard at work making sure that the existing LMA dashboards are, to the extent possible, cross-compatible with the COS stack when related over the COS Proxy. If you encounter dashboards that do not work as intended, open an issue in the COS Proxy repository and we’ll either fix it or try to provide you with options.
Feedback welcome
As always, feedback is very welcome! Feel free to let us know your thoughts, questions, or suggestions either here or on the CharmHub Mattermost.
That’s all for this time! See you again in two weeks!