Observability Core team updates - weeks 13-14 (pulse #7 of 2026)

crucible · 13 April 2026 17:53

Hello, Hola, Ciao, Hej, Hallo, Witam, مرحباً بكم, درود, שלום everyone!

Here is some of the work of the mighty Observability Core team from weeks 13 and 14 of 2026, corresponding to pulse 7.

The Team

As always, a quick reminder of our team members! On the Core team we have:

Together with the Tracing and Profiling and the Service Mesh teams, we form the greater Observability team, managed by Simme.

The Work

Prometheus scrape_configs containing regex capture groups caused the charm to go into an error state; this is now fixed in track/2 thanks to this patch.
The juju_unit label was missing from the metrics of Alertmanager, Grafana, Loki, and Prometheus; this is fixed in this patch.
We were busy with SRR planning this pulse and are excited to work on:
- “Day-2 ad-hoc threshold adjustment for admins”
- “Streamline observability integrations for juju-controller (VM + K8s)”
- “Terraform-first end-to-end charm lifecycle”
Thank you to abilash-p (Abi) for contributing the feature that adds a pebble_service label to Pebble logs exported to Loki.
The newly contributed OTLP interface in charmlibs, available for charms that want to send or receive OTLP data, received a few major improvements regarding rules in relation databags.
The opentelemetry-collector charms are now searchable on Charmhub.
The GrafanaDashboardProvider now allows you to load dashboards recursively.
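Capture groups usually end up in scrape_configs through relabel_configs rules. The update doesn't say exactly what triggered the error state, so as background only, here is a minimal Python sketch of how Prometheus applies a relabel rule with action replace. It joins the source label values with ";", anchors the regex at both ends, and expands $1 / ${1} references using Go's template rules. The function names are hypothetical, Python's re stands in for Go's RE2 (the two agree on simple patterns like these), and unlike Prometheus the sketch does not expand references inside the target label.

```python
import re

# Go-style template references: "${name}", "$name", or "$$" (a literal dollar).
# As in Go's Regexp.Expand, "$name" greedily takes ASCII letters, digits and "_".
_REF = re.compile(r"\$(?:\{([A-Za-z0-9_]+)\}|([A-Za-z0-9_]+)|\$)")


def expand_replacement(template: str, match: re.Match) -> str:
    """Expand $1, ${1} and $name references roughly the way Go does (sketch)."""

    def substitute(ref: re.Match) -> str:
        name = ref.group(1) or ref.group(2)
        if name is None:
            return "$"  # "$$" yields a single literal "$"
        key = int(name) if name.isdigit() else name
        try:
            # Unmatched or unknown groups expand to an empty string, as in Go.
            return match.group(key) or ""
        except IndexError:
            return ""

    # A "$" that starts no valid reference is left as literal text.
    return _REF.sub(substitute, template)


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Hypothetical emulation of a relabel_configs rule with action: replace."""
    # Source label values are joined with the separator (missing labels are "").
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends, so fullmatch is the analogue.
    match = re.fullmatch(regex, value)
    result = dict(labels)
    if match is None:
        return result  # no match: the label set is left untouched
    new_value = expand_replacement(replacement, match)
    if new_value:
        result[target_label] = new_value
    else:
        result.pop(target_label, None)  # an empty result removes the label
    return result
```

For example, with labels {"__address__": "10.1.2.3:9090"}, source label __address__, regex ([^:]+):\d+ and target label instance_ip, the rule sets instance_ip to 10.1.2.3. Writing the replacement as ${1}_host gives 10.1.2.3_host, whereas $1_host expands to an empty string because Go reads all of 1_host as the reference name, a common pitfall with capture groups in relabel rules.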
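Loading dashboards recursively presumably means that dashboards kept in subdirectories of the dashboards directory are picked up as well. The update doesn't describe how GrafanaDashboardProvider implements this, so the following is only a generic Python sketch of the difference, contrasting pathlib's glob (top level only) with rglob (whole tree). The helper names and the *.json filter are assumptions for illustration, not the library's API.

```python
import json
from pathlib import Path
from typing import Dict, List


def find_dashboard_files(root: str, recursive: bool = True) -> List[Path]:
    """Hypothetical helper: list *.json files under root, optionally recursing."""
    base = Path(root)
    # glob() only inspects the top level; rglob() walks every subdirectory.
    candidates = base.rglob("*.json") if recursive else base.glob("*.json")
    return sorted(p for p in candidates if p.is_file())


def load_dashboards(root: str, recursive: bool = True) -> Dict[str, dict]:
    """Parse each dashboard file, keyed by its path relative to root."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): json.loads(p.read_text(encoding="utf-8"))
        for p in find_dashboard_files(root, recursive)
    }
```

With a layout such as top.json plus nested/deeper/inner.json, the non-recursive call returns only top.json, while the recursive one returns both, keyed by relative path so that identically named files in different folders stay distinct.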

Feedback welcome

As always, feedback is very welcome! Feel free to let us know your thoughts, questions, or suggestions either here or on our Matrix channel.

That’s all for this time! See you again in two weeks!

arshiarasekhi · 19 April 2026 14:22

Thanks for the detailed update! The fix for scrape_configs with capture groups causing the Prometheus charm to enter an error state is a good catch; that kind of regex edge case can be tricky to track down. It's really interesting to see the OTLP interface in charmlibs maturing with better rules support in relation databags. A cleaner inter-charm telemetry flow makes a big difference for operators managing complex deployments.

The upcoming “Day-2 ad-hoc threshold adjustment for admins” work sounds particularly useful. One thing I’ve noticed when exploring the COS stack is that post-deployment alert tuning is often where operators spend a lot of time. Looking forward to seeing how that shapes up.

Coming from a Python/Docker background and currently studying data analysis, I’ve been diving into the Canonical Observability Stack to better understand how production-grade monitoring is built on top of Juju. Exciting work, keep it up!