Using the Grafana Agent Machine Charm

@zmraul Done! :wink:

Thanks! Done! :wink:

I’m trying to understand how to define which logs get forwarded to Loki in this situation. I see:

log_slots: Snap slots to connect to for scraping logs in the form ["snap-name:slot", ...].

Does this mean you can only send logs to Loki if your client application is using snaps?
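
To make the quoted `["snap-name:slot", ...]` form concrete, here is a minimal sketch of a check a charm author could run on their own list before passing it on. The helper `split_log_slots`, the regular expression, and the example value `my-snap:logs` are all hypothetical and are not part of the Grafana Agent library.

```python
import re

# Hypothetical pattern (an assumption): lowercase letters, digits and dashes
# on both sides of a single colon.
_SLOT_RE = re.compile(r"^[a-z0-9][a-z0-9-]*:[a-z0-9][a-z0-9-]*$")


def split_log_slots(log_slots):
    """Split "snap-name:slot" strings into (snap, slot) tuples.

    Raises ValueError for any entry that does not match the expected form.
    """
    pairs = []
    for entry in log_slots:
        if not _SLOT_RE.match(entry):
            raise ValueError(f"expected 'snap-name:slot', got {entry!r}")
        snap, slot = entry.split(":", 1)
        pairs.append((snap, slot))
    return pairs
```

For example, `split_log_slots(["my-snap:logs"])` gives one `(snap, slot)` pair, while an entry without a colon raises `ValueError`.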

log_slots is only for snaps, but it isn’t the only way to get logs. We automatically scrape all of /var/log and the journal as well.

So if you write your logs to /var/log/..., they will get scraped
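
As a rough illustration of "write your logs to /var/log/...", the sketch below sets up a standard-library file logger. The function name `make_file_logger`, the application name `my-app`, and the directory layout are assumptions for illustration only. Writing under `/var/log` normally needs suitable permissions.

```python
import logging
from pathlib import Path


def make_file_logger(log_dir, name="my-app"):
    """Return (logger, log_path) for a logger writing to <log_dir>/<name>.log.

    Note: calling this twice with the same name adds a second handler.
    """
    log_path = Path(log_dir) / f"{name}.log"
    # Create the directory if it does not exist yet.
    log_path.parent.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger, log_path
```

A workload could then call `make_file_logger("/var/log/my-app")` once at startup and log through the returned logger; the agent would pick the file up only if it really scrapes that path, as described above.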

Is this still true?

I’m working on a machine charm for testing out all this https://github.com/erik78se/juju-operators-examples/tree/main/observed

… and I have used this guide: Charmhub | Using the Grafana Agent Machine Charm

… which unfortunately is full of outdated information and will not work. It’s a great guide once it’s updated. @tmihoc

I can help there once I’ve figured out what goes on here with my cross-model-integrations: Cross Model Integration COS light lxd-plus-microk8 makes juju error - juju - Charmhub

It is, but if you use a modern charmcraft which supports PYDEPS directives, it should be handled for you.

Hi @erik-lonroth!

Could you mention which information you found outdated so we can fix it?

Thanks!

Jose

Sure, I can try, but it’s likely not complete…

  • Wrong class names in charm.py. (COSAgentProvider vs COSMachineProvider)
  • Missing explicit references to the dict/JSON elements for alert rules and dashboards. (How can I know how to manipulate the paths where these files are?)
  • pydantic inclusion in requirements.txt doesn’t seem to be needed.
  • No mention that the traefik edge version is needed with COS light to work.
  • No examples for any rules for prometheus/loki
  • No mention that “ALL THREE” relations need to be established between the grafana-agent and COS or the charms will not work (there is no such information in the deploy messages).
  • The reference to the loki-logging endpoint seems to have been renamed to just “logging”?
  • No mention of whether, or how, dashboards/rules change when they are modified as part of a charm upgrade… Will they be updated, or does an operator need to change them?
  • more?
  • Do not worry; this charm and its documentation are simply not stable yet :wink:
  • FIXED: Typo from a previous doc version.
  • FIXED.
  • FIXED: Was there from a previous doc version.
  • FIXED
  • FIXED.
  • FIXED. It was there. Now is more explicit.
  • loki-logging is the name of the offer provided by this overlay and logging is the endpoint. You can change this name by using a custom overlay.
  • The idea is to maintain the same behaviour as libraries provided by Prometheus, Loki and Grafana charms. So, dashboards and rules will be updated if they change in the new version of the charm. What happens to these files in an upgrade process is a little bit out of the scope of this HOW-TO, it is more for a REFERENCE doc.
  • If you have more comments, please share them. There is no need to ask rhetorical questions anyway. :wink:

Thanks a lot for the updates. I’ve managed to get it to work, and the changes you made are good. Thanks for the effort put into it. This is extremely valuable.

We are producing some internal documents that are made to cover the setup of a local development environment. We could share this with you once we have tested it out on some more members of our team.

We have some thoughts about how to set up a COS Lite stack, which I’ll share in a separate post.

Is there a reason this overlay also offers the prometheus scraping endpoint https://github.com/canonical/cos-lite-bundle/blob/b014892672258f1d4c9d88e4bfd413a17ca71c5d/overlays/offers-overlay.yaml#L17? That’s not in the docs…

Yes, because Prometheus supports two ways of getting metrics:

  • PULL (AKA scrape)
  • PUSH (AKA remote write)

So, with these options you can have Prometheus in COS-Lite scraping metrics or receiving metrics.

Ah, I see! This is also very good to mention. We only offer remote-write as of now, but will for sure expand this to allow for the PULL (scrape) method as well.

Actually, we have recently discussed discontinuing support for cross-model scraping. This is mostly because of issues with routing. Most likely the scrape endpoint will be removed from the overlay. You will, of course, still be able to create an offer, but it will not be the recommended method.

I don’t understand why routing would be a concern for Juju, since this is a networking-related issue from the start anyway. Am I missing something here? What are the issues you refer to?

The reason is actually not as much technical as it is about user experience. Getting network topology right is hard, especially so when you have tens, hundreds, or even thousands of remote models to scrape and observe.

By saying “cross-model metrics will always be pushed by an agent rather than scraped by Prometheus”, we invert the data flow and move from N firewall configs that need to be properly set up to one: the one that goes into COS.
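
The "N firewall configs versus one" argument can be sketched as a toy model. Everything here is illustrative: the function names, the model names, and the rule descriptions are hypothetical and do not reflect how Juju or COS actually configure networking.

```python
def inbound_rules_pull(remote_models):
    """Pull: Prometheus in COS connects out to every remote model,
    so each remote model has to accept inbound traffic from COS."""
    return {(model, "allow inbound from COS") for model in remote_models}


def inbound_rules_push(remote_models):
    """Push: every agent connects out to COS,
    so only COS has to accept inbound traffic."""
    if not remote_models:
        return set()
    return {("cos", "allow inbound from agents")}


models = ["model-a", "model-b", "model-c"]
print(len(inbound_rules_pull(models)), "inbound rules when scraping")
print(len(inbound_rules_push(models)), "inbound rule when pushing")
```

In this model the pull count grows with the number of remote models, while the push count stays at one regardless of how many agents send data.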

@0x12b - I get that totally, but why discontinue a feature which would make a lot of sense in the cases where PULL/PUSH have different implications for how metrics are collected?

I mean, why not support both methods rather than confining the solution to a single method?

So, this guide is getting really good. But there is a missing piece: how to add and test the “ALERTING” part of COS.

I’m starting to explore how this would work with the library, and also with Prometheus and Loki, which the guide doesn’t cover but definitely should.

  • How to set up an initial alert rule for Loki and Prometheus, using juju-topology with it.
  • How to test the alerts.
  • Possibly some hints on how to integrate with, let’s say, PagerDuty, webhooks, or whatever.
  • How can I monitor, let’s say, an individual UNIT as opposed to a whole APPLICATION in the alert rules? I’m struggling to understand how much I should add to the rules myself as opposed to what magic Juju (-topology) adds to them.

This would be fairly complete if this is covered here @tmihoc @jose

I’ll be happy to assist in peer-reviewing the whole thing.
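
For the unit-versus-application question above, here is a minimal sketch of what such a Prometheus alert rule could look like, built as a Python dict and printed as JSON for inspection. It assumes the metrics carry the `juju_application` and `juju_unit` labels used by Juju topology. Whether the charm libraries already inject these matchers into the rules is not confirmed in this thread. The helper names, the `my-app` application, the `5m` duration, and the `critical` severity are all hypothetical.

```python
import json


def unit_down_rule(application, unit=None):
    """Build a Prometheus-style alerting rule for a target being down.

    With unit=None the rule covers the whole application; with a unit name
    (e.g. "my-app/0") it is narrowed to that single unit.
    """
    matchers = [f'juju_application="{application}"']
    if unit is not None:
        matchers.append(f'juju_unit="{unit}"')
    selector = ", ".join(matchers)
    return {
        "alert": "TargetDown",
        "expr": f"up{{{selector}}} == 0",
        "for": "5m",
        "labels": {"severity": "critical"},
        # Prometheus template syntax, left for Prometheus to expand.
        "annotations": {"summary": "Target {{ $labels.juju_unit }} is down"},
    }


def rule_group(name, rules):
    """Wrap rules in the usual 'groups' structure of a rule file."""
    return {"groups": [{"name": name, "rules": rules}]}


print(json.dumps(rule_group("my-app-alerts", [unit_down_rule("my-app", "my-app/0")]), indent=2))
```

Dropping the `unit` argument gives the application-wide variant, so the difference between the two scopes is a single label matcher in `expr`.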
