Using the Grafana Agent Machine Charm

jose · 25 May 2023 03:46

Do not worry; it’s just that this charm and its documentation are not stable yet.

FIXED: Typo from a previous doc version.

FIXED.

FIXED: Was there from a previous doc version.

FIXED

FIXED.

FIXED. It was already there; now it is more explicit.

loki-logging is the name of the offer provided by this overlay and logging is the endpoint. You can change this name by using a custom overlay.

The idea is to keep the same behaviour as the libraries provided by the Prometheus, Loki and Grafana charms. So, dashboards and rules will be updated if they change in the new version of the charm. What happens to these files during an upgrade is somewhat out of scope for this HOW-TO; it belongs more in a REFERENCE doc.

If you have more comments, please share them. There is no need to ask rhetorical questions anyway.

erik-lonroth · 26 May 2023 07:03

Thanks a lot for the updates. I’ve managed to get it working, and the changes you made are also good. Thanks for the effort you put into it. This is extremely valuable.

We are producing some internal documents that cover setting up a local development environment. We could share them with you once we have tested them on a few more members of our team.

We have some thoughts about how to set up a COS Lite stack, which I’ll share in a separate post.

erik-lonroth · 26 May 2023 08:31

Is there a reason this overlay (cos-lite-bundle/overlays/offers-overlay.yaml at commit b014892672258f1d4c9d88e4bfd413a17ca71c5d in canonical/cos-lite-bundle on GitHub) also offers the Prometheus scraping endpoint? That’s not in the docs…

jose · 26 May 2023 14:56

Yes, because Prometheus supports two ways of getting metrics:

PULL (AKA scrape)
PUSH (AKA remote write)

So, with these options you can have Prometheus in COS-Lite scraping metrics or receiving metrics.
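To make the PULL (scrape) model concrete, here is a minimal sketch, using only the Python standard library, of the kind of endpoint Prometheus scrapes: an HTTP page at /metrics in the Prometheus text exposition format. The metric name, handler and helper functions are hypothetical illustrations; real workloads usually ship an exporter instead. The PUSH (remote write) side is not shown, since it sends snappy-compressed protobuf payloads and needs extra libraries.

```python
"""Minimal sketch of a PULL (scrape) target using only the standard library.

Assumption: the metric name and value are made up for illustration.
"""
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics: dict[str, float]) -> str:
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Hypothetical metric; in PULL mode Prometheus fetches this page itself.
    metrics = {"example_requests_in_flight": 3.0}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.metrics).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass


def start_server(port: int = 0) -> HTTPServer:
    """Serve /metrics in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With PUSH, the direction is reversed: an agent (such as grafana-agent) collects the metrics locally and sends them to Prometheus’ remote-write endpoint, so Prometheus never needs to reach the workload.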

erik-lonroth · 27 May 2023 10:36

Ah, I see! This is also very good to mention. We only offer remote-write for now, but we will certainly expand this to allow the PULL (scrape) method as well.

dylanstathis · 27 May 2023 11:54

Actually, we have recently discussed discontinuing support for cross-model scraping. This is mostly because of routing issues. Most likely the scrape endpoint will be removed from the overlay. You will, of course, still be able to create an offer, but it will not be the recommended method.

erik-lonroth · 27 May 2023 12:55

I don’t understand why routing would be a concern for Juju, since it is a networking issue to begin with. Am I missing something here? What are the issues you refer to?

0x12b · 29 May 2023 09:45

The reason is actually not as much technical as it is about user experience. Getting network topology right is hard, especially so when you have tens, hundreds, or even thousands of remote models to scrape and observe.

By saying “cross-model metrics will always be pushed by an agent rather than scraped by Prometheus”, we invert the data flow and go from N firewall configs that need to be set up properly to one: the one that goes into COS.

erik-lonroth · 29 May 2023 11:34

@0x12b - I totally get that, but why discontinue a feature that would make a lot of sense in cases where PULL and PUSH have different implications for how metrics are collected?

I mean, why not support both methods rather than confining the solution to a single method?

erik-lonroth · 31 May 2023 16:03

So, this guide is getting really good. But one piece is missing: how to add and test the “ALERTING” part of COS.

I’m starting to explore how this would work with the library, and also with Prometheus and Loki. That isn’t covered by the guide yet, and it is definitely needed:

How to setup some initial alert-rule for loki, prometheus + using the juju-topology with this.
How to test the alerts.
Possibly some hints as how to integrate with - lets say - pagerduty, webhooks or whatever.
How can I monitor lets say an individual UNIT as opposed to a whole APPLICATION in the alert rules? I’m fighting with understanding how much I should add for the rules as opposed to what magic juju (-topology) adds to the rules.

The guide would be fairly complete if these were covered here @tmihoc @jose

I’ll be happy to help peer-review the whole thing.

erik-lonroth · 16 June 2023 11:29

alesstimec · 3 July 2023 07:51

Hi…

It seems pydantic 2 is out now… And cos_agent does not build with version 2, so we have to pin pydantic to something like 1.10.10…
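A charm that relies on such a pin might also want a unit test that fails if the pin silently goes missing. Below is a minimal, hypothetical Python sketch; the helper names are illustrative and not part of cos_agent. The pin itself would live in the charm’s requirements file, e.g. a line such as pydantic==1.10.10.

```python
"""Hypothetical guard against accidentally building with pydantic 2.

Assumption: the charm pins pydantic (e.g. pydantic==1.10.10); this check
only catches a pin that went missing.
"""
from importlib.metadata import PackageNotFoundError, version


def is_pydantic_v1(version_string: str) -> bool:
    """Return True when the major component of the version is below 2."""
    major = version_string.split(".", 1)[0]
    return int(major) < 2


def installed_pydantic_is_v1() -> bool:
    """Check the installed pydantic; raises if pydantic is not installed."""
    try:
        return is_pydantic_v1(version("pydantic"))
    except PackageNotFoundError as exc:
        raise RuntimeError("pydantic is not installed") from exc
```

In a test suite, `assert installed_pydantic_is_v1()` would then flag a build that pulled in pydantic 2.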

jose · 3 July 2023 11:42

Yes!! Let’s follow this issue: “pydantic 2.0 dependency”, Issue #214 in canonical/grafana-agent-k8s-operator on GitHub.

gbeuzeboc · 21 August 2023 14:51

In “Step 2”, we are supposed to fetch the cos_agent library from grafana_agent with the CLI:

charmcraft fetch-lib charms.grafana_agent.v0.cos_agent

This command only pulls the file “lib/charms/grafana_agent/v0/cos_agent.py”. There are no files such as metadata.yaml and src/charm.py that we are supposed to modify. I cannot proceed with this tutorial.

Would anyone know if I missed something or if this tutorial needs to be updated?

jose · 21 August 2023 19:11

Hi @gbeuzeboc

Yes, charmcraft fetch-lib charms.grafana_agent.v0.cos_agent only fetches the lib you need to add to your charm.

In Step 2 you can see that you need to add this to metadata.yaml:

provides:
  cos-agent:
    interface: cos_agent

and this in charm.py:

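        # In src/charm.py, inside the charm's __init__. Requires:
        #   from charms.grafana_agent.v0.cos_agent import COSAgentProvider
        # NODE_EXPORTER_PORT, JMX_PORT and METRICS_PROVIDER_PORT are constants
        # defined in the ZooKeeper charm itself.
        self._grafana_agent = COSAgentProvider(
            self,
            # Local endpoints that grafana-agent scrapes on this machine.
            metrics_endpoints=[
                {"path": "/metrics", "port": NODE_EXPORTER_PORT},
                {"path": "/metrics", "port": JMX_PORT},
                {"path": "/metrics", "port": METRICS_PROVIDER_PORT},
            ],
            # Alert rules for Prometheus and Loki, plus Grafana dashboards.
            metrics_rules_dir="./src/alert_rules/prometheus",
            logs_rules_dir="./src/alert_rules/loki",
            dashboard_dirs=["./src/grafana_dashboards"],
            # Snap slot through which the workload's logs reach grafana-agent.
            log_slots=["charmed-zookeeper:logs"],
        )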

Is there anything missing?
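The COSAgentProvider call above points at three directories relative to the charm root. As a hypothetical scaffold (not part of charmcraft or cos_agent), the Python sketch below creates them and drops in one sample Prometheus alert rule in the standard Prometheus rule-file format. The file name, the .rule suffix, the alert name and the expression are illustrative assumptions; check the cos_agent library for the file suffixes it actually loads.

```python
"""Hypothetical scaffold for the directories the COSAgentProvider call expects.

Paths mirror the snippet above (relative to the charm root); the rule file
name, alert name and expression are illustrative assumptions only.
"""
from pathlib import Path

SAMPLE_PROMETHEUS_RULE = """\
groups:
  - name: zookeeper-example
    rules:
      - alert: ExampleTargetDown
        expr: up < 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "A scrape target has been down for 5 minutes"
"""


def scaffold_cos_dirs(charm_root: Path) -> list[Path]:
    """Create the rule/dashboard directories and one sample Prometheus rule."""
    dirs = [
        charm_root / "src" / "alert_rules" / "prometheus",
        charm_root / "src" / "alert_rules" / "loki",
        charm_root / "src" / "grafana_dashboards",
    ]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    rule_file = dirs[0] / "example.rule"
    if not rule_file.exists():  # never overwrite an existing rule
        rule_file.write_text(SAMPLE_PROMETHEUS_RULE)
    return dirs
```

Running it once from the charm root (for example `scaffold_cos_dirs(Path("."))`) leaves the Loki and dashboard directories empty, ready for your own files.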

gbeuzeboc · 22 August 2023 12:20

@jose, I am not sure where the metadata.yaml and charm.py come from. Are they the ones from the Zookeeper charm? I was expecting a complete bundle to be deployed on the client machine, with zookeeper and the agent.

gbeuzeboc · 22 August 2023 13:00

I guess that is it: we have to modify the ZooKeeper charm so that it can connect to cos-agent, and then deploy the new ZooKeeper charm together with grafana-agent. Sorry, this wasn’t clear to me.

jose · 22 August 2023 13:01

@gbeuzeboc,

Perhaps I’m missing something, but the metadata.yaml and charm.py belong to the charm you want to integrate with the grafana-agent charm.
ZooKeeper is just the charm we use as a guinea pig to explain how to integrate Grafana Agent into any machine charm.

gbeuzeboc · 22 August 2023 13:07

Thank you for your reply. For now I am just learning about COS, so in my case I will use the ZooKeeper charm for the example. I wasn’t expecting that the charm we wanted to monitor needed to be adapted for grafana-agent. Now I understand, thank you for your replies.

erik-lonroth · 27 September 2023 16:44

@gbeuzeboc - we are also using the machine version of grafana-agent, and we also have a COS Lite stack up, which @marcus has been spearheading on our end (Dwellir). Let us know if we can help or collaborate. We have run into several challenges, but we’ll get there.