Using the Grafana Agent Machine Charm

Metadata
Key         Value
Summary     Using the Grafana Agent Machine Charm
Categories  integrations
Difficulty  3
Author      Jose Massón

Initial situation

Let’s assume we have an application that:

  • Is running in a virtual machine or in a regular machine.
  • Is managed with Juju by its own Charmed Operator.
  • Needs to be observed (monitored).

Desired situation

We would like to collect telemetry from the charmed application into COS Lite, but COS Lite runs in Kubernetes and the application in a machine.

So the question arises: How can we connect both worlds? The answer is simple: use this Grafana Agent machine charm!

The Grafana Agent machine charm

The Grafana Agent machine charm handles installation, configuration, and Day 2 operations specific to Grafana Agent, using Juju and the Charmed Operator Lifecycle Manager (OLM).

This charm was designed to run in virtual machines as a subordinate. Application units typically run in an isolated container on a machine, with no knowledge of or access to other applications deployed onto the same machine. Subordinate charms allow units of different applications to be deployed into the same container and to have knowledge of each other. Subordinate units scale together with their principal.

Step 0: Understanding the desired situation

As we said before, we are going to use COS Lite to monitor our application, and the Grafana Agent machine charm to send the application’s telemetry (metrics, logs and dashboards) to it.

For that we will have two Juju models:

  • applications, on an LXD controller, for our application and Grafana Agent
  • cos, on a k8s controller, for COS Lite

Step 1: Make sure COS Lite is up and running.

We have to make sure that the observability stack is up and running in our cos model (follow the instructions in COS Lite) in a K8s controller, like this:

Note that COS Lite is offering several interfaces so we can create cross-model relations with applications that are running in different controllers/models:

Offer                            Application   Charm             Rev  Connected  Endpoint              Interface                Role
alertmanager-karma-dashboard     alertmanager  alertmanager-k8s  73   0/0        karma-dashboard       karma_dashboard          provider
grafana-dashboards               grafana       grafana-k8s       80   0/0        grafana-dashboard     grafana_dashboard        requirer
loki-logging                     loki          loki-k8s          87   0/0        logging               loki_push_api            provider
prometheus-receive-remote-write  prometheus    prometheus-k8s    125  0/0        receive-remote-write  prometheus_remote_write  provider
prometheus-scrape                prometheus    prometheus-k8s    125  0/0        metrics-endpoint      prometheus_scrape        requirer

Step 2: Instrument Grafana Agent in our Application’s Charmed Operator

In this example we use COS Lite to observe Zookeeper.

In order to instrument it, we will have to:

  • Obtain cos_agent lib from grafana-agent charm:

    charmcraft fetch-lib charms.grafana_agent.v0.cos_agent
    
  • Modify only 2 files in your machine charm code: metadata.yaml, src/charm.py.

    In metadata.yaml we need to add the following to the provides section:

      cos-agent:
        interface: cos_agent
    

    In src/charm.py we import the library:

    from charms.grafana_agent.v0.cos_agent import COSAgentProvider
    

    and instantiate a COSAgentProvider object in the __init__ method. For zookeeper it may look like this:

            self._grafana_agent = COSAgentProvider(
                self,
                metrics_endpoints=[
                    {"path": "/metrics", "port": NODE_EXPORTER_PORT},
                    {"path": "/metrics", "port": JMX_PORT},
                    {"path": "/metrics", "port": METRICS_PROVIDER_PORT},
                ],
                metrics_rules_dir="./src/alert_rules/prometheus",
                logs_rules_dir="./src/alert_rules/loki",
                dashboard_dirs=["./src/grafana_dashboards"],
                log_slots=["charmed-zookeeper:logs"],
            )
    

    Note that you can specify the paths where metrics and logs alert rules and dashboards files are stored. In order to know how alert rules and dashboards are written, check these examples.

  • Re-pack the charm:

    charmcraft pack
    

and voilà!
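
Before packing, it can help to confirm that the directories passed to COSAgentProvider actually exist and contain files. The following is only a minimal sketch: count_files and main are hypothetical helpers written for this tutorial, not part of the cos_agent library, and the directory names are simply the ones used in the zookeeper example above.

```python
from pathlib import Path

# Directories used in the zookeeper example above; adjust to your charm.
TELEMETRY_DIRS = [
    "./src/alert_rules/prometheus",
    "./src/alert_rules/loki",
    "./src/grafana_dashboards",
]


def count_files(charm_root, relative_dirs):
    """Map each relative directory to the number of regular files under it.

    A missing directory maps to None so that it is easy to spot.
    Hypothetical helper, not provided by the cos_agent library.
    """
    report = {}
    for rel in relative_dirs:
        directory = Path(charm_root) / rel
        if not directory.is_dir():
            report[rel] = None
            continue
        report[rel] = sum(1 for p in directory.rglob("*") if p.is_file())
    return report


def main(charm_root="."):
    """Print one line per directory, flagging the missing ones."""
    for rel, count in count_files(charm_root, TELEMETRY_DIRS).items():
        status = "missing" if count is None else f"{count} file(s)"
        print(f"{rel}: {status}")
```

Calling main() from the root of the charm repository prints one line per directory. This only checks that the files are present; it does not validate the rules or dashboards themselves.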

Step 3: Deploy Zookeeper and Grafana Agent in our VM Juju model.

In order to deploy these applications make sure you already have a VM Juju controller bootstrapped.

In this controller we will create a new model named, for instance, applications, and then deploy (and relate) zookeeper and grafana-agent.

  • Create the model:

    $ juju add-model applications
    Added 'applications' model on localhost/localhost with credential 'localhost' for user 'admin'
    
  • Deploy Zookeeper using the previously packed *.charm file:

    $ pwd
    /home/ubuntu/repos/zookeeper-operator
    
    $ juju deploy ./*.charm zookeeper
    Located local charm "zookeeper", revision 0
    Deploying "zookeeper" from local charm "zookeeper", revision 0 on jammy
    
  • Deploy the Grafana Agent machine charm:

    juju deploy grafana-agent --channel edge
    

    After running these commands, wait until the zookeeper unit settles into the active/idle state.

    At this point we have one zookeeper unit in active state, and there are no grafana-agent units. This is because grafana-agent is a subordinate application: its units are only created once it is related to a principal application.

  • Relate zookeeper to grafana-agent over the cos-agent relation:

    juju relate zookeeper:cos-agent grafana-agent
    

    Once the relation is established, and grafana-agent is deployed inside the zookeeper unit, the status of the model will be:

    Note that although grafana-agent is already collecting telemetry data from zookeeper at this point, it is not yet forwarding it to the COS Lite deployment we have in the K8s controller.

Step 4: Relate Grafana Agent to COS Lite (Prometheus, Loki and Grafana)

Since Grafana Agent is meant to send telemetry to COS Lite, the next step is to relate Grafana Agent to the COS Lite components: Prometheus, Loki and Grafana. This charm needs all three of these relations.

From the model in which our application is running, we can verify the offers COS Lite is exposing:

$ juju find-offers -m microk8s:cos

Store     URL                                        Access  Interfaces
microk8s  admin/cos.loki-logging                     admin   loki_push_api:logging
microk8s  admin/cos.prometheus-receive-remote-write  admin   prometheus_remote_write:receive-remote-write
microk8s  admin/cos.prometheus-scrape                admin   prometheus_scrape:metrics-endpoint
microk8s  admin/cos.alertmanager-karma-dashboard     admin   karma_dashboard:karma-dashboard
microk8s  admin/cos.grafana-dashboards               admin   grafana_dashboard:grafana-dashboard

As we said before, we will use only three of these offers:

  • Prometheus: admin/cos.prometheus-receive-remote-write
  • Loki: admin/cos.loki-logging
  • Grafana: admin/cos.grafana-dashboards

The first step to use these offers is to consume them:

$ juju consume microk8s:admin/cos.prometheus-receive-remote-write
Added microk8s:admin/cos.prometheus-receive-remote-write as prometheus-receive-remote-write
$ juju consume microk8s:admin/cos.loki-logging
Added microk8s:admin/cos.loki-logging as loki-logging
$ juju consume microk8s:admin/cos.grafana-dashboards
Added microk8s:admin/cos.grafana-dashboards as grafana-dashboards

Once these commands are executed, the status of our model will change slightly:

Note that in the status we now have a new section named SAAS. In that section we can see all the interfaces offered by applications running in other models that we can integrate with.

So now let’s relate Grafana Agent with these 3 applications:

$ juju relate grafana-agent prometheus-receive-remote-write
$ juju relate grafana-agent loki-logging
$ juju relate grafana-agent grafana-dashboards

And the three new relations are established; see the relations section of the model status:

Note that because of this bug you can’t create a new offer after relating an existing one. So first create all offers you intend to consume, and then relate all endpoints. Otherwise juju will complain with: ERROR cannot update application offer "<app>": application endpoint "<endpoint>" has active consumers
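
Since the charm needs all three relations, a quick scripted check can confirm that none was forgotten. This is a rough sketch under assumptions: it assumes that the JSON printed by juju status --format json has an applications map whose entries carry a relations map from endpoint name to a list of related applications, given either as plain strings or as objects with a related-application key. The helper names are hypothetical and only used for illustration.

```python
import json
import subprocess

# SAAS names created by the `juju consume` commands above.
REQUIRED_SAAS = {
    "prometheus-receive-remote-write",
    "loki-logging",
    "grafana-dashboards",
}


def related_applications(status, app_name):
    """Collect the names of the applications related to app_name.

    Assumption: status["applications"][app_name]["relations"] maps endpoint
    names to lists whose items are plain strings or dicts with a
    "related-application" key. Check this against your Juju version.
    """
    relations = (
        status.get("applications", {}).get(app_name, {}).get("relations", {})
    )
    names = set()
    for peers in relations.values():
        for peer in peers:
            if isinstance(peer, dict):
                names.add(peer.get("related-application"))
            else:
                names.add(peer)
    names.discard(None)
    return names


def missing_saas(status, app_name="grafana-agent"):
    """Return the required SAAS names that app_name is not yet related to."""
    return REQUIRED_SAAS - related_applications(status, app_name)


def main():
    """Run `juju status` in the current model and report missing relations."""
    raw = subprocess.run(
        ["juju", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    missing = missing_saas(json.loads(raw))
    if missing:
        print(f"missing relations: {sorted(missing)}")
    else:
        print("all three relations are present")
```

Calling main() from a shell where the applications model is the active one reports which of the three relations, if any, still has to be created.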

Step 5: Verify that metrics and logs reach Prometheus and Loki

Now that the Cross Model Relations are established between our application model and our Observability model, we can easily verify that the metrics zookeeper exposes reach Prometheus:

$ curl -s http://192.168.122.10/cos-prometheus-0/api/v1/query\?query\=zookeeper_DataDirSize | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "zookeeper_DataDirSize",
          "instance": "applications_f201dfb6-896c-4d5e-83c0-55e6bb8b08f3_zookeeper_zookeeper/0",
          "job": "zookeeper_1",
          "juju_application": "zookeeper",
          "juju_model": "applications",
          "juju_model_uuid": "f201dfb6-896c-4d5e-83c0-55e6bb8b08f3",
          "juju_unit": "zookeeper/0",
          "memberType": "Leader",
          "replicaId": "1"
        },
        "value": [
          1678971938.463,
          "67108880"
        ]
      }
    ]
  }
}
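
The same check can be scripted. The sketch below uses only the Python standard library to call the Prometheus instant-query endpoint shown in the curl command and to pull the Juju unit and value out of a response shaped like the one above. The function names are hypothetical, and the address is the one from this example deployment, so it will differ in yours.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the instant-query URL for a Prometheus base URL."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def extract_samples(payload):
    """Return (juju_unit, value) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('status')!r}")
    samples = []
    for series in payload["data"]["result"]:
        # Each vector sample is a [timestamp, "value-as-string"] pair.
        _timestamp, value = series["value"]
        samples.append((series["metric"].get("juju_unit"), float(value)))
    return samples


def main(base_url="http://192.168.122.10/cos-prometheus-0"):
    """Query the example metric and print one line per series."""
    url = build_query_url(base_url, "zookeeper_DataDirSize")
    with urllib.request.urlopen(url, timeout=10) as response:
        for unit, value in extract_samples(json.load(response)):
            print(f"{unit}: {value}")
```

An empty result list means the query succeeded but no series matched, which usually indicates that the metric has not reached Prometheus yet.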

We can also check that the logs are being sent to Loki, in this case using Grafana:

And finally, we can verify the dashboards Zookeeper provides:

Our team in Dwellir (https://dwellir.com) is following and testing this out as we speak. Very much interesting! I’d love to have a demo of this in the community this week or the next if you would be able to show up as a participant @jose ?

Hi @erik-lonroth!

Thanks for the comment!

Since this documentation, as well as the charm itself, is still in a WIP state, we were thinking of organising a community demo in 2 or 3 weeks, once the charm is in a more mature state.

I’ll let you know when we are ready!

Absolutely. We will continue to explore it, and we have actually purchased a few servers that we hope to build a COS Lite core out of and start relating machine charms to. Looking forward to using all this.

So, we now have a separate physical server installed with Ubuntu, on a separate subnet from our production networks.

We plan to use this for the COS Lite installation, but need to add the MicroK8s cloud to our production LXD Juju controller, which handles multiple LXD clouds.

We plan to do Cross Model Relations from our LXC models/charms with the grafana agent as soon as it is possible.

Anything we should be aware of?

Hi @erik-lonroth

Sorry for the late response.

I do not know if I understand the question, but keep in mind that you need to enable metallb in microk8s, so that Traefik can use one of the external addresses.

For instance:

sudo microk8s enable metallb 192.168.122.10-192.168.122.100

The Grafana agent charm needs to be able to reach these addresses.

Should we mention anywhere that only one subordinate per principal is supported? Kind of stating the obvious perhaps, but still.

https://github.com/canonical/grafana-agent-k8s-operator/issues/144

Yes, absolutely.

I kind of think we might want to avoid putting that idea into anyone’s head.

Hey @jose! Now that the grafana agent is more stable and being used more widely, could you check that the tutorial is up to date? There are some mentions of the old interface name:

  cos-machine:
    interface: cos_machine

Thanks!

After testing this out on a charm, I found out that pydantic is also required, since it’s used in cos_agent.

@zmraul Done! :wink:

Thanks! Done! :wink:

I’m trying to understand how to define which logs you’d like to have forwarded to loki in this situation. I see:

log_slots: Snap slots to connect to for scraping logs in the form ["snap-name:slot", ...].

Does this mean you can only send logs to loki if your client application is using snaps?

log_slots is only for snaps but it isn’t the only way to get logs. We automatically scrape all of /var/log and the journal as well

So if you write your logs to /var/log/..., they will get scraped
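
As an illustration only, an application written in Python could write its logs to a file under /var/log using the standard logging module. The path and logger name below are made up for the example, and writing under /var/log usually requires root or a pre-created directory with the right ownership.

```python
import logging
from pathlib import Path


def build_file_logger(log_path, name="my-app"):
    """Return a logger that appends timestamped lines to log_path.

    Both the default name and the path used in main() are hypothetical.
    """
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def main():
    """Write one example line to a hypothetical file under /var/log."""
    log = build_file_logger("/var/log/my-app/my-app.log")
    log.info("service started")
```

Any other way of getting log lines into files under /var/log, or into the journal, should work just as well, according to the reply above.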

Is this still true?

I’m working on a machine charm for testing out all this https://github.com/erik78se/juju-operators-examples/tree/main/observed

… and I have used this guide: Charmhub | Using the Grafana Agent Machine Charm

… which unfortunately is full of outdated information and will not work. It’s a great guide once it’s updated. @tmihoc

I can help there once I’ve figured out what goes on here with my cross-model-integrations: Cross Model Integration COS light lxd-plus-microk8 makes juju error - juju - Charmhub

It is, but if you use a modern charmcraft which supports PYDEPS directives, it should be handled for you.

Hi @erik-lonroth!

Could you mention which information you found outdated so we can fix it?

Thanks!

Jose

Sure, I can try, but it’s likely not complete…

  • Wrong class names in charm.py (COSAgentProvider vs COSMachineProvider).
  • Missing explicit references to dict/json elements for alert rules and dashboards. (How can I know how to manipulate the paths where these files are?)
  • pydantic inclusion in requirements.txt doesn’t seem to be needed.
  • No mention that the traefik edge version is needed for COS Lite to work.
  • No examples of any rules for prometheus/loki.
  • No mention that “ALL THREE” relations need to be established between grafana-agent and COS or the charms will not work. (There is no such information in the deploy messages.)
  • The reference to the loki-logging endpoint seems to have been renamed to just “logging”?
  • No mention of what happens when the dashboards/rules are modified as part of a charm upgrade… Will they be updated, or does an operator need to change them?
  • more?