The cos-lite bundle is meant to be run by Juju. However, not all workloads that you may want to monitor are. The good news is that you can use cos-lite to monitor workloads that are not charmed (that is, not managed by Juju). The bad news is that it’s relatively straightforward to do so. Not bad at all.
Contents:
- Deploy cos-lite
- Deploy grafana-agent
- Get the API endpoints from traefik
- Add custom dashboards and alerts
- TLS
- Known limitations and upcoming features
- Only export metrics with prometheus-scrape-target
Deploy cos-lite
The first step is to get hold of a machine and follow this guide on how to get started with COS Lite on MicroK8s.
Be sure to follow the best practices!
Unless you’re also planning to monitor some charmed applications with this cos-lite deployment, you will not need to use the offers overlay.
Deploy grafana-agent
The Grafana agent will act as an intermediary between the applications you want to monitor and the cos-lite stack. It will gather telemetry from your applications and send it to cos-lite, where you will be able to inspect it through the Grafana dashboards.
We recommend hosting the Grafana agent as close as possible to the workloads you intend to monitor, to minimise the risk of network faults and the resulting gaps in telemetry collection. We also recommend installing the Grafana agent via the snap we maintain.
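A minimal sketch of the snap installation, assuming the snap is published in the Snap Store under the name `grafana-agent` (check the store listing for the current name and channels):

```shell
# Assumption: the snap is published in the Snap Store as "grafana-agent".
sudo snap install grafana-agent
```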
However, Grafana agent is also available as a single Go binary, and you are free to install it and run it the way you like. See the official documentation for the publisher’s recommendations and guides.
Last but not least, we also have it containerized and petrified.
Now that you have Grafana Agent up and running, you will need to configure it.
Get the API endpoints from traefik
cos-lite includes a traefik instance that takes care of load balancing and ingress for the various observability components of the stack. Since cos-lite runs on Kubernetes, this allows you to talk to them via traefik over a stable URL.
Before you can use Traefik from an external service such as Grafana agent, you will need to ensure that the Traefik URL is routable from the service host and that its address is stable (e.g. not a dynamic IP). In other words, Traefik’s own URL also needs to be stable.
In the Juju model where cos-lite is installed, you can run:
juju run traefik/0 show-proxied-endpoints
Assuming you have configured the traefik charm to use an external hostname, for example "traefik.url", you will see something like:
proxied-endpoints: '{
"prometheus/0": {"url": "https://traefik.url/mymodel-prometheus-0"},
"loki/0": {"url": "https://traefik.url/mymodel-loki-0"},
"alertmanager": {"url": "https://traefik.url/mymodel-alertmanager"},
"catalogue": {"url": "https://traefik.url/mymodel-catalogue"},
}'
You can also open https://traefik.url/mymodel-catalogue in a browser to see a page with links to all cos-lite components’ user interfaces.
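To confirm that the endpoints are routable from the host that will run the Grafana agent, a quick check with `curl` can help. The sketch below defines a hypothetical helper, `check_endpoint`, that succeeds only when a URL answers with an HTTP success status. The 10-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# check_endpoint URL: succeed only if URL answers with an HTTP success status.
# (Hypothetical helper for illustration; it is not part of cos-lite.)
check_endpoint() {
  # -f: treat HTTP error statuses as failures
  # -sS: no progress meter, but still print errors
  # --max-time 10: give up after 10 seconds
  curl -fsS --max-time 10 -o /dev/null "$1"
}
```

For example, run from the agent host, `check_endpoint "https://traefik.url/mymodel-prometheus-0/-/ready" && echo reachable` probes the Prometheus readiness endpoint, and `check_endpoint "https://traefik.url/mymodel-loki-0/ready"` does the same for Loki. Both URLs reuse the example hostname and model name above, so substitute your own.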
At this point you will need to follow the documentation on how to configure the Grafana agent. Use the URLs you obtained from Traefik to tell the agent where to send its telemetry.
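As a rough illustration, the sketch below writes a Grafana Agent static-mode configuration that forwards metrics to the Prometheus remote-write path (`/api/v1/write`) and logs to the Loki push path (`/loki/api/v1/push`) behind Traefik. The base URLs are the example values shown earlier. The output file name, the scrape target `localhost:8080`, the job names, and the log path are all assumptions made for illustration, so adapt them to your workload and to wherever your agent reads its configuration.

```shell
#!/bin/sh
# Base URLs as reported by show-proxied-endpoints (example values; use your own).
PROMETHEUS_URL="https://traefik.url/mymodel-prometheus-0"
LOKI_URL="https://traefik.url/mymodel-loki-0"

# Hypothetical output path; point this at the file your agent actually reads.
CONFIG_FILE="${CONFIG_FILE:-./grafana-agent.yaml}"

cat > "$CONFIG_FILE" <<EOF
metrics:
  global:
    remote_write:
      # Prometheus remote-write endpoint behind Traefik
      - url: ${PROMETHEUS_URL}/api/v1/write
  configs:
    - name: default
      scrape_configs:
        # Assumption: the workload exposes metrics on localhost:8080
        - job_name: my-workload
          static_configs:
            - targets: ["localhost:8080"]
logs:
  configs:
    - name: default
      clients:
        # Loki push endpoint behind Traefik
        - url: ${LOKI_URL}/loki/api/v1/push
      positions:
        filename: /tmp/positions.yaml
      scrape_configs:
        # Assumption: the logs of interest live under /var/log
        - job_name: varlog
          static_configs:
            - targets: [localhost]
              labels:
                job: varlog
                __path__: /var/log/*.log
EOF
```

After writing the file, restart the agent so that it picks up the new configuration.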
Add custom dashboards and alerts
To add your own dashboards and alerts to cos-lite, you will need to deploy the cos-config charm on top of cos-lite.
Follow this guide to set up cos-config in the same Juju model in which cos-lite is deployed.
TLS
You can deploy cos-lite with the tls overlay to enable secure communications with and within COS Lite.
You can follow this guide to enable TLS in Traefik and COS Lite.
Known limitations and upcoming features
Identity
We are “working towards”[citation needed] an integration with Canonical’s IAM bundle to provide a charmed identity solution, so that you can lock down your observability stack behind an identity provider. Stay tuned for updates!
Tracing
We are “working towards”[citation needed] a tracing overlay to add distributed tracing capabilities to cos-lite. Once that work is done, you will be able to add Grafana Tempo to the stack.
Only export metrics with prometheus-scrape-target
In some rare circumstances, you might prefer to use prometheus-scrape-target instead of grafana-agent.
Namely:
- when you only need metrics (no logs, traces, etc.)
- when you’d rather make the necessary firewall changes on the workload you want to monitor than expose cos-lite through ingress
- when you’re not able to install anything (or at least not grafana-agent) on the workload you want to monitor
If this is your situation, we’ve got you covered. You can deploy prometheus-scrape-target and configure it to scrape your workload.
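A minimal sketch of that setup, run in the Juju model where cos-lite is deployed. It assumes the charm is published as `prometheus-scrape-target-k8s`, that it accepts a `targets` configuration option holding comma-separated `host:port` entries, and that the Prometheus application in your model is named `prometheus`. The application name `scrape-target` and the address `workload.host:9100` are placeholders. Check the charm's configuration reference before relying on any of these names.

```shell
# Assumption: charm name and "targets" option as described above;
# "workload.host:9100" is a placeholder for your workload's metrics endpoint.
juju deploy prometheus-scrape-target-k8s scrape-target \
  --config targets="workload.host:9100"

# Assumption: the Prometheus application deployed by cos-lite is named "prometheus".
juju integrate scrape-target prometheus
```

With this approach Prometheus pulls metrics from the workload, so the workload's metrics port must be reachable from the cluster running cos-lite.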