Interface code for Promtail/Loki VM charms

@ppasotti in relation to

and a discussion I had with @0x12b about the K8s versions of Loki… (Also, perhaps @roadmr would be interested.)

I’m eagerly waiting for a charm interface that I can use for my LXD/VM charms. I would really love to use these charms together with K8s without having to replicate too much code, instead copying as much as possible from the K8s charms.

Is there anything written yet that would let me relate Promtail (VM) to a Loki K8s charm (and possibly get some help with code for a Loki VM charm)?

Sounds great! Have you looked at the interfaces used by the K8s counterparts of those charms? In particular, https://github.com/canonical/loki-k8s-operator/blob/main/lib/charms/loki_k8s/v0/loki_push_api.py looks like a good starting point for the Loki charm. There’s only one event handler attached to Pebble-related stuff. If you override that one and attach it to, say, start or install, maybe the rest will ‘just work’? I don’t see any K8s/container-specific stuff in there aside from that.
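A rough sketch of that idea, for illustration only: the class name, the handlers, and what they’d do are assumptions for the sketch, not the lib’s actual API.

```python
# Illustrative only: a machine variant that re-attaches the lib's
# pebble-ready logic to start/install instead. The handler bodies
# here are placeholders, not the loki_push_api lib's real code.
from ops.charm import CharmBase
from ops.main import main


class LokiMachineCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # On K8s the lib observes a pebble-ready event; on a machine
        # we can do the equivalent setup on install/start instead.
        self.framework.observe(self.on.install, self._on_install)
        self.framework.observe(self.on.start, self._on_start)

    def _on_install(self, event):
        # Install/configure the Loki workload on the host (snap/deb)
        # instead of via a Pebble layer in a sidecar container.
        pass

    def _on_start(self, event):
        # Do what the lib's pebble-ready handler does on K8s, e.g.
        # advertise the push API endpoint over the relation.
        pass


if __name__ == "__main__":
    main(LokiMachineCharm)
```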

Yes. I read it, discovered the interface that way, and reached out here for that purpose. In fact, I was planning to mimic the K8s versions of those charms as much as possible.

But since the interface code isn’t available yet except embedded in the K8s charm, it feels like I could quickly get into a situation where the charms diverge too much.

Ideally, the VM charm could be integrated into the same repo or something similar, which would perhaps help when maintaining the charms over time.

I hate to register the charm names “loki” and “promtail”, but I will if I need to…

> But since the interface code isn’t available yet except embedded in the K8s charm, it feels like I could quickly get into a situation where the charms diverge too much.

What do you mean by that? The interface code is in a charm lib, so you should be able to import it and share it without fear of divergence. Or do you mean the actual ‘workload’ config management code?

If that is what you mean, maybe a more productive way forward is for you to contribute a feature to the charm to make it work on both K8s and machines. I’ve heard of someone doing that using a subclassing mechanism.

For example, you could have `class CharmLokiK8s(LokiCharmBase)` and `class CharmLokiLXD(LokiCharmBase)`, plus a `__main__` routine which decides which one to instantiate based on the substrate.
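A minimal sketch of that dispatch, assuming hypothetical `LokiCharmBase`, `CharmLokiK8s`, and `CharmLokiLXD` classes (none of these exist today):

```python
#!/usr/bin/env python3
# Hypothetical multi-substrate entrypoint; the class names match the
# suggestion above and are not an existing API.
import os

from ops.charm import CharmBase
from ops.main import main


class LokiCharmBase(CharmBase):
    """Substrate-agnostic logic shared by both variants."""


class CharmLokiK8s(LokiCharmBase):
    """K8s-specific bits, e.g. Pebble layers in the workload container."""


class CharmLokiLXD(LokiCharmBase):
    """Machine-specific bits, e.g. installing and driving systemd units."""


if __name__ == "__main__":
    # One possible substrate heuristic (see the env-var discussion
    # further down in this thread).
    on_k8s = bool(os.getenv("KUBERNETES_PORT"))
    main(CharmLokiK8s if on_k8s else CharmLokiLXD)
```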

I don’t think the framework has any abstraction in place for that yet, but allowing for multi-substrate charms was on the agenda in several meetings at some point :slight_smile:

Best to ask @jnsgruk and @benhoyt about the state of the art in that respect. The o11y team also has several ‘machine’ stories on the roadmap, so it’s worth involving @0x12b in the conversation as well.


As far as I know we don’t yet have any charms that work for both VMs and K8s, but I think @ppasotti is right that the current best approach would be two code paths (a subclass or similar) until we add first-class support for that. Two questions on my mind:

  1. Do we have the data we need in the hook to determine whether it’s running on VM or K8s?
  2. Can metadata.yaml handle this scenario? If we have `containers: …`, can it still be deployed as a VM charm?

AFAIK:

  1. we can solve this by querying for the presence of specific environment variables in the charm process, e.g. `if os.getenv('KUBERNETES_PORT', None):` (see the sketch after this list).
  2. yes. The only metadata.yaml option that affects how Juju deploys things is `assumes: k8s-api`. If `k8s-api` is assumed, Juju will refuse to deploy on anything but a Kubernetes cloud. This is the only constraint at the moment: there is no way to, e.g., assume a VM environment. So if we omit that field, the charm will deploy on any substrate (at the moment; not sure if they’re planning to fix this asymmetry).
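As a sketch, that check could be wrapped in a small helper. Note this relies on env vars Kubernetes injects into pods; it’s a heuristic, not a documented Juju API:

```python
import os


def is_k8s_substrate() -> bool:
    """Best-effort substrate detection for a charm hook process.

    Kubernetes injects KUBERNETES_PORT (among other service env vars)
    into every pod, so its presence suggests we're running as a
    sidecar charm. Heuristic only: not a documented Juju API.
    """
    return "KUBERNETES_PORT" in os.environ
```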

The current loki_push_api library does make some assumptions about Pebble being present, etc. I think @dylanstathis has been looking into this over the last few days - our intent is certainly to support log egress from machines to the loki-k8s charm.

If you don’t include `containers`, then the charm can actually deploy on both machines and Kubernetes. There are a couple of examples where we actually do this (prometheus-scrape-target, for example). They’re a little bit limited, but useful for modelling in some cases.

@erik-lonroth: AFAIK the Observability team are not planning on releasing Loki (or any of the COS stack) for VMs. That said, I’d recommend giving it a go - COS Lite is designed to be an “appliance” you can deploy on MicroK8s and use to ingest metrics, logs, etc. from anywhere. We have a full-time team dedicated to this stack, so it’ll likely be less hassle than maintaining a separate VM charm. If you do decide to implement it, then I’d ask that you try to implement the same relation interface.

In any case, you should drop into the observability channel on Charmhub’s Mattermost to discuss options with the team :slight_smile:


Okay, so there are a few things to address here.

  1. The loki_push_api library should be compatible with machine charms very soon. This is not a guarantee, but I think it will happen within the next few days. Your best bet in the short term would be to use loki_push_api to push logs from your charm to loki-k8s (see the sketch after this list).

  2. We are in the process of creating a subordinate machine charm for grafana-agent. This charm would scrape logs and metrics from machine charms and send them on to Loki and Prometheus. If you can wait for this, it is definitely our recommended solution.

  3. Which charm are you suggesting we make hybrid machine/K8s? It is unlikely that we would accept a PR right now that made the loki-k8s charm a hybrid. It is easy enough to deploy a single-node MicroK8s and drop loki-k8s on there.

  4. We have discussed the possibility of hybrid charms in general. It seems it would be possible by doing what has been mentioned here. I have a pretty significant concern, though: metadata.yaml would have to specify `containers`, which would only be used when running on K8s. It is my understanding that Pebble support for machine charms is coming some time in the future. When that happens, would containers just start spinning up on machine charms because you updated Juju? @benhoyt, will this be an issue?
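For reference, consuming loki_push_api from a charm would look roughly like the sketch below. The import path is the lib’s real one, but the event and attribute names used here should be double-checked against the lib’s current docs before relying on them:

```python
# Rough sketch of consuming loki_push_api; verify the event and
# attribute names against the published lib before relying on them.
from charms.loki_k8s.v0.loki_push_api import LokiPushApiConsumer
from ops.charm import CharmBase


class MyMachineCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self._loki = LokiPushApiConsumer(self)  # relation name: "logging"
        self.framework.observe(
            self._loki.on.loki_push_api_endpoint_joined,
            self._configure_log_forwarding,
        )

    def _configure_log_forwarding(self, event):
        # Each endpoint is a dict like
        # {"url": "http://<loki>/loki/api/v1/push"}; point a locally
        # managed promtail (or other client) at these URLs.
        for endpoint in self._loki.loki_endpoints:
            self._write_promtail_client_config(endpoint["url"])

    def _write_promtail_client_config(self, url: str):
        pass  # render promtail config on the host; out of scope here
```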

@0x12b, have I missed anything?


Re point 4: yeah, I think we’d need to discuss first-class support for this in Juju. Currently, if you specify `containers` in metadata.yaml it does actually deploy successfully to a VM (I just tried deploying the sidecar K8s charm snappass-test to an LXD VM and it succeeded, though of course the charm/workload didn’t work). But what else we’d need to support to make that seamless, I’m not sure. We’d need to think through the metadata.yaml implications for sure, and Pebble on VMs, and so on.

Doesn’t look like that’s on the roadmap for this cycle, but I could be wrong. @jnsgruk?

Would this be “promtail”?

Super! Could you point me to some code already?

I’m not really suggesting hybrid charms. I used to think that was a good idea, but I’ve found that maintaining such charms is far more difficult than maintaining multiple charms.

A fairly nice pattern has instead been to develop a Python module that manages everything that has to do with touching the OS, and to make that library contain code that can be tested outside of a Juju context. That library can then easily be reused by different versions of a charm. The pattern works fairly well in a Juju context.
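A tiny sketch of that split, with illustrative module and function names: a plain module with no `ops` imports that the charm’s event handlers (and the unit tests) both call.

```python
# promtail_host.py - all OS-touching logic, importable without Juju.
import subprocess
from pathlib import Path

CONFIG_PATH = Path("/etc/promtail/config.yaml")


def write_config(contents: str, path: Path = CONFIG_PATH) -> None:
    """Render the workload config; no charm/ops objects involved."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(contents)


def restart(service: str = "promtail") -> None:
    """Restart the workload via systemd."""
    subprocess.run(["systemctl", "restart", service], check=True)
```

Unit tests can then exercise `write_config` against a temporary path, with no hook context needed.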

So, for us to run the COS stack we have to run a K8s cluster (however small)? I.e., we need to introduce a complete software stack to monitor another?

This makes no sense to me at all yet.

Re the code: you can probably take a look at https://github.com/canonical/loki-k8s-operator/pull/215


In general, we’d aim for people to run their monitoring on different infrastructure than the production services it’s monitoring - that way if there is a critical problem in the production infrastructure, you’re less likely to be flying blind with no monitoring or historical log information.

@0x12b is the expert here, but I shouldn’t imagine the current setup will add much overhead to your ops - it’s designed to keep the ops burden as low as possible :slight_smile:

Thanx!

@0x12b participated in today’s workshop, and we talked about this. I think the context around grafana-agent explained a lot and also placed the features it provides in a great context.

The Promtail subordinate I was showing, I learned today, comes with the agent - which, if it also gets implemented as a VM charm, would make all the sense in the world. Then the COS stack would be totally justified for us to consider on a K8s platform.

I’ll be super glad to test that grafana-agent as soon as it sees daylight.

I’ll take a look. Thanx @ppasotti !

As for the relation interface specification itself, that work is currently in progress. I just have some minor finishing touches left before I push it to the relation interfaces repo. But as @ppasotti says, if you’re able to use the libs we already have in place, then you won’t have to worry about the interface spec at all. :smile: