More on configuring nagios machine charm

I’m continuing to use nagios with a charm that implements the “local-monitors” interface, derived from this repo: https://github.com/erik78se/juju-operators-examples/tree/main/monitoring-nrpe

My charm plays nicely with nagios, much thanks to the superb help from @mthaddon. So far it’s been really helpful, and I’m now also successfully using it to send alerts to pagerduty.

However, it seems I can’t get nagios to understand the config.yaml elements from my related charm “polkadot”:

juju config polkadot nagios_servicegroups="rpc"

or

juju config polkadot nagios_context="dwellir"

This has no impact on the nagios side of things.

In the interface there is nothing like what I use:

… and the code from the ops side of things looks like this:

def render_checks(self):
    """Render nrpe checks."""
    nrpe = NRPE()
    if not os.path.exists(self.plugins_dir):
        os.makedirs(self.plugins_dir)

    # Register a basic test.
    # Just add more with add_check before nrpe.write()

    nrpe.add_check(
        shortname="check-substrate",
        description="RPC blocksync",
        check_cmd="check_substrate.sh",
    )
    nrpe.write()

Anyone with experience here?

Hello Erik,

Please have a look at this example (pastebin)

The local-monitors interface behaves in a similar way to the nrpe-external-master interface. For example, charm-nrpe will aggregate the relation data from both interfaces and send it to charm-nagios via the monitors interface (see the implementation in charmhelpers lib).

In the pastebin that I shared, you can see a target-id attribute that is updated on the nrpe application, and shared with nagios via the monitors interface.

The charm code you shared implements the local-monitors interface, but not the monitors one, so Nagios will use its own default settings for “context”.

Note: if juju config nagios nagios_host_context="test2" is used, that value will only affect the hosts and services created for the nagios host itself (e.g. in the localhost_nagios2.cfg file).

Hope it helps.

Kind regards, -Alvaro.

I’ll try to have a look at this, since there are some pieces I don’t get.

My ambition is to somehow be able to group monitored units based on which model they come from. I haven’t been able to get there apart from what nagios_host_context provides, which is not enough…

Oh, I see.

The charmhelpers lib already supports the nagios_servicegroups config parameter, which is optional on principal charms.

The approach is to use 3 different charms:

  1. The nagios charm (relates to the nrpe charm via the monitors interface)
  2. The nrpe subordinate charm (relates to nagios via monitors, and to a principal charm via the subordinate nrpe-external-master or local-monitors relations)
  3. A principal charm which allows nagios_context and nagios_servicegroups in its config.yaml file (relates to the nrpe charm).

See an example on how charm-keystone supports the nagios_servicegroups config parameter, which is used in the nrpe relation handler (keystone_hooks.py) when nrpe_setup.write is called.

See also the nagios_servicegroups implementation in charmhelpers.
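
As a rough illustration of the idea (a minimal sketch, not the actual charmhelpers code): the principal charm reads both options from its config and derives the values that end up in the generated check definitions. The function name resolve_nagios_settings is hypothetical, and two things are assumptions here: that the context defaults to "juju" when unset, and that the service groups fall back to the context when not set. The hostname pattern (context, then the unit name with "/" replaced by "-") matches values such as monitor-dev-polkadot-6 seen in relation data.

```python
def resolve_nagios_settings(config, unit_name):
    """Derive nagios context, service groups and hostname from charm config.

    `config` is a plain dict standing in for the charm's config options.
    """
    # Assumption: "juju" is used when nagios_context is unset or empty.
    context = config.get("nagios_context") or "juju"
    # Assumption: service groups fall back to the context when not set.
    servicegroups = config.get("nagios_servicegroups") or context
    # Hostname is "<context>-<unit name with '/' replaced by '-'>".
    hostname = "{}-{}".format(context, unit_name.replace("/", "-"))
    return {
        "context": context,
        "servicegroups": servicegroups,
        "hostname": hostname,
    }


if __name__ == "__main__":
    print(resolve_nagios_settings(
        {"nagios_context": "dwellir", "nagios_servicegroups": "rpc"},
        "polkadot/6",
    ))
```

These values only have an effect if the code that renders the checks actually reads them from the principal charm's config; declaring the options in config.yaml is not sufficient on its own.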

Cheers, -Alvaro.


It’s strange, because I did add the following to my config.yaml:

nagios_context:
  default: "dwellir"
  type: string
  description: "..."
nagios_servicegroups:
  default: "rpc"
  type: string
  description: "..."

But I don’t see this having any impact on the nagios side of things. I will try to look some more at it during the weekend.

My charm is named “polkadot” and this is what I can see from “juju show-unit polkadot/6”

erik@frozen:~$ juju show-unit polkadot/6
polkadot/6:
  workload-version: 3.0.0
  machine: "7"
  opened-ports: []
  public-address: 192.168.2.128
  charm: local:focal/polkadot-10
  leader: true
  relation-info:
  - endpoint: local-monitors
    related-endpoint: local-monitors
    application-data: {}
    related-units:
      nrpe/15:
        in-scope: true
        data:
          egress-subnets: 192.168.2.128/32
          ingress-address: 192.168.2.128
          nagios_host_context: monitor-dev
          nagios_hostname: monitor-dev-polkadot-6
          private-address: 192.168.2.128

… nor on the nagios side of the relation:

erik@frozen:~$ juju show-unit nagios/0
nagios/0:
  machine: "1"
  opened-ports:
  - 80/tcp
  public-address: 192.168.2.207
  charm: ch:amd64/bionic/nagios-46
  leader: true
  relation-info:
  - endpoint: monitors
    related-endpoint: monitors
    application-data: {}
    related-units:
      nrpe/15:
        in-scope: true
        data:
          egress-subnets: 192.168.2.128/32
          ingress-address: 192.168.2.128
          machine_id: "7"
          model_id: d30634de-1d37-4855-8451-6e0cc7ffb995
          monitors: '{''monitors'': {''remote'': {''nrpe'': {''rpc-blocksync'': {''command'':
            ''check_rpc-blocksync''}, ''rpc-performance'': {''command'': ''check_rpc-performance''}}}},
            ''version'': ''0.3''}'
          nagios_host_context: juju
          nagios_hostname: juju-juju-ffb995-7
          private-address: 192.168.2.128
          target-address: 192.168.2.128
          target-id: monitor-dev-polkadot-6

… and as you can see, the config elements have been set:

erik@frozen:~$ juju config polkadot nagios_context
dwellir
erik@frozen:~$ juju config polkadot nagios_servicegroups
rpc

But in my nagios config, nothing that includes “dwellir” or “rpc” is present.

No service groups are there…
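
One way to double-check what actually crosses the monitors relation is to parse the monitors value from the nagios/0 output. This is a small sketch using only the Python standard library; the helper names list_checks and keys_in_use are hypothetical, and the payload is copied from the show-unit output with the YAML quote escaping undone.

```python
import ast

# The "monitors" value from `juju show-unit nagios/0`, with the doubled
# single quotes of the YAML scalar collapsed back to plain single quotes.
MONITORS = (
    "{'monitors': {'remote': {'nrpe': {"
    "'rpc-blocksync': {'command': 'check_rpc-blocksync'}, "
    "'rpc-performance': {'command': 'check_rpc-performance'}}}}, "
    "'version': '0.3'}"
)


def list_checks(raw):
    """Return the dict of remote nrpe checks found in the payload."""
    payload = ast.literal_eval(raw)
    return payload["monitors"]["remote"]["nrpe"]


def keys_in_use(raw):
    """Return the set of attribute names used by any check."""
    keys = set()
    for attrs in list_checks(raw).values():
        keys.update(attrs)
    return keys


if __name__ == "__main__":
    print(sorted(list_checks(MONITORS)))
    print(sorted(keys_in_use(MONITORS)))
```

For this payload, each check carries only a command attribute, so neither the context nor the service groups are part of what nagios receives here.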