[Solved] Nagios + pagerduty ... anyone?

Hello!

I’m deploying nagios (vm charm) and trying to use it with pagerduty.

I’m having no luck and am looking for help.

Does anyone have experience setting it up properly?

@emcp @hallback @joakimnyman @mmrezaie @mthaddon ? anyone?

Hi Erik!

What sorts of errors are you running into? What version of pagerduty are you linking to? Are you certain that you can route from your Juju model into wherever pagerduty is hosted?

If you take the time to put that information together, it’s a lot easier for us to help you. Nagios is an actively maintained charm (I believe that @mthaddon’s team uses it extensively), and everything should work smoothly. If it doesn’t, there might be a difference in your environment that the charm authors didn’t account for, and we’d need more information to be able to diagnose that.


I’ll try to describe it better here.

First, I deployed nagios onto an LXD cloud at home. No problem there.

I have it monitoring a service and all is fine.

I then configured nagios the way I thought it should be done. This is the first problem: the charm doesn’t give much information about how to do it.

Based on the docs, I configured these options:

Pagerduty Configuration

  • enable_pagerduty - Config variable to enable pagerduty notifications or not.
  • pagerduty_key - Pagerduty API key to use for notifications
  • pagerduty_path - Path for Pagerduty notifications to be queued, default is /var/lib/nagios3/pagerduty.

I did this:

juju config nagios enable_pagerduty=true
juju config nagios pagerduty_key="*************"
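The two commands above can be wrapped in a small helper that refuses to run without a key and reads the flag back afterwards. This is a minimal sketch: `configure_pagerduty` and the `PD_INTEGRATION_KEY` environment variable are hypothetical names, and, as confirmed later in this thread, the value should be the PagerDuty service’s Integration Key rather than an API key.

```shell
# Hypothetical helper: enable PagerDuty notifications on the nagios charm.
# Assumes PD_INTEGRATION_KEY (hypothetical name) holds the PagerDuty
# service's Integration Key, not an API key.
configure_pagerduty() {
    app="${1:-nagios}"   # Juju application name; "nagios" as in this thread
    if [ -z "${PD_INTEGRATION_KEY:-}" ]; then
        echo "PD_INTEGRATION_KEY is not set" >&2
        return 1
    fi
    juju config "$app" enable_pagerduty=true || return 1
    juju config "$app" pagerduty_key="$PD_INTEGRATION_KEY" || return 1
    # Read the flag back to confirm the change was accepted.
    juju config "$app" enable_pagerduty
}
```

Usage: `PD_INTEGRATION_KEY=<your integration key> configure_pagerduty nagios` should print `true` if the option was set.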

A comment here: it’s not clear whether the “Integration key” or the “API key” is needed. From reading the PagerDuty information, I think it might be the “Integration key” rather than the “API key”… I tried both, to no avail.

I have a PagerDuty service set up, from which I got the key(s).

With this in place, I generated an alarm from the nagios web interface. The notification seems to be triggered, but nothing turns up in PagerDuty.

I also see that the method used by the charm is this: https://github.com/PagerDuty/pagerduty-nagios-pl

According to its description, that method needs additional pieces which are not in the charm, for example the cron job. I added the cron job manually to see if it would make any difference. It didn’t.

I have previously managed to get prometheus-alert-manager to send alerts to PagerDuty, so I know that nothing prevents me from communicating with PagerDuty.

What would help me is a step-by-step description of what is required to get this working, which would be very useful to have in the docs.

In the logs (/var/log/nagios/nagios.log) I can see traces of the notification I triggered:

[1647288494] SERVICE NOTIFICATION: pagerduty;kilt-rpc.dwellir.com;check_kilt_rpc;CUSTOM (CRITICAL);notify-service-by-pagerduty;Chain out of sync by 1 blocks 1145169/1145170;nagiosadmin;test2

That seems legit to me: nagios seems to be doing something… But PagerDuty gets nothing.
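To see which notifications were actually handed to the pagerduty contact, the log can be filtered and each entry labelled as custom or real. A minimal sketch, assuming the line format shown above (`contact;host;service;state;command;output`, followed by author and comment for a custom notification like the one quoted); `pagerduty_notifications` is a hypothetical helper name and the default path is the one quoted in this thread.

```shell
# Hypothetical helper: list notifications sent to the "pagerduty" contact.
# Assumed line format (semicolon-separated after the prefix):
# [epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output[;author;comment]
pagerduty_notifications() {
    log="${1:-/var/log/nagios/nagios.log}"   # path as quoted in this thread
    awk -F';' '
        /SERVICE NOTIFICATION: pagerduty;/ {
            # Field 4 is the state, e.g. "CRITICAL" or "CUSTOM (CRITICAL)".
            kind = ($4 ~ /^CUSTOM/) ? "custom" : "real"
            print kind "\t" $2 "\t" $3 "\t" $4
        }' "$log"
}
```

As it turns out later in this thread, entries labelled `custom` here (from “Send custom service notification”) did not produce a page, while real CRITICAL states did.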

Hello Erik,

I can confirm the pagerduty_key is the “Integration Key” defined in a Pagerduty service.

By enabling Pagerduty from Juju config, the “pagerduty” contact will become a member of the “admins” contact group.

A /etc/nagios3/conf.d/pagerduty_nagios.cfg file is also generated, and the pagerduty_nagios.pl script also allows a proxy value (--proxy http://host:port).

The notify-service-by-pagerduty command looks like this: command_line /usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object=service -q /var/lib/nagios3/pagerduty

A cronjob should have also been generated at /etc/cron.d/nagios-pagerduty-flush.
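The pieces listed above can be checked in one go on the Nagios unit. A minimal sketch: `check_pagerduty_setup` is a hypothetical name, the paths are the ones given in this reply, and the backlog count assumes the script keeps one file per pending event in the queue directory until a flush delivers it. The optional first argument is a hypothetical root prefix, only there so the check can be run against a copy of the filesystem; leave it empty on the unit itself.

```shell
# Hypothetical helper: verify the files the charm should generate when
# enable_pagerduty=true, and count events waiting in the queue directory.
check_pagerduty_setup() {
    root="${1:-}"   # hypothetical prefix; empty on the Nagios unit
    rc=0
    for f in /etc/nagios3/conf.d/pagerduty_nagios.cfg \
             /usr/local/bin/pagerduty_nagios.pl \
             /etc/cron.d/nagios-pagerduty-flush; do
        if [ -e "$root$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"; rc=1
        fi
    done
    queue="$root/var/lib/nagios3/pagerduty"
    if [ -d "$queue" ]; then
        # Assumption: each pending event is a file removed after a successful flush.
        pending=$(find "$queue" -type f | wc -l | tr -d ' ')
        echo "queued   $pending event(s) in /var/lib/nagios3/pagerduty"
    else
        echo "MISSING  /var/lib/nagios3/pagerduty"; rc=1
    fi
    return $rc
}
```

If the queued count keeps growing after an alert, the flush job is probably not running or not reaching PagerDuty.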

Are there other checks apart from check_kilt_rpc that could be tested? Pagerduty may take a minute to trigger a page from the moment a nagios check alerts.

Another question I have is about the CUSTOM (CRITICAL) type of alert. In my case, I get alerts triggered as CRITICAL, not custom. I hope that level of alert matches the definition in the “pagerduty” contact, in /etc/nagios3/conf.d/pagerduty_nagios.cfg.
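To check which service states the pagerduty contact is notified about, its `service_notification_options` can be read from the generated file. A minimal sketch, assuming the file uses ordinary Nagios object syntax (`define contact { ... }`), since its exact contents are not shown here; `pagerduty_contact_options` is a hypothetical helper name. A `c` in the output means critical states are included.

```shell
# Hypothetical helper: print service_notification_options for the
# "pagerduty" contact, assuming standard Nagios object definitions.
pagerduty_contact_options() {
    cfg="${1:-/etc/nagios3/conf.d/pagerduty_nagios.cfg}"
    awk '
        # Start of a contact definition ("define contact{" or "define contact {").
        /define[ \t]+contact[ \t]*[{]/ { in_c = 1; name = ""; opts = ""; next }
        in_c && $1 == "contact_name" { name = $2 }
        in_c && $1 == "service_notification_options" { opts = $2 }
        # End of the definition: report only the pagerduty contact.
        in_c && /[}]/ { if (name == "pagerduty") print opts; in_c = 0 }
    ' "$cfg"
}
```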

Kind regards, -Alvaro.


This might be the problem I’m facing. I trigger the alarms from the nagios web interface:

“Send custom service notification” (see the picture).

However, the logs certainly suggest that a notification is sent even if it’s a CUSTOM type. I still don’t see it in PagerDuty.

Is this perhaps something going on in PagerDuty? Thanks a lot for assisting. This information would be great to have in the charm docs…

[UPDATE] “Send custom service notification” will NOT produce a notification. Instead, shutting down SSH on a node generates a real notification, which is the way to test all this with PagerDuty.

Is there a way you know of to trigger test alarms which are CRITICAL?
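One possible answer, not tested in this thread: Nagios accepts a passive check result through its external command file (`PROCESS_SERVICE_CHECK_RESULT`, where return code 2 means CRITICAL). The web interface’s “Send custom service notification” also works by writing an external command, so that file should already be enabled. A minimal sketch, assuming the service accepts passive checks and the command file is at the Ubuntu nagios3 default `/var/lib/nagios3/rw/nagios.cmd`; `submit_test_critical` is a hypothetical name.

```shell
# Hypothetical helper: force a CRITICAL result for an existing service by
# writing a passive check result to the Nagios external command file.
submit_test_critical() {
    host="$1" service="$2"
    cmdfile="${3:-/var/lib/nagios3/rw/nagios.cmd}"   # assumed nagios3 default
    now=$(date +%s)
    # Format: [time] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;2;%s\n' \
        "$now" "$host" "$service" "TEST: forced CRITICAL for PagerDuty" >> "$cmdfile"
}
```

Run it on the Nagios unit as a user allowed to write to the command file, e.g. `submit_test_critical kilt-rpc.dwellir.com check_kilt_rpc`. Depending on the service’s `max_check_attempts`, the result may need to be submitted more than once before the state becomes HARD and a notification is sent, and the next active check will overwrite it with the real state.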

@aluria THANK YOU! After I shut down the SSH service, a CRITICAL alarm eventually resulted in a notification. With that, and making sure to use the “Integration key” instead of the “API key”, the notification was sent and picked up by PagerDuty. The crontab I thought was missing was also my mistake; I had overlooked it.

From your description, I was able to complete it.

It would be extremely helpful for others if you could update the documentation with this information as well, especially since I think the API key is referenced there, possibly even in the description of the configuration option…

I still have many questions, and I hope this charm will be maintained for a long time. I would love to understand how I could contribute later!

Nagios 4 is out there now for example =)

Thank you again!

Hello Erik,

I’m glad it worked.

FYI, I’ve created a PR to update the README.

Cheers, -Alvaro.
