Stop and start fail with 'process still runs after SIGTERM and SIGKILL'

I'm doing a simple stop and start of a service in a container with container.stop(services) and container.start(services), and I get the following stack trace:

  File "/var/lib/juju/agents/unit-smokeping-k8s-0/charm/venv/ops/model.py", line 1050, in stop
    self._pebble.stop_services(service_names)
  File "/var/lib/juju/agents/unit-smokeping-k8s-0/charm/venv/ops/pebble.py", line 813, in stop_services
    return self._services_action('stop', services, timeout, delay)
  File "/var/lib/juju/agents/unit-smokeping-k8s-0/charm/venv/ops/pebble.py", line 831, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:

  • Stop service "smokeping" (process still runs after SIGTERM and SIGKILL)

Using Docker Hub

Welcome @dparv!

Just for context: Pebble (the underlying process manager) waits up to 5s after sending SIGTERM for the process to die, then if it’s still running it sends SIGKILL, and waits another 5s after sending SIGKILL.
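To make that sequence concrete, here is a rough Python sketch of the same escalation: SIGTERM, wait, SIGKILL, wait again. It is only an illustration, not Pebble's code (Pebble is written in Go). The function name stop_with_escalation, its parameters, and the returned strings are made up for this example.

```python
import subprocess
import sys


def stop_with_escalation(proc, term_wait=5.0, kill_wait=5.0):
    """Send SIGTERM, then SIGKILL, waiting after each; report the outcome.

    `proc` is a subprocess.Popen object. The two timeouts mirror the
    5s + 5s waits described above; the names are hypothetical.
    """
    proc.terminate()  # SIGTERM on POSIX systems
    try:
        proc.wait(timeout=term_wait)
        return "exited after SIGTERM"
    except subprocess.TimeoutExpired:
        pass  # still running, so escalate

    proc.kill()  # SIGKILL on POSIX systems
    try:
        proc.wait(timeout=kill_wait)
        return "exited after SIGKILL"
    except subprocess.TimeoutExpired:
        # The case reported in this thread: even SIGKILL did not help in time.
        return "process still runs after SIGTERM and SIGKILL"


if __name__ == "__main__":
    # Demo with a child that simply sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_with_escalation(child, term_wait=1.0, kill_wait=1.0))
```

The last branch is the interesting one here: it is only reached if the process is still alive after both signals and both waits.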

It is possible for SIGKILL not to cause the process to die right away. From my reading (this Stack Exchange answer has some really good info), this happens when the kernel itself is blocking (“uninterruptible sleep”), often caused by an NFS I/O operation, a buggy driver, or a zombie process.
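One way to check whether a process is in one of those states is to look at its state letter in /proc. The sketch below assumes a Linux container with /proc mounted; the helper names (parse_state, process_state, explain) are hypothetical. The state letters come from the proc(5) man page: D is uninterruptible sleep and Z is zombie.

```python
from pathlib import Path


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read the current state letter of a live process (Linux only)."""
    return parse_state(Path(f"/proc/{pid}/stat").read_text())


def explain(state):
    """Map a state letter to a short description (subset of proc(5))."""
    meanings = {
        "R": "running",
        "S": "interruptible sleep",
        "D": "uninterruptible sleep (SIGKILL is not acted on until it wakes)",
        "Z": "zombie (already dead, waiting for its parent to reap it)",
        "T": "stopped",
    }
    return meanings.get(state, "other state, see proc(5)")


if __name__ == "__main__":
    sample = "305 (perl) S 300 305 305 0 -1 4194560"  # made-up sample line
    print(explain(parse_state(sample)))
```

Inside the container, process_state(305) would report the state of PID 305. The STAT column of ps shows the same letter.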

Can you successfully stop this “smokeping” process outside the context of Pebble? (By sending SIGKILL, and assuming it’s in the same state as when you’re trying to stop it in Pebble.)

I also noticed that the code that sends these signals does not check errors from the syscall.Kill call, so it’s possible, but unlikely, that an error is being returned. That usually happens when you don’t have permission because you’re not the one that started the process, but in this case Pebble itself is starting the process, so I don’t think that’s the issue here. Either way, I’ve opened Pebble issue #62 to fix that.
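For anyone wondering what checking the error from a kill call looks like, here is a small Python analogue. It is not the Pebble fix itself (that code is Go and uses syscall.Kill). The function name send_signal_checked and its return strings are invented for this sketch. In Python, os.kill raises ProcessLookupError when the PID does not exist and PermissionError when the caller is not allowed to signal it.

```python
import os
import subprocess
import sys


def send_signal_checked(pid, sig):
    """Send `sig` to `pid` and report failures instead of ignoring them."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        return "no such process"  # ESRCH: the PID is gone
    except PermissionError:
        return "not permitted"  # EPERM: not allowed to signal this PID
    return "sent"


if __name__ == "__main__":
    # Signal 0 delivers nothing; it only performs the existence and
    # permission checks, so it is safe to aim at our own process.
    print(send_signal_checked(os.getpid(), 0))

    # A child that has already exited and been reaped no longer exists.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print(send_signal_checked(child.pid, 0))
```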

OK, I logged into the container and I can see the following processes:

root 300 0.0 0.0 196 4 ? S 12:32 0:00 s6-supervise smokeping
abc 305 0.0 1.0 53072 42788 ? Ss 12:32 0:00 /usr/bin/perl /usr/bin/smokeping --config=/etc/smokeping/config --nodaemon
abc 339 0.0 0.8 53072 34280 ? S 12:32 0:00 /usr/bin/smokeping [FPing]
abc 341 0.1 1.1 56232 48084 ? S 12:32 0:00 /usr/bin/perl /usr/bin/smokeping_cgi /etc/smokeping/config

Which one do I try to kill manually here?