While trying to upgrade the Indico charm (our first sidecar charm) we noticed some strange behavior. The refresh was carried out on a deployment with two units: the first unit upgraded successfully without any error, but the second one did not.
On the second unit we hit the following exception, which prevented one of the containers from starting. After 14 restarts, the deployment succeeded without any manual intervention.
Traceback (most recent call last):
  File "./src/charm.py", line 537, in <module>
    main(IndicoOperatorCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/main.py", line 429, in main
    framework.reemit()
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/framework.py", line 794, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/framework.py", line 857, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 456, in _on_config_changed
    self._config_pebble(self.unit.get_container(container_name))
  File "./src/charm.py", line 165, in _config_pebble
    self._install_plugins(container, plugins)
  File "./src/charm.py", line 483, in _install_plugins
    process.wait_output()
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/pebble.py", line 1098, in wait_output
    exit_code = self._wait()
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/pebble.py", line 1044, in _wait
    change = self._client.wait_change(self._change_id, timeout=timeout)
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/pebble.py", line 1521, in wait_change
    return self._wait_change_using_wait(change_id, timeout)
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/pebble.py", line 1542, in _wait_change_using_wait
    return self._wait_change(change_id, this_timeout)
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/pebble.py", line 1557, in _wait_change
    resp = self._request('GET', '/v1/changes/{}/wait'.format(change_id), query)
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/pebble.py", line 1297, in _request
    response = self._request_raw(method, path, query, headers, data)
  File "/var/lib/juju/agents/unit-indico-0/charm/venv/ops/pebble.py", line 1344, in _request_raw
    raise ConnectionError(e.reason)
Apparently, Pebble was not able to connect to the container, even though the code explicitly checks that a connection is possible before proceeding. The code executed before the exception occurred is pasted below:
self.framework.observe(self.on.indico_pebble_ready, self._on_pebble_ready)

def _on_pebble_ready(self, event):
    """Handle the on pebble ready event for the containers."""
    if not self._are_relations_ready(event) or not event.workload.can_connect():
        event.defer()
        return
    self._config_pebble(event.workload)

def _config_pebble(self, container):
    """Apply pebble changes."""
    self.unit.status = MaintenanceStatus("Adding {} layer to pebble".format(container.name))
    if container.name in ["indico", "indico-celery"]:
        self._set_git_proxy_config(container)
        plugins = (
            self.config["external_plugins"].split(",")
            if self.config["external_plugins"]
            else []
        )
        self._install_plugins(container, plugins)

def _install_plugins(self, container, plugins):
    """Install the external plugins."""
    if plugins:
        process = container.exec(
            ["pip", "install"] + plugins,
            environment=self._get_http_proxy_configuration(),
        )
        process.wait_output()
As can be seen in the snippet above, the exec command installs the external plugins. When logging into the container we confirmed that the plugins had actually been installed, yet the container kept crashing.
Does this error mean that Pebble is not ready yet? We are wondering whether we are missing some mandatory checks before executing a command in the container, or whether this is abnormal behavior.
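The alternative we see to retrying is treating a ConnectionError from exec() the same way as a failed can_connect() check: defer the event and let Juju re-deliver it. Sketch below with a hypothetical FakeEvent standing in for the ops event object so it runs standalone:

```python
class FakeEvent:
    """Stand-in for the ops event object: records whether defer() was called."""
    def __init__(self):
        self.deferred = False

    def defer(self):
        self.deferred = True

def on_pebble_ready(event, install):
    """Run `install` (the container.exec(...).wait_output() step in the
    charm) and defer the event if Pebble's socket drops mid-hook."""
    try:
        install()
    except ConnectionError:
        # Pebble vanished between can_connect() and exec(): defer and
        # retry on the next dispatch instead of crashing the hook.
        event.defer()
        return None
    return "configured"

def failing_install():
    raise ConnectionError("Pebble socket unreachable")

event = FakeEvent()
result = on_pebble_ready(event, failing_install)
print(event.deferred, result)  # → True None
```

If deferring like this is the expected pattern, confirmation would be welcome; we could not tell from the docs whether can_connect() is meant to be a sufficient guard.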