How to run workloads with a charm - Kubernetes

The recommended way to create charms for Kubernetes is using the sidecar pattern with the workload container running Pebble.

See more: Pebble

Pebble is a lightweight, API-driven process supervisor designed for use with charms. If you specify the containers field in a charm’s charmcraft.yaml, Juju will deploy the charm code in a sidecar container, with Pebble running as the workload container’s ENTRYPOINT.

When the workload container starts up, Juju fires a PebbleReadyEvent, which can be handled using Framework.observe as shown in Framework Constructs under “Containers”. This gives the charm author access to event.workload, a Container instance.

The Container class has methods to modify the Pebble configuration “plan”, start and stop services, read and write files, and run commands. These methods use the Pebble API, which communicates from the charm container to the workload container using HTTP over a Unix domain socket.

The rest of this document provides details of how a charm interacts with the workload container via Pebble, using the Python Operator Framework Container methods.

If you ever need to access the Pebble client directly, the Container.pebble property returns the pebble.Client instance for the given container.

Set up the workload container

Configure Juju to set up Pebble in the workload container

The preferred way to run workloads on Kubernetes with charms is to start your workload with Pebble. You do not need to modify upstream container images to make use of Pebble for managing your workload. The Juju controller automatically injects Pebble into workload containers using an Init Container and Volume Mount. The entrypoint of the container is overridden so that Pebble starts first and is able to manage running services. Charms communicate with the Pebble API using a UNIX socket, which is mounted into both the charm and workload containers.

By default, you’ll find the Pebble socket at /var/lib/pebble/default/pebble.sock in the workload container, and /charm/containers/<container>/pebble.sock in the charm container.

Most Kubernetes charms will need to define a containers map in their charmcraft.yaml in order to start a workload with a known OCI image:

# ...
containers:
  myapp:
    resource: myapp-image
  redis:
    resource: redis-image

resources:
  myapp-image:
    type: oci-image
    description: OCI image for my application
  redis-image:
    type: oci-image
    description: OCI image for Redis
# ...

In some cases, you may wish not to specify a containers map, which will result in an “operator-only” charm. These can be useful when writing “integrator charms” (sometimes known as “proxy charms”), which are used to represent some external service in the Juju model.

For each container, a resource of type oci-image must also be specified. The resource is used to inform the Juju controller how to find the correct OCI-compliant container image for your workload on Charmhub.

If multiple containers are specified in charmcraft.yaml (as above), each Pod will contain an instance of every specified container. Using the example above, each Pod would be created with a total of 3 running containers:

  • a container running the myapp-image
  • a container running the redis-image
  • a container running the charm code

The Juju controller emits PebbleReadyEvents to charms when Pebble has initialised its API in a container. These events are named <container_name>_pebble_ready. Using the example above, the charm would receive two Pebble related events (assuming the Pebble API starts correctly in each workload):

  • myapp_pebble_ready
  • redis_pebble_ready.
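As a minimal sketch of the naming rule above, the following helper builds the expected event names from a list of container names. The function name is hypothetical, and the hyphen-to-underscore normalisation is an assumption based on how Ops defines event attributes, not something stated in this guide.

```python
def pebble_ready_event_names(container_names):
    """Return the expected pebble-ready event name for each container."""
    # Assumption: hyphens in a container name become underscores so that the
    # event name is a valid Python attribute on ``self.on``.
    return [f"{name.replace('-', '_')}_pebble_ready" for name in container_names]


# Container names taken from the charmcraft.yaml example above.
print(pebble_ready_event_names(["myapp", "redis"]))
```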

Consider the following example snippet from a charmcraft.yaml:

# ...
containers:
  pause:
    resource: pause-image

resources:
  pause-image:
    type: oci-image
    description: Docker image for google/pause
# ...

Once the containers are initialised, the charm needs to tell Pebble how to start the workload. Pebble uses a series of “layers” for its configuration. Layers contain a description of the processes to run, along with the path and arguments to the executable, any environment variables to be specified for the running process and any relevant process ordering (more information available in the Pebble README).

In many cases, using the container’s specified entrypoint may be desired. You can find the original entrypoint of an image locally like so:

$ docker pull <image>
$ docker inspect <image>

When using an OCI image that is not built specifically for use with Pebble, layers are defined at runtime using Pebble’s API. Recall that when Pebble has initialised in a container (and the API is ready), the Juju controller emits a PebbleReadyEvent to the charm. It is often in the handler bound to this event that layers are defined and services started:

# ...
import ops
# ...

class PauseCharm(ops.CharmBase):
    # ...
    def __init__(self, framework):
        super().__init__(framework)
        # Set a friendly name for your charm. This can be used with the Operator
        # framework to reference the container, add layers, or interact with
        # providers/consumers easily.
        self.name = "pause"
        # The event name is derived from the container name declared in
        # charmcraft.yaml ("pause" in this example), not from the Pebble layer:
        # <container_name>_pebble_ready
        framework.observe(self.on.pause_pebble_ready, self._on_pause_pebble_ready)
        # ...

    def _on_pause_pebble_ready(self, event: ops.PebbleReadyEvent) -> None:
        """Handle the pebble_ready event"""
        # You can get a reference to the container from the PebbleReadyEvent
        # directly with:
        # container = event.workload
        #
        # The preferred method is through get_container()
        container = self.unit.get_container(self.name)
        # Add our initial config layer, combining with any existing layer
        container.add_layer(self.name, self._pause_layer(), combine=True)
        # Start the services that specify 'startup: enabled'
        container.autostart()
        self.unit.status = ops.ActiveStatus()

    def _pause_layer(self) -> ops.pebble.Layer:
        """Returns Pebble configuration layer for google/pause"""
        return ops.pebble.Layer(
            {
                "summary": "pause layer",
                "description": "pebble config layer for google/pause",
                "services": {
                    self.name: {
                        "override": "replace",
                        "summary": "pause service",
                        "command": "/pause",
                        "startup": "enabled",
                    }
                },
            }
        )
# ...

A common method for configuring container workloads is by manipulating environment variables. The layering in Pebble makes this easy. Consider the following extract from a config-changed callback which combines a new overlay layer (containing some environment configuration) with the current Pebble layer and restarts the workload:

# ...
import logging

import ops
# ...
def _on_config_changed(self, event: ops.ConfigChangedEvent) -> None:
    """Handle the config changed event."""
    # Get a reference to the container so we can manipulate it
    container = self.unit.get_container(self.name)

    # Create a new config layer - specify 'override: merge' in 
    # the 'pause' service definition to overlay with existing layer
    layer = ops.pebble.Layer(
        {
            "services": {
                "pause": {
                    "override": "merge",
                    "environment": {
                        "TIMEOUT": self.model.config["timeout"],
                    },
                }
            },
        }
    )

    try:
        # Add the layer to Pebble
        container.add_layer(self.name, layer, combine=True)
        logging.debug("Added config layer to Pebble plan")

        # Tell Pebble to update the plan, which will restart any services if needed.
        container.replan()
        logging.info("Updated pause service")
        # All is well, set an ActiveStatus
        self.unit.status = ops.ActiveStatus()
    except (ops.pebble.PathError, ops.pebble.ProtocolError, ops.pebble.ConnectionError):
        # handle errors (for example: the container might not be ready yet)
        ...

In this example, each time a config-changed event is fired, a new overlay layer is created that only includes the environment config, populated using the charm’s config. Pebble will ensure that the application is only restarted if the configuration has changed.

Configure a Pebble layer

Pebble services are configured by means of layers, with higher layers adding to or overriding lower layers, forming the effective Pebble configuration, or “plan”.

When a workload container is created and Pebble starts up, it looks in /var/lib/pebble/default/layers (if that exists) for configuration layers already present in the container image, such as 001-layer.yaml. If there are existing layers there, they become the starting configuration; otherwise, Pebble starts with an empty configuration, meaning no services.

In the latter case, Pebble is configured dynamically via the API by adding layers at runtime.

See the layer specification for more details.

Add a configuration layer

To add a configuration layer, call Container.add_layer with a label for the layer, and the layer’s contents as a YAML string, Python dict, or pebble.Layer object.

You can see an example of add_layer under the “Replan” heading. The combine=True argument tells Pebble to combine the named layer into an existing layer of that name (or add a layer if none by that name exists). Using combine=True is common when dynamically adding layers.

Because combine=True combines the layer with an existing layer of the same name, it’s normally used with override: replace in the YAML service configuration. This means replacing the entire service configuration with the fields in the new layer.

If you’re adding a single layer with combine=False (default option) on top of an existing base layer, you may want to use override: merge in the service configuration. This will merge the fields specified with the service by that name in the base layer. See an example of overriding a layer.
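The difference between override: replace and override: merge can be illustrated with a simplified pure-Python model. This is only a sketch, not Pebble's implementation: the function name is hypothetical, and it only models scalar fields and map fields such as environment (list fields are not handled).

```python
import copy


def combine_service(existing, new):
    """Simplified model of combining a service definition into an existing one."""
    override = new.get("override")
    if override == "replace":
        # The new definition replaces the existing one entirely.
        return copy.deepcopy(new)
    if override != "merge":
        raise ValueError("override must be 'replace' or 'merge'")
    merged = copy.deepcopy(existing)
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Map fields such as "environment" are merged key by key.
            merged[key].update(value)
        else:
            # Scalar fields in the new definition win.
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "override": "replace",
    "command": "/pause",
    "startup": "enabled",
    "environment": {"LOG_LEVEL": "info"},  # hypothetical variable
}
overlay = {"override": "merge", "environment": {"TIMEOUT": "30"}}

# Merging keeps the command and adds TIMEOUT to the existing environment.
print(combine_service(base, overlay))
# Replacing drops everything that the new definition does not mention.
print(combine_service(base, {"override": "replace", "command": "/other"}))
```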

Fetch the effective plan

Charm authors can also introspect the current plan using Container.get_plan. It returns a pebble.Plan object whose services attribute maps service names to pebble.Service instances.

Below is an example of how you might use get_plan to introspect the current configuration, and only add the layer with its services if they haven’t been added already:

class MyCharm(ops.CharmBase):
    ...

    def _on_config_changed(self, event):
        container = self.unit.get_container("main")
        plan = container.get_plan()
        if not plan.services:
            layer = {"services": ...}
            container.add_layer("layer", layer)
            container.start("svc")
        ...
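A variation on this pattern compares the services in the current plan with the services you want, and only adds the layer and replans when they differ. The sketch below assumes the container behaves like ops.Container (get_plan, add_layer, replan) and that the plan offers to_dict. The helper name and the Fake classes are hypothetical stand-ins so that the example runs without a Pebble server.

```python
def ensure_services(container, label, layer):
    """Add ``layer`` and replan only when the plan's services differ from it.

    ``container`` is expected to behave like ops.Container. Returns True if
    the plan was changed.
    """
    current = container.get_plan().to_dict().get("services", {})
    if current == layer["services"]:
        return False
    container.add_layer(label, layer, combine=True)
    container.replan()
    return True


# Hypothetical stand-ins so the sketch runs without a Pebble server.
class FakePlan:
    def __init__(self, services):
        self._services = services

    def to_dict(self):
        return {"services": self._services} if self._services else {}


class FakeContainer:
    def __init__(self):
        self.services = {}
        self.replan_calls = 0

    def get_plan(self):
        return FakePlan(dict(self.services))

    def add_layer(self, label, layer, *, combine=False):
        self.services.update(layer["services"])

    def replan(self):
        self.replan_calls += 1


layer = {"services": {"svc": {"override": "replace", "command": "/bin/svc", "startup": "enabled"}}}
container = FakeContainer()
print(ensure_services(container, "layer", layer))  # first call changes the plan
print(ensure_services(container, "layer", layer))  # second call is a no-op
```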

Control and monitor services in the workload container

The main purpose of Pebble is to control and monitor services, which are usually long-running processes like web servers and databases.

In the context of Juju sidecar charms, Pebble is run with the --hold argument, which prevents it from automatically starting the services marked with startup: enabled. This is to give the charm full control over when the services in Pebble’s configuration are actually started.

Replan

After adding a configuration layer to the plan (as described above), you need to call replan to make any changes to services take effect. When you execute replan, Pebble will automatically restart any services that have changed, respecting dependency order. If the services are already running, it will stop them first using the normal stop sequence.

The reason for replan is so that you as a user have control over when the (potentially high-impact) action of stopping and restarting your services takes place.

Replan also starts the services that are marked as startup: enabled in the configuration plan, if they’re not running already.

Call Container.replan to execute the replan procedure. For example:

class SnappassTestCharm(ops.CharmBase):
    ...

    def _start_snappass(self):
        container = self.unit.containers["snappass"]
        snappass_layer = {
            "services": {
                "snappass": {
                    "override": "replace",
                    "summary": "snappass service",
                    "command": "snappass",
                    "startup": "enabled",
                }
            },
        }
        container.add_layer("snappass", snappass_layer, combine=True)
        container.replan()
        self.unit.status = ops.ActiveStatus()

Check container health

The Ops library provides a way to ensure that your container is healthy. In the Container class, Container.can_connect() can be used if you only need to know that Pebble is responding at a specific point in time - for example to update a status message. This should not be used to guard against later Pebble operations, because that introduces a race condition where Pebble might be responsive when can_connect() is called, but is not when the later operation is executed. Instead, charms should always include try/except statements around Pebble operations, to avoid the unit going into error state.

See more: ops.Container

Start and stop

To start (or stop) one or more services by name, use the start and stop methods. Here’s an example of how you might stop and start a database service during a backup action:

class MyCharm(ops.CharmBase):
    ...

    def _on_pebble_ready(self, event):
        container = event.workload
        container.start('mysql')

    def _on_backup_action(self, event):
        container = self.unit.get_container('main')
        try:
            container.stop('mysql')
            do_mysql_backup()
            container.start('mysql')
        except (ops.pebble.ProtocolError, ops.pebble.PathError, ops.pebble.ConnectionError) as e:
            # handle Pebble errors, for example by failing the action
            event.fail(f'Backup failed: {e}')

It’s not an error to start a service that’s already started, or stop one that’s already stopped. These actions are idempotent, meaning they can safely be performed more than once, and the service will remain in the same state.

When Pebble starts a service, Pebble waits one second to ensure the process doesn’t exit too quickly – if the process exits within one second, the start operation raises an error and the service remains stopped.

To stop a service, Pebble first sends SIGTERM to the service’s process group to try to stop the service gracefully. If the process has not exited after 5 seconds, Pebble sends SIGKILL to the process group. If the process still doesn’t exit after another 5 seconds, the stop operation raises an error. If the process exits any time before the 10 seconds have elapsed, the stop operation succeeds.
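The stop sequence can be imitated locally with the standard library. This is a rough sketch for illustration, not Pebble's code: it assumes a POSIX system, signals a single process rather than a process group, and the function name is hypothetical.

```python
import signal
import subprocess
import sys


def stop_process(proc, term_timeout=5.0, kill_timeout=5.0):
    """Stop ``proc`` with SIGTERM, escalating to SIGKILL; return its exit status."""
    proc.send_signal(signal.SIGTERM)
    try:
        # Graceful phase: give the process time to exit on its own.
        return proc.wait(timeout=term_timeout)
    except subprocess.TimeoutExpired:
        pass
    proc.kill()  # sends SIGKILL on POSIX
    try:
        return proc.wait(timeout=kill_timeout)
    except subprocess.TimeoutExpired:
        # Mirrors the case where the stop operation reports an error.
        raise RuntimeError("process did not exit after SIGKILL") from None


# Start a long-running child process and stop it straight away.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(stop_process(child))
```

On POSIX, subprocess reports a process killed by a signal as a negative return code, so a child that exits during the graceful phase is reported with the negated SIGTERM number.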

Fetch service status

You can use the get_service and get_services methods to fetch the current status of one service or multiple services, respectively. The returned ServiceInfo objects provide a status attribute with various states, or you can use the ServiceInfo.is_running method.

Here is a modification to the start/stop example that checks whether the service is running before stopping it:

class MyCharm(ops.CharmBase):
    ...

    def _on_backup_action(self, event):
        container = self.unit.get_container('main')
        is_running = container.get_service('mysql').is_running()
        if is_running:
            container.stop('mysql')
        do_mysql_backup()
        if is_running:
            container.start('mysql')

Send signals to services

From Juju version 2.9.22, you can use the Container.send_signal method to send a signal to one or more services. For example, to send SIGHUP to the hypothetical “nginx” and “redis” services:

container.send_signal('SIGHUP', 'nginx', 'redis')

This will raise an APIError if any of the services are not in the plan or are not currently running.

View service logs

Pebble stores service logs (stdout and stderr from services) in a ring buffer accessible via the pebble logs command. Each log line is prefixed with the timestamp and service name, using the format 2021-05-03T03:55:49.654Z [snappass] .... Pebble allocates a ring buffer of 100KB per service (not one ring to rule them all), and overwrites the oldest logs in the buffer when it fills up.
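If you need to process these log lines programmatically, the prefix format is easy to parse. The following is a small sketch that assumes every line follows the format shown above; the function name and the sample message text are hypothetical.

```python
import re
from datetime import datetime, timezone

# Timestamp, service name in square brackets, then the message.
LOG_LINE = re.compile(r"^(?P<time>\S+) \[(?P<service>[^\]]+)\] (?P<message>.*)$")


def parse_log_line(line):
    """Split a Pebble log line into (timestamp, service, message), or return None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return timestamp.replace(tzinfo=timezone.utc), match["service"], match["message"]


print(parse_log_line("2021-05-03T03:55:49.654Z [snappass] Starting server"))
```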

When running under Juju, the Pebble server is started with the --verbose flag, which means it also writes these logs to Pebble’s own stdout. That in turn is accessible via Kubernetes using the kubectl logs command. For example, to view the logs for the “redis” container, you could run:

microk8s kubectl logs -n snappass snappass-test-0 -c redis

In the command line above, “snappass” is the namespace (Juju model name), “snappass-test-0” is the pod, and “redis” the specific container defined by the charm configuration.

Configure service auto-restart

From Juju version 2.9.22, Pebble automatically restarts services when they exit unexpectedly.

By default, Pebble will automatically restart a service when it exits (with either a zero or nonzero exit code). In addition, Pebble implements an exponential backoff delay and a small random jitter time between restarts.

You can configure this behavior in the layer configuration, specified under each service. Here is an example showing the complete list of auto-restart options with their defaults:

services:
    server:
        override: replace
        command: python3 app.py

        # auto-restart options (showing defaults)
        on-success: restart   # can also be "shutdown" or "ignore"
        on-failure: restart   # can also be "shutdown" or "ignore"
        backoff-delay: 500ms
        backoff-factor: 2.0
        backoff-limit: 30s

The on-success action is performed if the service exits with a zero exit code, and the on-failure action is performed if it exits with a nonzero code. The actions are defined as follows:

  • restart: automatically restart the service after the current backoff delay. This is the default.
  • shutdown: shut down the Pebble server. Because Pebble is the container’s “PID 1” process, this will cause the container to terminate – useful if you want Kubernetes to restart the container.
  • ignore: do nothing (apart from logging the failure).

The backoff delay between restarts is calculated using an exponential backoff: next = current * backoff_factor, with current starting at the configured backoff-delay. If next is greater than backoff-limit, it is capped at backoff-limit. With the defaults, the delays (in seconds) will be: 0.5, 1, 2, 4, 8, 16, 30, 30, and so on.

The backoff-factor must be greater than or equal to 1.0. If the factor is set to 1.0, next will equal current, so the delay will remain constant.

Just before delaying, a small random time jitter of 0-10% of the delay is added (the current delay is not updated). For example, if the current delay value is 2 seconds, the actual delay will be between 2.0 and 2.2 seconds.
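The delay calculation described above can be reproduced in a few lines of Python. This is a sketch of the arithmetic only, not Pebble's code; the function names are hypothetical, and the defaults mirror backoff-delay, backoff-factor, and backoff-limit.

```python
import random


def backoff_delays(count, delay=0.5, factor=2.0, limit=30.0):
    """Return the first ``count`` backoff delays in seconds, without jitter."""
    if factor < 1.0:
        raise ValueError("backoff-factor must be greater than or equal to 1.0")
    delays = []
    current = delay
    for _ in range(count):
        delays.append(current)
        # next = current * backoff_factor, capped at backoff-limit
        current = min(current * factor, limit)
    return delays


def with_jitter(delay, rng=random):
    """Add a random jitter of 0-10% of ``delay``; the base delay is unchanged."""
    return delay + rng.uniform(0, 0.1 * delay)


print(backoff_delays(8))
print(with_jitter(2.0))  # somewhere between 2.0 and 2.2
```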

Perform health checks on the workload container

From Juju version 2.9.26, Pebble supports adding custom health checks: first, to allow Pebble itself to restart services when certain checks fail, and second, to allow Kubernetes to restart containers when specified checks fail.

Each check can be one of three types. The types and their success criteria are:

  • http: an HTTP GET request to the URL specified must return an HTTP 2xx status code.
  • tcp: opening the given TCP port must be successful.
  • exec: executing the specified command must yield a zero exit code.

Check configuration

Checks are configured in the layer configuration using the top-level field checks. Here’s an example showing the three different types of checks:

checks:
    up:
        override: replace
        level: alive  # optional, but required for liveness/readiness probes
        period: 10s   # this is the default
        timeout: 3s   # this is the default
        threshold: 3  # this is the default
        exec:
            command: service nginx status

    online:
        override: replace
        level: ready
        tcp:
            port: 8080

    http-test:
        override: replace
        http:
            url: http://localhost:8080/test

Each check is performed with the specified period (the default is 10 seconds apart), and is considered an error if a timeout happens before the check responds – for example, before the HTTP request is complete or before the command finishes executing.

A check is considered healthy until it’s had threshold errors in a row (the default is 3). At that point, the on-check-failure action will be triggered, and the health endpoint will return an error response (both are discussed below). When the check succeeds again, the failure count is reset.
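The threshold behaviour can be modelled with a small counter. This sketch only illustrates the rule described above and is not how Pebble is implemented; the class name is hypothetical.

```python
class CheckTracker:
    """Track consecutive failures of a check against a threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, succeeded):
        """Record one check result and return the resulting status."""
        if succeeded:
            # A single success resets the failure count.
            self.failures = 0
        else:
            self.failures += 1
        return self.status

    @property
    def status(self):
        # The check only becomes unhealthy after ``threshold`` errors in a row.
        return "down" if self.failures >= self.threshold else "up"


tracker = CheckTracker()
for result in (False, False, False, True):
    print(result, tracker.record(result))
```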

See the layer specification for more details about the fields and options for different types of checks.

Respond to a check failing or recovering

Version: Ops 2.15 and Juju 3.6

To have the charm respond to a check reaching the failure threshold, or passing again afterwards, observe the pebble_check_failed and pebble_check_recovered events and switch on the info’s name:

class PostgresCharm(ops.CharmBase):
    def __init__(self, framework: ops.Framework):
        super().__init__(framework)
        # Note that "db" is the workload container's name
        framework.observe(self.on["db"].pebble_check_failed, self._on_pebble_check_failed)
        framework.observe(self.on["db"].pebble_check_recovered, self._on_pebble_check_recovered)

    def _on_pebble_check_failed(self, event: ops.PebbleCheckFailedEvent):
        if event.info.name == "http-test":
            logger.warning("The http-test has started failing!")
            self.unit.status = ops.ActiveStatus("Degraded functionality ...")

        elif event.info.name == "online":
            logger.error("The service is no longer online!")

    def _on_pebble_check_recovered(self, event: ops.PebbleCheckRecoveredEvent):
        if event.info.name == "http-test":
            logger.info("The http-test has stopped failing!")
            self.unit.status = ops.ActiveStatus()

        elif event.info.name == "online":
            logger.info("The service is online again!")

All check events have an info property with the details of the check’s current status. Note that, by the time that the charm receives the event, the status of the check may have changed (for example, passed again after failing). If the response to the check failing is light (such as changing the status), then it’s fine to rely on the status of the check at the time the event was triggered — there will be a subsequent check-recovered event, and the status will quickly flick back to the correct one. If the response is heavier (such as restarting a service with an adjusted configuration), then the two events should share a common handler and check the current status via the info property; for example:

class PostgresCharm(ops.CharmBase):
    def __init__(self, framework: ops.Framework):
        super().__init__(framework)
        # Note that "db" is the workload container's name
        # Both events share one handler, which checks the current status
        framework.observe(self.on["db"].pebble_check_failed, self._on_pebble_check_changed)
        framework.observe(self.on["db"].pebble_check_recovered, self._on_pebble_check_changed)

    def _on_pebble_check_changed(self, event: ops.PebbleCheckEvent):
        if event.info.name != "up":
            # For now, we ignore the other tests.
            return
        if event.info.status == ops.pebble.CheckStatus.DOWN:
            self.activate_alternative_configuration()
        else:
            self.activate_main_configuration()

Fetch check status

You can use the get_check and get_checks methods to fetch the current status of one check or multiple checks, respectively. The returned CheckInfo objects provide various attributes, most importantly a status attribute which will be either UP or DOWN.

Here is a code example that checks whether the uptime check is healthy, and writes an error log if not:

container = self.unit.get_container('main')
check = container.get_check('uptime')
if check.status != ops.pebble.CheckStatus.UP:
    logger.error('Uh oh, uptime check unhealthy: %s', check)

Check auto-restart

To enable Pebble auto-restart behavior based on a check, use the on-check-failure map in the service configuration. For example, to restart the “server” service when the “http-test” check fails, use the following configuration:

services:
    server:
        override: merge
        on-check-failure:
            http-test: restart   # can also be "shutdown" or "ignore" (the default)

Check health endpoint and probes

As of Juju version 2.9.26, Pebble includes an HTTP /v1/health endpoint that allows a user to query the health of configured checks, optionally filtered by check level with the query string ?level=<level>. This endpoint returns an HTTP 200 status if the checks are healthy, HTTP 502 otherwise.

Each check can optionally specify a level of “alive” or “ready”. These have semantic meaning: “alive” means the check or the service it’s connected to is up and running; “ready” means it’s properly accepting network traffic. These correspond to Kubernetes “liveness” and “readiness” probes.

When Juju creates a sidecar charm container, it initialises the Kubernetes liveness and readiness probes to hit the /v1/health endpoint with ?level=alive and ?level=ready filters, respectively.

Ready implies alive, and not alive implies not ready. If you’ve configured an “alive” check but no “ready” check, and the “alive” check is unhealthy, /v1/health?level=ready will report unhealthy as well, and the Kubernetes readiness probe will act on that.

If there are no checks configured, Pebble returns HTTP 200 so the liveness and readiness probes are successful by default. To use this feature, you must explicitly create checks with level: alive or level: ready in the layer configuration.
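The level filtering rules above ("ready implies alive", and success when no checks match) can be summarised in a short sketch. This models the described behaviour only; the function name and the input shape (a mapping of check name to a level and a healthy flag) are assumptions made for illustration.

```python
def health_status(checks, level=None):
    """Return the HTTP status the health endpoint would report.

    ``checks`` maps a check name to ``(check_level, healthy)``, where
    ``check_level`` is "alive", "ready", or None.
    """
    if level == "alive":
        relevant = {"alive"}
    elif level == "ready":
        # Ready implies alive, so "alive" checks count towards readiness too.
        relevant = {"alive", "ready"}
    else:
        relevant = None  # no filter: consider every check
    for check_level, healthy in checks.values():
        if relevant is not None and check_level not in relevant:
            continue
        if not healthy:
            return 502
    return 200


checks = {"up": ("alive", False), "http-test": (None, True)}
print(health_status(checks, "alive"))
print(health_status(checks, "ready"))
print(health_status({}, "ready"))
```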

Consider the K8s liveness success (level=alive check) to mean “Pebble is alive” rather than “the application is fully alive” (and failure to mean “this container needs to die”). For charms that take a long time to start, you should not have a level=alive check (if Pebble’s running, it will report alive to K8s), and instead use an ordinary Pebble check (without a level) in conjunction with on-check-failure: restart. That way Pebble itself has full control over restarting the service in question.

Test checks

Version: Scenario 7.0

To test charms that use Pebble check events, use the Scenario CheckInfo class and emit the appropriate event. For example, to simulate the “http-test” check failing, the charm test could do the following:

def test_http_check_failing():
    ctx = scenario.Context(PostgresCharm)
    check_info = scenario.CheckInfo("http-test", failures=3, status=ops.pebble.CheckStatus.DOWN)
    container = scenario.Container("db", check_infos={check_info})
    state_in = scenario.State(containers={container})

    state_out = ctx.run(ctx.on.pebble_check_failed(container, check_info), state_in)

    assert state_out...

Manage files in the workload container

Pebble’s files API allows charm authors to read and write files on the workload container. You can write files (“push”), read files (“pull”), list files in a directory, make directories, and delete files or directories.

Push

Probably the most useful operation is Container.push, which allows you to write a file to the workload, for example, a PostgreSQL configuration file. You can use push as follows (note that this code would be inside a charm event handler):

config = """
port = 7777
max_connections = 1000
"""
container.push('/etc/pg/postgresql.conf', config, make_dirs=True)

The make_dirs=True flag tells push to create the intermediate directories if they don’t already exist (/etc/pg in this case).

There are many additional features, including the ability to send raw bytes (by providing a Python bytes object as the second argument) and write data from a file-like object. You can also specify permissions and the user and group for the file. See the API documentation for details.

Pull

To read a file from the workload, use Container.pull, which returns a file-like object that you can read().

The files API doesn’t currently support update, so to update a file you can use pull to perform a read-modify-write operation, for example:

# Update port to 8888 and restart service
config = container.pull('/etc/pg/postgresql.conf').read()
if 'port =' not in config:
    config += '\nport = 8888\n'
container.push('/etc/pg/postgresql.conf', config)
container.restart('postgresql')

If you specify the keyword argument encoding=None on the pull() call, reads from the returned file-like object will return bytes. The default is encoding='utf-8', which will decode the file’s bytes from UTF-8 so that reads return a Python str.

Push recursive

Version: 1.5

To copy several files to the workload, use Container.push_path, which copies files recursively into a specified destination directory. The API docs contain detailed examples of source and destination semantics and path handling.

# copy "/source/dir/[files]" into "/destination/dir/[files]"
container.push_path('/source/dir', '/destination')

# copy "/source/dir/[files]" into "/destination/[files]"
container.push_path('/source/dir/*', '/destination')

A trailing “/*” on the source directory is the only supported globbing/matching.

Pull recursive

Version: 1.5

To copy several files from the workload, use Container.pull_path, which copies files recursively into a specified destination directory. The API docs contain detailed examples of source and destination semantics and path handling.

# copy "/source/dir/[files]" into "/destination/dir/[files]"
container.pull_path('/source/dir', '/destination')

# copy "/source/dir/[files]" into "/destination/[files]"
container.pull_path('/source/dir/*', '/destination')

A trailing “/*” on the source directory is the only supported globbing/matching.

List files

To list the contents of a directory or return stat-like information about one or more files, use Container.list_files. It returns a list containing one pebble.FileInfo object for each entry (file or directory) in the given path, optionally filtered by a glob pattern. For example:

infos = container.list_files('/etc', pattern='*.conf')
total_size = sum(f.size for f in infos)
logger.info('total size of config files: %d', total_size)
names = set(f.name for f in infos)
if 'host.conf' not in names:
    raise Exception('This charm requires /etc/host.conf!')

If you want information about the directory itself (instead of its contents), call list_files(path, itself=True).

Create directory

To create a directory, use Container.make_dir. It takes an optional make_parents=True argument (like mkdir -p), as well as optional permissions and user/group arguments. Some examples:

container.make_dir('/etc/pg', user='postgres', group='postgres')
container.make_dir('/some/other/nested/dir', make_parents=True)

Remove path

To delete a file or directory, use Container.remove_path. If a directory is specified, it must be empty unless recursive=True is specified, in which case the entire directory tree is deleted, recursively (like rm -r). For example:

# Delete Apache access log
container.remove_path('/var/log/apache/access.log')
# Blow away /tmp/mysubdir and all files under it
container.remove_path('/tmp/mysubdir', recursive=True)

Check file and directory existence

Version: 1.4

To check whether a path exists, use Container.exists (for files or directories) or Container.isdir (for directories only). These methods are analogous to Python’s os.path.exists and os.path.isdir functions. For example:

# if /tmp/myfile exists
container.exists('/tmp/myfile') # True
container.isdir('/tmp/myfile') # False

# if /tmp/mydir exists
container.exists('/tmp/mydir') # True
container.isdir('/tmp/mydir') # True
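These checks combine naturally with make_dir. The sketch below shows one possible "create if missing" helper; `ensure_directory` and `FakeContainer` are hypothetical names, and the fake is only an in-memory stand-in so the example runs without a workload container.

```python
def ensure_directory(container, path):
    """Create `path` as a directory unless it already is one.

    Returns True if the directory was created, False if it already existed.
    """
    if container.isdir(path):
        return False
    if container.exists(path):
        # Something is there, but it is not a directory.
        raise RuntimeError(f'{path} exists and is not a directory')
    container.make_dir(path, make_parents=True)
    return True


class FakeContainer:
    """Hypothetical in-memory stand-in for ops.Container, for illustration only."""

    def __init__(self, files=(), dirs=()):
        self.files, self.dirs = set(files), set(dirs)

    def exists(self, path):
        return path in self.files or path in self.dirs

    def isdir(self, path):
        return path in self.dirs

    def make_dir(self, path, *, make_parents=False):
        self.dirs.add(path)


fake = FakeContainer(files={'/tmp/myfile'}, dirs={'/tmp/mydir'})
print(ensure_directory(fake, '/tmp/mydir'))   # already a directory
print(ensure_directory(fake, '/tmp/newdir'))  # gets created
```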

Run commands on the workload container

From Juju 2.9.17, Pebble includes an API for executing arbitrary commands on the workload container: the Container.exec method. It supports sending stdin to the process and receiving stdout and stderr, as well as more advanced options.

To run simple commands and receive their output, call Container.exec to start the command, and then use the returned ExecProcess object’s wait_output method to wait for it to finish and collect its output.

For example, to back up a PostgreSQL database, you might use pg_dump:

process = container.exec(['pg_dump', 'mydb'], timeout=5*60)
sql, warnings = process.wait_output()
if warnings:
    for line in warnings.splitlines():
        logger.warning('pg_dump: %s', line.strip())
# do something with "sql"

Handle errors

The exec method raises a pebble.APIError if basic checks fail and the command can’t be executed at all, for example, if the executable is not found.

The ExecProcess.wait and ExecProcess.wait_output methods raise pebble.ChangeError if there was an error starting or running the process, and pebble.ExecError if the process exits with a non-zero exit code.

In the case where the process exits via a signal (such as SIGTERM or SIGKILL), the exit code will be 128 plus the signal number. SIGTERM’s signal number is 15, so a process terminated via SIGTERM would give exit code 143 (128+15).
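The 128-plus-signal convention can be decoded with the standard library. This is a small sketch using a hypothetical helper, `describe_exit_code`; note that a program could also choose to exit with such a code itself, so the result is a best guess.

```python
import signal


def describe_exit_code(exit_code):
    """Describe an exit code, decoding the 128+N convention for signals."""
    if exit_code > 128:
        try:
            return f'terminated by {signal.Signals(exit_code - 128).name}'
        except ValueError:
            pass  # not a known signal number on this platform
    return f'exited with code {exit_code}'


print(describe_exit_code(128 + signal.SIGTERM))
print(describe_exit_code(1))
```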

It’s okay to let these exceptions bubble up: Juju will mark the hook as failed and re-run it automatically. However, if you want fine-grained control over error handling, you can catch the ExecError and inspect its attributes. For example:

process = container.exec(['cat', '--bad-arg'])
try:
    stdout, _ = process.wait_output()
    logger.info(stdout)
except ops.pebble.ExecError as e:
    logger.error('Exited with code %d. Stderr:', e.exit_code)
    for line in e.stderr.splitlines():
        logger.error('    %s', line)

That will log something like this:

Exited with code 1. Stderr:
    cat: unrecognized option '--bad-arg'
    Try 'cat --help' for more information.

Use command options

The Container.exec method has various options (see full API documentation), including:

  • environment: a dict of environment variables to pass to the process
  • working_dir: working directory to run the command in
  • timeout: command timeout in seconds
  • user_id, user, group_id, group: UID/username and GID/group name to run command as
  • service_context: run the command in the context of the specified service

Here is a (contrived) example showing the use of most of these parameters:

process = container.exec(
    ['/bin/sh', '-c', 'echo HOME=$HOME, PWD=$PWD, FOO=$FOO'],
    environment={'FOO': 'bar'},
    working_dir='/tmp',
    timeout=5.0,
    user='bob',
    group='staff',
)
stdout, _ = process.wait_output()
logger.info('Output: %r', stdout)

This will execute the echo command in a shell and log something like Output: 'HOME=/home/bob, PWD=/tmp, FOO=bar\n'.

The service_context option allows you to specify the name of a service to “inherit” context from. Specifically, inherit its environment variables, user/group settings, and working directory. The other exec options (user_id, user, group_id, group, working_dir) will override the service’s settings; environment will be merged on top of the service’s environment.

Here’s an example that uses the service_context option:

# Use environment, user/group, and working_dir from "database" service
process = container.exec(['pg_dump', 'mydb'], service_context='database')
process.wait_output()
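To make the precedence rules concrete, here is a pure-Python model of how explicit exec options combine with a service's settings. It only illustrates the rules described above and is not how Pebble implements them; `effective_exec_settings` and the sample "database" service are hypothetical.

```python
def effective_exec_settings(service, *, environment=None, user=None,
                            group=None, working_dir=None):
    """Model how exec options combine with a service's settings."""
    settings = {
        'user': service.get('user'),
        'group': service.get('group'),
        'working_dir': service.get('working-dir'),
        # Copy so the service definition is not modified.
        'environment': dict(service.get('environment', {})),
    }
    # Explicit exec options override the service's values...
    for name, value in (('user', user), ('group', group),
                        ('working_dir', working_dir)):
        if value is not None:
            settings[name] = value
    # ...while environment is merged on top of the service's environment.
    settings['environment'].update(environment or {})
    return settings


database = {
    'user': 'postgres',
    'group': 'postgres',
    'working-dir': '/var/lib/postgresql',
    'environment': {'PGHOST': 'localhost', 'PGPORT': '5432'},
}
print(effective_exec_settings(database, environment={'PGPORT': '5433'},
                              working_dir='/tmp'))
```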

Use input/output options

The simplest way of receiving standard output and standard error is by using the ExecProcess.wait_output method as shown below. The simplest way of sending standard input to the program is as a string, using the stdin parameter to exec. For example:

process = container.exec(['tr', 'a-z', 'A-Z'],
                         stdin='This is\na test\n')
stdout, _ = process.wait_output()
logger.info('Output: %r', stdout)

By default, input is sent and output is received as Unicode using the UTF-8 encoding. You can change this with the encoding parameter (which defaults to utf-8). The most common case is to set encoding=None, which means “use raw bytes”, in which case stdin must be a bytes object and wait_output() returns bytes objects.

For example, the following will log Output: b'\x01\x02':

process = container.exec(['cat'], stdin=b'\x01\x02',
                         encoding=None)
stdout, _ = process.wait_output()
logger.info('Output: %r', stdout)

You can also pass file-like objects using the stdin, stdout, and stderr parameters. These can be real files, streams, io.StringIO instances, and so on. When the stdout and stderr parameters are specified, call the ExecProcess.wait method instead of wait_output, as output is being written, not returned.

For example, to pipe standard input from a file to the command, and write the result to a file, you could use the following:

with open('LICENSE.txt') as stdin:
    with open('output.txt', 'w') as stdout:
        process = container.exec(
            ['tr', 'a-z', 'A-Z'],
            stdin=stdin,
            stdout=stdout,
            stderr=sys.stderr,
        )
        process.wait()
# use result in "output.txt"

For advanced uses, you can also perform streaming I/O by reading from and writing to the stdin and stdout attributes of the ExecProcess instance. For example, to stream lines to a process and log the results as they come back, use something like the following:

process = container.exec(['cat'])

# Thread that sends data to process's stdin
def stdin_thread():
    try:
        for line in ['one\n', '2\n', 'THREE\n']:
            process.stdin.write(line)
            process.stdin.flush()
            time.sleep(1)
    finally:
        process.stdin.close()
threading.Thread(target=stdin_thread).start()

# Log from stdout stream as output is received
for line in process.stdout:
    logger.info('Output: %r', line)

# Will return immediately as stdin was closed above
process.wait()

That will produce the following logs:

Output: 'one\n'
Output: '2\n'
Output: 'THREE\n'

Caution: it’s easy to get threading wrong and cause deadlocks, so it’s best to use wait_output or pass file-like objects to exec instead if possible.

Send signals to a running command

To send a signal to the running process, use ExecProcess.send_signal with a signal number or name. For example, the following will terminate the “sleep 10” process after one second:

process = container.exec(['sleep', '10'])
time.sleep(1)
process.send_signal(signal.SIGTERM)
process.wait()

Note that because sleep will exit via a signal, wait() will raise an ExecError with an exit code of 143 (128+SIGTERM):

Traceback (most recent call last):
  ..
ops.pebble.ExecError: non-zero exit code 143 executing ['sleep', '10']

Use custom notices from the workload container

See also: Pebble > Notice

Record a notice

To record a custom notice, use the pebble notify CLI command. For example, the workload might have a script to back up the database and then record a notice:

pg_dump mydb >/tmp/mydb.sql
/charm/bin/pebble notify canonical.com/postgresql/backup-done path=/tmp/mydb.sql

The first argument to pebble notify is the key, which must be in the format <domain>/<path>. The caller can optionally provide map data arguments in <name>=<value> format; this example shows a single data argument named path.
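The argument format can be sketched in Python as follows. `parse_notify_args` is a hypothetical helper that only mirrors the <domain>/<path> and <name>=<value> shapes described above; Pebble's own validation may well be stricter.

```python
def parse_notify_args(args):
    """Split `pebble notify` style arguments into (key, data)."""
    key, *pairs = args
    domain, sep, path = key.partition('/')
    if not (domain and sep and path):
        raise ValueError(f'key must look like <domain>/<path>: {key!r}')
    data = {}
    for pair in pairs:
        # Split on the first "=" only, so values may contain "=".
        name, sep, value = pair.partition('=')
        if not (name and sep):
            raise ValueError(f'data must look like <name>=<value>: {pair!r}')
        data[name] = value
    return key, data


key, data = parse_notify_args(
    ['canonical.com/postgresql/backup-done', 'path=/tmp/mydb.sql'])
print(key, data)
```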

The pebble notify command has an optional --repeat-after flag, which tells Pebble to only allow the notice to repeat after the specified duration (the default is to repeat for every occurrence). If the caller passes --repeat-after=1h, Pebble will prevent the notice with the same type and key from repeating within an hour, which is useful to avoid the charm waking up too often when a notice occurs frequently.

See more: GitHub | Pebble > Notices > pebble notify
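The effect of --repeat-after can be pictured with a small model. This is a simplified sketch of the behaviour described above, not Pebble's implementation; the class `NoticeModel` and its attribute names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


class NoticeModel:
    """Simplified model of one notice's repeat behaviour (not Pebble's code)."""

    def __init__(self, repeat_after=None):
        self.repeat_after = repeat_after  # timedelta, or None to always repeat
        self.occurrences = 0
        self.last_repeated = None

    def record(self, now):
        """Record an occurrence; return True if it counts as a repeat."""
        self.occurrences += 1
        if (
            self.last_repeated is None
            or self.repeat_after is None
            or now - self.last_repeated > self.repeat_after
        ):
            self.last_repeated = now
            return True
        return False


start = datetime(2024, 1, 1, 12, 0)
notice = NoticeModel(repeat_after=timedelta(hours=1))
print(notice.record(start))                          # first occurrence
print(notice.record(start + timedelta(minutes=30)))  # within the hour
print(notice.record(start + timedelta(minutes=61)))  # after the hour
```

In this model every occurrence is still counted; only the ones that return True would wake the charm.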

Respond to a notice

To have the charm respond to a notice, observe the pebble_custom_notice event and switch on the notice’s key:

class PostgresCharm(ops.CharmBase):
    def __init__(self, framework: ops.Framework):
        super().__init__(framework)
        # Note that "db" is the workload container's name
        framework.observe(self.on["db"].pebble_custom_notice, self._on_pebble_custom_notice)

    def _on_pebble_custom_notice(self, event: ops.PebbleCustomNoticeEvent) -> None:
        if event.notice.key == "canonical.com/postgresql/backup-done":
            path = event.notice.last_data["path"]
            logger.info("Backup finished, copying %s to the cloud", path)
            f = event.workload.pull(path, encoding=None)
            s3_bucket.upload_fileobj(f, "db-backup.sql")

        elif event.notice.key == "canonical.com/postgresql/other-thing":
            logger.info("Handling other thing")

All notice events have a notice property with the details of the notice recorded. That is used in the example above to switch on the notice key and look at its last_data (to determine the backup’s path).

Fetch notices

A charm can also query for notices using the following two Container methods:

  • get_notice, which gets a single notice by unique ID (the value of notice.id).
  • get_notices, which returns all notices by default, and allows filtering notices by specific attributes such as key.
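As a sketch of querying by key, the hypothetical helper below looks up the most recent "backup-done" notice and returns its path. `FakeContainer` is an illustrative stand-in so the snippet runs on its own; in a charm you would pass the real Container, whose get_notices accepts a keys filter.

```python
from datetime import datetime
from types import SimpleNamespace

BACKUP_KEY = 'canonical.com/postgresql/backup-done'


def latest_backup_path(container):
    """Return the path from the most recent backup-done notice, or None."""
    notices = container.get_notices(keys=[BACKUP_KEY])
    if not notices:
        return None
    newest = max(notices, key=lambda n: n.last_occurred)
    return newest.last_data.get('path')


class FakeContainer:
    """Hypothetical stand-in for ops.Container, for illustration only."""

    def __init__(self, notices):
        self._notices = notices

    def get_notices(self, *, keys=None):
        return [n for n in self._notices if keys is None or n.key in keys]


fake = FakeContainer([
    SimpleNamespace(key=BACKUP_KEY, last_occurred=datetime(2024, 1, 1),
                    last_data={'path': '/tmp/old.sql'}),
    SimpleNamespace(key=BACKUP_KEY, last_occurred=datetime(2024, 1, 2),
                    last_data={'path': '/tmp/mydb.sql'}),
    SimpleNamespace(key='canonical.com/postgresql/other-thing',
                    last_occurred=datetime(2024, 1, 3), last_data={}),
])
print(latest_backup_path(fake))
```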

Test notices

To test charms that use Pebble Notices, use the Harness.pebble_notify method to simulate recording a notice with the given details. For example, to simulate the “backup-done” notice handled above, the charm tests could do the following:

class TestCharm(unittest.TestCase):
    @patch("charm.s3_bucket.upload_fileobj")
    def test_backup_done(self, upload_fileobj):
        harness = ops.testing.Harness(PostgresCharm)
        self.addCleanup(harness.cleanup)
        harness.begin()
        harness.set_can_connect("db", True)

        # Pretend backup file has been written
        root = harness.get_filesystem_root("db")
        (root / "tmp").mkdir()
        (root / "tmp" / "mydb.sql").write_text("BACKUP")

        # Notify to record the notice and fire the event
        harness.pebble_notify(
            "db", "canonical.com/postgresql/backup-done", data={"path": "/tmp/mydb.sql"}
        )

        # Ensure backup content was "uploaded" to S3
        upload_fileobj.assert_called_once()
        upload_f, upload_key = upload_fileobj.call_args.args
        self.assertEqual(upload_f.read(), b"BACKUP")
        self.assertEqual(upload_key, "db-backup.sql")

This is looking great @benhoyt, thanks!

One thing to note, check one of the other docs posts for how to create a table of contents and properly annotate the headings with anchors. The plain Markdown headings work great here, but some extra lifting is required so those anchors translate into the published docs pages :slight_smile:

Shout me if I can help! :slight_smile:

Thanks, Jon. I’ve finished the doc now and published it here: Juju | Interacting with Pebble – let me know if you have any feedback!

I also removed the “Files” section I’d added earlier in the Framework Constructs doc and just linked to this instead, which is much more full-fledged.


Hey @benhoyt!

This document has been really helpful as I’ve been running through charming a new operator.

I wanted to provide a bit of feedback on add_layer() usage when using the combine=True option. Given that the example in Autostart in this document, the hello-world documentation, and the example template all use {“override”: “replace”}, I wasn’t sure what the documentation for Container.add_layer was referring to when it notes: the two layers are combined into a single one ***considering the layer override rules***

As an example, in my declaration of _on_myapp_pebble_ready(), I am using the following:

SERVICE = "myapp"
...
        pebble_layer = {
            "summary": "This is the myapp layer",
            "description": "pebble config layer for myapp",
            "services": {
                SERVICE: {
                    "override": "replace",
                    "summary": "myapp",
                    "command": "/start.sh",
                    "startup": "enabled",
                    "environment": {
                        "ENV1": self._stored.env_var1,
                        "ENV2": self._stored.env_var2,
                    },
                }
            },
        }
        # Add initial Pebble config layer using the Pebble API
        container.add_layer(SERVICE, pebble_layer, combine=True)

I then had in my _on_config_changed() method the following update to change the ENV1/ENV2 variables but nothing else about that service:

SERVICE = "myapp"
...
        container.stop(SERVICE)
        updated_layer = {
            "services": {
                SERVICE: {
                    "override": "replace",
                    "environment": {
                        "ENV1": self._stored.env_var1,
                        "ENV2": self._stored.env_var2,
                    }
                }
            }
        }
        container.add_layer(SERVICE, updated_layer, combine=True)
        container.start(SERVICE)

This resulted in the container.stop() and container.start() calls throwing API errors because there was no service named “myapp”. It turns out that I needed to use “override”: “merge” to only update the environment variables of my service definition; otherwise the entire service was replaced with the nearly hollow configuration and was no longer recognized as a service by the Pebble API.

I think it would be helpful to document either here or in the ops.model.container API what the various “override” options actually do with the data, as I was expecting “replace” along with “combine” to only replace values of the keys that I had defined in the updated_layer, not replace the entire service definition.

Once I changed to “override”: “merge”, all was working as intended. I had thought that “override”: “merge” would end up in some sort of list-type appending of data, hence I had originally left it as “override”: “replace”.

I found this reference very helpful, but would find it useful to have the “override”: “merge” method of add_layer(combine=True) modeled in this tutorial.
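The difference between the two override values described in this comment can be sketched in plain Python. This is a deliberately simplified model (it handles only scalar fields and the environment map), not Pebble's implementation, and `combine_service` is a hypothetical name.

```python
def combine_service(existing, update):
    """Combine an updated service definition into an existing one."""
    override = update['override']
    if override == 'replace':
        # The new definition replaces the old one entirely.
        return dict(update)
    if override == 'merge':
        combined = dict(existing)
        for field, value in update.items():
            if field == 'environment':
                # Maps are merged key by key.
                combined[field] = {**existing.get(field, {}), **value}
            else:
                combined[field] = value
        return combined
    raise ValueError(f'unknown override: {override!r}')


existing = {
    'override': 'replace',
    'command': '/start.sh',
    'startup': 'enabled',
    'environment': {'ENV1': 'a', 'ENV2': 'b'},
}
new_env = {'environment': {'ENV1': 'x'}}

replaced = combine_service(existing, {'override': 'replace', **new_env})
merged = combine_service(existing, {'override': 'merge', **new_env})
print('command' in replaced)  # the command is lost with "replace"
print(merged['command'], merged['environment'])
```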

Thanks for the feedback! I’ve added a couple of short paragraphs that hopefully clarify that (and link to the README section you noted).


Hi, here are some suggestions on improving this page.

  • The “Pebble layer configuration” has to be before “Service management and status”.
    There are a few terms/concepts that need to be introduced before we use them. For example, in “The Container class has methods to modify the Pebble configuration “plan”, start and stop services, and read and write files”, the “plan”. A second example, in the code of “Service management and status” we have “layers”, “services”, with properties seen for the first time.

  • “To start all the services that are marked as startup: enabled in the configuration plan”. The term service seems overloaded. We need to clearly define what a service is before using it.

  • “In the context of Juju sidecar charms, Pebble is run with the --hold argument, which prevents it from automatically starting the services marked with startup: enabled.” Why would a charm author need to know what an argument (--hold) is called?

  • “For example (taken from the snappass-test charm)”. Here we jump into another example instead of building on the “hello world”. The reader has no context on what the new example should be doing.

  • It seems that a container can have multiple services with multiple entry points. Is this a good practice (https://stackoverflow.com/questions/18805073/docker-multiple-entrypoints)?


Thanks for this post! It was very helpful when I was learning about Pebble.

Would it be possible to expand a little bit on how Pebble’s command field in the Pebble layer compares with the ENTRYPOINT and CMD instructions from a given Dockerfile? In my implementation, I ended up joining them together, and that might be helpful for future readers to be aware of.


@kos.tsakalozos thank you very much for your suggestions for improvement! I’ve moved the sections around as you suggested. I’ll try to follow up on your other suggestions soon. Thanks again!

Awesome doc.

Is this doc the right place to mention pebble cli analogues?

For example under “fetch the effective plan” we could also mention e.g.

juju exec --unit alertmanager-k8s/0 -- \
  PEBBLE_SOCKET=/charm/containers/alertmanager/pebble.socket \
  pebble plan

Hello @benhoyt @pengale @ppasotti

While writing a charm and working with the Files API, it is extremely useful to know what exceptions each method can raise. This is documented neither here nor at https://ops.readthedocs.io.

Reading the code I can see that these methods may raise:

  • Push (.push):
    • ProtocolError
    • PathError
  • Pull (.pull):
    • ProtocolError
  • List files (.list_files):
    • ProtocolError
    • APIError (?)
    • FileNotFoundError (?)
  • Create directory (.make_dir):
    • ProtocolError
    • PathError

As far as I understand, there are more exceptions that can be raised by these methods. Can you help me document this?

  • list_files, push, pull, make_dir, remove_path can all raise (from ops.pebble.Client._request)
    • ConnectionError
    • APIError
    • ProtocolError

In addition,

  • make_dir, push, pull, remove_path can raise:
    • PathError

This is what I got from a little diving into the code. How do we document this better? Docstring looks like a good place for this info? @rwcarlsen

Docstrings are definitely the place to start for this, I think (for exception/error details). This page is one we might consider adding more detail to as well.

@benhoyt, is there a way to run a Pebble check from within the charm container rather than from within the workload container? It would be handy (hacky?) if the exec command of a check could use the tools collection in /var/lib/juju/tools/.

(Side question: is it a critical security feature that pebble exec doesn’t work bidirectionally, preventing workload from executing remotely in the charm container?)

@sed-i, there’s no direct support for this. I believe the only way you could do it would be via ssh, figuring out how to use ssh to run a command on the charm container. I’ve never tried it though!