The Observability team maintains a lot of rocks. Since the very beginning of oci-factory, I’ve been quite involved in managing them (and trying to make it easier for others), and that includes adding new versions whenever an upstream project releases something.
Being mildly interested in automation, I set up our workflows to do so periodically, by creating a new `rockcraft.yaml` for the newly-released version. And that’s great! We can then auto-merge the pull request as soon as the tests pass — wait, TESTS?!
How do I even test an OCI image? 
To have some quality guarantees (and to be able to increase the scope of our CI), we need to make sure the new image we’re building is actually working. What if `rockcraft pack` doesn’t fail, but the binary can’t run correctly?
I gathered a set of tools to help us through this task:

- `just`, our task runner;
- `goss`, to run the checks and verify the rock is working;
- `microk8s`, to run the rocks (any Kubernetes cluster would work, really); make sure to `microk8s enable registry` to enable the registry plugin!
Let’s go through how we test our rocks, step-by-step! I’ll be referencing our opentelemetry-collector-rock repository throughout.
Structuring your repository 
.
├── 0.110.0
│   └── rockcraft.yaml
├── 0.117.0
│   └── rockcraft.yaml
├── 0.118.0
│   └── rockcraft.yaml
└── ...
Our rock repositories contain a folder for each rock version. This allows us to integrate nicely with both OCI Factory and `just`, as you’ll see below.
TL;DR: use one folder per rock version.
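With this layout, adding a new version boils down to copying the latest folder and bumping the `version` field. A minimal sketch (the version numbers here are illustrative, and assume you run it from the repository root):

```shell
# Sketch: start a new rock version from the latest folder (versions are hypothetical)
cp -r 0.118.0 0.119.0
# Bump the version field in the new rockcraft.yaml
sed -i 's/^version: .*/version: "0.119.0"/' 0.119.0/rockcraft.yaml
```

Our periodic workflows do essentially this whenever upstream releases a new version.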
Running an OCI image 
We need to be able to locally run a freshly-packed rock: we can push it to the local image registry provided by `microk8s`. The `rockcraft` snap is bundled with `skopeo` (accessible via `rockcraft.skopeo`), which is exactly what we need.
Let’s add that logic in our `justfile`:
### Snippet from /justfile ###

set quiet  # Recipes are silent by default
set export # Just variables are exported to environment variables

rock_name := `echo ${PWD##*/} | sed 's/-rock//'`
# To find the latest version, get the "last" folder that starts with a number
latest_version := `find . -maxdepth 1 -type d -name '[0-9]*' | sort -V | tail -n1 | sed 's@./@@'`

[private]
default:
    just --list

# Pack a rock of a specific version
pack version:
    echo "Packing opentelemetry-collector: $version"
    cd "$version" && rockcraft pack

# Push an OCI image to a local registry
[private]
push-to-registry version:
    echo "Pushing $rock_name $version to local registry"
    rockcraft.skopeo --insecure-policy copy --dest-tls-verify=false \
        "oci-archive:${version}/${rock_name}_${version}_amd64.rock" \
        "docker://localhost:32000/${rock_name}-dev:${version}" >/dev/null

# Run a rock and open a shell into it with `kgoss`
run version=latest_version: (push-to-registry version)
    kubectl run otel-collector --image localhost:32000/${rock_name}-dev:${version}
Other than some `just` configuration at the start and the default recipe, plus some simple parsing of the rock name (from the repository name) and the latest local version, we have three recipes:

- `just pack`, which allows you to pack a specific version of a rock from the repository root (e.g., `just pack 0.117.0`);
- `just push-to-registry`, which uses `skopeo` to push a `.rock` image to your local registry; it’s set to `[private]` because you shouldn’t need to call it directly (although you can);
- `just run`, to push the `.rock` to a local registry (notice the recipe dependency) and spin up a pod so you can do manual testing and exploration (e.g., `just run 0.117.0`).
Wherever it makes sense, the recipes conveniently default to the latest local rock version as their argument.
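The `latest_version` assignment deserves a note: plain lexical sorting would rank 0.9.0 above 0.110.0, which is why the pipeline uses `sort -V` (version sort). Here’s a self-contained sketch with made-up version folders:

```shell
# Build a throwaway folder structure mimicking the repository layout
demo="$(mktemp -d)"
mkdir -p "$demo/0.9.0" "$demo/0.110.0" "$demo/0.118.0"
cd "$demo"
# `sort -V` compares dotted versions numerically, so 0.110.0 outranks 0.9.0
find . -maxdepth 1 -type d -name '[0-9]*' | sort -V | tail -n1 | sed 's@./@@'
# → 0.118.0
```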
The default recipe allows us to simply run `just` without arguments to list the available recipes:

Available recipes:
    pack version               # Pack a rock for the specified version
    run version=latest_version # Run a rock
TL;DR: use `skopeo` to push rocks to a local registry, and `kubectl run` to create pods running them.
Testing in isolation 
To test the rock in isolation, we’re using `kgoss`, a community-maintained `goss`-related utility that does the following:

- run a pod with the provided image;
- execute the checks defined in `goss.yaml` from inside the pod; `kgoss` handles this by:
  - copying the `goss.yaml` file inside the pod;
  - running `goss` via `kubectl exec`.
Goss is extremely powerful, allowing for easy configuration of timeouts and retry-intervals, so your pod has enough time to settle. Some example checks you could write are:
- checking if the process is running;
- making sure a configuration file is there;
- checking whether something is listening on some ports.
Here’s a real (shortened) example:
### Snippet from /goss.yaml ###

process:
  otelcol:
    running: true
...
port:
  tcp6:8888: # self-monitoring metrics
    listening: true
    port: 'tcp6:8888'
    skip: false
Let’s add a recipe to our `justfile` so we can easily run these tests with `kgoss`:
# Test the rock with `kgoss`
[group("test")]
test-isolation version=latest_version: (push-to-registry version)
    GOSS_OPTS="--retry-timeout 60s" kgoss run -i localhost:32000/${rock_name}-dev:${version}
Running `just test-isolation` will first push the image to your local registry, so then `kgoss` can do the heavy lifting by running the checks we just wrote in `/goss.yaml`. It’s very simple to do, and extremely useful!
TL;DR: use `kgoss` to spin up a pod with your rock, and run some Goss checks on it.
Integration testing 
If you want to go a step further, you might want to check whether your rock correctly integrates with other workloads. This is especially useful if your build process doesn’t exactly follow the upstream, and you want to make sure you didn’t break anything.
I wanted to keep this `goss`-driven approach, while still dodging the `docker` requirement. The general idea is to write some Kubernetes manifests to deploy the necessary workloads, `kubectl apply` them, and run some Goss checks for validation.
I introduced a `tests/` folder and structured it as such:

.
└── tests
    └── prometheus_integration
        ├── goss.yaml # external `goss` checks
        ├── otel-collector.yaml
        └── prometheus.yaml
Each YAML file is a Kubernetes manifest, declaring a set of Deployments, Services, and ConfigMaps that form the actual deployment. I won’t paste the files here because they’re lengthy, but you can take a look at the repository. Note that `otel-collector.yaml` uses the image from `localhost:32000`, the local image registry.
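To give a sense of the shape (this is an illustrative sketch, not the actual manifest from the repository; names, labels, and the tag are made up), a stripped-down `otel-collector.yaml` could look roughly like this, with the Deployment pointing at the dev image in the local registry:

```yaml
# Illustrative sketch, not the real manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  labels:
    app: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          # The freshly-packed rock, pushed by `just push-to-registry`
          image: localhost:32000/opentelemetry-collector-dev:0.118.0
```

Because the image comes from the local registry, the integration tests always exercise the rock you just packed, not a published one.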
For reference, here’s how we check that our OpenTelemetry Collector rock can remote-write to Prometheus:
command:
  remote-write:
    exit-status: 0
    exec: |
      echo "Namespace: {{.Env.NAMESPACE}}"
      # Get Prometheus pod
      PROMETHEUS_IP="$(kubectl get pod -n {{.Env.NAMESPACE}} -l app="prometheus" \
        -o jsonpath='{.items[*].status.podIP}')"
      if [ -z "$PROMETHEUS_IP" ]; then
        echo "Prometheus pod IP not found, maybe the pod isn't ready yet"
        exit 1
      fi
      echo "Prometheus IP: $PROMETHEUS_IP"
      # Check there is a `job` label with value `otel-collector`
      LABELS="$(curl -s "${PROMETHEUS_IP}:9090/api/v1/label/job/values")"
      echo "Prometheus 'job' label values: $LABELS"
      if ! echo "$LABELS" | grep -q "otel-collector"; then
        echo "'job=otel-collector' label not found"
        exit 2
      fi
Other than some parsing to get the namespace and Prometheus’ pod IP, the core part of the check is simply a `curl` command, checking whether the self-monitoring metrics from the Collector are present in Prometheus.
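A simple `grep` is enough here because Prometheus’ `/api/v1/label/job/values` endpoint returns a small JSON document listing every value of the `job` label. A quick offline sketch with a canned response (the JSON below is made up for illustration):

```shell
# Hypothetical response from Prometheus' label-values endpoint
LABELS='{"status":"success","data":["otel-collector","prometheus"]}'
if echo "$LABELS" | grep -q "otel-collector"; then
  echo "remote-write verified"
fi
# → remote-write verified
```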
We can put it all together in the `justfile`, by adding a generic `test` recipe that runs all tests, and a `test-integration` recipe for what you just read:
# Run the rock tests
test version=latest_version: (push-to-registry version) \
    (test-isolation version) \
    (test-integration version)

# Test the rock integration with other workloads
[group("test")]
test-integration version=latest_version: (push-to-registry version)
    #!/usr/bin/env bash
    # For all the subfolders in tests/
    for test_folder in $(find tests -mindepth 1 -maxdepth 1 -type d | sed 's@tests/@@'); do
        # Create a namespace for the tests to run in
        namespace="test-${rock_name}-rock-${test_folder//_/-}"
        echo "+ Preparing the testing environment"
        kubectl delete all --all -n "$namespace" >/dev/null
        kubectl delete namespace "$namespace" >/dev/null
        kubectl create namespace "$namespace"
        # For each '.yaml' file (excluding 'goss.yaml')
        for manifest in $(find tests/${test_folder} -type f -name '*.yaml' | grep -v 'goss.yaml'); do
            kubectl apply -f "$manifest" -n "$namespace" # deploy it in the test namespace
        done
        sleep 15 # Wait for the pods to settle and otel-collector to remote-write
        NAMESPACE="$namespace" goss \
            --gossfile "tests/${test_folder}/goss.yaml" \
            --log-level debug \
            validate \
            --retry-timeout=120s \
            --sleep=5s
        # Cleanup
        echo "+ Cleaning up the testing environment"
        kubectl delete all --all -n "$namespace"
        kubectl delete namespace "$namespace"
    done
You can now run `just test-integration [rock-version]` and let the magic happen!
TL;DR: write Kubernetes manifests using your dev rock image, apply them, and use `goss` to validate the whole thing.
Conclusions 
If you look at your `justfile` now, you’ll see you effectively built some developer tools for your rock, and `just` makes this simple and easily accessible.
$ just
Available recipes:
    clean version                           # `rockcraft clean` for a specific version
    pack version                            # Pack a rock
    run version=latest_version              # Run a rock
    test version=latest_version             # Run all the tests

    [test]
    test-integration version=latest_version # Test the rock integration with other workloads
    test-isolation version=latest_version   # Test the rock with `kgoss`
Testing our rocks is extremely important in order to guarantee a higher level of quality in our work — and to allow you to auto-merge pull requests without manual review.
Hope this can be useful!