Dependency management is a chore
Does this sound familiar?
- You start with a Python project, and specify unpinned dependencies in a requirements.txt file. Once your code is working great and your tests are passing, you commit it to GitHub and walk away a happy person.
- Some time later, you come back to your pristine code, recreate your old environment with pip install -r requirements.txt, and… your code doesn’t work! One of your dependencies has changed - maybe it deprecated a feature, or introduced a bug? Either way, now you must debug just to get back to where you thought you started. Jaded by this whole experience, you now pin every dependency in your requirements.txt so this won’t happen again. This new Python environment will be stable in GitHub forever.
- Some more time later, you realize that one of the packages you pinned has had a security vulnerability for months.
Dependency management truly is a chore. You want the stability of pinned packages combined with the convenience of unpinned ones. Thankfully, there is tooling that can help. Read on to see what we use to manage dependencies for Charmed Kubeflow.
For Charmed Kubeflow, dependency management is about 25 chores
The team supporting Charmed Kubeflow manages about 25 (and increasing!) repositories. So while dependency management is normally a chore, for our team it could easily take over our lives. And we ran into all the sharp edges, for example:
- an unpinned package that receives an update with a breaking change, causing old, stable code to break
- bugs that are caused by a pinned package that should have been updated
- packages we thought we had updated, only to find out later that we missed the update for one of our repos
These problems popped up both with Python packages and GitHub Actions.
To keep ourselves sane, we needed help keeping things up to date in a managed way, so we looked to automation.
Our requirements for package management
After some thinking, we decided we needed a solution that:
- explicitly pinned all dependencies, even sub-dependencies: we wanted our repository to capture the exact version of every Python dependency used, so that we could get reproducible results.
- automatically created PRs bumping the explicitly pinned Python packages: if explicitly pinning packages was priority 1, ensuring that wasn’t a pain to maintain was priority 2. Whenever a package has an update, we want a PR proposing that bump in all our repositories, and for those PRs to trigger our CI. That way an engineer can quickly review and click approve.
- played well with tox: we use tox as an entrypoint to all our testing. Whatever managed our product’s dependencies also needed to manage our test environment dependencies.
- let us constrain packages when needed: we usually want everything up to date, but sometimes we deliberately need package_abc==1.2.3 or package_xyz<=2.0
- (a nice to have) managed GitHub Action dependencies, too: just like with Python dependencies, we also want to keep our GitHub Workflows up to date. If a workflow uses my-fancy-action@1.0.0 and v2.0.0 is released, we want a PR with that change.
Enter pip-compile for deep pinning
The first layer of our solution ended up being pip-compile from pip-tools. pip-compile allows you to compile a fully pinned requirements.txt file from an unconstrained or partially constrained one. For example, say we have the requirements.in file:
requirements.in:
pillow
urllib3<1.26
scipy==1.9.3
Here we want the latest pillow, the latest urllib3 less than v1.26, and exactly scipy==1.9.3. We can compile it to a fully pinned requirements.txt file by executing:
pip-compile requirements.in
Which creates a requirements.txt file of:
requirements.txt:
#
# This file is autogenerated by pip-compile with Python 3.9
# To update, run:
#
#    pip-compile requirements.in
#
numpy==1.23.4
    # via scipy
pillow==9.3.0
    # via -r requirements.in
scipy==1.9.3
    # via -r requirements.in
urllib3==1.25.11
    # via -r requirements.in
This gives us an exact requirements.txt file that is the most up to date possible under our input constraints, and it even pins the sub-dependency numpy. Now if we commit both requirements.in (the file defining our intent) and requirements.txt (the rendered file showing exactly what we used last time) to our repos, we have fully reproducible environments. Next time we clone our repo and pip install -r requirements.txt, we get exactly the same dependencies as before. And if there’s a new package version to bump, we can re-run pip-compile and push the new requirements.txt file. This really helps with reproducibility, and also satisfies requirements 1 and 4 right away.
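One practical note: when requirements.txt already exists, pip-compile prefers to keep the existing pins, so when we actually want newer versions we ask it to upgrade. As a sketch, using the packages from the example above (pip-sync is the companion tool that ships with pip-tools):
# Re-resolve everything in requirements.in to the newest versions allowed
# by its constraints, rewriting requirements.txt
pip-compile --upgrade requirements.in

# Or bump a single package, leaving the other pins untouched
pip-compile --upgrade-package pillow requirements.in

# Sync the active virtualenv to exactly what requirements.txt pins
pip-sync requirements.txt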
But how can our test environments benefit from this, too?
We use tox to orchestrate environments for our test suite. Explaining tox is outside the scope here, but essentially tox helps us maintain distinct environments for different tests (say, to keep the dependencies for linting separate from those for unit testing) and gives us entrypoints to run the tests in those environments (e.g. tox -e unit, tox -e lint, etc.). This is accomplished by describing each test environment, including the dependencies it needs and the commands that should be run.
Traditionally, we’ve written our tox configuration files like this:
tox.ini:
; (truncated a bit, for brevity)
[vars]
src_path = {toxinidir}/src/
tst_path = {toxinidir}/tests/
all_path = {[vars]src_path} {[vars]tst_path}

[testenv:lint]
description = Check code against coding style standards
deps =
    black
    flake8
commands =
    flake8 {[vars]all_path}
    black --check --diff {[vars]all_path}

[testenv:unittest]
description = Run unit tests
deps =
    pytest
    -r {toxinidir}/requirements.txt
commands =
    pytest {[vars]tst_path}
Here, our dependencies were written directly in the tox.ini file’s deps section: tox then automatically installed the latest version of each package, and any version pinning happened in this file. This works, but it does not do any deep pinning like pip-compile does, and while it might be possible to connect pip-compile to this directly, it felt overly complex at best.
To get pip-compile involved, we’ve refactored our tox.ini files to pull dependencies from requirements files (named by the convention requirements-ENVNAME.txt). This lets us use pip-compile to manage requirements files, and tox to manage test environments. For example, our linting requirements become:
requirements-lint.in:
black
flake8
Which pip-compile renders into:
requirements-lint.txt:
#
# This file is autogenerated by pip-compile with Python 3.9
# To update, run:
#
#    pip-compile requirements-lint.in
#
black==22.10.0
    # via -r requirements-lint.in
click==8.1.3
    # via black
# … (truncated)
And we use it from our tox.ini file like this:
tox.ini:
; (truncated again, showing only the lint env)
[testenv:lint]
description = Check code against coding style standards
deps =
    -r {toxinidir}/requirements-lint.txt
commands =
    flake8 {[vars]all_path}
    black --check --diff {[vars]all_path}
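The unit test environment gets the same treatment. As a sketch (not copied verbatim from our repos), requirements-unittest.in would list pytest plus whatever the tests import, pip-compile would render requirements-unittest.txt from it, and the env would become:
; sketch of the unit test env after the same refactor
[testenv:unittest]
description = Run unit tests
deps =
    -r {toxinidir}/requirements-unittest.txt
commands =
    pytest {[vars]tst_path}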
This results in all our Python dependencies being strictly and deeply pinned, with pip-compile doing much of the work. Requirement 3: done!
…but, in doing this, we’ve created a new chore: now, to keep things up to date, someone needs to periodically git clone repo, pip-compile, tox -e unittest, git add requirements.txt… and, for Charmed Kubeflow, repeat this across 25 repos (roughly the loop sketched below). Ugh… We needed a bit more help.
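For illustration, one manual update round scripted by hand might look something like this (repository names are placeholders, and the final step of opening a PR is still manual):
# Hypothetical manual update round; in reality, repeat for ~25 repositories
for repo in example-org/operator-one example-org/operator-two; do
    git clone "https://github.com/${repo}.git" "${repo##*/}"
    cd "${repo##*/}"
    git checkout -b chore/update-pins

    # Re-resolve every pip-compile input file to its newest allowed versions
    for infile in requirements*.in; do
        pip-compile --upgrade "${infile}"
    done

    # Check nothing broke before proposing the bump
    tox -e lint && tox -e unittest

    git add requirements*.txt
    git commit -m "chore: update pinned dependencies"
    git push origin chore/update-pins   # ...then open a PR by hand

    cd ..
done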
Automated dependency management with Renovate
Renovate is a multi-platform, multi-language, automated dependency manager. It knows how to find your repository’s package definitions, check for package updates, and even create PRs in GitHub when packages need updating. And if you have CI that triggers on PRs in GitHub, testing will automatically trigger on the Renovate PRs as well.
Renovate knows how to execute pip-compile, too, so it can manage the setup we described above. Getting this running would satisfy requirement 2 nicely.
Next, we discuss the two main steps to setting up Renovate:
- setting up a Renovate runner that will look at and act on your repositories
- setting up your repositories with any repo-specific Renovate settings
Setting up a Renovate Runner
There are several ways to run Renovate, ranging from managed to self-hosted. To keep things similar to our existing CI, we decided to run Renovate as a self-hosted runner via a scheduled GitHub Workflow. To do this, we wrote the following GitHub Workflow:
scheduled-renovate.yaml:
name: Renovate
on:
  schedule:
    # Every day at 0:53AM UTC
    - cron: '53 0 * * *'
  workflow_dispatch:
jobs:
  renovate:
    name: Renovate all repositories
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Self-hosted Renovate
        uses: renovatebot/github-action@v32.238.4
        with:
          configurationFile: ./.github/workflows/renovate-config.js
          token: ${{ secrets.RENOVATE_TOKEN }}
where:
- renovate-config.js is a settings file that defines some basic runner configuration, such as the list of repositories to monitor (see our repos for full details; a minimal sketch follows below)
- secrets.RENOVATE_TOKEN is a GitHub PAT that grants Renovate access to open PRs on the monitored repositories (see the footnotes for more detail)
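As a rough sketch (the repository names here are placeholders; see our repos for the real file), renovate-config.js is just a Node module exporting a Renovate configuration object:
// Minimal sketch of a self-hosted Renovate runner configuration
// Repository names are placeholders
module.exports = {
  platform: "github",
  onboarding: true,
  repositories: [
    "example-org/operator-one",
    "example-org/operator-two",
  ],
};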
This workflow schedules the self-hosted Renovate runner to execute every day at 0:53AM UTC, scanning all the repositories listed in renovate-config.js and opening PRs for any out-of-date packages. An example PR is here.
This Workflow also includes a workflow_dispatch trigger, which allows you to trigger a Renovate run outside the usual schedule by clicking a button in the GitHub UI - very handy, both for testing and when you know a package needs updating and don’t want to wait.
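If you prefer the terminal over the UI, the same trigger can be fired with the GitHub CLI (assuming gh is installed and authenticated against the repository hosting the workflow):
# Kick off an out-of-schedule Renovate run via workflow_dispatch
gh workflow run scheduled-renovate.yaml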
Setting up a repository to be renovated
Setting up a repository is easy. If you set up the Renovate runner to look at a repository, the first time it runs it will open up a PR asking to add a renovate.json file. This is the repository-specific settings file that tweaks how the runner handles this repo. Accept the PR and next time Renovate runs, it’ll update your packages.
The key settings for using Renovate on pip-compile files are (see this full renovate.json file for more detail):
renovate.json:
{
  …
  "constraints": {
    "python": "3.8.0"
  },
  "pip-compile": {
    "fileMatch": [
      "(^|/)requirements.*\\.in$"
    ],
    "lockFileMaintenance": {
      "schedule": null
    }
  },
  "pip_requirements": {
    "enabled": false
  }
}
Where:
- constraints: this sets the Python version that Renovate uses. This is important because pip-compile’s requirements-*.txt files are rendered with that Python version.
- pip-compile: fileMatch: this defines the pattern to match pip-compile input files. By default this is empty, so pip-compile will never run if you don’t add patterns here.
- pip-compile: lockFileMaintenance: {"schedule": null}: tl;dr - by default, pip-compile input files will only be checked weekly. Setting schedule: null means they’ll be checked any time Renovate executes (see the footnotes for more details).
- pip_requirements: enabled: false: this turns off Renovate’s default management of requirements.txt files, which we’ve replaced with pip-compile.
What about managing GitHub Action versions?
Thanks to all the work above, we kind of get GitHub Action version management for free. Renovate can manage these out of the box, automatically detecting all GitHub Actions used in .github/workflows/*.yaml Workflow files. If an Action in these workflows is semantically versioned (rather than pinned to a branch name), Renovate will check for new versions and open PRs accordingly. Here is one example.
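To make that concrete, for the hypothetical my-fancy-action from earlier, the diff in such a PR would look roughly like:
- uses: my-fancy-action@1.0.0
+ uses: my-fancy-action@2.0.0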
Wrapping up
This post summarized how the team behind Charmed Kubeflow manages Python and GitHub Action dependencies. Between pip-compile, tox, and Renovate, we keep our dependencies in check and our sanity intact! For the full implementation, check out our repositories.
Have questions or suggestions? Post them below!
—
Footnotes:
- Note on GitHub PAT: for GitHub, the PAT must grant: Contents Access: Read and write, Metadata Access: Read-only, Pull requests Access: Read and write, and Workflows Access: Read and write
- More details on Renovate + pip-compile: look into how the pip-compile manager is implemented, specifically how its rendered requirements files are handled like lock files and what the default settings are.