Dependency management is a chore
Does this sound familiar?
- You start with a Python project, and specify unpinned dependencies in a requirements.txt file. Once your code is working great and your tests are passing, you commit it to GitHub and walk away a happy person.
- Some time later, you come back to your pristine code, recreate your old environment with pip install -r requirements.txt, and… your code doesn’t work! One of your dependencies has changed - maybe it deprecated a feature, or introduced a bug? Either way, now you must debug just to get back to where you thought you started. Jaded by this whole experience, you now pin every dependency in your requirements.txt so this won’t happen again. This new Python environment will be stable in GitHub forever.
- Some more time later, you realize that one of the packages you pinned has had a security vulnerability for months.
Dependency management truly is a chore. You want the stability of pinned packages combined with the convenience of unpinned ones. Thankfully, there is tooling that can help. Read on to see what we use to manage dependencies for Charmed Kubeflow.
For Charmed Kubeflow, dependency management is about 25 chores
The team supporting Charmed Kubeflow manages about 25 (and increasing!) repositories. So while dependency management is normally a chore, for our team it could easily take over our lives. And we ran into all the sharp edges, for example:
- an unpinned package that receives an update with a breaking change, causing old, stable code to break
- bugs that are caused by a pinned package that should have been updated
- packages we thought we had updated, only to find out later that we missed the update for one of our repos
These problems popped up both with Python packages and GitHub Actions.
To keep ourselves sane, we needed help keeping things up to date in a managed way, so we looked to automation.
Our requirements for package management
After some thinking, we decided we needed a solution that:
- explicitly pinned all dependencies, even sub-dependencies: we wanted our repository to capture the exact version of every Python dependency used, so that we could get reproducible results.
- automatically created PRs bumping the explicitly pinned Python packages: if explicitly pinning packages was priority 1, ensuring that wasn’t a pain to maintain was priority 2. Whenever a package has an update, we want a PR proposing that bump in all our repositories, and for those PRs to trigger our CI. That way an engineer can quickly review and click approve.
- played well with tox: we use tox as an entrypoint to all our testing. Whatever managed our product’s dependencies also needed to manage our test environment dependencies.
- let us constrain packages when needed: we usually want everything up to date, but sometimes we deliberately need package_abc==1.2.3 or package_xyz<=2.0
- (a nice to have) managed GitHub Action dependencies, too: just like with Python dependencies, we also want to keep our GitHub Workflows up to date. If a workflow uses my-fancy-action@1.0.0 and v2.0.0 is released, we want a PR with that change.
Enter pip-compile for deep pinning
The first layer of our solution ended up being pip-compile from pip-tools. pip-compile allows you to compile a fully pinned requirements.txt file from an unconstrained or partially constrained one. For example, say we have the requirements.in file:
requirements.in:
pillow
urllib3<1.26
scipy==1.9.3
Here we want the latest pillow, the latest urllib3 less than v1.26, and exactly scipy==1.9.3. We can compile it to a fully pinned requirements.txt file by executing:
pip-compile requirements.in
Which creates a requirements.txt file of:
requirements.txt:
#
# This file is autogenerated by pip-compile with Python 3.9
# To update, run:
#
#    pip-compile requirements.in
#
numpy==1.23.4
    # via scipy
pillow==9.3.0
    # via -r requirements.in
scipy==1.9.3
    # via -r requirements.in
urllib3==1.25.11
    # via -r requirements.in
This gives us an exact requirements.txt file that is the most up to date possible under our input constraints, and it even pins the sub-dependency numpy. Now if we commit both requirements.in (the file defining our intent) and requirements.txt (the rendered file showing exactly what we used last time) to our repos, we have fully reproducible environments. Next time we clone our repo and pip install -r requirements.txt, we get exactly the same dependencies as before. And if there’s a new package version to bump, we can re-run pip-compile and push the new requirements.txt file. This really helps with reproducibility, and also satisfies requirements 1 and 4 right away.
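One practical note: when requirements.txt already exists, pip-compile prefers to keep the existing pins, so when we actually want newer versions we ask it to upgrade. As a sketch, using the packages from the example above (pip-sync is the companion tool that ships with pip-tools):
# Re-resolve everything in requirements.in to the newest versions allowed
# by its constraints, rewriting requirements.txt
pip-compile --upgrade requirements.in

# Or bump a single package, leaving the other pins untouched
pip-compile --upgrade-package pillow requirements.in

# Sync the active virtualenv to exactly what requirements.txt pins
pip-sync requirements.txt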
But how can our test environments benefit from this, too?
We use tox to orchestrate environments for our test suite. Explaining tox is outside the scope here, but essentially tox helps us maintain distinct environments for different tests (say, to keep the dependencies for linting separate from those for unit testing) and gives us entrypoints to run the tests in those environments (e.g. tox -e unit, tox -e lint, etc.). This is accomplished by describing each test environment, including the dependencies it needs and the commands that should be run.
Traditionally, we’ve written our tox configuration files like this:
tox.ini:
; (truncated a bit, for brevity)
[vars]
src_path = {toxinidir}/src/
tst_path = {toxinidir}/tests/
all_path = {[vars]src_path} {[vars]tst_path}

[testenv:lint]
description = Check code against coding style standards
deps =
    black
    flake8
commands =
    flake8 {[vars]all_path}
    black --check --diff {[vars]all_path}

[testenv:unittest]
description = Run unit tests
deps =
    pytest
    -r {toxinidir}/requirements.txt
commands =
    pytest {[vars]tst_path}
Here, our dependencies were written directly in the tox.ini file’s deps section: tox then automatically installed the latest version of each package, and any version pinning happened in this file. This works, but it does not do any deep pinning like pip-compile does, and while it might be possible to connect pip-compile to this directly, it felt overly complex at best.
To get pip-compile involved, we’ve refactored our tox.ini files to pull dependencies from requirements files (named by the convention requirements-ENVNAME.txt). This lets us use pip-compile to manage requirements files, and tox to manage test environments. For example, our linting requirements become:
requirements-lint.in:
black
flake8
Which pip-compile renders into:
requirements-lint.txt:
#
# This file is autogenerated by pip-compile with Python 3.9
# To update, run:
#
#    pip-compile requirements-lint.in
#
black==22.10.0
    # via -r requirements-lint.in
click==8.1.3
    # via black
# … (truncated)
And we use it from our tox.ini file like this:
tox.ini:
; (truncated again, showing only the lint env)
[testenv:lint]
description = Check code against coding style standards
deps =
    -r {toxinidir}/requirements-lint.txt
commands =
    flake8 {[vars]all_path}
    black --check --diff {[vars]all_path}
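The unit test environment gets the same treatment. As a sketch (not copied verbatim from our repos), requirements-unittest.in would list pytest plus whatever the tests import, pip-compile would render requirements-unittest.txt from it, and the env would become:
; sketch of the unit test env after the same refactor
[testenv:unittest]
description = Run unit tests
deps =
    -r {toxinidir}/requirements-unittest.txt
commands =
    pytest {[vars]tst_path}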
This results in all our Python dependencies being strictly and deeply pinned, with pip-compile doing much of the work. Requirement 3: done!
…but, in doing this, we’ve created a new chore: now, to keep things up to date, someone needs to periodically git clone repo, pip-compile, tox -e unittest, git add requirements.txt… and, for Charmed Kubeflow, repeat this across 25 repos (roughly the loop sketched below). Ugh… We needed a bit more help.
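For illustration, one manual update round scripted by hand might look something like this (repository names are placeholders, and the final step of opening a PR is still manual):
# Hypothetical manual update round; in reality, repeat for ~25 repositories
for repo in example-org/operator-one example-org/operator-two; do
    git clone "https://github.com/${repo}.git" "${repo##*/}"
    cd "${repo##*/}"
    git checkout -b chore/update-pins

    # Re-resolve every pip-compile input file to its newest allowed versions
    for infile in requirements*.in; do
        pip-compile --upgrade "${infile}"
    done

    # Check nothing broke before proposing the bump
    tox -e lint && tox -e unittest

    git add requirements*.txt
    git commit -m "chore: update pinned dependencies"
    git push origin chore/update-pins   # ...then open a PR by hand

    cd ..
done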
Automated dependency management with Renovate
Renovate is a multi-platform, multi-language, automated dependency manager. It knows how to find your repository’s package definitions, check for package updates, and even create PRs in GitHub when packages need updating. And if you have CI that triggers on PRs in GitHub, testing will automatically trigger on the Renovate PRs as well.
Renovate knows how to execute pip-compile, too, so it can manage the setup we described above. Getting this running would satisfy requirement 2 nicely.
Next, we discuss the two main steps to setting up Renovate:
- setting up a Renovate runner that will look at and act on your repositories
- setting up your repositories with any repo-specific Renovate settings
Setting up a Renovate Runner
There are several ways to run Renovate, ranging from managed to self-hosted. To keep things similar to our existing CI, we decided to run Renovate as a self-hosted runner via a scheduled GitHub Workflow. To do this, we wrote the following GitHub Workflow:
scheduled-renovate.yaml:
name: Renovate
on:
  schedule:
    # Every day at 0:53AM UTC
    - cron: '53 0 * * *'
  workflow_dispatch:
jobs:
  renovate:
    name: Renovate all repositories
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Self-hosted Renovate
        uses: renovatebot/github-action@v32.238.4
        with:
          configurationFile: ./.github/workflows/renovate-config.js
          token: ${{ secrets.RENOVATE_TOKEN }}
where:
- renovate-config.js is a settings file that defines some basic runner configuration, such as the list of repositories to monitor (see our repos for full details; a minimal sketch follows below)
- secrets.RENOVATE_TOKEN is a GitHub PAT that grants Renovate access to open PRs on the monitored repositories (see the footnotes for more detail)
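As a rough sketch (the repository names here are placeholders; see our repos for the real file), renovate-config.js is just a Node module exporting a Renovate configuration object:
// Minimal sketch of a self-hosted Renovate runner configuration
// Repository names are placeholders
module.exports = {
  platform: "github",
  onboarding: true,
  repositories: [
    "example-org/operator-one",
    "example-org/operator-two",
  ],
};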
This workflow schedules the self-hosted Renovate runner to execute every day at 0:53AM UTC, scanning all the repositories listed in renovate-config.js and opening PRs for any out-of-date packages. An example PR is here.
This Workflow also includes a workflow_dispatch trigger, which allows you to trigger a Renovate run outside the usual schedule by clicking a button in the GitHub UI - very handy, both for testing and when you know a package needs updating and don’t want to wait.
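If you prefer the terminal over the UI, the same trigger can be fired with the GitHub CLI (assuming gh is installed and authenticated against the repository hosting the workflow):
# Kick off an out-of-schedule Renovate run via workflow_dispatch
gh workflow run scheduled-renovate.yaml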
Setting up a repository to be renovated
Setting up a repository is easy. If you set up the Renovate runner to look at a repository, the first time it runs it will open up a PR asking to add a renovate.json file. This is the repository-specific settings file that tweaks how the runner handles this repo. Accept the PR and next time Renovate runs, it’ll update your packages.
The key settings for using Renovate on pip-compile files are (see this full renovate.json file for more detail):
renovate.json:
{
  …
  "constraints": {
    "python": "3.8.0"
  },
  "pip-compile": {
    "fileMatch": [
      "(^|/)requirements.*\\.in$"
    ],
    "lockFileMaintenance": {
      "schedule": null
    }
  },
  "pip_requirements": {
    "enabled": false
  }
}
Where:
- constraints: this sets the Python version that Renovate uses. This is important because pip-compile’s requirements-*.txt files are rendered with that Python version.
- pip-compile: fileMatch: this defines the pattern to match pip-compile input files. By default this is empty, so pip-compile will never run if you don’t add patterns here.
- pip-compile: lockFileMaintenance: {"schedule": null}: tl;dr - by default, pip-compile input files will only be checked weekly. Setting schedule: null means they’ll be checked any time Renovate executes (see the footnotes for more details).
- pip_requirements: enabled: false: this turns off Renovate’s default management of requirements.txt files, which we’ve replaced with pip-compile.
What about managing GitHub Action versions?
Thanks to all the work above, we kind of get GitHub Action version management for free. Renovate can manage these out of the box, automatically detecting all GitHub Actions used in .github/workflows/*.yaml Workflow files. If an Action in these workflows is semantically versioned (rather than pinned to a branch name), Renovate will check for new versions and open PRs accordingly. Here is one example.
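To make that concrete, for the hypothetical my-fancy-action from earlier, the diff in such a PR would look roughly like:
- uses: my-fancy-action@1.0.0
+ uses: my-fancy-action@2.0.0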
Wrapping up
This post summarized how the team behind Charmed Kubeflow manages Python and GitHub Action dependencies. Between pip-compile, tox, and Renovate, we keep our dependencies in check and our sanity intact! For the full implementation, check out our repositories.
Have questions or suggestions? Post them below!
—
Footnotes:
- Note on GitHub PAT: for GitHub, the PAT must grant: Contents Access: Read and write, Metadata Access: Read-only, Pull requests Access: Read and write, and Workflows Access: Read and write
- More details on Renovate + pip-compile: look into how the pip-compile manager is implemented, specifically how its rendered requirements files are handled like lock files and what the default settings are.