CDK-Addons and custom charm

Hi, we use a custom cdk-addons snap because we needed to change some settings for the metrics-server addon to avoid OOMKills.
Here is what we changed:
- "base_metrics_server_cpu": "40m",
- "base_metrics_server_memory": "40Mi",
- "metrics_server_memory_per_node": "4",
+ "base_metrics_server_cpu": "200m",
+ "base_metrics_server_memory": "256Mi",
+ "metrics_server_memory_per_node": "16",
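
For context, a rough sketch of what these values mean for the memory the metrics-server container ends up with. It assumes the addon sizes the container linearly (base memory plus a per-node increment, with the per-node value in Mi), as the addon-resizer sidecar does in the upstream metrics-server addon. That behaviour and the function name are assumptions, not something taken from the cdk-addons source.

```python
def metrics_server_memory_mi(base_memory: str, memory_per_node: str, nodes: int) -> int:
    """Estimate the container memory (in Mi) for a cluster of `nodes` nodes.

    Assumption: memory = base + per_node * nodes, with per_node expressed in Mi.
    """
    if not base_memory.endswith("Mi"):
        raise ValueError(f"expected a value in Mi, got {base_memory!r}")
    base = int(base_memory[:-2])
    return base + int(memory_per_node) * nodes


if __name__ == "__main__":
    # Compare the stock settings with the customized ones for a few cluster sizes.
    for nodes in (10, 50, 100):
        old = metrics_server_memory_mi("40Mi", "4", nodes)
        new = metrics_server_memory_mi("256Mi", "16", nodes)
        print(f"{nodes:>3} nodes: stock {old}Mi, customized {new}Mi")
```

Under that assumption, a 10-node cluster goes from 80Mi with the stock settings to 416Mi with ours.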

This is what I get when listing snaps:
ubuntu@juju-806d1a-52-lxd-1:~$ snap list
Name                     Version   Rev    Tracking       Publisher   Notes
cdk-addons               1.16.15   x3     -              -           -
core                     16-2.49   10859  latest/stable  canonical✓  core
core18                   20210128  1988   latest/stable  canonical✓  base
kube-apiserver           1.16.15   1789   1.16/stable    canonical✓  in-cohort
kube-controller-manager  1.16.15   1685   1.16/stable    canonical✓  in-cohort
kube-proxy               1.16.15   1673   1.16/stable    canonical✓  classic
kube-scheduler           1.16.15   1639   1.16/stable    canonical✓  in-cohort
kubectl                  1.16.15   1639   1.16/stable    canonical✓  classic,in-cohort

When trying to upgrade the kubernetes-master charm to revision 808, we get:

  File "/var/lib/juju/agents/unit-kubernetes-master-15/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-kubernetes-master-15/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-kubernetes-master-15/charm/reactive/kubernetes_master.py", line 434, in join_or_update_cohorts
    snap.join_cohort_snapshot(snapname, cohort_key)
  File "lib/charms/layer/snap.py", line 425, in join_cohort_snapshot
    '--cohort', cohort_key])
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['snap', 'refresh', 'cdk-addons', '--cohort', 'MSBlZUVJQ25EaUIzdGo5NHBLUU0xQnd0RWN2VmdIZTk1biAxNjEzNDU0NzQ4IDhmYmI0NzJmMmZhZjBkNmIyYWY2MzM4Yjk3OWJiZWY1NzRhMmJlODliMjg3ZWI2NWMyODNjYTY4ODdkMDk5Y2Y=']' returned non-zero exit status 1.

unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm Traceback (most recent call last):
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "/var/lib/juju/agents/unit-kubernetes-master-15/charm/hooks/upgrade-charm", line 22, in <module>
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm main()
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "/var/lib/juju/agents/unit-kubernetes-master-15/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 74, in main
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm bus.dispatch(restricted=restricted_mode)
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "/var/lib/juju/agents/unit-kubernetes-master-15/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm _invoke(other_handlers)
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "/var/lib/juju/agents/unit-kubernetes-master-15/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm handler.invoke()
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "/var/lib/juju/agents/unit-kubernetes-master-15/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm self._action(*args)
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "/var/lib/juju/agents/unit-kubernetes-master-15/charm/reactive/kubernetes_master.py", line 434, in join_or_update_cohorts
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm snap.join_cohort_snapshot(snapname, cohort_key)
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "lib/charms/layer/snap.py", line 425, in join_cohort_snapshot
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm '--cohort', cohort_key])
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm **kwargs).stdout
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm File "/usr/lib/python3.6/subprocess.py", line 438, in run
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm output=stdout, stderr=stderr)
unit-kubernetes-master-15: 15:36:09 WARNING unit.kubernetes-master/15.upgrade-charm subprocess.CalledProcessError: Command '['snap', 'refresh', 'cdk-addons', '--cohort', 'MSBlZUVJQ25EaUIzdGo5NHBLUU0xQnd0RWN2VmdIZTk1biAxNjEzNDU0NzQ4IDhmYmI0NzJmMmZhZjBkNmIyYWY2MzM4Yjk3OWJiZWY1NzRhMmJlODliMjg3ZWI2NWMyODNjYTY4ODdkMDk5Y2Y=']' returned non-zero exit status 1.
unit-kubernetes-master-15: 15:36:09 ERROR juju.worker.uniter.operation hook "upgrade-charm" (via explicit, bespoke hook script) failed: exit status 1
unit-kubernetes-master-15: 15:36:09 DEBUG juju.machinelock machine lock released for kubernetes-master/15 uniter (run upgrade-charm hook)
unit-kubernetes-master-15: 15:36:09 DEBUG juju.worker.uniter.operation lock released for kubernetes-master/15
unit-kubernetes-master-15: 15:36:09 DEBUG juju.worker.uniter.operation running operation run commands for kubernetes-master/15
unit-kubernetes-master-15: 15:36:09 DEBUG juju.machinelock acquire machine lock for kubernetes-master/15 uniter (run commands)
unit-kubernetes-master-15: 15:36:09 DEBUG juju.machinelock machine lock acquired for kubernetes-master/15 uniter (run commands)
unit-kubernetes-master-15: 15:36:09 DEBUG juju.work

Running the command manually gives this:
root@juju-806d1a-52-lxd-1:~# snap refresh cdk-addons --cohort MSBlZUVJQ25EaUIzdGo5NHBLUU0xQnd0RWN2VmdIZTk1biAxNjEzNDU0NzQ4IDhmYmI0NzJmMmZhZjBkNmIyYWY2MzM4Yjk3OWJiZWY1NzRhMmJlODliMjg3ZWI2NWMyODNjYTY4ODdkMDk5Y2Y=
error: local snap "cdk-addons" is unknown to the store, use --amend to proceed anyway
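
The `x3` revision of cdk-addons in the snap listing matches this error: snapd gives locally installed snaps revisions prefixed with `x`, and those are the snaps the store does not know about. A minimal sketch for spotting them before an upgrade, assuming the usual six-column `snap list` layout. The function name is hypothetical and the sample text is an excerpt of the listing from this post.

```python
from typing import List


def local_snaps(snap_list_output: str) -> List[str]:
    """Return the names of snaps whose revision starts with 'x' (local installs)."""
    names = []
    for line in snap_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: Name, Version, Rev, Tracking, Publisher, Notes
        if len(fields) >= 3 and fields[2].startswith("x"):
            names.append(fields[0])
    return names


# Excerpt of the `snap list` output shown earlier in this post.
SAMPLE = """\
Name Version Rev Tracking Publisher Notes
cdk-addons 1.16.15 x3 - - -
core 16-2.49 10859 latest/stable canonical✓ core
kube-apiserver 1.16.15 1789 1.16/stable canonical✓ in-cohort
"""

if __name__ == "__main__":
    print(local_snaps(SAMPLE))  # prints ['cdk-addons']
```

On a real unit the function would be fed the text printed by `snap list` instead of the sample.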

So, what’s the solution here?
Is there a way to modify those settings while keeping the original charm?
Can we somehow bypass the cohort refresh for cdk-addons, since we manage that snap internally?

Patrick

It looks like we can’t use a custom snap anymore because of this. Is there anyone here who can help?

@Canonical, could you contact me about support options?

@patrickd75
The Kubernetes charms started using snap cohorts to ensure point releases stay consistent across the cluster. This was added in the 1.17 release, which you’re upgrading to (Release notes | Ubuntu).

The team is actually working to remove the addons and move them to Operators, which will make them easier to configure and manage during day-2 operations. Until there is an operator for managing metrics server, I recommend you disable the addon-installed metrics server and apply your own metrics-server manifest with your customizations in it.
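
A minimal sketch of carrying the customization yourself once the addon's metrics server is turned off (as far as I know the kubernetes-master charm exposes an `enable-metrics` option for that; check `juju config kubernetes-master` on your revision). It builds a strategic-merge patch that pins static resources on a metrics-server Deployment. The Deployment and container name (`metrics-server`) and the helper name are assumptions to adjust to the manifest you actually deploy.

```python
import json


def resources_patch(container: str, cpu: str, memory: str) -> dict:
    """Build a strategic-merge patch setting requests and limits on one container."""
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    return {
        "spec": {
            "template": {
                "spec": {
                    # Containers are merged by name, so only this one is changed.
                    "containers": [{"name": container, "resources": resources}]
                }
            }
        }
    }


if __name__ == "__main__":
    # Base values from the customized snap; size memory for your own node count.
    patch = resources_patch("metrics-server", cpu="200m", memory="256Mi")
    print(json.dumps(patch, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl patch deployment metrics-server -p "$(cat patch.json)"` in the namespace where you deployed it, or the same values can simply be written into the manifest before applying it.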

I’m concerned that any other approach will leave you exposed to the same kind of failure on a later upgrade. A forked snap or charm falls outside our normal testing and is likely to hit broken edge cases.