Upgrading to k8s 1.25 fails on PodSecurityPolicy admission plugin in kube-apiserver

I’ve followed Upgrading to 1.25 | Ubuntu to upgrade to 1.25, yet the upgrade step for the control-plane does not complete.

The logs show the following error:

journalctl -f -u snap.kube-apiserver.daemon.service 
-- Logs begin at Mon 2023-05-29 06:10:40 UTC. --
May 30 21:10:04 node3 kube-apiserver.daemon[3836634]: I0530 21:10:04.005835 3836634 flags.go:64] FLAG: --v="4"
May 30 21:10:04 node3 kube-apiserver.daemon[3836634]: I0530 21:10:04.005840 3836634 flags.go:64] FLAG: --version="false"
May 30 21:10:04 node3 kube-apiserver.daemon[3836634]: I0530 21:10:04.005847 3836634 flags.go:64] FLAG: --vmodule=""
May 30 21:10:04 node3 kube-apiserver.daemon[3836634]: I0530 21:10:04.005852 3836634 flags.go:64] FLAG: --watch-cache="true"
May 30 21:10:04 node3 kube-apiserver.daemon[3836634]: I0530 21:10:04.005857 3836634 flags.go:64] FLAG: --watch-cache-sizes="[]"
May 30 21:10:04 node3 kube-apiserver.daemon[3836634]: I0530 21:10:04.005884 3836634 services.go:51] Setting service IP to "10.152.183.1" (read-write).
May 30 21:10:04 node3 kube-apiserver.daemon[3836634]: I0530 21:10:04.005899 3836634 server.go:563] external host was not specified, using 192.168.1.185
May 30 21:10:04 node3 kube-apiserver.daemon[3836634]: E0530 21:10:04.006119 3836634 run.go:74] "command failed" err="enable-admission-plugins plugin \"PodSecurityPolicy\" is unknown"
May 30 21:10:04 node3 systemd[1]: snap.kube-apiserver.daemon.service: Main process exited, code=exited, status=1/FAILURE

And indeed, snap revision 3346 of kube-apiserver still lists the PodSecurityPolicy admission plugin:

cat /var/snap/kube-apiserver/3346/args | grep admiss
--allow-privileged="true" --service-cluster-ip-range="10.152.183.0/24" --min-request-timeout="300" --v="4" --tls-cert-file="/root/cdk/server.crt" --tls-private-key-file="/root/cdk/server.key" --tls-cipher-suites="TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305" --kubelet-certificate-authority="/root/cdk/ca.crt" --kubelet-client-certificate="/root/cdk/client.crt" --kubelet-client-key="/root/cdk/client.key" --logtostderr="true" --storage-backend="etcd3" --profiling="false" --anonymous-auth="false" --authentication-token-webhook-cache-ttl="1m0s" --authentication-token-webhook-config-file="/root/cdk/auth-webhook/auth-webhook-conf.yaml" --service-account-issuer="https://kubernetes.default.svc" --service-account-signing-key-file="/root/cdk/serviceaccount.key" --service-account-key-file="/root/cdk/serviceaccount.key" --kubelet-preferred-address-types="InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP" --encryption-provider-config="/var/snap/kube-apiserver/common/encryption/encryption_config.yaml" --advertise-address="192.168.1.185" --etcd-cafile="/root/cdk/etcd/client-ca.pem" --etcd-keyfile="/root/cdk/etcd/client-key.pem" --etcd-certfile="/root/cdk/etcd/client-cert.pem" --etcd-servers="https://192.168.1.156:2379,https://192.168.1.185:2379,https://192.168.1.222:2379" --authorization-mode="Node,RBAC" --enable-admission-plugins="PersistentVolumeLabel,PodSecurityPolicy,NodeRestriction" --requestheader-client-ca-file="/root/cdk/ca.crt" --requestheader-allowed-names="system:kube-apiserver,client" --requestheader-extra-headers-prefix="X-Remote-Extra-" --requestheader-group-headers="X-Remote-Group" --requestheader-username-headers="X-Remote-User" --proxy-client-cert-file="/root/cdk/client.crt" --proxy-client-key-file="/root/cdk/client.key" --enable-aggregator-routing="true" --client-ca-file="/root/cdk/ca.crt" --feature-gates="" --audit-log-path="/root/cdk/audit/audit.log" --audit-log-maxage="30" --audit-log-maxsize="100" --audit-log-maxbackup="10" --audit-policy-file="/root/cdk/audit/audit-policy.yaml"

What is going wrong here?
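
For context: PodSecurityPolicy was removed entirely from Kubernetes in 1.25, so an apiserver started with it in --enable-admission-plugins refuses to start at all. A minimal sketch of a manual workaround, assuming the revision path shown above and that nothing in the cluster still relies on PSPs:

# Drop PodSecurityPolicy from the admission plugin list in the args file
# of the active snap revision (the plugin no longer exists in 1.25):
sudo sed -i 's/PodSecurityPolicy,//' /var/snap/kube-apiserver/3346/args

# Restart the apiserver so it picks up the cleaned-up flags:
sudo systemctl restart snap.kube-apiserver.daemon.service

That should at least get the kube-apiserver process running again; ideally the charm would have rewritten this flag during the upgrade.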

I then stumbled on yet another upgrade issue… it seems that upgrading the kubernetes-master charm to the latest rev 260 does not work:

kubernetes-master         1.25.10   waiting      2  kubernetes-control-plane  1.24/stable  171  no       Waiting for auth-webhook tokens

It’s stuck on rev 171.

Okay… so much for the upgrade instructions. You’d think you’re setting the charm to the 1.25 channel by doing:

juju config kubernetes-control-plane channel=1.25/stable

But that’s not the actual channel of the charm, which in my case was still on 1.24. So I had to run:

juju refresh kubernetes-master --channel 1.25/stable
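
To verify that the refresh actually took effect (illustrative; your revision numbers will differ), check the Channel and Rev columns in juju status:

# After the refresh, the application should report channel 1.25/stable
# and the new charm revision (260 in my case) instead of 171:
juju status kubernetes-master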

Why is this BIG gotcha not at least noted in the upgrade instructions?

And what is the real difference between these two commands in the first place, given that both seem to do something with channels?
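
My current understanding, for what it’s worth (happy to be corrected): the two commands touch entirely different things, even though both mention a “channel”.

# Sets a charm *configuration option* that happens to be called "channel";
# for kubernetes-control-plane it selects the snap channel that the
# Kubernetes binaries (kube-apiserver, kubelet, …) are installed from:
juju config kubernetes-control-plane channel=1.25/stable

# Upgrades the *charm itself*, i.e. the operator code, to a different
# Charmhub channel:
juju refresh kubernetes-master --channel 1.25/stable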

Furthermore, the documentation (I see this in hindsight) actually mentions that your charm channels should be on ‘stable’, whereas mine was on ‘1.24/stable’. Why is this? Because I upgraded to 1.24 in the past using Upgrading | Ubuntu, and that section explicitly sets the channel to ‘1.24/stable’.
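
If I read the juju docs correctly, you can at least list the channels that are actually published for a charm, which would have made the mismatch visible earlier:

# Show the Charmhub channel map for the charm (juju 2.9+):
juju info kubernetes-control-plane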

This is yet another upgrade continuity error I’ve stumbled upon. If Canonical wants to get serious about this, these things simply should not happen. I work at a Kubernetes Certified Service Provider, and every time my boss asks me whether Charmed K8s is an option for our customers, I have to answer: only if you like debugging Python code and are prepared for an upgrade to go wrong.