Often, work needs to be done on a Kubernetes worker or on the underlying hypervisor. This tutorial shows how to stop a Kubernetes worker unit and then bring it back online.
To make your way through this tutorial, you will need:
- Juju installed (see Installing Juju)
- Juju controller up and running (see Creating a controller)
- Kubectl command line utility installed (see kubectl snap)
If you already have a Juju model running a Kubernetes cluster with multiple workers, you can skip this section.
Start by deploying the Kubernetes Core bundle and increasing the number of Kubernetes workers.

$ juju deploy cs:bundle/kubernetes-core
$ juju add-unit kubernetes-worker
Wait for the Kubernetes deployment to settle, occasionally running juju status until every unit reaches the active/idle state. Once the deployment settles, you can download the kubectl configuration file from the kubernetes-master unit:
$ juju scp kubernetes-master/0:config ~/.kube/config
(Optional) Workload deployment
For a better demonstration of pod migration during the maintenance, we can deploy a dummy workload on the Kubernetes cluster. Create a file named nginx-deployment.yaml with the following content.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
Deploy it by running:
$ kubectl apply -f ./nginx-deployment.yaml
This will start 4 pods running the nginx image, which should be spread evenly across the available kubernetes-worker units.
Run the following command to verify that there are multiple Kubernetes nodes. Each node represents one kubernetes-worker unit.
$ kubectl get nodes
NAME                STATUS   ROLES    AGE    VERSION
juju-37550a-k8s-1   Ready    <none>   10h    v1.19.4
juju-37550a-k8s-3   Ready    <none>   140m   v1.19.4
If you have a workload deployed on the Kubernetes cluster, you can check the distribution of your pods among the nodes by running:
$ kubectl get pod -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE                NOMINATED NODE   READINESS GATES
nginx-deployment-7848d4b86f-9zcl2   1/1     Running   0          25s   10.1.94.13    juju-37550a-k8s-3   <none>           <none>
nginx-deployment-7848d4b86f-hpjqv   1/1     Running   0          25s   10.1.100.13   juju-37550a-k8s-1   <none>           <none>
nginx-deployment-7848d4b86f-pzqkt   1/1     Running   0          25s   10.1.94.14    juju-37550a-k8s-3   <none>           <none>
nginx-deployment-7848d4b86f-tn7bf   1/1     Running   0          25s   10.1.100.12   juju-37550a-k8s-1   <none>           <none>
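To see the spread at a glance, you can tally pods per node with a small pipeline. The sketch below runs the tally against a captured copy of the listing above (pod and node names come from this tutorial's example cluster); against a live cluster you would pipe the real `kubectl get pod -o wide` output in and count the NODE column (column 7) instead of column 2 here.

```shell
# Captured pod-to-node mapping from the listing above (name and node only).
pods='nginx-deployment-7848d4b86f-9zcl2 juju-37550a-k8s-3
nginx-deployment-7848d4b86f-hpjqv juju-37550a-k8s-1
nginx-deployment-7848d4b86f-pzqkt juju-37550a-k8s-3
nginx-deployment-7848d4b86f-tn7bf juju-37550a-k8s-1'

# Count how many pods landed on each node.
summary=$(printf '%s\n' "$pods" | awk '{count[$2]++} END {for (n in count) print n, count[n]}' | sort)
echo "$summary"
```

With the four-pod example above, each of the two workers ends up with two pods.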
Draining Kubernetes Worker
The kubernetes-worker charm provides actions for draining and resuming workers. These actions are pause and resume. Behind the scenes, pause runs a kubectl command to drain the node of all pods running on it; these pods will be rescheduled onto other nodes.
First, check to make sure there’s enough resource capacity in the cluster to handle taking out a node.
$ kubectl top nodes
NAME                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
juju-37550a-k8s-1   1121m        28%    1476Mi          21%
juju-37550a-k8s-3   1185m        29%    1458Mi          21%
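A rough way to reason about this: the drained node's load has to land on the remaining nodes, so their current usage plus the drained node's usage should stay below some ceiling. The sketch below checks this against a captured copy of the CPU% column from the listing above; the node names and numbers come from this tutorial's example, and the 80% ceiling is an arbitrary threshold chosen for the sketch, not a Kubernetes default.

```shell
# Captured node-to-CPU% mapping from the `kubectl top nodes` listing above.
top='juju-37550a-k8s-1 28
juju-37550a-k8s-3 29'

drain='juju-37550a-k8s-1'
verdict=$(printf '%s\n' "$top" | awk -v drain="$drain" '
  { cpu[$1] = $2 }
  END {
    moved = cpu[drain]        # load that has to land elsewhere after the drain
    for (n in cpu)
      if (n != drain && cpu[n] + moved > 80) { print "tight"; exit }
    print "ok"
  }')
echo "$verdict"
```

In the example, moving 28% of CPU onto a node already at 29% stays well under the ceiling, so the drain is safe to attempt.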
Then you can drain the node of your choice, identified by its unit ID. To drain kubernetes-worker/0, run the following.
$ juju run-action --wait kubernetes-worker/0 pause
If the action fails with an error regarding local storage

error: cannot delete Pods with local storage (use --delete-local-data to override)

you will need to re-run the action and pass the delete-local-data=true argument:

$ juju run-action --wait kubernetes-worker/0 pause delete-local-data=true

Be warned that this will cause loss of locally stored data in pods that use emptyDir storage volumes.
After the action finishes, you can run kubectl get nodes and observe that the paused node is in the Ready,SchedulingDisabled state.
$ kubectl get nodes
NAME                STATUS                     ROLES    AGE    VERSION
juju-37550a-k8s-1   Ready,SchedulingDisabled   <none>   10h    v1.19.4
juju-37550a-k8s-3   Ready                      <none>   150m   v1.19.4
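If you want to check the cordon from a script rather than by eye, you can test whether the node's STATUS contains SchedulingDisabled. The sketch below runs against a captured copy of the listing above (node names from this tutorial's example); in a live cluster you would substitute the real `kubectl get nodes` output.

```shell
# Captured node-to-status mapping from the `kubectl get nodes` listing above.
nodes='juju-37550a-k8s-1 Ready,SchedulingDisabled
juju-37550a-k8s-3 Ready'

node='juju-37550a-k8s-1'
state=$(printf '%s\n' "$nodes" | awk -v n="$node" '$1 == n {print $2}')
case "$state" in
  *SchedulingDisabled*) echo "$node is cordoned" ;;
  *)                    echo "$node is still schedulable" ;;
esac
```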
You can also verify that your pods were migrated from the paused node to the remaining nodes.
$ kubectl get pod -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP           NODE                NOMINATED NODE   READINESS GATES
nginx-deployment-7848d4b86f-9zcl2   1/1     Running   0          3m17s   10.1.94.13   juju-37550a-k8s-3   <none>           <none>
nginx-deployment-7848d4b86f-f5ts7   1/1     Running   0          16s     10.1.94.15   juju-37550a-k8s-3   <none>           <none>
nginx-deployment-7848d4b86f-pzqkt   1/1     Running   0          3m17s   10.1.94.14   juju-37550a-k8s-3   <none>           <none>
nginx-deployment-7848d4b86f-wpsdn   1/1     Running   0          16s     10.1.94.16   juju-37550a-k8s-3   <none>           <none>
Bringing Kubernetes Worker Back Online
Once the worker is ready to be brought back into circulation, you can bring it back online by running:
$ juju run-action --wait kubernetes-worker/0 resume
To verify that the workers are running again, you can run kubectl get nodes and check that the resumed node is in the Ready state again.
$ kubectl get nodes
NAME                STATUS   ROLES    AGE    VERSION
juju-37550a-k8s-1   Ready    <none>   10h    v1.19.4
juju-37550a-k8s-3   Ready    <none>   157m   v1.19.4
Rebalancing the cluster: pods that were previously migrated away from this node will not be automatically migrated back once it is back online.