Juju charms v1: The documentation below applies to v1 charms, where the charm operator runs in a separate pod from that of the workload being managed by the charm. From Juju v2.9 onwards (2.9 supports a limited preview), we are beginning to support a new deployment mode where the Juju agent runs in a sidecar container in the same pod as the workload.
Introduction
Kubernetes charms are similar to traditional cloud charms. The same model is used. An application has units, there’s a leader unit, and each relation has a data bag on the application which is writable by the leader unit. The same hooks are invoked.
The only mandatory task for the charm is to tell Juju about the key pod / container configuration. Kubernetes charms are recommended to be reactive, and there's a base layer similar to the one used for traditional charms:
https://github.com/juju-solutions/layer-caas-base
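A reactive Kubernetes charm pulls this base layer in via its layer.yaml in the usual way. A minimal sketch (the docker-resource layer and the interface name follow the mariadb sample discussed later and are illustrative, not mandated):

includes:
  - 'layer:caas-base'
  - 'layer:docker-resource'
  - 'interface:mysql'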
The basic flow for how the charm operates is:
- charm calls config-get to retrieve the charm settings from Juju
- charm translates settings to create pod configuration
- charm calls pod-spec-set to tell Juju how to create pods/units
- charm can use status-set or juju-log or any other hook command the same as for traditional charms
- charm can implement hooks the same as for traditional charms
There’s no need for the charm to apt install anything - the operator docker image has all the necessary reactive and charm helper libraries baked in.
The charm can call pod-spec-set at any time and Juju will update any running pods with the new pod spec. This may be done in response to the config-changed hook due to the user changing charm settings, or when relations are joined etc. Juju will check for actual changes before restarting pods so the call is idempotent.
Note: the pod spec applies for the application as a whole. All pods are homogeneous.
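For example, with the reactive base layer the charm can respond to changed settings by clearing its "configured" flag so that the pod spec is rebuilt and re-sent. A minimal sketch, assuming the flag names used in the mariadb snippet later in this document and the automatically managed config.changed flag from charms.reactive:

from charms.reactive import when
from charms.reactive.flags import clear_flag

@when('mysql.configured', 'config.changed')
def on_config_changed():
    # A charm setting changed: clear the flag so the handler that builds
    # and sends the pod spec runs again. Juju only restarts the pods if
    # the new spec actually differs from the current one.
    clear_flag('mysql.configured')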
Kubernetes charm store
A number of Kubernetes charms have already been written and are available on the charm store.
Container images
Charms specify that they need a container image by including a resource definition.
resources:
mysql_image:
type: oci-image
description: Image used for mysql pod.
oci-image is a new type of charm resource (we already have file).
The image is attached to a charm and hosted by the charm store’s inbuilt docker repo. Standard Juju resource semantics apply. A charm is released (published) as a tuple of (charm revision, resource version). This allows the charm and associated image to be published as a known working configuration.
Example workflow
To build and push a charm to the charm store, ensure you have the charm snap installed. After hacking on the charm and running charm build to fully generate it, you push, attach and release:
cd <build dir>
charm push . mariadb-k8s
docker pull mariadb
charm attach cs:~me/mariadb-k8s-8 mysql_image=mariadb
charm release cs:~me/mariadb-k8s-9 --resource mysql_image-0
See charm help push and charm help attach for more details.
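Once released, the charm and its attached image can be deployed like any other charm, for example (assuming a Kubernetes model has already been added):

juju deploy cs:~me/mariadb-k8s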
Charms in more detail
Use the information below in addition to looking at the charms already written to see how this all hangs together.
To illustrate how a charm tells Juju how to configure a unit's pod, here's the template YAML snippet used by the Kubernetes mariadb charm. Note the placeholders, which are filled in from the charm config obtained via config-get.
version: 3
containers:
- name: mariadb
imagePullPolicy: Always
ports:
- containerPort: %(port)s
protocol: TCP
envConfig:
MYSQL_ROOT_PASSWORD: %(rootpassword)s
MYSQL_USER: %(user)s
MYSQL_PASSWORD: %(password)s
MYSQL_DATABASE: %(database)s
volumeConfig:
- name: configurations
mountPath: /etc/mysql/conf.d
files:
- path: custom_mysql.cnf
content: |
[mysqld]
skip-host-cache
skip-name-resolve
query_cache_limit = 1M
query_cache_size = %(query-cache-size)s
query_cache_type = %(query-cache-type)s
The charm simply sends this YAML snippet to Juju using the pod_spec_set() charm helper.
Here’s a code snippet from the mariadb charm.
from charms.reactive import when, when_not
from charms.reactive.flags import set_flag, get_state, clear_flag
from charmhelpers.core.hookenv import (
log,
metadata,
status_set,
config,
network_get,
relation_id,
)
from charms import layer
@when_not('layer.docker-resource.mysql_image.fetched')
def fetch_image():
layer.docker_resource.fetch('mysql_image')
@when('mysql.configured')
def mariadb_active():
status_set('active', '')
@when('layer.docker-resource.mysql_image.available')
@when_not('mysql.configured')
def config_mariadb():
status_set('maintenance', 'Configuring mysql container')
spec = make_pod_spec()
log('set pod spec:\n{}'.format(spec))
layer.caas_base.pod_spec_set(spec)
set_flag('mysql.configured')
....
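The make_pod_spec() helper isn't shown above; it just renders the YAML template with the charm settings. A minimal sketch, assuming the template shown earlier is shipped with the charm as a file (the file name and location here are illustrative):

import os

from charmhelpers.core.hookenv import charm_dir, config

def make_pod_spec():
    # Read the pod spec template shipped with the charm (illustrative path).
    with open(os.path.join(charm_dir(), 'reactive', 'spec_template.yaml')) as f:
        template = f.read()
    # Fill in the %(...)s placeholders from the charm settings
    # returned by config-get.
    return template % config()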
Important Difference With Cloud Charms
Charms such as databases, which have a provides endpoint, often need to set in relation data the IP address to which related charms can connect. The IP address is obtained using network-get, often something like this:
@when('mysql.configured')
@when('server.database.requested')
def provide_database(mysql):
info = network_get('server', relation_id())
log('network info {0}'.format(info))
host = info.get('ingress-addresses', [""])[0]
if not host:
log("no service address yet")
return
for request, application in mysql.database_requests().items():
database_name = get_state('database')
user = get_state('user')
password = get_state('password')
mysql.provide_database(
request_id=request,
host=host,
port=3306,
database_name=database_name,
user=user,
password=password,
)
clear_flag('server.database.requested')
Workload Status
Currently, there's no well defined way for a Kubernetes charm to query the status of the workload it is managing. So although the charm can reasonably set status as, say, blocked when it's waiting for a required relation to be created, or maintenance when the pod spec is being set up, there's no real way for the charm to know when to set active.
Juju helps solve this problem by looking at the pod status and using that in conjunction with the status reported by the charm to determine what to display to the user. Workload status values of waiting, blocked, maintenance, or any error conditions, are always reported directly. However, if the charm sets status as active, this is not shown as such until the pod is reported as Running. So all the charm has to do is set status as active when all of its initial setup is complete and the pod spec has been sent to Juju, and Juju will "Do The Right Thing" from then on. Both the gitlab and mariadb sample charms illustrate how workload status can be set correctly.
A future enhancement will be to allow the charm to directly query the workload status and the above workaround will become unnecessary.
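In practice this just means wrapping the charm's setup work with status_set calls. A minimal sketch, reusing the helper from the earlier mariadb snippet (the relation flag name here is illustrative):

from charms.reactive import when, when_not
from charmhelpers.core.hookenv import status_set

@when_not('server.joined')  # illustrative flag for a required relation
def missing_relation():
    # Waiting on a required relation: blocked is always shown directly.
    status_set('blocked', 'a database relation is required')

@when('mysql.configured')
def ready():
    # The pod spec has been sent; set active and let Juju hold it back
    # until the pod is actually reported as Running.
    status_set('active', '')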
Workload pod in more detail
It’s possible to specify Kubernetes specific pod configuration in the pod spec YAML created by the charm. The supported container attributes are:
- livenessProbe
- readinessProbe
- imagePullPolicy
The syntax used is standard k8s pod spec syntax.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
It’s also possible to set pod-level attributes:
- activeDeadlineSeconds
- dnsPolicy
- restartPolicy
- terminationGracePeriodSeconds
- automountServiceAccountToken
- securityContext
- priorityClassName
- priority
- readinessGates
Again, standard k8s syntax is used for the above attributes.
Pod annotations and labels can be set.
You can also specify the command to run when starting the container.
command: ["sh", "-c"]
args: ["doIt", "--debug"]
workingDir: "/path/to/here"
These pod specific attributes are defined in YAML blocks as shown below:
version: 3
containers:
- name: gitlab
imagePullPolicy: Always
ports:
- containerPort: 80
protocol: TCP
command:
- sh
- -c
- |
set -ex
echo "do some stuff here for gitlab container"
args: ["doIt", "--debug"]
workingDir: "/path/to/here"
kubernetes:
securityContext:
runAsNonRoot: true
privileged: true
livenessProbe:
initialDelaySeconds: 10
httpGet:
path: /ping
port: 8080
readinessProbe:
initialDelaySeconds: 10
httpGet:
path: /pingReady
port: www
startupProbe:
httpGet:
path: /healthz
port: liveness-port
failureThreshold: 30
periodSeconds: 10
kubernetesResources:
pod:
annotations:
foo: baz
labels:
foo: bax
activeDeadlineSeconds: 10
restartPolicy: OnFailure
terminationGracePeriodSeconds: 20
automountServiceAccountToken: true
hostNetwork: true
hostPID: true
dnsPolicy: ClusterFirstWithHostNet
securityContext:
runAsNonRoot: true
fsGroup: 14
priorityClassName: top
priority: 30
readinessGates:
- conditionType: PodScheduled
Workload permissions and capabilities
We allow a set of rules to be associated with the application to confer capabilities on the workload; a set of rules constitutes a role. If a role is required for an application, Juju will create a service account for the application with the same name as the application. Juju automatically takes care of the internal k8s details, such as creating a role binding.
Some applications may require cluster scoped roles. Use global: true if cluster scoped rules are required.
serviceAccounts:
automountServiceAccountToken: true
# roles are usually scoped to the model namespace, but
# some workloads like istio require binding to cluster wide roles
# use global = true for cluster scoped roles
global: true
#
# these rules are based directly on role rules supported by k8s
rules:
- apiGroups: [""] # "" indicates the core API group
resources: ["pods"]
verbs: ["get", "watch", "list"]
- nonResourceURLs: ["*"]
verbs: ["*"]
Config maps
These are essentially named databags.
configMaps:
mydata:
foo: bar
hello: world
Service scale policy, update strategy and annotations
As well as setting annotations, it's possible to set the scale policy for services, i.e. how the workload pods should be started: serially, one at a time, or in parallel. The default is parallel. The update strategy, i.e. how pod updates should be managed, is also configured here.
For reference:
- https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
- https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
- https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/
service:
scalePolicy: serial
annotations:
foo: bar
updateStrategy:
type: Recreate
rollingUpdate:
maxUnavailable: 10%
maxSurge: 25%
Mounting volumes into workloads
As well as creating a directory tree with simple text files (covered earlier), it’s also possible to configure volumes backed by:
- config map
- secret
- host path
- empty dir
With secret and config map volumes, the secret or config map must be defined elsewhere in the YAML handed to Juju - you can't reference existing resources not created by the charm. If you leave out the files block, the entire secret or config map will be mounted. path is optional - the file will be created with the same name as key if not specified.
The path for each file is created relative to the overall mount point.
Here’s an example of what’s possible:
version: 3
...
volumeConfig:
# This is what was covered earlier (simple text files)
- name: configurations
mountPath: /etc/mysql/conf.d
files:
- path: custom_mysql.cnf
content: |
[mysqld]
skip-host-cache
skip-name-resolve
query_cache_limit = 1M
query_cache_size = %(query-cache-size)s
query_cache_type = %(query-cache-type)s
# Additional volume types follow...
# host path
- name: myhostpath1
mountPath: /var/log1
hostPath:
path: /var/log
type: Directory
- name: myhostpath2
mountPath: /var/log2
hostPath:
path: /var/log
# see https://kubernetes.io/docs/concepts/storage/volumes/#hostpath for other types
type: Directory
# empty dir
- name: cache-volume
mountPath: /empty-dir
emptyDir:
medium: Memory # defaults to disk
- name: cache-volume222
mountPath: /empty-dir222
emptyDir:
medium: Memory
- name: cache-volume
mountPath: /empty-dir1
emptyDir:
medium: Memory
# secret
- name: another-build-robot-secret
mountPath: /opt/another-build-robot-secret
secret:
name: another-build-robot-secret
defaultMode: 511
files:
- key: username
path: my-group/username
mode: 511
- key: password
path: my-group/password
mode: 511
# config map
configMap:
name: log-config
defaultMode: 511
files:
- key: log_level
path: log_level
mode: 511
The story so far…
Extending the sample YAML to add in the above features, we get the example YAML below:
version: 3
containers:
- name: gitlab
imagePullPolicy: Always
ports:
- containerPort: 80
protocol: TCP
command:
- sh
- -c
- |
set -ex
echo "do some stuff here for gitlab container"
args: ["doIt", "--debug"]
workingDir: "/path/to/here"
envConfig:
MYSQL_ROOT_PASSWORD: %(rootpassword)s
MYSQL_USER: %(user)s
MYSQL_PASSWORD: %(password)s
MYSQL_DATABASE: %(database)s
volumeConfig:
- name: configurations
mountPath: /etc/mysql/conf.d
files:
- path: custom_mysql.cnf
content: |
[mysqld]
skip-host-cache
skip-name-resolve
query_cache_limit = 1M
query_cache_size = %(query-cache-size)s
query_cache_type = %(query-cache-type)s
# host path
- name: myhostpath1
mountPath: /var/log1
hostPath:
path: /var/log
type: Directory
- name: myhostpath2
mountPath: /var/log2
hostPath:
path: /var/log
# see https://kubernetes.io/docs/concepts/storage/volumes/#hostpath for other types
type: Directory
# empty dir
- name: cache-volume
mountPath: /empty-dir
emptyDir:
medium: Memory # defaults to disk
- name: cache-volume222
mountPath: /empty-dir222
emptyDir:
medium: Memory
- name: cache-volume
mountPath: /empty-dir1
emptyDir:
medium: Memory
# secret
- name: another-build-robot-secret
mountPath: /opt/another-build-robot-secret
secret:
name: another-build-robot-secret
defaultMode: 511
files:
- key: username
path: my-group/username
mode: 511
- key: password
path: my-group/password
mode: 511
# config map
configMap:
name: log-config
defaultMode: 511
files:
- key: log_level
path: log_level
mode: 511
kubernetes:
securityContext:
runAsNonRoot: true
privileged: true
livenessProbe:
initialDelaySeconds: 10
httpGet:
path: /ping
port: 8080
readinessProbe:
initialDelaySeconds: 10
httpGet:
path: /pingReady
port: www
startupProbe:
httpGet:
path: /healthz
port: liveness-port
failureThreshold: 30
periodSeconds: 10
configMaps:
mydata:
foo: bar
hello: world
service:
annotations:
foo: bar
scalePolicy: serial
updateStrategy:
type: Recreate
rollingUpdate:
maxUnavailable: 10%
maxSurge: 25%
serviceAccount:
automountServiceAccountToken: true
roles:
- global: true
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "watch", "list"]
kubernetesResources:
pod:
annotations:
foo: baz
labels:
foo: bax
activeDeadlineSeconds: 10
restartPolicy: OnFailure
terminationGracePeriodSeconds: 20
automountServiceAccountToken: true
hostNetwork: true
hostPID: true
dnsPolicy: ClusterFirstWithHostNet
securityContext:
runAsNonRoot: true
fsGroup: 14
priorityClassName: top
priority: 30
readinessGates:
- conditionType: PodScheduled
Next, we'll cover things like custom resources and their associated custom resource definitions, as well as secrets. All of the resources described in subsequent sections belong under the kubernetesResources: block.
Custom resources
The YAML syntax is curated from the native k8s YAML to remove the boilerplate and other unnecessary cruft, leaving the business attributes. Here’s an example of defining a custom resource definition and a custom resource. These could well be done by different charms, but are shown together here for brevity.
kubernetesResources:
customResourceDefinitions:
tfjobs.kubeflow.org:
group: kubeflow.org
scope: Namespaced
names:
kind: TFJob
singular: tfjob
plural: tfjobs
versions:
- name: v1
served: true
storage: true
subresources:
status: {}
validation:
openAPIV3Schema:
properties:
spec:
properties:
tfReplicaSpecs:
properties:
# The validation works when the configuration contains
# `Worker`, `PS` or `Chief`. Otherwise it will not be validated.
Worker:
properties:
replicas:
type: integer
minimum: 1
PS:
properties:
replicas:
type: integer
minimum: 1
Chief:
properties:
replicas:
type: integer
minimum: 1
maximum: 1
tfjob1s.kubeflow.org1:
group: kubeflow.org1
scope: Namespaced
names:
kind: TFJob1
singular: tfjob1
plural: tfjob1s
versions:
- name: v1
served: true
storage: true
subresources:
status: {}
validation:
openAPIV3Schema:
properties:
spec:
properties:
tfReplicaSpecs:
properties:
# The validation works when the configuration contains
# `Worker`, `PS` or `Chief`. Otherwise it will not be validated.
Worker:
properties:
replicas:
type: integer
minimum: 1
PS:
properties:
replicas:
type: integer
minimum: 1
Chief:
properties:
replicas:
type: integer
minimum: 1
maximum: 1
customResources:
tfjobs.kubeflow.org:
- apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
name: "dist-mnist-for-e2e-test"
spec:
tfReplicaSpecs:
PS:
replicas: 2
restartPolicy: Never
template:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:1.0
Worker:
replicas: 8
restartPolicy: Never
template:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:1.0
tfjob1s.kubeflow.org1:
- apiVersion: "kubeflow.org1/v1"
kind: "TFJob1"
metadata:
name: "dist-mnist-for-e2e-test11"
spec:
tfReplicaSpecs:
PS:
replicas: 2
restartPolicy: Never
template:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:1.0
Worker:
replicas: 8
restartPolicy: Never
template:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:1.0
- apiVersion: "kubeflow.org1/v1"
kind: "TFJob1"
metadata:
name: "dist-mnist-for-e2e-test12"
spec:
tfReplicaSpecs:
PS:
replicas: 2
restartPolicy: Never
template:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:1.0
Worker:
replicas: 8
restartPolicy: Never
template:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:1.0
Lifecycle of custom resources
Charms can decide when custom resources get deleted by specifying the appropriate labels.
...
customResourceDefinitions:
- name: tfjobs.kubeflow.org
labels:
foo: bar
juju-resource-lifecycle: model | persistent
...
customResources:
tfjobs.kubeflow.org:
- apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
name: "dist-mnist-for-e2e-test"
labels:
foo: bar
juju-global-resource-lifecycle: model | persistent
- If no juju-resource-lifecycle label is set, the custom resource gets deleted with the application.
- If juju-resource-lifecycle is set to model, the custom resource will not get deleted when the application is removed; it remains until the model is destroyed.
- If juju-resource-lifecycle is set to persistent, the custom resource will never get deleted by Juju, even when the model is destroyed.
Secrets
Secrets will ultimately be modelled by Juju. We're not there yet, so (initially) we add the secret definitions to the k8s specific YAML. The syntax and supported attributes are tied directly to the k8s spec. Both string and base64 encoded data are supported.
kubernetesResources:
secrets:
- name: build-robot-secret
type: Opaque
stringData:
config.yaml: |-
apiUrl: "https://my.api.com/api/v1"
username: fred
password: shhhh
- name: another-build-robot-secret
type: Opaque
data:
username: YWRtaW4=
password: MWYyZDFlMmU2N2Rm
Webhooks
Charms can create mutating and validating webhook resources.
Juju will prefix any global resources with the model name to ensure applications deployed multiple times into different namespaces do not conflict. However, some workloads which Juju has no control over (yet) expect webhooks in particular to have fixed names. Charms can now define an annotation on mutating and validating webhooks to disable this name qualification:
annotations:
model.juju.is/disable-prefix: "true"
Example webhooks:
kubernetesResources:
mutatingWebhookConfigurations:
- name: example-mutatingwebhookconfiguration
labels:
foo: bar
annotations:
model.juju.is/disable-prefix: "true"
webhooks:
- name: "example.mutatingwebhookconfiguration.com"
failurePolicy: Ignore
clientConfig:
service:
name: apple-service
namespace: apples
path: /apple
caBundle: "YXBwbGVz"
namespaceSelector:
matchExpressions:
- key: production
operator: DoesNotExist
rules:
- apiGroups:
- ""
apiVersions:
- v1
operations:
- CREATE
- UPDATE
resources:
- pods
validatingWebhookConfigurations:
- name: pod-policy.example.com
labels:
foo: bar
annotations:
model.juju.is/disable-prefix: "true"
webhooks:
- name: "pod-policy.example.com"
rules:
- apiGroups: [""]
apiVersions: ["v1"]
operations: ["CREATE"]
resources: ["pods"]
scope: "Namespaced"
clientConfig:
service:
namespace: "example-namespace"
name: "example-service"
caBundle: "YXBwbGVz"
admissionReviewVersions: ["v1", "v1beta1"]
sideEffects: None
timeoutSeconds: 5
Ingress resources
Charms can create ingress resources. Example:
kubernetesResources:
ingressResources:
- name: test-ingress
labels:
foo: bar
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
rules:
- http:
paths:
- path: /testpath
backend:
serviceName: test
servicePort: 80
Additional service accounts
Sometimes it's necessary for a charm to create additional service accounts needed by the upstream OCI image it is deploying.
kubernetesResources:
serviceAccounts:
- name: k8sServiceAccount1
automountServiceAccountToken: true
roles:
- name: k8sRole
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "watch", "list"]
- nonResourceURLs: ["/healthz", "/healthz/*"] # '*' in a nonResourceURL is a suffix glob match
verbs: ["get", "post"]
- apiGroups: ["rbac.authorization.k8s.io"]
resources: ["clusterroles"]
verbs: ["bind"]
resourceNames: ["admin", "edit", "view"]
- name: k8sClusterRole
global: true
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "watch", "list"]
Additional services
It may also be necessary to create extra services.
kubernetesResources:
services:
- name: my-service1
labels:
foo: bar
spec:
selector:
app: MyApp
ports:
- protocol: TCP
port: 80
targetPort: 9376
- name: my-service2
labels:
app: test
annotations:
cloud.google.com/load-balancer-type: "Internal"
spec:
selector:
app: MyApp
ports:
- protocol: TCP
port: 80
targetPort: 9376
type: LoadBalancer
Charm deployment info in metadata.yaml
The charm can require that it only be deployed on a k8s cluster with a certain minimum API version.
It can also specify what type of service to create to sit in front of the workload pods, and can ask for the service not to be created at all by using omit.
deployment:
min-version: x.y
type: stateless | stateful
service: loadbalancer | cluster | omit
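For example, the metadata.yaml of a stateful Kubernetes charm might end with something like this (a sketch; the name and chosen values are illustrative):

name: mariadb-k8s
summary: MariaDB for Kubernetes
description: MariaDB database workload managed by Juju.
series:
  - kubernetes
deployment:
  type: stateful
  service: cluster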