Install Charmed Kubeflow in an air-gapped environment

An air-gapped environment is one that does not have access to the public internet. Installing Charmed Kubeflow (CKF) in an air-gapped environment requires special configuration.


Air-gapped Environment Requirements

Canonical does not prescribe how you should set up your specific air-gapped environment. However, it is assumed that the environment meets the following conditions:

  • A K8s cluster is running.
  • A container registry, such as Artifactory, is reachable from the K8s cluster over HTTPS. HTTPS is required: Juju will not work with a registry served over plain HTTP.
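To confirm the HTTPS requirement before installing anything, you can query the registry's base endpoint (/v2/) from the Docker Registry HTTP API V2. The sketch below assumes bash and curl are available on a cluster node; registry_api_url and check_registry are hypothetical helper names, not part of CKF or Juju. A 200 or 401 response means the TLS handshake succeeded with a trusted certificate (401 only indicates that the registry requires authentication). Registries that serve this API under a different path may need a different URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Registry API V2 base URL for host[:port], refusing plain HTTP.
registry_api_url() {
  local host="$1"
  case "$host" in
    http://*) echo "error: the registry must be reached over HTTPS" >&2; return 1 ;;
    https://*) host="${host#https://}" ;;
  esac
  printf 'https://%s/v2/\n' "${host%/}"
}

# Hypothetical helper: succeed if the registry answers over HTTPS with a trusted certificate.
# HTTP 200 (open registry) and 401 (authentication required) both prove TLS works.
check_registry() {
  local url code
  url="$(registry_api_url "$1")" || return 1
  # curl exits non-zero on TLS or connection errors, e.g. an untrusted certificate.
  code="$(curl --silent --show-error --output /dev/null --write-out '%{http_code}' "$url")" || return 1
  case "$code" in
    200|401) echo "OK: $url answered HTTP $code" ;;
    *) echo "error: $url answered HTTP $code" >&2; return 1 ;;
  esac
}
```

For example, check_registry air-gapped.registry.com run from a cluster node should report OK before you continue.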

MicroK8s DNS

If you are using MicroK8s, the DNS add-on should be configured to use the host's local nameserver. This can be done by running:

microk8s enable dns:$(resolvectl status | grep "Current DNS Server" | awk '{print $NF}')
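If the host has several network links, resolvectl status can print more than one "Current DNS Server" line, and the unquoted substitution above would then pass several words to microk8s enable. The following sketch, assuming bash and systemd-resolved, keeps only the first address and stops if none is found; current_dns_server and enable_microk8s_dns are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `resolvectl status` output on stdin and print only
# the first "Current DNS Server" address.
current_dns_server() {
  awk '/Current DNS Server/ { print $NF; exit }'
}

# Hypothetical helper: enable the MicroK8s DNS add-on with that single address.
enable_microk8s_dns() {
  local ns
  ns="$(resolvectl status | current_dns_server)"
  if [ -z "$ns" ]; then
    echo "error: no 'Current DNS Server' line in resolvectl status" >&2
    return 1
  fi
  microk8s enable "dns:${ns}"
}
```

Source the file, then run enable_microk8s_dns on the MicroK8s host.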

Process Outline

  1. Generate the artifacts (images.tar.gz and charms.tar.gz).
  2. Set up an air-gapped environment with a K8s cluster and an HTTPS-enabled registry.
  3. Extract the images from images.tar.gz and load them into your container registry.
  4. Extract all charms from charms.tar.gz.
  5. Set up Juju in the air-gapped cluster.
  6. Configure each image in bundle.yaml to point at your air-gapped container registry.
  7. Launch CKF.

Artifact Generation

Two artifacts must be generated: images.tar.gz and charms.tar.gz. To generate these tarballs, use our helper scripts, which scan a CKF release and gather all of its charm and image files. Run them on a machine with internet access, then transfer the tarballs into the air-gapped environment.

Extracting Artifacts

Both the charms and the OCI images must be extracted. Extract the charms on the same machine as bundle.yaml and the Juju client. Push the OCI images to the private container registry running in your air-gapped environment.
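A minimal sketch of the charm extraction step, assuming bash and GNU or BSD tar. extract_charms is a hypothetical helper: it unpacks the archive into a directory of your choice and prints the paths of the .charm files it finds, so you can check them against the deploy commands later in this guide. Those commands refer to charm files such as ./minio_3ba39ff.charm, so adjust the paths if you extract elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack charms.tar.gz into a destination directory and
# list the .charm files it contained (they may sit in subdirectories).
extract_charms() {
  local archive="$1" dest="$2"
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1
  find "$dest" -type f -name '*.charm' | sort
}
```

For example, extract_charms charms.tar.gz . unpacks into the current directory, next to bundle.yaml.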

Setup Juju

See Juju Airgapped.

Configuring Bundle

We provide an air-gapped bundle.yaml in which the OCI image names are placeholders. These placeholders must be replaced with the actual names of the OCI images in your air-gapped registry.

We do not prescribe exactly how this is done; it depends on how you store your images.
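Before replacing anything, it helps to list the placeholders you need to map. The sketch below assumes bash and that, as in the example later in this guide, every placeholder starts with the registry prefix 172.17.0.2:5000; list_placeholder_images is a hypothetical helper that prints each distinct placeholder image once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every distinct image reference in a bundle file that
# starts with the given registry prefix (e.g. 172.17.0.2:5000).
list_placeholder_images() {
  local bundle="$1" prefix="$2" prefix_re
  # Escape dots so the prefix is matched literally.
  prefix_re="$(printf '%s' "$prefix" | sed 's/\./\\./g')"
  # Stop at whitespace, quotes, commas or braces, which end a reference in YAML or JSON values.
  grep -o "${prefix_re}/[^[:space:]\"',}]*" "$bundle" | sort -u
}
```

For example, list_placeholder_images bundle.yaml 172.17.0.2:5000 gives you the checklist of images that must exist in your registry.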

Deploying Kubeflow

To deploy Kubeflow, first deploy the bundle, then deploy a number of individual charms separately. Because of a known issue, these charms cannot be deployed as part of the bundle, so they are omitted from the air-gapped bundle.yaml.

Deploy the bundle:

juju deploy /path/to/bundle.yaml

Then, deploy the individual charms and add their relations:

juju deploy ./argo-controller_980dd9f.charm --resource oci-image=172.17.0.2:5000/argoproj/workflow-controller:v3.3.8 --config executor-image=172.17.0.2:5000/argoproj/argoexec:v3.3.8
juju deploy ./argo-server_2618292.charm --resource oci-image=172.17.0.2:5000/argoproj/argocli:v3.3.8
juju deploy ./katib-controller_f371975.charm --resource oci-image=172.17.0.2:5000/kubeflowkatib/katib-controller:v0.16.0-rc.1 --config custom_images='{"default_trial_template": "172.17.0.2:5000/kubeflowkatib/mxnet-mnist:v0.16.0-rc.1","early_stopping__medianstop": "172.17.0.2:5000/kubeflowkatib/earlystopping-medianstop:v0.16.0-rc.1","enas_cpu_template": "172.17.0.2:5000/kubeflowkatib/enas-cnn-cifar10-cpu:v0.16.0-rc.1","metrics_collector_sidecar__stdout": "172.17.0.2:5000/kubeflowkatib/file-metrics-collector:v0.16.0-rc.1","metrics_collector_sidecar__file": "172.17.0.2:5000/kubeflowkatib/file-metrics-collector:v0.16.0-rc.1","metrics_collector_sidecar__tensorflow_event": "172.17.0.2:5000/kubeflowkatib/tfevent-metrics-collector:v0.16.0-rc.1","pytorch_job_template__master": "172.17.0.2:5000/kubeflowkatib/pytorch-mnist-cpu:v0.16.0-rc.1","pytorch_job_template__worker": "172.17.0.2:5000/kubeflowkatib/pytorch-mnist-cpu:v0.16.0-rc.1","suggestion__random": "172.17.0.2:5000/kubeflowkatib/suggestion-hyperopt:v0.16.0-rc.1","suggestion__tpe": "172.17.0.2:5000/kubeflowkatib/suggestion-hyperopt:v0.16.0-rc.1","suggestion__grid": "172.17.0.2:5000/kubeflowkatib/suggestion-optuna:v0.16.0-rc.1","suggestion__hyperband": "172.17.0.2:5000/kubeflowkatib/suggestion-hyperband:v0.16.0-rc.1","suggestion__bayesianoptimization": "172.17.0.2:5000/kubeflowkatib/suggestion-skopt:v0.16.0-rc.1","suggestion__cmaes": "172.17.0.2:5000/kubeflowkatib/suggestion-goptuna:v0.16.0-rc.1","suggestion__sobol": "172.17.0.2:5000/kubeflowkatib/suggestion-goptuna:v0.16.0-rc.1","suggestion__multivariate_tpe": "172.17.0.2:5000/kubeflowkatib/suggestion-optuna:v0.16.0-rc.1","suggestion__enas": "172.17.0.2:5000/kubeflowkatib/suggestion-enas:v0.16.0-rc.1","suggestion__darts": "172.17.0.2:5000/kubeflowkatib/suggestion-darts:v0.16.0-rc.1","suggestion__pbt": "172.17.0.2:5000/kubeflowkatib/suggestion-pbt:v0.16.0-rc.1"}'
juju deploy ./kubeflow-volumes_2ee0a84.charm --resource oci-image=172.17.0.2:5000/kubeflownotebookswg/volumes-web-app:v1.7.0
juju deploy ./minio_3ba39ff.charm --resource oci-image=172.17.0.2:5000/minio/minio:RELEASE.2021-09-03T03-56-13Z

juju relate argo-controller minio
juju relate istio-pilot:ingress kubeflow-volumes:ingress
juju relate kubeflow-dashboard:links kubeflow-volumes:dashboard-links
juju relate kfp-api:object-storage minio:object-storage
juju relate kfp-profile-controller:object-storage minio:object-storage
juju relate kfp-ui:object-storage minio:object-storage

The commands above assume that the air-gapped registry is available at 172.17.0.2:5000. Replace this address with the address of your own registry.
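Rather than editing each command by hand, you can generate them for your registry. This sketch assumes bash; print_deploy_commands is a hypothetical helper that only prints the commands, so you can review them before running anything. The charm file names and image paths are copied from the commands above. For brevity it omits katib-controller, whose custom_images value needs the same prefix change in every entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the individual-charm deploy commands,
# with every image pointed at the given registry host[:port].
# katib-controller is omitted for brevity; its custom_images JSON needs the same change.
print_deploy_commands() {
  local registry="$1"
  cat <<EOF
juju deploy ./argo-controller_980dd9f.charm --resource oci-image=${registry}/argoproj/workflow-controller:v3.3.8 --config executor-image=${registry}/argoproj/argoexec:v3.3.8
juju deploy ./argo-server_2618292.charm --resource oci-image=${registry}/argoproj/argocli:v3.3.8
juju deploy ./kubeflow-volumes_2ee0a84.charm --resource oci-image=${registry}/kubeflownotebookswg/volumes-web-app:v1.7.0
juju deploy ./minio_3ba39ff.charm --resource oci-image=${registry}/minio/minio:RELEASE.2021-09-03T03-56-13Z
EOF
}
```

For example, print_deploy_commands air-gapped.registry.com > deploy-charms.sh writes the commands to a file; review it, run bash deploy-charms.sh, and then add the relations.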

Gateway Service Type

In bundle.yaml, the gateway_service_type option of the Istio gateway configuration is set to NodePort. If your cluster has a load balancer, you can remove this setting so that the option falls back to its default, LoadBalancer.
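If you prefer to make this change with a script, the sketch below deletes the override line from bundle.yaml. It assumes bash and that the setting appears on a single line of the form gateway_service_type: NodePort (unquoted); drop_nodeport_override is a hypothetical helper name. Afterwards, check whether an empty options: key was left behind and tidy it up before deploying.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the "gateway_service_type: NodePort" line from a bundle
# file so the option falls back to its default. Writes via a temporary file.
drop_nodeport_override() {
  local bundle="$1"
  awk '!/^[ \t]*gateway_service_type:[ \t]*NodePort[ \t]*$/' "$bundle" > "$bundle.tmp" &&
    mv "$bundle.tmp" "$bundle"
}
```

For example, run drop_nodeport_override bundle.yaml on a copy of the bundle and compare it with the original before deploying.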

Example

Every setup is different: the choice of K8s (Charmed K8s, EKS, GKE, AKS, MicroK8s, etc.), cloud provider (GCP, AWS, Azure, etc.) and container registry (Docker, Artifactory, etc.) all vary. We cannot cover every combination, but the following rough example demonstrates the process.

Example Air-gapped Environment Setup

In this example, the air-gapped setup is as follows:

  • MicroK8s runs inside a single node VM.
  • The VM has no internet connection (its default gateway has been removed).
  • The Docker daemon is running on the VM, alongside MicroK8s, and the Docker CLI is available to those logged into the VM.
  • A Docker registry is deployed as a container inside that VM (not inside the MicroK8s cluster). See Deploying a Registry Server in the Docker documentation.
  • The Docker registry has HTTPS enabled, using a TLS certificate we created for the domain air-gapped.registry.com.
  • The VM has been configured to trust our TLS cert for HTTPS traffic and recognise the domain name for our registry.
  • The MicroK8s cluster can reach the Docker registry container via its domain name air-gapped.registry.com, to fetch images.

Example Extract and Load Images

It is up to you how to extract the images provided in images.tar.gz and load them into your registry. This example focuses on how the process might look for a single image. The overall tarball contains one sub-tarball per image; for example, jupyter-web-app.tar contains the jupyter-web-app image.

The extraction process might look like this:

  1. The main archive is extracted to retrieve all the sub-tarballs: tar -xzvf images.tar.gz. Inside this extracted archive will be jupyter-web-app.tar.
  2. docker load < jupyter-web-app.tar loads the image from the tarball into Docker's local image store.
  3. The loaded image keeps its original upstream name: docker.io/kubeflownotebookswg/jupyter-web-app:v1.7.0. Note that this name implies the image lives in the docker.io public registry.
  4. The image is given a new name that specifies its new home in our air-gapped registry: docker tag docker.io/kubeflownotebookswg/jupyter-web-app:v1.7.0 air-gapped.registry.com/kubeflownotebookswg/jupyter-web-app:v1.7.0. At this point the same image has two names in Docker's local image store, as docker image ls shows.
  5. The image is pushed to the air-gapped registry with docker push air-gapped.registry.com/kubeflownotebookswg/jupyter-web-app:v1.7.0.

A similar process would then be followed for all images. The new names of the images, as they appear in the air-gapped registry, should be noted, as they will be needed in the bundle configuration step.
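The per-image steps above can be looped over every sub-tarball. This is a sketch under some assumptions: bash and the Docker CLI are available, each sub-tarball contains tagged images, and docker load reports them on lines starting with "Loaded image: ". retag_name and load_retag_push are hypothetical helpers. retag_name swaps the registry host for yours and keeps the repository path and tag, so the new names keep the same paths as the bundle placeholders. Depending on how an image was saved, its name may or may not carry a docker.io/ prefix; following Docker's naming rule, the first path component is treated as a registry host only if it contains a dot or colon or is localhost, so both forms are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point an image reference at a new registry host while keeping
# its repository path and tag. Following Docker's naming rule, the first path component
# is treated as a registry host only if it contains "." or ":" or is "localhost".
retag_name() {
  local ref="$1" registry="$2" first
  case "$ref" in
    */*)
      first="${ref%%/*}"
      case "$first" in
        *.*|*:*|localhost) ref="${ref#*/}" ;;  # drop e.g. "docker.io/"
      esac
      ;;
  esac
  printf '%s/%s\n' "$registry" "$ref"
}

# Hypothetical helper: load one sub-tarball, retag each image it contains for the
# air-gapped registry, push it, and print "old -> new" so the mapping can be recorded.
load_retag_push() {
  local tarball="$1" registry="$2" src dst
  docker load -i "$tarball" | sed -n 's/^Loaded image: //p' |
    while read -r src; do
      dst="$(retag_name "$src" "$registry")"
      # The loop runs in a subshell, so "exit 1" aborts it and makes the function fail.
      docker tag "$src" "$dst" || exit 1
      docker push "$dst" >&2 || exit 1  # push progress goes to stderr, mapping to stdout
      echo "$src -> $dst"
    done
}
```

From the directory holding the extracted sub-tarballs, something like for t in ./*.tar; do load_retag_push "$t" air-gapped.registry.com; done | tee image-mapping.txt pushes every image and keeps a record of the new names for the bundle configuration step.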

Example Bundle Configuration

By the time you are configuring bundle.yaml, you will have pushed all the OCI images provided in images.tar.gz to the air-gapped.registry.com registry. For each OCI image name placeholder in bundle.yaml, you will need to replace that placeholder with the fully qualified name of the actual image pushed to air-gapped.registry.com.

In this example, let's focus on a single image. In bundle.yaml, the placeholder for this image is 172.17.0.2:5000/kubeflownotebookswg/jupyter-web-app:v1.7.0. We use a text find/replace to swap that placeholder for the name of the image actually pushed to the air-gapped registry: air-gapped.registry.com/kubeflownotebookswg/jupyter-web-app:v1.7.0.

This find/replace must be repeated for every image. However, if you pushed all the images to the air-gapped registry with the same repository paths they have in bundle.yaml, it is enough to replace every occurrence of 172.17.0.2:5000 with air-gapped.registry.com.
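When the repository paths match, the registry swap can be scripted. This is a minimal sketch assuming bash and a POSIX sed; rewrite_registry is a hypothetical helper that rewrites the bundle file in place. It escapes the dots in the old prefix so they match literally, and requires a trailing / so that a longer address such as 172.17.0.2:50001 is not rewritten by mistake. Keep a copy of the original bundle.yaml before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace a registry prefix (e.g. 172.17.0.2:5000) with a new one
# (e.g. air-gapped.registry.com) everywhere in a bundle file, via a temporary file.
rewrite_registry() {
  local bundle="$1" old="$2" new="$3" old_re
  # Escape dots so that, e.g., 172.17.0.2 does not also match 172x17x0x2.
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"
  # Only rewrite occurrences followed by "/", i.e. a registry prefix, not a longer host or port.
  sed "s|${old_re}/|${new}/|g" "$bundle" > "$bundle.tmp" && mv "$bundle.tmp" "$bundle"
}
```

For example, rewrite_registry bundle.yaml 172.17.0.2:5000 air-gapped.registry.com performs the replacement described above in one step.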
