How to integrate Charmed Kubeflow with the Canonical Observability Stack (COS)

Charmed Kubeflow (CKF) and the Canonical Observability Stack (COS) can be integrated using Juju. Once integrated, COS receives the metrics, alerts and dashboards that CKF charms provide, so you can monitor Kubeflow from Prometheus and Grafana.

Requirements

  • A Kubeflow deployment and access to the Kubeflow dashboard. For instructions, refer to the CKF getting started guide.
  • A COS deployment. For instructions, refer to the COS getting started on MicroK8s guide. If Kubeflow was deployed first by following the guide above, skip the Configure MicroK8s part of the COS guide.
  • Minimum system requirements: 8 CPUs, 64GB RAM, 150GB disk.
  • Tools: microk8s, juju, yq, jq, curl. For compatible versions of microk8s and juju, refer to the CKF supported versions page.

Integrate with COS

As per COS best practices, this guide assumes that COS and CKF are each deployed in their own controller. This means that after deployment, there will be a kubeflow model and a cos model, with associated controllers kf-controller and cos-controller, respectively. The controller names are arbitrary; if yours differ, substitute them in the commands below.

Integrating CKF with COS involves adding relations to Prometheus to provide metrics and alerts, and to Grafana to provide dashboards. To keep cross-model relations to a minimum and ensure COS is accessible, Kubeflow components are related to COS through the Grafana Agent charm. Data flows from the CKF charms through the Grafana Agent and then to COS.

Deploy Grafana Agent

In the kubeflow model, deploy the Grafana Agent:

juju switch kf-controller:kubeflow
juju deploy grafana-agent-k8s --channel=stable

Get COS URLs

Get the URLs of the COS endpoints proxied by Traefik:

juju switch cos-controller:cos
juju run traefik/leader show-proxied-endpoints --format json | jq -r '.[] | .results | ."proxied-endpoints"' | jq .

Alternatively, read the URLs from the catalogue unit:

juju show-unit catalogue/0 | grep url

Note the URLs for later. For more information on COS URLs, see the Browse dashboards section of the COS guide.
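
Both commands print more than the URLs themselves. As a minimal sketch, the output can be narrowed to the bare URLs with a small filter. The name extract_urls is hypothetical, and the filter assumes the URLs appear in the output as plain http:// or https:// strings.

```shell
# extract_urls: read text on stdin and print each http(s) URL once, one per line.
# Hypothetical helper; assumes URLs appear as plain http:// or https:// strings.
extract_urls() {
  grep -Eo 'https?://[^"[:space:]]+' | sort -u
}

# Example (requires a live COS deployment):
#   juju show-unit catalogue/0 | extract_urls
```

The filter stops each match at whitespace or a double quote, so it works on both the YAML-style output of juju show-unit and JSON output.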

Check connectivity

Check connectivity from the Grafana Agent to COS. Try to access any of the URLs (e.g. “catalogue”) from within the Grafana Agent unit:

juju switch kf-controller:kubeflow
juju exec --unit grafana-agent-k8s/0 'curl -I <URL>'

Before continuing, make sure you get an OK response (HTTP 200). This confirms that the Grafana Agent can connect to COS.
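
If this check is scripted, the response headers can be tested rather than read by eye. The helper below is a sketch under assumptions: http_status_ok is a hypothetical name, and it treats any 2xx code on the first status line as an OK response.

```shell
# http_status_ok: read `curl -I` output on stdin and succeed only if the
# first status line reports a 2xx code. Hypothetical helper name.
http_status_ok() {
  awk '
    /^HTTP\// { code = $2 + 0; exit }     # first status line, e.g. "HTTP/1.1 200 OK"
    END { exit !(code >= 200 && code < 300) }
  '
}

# Example (replace <URL> with one of the URLs noted earlier; -s hides the
# curl progress meter):
#   juju exec --unit grafana-agent-k8s/0 'curl -sI <URL>' | http_status_ok \
#     && echo "COS is reachable"
```

Because only the first status line is inspected, a redirect (3xx) counts as a failure here; add -L to curl if the endpoint is expected to redirect.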

Make offers from COS

Make offers for Prometheus and Grafana from COS. If you deployed COS with the offers overlay, skip this step.

juju offer -c cos-controller cos.prometheus:receive-remote-write prometheus-receive-remote-write
juju offer -c cos-controller cos.grafana:grafana-dashboard grafana-dashboards

Consume Offers in Kubeflow

Switch to the kubeflow model and consume the offers from COS for Prometheus and Grafana:

juju switch kf-controller:kubeflow # skip this line if already in kubeflow model
juju consume -m kf-controller:kubeflow cos-controller:cos.prometheus-receive-remote-write
juju consume -m kf-controller:kubeflow cos-controller:cos.grafana-dashboards

Connect Grafana Agent to endpoints

Tell the Grafana Agent to provide metrics, alerts and dashboards to the two endpoints created by consuming those offers:

juju switch kf-controller:kubeflow # skip this line if already in kubeflow model
juju integrate grafana-agent-k8s prometheus-receive-remote-write
juju integrate grafana-agent-k8s:grafana-dashboards-provider grafana-dashboards

Verify that the relations for both offers are in place:

juju status -m cos-controller:cos

You should see 1/1 in the Connected column under Offers for both offers.
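
This check can also be scripted. The sketch below is one possible approach and not part of Juju: offers_connected is a hypothetical helper, and it assumes the tabular juju status output contains an Offer table with a Connected column and no spaces inside cells.

```shell
# offers_connected: read `juju status` text on stdin, print "<offer> <connected>"
# for each offer row, and fail if no offers are listed or any offer is not
# fully connected (for example 0/1). Hypothetical helper name.
offers_connected() {
  awk '
    $1 == "Offer" {                       # header row: locate the Connected column
      for (i = 1; i <= NF; i++) if ($i == "Connected") col = i
      next
    }
    col && NF == 0 { col = 0 }            # a blank line ends the Offer table
    col {
      print $1, $col
      seen = 1
      split($col, n, "/")
      if (n[1] + 0 == 0 || n[1] != n[2]) bad = 1
    }
    END { exit (bad || !seen) }
  '
}

# Example:
#   juju status -m cos-controller:cos | offers_connected
```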

Integrate with Prometheus

Relate the metrics-endpoint of each Kubeflow charm to the metrics-endpoint of the Grafana Agent, which forwards the metrics to Prometheus in COS:

juju switch kf-controller:kubeflow # skip this line if already in kubeflow model
juju integrate argo-controller:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate dex-auth:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate envoy:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate istio-pilot:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate jupyter-controller:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate katib-controller:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate kfp-api:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate knative-operator:metrics-endpoint grafana-agent-k8s:metrics-endpoint
# To enable metrics from knative-eventing and knative-serving charms, we need the following 2 relations.
juju integrate knative-eventing:otel-collector knative-operator:otel-collector
juju integrate knative-serving:otel-collector knative-operator:otel-collector
juju integrate metacontroller-operator:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate minio:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate seldon-controller-manager:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate training-operator:metrics-endpoint grafana-agent-k8s:metrics-endpoint

Verify the relations were added with juju status --relations.
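
Most of the integrate commands in this section follow one pattern per charm, so they can be generated instead of typed. The following is a minimal sketch: print_integrations is a hypothetical helper that only prints commands, leaving you to review them before running. The two knative otel-collector relations do not follow this pattern and still need to be added separately.

```shell
# print_integrations: print one `juju integrate` command per application.
# Usage: print_integrations <app-endpoint> <agent-endpoint> <app>...
# Hypothetical helper; it only prints commands, so they can be reviewed first.
print_integrations() {
  app_endpoint=$1
  agent_endpoint=$2
  shift 2
  for app in "$@"; do
    printf 'juju integrate %s:%s grafana-agent-k8s:%s\n' \
      "$app" "$app_endpoint" "$agent_endpoint"
  done
}

# Example: review the output, then pipe it to `sh` to run it.
#   print_integrations metrics-endpoint metrics-endpoint \
#     argo-controller dex-auth envoy | sh
```

The same helper covers the dashboard relations by passing grafana-dashboard and grafana-dashboards-consumer as the two endpoint names.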

Integrate with Grafana

Relate the grafana-dashboard endpoint of each Kubeflow charm to the grafana-dashboards-consumer endpoint of the Grafana Agent, which forwards the dashboards to Grafana in COS:

juju switch kf-controller:kubeflow # skip this line if already in kubeflow model
juju integrate argo-controller:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate envoy:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate jupyter-controller:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate katib-controller:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate kubeflow-dashboard:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate minio:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate seldon-controller-manager:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer

Verify the relations were added with juju status --relations.
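
With this many relations, a missing one is easy to overlook in the status output. As a hedged sketch, the helper below lists the expected relations that are absent. The name missing_relations is hypothetical, and the helper assumes each relation row in the juju status --relations output lists both endpoints as whitespace-separated app:endpoint fields.

```shell
# missing_relations: read `juju status --relations` text on stdin and print
# every "<app>:<endpoint>" that has no row pairing it with grafana-agent-k8s.
# Usage: juju status --relations | missing_relations <endpoint> <app>...
# Hypothetical helper name.
missing_relations() {
  endpoint=$1
  shift
  status=$(cat)                           # keep stdin so it can be scanned per app
  for app in "$@"; do
    printf '%s\n' "$status" | awk -v want="$app:$endpoint" '
      {
        a = 0; g = 0
        for (i = 1; i <= NF; i++) {
          if ($i == want) a = 1
          if ($i ~ /^grafana-agent-k8s:/) g = 1
        }
        if (a && g) found = 1
      }
      END { exit !found }
    ' || printf '%s:%s\n' "$app" "$endpoint"
  done
}

# Example:
#   juju status --relations | missing_relations grafana-dashboard \
#     argo-controller envoy minio
```

No output means every listed application has the relation in place.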

Access monitoring resources

Using the URLs fetched above, access Prometheus and Grafana in order to view the monitoring resources (metrics, alerts and dashboards) that CKF charms provide.

Prometheus metrics

Navigate to the Prometheus URL. From here, you can query for any metric.

  • In order to view the metrics from a specific charm, query {juju_application="<app-name>"}. For example, for argo-controller, querying {juju_application="argo-controller"} should return all its metrics, such as argo_workflows_count and argo_workflows_error_count.
  • In order to view all the metrics available to Prometheus, use the Metrics explorer by clicking the round icon next to Execute in that query form.

For more information on the metrics provided by each application, see the CKF charms Prometheus metrics page.
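
The per-charm query can also be run from a terminal through the Prometheus HTTP API. The sketch below builds the selector; charm_selector is a hypothetical helper name, <PROMETHEUS_URL> stands for the Prometheus URL noted earlier, and the example assumes the API is reachable under that URL at /api/v1/query.

```shell
# charm_selector: print the PromQL selector for one Juju application.
# Hypothetical helper name.
charm_selector() {
  printf '{juju_application="%s"}\n' "$1"
}

# Example: count the series returned for argo-controller.
#   curl -sG '<PROMETHEUS_URL>/api/v1/query' \
#     --data-urlencode "query=$(charm_selector argo-controller)" \
#     | jq '.data.result | length'
```

Using --data-urlencode avoids having to escape the braces and quotes of the selector by hand.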

Prometheus alerts

Navigate to the Prometheus URL and click on Alerts from the top bar. This shows all alerts available.

  • Filter alerts by state (Inactive, Pending or Firing) using the checkboxes at the top.
  • In order to view alerts from a specific charm, type its name in the search bar at the top.

For more information on default alerts, see the CKF charms Prometheus alerts page.

Grafana dashboards

Navigate to the Grafana dashboard URL. Get the admin password:

juju switch cos-controller:cos # skip this line if already in cos model
juju run grafana/leader get-admin-password

Using admin as the username, log in with the password returned. Browse available dashboards by opening the sidebar menu and navigating to Dashboards.

For more information on default dashboards, see the CKF charms Grafana dashboards page.

How should I solve this when Traefik goes into a pending state?

I already have an Istio load balancer.