Charmed Kubeflow (CKF) and the Canonical Observability Stack (COS) can be integrated using Juju. This integration lets you monitor Kubeflow through the metrics, alerts and dashboards that CKF charms provide.
Requirements
- A Kubeflow deployment and access to the Kubeflow dashboard. For instructions, refer to the CKF getting started guide.
- A COS deployment. For instructions, refer to the COS getting started on MicroK8s guide. If Kubeflow was deployed first by following the guide above, skip that guide's Configure MicroK8s part.
- Minimum system requirements: 8 CPUs, 64 GB RAM, 150 GB disk.
- Tools: microk8s, juju, yq, jq, curl. For compatible versions of microk8s and juju, refer to the CKF supported versions page.
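Before starting, it can help to confirm that the tools listed above are installed. The following is a minimal sketch, assuming a POSIX shell; `missing_tools` is an illustrative helper name, not part of Juju, MicroK8s or any CKF tooling, and it does not check versions.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of Juju or MicroK8s.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the tool cannot be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage: list any of the tools from this guide that are missing
# missing_tools microk8s juju yq jq curl
```

An empty result means every tool was found.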
Integrate with COS
As per COS best practices, this guide assumes that COS and CKF are each deployed in their own controller. This means that after deployment, there will be a kubeflow model and a cos model, with associated controllers kf-controller and cos-controller, respectively. The controller names are arbitrary; substitute your own in the commands below.
Integrating CKF with COS involves adding relations to Prometheus to provide metrics and alerts and to Grafana to provide dashboards. To avoid cross model relations and ensure COS is accessible, Kubeflow components will be related to COS through the Grafana Agent charm. Data will flow from CKF charms through the Grafana agent and then to COS.
Deploy Grafana Agent
In the kubeflow model, deploy the Grafana agent:
juju switch kf-controller:kubeflow
juju deploy grafana-agent-k8s --channel=stable
Get COS URLs
Get Traefik URLs:
juju switch cos-controller:cos
juju run traefik/leader show-proxied-endpoints --format json | jq -r '.[] | .results | ."proxied-endpoints"' | jq .
Alternatively, use this:
juju show-unit catalogue/0 | grep url
Note the URLs for later. For more information on COS URLs, see the Browse dashboards section of the COS guide.
Check connectivity
Check connectivity from Grafana agent to COS. Try to access any of the URLs (e.g. “catalogue”) from within the Grafana agent:
juju switch kf-controller:kubeflow
juju exec --unit grafana-agent-k8s/0 'curl -I <URL>'
Before continuing with any more steps, make sure you get an OK response. This confirms that the Grafana agent can connect to COS.
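The check above can also be scripted. This is a minimal sketch that inspects the status line returned by `curl -I`; the helper name `is_http_ok` is hypothetical, and treating any 2xx status code as an "OK response" is an assumption rather than something this guide specifies.

```shell
#!/bin/sh
# Succeed if the first line of `curl -I` output carries a 2xx status code.
# is_http_ok is a hypothetical helper; accepting any 2xx code is an assumption.
is_http_ok() {
    # The status line looks like "HTTP/1.1 200 OK" or "HTTP/2 200";
    # header lines end in CRLF, so strip carriage returns first.
    status_line=$(printf '%s\n' "$1" | tr -d '\r' | head -n 1)
    code=$(printf '%s\n' "$status_line" | awk '{print $2}')
    case "$code" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (replace <URL> with one of the URLs noted earlier):
# headers=$(juju exec --unit grafana-agent-k8s/0 'curl -sI <URL>')
# is_http_ok "$headers" && echo "Grafana agent can reach COS"
```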
Make offers from COS
Make offers for Prometheus and Grafana from COS. If you deployed COS with the offers overlay, you can skip this step.
juju offer -c cos-controller cos.prometheus:receive-remote-write prometheus-receive-remote-write
juju offer -c cos-controller cos.grafana:grafana-dashboard grafana-dashboards
Consume Offers in Kubeflow
Switch to the kubeflow model and consume the offers from COS for Prometheus and Grafana:
juju switch kf-controller:kubeflow # skip this line if already in kubeflow model
juju consume -m kf-controller:kubeflow cos-controller:cos.prometheus-receive-remote-write
juju consume -m kf-controller:kubeflow cos-controller:cos.grafana-dashboards
Connect Grafana Agent to endpoints
Tell the Grafana Agent to provide metrics, alerts and dashboards to the two endpoints created by consuming those offers:
juju switch kf-controller:kubeflow # skip this line if already in kubeflow model
juju integrate grafana-agent-k8s prometheus-receive-remote-write
juju integrate grafana-agent-k8s:grafana-dashboards-provider grafana-dashboards
Verify that the relations for both offers are in place:
juju status -m cos-controller:cos
You should see 1/1 in the Connected column under Offers.
Integrate with Prometheus
Relate the Kubeflow charms to the Grafana Agent metrics-endpoint, which will provide their metrics to Prometheus in COS.
juju switch kf-controller:kubeflow # skip this line if already in kubeflow model
juju integrate argo-controller:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate dex-auth:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate envoy:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate istio-pilot:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate jupyter-controller:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate katib-controller:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate kfp-api:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate knative-operator:metrics-endpoint grafana-agent-k8s:metrics-endpoint
# To enable metrics from knative-eventing and knative-serving charms, we need the following 2 relations.
juju integrate knative-eventing:otel-collector knative-operator:otel-collector
juju integrate knative-serving:otel-collector knative-operator:otel-collector
juju integrate metacontroller-operator:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate minio:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate seldon-controller-manager:metrics-endpoint grafana-agent-k8s:metrics-endpoint
juju integrate training-operator:metrics-endpoint grafana-agent-k8s:metrics-endpoint
Verify the relations were added with juju status --relations.
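Most of the relations above follow one pattern, so they can be generated in a loop. The sketch below only prints the commands so they can be reviewed first; `print_integrations` is a hypothetical helper name. The two knative otel-collector relations do not follow this pattern and still need to be added separately.

```shell
#!/bin/sh
# Print one `juju integrate` command per application for a given endpoint pair.
# print_integrations is a hypothetical helper; it only prints commands so they
# can be reviewed before being run.
print_integrations() {
    app_endpoint=$1      # endpoint on the Kubeflow charm, e.g. metrics-endpoint
    agent_endpoint=$2    # endpoint on grafana-agent-k8s
    shift 2
    for app in "$@"; do
        printf 'juju integrate %s:%s grafana-agent-k8s:%s\n' \
            "$app" "$app_endpoint" "$agent_endpoint"
    done
}

# Usage: review the printed commands, then pipe them to sh to apply them
# print_integrations metrics-endpoint metrics-endpoint \
#     argo-controller dex-auth envoy istio-pilot | sh
```

Printing instead of executing keeps the helper safe to run against any model; nothing changes until the output is piped to a shell.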
Integrate with Grafana
Relate the Kubeflow charms to the Grafana Agent grafana-dashboards-consumer endpoint, which will provide their dashboards to Grafana in COS:
juju switch kf-controller:kubeflow # skip this line if already in kubeflow model
juju integrate argo-controller:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate envoy:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate jupyter-controller:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate katib-controller:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate kubeflow-dashboard:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate minio:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
juju integrate seldon-controller-manager:grafana-dashboard grafana-agent-k8s:grafana-dashboards-consumer
Verify the relations were added with juju status --relations.
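With this many relations, scanning the status output by eye is error-prone. The sketch below reads `juju status --relations` output on stdin and reports applications that have no relation for a given endpoint. `missing_relations` is a hypothetical helper, and it assumes only that each relation appears in the output as `<app>:<endpoint>`; it uses a plain substring match, so it is a rough check rather than a full parser.

```shell
#!/bin/sh
# Read `juju status --relations` output on stdin and print every application
# from the argument list that has no relation line for the given endpoint.
# missing_relations is a hypothetical helper; it relies only on each relation
# appearing in the output as "<app>:<endpoint>".
missing_relations() {
    endpoint=$1
    shift
    status=$(cat)
    for app in "$@"; do
        printf '%s\n' "$status" | grep -qF -- "$app:$endpoint" \
            || printf '%s\n' "$app"
    done
}

# Usage:
# juju status --relations | missing_relations grafana-dashboard \
#     argo-controller envoy jupyter-controller katib-controller \
#     kubeflow-dashboard minio seldon-controller-manager
```

An empty result means a relation line was found for every listed application.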
Access monitoring resources
Using the URLs fetched above, access Prometheus and Grafana to view the monitoring resources (metrics, alerts and dashboards) that CKF charms provide.
Prometheus metrics
Navigate to the Prometheus URL. From here, you can query for any metric.
- To view the metrics from a specific charm, query {juju_application="<app-name>"}. For example, for argo-controller, querying {juju_application="argo-controller"} should return all its metrics, such as argo_workflows_count, argo_workflows_error_count, etc.
- To view all the metrics available to Prometheus, use the Metrics explorer by clicking the round icon next to Execute in the query form.
For more information on metrics available by each application, see the CKF charms Prometheus metrics page.
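The per-charm query described above can also be sent from the command line through the Prometheus HTTP API. This is a minimal sketch: `promql_for_app` is a hypothetical helper name and `PROM_URL` is a placeholder for the Prometheus URL noted earlier; whether that URL is reachable from your shell depends on your network setup.

```shell
#!/bin/sh
# Build the PromQL selector for all metrics of one Juju application.
# promql_for_app is a hypothetical helper name.
promql_for_app() {
    printf '{juju_application="%s"}\n' "$1"
}

# Usage with the Prometheus HTTP API (PROM_URL is a placeholder for the
# Prometheus URL noted earlier):
# curl -sG "$PROM_URL/api/v1/query" \
#     --data-urlencode "query=$(promql_for_app argo-controller)"
```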
Prometheus alerts
Navigate to the Prometheus URL and click on Alerts from the top bar. This shows all alerts available.
- Filter by Inactive, Pending and Firing alerts using the checkboxes at the top.
- To view alerts from a specific charm, type its name in the search bar at the top.
For more information on default alerts, see the CKF charms Prometheus alerts page.
Grafana dashboards
Navigate to the Grafana dashboard URL. Get the admin password:
juju switch cos-controller:cos # skip this line if already in cos model
juju run grafana/leader get-admin-password
Using admin as the username, log in with the password returned. Browse available dashboards by opening the sidebar menu and navigating to Dashboards.
For more information on default dashboards, see the CKF charms Grafana dashboards page.