Charmed Apache Spark K8s Documentation - How to use Integration Hub

deusebio · 3 June 2024 21:09

Configure service accounts using the Integration Hub charm

The Integration Hub charm allows seamless configuration of Charmed Apache Spark service accounts via Juju relations, thereby providing a charming, integrated user experience.

The Integration Hub charm is part of the Charmed Apache Spark bundle, which can be deployed by following the How-to deploy guide. Alternatively, you can deploy the Integration Hub for Apache Spark charm standalone by running the following command:

juju deploy spark-integration-hub-k8s --channel edge -n1

Once deployed, the Integration Hub for Apache Spark automatically manages the properties of all service accounts created either with the spark-client snap or with the spark8t Python library. Refer to the how-to guides for more information on using the snap and the Python library.

Enable object storage integration

Integration Hub for Apache Spark can consume:

the s3-credentials relation, provided by the S3-integrator, to enable integration with an S3-compatible object storage system;
the azure-credentials relation, provided by the Azure Storage Integrator, to enable integration with Azure storage services such as Azure Blob Storage (WASB) and Azure Data Lake Storage Gen2 (ABFS).

S3-compatible object storage

To enable integration with an S3-compatible storage, deploy the S3-integrator charm:

juju deploy s3-integrator --channel stable

Then set the appropriate parameters via config options, for example:

juju config s3-integrator \
  bucket=<S3_BUCKET> \
  endpoint=<S3_ENDPOINT> \
  path=spark-events

Credentials are provided to the s3-integrator using an action:

juju run s3-integrator/leader sync-s3-credentials \
  access-key=$S3_ACCESS_KEY \
  secret-key=$S3_SECRET_KEY

Please refer to the How-To Setup Environment for guidance on how to set up and retrieve the different parameters for a MinIO deployed on MicroK8s and AWS S3. For more information on how to deploy and configure the s3-integrator charm refer to the charm documentation.

Once the s3-integrator is set up and in an active/idle state, integrate it with the Integration Hub for Apache Spark charm:

juju integrate s3-integrator spark-integration-hub-k8s

This will automatically add relevant configuration properties to your Spark jobs, depending on the storage backend. This can be verified using the tools provided in the spark-client snap, e.g.

spark-client.service-account-registry get-config --username <service_account> --namespace <namespace>

You should see the following configuration automatically added to your service account:

spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.access.key=<S3_ACCESS_KEY>
spark.hadoop.fs.s3a.endpoint=<S3_ENDPOINT>
spark.hadoop.fs.s3a.secret.key=<S3_SECRET_KEY>
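
The verification above can also be scripted. The following is a minimal sketch, assuming that get-config prints one key=value pair per line as in the listing above; check_props is a hypothetical helper name and is not part of the spark-client snap.

```shell
#!/usr/bin/env bash
# check_props (hypothetical helper): read key=value lines on stdin and
# verify that every key passed as an argument is present.
check_props() {
  local input key missing=0
  input="$(cat)"
  for key in "$@"; do
    # Compare the text before the first "=" literally, so that dots in
    # the property name are not treated as regex wildcards.
    if ! awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' <<<"$input"; then
      echo "missing property: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (assumes the spark-client snap is installed):
# spark-client.service-account-registry get-config \
#   --username <service_account> --namespace <namespace> |
#   check_props spark.hadoop.fs.s3a.endpoint \
#               spark.hadoop.fs.s3a.access.key \
#               spark.hadoop.fs.s3a.secret.key
```

The function returns a non-zero exit status and lists the missing keys on standard error if any expected property is absent, so it can be used as a gate in a deployment script.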

Azure storage

To enable integration with Azure storage, deploy the Azure Storage Integrator charm:

juju deploy azure-storage-integrator --channel edge

The storage account and container are provided to the charm as regular configuration options, while the storage key is provided as a Juju secret to keep its value confidential.

First, create a Juju secret holding the storage key:

juju add-secret azure-credentials secret-key=<AZURE_STORAGE_KEY>

This command prints the secret URI, in the form secret:<secret_id>, which is used to configure the Azure Storage Integrator charm.

Before configuring the charm, grant the Azure Storage Integrator application access to the secret:

juju grant-secret <secret_id> azure-storage-integrator

Then use the secret ID to configure the charm:

juju config azure-storage-integrator \
  credentials=secret:<secret_id> \
  storage-account=<AZURE_STORAGE_ACCOUNT> \
  container=<AZURE_CONTAINER> \
  path="spark-events"
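
The three steps above (create the secret, grant it, configure the charm) can be chained in a small script. This is only a sketch: configure_azure_storage is a hypothetical helper name, and it assumes that juju add-secret prints the new secret as secret:<secret_id> on standard output, as described above.

```shell
#!/usr/bin/env bash
# configure_azure_storage (hypothetical helper): create the secret, grant
# it to the integrator application, and configure the application.
# Arguments: <storage_key> <storage_account> <container> [application]
configure_azure_storage() {
  local storage_key="$1" account="$2" container="$3"
  local app="${4:-azure-storage-integrator}"
  local secret_uri secret_id

  # Assumption: add-secret prints the new secret as "secret:<secret_id>".
  secret_uri="$(juju add-secret azure-credentials "secret-key=${storage_key}")" || return 1
  secret_id="${secret_uri#secret:}"

  # The application must be granted access before it can read the secret.
  juju grant-secret "${secret_id}" "${app}" || return 1

  juju config "${app}" \
    "credentials=secret:${secret_id}" \
    "storage-account=${account}" \
    "container=${container}" \
    path=spark-events
}

# Example usage:
# configure_azure_storage "<AZURE_STORAGE_KEY>" "<AZURE_STORAGE_ACCOUNT>" "<AZURE_CONTAINER>"
```

Stopping at the first failing step avoids configuring the charm with a secret it has not been granted access to.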

Please refer to the How-To Setup Environment for guidance on how to set up and retrieve the different parameters for Azure Storage backends. For more information on how to deploy and configure the Azure Storage Integrator charm refer to the charm documentation.

Once the Azure Storage Integrator charm is set up and in an active/idle state, integrate it with the Integration Hub for Apache Spark charm:

juju integrate azure-storage-integrator spark-integration-hub-k8s

This will automatically add relevant configuration properties to your Spark jobs, depending on the storage backend. This can be verified using the tools provided in the spark-client snap, e.g.

spark-client.service-account-registry get-config --username <service_account> --namespace <namespace>

You should see the following configuration automatically added to your service account:

spark.hadoop.fs.azure.account.key.<AZURE_STORAGE_ACCOUNT>.dfs.core.windows.net=<AZURE_STORAGE_KEY>
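
Because the property name embeds the storage account, a scripted check has to build the key first. The sketch below illustrates this; azure_key_property and has_azure_key are hypothetical helper names, and the check assumes get-config prints key=value lines as in the listing above.

```shell
#!/usr/bin/env bash
# azure_key_property (hypothetical helper): print the name of the property
# that holds the key for the given storage account.
azure_key_property() {
  printf 'spark.hadoop.fs.azure.account.key.%s.dfs.core.windows.net' "$1"
}

# has_azure_key (hypothetical helper): succeed if the key=value lines on
# stdin contain the account-key property for the given storage account.
has_azure_key() {
  # -F matches the property name as a fixed string, not as a regex.
  grep -q -F -- "$(azure_key_property "$1")="
}

# Example usage (assumes the spark-client snap is installed):
# spark-client.service-account-registry get-config \
#   --username <service_account> --namespace <namespace> |
#   has_azure_key "<AZURE_STORAGE_ACCOUNT>" && echo "Azure key configured"
```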

Enable Monitoring with Prometheus pushgateway

The Integration Hub can consume the pushgateway relation provided by the Prometheus Pushgateway charm to enable monitoring of Spark jobs through a Prometheus Pushgateway.

For more information on how to deploy and configure the prometheus-pushgateway charm, refer to the charm documentation.

Once the prometheus-pushgateway charm is set up, integrate it with the Integration Hub for Apache Spark charm:

juju integrate prometheus-pushgateway spark-integration-hub-k8s

This will add the relevant configuration properties to your Charmed Apache Spark service accounts, which can be verified using the snap, e.g.:

spark-client.service-account-registry get-config --username <service_account> --namespace <namespace>

You should see the following additional configuration automatically added to your service account:

spark.metrics.conf.driver.sink.prometheus.pushgateway-address=<PROMETHEUS_GATEWAY_ADDRESS>:<PROMETHEUS_PORT>
spark.metrics.conf.driver.sink.prometheus.class=org.apache.spark.banzaicloud.metrics.sink.PrometheusSink
spark.metrics.conf.driver.sink.prometheus.enable-dropwizard-collector=true
spark.metrics.conf.driver.sink.prometheus.period=5
spark.metrics.conf.driver.sink.prometheus.metrics-name-capture-regex=([a-z0-9]*_[a-z0-9]*_[a-z0-9]*_)(.+)
spark.metrics.conf.driver.sink.prometheus.metrics-name-replacement=\$2
spark.metrics.conf.executor.sink.prometheus.pushgateway-address=<PROMETHEUS_GATEWAY_ADDRESS>:<PROMETHEUS_PORT>
spark.metrics.conf.executor.sink.prometheus.class=org.apache.spark.banzaicloud.metrics.sink.PrometheusSink
spark.metrics.conf.executor.sink.prometheus.enable-dropwizard-collector=true
spark.metrics.conf.executor.sink.prometheus.period=5
spark.metrics.conf.executor.sink.prometheus.metrics-name-capture-regex=([a-z0-9]*_[a-z0-9]*_[a-z0-9]*_)(.+)
spark.metrics.conf.executor.sink.prometheus.metrics-name-replacement=\$2

Overriding values and adding other configurations

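Besides the configuration enabled by relations, custom configuration properties can also be added directly using actions. The commands below assume the application is named integration-hub; if you deployed the charm standalone as shown at the start of this guide, use spark-integration-hub-k8s instead.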

To add a new custom configuration property:

juju run integration-hub/leader add-config conf="<property>=<value>"

All custom configuration properties can be removed at once using:

juju run integration-hub/leader clear-config

or they can be removed one by one using:

juju run integration-hub/leader remove-config key="<property>"

Finally, to list all custom configuration properties, use:

juju run integration-hub/leader list-config

Since configuration provided using actions takes precedence, properties already set through a relation can be overridden, allowing you to customise what is configured automatically by default.
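
As an illustration of overriding, the sketch below adds several properties in one go by calling the add-config action once per property. add_configs is a hypothetical helper name, and the example values are illustrative only.

```shell
#!/usr/bin/env bash
# add_configs (hypothetical helper): add each "<property>=<value>" argument
# as a custom configuration property through the add-config action.
# The first argument is the application name of the Integration Hub.
add_configs() {
  local app="$1" prop
  shift
  for prop in "$@"; do
    # Stop at the first failing action so that errors are not hidden.
    juju run "${app}/leader" add-config "conf=${prop}" || return 1
  done
}

# Example usage: override the period of 5 set through the pushgateway
# relation with a period of 10 for both the driver and the executors.
# add_configs integration-hub \
#   spark.metrics.conf.driver.sink.prometheus.period=10 \
#   spark.metrics.conf.executor.sink.prometheus.period=10
```

After running it, the list-config action should show the added properties, and get-config on a service account should show the overridden values.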