Charmed Apache Spark K8s Documentation - How to deploy Charmed Apache Spark

Deploy Charmed Apache Spark on K8s

Charmed Apache Spark comes with a bundled set of components that allow you to easily manage Apache Spark workloads on K8s, providing integration with object storage, monitoring and log aggregation. For an overview on the different components that form Charmed Apache Spark, please refer to this section.

Prerequisites

Since Charmed Apache Spark will be managed by Juju, make sure that:

  • you have a Juju client (e.g. via a SNAP) installed in your local machine
  • you are able to connect to a juju controller
  • you have read-write permissions to either an S3-compatible or an Azure object storage

To set up a Juju controller on K8s and the Juju client, you can refer to existing tutorials and documentation for MicroK8s and for AWS EKS. Also refer to the How-To Setup Environment guide to install and setup an S3-compatible object storage on MicroK8s (MinIO), EKS (AWS S3), or Azure object storages. For other backends or K8s distributions other than MinIO on MicroK8s and S3 on EKS (e.g. Ceph, Charmed Kubernetes, GKE, etc.), please refer to their documentation.

Charmed Apache Spark supports native integration with the Canonical Observability Stack (COS). To enable monitoring on top of Charmed Apache Spark, make sure that you have a Juju model with COS correctly deployed. To deploy COS on MicroK8s follow the step-by-step tutorial or refer to its documentation for more informations.

Preparation

Juju Model

Make sure that you have a Juju model where you can deploy the Spark History server. In general, we advise to segregate juju applications belonging to different solutions, and therefore to have a dedicated model for Spark components, e.g.

juju add-model <juju_model>

Note that this will create a K8s namespace to which the different Charmed Apache Spark components will be deployed.

Deploy Charmed Apache Spark

Charmed Apache Spark can be deployed via:

  • Native Juju YAML bundle and overlays
  • Terraform modules

Using Juju bundles

Juju bundles are provided in the form of Jinja2 templates, for the following distribution:

You can easily customize these templates from the CLI using jinja2-cli:

pip install jinja2-cli

Once the package is installed, you can render the template using

jinja2 -D <key>=<value> bundle.yaml.j2 > bundle.yaml

There exist different YAML bundles, with different configuration options, for S3 and Azure object storage backends. Please, refer to the next sections for more information about how to configure the deployments for the different object storage backends.

Once the bundle is rendered, it can be simply deployed using

juju deploy -m <juju_model> ./bundle.yaml
S3 backends

The following table summarizes the properties to be specified for the main bundle

key Description
service_account Service Account to be used by Apache Kyuubi Engines (deprecated)
namespace Namespace where the charms will be deployed. This should correspond to the name of the Juju model to be used.
s3_endpoint Endpoint of the S3-compatible object storage backend, in the form of http(s)//host:port.
bucket Name of the S3 bucket to be used for storing logs and data

Once the bundle is deployed, you will see that most of the charms will be in a blocked status because of missing or invalid S3 credentials. In particular, the s3 charm should notify that it needs to be provided with access and a secret key. This can be done using the sync-s3-credentials action:

juju run s3/leader sync-s3-credentials \
  access-key=<access-key> secret-key=<secret-key>

After this, the charms should start to receive the credentials and move into active/idle state.

Azure storage backends

The following table summarizes the properties to be specified for the main Azure bundle.

Key Description
service_account Service Account to be used by Apache Kyuubi Engines
namespace Namespace where the charms will be deployed. This should correspond to the name of the Juju model to be used.
storage_account Name of the Azure storage account to be used.
container Name of the Azure storage container to be used for storing logs and data

Create a Juju secret holding the values for the Azure secret key:

juju add-secret azure-credentials secret-key=<AZURE_STORAGE_KEY>

This should prompt the secret:<secret_id> that can be used to configure the bundle. To do so, first grant access to the secret for the Azure Storage Integrator charm

juju grant-secret <secret_id> azure-storage

Then, you can configure the charm to use the secret

juju config azure-storage credentials=secret:<secret_id> 

After this, the different charms should start to receive the credentials and move into active/idle state.

The Azure Storage Integrator charm assumes hierarchical namespaces to have been enabled by default. When you create a new storage account, please make sure you check the “Hierarchical Namespaces” checkbox. If you want to use a legacy storage account that doesn’t have hierarchical namespaces enabled, please configure Azure Storage integrator charm to use WASB / WASBS protocol instead with: juju config azure-storage connection-protocol=wasbs

The directory spark-events needs to be created beforehand in the Azure container for the Spark History server to work. Please refer to the How-To Setup Environment guide for more detailed instructions.

Enabling COS

COS can be enabled using an overlay. Similarly to the main bundle, the jinja2 template for the overlay can be rendered with the following properties:

key Description Default
cos_controller Name of the controller hosting the COS model. micro
cos_model Name of the COS model cos

Once the template is rendered, the COS-enabled Charmed Apache Spark bundle can be deployed using:

juju deploy -m <juju_model> ./bundle.yaml --overlay cos-integration.yaml

Using Terraform

Make sure you have a working Terraform 1.8+ installed in your machine. You can install Terraform or OpenTofu via a snap.

Terraform modules make use of the Terraform Juju provider. More information about the Juju provider can be found here.

The Charmed Apache Spark Terraform module is composed of the following submodules:

  • base module that bundles all the base resources of the Charmed Apache Spark solution
  • cos-integration module that bundles all the resources that enable integration with COS

Currently only S3 storage backends are supported for Terraform-based bundles.

The Charmed Apache Spark Terraform modules can be configured using a .tfvars.json file with the following schema:

{
  "s3": {
    "bucket": <bucket_name>,
    "endpoint": <s3_endpoint>
  },
  "kyuubi_user": <kyuubi_service_account>,
  "model": <juju_model>,
  "cos_model": <cos_model> 
}

The following table provides the description of the different configuration option

key Description
kyuubi_user Service Account to be used by Apache Kyuubi Engines (deprecated)
model Namespace where the charms will be deployed. This should correspond to the name of the Juju model to be used.
s3.endpoint Endpoint of the S3-compatible object storage backend, in the form of http(s)//host:port.
s3.bucket Name of the S3 bucket to be used for storing logs and data
cos_model (Optional) Name of the model where COS is deployed. If omitted, the resource of the cos-integration submodules will not be deployed

The Juju Terraform provider does not yet support cross-controller relations with COS. Therefore, COS model must be hosted in the same controller as the Charmed Apache Spark model.

To deploy Charmed Apache Spark using Terraform, use standard TF syntax:

  • terraform init in order to initialize the modules
  • terraform apply -var-file=<.tfvars.json_filename>
  • terraform destroy -var-file=<.tfvars.json_filename>

For more information about Terraform, please refer to the official docs.

1 Like