Charmed Spark K8s Documentation - How to deploy Charmed Spark

deusebio · 27 June 2023 12:30

Deploy Charmed Spark on K8s

Charmed Spark comes with a bundled set of components that allow you to easily manage Spark workloads on K8s, providing integration with object storage, monitoring and log aggregation. For an overview on the different components that form Charmed Spark, please refer to this section.

Prerequisites

Since Charmed Spark will be managed by Juju, make sure that:

you have a Juju client (e.g. via a SNAP) installed in your local machine
you are able to connect to a juju controller
you have read-write permissions (therefore you have an access key, access secret and the s3 endpoint) to a S3-compatible object storage

To set up a Juju controller on K8s and the juju client, you can refer to existing tutorials and documentation, e.g. here for MicroK8s and here for AWS EKS. Also refer to the How-To Setup Environment userguide to install S3-compatible object storage on MicroK8s (MinIO) or EKS (AWS S3). For other backends or K8s distributions other than MinIO on MicroK8s and S3 on EKS (e.g. Ceph, Charmed Kubernetes, GKE, etc), please refer to the documentation or your admin.

Charmed Spark supports native integration with the Canonical Observability Stack (COS). To enable monitoring on top of Charmed Spark, make sure that you have a Juju model with COS correctly deployed. To deploy COS on MicroK8s follow the step-by-step tutorial or refer to its documentation for more informations.

Preparation

Juju Model

Make sure that you have a Juju model where you can deploy the Spark History server. In general, we advise to segregate juju applications belonging to different solutions, and therefore to have a dedicated model for Spark components, e.g.

juju add-model <juju_model>

Note that this will create a K8s namespace in which the different Charmed Spark components will be deployed to.

Setup S3 Bucket

Create a bucket and a path object spark-events for storing Spark logs in s3. This can be done in multiple ways depending on the S3 backend interface.

Using AWS-CLI Snap

Install the aws-cli snap

sudo snap install aws-cli --classic

Configure aws-cli client

aws configure set aws_access_key_id <ACCESS_KEY>
aws configure set aws_secret_access_key <SECRET_KEY>
aws configure set endpoint_url <S3_ENDPOINT>

Test that the aws-cli client is properly working with aws s3 ls

Create the S3 bucket

aws s3 mb “s3://<BUCKET_NAME>”

Using Python

Install boto

pip install boto3

Create the S3 bucket

from botocore.client import Config
import boto3

config = Config(connect_timeout=60, retries={"max_attempts": 0})
session = boto3.session.Session(
    aws_access_key_id=<ACCESS_KEY>, aws_secret_access_key=<SECRET_KEY>
)
s3 = session.client("s3", endpoint_url=<S3_ENDPOINT>, config=config)

s3.create_bucket(Bucket=<BUCKET_NAME>)
s3.put_object(Bucket=<BUCKET_NAME>, Key=("spark-events/"))

Setup Spark service account to run Kyuubi engines

Charmed Spark includes support for Apache Kyuubi project which enables users to remotely connect to a Spark cluster using ODBC/JDBC and query their data with standard SQL. Kyuubi requires a dedicated service account to be used for running the driver and executor engine pods.

The service account can be created using the spark-client snap:

 spark-client.service-account-registry create \
    --username <kyuubi_server_account> --namespace <juju_model>

For more information on how to further customise Spark service accounts and manage them, please refer to here.

Deploy Charmed Spark

Charmed Spark can be deployed via

Native Juju YAML bundle and overlays
Terraform modules

Using Juju bundles

Juju bundles are provided in the form of Jinja2 templates, for the following distribution:

Charmed Spark 3.4.x
- main bundle.yaml
- overlays

You can easily customize these templates from the CLI using jinja2-cli:

pip install jinja2-cli

Once the package is installed, you can render the template using

jinja2 -D = bundle.yaml.j2 > bundle.yaml

The following table summarizes the properties to be specified for the main bundle

key	Description
service_account	Service Account to be used by Kyuubi Engines (deprecated)
namespace	Namespace where the charms will be deployed. This should correspond to the name of the Juju model to be used.
s3_endpoint	Endpoint of the S3-compatible object storage backend, in the form of `http(s)//host:port`.
bucket	Name of the S3 bucket to be used for storing logs and data

Once the bundle is rendered, it can be simply deployed using

juju deploy -m <juju_model> ./bundle.yaml

After some time, you will see that most of the charms will be in a blocked status because of missing or invalid S3 credentials. In particular, the s3 charm should be notifying that it needs to be provided with the access and secret key. This can be done using the action

juju run s3/leader sync-s3-credentials \
  access-key=<access-key> secret-key=<secret-key>

After this, the different charms should start to receive the credentials and move into active/idle state.

Enabling COS

COS can be enabled using an overlay. Similarly to the main bundle, the jinja2 template for the overlay can be rendered with the following properties:

key	Description	Default
cos_controller	Name of the controller hosting the COS model.	micro
cos_model	Name of the COS model	cos

Once the template is rendered, the COS-enabled Charmed Spark bundle can be deployed using:

juju deploy -m <juju_model> ./bundle.yaml --overlay cos-integration.yaml

Using Terraform

Make sure you have a working Terraform 1.8+ installed in your machine. You can install Terraform or OpenTofu via a snap.

Terraform modules make use of the Terraform Juju provider. More information about the Juju provider can be found here.

The Charmed Spark Terraform module is composed of the following submodules:

base module that bundles all the base resources of the Charmed Spark solution
cos-integration module that bundles all the resources that enable integration with COS

The Charmed Spark Terraform modules can be configured using a .tfvars.json file with the following schema:

{
  "s3": {
    "bucket": <bucket_name>,
    "endpoint": <s3_endpoint>
  },
  "kyuubi_user": <kyuubi_service_account>,
  "model": <juju_model>,
  "cos_model": <cos_model> 
}

The following table provides the description of the different configuration option

key	Description
kyuubi_user	Service Account to be used by Kyuubi Engines (deprecated)
model	Namespace where the charms will be deployed. This should correspond to the name of the Juju model to be used.
s3.endpoint	Endpoint of the S3-compatible object storage backend, in the form of `http(s)//host:port`.
s3.bucket	Name of the S3 bucket to be used for storing logs and data
cos_model	(Optional) Name of the model where COS is deployed. If omitted, the resource of the cos-integration submodules will not be deployed

NOTE: The Juju Terraform provider does not yet support cross-controller relations with COS. Therefore, COS model must be hosted in the same controller as the Charmed Spark model.

In order to deploy Charmed Spark using terraform, use standard TF syntax

terraform init in order to initialize the modules
terraform apply -var-file=<.tfvars.json_filename>
terraform destroy -var-file=<.tfvars.json_filename>

For more information about Terraform, please refer to the official docs.