Charmed Apache Spark K8s Documentation - How to deploy Charmed Apache Spark

deusebio · 27 June 2023 12:30

Deploy Charmed Apache Spark on K8s

Charmed Apache Spark comes with a bundled set of components that allow you to easily manage Apache Spark workloads on K8s, providing integration with object storage, monitoring and log aggregation. For an overview of the different components that form Charmed Apache Spark, please refer to the components overview page.

Prerequisites

Since Charmed Apache Spark will be managed by Juju, make sure that:

- you have a Juju client (e.g. via a snap) installed on your local machine
- you are able to connect to a Juju controller
- you have read-write access to either S3-compatible or Azure object storage

To set up the Juju client and a Juju controller on K8s, you can refer to the existing tutorials and documentation for MicroK8s and for AWS EKS. Also refer to the How-to set up environment guide to install and set up S3-compatible object storage on MicroK8s (MinIO) or EKS (AWS S3), or Azure object storage. For backends or K8s distributions other than MinIO on MicroK8s and S3 on EKS (e.g. Ceph, Charmed Kubernetes, GKE), please refer to their documentation.

Charmed Apache Spark supports native integration with the Canonical Observability Stack (COS). To enable monitoring on top of Charmed Apache Spark, make sure that you have a Juju model with COS correctly deployed. To deploy COS on MicroK8s, follow the step-by-step tutorial or refer to its documentation for more information.

Preparation

Juju Model

Make sure that you have a Juju model where you can deploy the Spark History Server. In general, we advise segregating Juju applications that belong to different solutions, and therefore having a dedicated model for the Spark components, e.g.:

juju add-model <juju_model>

Note that this will create a K8s namespace to which the different Charmed Apache Spark components will be deployed.

Deploy Charmed Apache Spark

Charmed Apache Spark can be deployed via:

- Native Juju YAML bundles and overlays
- Terraform modules

Using Juju bundles

Juju bundles are provided in the form of Jinja2 templates, for the following distribution:

Charmed Apache Spark 3.4.x
- main bundle.yaml
- overlays

You can easily customize these templates from the CLI using jinja2-cli:

pip install jinja2-cli

Once the package is installed, you can render the template with:

jinja2 -D <key>=<value> bundle.yaml.j2 > bundle.yaml

Different YAML bundles, with different configuration options, exist for the S3 and Azure object storage backends. Please refer to the next sections for more information on how to configure the deployment for each object storage backend.

Once the bundle is rendered, it can be deployed with:

juju deploy -m <juju_model> ./bundle.yaml

S3 backends

The following table summarizes the properties to be specified for the main bundle:

Key	Description
`service_account`	Service Account to be used by Apache Kyuubi Engines (deprecated)
`namespace`	Namespace where the charms will be deployed. This should correspond to the name of the Juju model to be used.
`s3_endpoint`	Endpoint of the S3-compatible object storage backend, in the form of `http(s)://host:port`.
`bucket`	Name of the S3 bucket to be used for storing logs and data
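As a minimal sketch, the render and deploy steps for the S3 bundle can be combined into one shell function. It assumes that the main S3 template is saved as bundle.yaml.j2 in the current directory and that jinja2-cli and juju are on PATH; the function name deploy_spark_s3 and the example values are hypothetical. The deprecated service_account key is left out here; pass it with another -D flag if your template version still expects it.

```shell
# Hypothetical helper: render the main S3 bundle and deploy it.
# Assumes bundle.yaml.j2 (the main S3 template) is in the current directory.
deploy_spark_s3() {
  local model="$1"     # Juju model name; also used as the K8s namespace
  local endpoint="$2"  # S3 endpoint, e.g. http://10.0.0.10:9000
  local bucket="$3"    # bucket for logs and data

  # Basic sanity check for the http(s)://host:port form.
  case "$endpoint" in
    http://*:*|https://*:*) ;;
    *) echo "s3_endpoint must look like http(s)://host:port" >&2; return 1 ;;
  esac

  jinja2 -D namespace="$model" -D s3_endpoint="$endpoint" -D bucket="$bucket" \
    bundle.yaml.j2 > bundle.yaml || return 1
  juju deploy -m "$model" ./bundle.yaml
}

# Example usage (hypothetical values):
# deploy_spark_s3 spark http://10.0.0.10:9000 spark-data
```

The endpoint check only catches the most common mistake (a missing scheme or port); it does not prove that the endpoint is reachable.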

Once the bundle is deployed, most of the charms will be in a blocked status because of missing or invalid S3 credentials. In particular, the s3 charm should report that it needs to be provided with an access key and a secret key. These can be provided using the sync-s3-credentials action:

juju run s3/leader sync-s3-credentials \
  access-key=<access-key> secret-key=<secret-key>

After this, the charms should start to receive the credentials and move into active/idle state.
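Following the troubleshooting advice further down this thread, it can help to check the key pair with the AWS CLI before passing it to the action. A minimal sketch, assuming the aws CLI is installed and can reach the endpoint; the helper name sync_s3_credentials_checked is hypothetical.

```shell
# Hypothetical helper: verify an S3 key pair, then hand it to the s3 charm.
sync_s3_credentials_checked() {
  local endpoint="$1" bucket="$2" access_key="$3" secret_key="$4"

  # Listing the bucket fails on wrong credentials or a missing bucket.
  if ! AWS_ACCESS_KEY_ID="$access_key" AWS_SECRET_ACCESS_KEY="$secret_key" \
      aws --endpoint-url "$endpoint" s3 ls "s3://$bucket" >/dev/null; then
    echo "cannot list s3://$bucket with these credentials" >&2
    return 1
  fi

  juju run s3/leader sync-s3-credentials \
    access-key="$access_key" secret-key="$secret_key"
}

# Example usage (hypothetical values):
# sync_s3_credentials_checked http://10.0.0.10:9000 spark-data <access-key> <secret-key>
```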

Azure storage backends

The following table summarizes the properties to be specified for the main Azure bundle.

Key	Description
`service_account`	Service Account to be used by Apache Kyuubi Engines
`namespace`	Namespace where the charms will be deployed. This should correspond to the name of the Juju model to be used.
`storage_account`	Name of the Azure storage account to be used.
`container`	Name of the Azure storage container to be used for storing logs and data

Create a Juju secret holding the Azure storage account key:

juju add-secret azure-credentials secret-key=<AZURE_STORAGE_KEY>

This should print the secret URI, secret:<secret_id>, which can be used to configure the bundle. To do so, first grant the Azure Storage Integrator charm access to the secret:

juju grant-secret <secret_id> azure-storage

Then, configure the charm to use the secret:

juju config azure-storage credentials=secret:<secret_id>

After this, the different charms should start to receive the credentials and move into active/idle state.
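The three Azure credential steps above can be chained in a small shell function. This is a sketch that assumes, as described above, that juju add-secret prints the new secret as secret:<secret_id> on standard output and that the integrator application is named azure-storage; the function name configure_azure_secret is hypothetical.

```shell
# Hypothetical helper: store the Azure storage key as a Juju secret and
# point the azure-storage integrator at it.
configure_azure_secret() {
  local storage_key="$1"
  local secret_uri

  # juju add-secret prints the secret URI, e.g. secret:<secret_id>
  secret_uri="$(juju add-secret azure-credentials secret-key="$storage_key")" || return 1

  # grant-secret takes the bare <secret_id>; config takes secret:<secret_id>
  juju grant-secret "${secret_uri#secret:}" azure-storage || return 1
  juju config azure-storage credentials="$secret_uri"
}

# Example usage:
# configure_azure_secret "$AZURE_STORAGE_KEY"
```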

The Azure Storage Integrator charm assumes by default that hierarchical namespaces are enabled. When you create a new storage account, please make sure you check the “Hierarchical Namespaces” checkbox. If you want to use a legacy storage account that doesn’t have hierarchical namespaces enabled, configure the Azure Storage Integrator charm to use the WASB/WASBS protocol instead:

juju config azure-storage connection-protocol=wasbs

The spark-events directory needs to be created in the Azure container beforehand for the Spark History Server to work. Please refer to the How-To Setup Environment guide for more detailed instructions.
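One way to create the directory is with the Azure CLI. This sketch assumes that the az CLI is installed, that the storage account has hierarchical namespaces enabled (the az storage fs commands target Data Lake Storage Gen2 accounts), and that authenticating with the account key is acceptable; the function name create_spark_events_dir is hypothetical.

```shell
# Hypothetical helper: create the spark-events directory needed by the
# Spark History Server in the given container.
create_spark_events_dir() {
  local account="$1"    # Azure storage account name
  local container="$2"  # container (file system) used by the bundle
  local key="$3"        # storage account key

  az storage fs directory create \
    --name spark-events \
    --file-system "$container" \
    --account-name "$account" \
    --account-key "$key"
}

# Example usage (hypothetical values):
# create_spark_events_dir mystorageaccount spark-container "$AZURE_STORAGE_KEY"
```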

Enabling COS

COS can be enabled using an overlay. As with the main bundle, the Jinja2 template for the overlay can be rendered with the following properties:

key	Description	Default
cos_controller	Name of the controller hosting the COS model.	micro
cos_model	Name of the COS model	cos

Once the template is rendered, the COS-enabled Charmed Apache Spark bundle can be deployed using:

juju deploy -m <juju_model> ./bundle.yaml --overlay cos-integration.yaml
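A sketch combining the overlay rendering and the deployment, using the defaults from the table above. The overlay template file name cos-integration.yaml.j2 and the function name deploy_spark_with_cos are assumptions; the main bundle is expected to be already rendered as bundle.yaml.

```shell
# Hypothetical helper: render the COS overlay and deploy the bundle with it.
deploy_spark_with_cos() {
  local model="$1"
  local cos_controller="${2:-micro}"  # default from the table above
  local cos_model="${3:-cos}"         # default from the table above

  jinja2 -D cos_controller="$cos_controller" -D cos_model="$cos_model" \
    cos-integration.yaml.j2 > cos-integration.yaml || return 1
  juju deploy -m "$model" ./bundle.yaml --overlay cos-integration.yaml
}

# Example usage (hypothetical model name, default COS controller and model):
# deploy_spark_with_cos spark
```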

Using Terraform

Make sure you have Terraform 1.8+ installed on your machine. You can install Terraform or OpenTofu via a snap.

Terraform modules make use of the Terraform Juju provider. More information about the Juju provider can be found here.

The Charmed Apache Spark Terraform module is composed of the following submodules:

- base module that bundles all the base resources of the Charmed Apache Spark solution
- cos-integration module that bundles all the resources that enable integration with COS

Currently, only S3 storage backends are supported for Terraform-based deployments.

The Charmed Apache Spark Terraform modules can be configured using a .tfvars.json file with the following schema:

{
  "s3": {
    "bucket": "<bucket_name>",
    "endpoint": "<s3_endpoint>"
  },
  "kyuubi_user": "<kyuubi_service_account>",
  "model": "<juju_model>",
  "cos_model": "<cos_model>"
}
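For example, a filled-in file could be written from the shell as below. All values (model spark, bucket spark-data, the MinIO-style endpoint, the kyuubi_user value and cos_model cos) are hypothetical placeholders; drop cos_model if COS is not deployed.

```shell
# Hypothetical values; replace them with your own.
MODEL=spark
BUCKET=spark-data
S3_ENDPOINT=http://10.0.0.10:9000

# Write the variables file consumed by terraform -var-file=...
cat > spark.tfvars.json <<EOF
{
  "s3": {
    "bucket": "${BUCKET}",
    "endpoint": "${S3_ENDPOINT}"
  },
  "kyuubi_user": "kyuubi-engine",
  "model": "${MODEL}",
  "cos_model": "cos"
}
EOF
```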

The following table describes the different configuration options:

Key	Description
`kyuubi_user`	Service Account to be used by Apache Kyuubi Engines (deprecated)
`model`	Namespace where the charms will be deployed. This should correspond to the name of the Juju model to be used.
`s3.endpoint`	Endpoint of the S3-compatible object storage backend, in the form of `http(s)://host:port`.
`s3.bucket`	Name of the S3 bucket to be used for storing logs and data
`cos_model`	(Optional) Name of the model where COS is deployed. If omitted, the resources of the cos-integration submodule will not be deployed.

The Juju Terraform provider does not yet support cross-controller relations with COS. Therefore, the COS model must be hosted on the same controller as the Charmed Apache Spark model.

To deploy Charmed Apache Spark using Terraform, use the standard Terraform commands:

- to initialize the modules: terraform init
- to deploy: terraform apply -var-file=<.tfvars.json_filename>
- to tear down: terraform destroy -var-file=<.tfvars.json_filename>
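A minimal sketch of the same workflow with a saved plan, so that the changes can be reviewed before they are applied. Setting TF_BIN=tofu assumes OpenTofu is used as a drop-in replacement for the terraform binary; the function name deploy_spark_tf and the plan file name spark.tfplan are hypothetical.

```shell
# Hypothetical helper: init, plan and apply the Charmed Apache Spark modules.
# Set TF_BIN=tofu to use OpenTofu instead of Terraform.
deploy_spark_tf() {
  local tf="${TF_BIN:-terraform}"
  local varfile="${1:-spark.tfvars.json}"

  "$tf" init || return 1
  # Save the plan so that exactly the reviewed changes get applied.
  "$tf" plan -var-file="$varfile" -out=spark.tfplan || return 1
  "$tf" apply spark.tfplan
}

# Example usage:
# deploy_spark_tf spark.tfvars.json
```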

For more information about Terraform, please refer to the official docs.

afrogrit · 25 April 2025 22:14

Hi everyone,

I have followed the above meticulously. I have MinIO for S3 (from Charmed Kubeflow) and have also deployed Charmed Kubernetes, but I'm struggling to get Kyuubi to work. Below is my status:


App              Version  Status   Scale  Charm                      Channel      Rev  Address         Exposed  Message
certificates              active       1  self-signed-certificates   latest/edge  163  10.152.183.79   no
history-server            blocked      1  spark-history-server-k8s   3.4/edge      33  10.152.183.145  no       Invalid S3 credentials
integration-hub           blocked      1  spark-integration-hub-k8s  latest/edge   43  10.152.183.54   no       Invalid S3 credentials
kyuubi                    blocked      3  kyuubi-k8s                 latest/edge   45  10.152.183.17   no       Missing Object Storage backend
kyuubi-users     14.11    active       1  postgresql-k8s             14/stable    281  10.152.183.48   no
metastore        14.11    active       1  postgresql-k8s             14/stable    281  10.152.183.252  no
s3                        active       1  s3-integrator              latest/edge   17  10.152.183.119  no
zookeeper        3.8.4    active       3  zookeeper-k8s              3/edge        70  10.152.183.131  no

Unit                Workload  Agent  Address          Ports  Message
certificates/0*     active    idle   192.168.189.52
history-server/0*   blocked   idle   192.168.189.23          Invalid S3 credentials
integration-hub/0*  blocked   idle   192.168.112.111         Invalid S3 credentials
kyuubi-users/0*     active    idle   192.168.189.30          Primary
kyuubi/0*           blocked   idle   192.168.189.15          Missing Object Storage backend
kyuubi/1            blocked   idle   192.168.112.101         Missing Object Storage backend
kyuubi/2            blocked   idle   192.168.67.187          Missing Object Storage backend
metastore/0*        active    idle   192.168.112.72          Primary
s3/0*               active    idle   192.168.112.117
zookeeper/0*        active    idle   192.168.189.28
zookeeper/1         active    idle   192.168.112.125
zookeeper/2         active    idle   192.168.67.167

I’m uncertain about service_account: what should it be?

Thanks in advance,

luke

deusebio · 25 April 2025 23:28

The Juju status definitely points to S3 not being configured correctly. There are a few things I would check:

- Make sure that pods in the namespace can reach the S3 backend (exec into an existing pod, or spawn a new one in the namespace, and ping/telnet the S3 service to check that it resolves and is reachable).
- Make sure you feed the correct access key and secret key to the S3 integrator (via the action). You could also use the AWS CLI to check that the credentials work, i.e. that you can list and create buckets.
- Make sure that you have created the bucket.
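The first check could look like the sketch below: it starts a throwaway pod in the model's namespace and asks curl for the HTTP status of the S3 endpoint. Any HTTP status code (even 403) shows that the endpoint resolves and is reachable, while a DNS or connection error points to a networking problem. The curlimages/curl image, the pod name s3-check and the function name are assumptions.

```shell
# Hypothetical helper: test S3 reachability from inside the K8s namespace
# using a temporary pod that is removed when curl exits.
check_s3_from_namespace() {
  local namespace="$1"  # same as the Juju model name
  local endpoint="$2"   # e.g. http://10.0.0.10:9000

  kubectl run s3-check -n "$namespace" --rm -i --restart=Never \
    --image=curlimages/curl --command -- \
    curl -sS -o /dev/null -w '%{http_code}\n' "$endpoint"
}

# Example usage (hypothetical values):
# check_s3_from_namespace spark http://10.0.0.10:9000
```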

Have you inspected the charm logs (with juju debug-log) for the s3 charm as well as a few client charms (history server, integration hub)?

Also, if you keep having problems, it may be best to open a GitHub issue on the spark-k8s-bundle repo. Thanks!

afrogrit · 30 April 2025 21:04

Hi @deusebio

Thanks for the help. I was too quick to ask for help; I should have waited for the installation to settle. I’m now able to submit Spark jobs, but I’ve been having issues with "RequestWatchProgress is disabled", on which I’m stuck.

PS: are you able to delete my question above, as it’s misleading?

25/04/29 16:16:05 ERROR AbstractWatchManager: Error received: Status(apiVersion=v1, code=500, details=null, kind=Status, message=a watch stream was requested by the client but the required storage feature RequestWatchProgress is disabled, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=InternalError, status=Failure, additionalProperties={}), will retry

deusebio · 2 May 2025 12:33

Hi,

It could be a permission problem, i.e. the watcher does not have permissions on the K8s cluster. I’m wondering: have you deployed the charms with --trust, to give them permission to use the K8s API?
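If the charms were deployed without --trust, trust can also be granted afterwards with juju trust. A sketch, assuming cluster-scoped trust is what the charms need; which applications to trust (for example kyuubi and integration-hub from the status above) is an assumption to check against the charm documentation, and the function name trust_apps is hypothetical.

```shell
# Hypothetical helper: grant cluster-scoped trust to already deployed apps.
trust_apps() {
  local model="$1"
  shift
  for app in "$@"; do
    juju trust -m "$model" "$app" --scope=cluster || return 1
  done
}

# Example usage (hypothetical model name; apps taken from the status above):
# trust_apps spark kyuubi integration-hub history-server
```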

Anyhow, if you are keen, it is probably best to move the conversation into a dedicated issue, if you can submit one.

Cheers, Enrico

afrogrit · 5 May 2025 13:36

Thanks @deusebio,

I usually try my level best and exhaust all options before I raise an issue. This time all my attempts have failed, so I have raised an issue on GitHub.

Thanks, Luke