This is the first in a series of posts that will detail my journey in provisioning Spark workloads using Juju.
In this post we will cover the very basics of deploying a singleton Spark instance that you can execute jobs against. In following posts we will explore different Spark cluster modes, backends, and different ways of interfacing with the Spark backend, whatever it may be.
Spark Standalone Singleton - Up and Running
juju deploy cs:~omnivector/spark --constraints "cores=4 mem=8G root-disk=10G"
This will get you the latest stable version of the Spark charm.
Note: The Spark charm can be deployed with fewer resources, but it becomes less usable the further you slim it down.
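For example, a slimmer deployment for kicking the tires might look like the following (the constraint values here are illustrative, not a tested minimum):

# Deploy with reduced resources for experimentation.
juju deploy cs:~omnivector/spark --constraints "cores=2 mem=4G root-disk=8G"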
Following a successful deployment, your juju status output should resemble:
Model  Controller  Cloud/Region   Version  SLA          Timestamp
bdx00  pdl-aws     aws/us-west-2  2.5.4    unsupported  22:13:39Z

App    Version  Status  Scale  Charm  Store       Rev  OS      Notes
spark  2.4.1    active      1  spark  jujucharms   31  ubuntu

Unit      Workload  Agent  Machine  Public address  Ports                                          Message
spark/0*  active    idle   1        172.31.102.151  7077/tcp,7078/tcp,8080/tcp,8081/tcp,18080/tcp  Running: master,worker,history

Machine  State    DNS             Inst id              Series  AZ          Message
1        started  172.31.102.151  i-0ed5ca81962bb9fb2  bionic  us-west-2a  running
Access the Spark web UIs at their respective ports:
- master UI: http://172.31.102.151:8080
- worker UI: http://172.31.102.151:8081
- history server UI: http://172.31.102.151:18080
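If you would rather not copy the address out of juju status by hand, you can pull it from the machine-readable status instead; a minimal sketch, assuming jq is installed and the spark/0 unit name from the output above:

# Look up the unit's public address from the JSON status output.
SPARK_ADDR=$(juju status spark --format=json \
  | jq -r '.applications.spark.units["spark/0"]["public-address"]')
# Open the master UI in the default browser.
xdg-open "http://${SPARK_ADDR}:8080"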
Run the spark-pi Example
This Juju action command will kick off the spark-pi example job:
juju run-action spark/0 spark-pi --wait
Inspect the Spark master, worker, and history server UIs to see the application status after running the spark-pi action.
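The --wait flag blocks until the job finishes and prints the result inline. If you prefer to fire the action off and check on it later, the standard Juju 2.x action commands should do the trick (the action ID below is whatever run-action prints back):

# Queue the action without blocking; juju prints an action ID.
juju run-action spark/0 spark-pi
# See the state of recently queued actions.
juju show-action-status
# Fetch the full output once the action completes
# (substitute the ID returned by run-action above).
juju show-action-output <action-id>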
More Than One
To expand the size of the Spark cluster, simply add more units!
juju add-unit -n 5 spark
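Scaling back down is just as simple; remove the units you no longer need (the unit names below are illustrative, check juju status for yours):

# Remove two specific units from the cluster.
juju remove-unit spark/4 spark/5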
Using Juju Storage
A user can provision the SPARK_WORKER_DIRS and/or SPARK_LOCAL_DIR via Juju storage by passing the --storage argument with the correct parameters to the juju deploy command.
AWS EBS Example
juju deploy cs:~omnivector/spark \
--constraints "spaces=nat root-disk=20G instance-type=t3.xlarge" \
--storage spark-local=ebs,50G --storage spark-work=ebs,100G
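Once the deployment settles, you can verify that Juju provisioned and attached the EBS volumes; a quick check, assuming the storage labels from the command above:

# List the storage instances in the model and their attachment status.
juju storage
# Drill into a specific storage instance for volume details
# (the storage ID, e.g. spark-local/0, comes from the listing above).
juju show-storage spark-local/0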