Run Spark Interactively
Apache Spark comes with two interactive shells:
- spark-shell, built on top of the Scala REPL
- pyspark, built on top of the Python interpreter
Interactive Spark Shell
To support interactive usage, the spark-client snap ships with Apache Spark's spark-shell utility.
It is a useful tool for validating your assumptions about Spark in Scala before they surface as failures in an actual long-running job.
Let us test out our spark-shell setup with a simple example.
$ spark-client.spark-shell
....
....
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.2
      /_/
....
....
scala> import scala.math.random
scala> val slices = 1000
scala> val n = math.min(100000L * slices, Int.MaxValue).toInt
scala> val squares_sum = spark.sparkContext.parallelize(1 until n, slices).map { i => i.toDouble * i }.reduce(_ + _)
scala> println(s"Sum of squares is ${squares_sum}")
scala> :quit
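You can sanity-check the distributed result against the closed-form identity: the sum of i² for i from 1 to n - 1 equals (n - 1) * n * (2n - 1) / 6 (the session above parallelizes 1 until n, i.e. 1 through n - 1). The following is a minimal sketch in plain Scala, assuming the same slices and n as above; it uses BigInt to keep the arithmetic exact, whereas the distributed sum of Double values may differ by a small floating-point rounding error.
// Closed-form sum of squares for i = 1 until n, i.e. (n - 1) * n * (2n - 1) / 6.
// BigInt keeps the arithmetic exact for n = 100,000,000.
val slices = 1000
val n = math.min(100000L * slices, Int.MaxValue).toInt
val exact = BigInt(n - 1) * n * (2L * n - 1) / 6
println(s"Closed-form sum of squares is ${exact}")
You can paste these lines into the same spark-shell session (before :quit) to compare the two values.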
Interactive PySpark Shell
For interactive Python shell usage, the spark-client snap ships with Apache Spark's pyspark utility.
Make sure that Python is installed on your system. Then, execute the following commands to validate that your pyspark setup is working.
$ spark-client.pyspark
....
....
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.2
      /_/
....
....
>>> from operator import add
>>> partitions = 1000
>>> n = 100000 * partitions
>>> def square(x: int) -> int:
...     return x ** 2
...
>>> squares_sum = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(square).reduce(add)
>>> print("Sum of squares is %f" % (squares_sum))
>>> quit()
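The same sanity check applies to the PySpark session: the sum of i² for i from 1 to n equals n * (n + 1) * (2n + 1) / 6 (the example above parallelizes range(1, n + 1), i.e. 1 through n). Below is a minimal sketch in plain Python, assuming the same partitions and n as above; since Python integers are arbitrary precision, it should match the distributed result exactly.
# Closed-form sum of squares for i in range(1, n + 1), i.e. n * (n + 1) * (2n + 1) / 6.
partitions = 1000
n = 100000 * partitions
exact = n * (n + 1) * (2 * n + 1) // 6
print("Closed-form sum of squares is %d" % exact)
These lines can be pasted into the same pyspark session (before quit()) to compare the two values.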