Start a Spark Cluster

This page assumes that you have allocated compute nodes via Slurm.

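For example, an interactive allocation on two compute nodes might look like the following (the account name, node count, and walltime are placeholders for your own values):

    $ salloc --nodes=2 --account=<your_account> --time=01:00:00
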
  1. Activate the Python environment that contains sparkctl.

    $ module load python
    $ source ~/python-envs/sparkctl/bin/activate
    
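     To confirm that the environment is active, you can check that the sparkctl executable resolves inside it:

    $ which sparkctl
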
  2. Configure the Spark cluster. sparkctl detects the compute nodes from Slurm environment variables.

     $ sparkctl configure
    
  3. Optionally, inspect the Spark configuration files generated in ./conf.

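     For example, you can list the generated files and view the main settings. The file names shown here (spark-defaults.conf, spark-env.sh) are standard Spark configuration files; the exact set depends on your sparkctl settings.

    $ ls ./conf
    $ cat ./conf/spark-defaults.conf
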
  4. Start the cluster.

    $ sparkctl start
    
  5. Set the SPARK_CONF_DIR environment variable so that your application uses the Spark settings created in step 2. Instructions are printed to the console; by default, the setting is

    $ export SPARK_CONF_DIR=$(pwd)/conf
    
  6. Set the JAVA_HOME environment variable to the same Java installation that Spark uses. This path should be listed in your ~/.sparkctl.toml configuration file.

    $ export JAVA_HOME=/datasets/images/apache_spark/jdk-21.0.7
    
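     To confirm that JAVA_HOME points to a working JDK, you can optionally run:

    $ $JAVA_HOME/bin/java -version
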
  7. Run your application.

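     For example, a PySpark script can be submitted with spark-submit. This is a minimal sketch: my_app.py is a placeholder for your own script, and it assumes the master URL and other settings are read from the configuration generated in step 2 (via SPARK_CONF_DIR).

    $ spark-submit my_app.py
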
  8. Shut down the cluster.

    $ sparkctl stop