Start a Spark Cluster
This page assumes that you have allocated compute nodes via Slurm.
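For example, an interactive allocation might look like the following (the node count, walltime, and account are placeholders to adapt to your site):

$ salloc -N 2 -t 01:00:00 --account=<your_account>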
1. Activate the Python environment that contains sparkctl.

   $ module load python
   $ source ~/python-envs/sparkctl/bin/activate
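   To confirm that the environment is active, check that the executable resolves:

   $ which sparkctl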
2. Configure and start the Spark cluster. sparkctl detects the compute nodes from Slurm environment variables.

   $ sparkctl configure
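   sparkctl infers the node list from your job allocation; if you want to see what is available, you can print the standard Slurm variables yourself (which specific variables sparkctl reads is an assumption):

   $ echo $SLURM_JOB_ID
   $ scontrol show hostnames "$SLURM_JOB_NODELIST"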
   Optionally, inspect the Spark configuration in ./conf.
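   For example (these are standard Spark configuration file names; the exact set depends on what sparkctl generates):

   $ ls ./conf
   $ cat ./conf/spark-defaults.conf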
   Start the cluster.
   $ sparkctl start
3. Set the SPARK_CONF_DIR environment variable. This ensures that your application uses the Spark settings created in step 2. Instructions will be printed to the console. By default:

   $ export SPARK_CONF_DIR=$(pwd)/conf
4. Set the JAVA_HOME environment variable to the same Java used by Spark. The path should be in your ~/.sparkctl.toml configuration file.

   $ export JAVA_HOME=/datasets/images/apache_spark/jdk-21.0.7
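   To confirm the setting, run the bundled java binary directly:

   $ $JAVA_HOME/bin/java -version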
5. Run your application.
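   As a minimal smoke test, the PySpark sketch below connects to the cluster and runs a trivial distributed job. It assumes pyspark is available in your environment, that SPARK_CONF_DIR and JAVA_HOME are set as above, and that the generated spark-defaults.conf records the master URL; the file name smoke_test.py is only an illustration.

   # smoke_test.py -- minimal check that the Spark cluster is usable.
   # Assumes SPARK_CONF_DIR points at the configuration generated by
   # `sparkctl configure`, so the master URL is picked up automatically.
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.appName("smoke_test").getOrCreate()

   # Build a small DataFrame and run a distributed count as a sanity check.
   df = spark.range(1_000_000)
   print(f"row count: {df.count()}")

   spark.stop()

   Submit it with spark-submit, which reads the master URL and other settings from SPARK_CONF_DIR:

   $ spark-submit smoke_test.py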
6. Shut down the cluster.

   $ sparkctl stop