Heterogeneous Slurm jobs

The scripts in this package are well suited to an environment where the Spark cluster manager, driver, and user application share a single node with a limited number of CPUs, while the workers run on exclusive nodes with uniform resources. Refer to this diagram for an illustration of the Spark cluster components.

This can be achieved with Slurm heterogeneous jobs.

Here is one possible configuration:

  • Spark driver memory = 10 GB

  • Spark master memory + overhead for OS and Slurm = 20 GB

  • CPUs for Spark master, driver, user application, and overhead for OS and Slurm = 4
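
The memory request for the first (shared) component is simply the sum of the figures above. A quick sketch of the arithmetic, assuming the 30 GB in the examples below is derived this way (variable names are illustrative only):

```python
# All values in GB; names are illustrative, not part of any API.
driver_mem = 10             # Spark driver memory
master_and_overhead = 20    # Spark master memory + OS/Slurm overhead
shared_component_mem = driver_mem + master_and_overhead

# This total is what the shared component requests below.
print(f"--mem={shared_component_mem}G")  # → --mem=30G
```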

Allocate one compute node from the shared partition and then two from the regular partition.

Note

The shared-partition component must come first and must contain only one compute node. That is where your application will run.

Interactive job

$ salloc --account=<your-account> -t 01:00:00 -n4 --mem=30G --partition=shared : \
    -N2 --partition=debug --mem=240G
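
Once the allocation is granted, you can confirm the two components. A sketch, assuming a Slurm version with heterogeneous-job support (the --het-group flag and the per-component node-list variables):

```shell
# List the hostnames in each heterogeneous component.
# Component 0 is the shared node; component 1 holds the worker nodes.
srun --het-group=0 hostname
srun --het-group=1 hostname

# Slurm also exports one node list per component:
echo "$SLURM_JOB_NODELIST_HET_GROUP_0"   # shared node
echo "$SLURM_JOB_NODELIST_HET_GROUP_1"   # worker nodes
```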

Batch job

Here is the format of the sbatch script:

#!/bin/bash
#SBATCH --account=<my-account>
#SBATCH --job-name=my-job
#SBATCH --time=4:00:00
#SBATCH --output=output_%j.o
#SBATCH --error=output_%j.e
#SBATCH --partition=shared
#SBATCH --nodes=1
#SBATCH --mem=30G
#SBATCH --ntasks=4
#SBATCH hetjob
#SBATCH --partition=debug
#SBATCH --nodes=2
#SBATCH --mem=240G
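
After submitting, each component appears as a separate row in the queue. A sketch, assuming the script above is saved as my-job.sh (the +0/+1 job-ID suffixes are how Slurm labels heterogeneous components):

```shell
# Submit the heterogeneous batch job.
sbatch my-job.sh

# squeue shows one row per component, e.g. 12345+0 (shared node)
# and 12345+1 (worker nodes).
squeue --me
```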

You will need to adjust the CPU and memory parameters based on what you will pass to sparkctl configure.

Then proceed with the rest of the instructions.