# NREL Kestrel

This package was originally developed on NREL's Kestrel HPC system, which uses Slurm to manage resources. This page contains information specific to Kestrel that users must consider.

## Spark software

Spark is installed on Kestrel. Use these locations when you configure your environment.

```
/datasets/images/apache_spark
├── apache-hive-4.0.1-bin.tar.gz
├── hadoop-3.4.1/
├── jdk-21.0.7/
├── postgresql-42.7.4.jar
└── spark-4.0.0-bin-hadoop3/
```

```console
$ sparkctl default-config \
    /datasets/images/apache_spark/spark-4.0.0-bin-hadoop3 \
    /datasets/images/apache_spark/jdk-21.0.7 \
    --hadoop-path /datasets/images/apache_spark/hadoop-3.4.1 \
    --hive-tarball /datasets/images/apache_spark/apache-hive-4.0.1-bin.tar.gz \
    --postgresql-jar-file /datasets/images/apache_spark/postgresql-42.7.4.jar \
    --compute-environment slurm
```

## Compute nodes

### Low-bandwidth compute nodes

Users must use only low-bandwidth compute nodes. The "high-bandwidth" nodes are very problematic for Spark: they contain two network interface cards (NICs) with different IP addresses. An application connecting to a node by hostname may be routed to either NIC, but Spark listens on only one address. As a result, many intra-cluster network messages are dropped and the cluster fails.

Solution: specify the constraint `lbw` (for low bandwidth) whenever you request nodes with salloc/srun/sbatch. Example:

```console
$ salloc --account <account> -t 01:00:00 --constraint lbw
```

### Heterogeneous Slurm jobs

Kestrel offers a shared partition where you can request a subset of a compute node's CPUs and memory. Prefer this partition for your Spark master process and user application, as discussed at {ref}`heterogeneous-slurm-jobs`; an example allocation is sketched at the end of this page.

### Local storage for Spark shuffle writes

A small portion of Kestrel's compute nodes (about 256) have local storage; the drives are ~1.6 TB. Prefer these nodes for Spark workers if your jobs will shuffle large amounts of data. The local drives are much faster than Lustre, and you will get better performance.

```console
$ salloc --account <account> -t 01:00:00 --partition nvme
```

Enable local storage for shuffle writes with

```console
$ sparkctl configure --local-storage
```

Lustre will likely be sufficient for most jobs. If you will shuffle a large amount of data (TBs) and are unsure how many nodes you would need for enough local storage, start with Lustre (the default).

### Lustre file striping

If you use Lustre for shuffle writes, consider applying custom stripe settings to optimize performance. Run this command on your runtime directory **before** you write any data to it.

```console
$ lfs setstripe -E 256K -L mdt -E 8M -c -1 -S 1M -E -1 -c -1 -p flash /scratch/$USER/work-dir
```
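
To check that the striping took effect, `lfs getstripe` prints the layout that new files created in the directory will inherit:

```console
$ lfs getstripe /scratch/$USER/work-dir
```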
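
The heterogeneous allocation recommended above (a small shared-partition component for the Spark master process and your application, plus whole low-bandwidth nodes for the Spark workers) can be requested with Slurm's heterogeneous-job syntax, where the component specifications are separated by `:`. The command below is only a sketch: the partition name, CPU count, memory, and node count are placeholders to adjust for your workload, and {ref}`heterogeneous-slurm-jobs` covers the recommended setup in detail.

```console
$ salloc --account <account> -t 01:00:00 \
    --partition shared --nodes 1 --ntasks 1 --cpus-per-task 4 --mem 20G \
    : --account <account> --constraint lbw --nodes 2
```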