Environments tutorial
In this tutorial, we will walk through how to build and run a basic MPI code using the four principal toolchains/software stacks on Kestrel. We will discuss common pitfalls in building and running within each of these toolchains, too.
We summarize these toolchains in the table below:
PrgEnv-* | Compiler | MPI
cray | Cray CCE | Cray MPICH
intel | Intel | Cray MPICH
n/a | Intel | Intel MPI
n/a | GCC | Open MPI
Note: There is an option to compile with an MPICH-based MPI (e.g., Intel MPI but not Open MPI) and then use the module cray-mpich-abi at run-time, which causes the code to use Cray MPICH instead of the MPI it was built with. More information on how to use this feature will be added soon.
Introduction
Kestrel is a Cray machine whose nodes are connected by "Cray Slingshot" (in contrast to Eagle, which uses InfiniBand). We've found that packages that use Cray tools like Cray MPICH run faster than the same packages built and run without Cray tools (e.g., compiled and run with Intel MPI), in part because these Cray tools are optimized to work well with Cray Slingshot.
Most of us coming from Eagle are probably used to running our codes with Intel MPI or Open MPI, but not Cray MPICH.
Using the Cray-designed programming environments ("PrgEnvs") requires using the special Cray compiler wrappers cc, CC, and ftn. These wrappers replace the MPI compiler wrappers you're used to, like mpicc, mpiicc, mpiifort, etc.
This guide will walk through how to use the Cray PrgEnv-* environments with Cray MPICH, how to use "NREL-built" environments, and how to make sure your build is using the dependencies you expect.
What is "PrgEnv-*"?
Kestrel comes pre-packaged with several "programming environments." You can see which programming environments are available by typing module avail PrgEnv. For CPU codes, we focus on PrgEnv-cray and PrgEnv-intel. These environments provide compilers (accessible with the cc, CC, and ftn wrappers), Cray MPICH, and some other necessary lower-level libraries.
The Tutorial
We're going to walk through building and running an MPI benchmarking code called IMB. This is a simple code that only requires a compiler and an MPI as dependencies (no scientific libraries, etc. are needed).
First, log onto Kestrel with
ssh [your username]@kestrel.hpc.nrel.gov
Let's grab an interactive node session:
salloc -N 1 -n 104 --time=01:00:00 --account=<your allocation handle>
Environment 1: PrgEnv-cray
Make a new directory for the tutorial:
mkdir IMB-tutorial
cd IMB-tutorial
mkdir PrgEnv-cray
cd PrgEnv-cray
Then download the code:
git clone https://github.com/intel/mpi-benchmarks.git
cd mpi-benchmarks
PrgEnv-cray is the default environment on Kestrel, so it should already be loaded upon login to Kestrel. To check, type module list and make sure you see PrgEnv-cray somewhere in the module list. If you don't, you can restore the default environment (PrgEnv-cray) by simply running module restore.
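This check can also be scripted by filtering the module list for PrgEnv entries. The sketch below is only an illustration, not an NREL-provided tool: loaded_prgenv is a hypothetical helper name, and the sample text with 0.0.0 version numbers is a made-up stand-in for real module list output.

```shell
# Hypothetical helper: print each PrgEnv-* module named in text read from stdin.
loaded_prgenv() {
    grep -oE 'PrgEnv-[A-Za-z0-9_]+(/[A-Za-z0-9._-]+)?' | sort -u
}

# Made-up stand-in for `module list` output (placeholder versions).
sample='Currently Loaded Modules:
  1) PrgEnv-cray/0.0.0   2) cray-mpich/0.0.0'
printf '%s\n' "$sample" | loaded_prgenv
```

On Kestrel the equivalent check would be module list 2>&1 | loaded_prgenv. The 2>&1 is there because Lmod typically writes its listing to standard error rather than standard output.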
Now, we can build the code. Run the command:
CC=cc CXX=CC CXXFLAGS="-std=c++11" make IMB-MPI1
What does this do?
CC=cc: sets the C compiler to be cc. Recall that cc is the Cray wrapper around a C compiler. Because we're in PrgEnv-cray, we expect the C compiler to be Cray's. We can test this by typing cc --version, which outputs:
[ohull@kl1 imb]$ cc --version
No supported cpu target is set, CRAY_CPU_TARGET=x86-64 will be used.
Load a valid targeting module or set CRAY_CPU_TARGET
Cray clang version 14.0.4 (3d8a48c51d4c92570b90f8f94df80601b08918b8)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/cray/pe/cce/14.0.4/cce-clang/x86_64/share/../bin
As expected, we are using Cray's C compiler.
CXX=CC: sets the C++ compiler to be CC, in the same way as CC=cc for the C compiler above.
CXXFLAGS="-std=c++11": tells the compiler to use the C++11 standard for compiling the C++ code. This is necessary because IMB has some code that is deprecated in C++17, which is the standard that Cray's C++ compiler defaults to.
Finally, make IMB-MPI1 builds IMB-MPI1, the IMB executable that we want.
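The VAR=value prefixes in the build line rely on a general shell feature: an assignment placed in front of a command is passed into that one command's environment only. A minimal sketch of the mechanism follows; demo_one_shot_env is a hypothetical name, and sh -c stands in for make.

```shell
# Hypothetical demo: `sh -c` stands in for `make` and reports the CC it receives.
demo_one_shot_env() {
    # The assignment applies only to the single command it prefixes.
    CC=cc sh -c 'echo "child sees CC=$CC"'
}

demo_one_shot_env
```

make picks up CC, CXX, and CXXFLAGS from its environment in the same way (unless the makefile overrides them), so nothing in your login shell changes once the build finishes.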
Let's see what libraries we dynamically linked to in this build. Once the code is done building, type:
ldd ./IMB-MPI1
This will show all libraries required by the program (on the left-hand side) and the specific implementation of each library that the build currently points to (on the right-hand side).
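Because ldd output follows a regular "name => path (address)" layout, it is easy to post-process. The helpers below are a hedged sketch (ldd_resolved and ldd_missing are hypothetical names): one prints the left-hand and right-hand sides as pairs, and the other lists libraries that ldd reports as "not found". The first sample line is Cray MPICH output from this tutorial; the second is an invented failure case.

```shell
# Hypothetical helpers that read ldd-style text on stdin.

# Print "library<TAB>resolved path" for every library that was found.
ldd_resolved() {
    awk '$2 == "=>" && $3 != "not" { print $1 "\t" $3 }'
}

# Print the name of every library reported as "not found".
ldd_missing() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Sample input: line 1 is from this tutorial, line 2 is an invented failure.
sample='    libmpi_cray.so.12 => /opt/cray/pe/lib64/libmpi_cray.so.12 (0x00007fddee9ea000)
    libexample.so.1 => not found'
printf '%s\n' "$sample" | ldd_resolved
printf '%s\n' "$sample" | ldd_missing
```

With a real build you would pipe the command itself, for example ldd ./IMB-MPI1 | ldd_missing. An empty result means every dependency was resolved.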
Let's focus on MPI. Run:
ldd ./IMB-MPI1 | grep mpi
This should output something like:
[ohull@kl1 PrgEnv-cray]$ ldd IMB-MPI1 | grep mpi
libmpi_cray.so.12 => /opt/cray/pe/lib64/libmpi_cray.so.12 (0x00007fddee9ea000)
So, the MPI library we're using is Cray's MPI (Cray MPICH).
Let's run the code:
srun -N 1 -n 104 ./IMB-MPI1 AllReduce > out
When it completes, take a look at the out file:
cat out
IMB swept from 1 MPI task to 104 MPI tasks, performing a number of MPI_Allreduce calls between the MPI tasks (with message sizes ranging from 0 bytes to 4194304 bytes).
Note -- very important: when you run IMB-MPI1, you MUST specify IMB-MPI1 as ./IMB-MPI1 or otherwise give a direct path to this specific version of IMB-MPI1. When we move to the NREL-built intel environment in this tutorial, we will have an IMB-MPI1 already loaded into the path by default, and the command srun IMB-MPI1 will execute the default IMB-MPI1, not the one you just built.
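One way to guard against this mistake is to compare what the bare command name resolves to against your local build. The function below is a minimal sketch under that idea; shadowed_by_path is a hypothetical helper, not part of any Kestrel module.

```shell
# Hypothetical helper: succeed if the bare command name resolves on PATH to a
# different file than the local build given as $1.
shadowed_by_path() {
    local local_exe=$1 resolved
    # If the name is not on PATH at all, nothing can shadow the local build.
    resolved=$(command -v "$(basename "$local_exe")" 2>/dev/null) || return 1
    # -ef is true when both paths refer to the same file.
    ! [ "$resolved" -ef "$local_exe" ]
}

if shadowed_by_path ./IMB-MPI1; then
    echo "warning: 'IMB-MPI1' on PATH is not ./IMB-MPI1; run it as ./IMB-MPI1"
fi
```

The check prints nothing when no other IMB-MPI1 is on the path, which is the expected result in the PrgEnv-cray environment.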
If you'd like, you can also submit this as a Slurm job. Make a file submit-IMB.in, and paste the following contents:
#!/bin/bash
#SBATCH --time=00:40:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=104
#SBATCH --account=<your allocation handle>

srun -N 1 --tasks-per-node=104 --mpi=pmi2 your/path/to/IMB-tutorial/PrgEnv-cray/mpi-benchmarks/IMB-MPI1 Allreduce > out
Don't forget to update your/path/to/IMB-tutorial/PrgEnv-cray/mpi-benchmarks/IMB-MPI1 to the actual path to your IMB-MPI1 executable.
Then, submit the job with sbatch submit-IMB.in
Environment 2: PrgEnv-intel
We'll now repeat all the above steps, this time with PrgEnv-intel. There is no need to change directories yet; we'll create a PrgEnv-intel directory after switching environments.
Now, load the PrgEnv-intel environment:
module restore
module swap PrgEnv-cray PrgEnv-intel
module unload cray-libsci
Note that where possible, we want to avoid using module purge because it can unset some environment variables that we generally want to keep. So, instead we run module restore to restore the default environment (PrgEnv-cray) and then swap from PrgEnv-cray to PrgEnv-intel with module swap PrgEnv-cray PrgEnv-intel. Finally, we unload the cray-libsci package for the sake of simplicity. (As of 4/23/24, we are working through resolving a default versioning conflict between cray-libsci and PrgEnv-intel. If you need to use cray-libsci within PrgEnv-intel, please reach out to hpc-help@nrel.gov.)
Again, we can test which C compiler we're using with:
cc --version
Now, this should output something like:
[ohull@x1000c0s0b0n0 mpi-benchmarks]$ cc --version
Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230622)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nopt/nrel/apps/cpu_stack/compilers/02-24/spack/opt/spack/linux-rhel8-sapphirerapids/gcc-12.2.1/intel-oneapi-compilers-2023.2.0-hwdq5hei2obxznfjhtlav4mi5h5jd4zw/compiler/2023.2.0/linux/bin-llvm
Configuration file: /nopt/nrel/apps/cpu_stack/compilers/02-24/spack/opt/spack/linux-rhel8-sapphirerapids/gcc-12.2.1/intel-oneapi-compilers-2023.2.0-hwdq5hei2obxznfjhtlav4mi5h5jd4zw/compiler/2023.2.0/linux/bin-llvm/../bin/icx.cfg
Contrast this to when we ran cc --version in the PrgEnv-cray section. We're now using a different compiler (Intel oneAPI) under the hood.
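If you switch environments often, the same comparison can be automated by looking for the identifying strings in the two version banners shown in this tutorial. This is a rough sketch: wrapped_compiler is a hypothetical helper, and it only knows the two banners that appear here.

```shell
# Hypothetical helper: name the compiler family from `cc --version` text on stdin.
wrapped_compiler() {
    local text
    text=$(cat)
    case "$text" in
        *"Cray clang"*)      echo "cray" ;;
        *"Intel(R) oneAPI"*) echo "intel" ;;
        *)                   echo "unknown" ;;
    esac
}

# Banner lines taken from the two outputs shown in this tutorial.
echo "Cray clang version 14.0.4" | wrapped_compiler
echo "Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230622)" | wrapped_compiler
```

On Kestrel you would feed it the live output instead, for example cc --version 2>&1 | wrapped_compiler.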
We can now repeat the steps we took in the PrgEnv-cray section. Move up two directories and re-download the code:
cd ../../
mkdir PrgEnv-intel
cd PrgEnv-intel
git clone https://github.com/intel/mpi-benchmarks.git
cd mpi-benchmarks
and build it:
CC=cc CXX=CC CXXFLAGS="-std=c++11" make IMB-MPI1
Note that we specify the same compiler wrapper, cc, to be the C compiler (the CC=cc part of the line above), as we did in the PrgEnv-cray section. But cc now wraps around the Intel oneAPI C compiler, instead of the Cray C compiler. So, we will be building with a different compiler, even though the build command is identical!
Again, we can run with:
srun -N 1 -n 104 --mpi=pmi2 ./IMB-MPI1 AllReduce > out
Or check which libraries are dynamically linked:
ldd ./IMB-MPI1
Or, for MPI specifically:
[ohull@kl1 PrgEnv-intel]$ ldd ./IMB-MPI1 | grep mpi
libmpi_intel.so.12 => /opt/cray/pe/lib64/libmpi_intel.so.12 (0x00007f13f8f8f000)
Note that this MPI library is indeed still Cray MPICH. The name differs from the one in the PrgEnv-cray section because this is the Cray MPICH build that is compatible with Intel compilers rather than Cray compilers.
You can also submit this inside a Slurm submit script:
#!/bin/bash
#SBATCH --time=00:40:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=104
#SBATCH --account=<your allocation handle>
module restore
module swap PrgEnv-cray PrgEnv-intel
module unload cray-libsci
srun -N 1 --tasks-per-node=104 --mpi=pmi2 your/path/to/IMB-tutorial/PrgEnv-intel/mpi-benchmarks/IMB-MPI1 Allreduce > out
Note that the only differences between this submit script and the one for Environment 1 are the module commands that swap PrgEnv-cray for PrgEnv-intel and the path to the executable.
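Since the two PrgEnv submit scripts differ so little, they can be produced from one template. The generator below is a sketch, not an official workflow: write_prgenv_script is a hypothetical name, the arguments in the example call are placeholders, and running module restore in the PrgEnv-cray case is an assumption made so that both scripts start from the default environment.

```shell
# Hypothetical generator: print a submit script for PrgEnv-cray or PrgEnv-intel.
# Usage: write_prgenv_script <cray|intel> <allocation-handle> <path-to-IMB-MPI1>
write_prgenv_script() {
    local prgenv=$1 account=$2 exe=$3
    printf '%s\n' \
        '#!/bin/bash' \
        '#SBATCH --time=00:40:00' \
        '#SBATCH --nodes=1' \
        '#SBATCH --tasks-per-node=104' \
        "#SBATCH --account=$account" \
        '' \
        'module restore'   # assumption: always start from the default environment
    if [ "$prgenv" = "intel" ]; then
        # Same module commands as the PrgEnv-intel section of this tutorial.
        printf '%s\n' \
            'module swap PrgEnv-cray PrgEnv-intel' \
            'module unload cray-libsci'
    fi
    printf '%s\n' \
        '' \
        "srun -N 1 --tasks-per-node=104 --mpi=pmi2 $exe Allreduce > out"
}

# Placeholder arguments; substitute your own handle and path.
write_prgenv_script intel myhandle /path/to/IMB-MPI1
```

Redirecting the output to a file (for example write_prgenv_script intel myhandle /path/to/IMB-MPI1 > submit-IMB.in) gives a script that can be passed to sbatch.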
Environment 3: Intel Compilers and Intel MPI
We've now seen two examples using Cray's environments, PrgEnv-cray and PrgEnv-intel. Let's build IMB using one of NREL's environments, which are separate from Cray's.
First, go back to your IMB-tutorial directory and re-clone the code:
cd ../../
mkdir intel-intelMPI
cd intel-intelMPI
git clone https://github.com/intel/mpi-benchmarks.git
cd mpi-benchmarks
Then, load the NREL environment. To do this, first run:
module restore
module unload PrgEnv-cray
Again, we want to avoid module purge where possible, so we restore the environment to default (PrgEnv-cray) and then unload the default environment, in order to retain underlying environment variables.
Let's check out our options for Intel compilers now:
module avail intel
We should see a number of modules. Some correspond to applications built with an Intel toolchain (e.g. amr-wind/main-intel-oneapi-mpi-intel, whose name implies that amr-wind was built with the Intel oneAPI MPI and Intel compilers). Others correspond to the MPI (e.g. intel-oneapi-mpi/2021.8.0-intel) or the compilers themselves (e.g. intel-oneapi-compilers/2022.1.0).
Let's load Intel MPI and Intel compilers:
module load intel-oneapi
module load intel-oneapi-compilers
module load intel-oneapi-mpi
Note that if we look back at module avail intel and look at the header above, e.g., intel-oneapi, we can see that these intel modules live in /nopt/nrel/apps/cpu_stack/modules/default/compilers_mpi -- this is different than the PrgEnvs, which can be found in /opt/cray/pe/lmod/modulefiles/core. This is one way to tell that you are using NREL's set of modules and not Cray's set of modules.
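The path prefixes mentioned above are enough to tell the two sets of modules apart in a script. A minimal sketch, assuming only the two prefixes named in this section (module_realm is a hypothetical helper):

```shell
# Hypothetical helper: classify a modulefile directory by its path prefix.
module_realm() {
    case "$1" in
        /opt/cray/*)  echo "cray" ;;
        /nopt/nrel/*) echo "nrel" ;;
        *)            echo "unknown" ;;
    esac
}

# The two directories named in this section.
module_realm /nopt/nrel/apps/cpu_stack/modules/default/compilers_mpi
module_realm /opt/cray/pe/lmod/modulefiles/core
```

Any other prefix is reported as unknown rather than guessed.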
Now, we can build IMB with the intel compilers and Intel MPI:
CC=mpiicc CXX=mpiicpc CXXFLAGS="-std=c++11" make IMB-MPI1
Note that this command is slightly different than the make commands we saw in the PrgEnv-cray and PrgEnv-intel sections. Instead of CC=cc and CXX=CC, we have CC=mpiicc and CXX=mpiicpc. mpiicc is the Intel MPI wrapper around the Intel C compiler, and mpiicpc is the same but for C++.
Remember that warning about IMB-MPI1 being in the default path? This is now true, so be careful that when you run the package, you're running the version you just built, NOT the default path version.
If you're still inside your/path/to/IMB-tutorial/intel-intelMPI/mpi-benchmarks then we can run the command:
ldd ./IMB-MPI1 | grep mpi
This outputs something like:
[ohull@kl1 intel-intelMPI]$ ldd ./IMB-MPI1 | grep mpi
libmpicxx.so.12 => /nopt/nrel/apps/mpi/07-23/spack/opt/spack/linux-rhel8-icelake/intel-2021.6.0/intel-oneapi-mpi-2021.8.0-6pnag4mmmx6lvoczign5a4fslwvbgebb/mpi/2021.8.0/lib/libmpicxx.so.12 (0x00007f94e5e09000)
libmpifort.so.12 => /nopt/nrel/apps/mpi/07-23/spack/opt/spack/linux-rhel8-icelake/intel-2021.6.0/intel-oneapi-mpi-2021.8.0-6pnag4mmmx6lvoczign5a4fslwvbgebb/mpi/2021.8.0/lib/libmpifort.so.12 (0x00007f94e5a55000)
libmpi.so.12 => /nopt/nrel/apps/mpi/07-23/spack/opt/spack/linux-rhel8-icelake/intel-2021.6.0/intel-oneapi-mpi-2021.8.0-6pnag4mmmx6lvoczign5a4fslwvbgebb/mpi/2021.8.0/lib/release/libmpi.so.12 (0x00007f94e4138000)
We see a few more libraries than we saw with the PrgEnvs. For example, we now have libmpicxx, libmpifort, and libmpi, instead of just libmpi_intel or libmpi_cray, as was the case with the two PrgEnvs. We can see that our three MPI library dependencies point to the corresponding libraries in the NREL-built environments.
We can submit an IMB job with the following slurm script:
#!/bin/bash
#SBATCH --time=00:40:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=104
#SBATCH --account=<your allocation handle>
module restore
module unload PrgEnv-cray
module load intel-oneapi
module load intel-oneapi-compilers
module load intel-oneapi-mpi
srun -N 1 --tasks-per-node=104 /your/path/to/IMB-tutorial/intel-intelMPI/mpi-benchmarks/IMB-MPI1 Allreduce > out
Don't forget to replace /your/path/to/IMB-tutorial/intel-intelMPI/mpi-benchmarks/IMB-MPI1 with your actual path.
Environment 4: GCC and OpenMPI
Environment 4 works similarly to Environment 3, except that instead of the NREL-built Intel modules, we'll use GCC and Open MPI. Note that Open MPI is not recommended for multi-node jobs, because it is unstable on Cray Slingshot networks. You should only use Open MPI for single-node jobs.
Return to your IMB-tutorial directory and set up for gcc-openMPI:
cd ../../
mkdir gcc-openMPI
cd gcc-openMPI
git clone https://github.com/intel/mpi-benchmarks.git
cd mpi-benchmarks
Run:
module restore
module unload PrgEnv-cray
module unload cce
Note that unlike the NREL-intel case, loading gcc doesn't automatically unload cce (the "Cray Compiling Environment"), so we do it manually here with module unload cce.
Now, we can run module avail openmpi to find openmpi-related modules. Then, load the version of openmpi that was built with gcc:
module load openmpi/4.1.5-gcc
And finally, load gcc. To see which versions of gcc are available, type module avail gcc. We'll use GCC 10:
module load gcc/10.1.0
Now, we can build the code. Run the command:
CC=mpicc CXX=mpic++ CXXFLAGS="-std=c++11" make IMB-MPI1
Similar to using mpiicc and mpiicpc in the Environment 3 section, we now use mpicc and mpic++, because these are the Open MPI wrappers around the GCC C and C++ compilers (respectively). We are not using the cc and CC wrappers now because we are not using a PrgEnv.
Once the executable is built, check which MPI library it's using with ldd:
ldd ./IMB-MPI1 | grep libmpi
This command should return something like:
[ohull@x1007c7s7b0n0 mpi-benchmarks]$ ldd ./IMB-MPI1 | grep libmpi
libmpi.so.40 => /nopt/nrel/apps/mpi/07-23/spack/opt/spack/linux-rhel8-icelake/gcc-10.1.0/openmpi-4.1.5-s5tpzjd3y4scuw76cngwz44nuup6knjt/lib/libmpi.so.40 (0x00007f5e0c823000)
We see that libmpi is indeed pointing where we want it to: to the openmpi version of libmpi built with gcc-10.1.0.
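Having seen the ldd output for all four builds, the identifying strings can be collected into one check. The classifier below is a sketch that relies only on the library names and install paths shown in this tutorial (libmpi_cray, libmpi_intel, intel-oneapi-mpi, openmpi); classify_mpi is a hypothetical helper, and other systems may use different names.

```shell
# Hypothetical helper: guess the MPI implementation from `ldd ... | grep mpi` text on stdin.
classify_mpi() {
    local text
    text=$(cat)
    case "$text" in
        *libmpi_cray*|*libmpi_intel*) echo "Cray MPICH" ;;
        *intel-oneapi-mpi*)           echo "Intel MPI" ;;
        *openmpi*)                    echo "Open MPI" ;;
        *)                            echo "unknown" ;;
    esac
}

# Output line from the PrgEnv-intel section of this tutorial.
echo "libmpi_intel.so.12 => /opt/cray/pe/lib64/libmpi_intel.so.12 (0x00007f13f8f8f000)" | classify_mpi
```

For a real executable, the call would be ldd ./IMB-MPI1 | grep mpi | classify_mpi.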
Finally, we can submit an IMB job with the following slurm script:
#!/bin/bash
#SBATCH --time=00:40:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=104
#SBATCH --account=<your allocation handle>
module restore
module unload PrgEnv-cray
module unload cce
module load openmpi/4.1.5-gcc
module load gcc/10.1.0
srun -N 1 --tasks-per-node=104 /your/path/to/IMB-tutorial/gcc-openMPI/mpi-benchmarks/IMB-MPI1 Allreduce > out
Don't forget to replace /your/path/to/IMB-tutorial/gcc-openMPI/mpi-benchmarks/IMB-MPI1 with your actual path.
Final Words
With all four environments built, you could now run a few benchmarks comparing how MPI performs between them. Try this using 1 node and using 2 nodes, and compare the results for each environment. You should see that performance between all four environments is competitive on 1 node, but the two PrgEnv builds run a bit faster for large message sizes on 2 nodes, and the gcc/openmpi build is liable to randomly fail in the 2 node case.
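To compare runs, it helps to pull a single number out of each out file. The sketch below assumes that IMB result rows have five whitespace-separated columns (bytes, repetitions, t_min, t_max, t_avg, with times in microseconds) and that the last block in the file is the run with the most MPI tasks; check this against your own output before relying on it. imb_last_tavg is a hypothetical helper, and the sample rows are made-up numbers used only to exercise it.

```shell
# Hypothetical helper: print t_avg from the LAST row whose first column equals $1.
# Assumes rows look like: bytes repetitions t_min t_max t_avg
imb_last_tavg() {
    awk -v bytes="$1" 'NF == 5 && $1 == bytes { t = $5 } END { if (t != "") print t }'
}

# Made-up sample standing in for two result blocks of an `out` file.
sample='#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
4194304 10 100.00 120.00 110.00
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
4194304 10 300.00 340.00 320.00'
printf '%s\n' "$sample" | imb_last_tavg 4194304
```

Running imb_last_tavg 4194304 < out in each of the four build directories would give one large-message timing per environment to set side by side.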
Keeping track of the environments on Kestrel can be tricky at first. The key point to remember is that there are two separate "realms" of environments. The first is the Cray PrgEnvs, which use Cray MPICH and for which best practice is to use the cc, CC, and ftn compiler wrappers for C, C++, and Fortran, respectively. The second is the NREL-built environments, which function similarly to the environments on Eagle and use the more familiar compiler wrappers like mpiicc (for compiling C code with Intel compilers and Intel MPI) or mpicc (for compiling C code with gcc and Open MPI).
Earlier in the article, we mentioned the cray-mpich-abi module, which allows you to compile your code with a non-Cray MPICH-based MPI, like Intel MPI, and then run the code with Cray MPICH. We will include instructions for how to use this in an updated version of the tutorial.