Compile and run: Intel oneAPI compilers & MPI
Introduction
This page shows how to compile and run programs using Intel's oneAPI tool chain. We'll look at building with Intel MPI and the Intel Fortran and C compilers. It is also possible to build programs against Intel's MPI library while compiling with gfortran and gcc; this is covered as well.
Intel's C compiler icc has been around for many years. It is being retired and replaced with icx. As of summer 2023 you can still use icc, but it is scheduled to be removed by the end of the year, and building with icc produces a warning message. We'll discuss how to suppress the warning and, more importantly, how to build using icx.
Our example programs are hybrid MPI/OpenMP, so we'll show commands for building hybrid programs. If your program is pure MPI, the only change you need to make to the build process is to remove the -fopenmp compile option.
A sample makefile, source code, and run script for Kestrel can be found in our Kestrel Repo under the Toolchains folder. There are individual directories for source, makefiles, and scripts, or you can download the intel.tgz file containing all required files. The source differs slightly from what is shown here: there is an extra file, triad.c, that gets compiled along with the Fortran and C programs discussed below. This file does some "dummy" work so the programs run for a few seconds.
Module loads for compile
These are the modules you will need to compile:
module load intel-oneapi-compilers
module load intel-oneapi-mpi
module load gcc/13.1.0
The Intel compilers rely on parts of the GNU toolchain, so we load the gcc module to provide a newer version of that compiler.
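As a quick sanity check after the module loads, you can confirm that the newer gcc is the one on your PATH (exactly how the Intel compilers detect the GNU toolchain varies by compiler version, so treat this as a simple check rather than a guarantee):
which gcc        # should point at the gcc/13.1.0 module installation
gcc --version    # should report 13.1.0, not the older system gcc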
Module loads for run
Normally, builds are static, meaning that an application "knows" where to find its libraries, so we don't need to load the Intel modules at runtime. Unless you have some other external libraries that require a module load, the only module lines you will need are:
module purge
module load libfabric
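With only those modules loaded, you can verify that an executable resolves all of its libraries using ldd (ex_c here is one of the executables built below; any library that could not be found would be flagged):
ldd ./ex_c | grep "not found"    # no output means every shared library was resolved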
Building programs
As discussed above, we can build with the Intel (ifort, icc, icx) or GNU (gcc, gfortran) compilers underneath Intel MPI. The five options are:
- Fortran with: Intel MPI and Intel Fortran compiler
- C with: Intel MPI and Intel C compiler, older compiler (icc)
- C with: Intel MPI and Intel C compiler, newer compiler (icx)
- Fortran with: Intel MPI with gfortran Fortran compiler
- C with: Intel MPI with gcc C compiler
Here are the compile lines, where we add the -fopenmp option for OpenMP and the -O3 optimization flag.
1. Fortran with: Intel MPI and Intel Fortran compiler
mpiifort -O3 -g -fopenmp ex1.f90 -o ex_f
2a. C with: Intel MPI and Intel C compiler, older compiler (icc)
mpiicc -O3 -g -fopenmp ex1.c -o ex_c
This will produce the warning message:
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
We can suppress it by compiling with the extra flag:
mpiicc -diag-disable=10441 -O3 -g -fopenmp ex1.c -o ex_c
2b. The older compiler (icc) might not be available
Depending on the version of the compilers module loaded, the message shown above might be replaced with one saying that icc is no longer available. In this case you MUST use icx. There are two ways to do that, shown below.
3a. C with: Intel MPI and Intel C compiler, newer compiler (icx)
export I_MPI_CC=icx
mpiicc -O3 -g -fopenmp ex1.c -o ex_c
Setting this environment variable tells mpiicc to use icx (the newer Intel compiler) instead of icc.
3b. C with: Intel MPI and Intel C compiler, newer compiler (icx)
mpiicx -O3 -g -fopenmp ex1.c -o ex_c
Explicitly running mpiicx gives you icx as the backend compiler.
mpicc and mpif90 may not give you what you expect.
The commands mpicc and mpif90 actually call gcc and gfortran, not the Intel compilers. If you consider these the default way to compile MPI programs, then "by default" Intel MPI does not use the Intel compilers.
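You can check which backend compiler a wrapper will invoke by asking it to print its underlying compile command; the Intel MPI wrappers accept a -show option that displays the command without running it:
mpicc -show       # typically shows a gcc command line
mpif90 -show      # typically shows a gfortran command line
mpiicc -show      # typically shows an icc or icx command line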
4. Fortran with: Intel MPI with gfortran Fortran compiler
mpif90 -O3 -g -fopenmp ex1.f90 -o gex_f
5. C with: Intel MPI with gcc C compiler
mpicc -O3 -g -fopenmp ex1.c -o gex_c
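Conversely, if you prefer the mpicc and mpif90 wrapper names but want the Intel compilers underneath, the same kind of environment variable redirection should work (this sketch assumes Intel MPI's I_MPI_CC and I_MPI_F90 variables; verify the result with -show as shown above):
export I_MPI_CC=icx
export I_MPI_F90=ifort
mpicc -show       # should now show an icx command line
mpif90 -show      # should now show an ifort command line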
Example programs
We have two example MPI/OpenMP programs, ex1.c and ex1.f90. They are more or less identical in function. They first print MPI library and compiler information. For example, the Fortran example compiled with mpiifort reports:
Fortran MPI TASKS 4
Intel(R) MPI Library 2021.8 for Linux* OS
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel
For mpif90 we get:
Fortran MPI TASKS 4
Intel(R) MPI Library 2021.8 for Linux* OS
GCC version 13.1.0
Note that in these cases we have the same MPI library but different compilers.
The programs then call a routine, triad, which keeps the cores busy for about 4 seconds. This allows the OS to settle down. Then, for each MPI task and each OpenMP thread, we get a line of the form:
task 0001 is running on x9000c3s2b0n0 thread= 2 of 3 is on core 054
This says that MPI task 1 is running on node x9000c3s2b0n0, that the task has 3 OpenMP threads, and that thread 2 is running on core 54.
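The following minimal C sketch (not the actual ex1.c from the repo) shows one way such a report can be produced, using MPI_Get_library_version for the library string, omp_get_thread_num/omp_get_num_threads for the thread numbers, and sched_getcpu for the core:
#define _GNU_SOURCE
#include <stdio.h>
#include <mpi.h>
#include <omp.h>
#include <sched.h>

int main(int argc, char **argv) {
    int rank, nranks, namelen, verlen;
    char node[MPI_MAX_PROCESSOR_NAME], version[MPI_MAX_LIBRARY_VERSION_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    MPI_Get_processor_name(node, &namelen);
    MPI_Get_library_version(version, &verlen);

    if (rank == 0)
        printf("C MPI TASKS %d\n%s\n", nranks, version);

    /* the real programs call triad() here to keep the cores busy for a few seconds */

    #pragma omp parallel
    {
        /* sched_getcpu() reports the core the calling thread is currently running on */
        printf("task %4.4d is running on %s thread= %d of %d is on core %3.3d\n",
               rank, node, omp_get_thread_num(), omp_get_num_threads(), sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}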
Example makefile
The triad.c file contains the routine that keeps the cores busy for 4 seconds. It is common to both the Fortran and C versions of our codes. As discussed above, our main codes are ex1.c and ex1.f90. Our makefile will build for
1. Fortran with: Intel MPI and Intel Fortran compiler
3. C with: Intel MPI and Intel C compiler, newer compiler (icx)
4. Fortran with: Intel MPI with gfortran Fortran compiler
5. C with: Intel MPI with gcc C compiler
There are comments in the makefile to show how to build with
2. C with: Intel MPI and Intel C compiler, older compiler (icc)
The makefile has an interesting "trick". The default target is recurse. This target loads the modules and then calls make again, using the same makefile but with the targets intel and gnu. By using this "trick" you don't have to load modules before running make.
The targets intel and gnu each have a dependency that compiles triad with either the Intel or gcc compiler. Then the final applications are built with Intel MPI and, again, either the Intel or GNU compilers; a minimal sketch of this layout appears after the list below.
The final MPI codes are:
- gex_c : gcc
- gex_f : gfortran
- ex_c : Intel C (icx)
- ex_f : Intel Fortran (ifort)
Run script
Our run script does the following; a minimal sketch appears after this list.
- Makes a new directory, copies the required files, and goes there
- Does a make with output going into make.log
- Sets the number of MPI tasks and OpenMP threads
- Sets some environment variables to control and report on threads (discussed below)
- module commands
    - module purge
    - module load libfabric
- Creates a string with all of our srun options (discussed below)
- Calls srun on each version of our program
    - output goes to *.out
    - report on thread placement goes to *.info
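A minimal sketch of such a script, assuming a Slurm batch job (directory names, node counts, and file names here are illustrative; the actual script in the repo differs, and the individual pieces are explained below):
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --time=10:00
#SBATCH --job-name=oneapi_example

# Fresh working directory for this run
mkdir -p run.$SLURM_JOB_ID
cp ex1.c ex1.f90 triad.c makefile run.$SLURM_JOB_ID
cd run.$SLURM_JOB_ID

# Build; the makefile loads its own modules (see the recurse trick above)
make > make.log 2>&1

# Thread count and placement
export OMP_NUM_THREADS=3
export KMP_AFFINITY=scatter
export OMP_PROC_BIND=spread

# Runtime modules and srun options
module purge
module load libfabric
export BIND="--cpu-bind=v,cores"
OPTS="--mpi=pmi2 $BIND --threads-per-core=1 --tasks-per-node=2 --cpus-per-task=3"

# Run each version; program output to *.out, binding report (stderr) to *.info
for exe in ex_f ex_c gex_f gex_c ; do
    srun $OPTS ./$exe > $exe.out 2> $exe.info
done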
Our script sets the following OpenMP-related variables. The first is familiar. KMP_AFFINITY is unique to the Intel compilers; here it tells the runtime to scatter (spread out) our threads. OMP_PROC_BIND=spread does the same thing but is not specific to the Intel compilers, so in this case KMP_AFFINITY is actually redundant.
export OMP_NUM_THREADS=3
export KMP_AFFINITY=scatter
export OMP_PROC_BIND=spread
The next line
export BIND="--cpu-bind=v,cores"
is not technically used as an environment variable, but it will be used to create the srun command line. Passing --cpu-bind=v to srun causes it to report its binding information. The "cores" option tells srun to "Automatically generate masks binding tasks to cores." There are many other binding options, as described in the srun man page. This setting works well for many programs.
Our srun command line options for 2 tasks per node and 3 threads per task are:
--mpi=pmi2 --cpu-bind=v,cores --threads-per-core=1 --tasks-per-node=2 --cpus-per-task=3
- --mpi=pmi2 : tells srun to use a particular process launcher (this is optional)
- --cpu-bind=v,cores : discussed above
- --threads-per-core=1 : don't allow multiple threads to run on the same core. Without this option it is possible for multiple threads to end up on the same core, decreasing performance.
- --tasks-per-node=2 : the number of MPI tasks to launch on each node
- --cpus-per-task=3 : should always be equal to OMP_NUM_THREADS
The final thing the script does is produce a results report. This is just a list mapping MPI tasks and threads to nodes and cores. There should not be any repeats in the list. There will be "repeats" of cores, but on different nodes, and "repeats" of nodes, but with different cores.
You can change the values of --cpu-bind, OMP_PROC_BIND, and --threads-per-core to see whether this list changes.