
Running on Swift#

Please see the Modules page for information about setting up your environment and loading modules.

Login nodes#

swift.hpc.nrel.gov
swift-login-1.hpc.nrel.gov

swift.hpc.nrel.gov is a round-robin alias that will connect you to any available login node.
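
For example (replace <username> with your HPC username):

ssh <username>@swift.hpc.nrel.gov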

SSH Keys#

User accounts are provisioned with a default SSH keypair, cluster and cluster.pub. The SSH config file will use these keys even if you generate a new keypair with ssh-keygen. If you add your keys to GitHub or elsewhere, either use cluster.pub or modify the config file to point at the new keypair.
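
For example, a host entry in ~/.ssh/config can point GitHub at a newly generated keypair (the key file name below is only illustrative):

Host github.com
    IdentityFile ~/.ssh/id_ed25519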

Slurm and Partitions#

The most up-to-date list of partitions can always be found by running the sinfo command on the cluster.

| Partition | Description |
| --------- | ----------- |
| long      | Jobs up to ten days of walltime |
| standard  | Jobs up to two days of walltime |
| gpu       | Nodes with four NVIDIA A100 40GB GPUs, up to two days of walltime |
| parallel  | Optimized for large parallel jobs, up to two days of walltime |
| debug     | Two nodes reserved for short tests, up to four hours of walltime |

Each partition also has a matching -standby partition. Allocations which have consumed all awarded AUs for the year may only submit jobs to these partitions, and their default QoS will be set to standby. Jobs in standby partitions will be scheduled when there are otherwise idle cycles and no other non-standby jobs are available. Jobs that run in the standby queue will not be charged any AUs.

Any allocation may submit a job to a standby QoS, even if there are unspent AUs.

By default, nodes can be shared between users. To get exclusive access to a node use the --exclusive flag in your sbatch script or on the sbatch command line.

Important

Use --cpus-per-task with srun/sbatch; otherwise, some applications may only utilize a single core.
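
A minimal sketch of the relevant directives in an sbatch script (the task and core counts are only examples):

#SBATCH --exclusive          # whole-node access instead of sharing the node
#SBATCH --ntasks=4           # number of MPI tasks
#SBATCH --cpus-per-task=16   # cores per task, so threaded code is not limited to one core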

GPU Nodes#

Swift now has ten GPU nodes. Each GPU node has 4 NVIDIA A100 40GB GPUs, 96 CPU cores, and 1TB RAM.

GPU nodes are also shared, meaning that less than a full node may be requested for a job, leaving the remainder of the node for use by other jobs concurrently. (See the section below on AU Charges for how this affects the AU usage rate.)

To request use of a GPU, use the flag --gres=gpu:<quantity> with sbatch, srun, or salloc, or add it as an #SBATCH directive in your sbatch submit script, where <quantity> is a number from 1 to 4.
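
For example, to request two GPUs for a one-hour interactive session (the account name and time limit are placeholders):

salloc --account=<myaccount> --partition=gpu --gres=gpu:2 --time=01:00:00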

CPU Core and RAM Defaults on GPU Nodes#

If your job will require more than the default 1 CPU core and 1.5GB RAM, you must request the quantity of cores and/or RAM that you will need using additional flags such as --ntasks= or --mem=. See the Slurm Job Scheduling section for details on requesting additional resources.
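
A sketch of a submit-script header requesting one GPU plus additional cores and memory (the quantities are only examples):

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --ntasks=24        # 24 CPU cores instead of the default 1
#SBATCH --mem=100G         # 100GB RAM instead of the default 1.5GB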

Allocation Unit (AU) Charges#

The equation for calculating the AU cost of a job on Swift is:

AU cost = (Walltime in hours * Number of Nodes * QoS Factor * Charge Factor)

The Walltime is the actual length of time that the job runs, in hours or fractions thereof.

The Number of nodes can be whole nodes or fractions of a node. See below for more information.

The Charge Factor for Swift CPU nodes is 5.

The Charge Factor for Swift GPU nodes is 50, or 12.5 per GPU.

The QoS Factor for normal priority jobs is 1.

The QoS Factor for high-priority jobs is 2.

The QoS Factor for standby priority jobs is 0. There is no AU cost for standby jobs.

One CPU node for one hour of walltime at normal priority costs 5 AU total.

One CPU node for one hour of walltime at high priority costs 10 AU total.

One GPU for one hour of walltime at normal priority costs 12.5 AU total.

Four GPUs for one hour of walltime at normal priority costs 50 AU total.

Shared/Fractional CPU Nodes#

Swift allows jobs to share nodes, meaning fractional allocations are possible.

Standard (CPU) compute nodes have 128 CPU cores and 256GB RAM.

When a job only requests part of a node, usage is tracked on the basis of:

1 core = 2GB RAM = 1/128th of a node

A job that uses all of the resources on a single node, whether CPU, RAM, or both, maxes out at 128/128 = 1 full node.

Whichever resource represents the largest fraction of the node determines the total AU charge.

For example, a job that requests 64 cores and 128GB RAM (one half of a node) would be:

1 hour walltime * 0.5 nodes * 1 QoS Factor * 5 Charge Factor = 2.5 AUs for that hour.
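
The same rule can be sketched with a shell one-liner (this simply mirrors the description above; it is not an official accounting tool):

# node fraction = max(cores/128, mem_GB/256); AU/hr = fraction * QoS factor * 5
cores=64; mem_gb=128; qos=1
awk -v c=$cores -v m=$mem_gb -v q=$qos \
    'BEGIN { f = c/128; if (m/256 > f) f = m/256; print f * q * 5, "AU per hour" }'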

Shared/Fractional GPU Nodes#

Jobs on Swift may also share GPU nodes.

Standard GPU nodes have 96 CPU cores, four NVIDIA A100 40GB GPUs, and 1TB RAM.

You may request 1, 2, 3, or 4 GPUs per GPU node, as well as any additional CPU and RAM required.

Usage is tracked on the basis of:

1 GPU = 25% of total cores (24/96) = 25% of total RAM (256GB/1TB) = 25% of a node

Whichever resource (GPUs, cores, or RAM) represents the largest fraction of the node determines the total AU charge.

AU Calculation Examples#

AU calculations are performed automatically between the Slurm scheduler and Lex (NREL's web-based allocation tracking/management software). The following calculations are approximations only, intended to illustrate how AUs are consumed based on your job resource requests:

A request of 1 GPU, up to 24 CPU cores, and up to 256GB RAM will be charged at 12.5 AU/hr:

  • 1/4 GPUs = 25% total GPUs = 50 AU * 0.25 = 12.5 AU (this is what will be charged)
  • 1 core = 1% total cores = 50 AU * 0.01 = 0.50 AU (ignored)
  • 1GB/1TB = 0.1% total RAM = 50 AU * 0.001 = 0.05 AU (ignored)

A request of 1 GPU, 48 CPU cores, and 100GB RAM will be charged at 25 AU/hr:

  • 1/4 GPUs = 25% total GPUs = 50 AU * 0.25 = 12.5 AU (ignored)
  • 48/96 cores = 50% total cores = 50 AU * 0.5 = 25 AU (this is what will be charged)
  • 100GB/1TB = 10% total RAM = 50 AU * 0.10 = 5 AU (ignored)

A request of 2 GPUs, 55 CPU cores, and 200GB RAM will be charged at approximately 28.7 AU/hr:

  • 2/4 GPUs = 50% total GPUs = 50 AU * 0.5 = 25 AU (ignored)
  • 55/96 cores = 57.3% of total cores = 50 AU * 0.573 = 28.65 AU (this is what will be charged)
  • 200GB/1TB = 20% total RAM = 50 AU * 0.2 = 10 AU (ignored)

A request of 1 GPU, 1 CPU core, and 1TB RAM will be charged at 50 AU/hr:

  • 1/4 GPUs = 25% total GPUs = 50 AU * 0.25 = 12.5 AU (ignored)
  • 1/96 cores = 1% total cores = 50 AU * 0.01 = 0.50 AU (ignored)
  • 1TB/1TB = 100% total RAM = 50 AU * 1 = 50 AU (this is what will be charged)
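
The same maximum-of-fractions rule can be sketched for GPU nodes (again, an approximation of the accounting described above, not an official tool):

# GPU node fraction = max(gpus/4, cores/96, mem_GB/1024); AU/hr = fraction * 50
gpus=2; cores=55; mem_gb=200
awk -v g=$gpus -v c=$cores -v m=$mem_gb \
    'BEGIN { f = g/4; if (c/96 > f) f = c/96; if (m/1024 > f) f = m/1024; print f * 50, "AU per hour" }'

This reproduces the 2-GPU example above (roughly 28.65 AU per hour).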

Software Environments and Example Files#

Multiple software environments are available on Swift, each providing a number of commonly used modules including compilers, common build tools, AMD-optimized libraries, and some analysis tools. The environments live in date-stamped subdirectories under /nopt/nrel/apps. Each environment directory contains a file myenv.*; sourcing that file enables the environment.

When you log in, you will have access to the default environment, and its myenv file will already have been sourced for you. You can see the directory containing the environment by running the module avail command.
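
For example, to list the available environments and source one by hand (the date stamp below is only an example; pick a directory that actually exists):

ls /nopt/nrel/apps
source /nopt/nrel/apps/210928a/myenv.*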

In the directory for an environment you will see a subdirectory example. This contains a makefile for a simple 'hello world' program written in both Fortran and C. The README.md file contains additional information, most of which is replicated here. It is suggested that you copy the example directory to your home directory for experimentation:

cp -r example ~/example
cd ~/example

Conda#

There is a very basic version of conda in the "anaconda" directory in each /nopt/nrel/apps/YYMMDDa directory. However, there is a more complete environment pointed to by the module under /nopt/nrel/apps/modules. Please see our Conda Documentation for more information.
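
Module names vary between environments, so it is safest to search for the conda-related modules rather than assume a particular name:

module avail 2>&1 | grep -i conda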

Simple batch script#

Here is a sample batch script, runopenmpi, for running the 'hello world' example programs.

#!/bin/bash
#SBATCH --job-name="install"
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --exclusive
#SBATCH --account=<myaccount>
#SBATCH --partition=debug
#SBATCH --time=00:01:00


cat $0

#These should be loaded before doing a make
module load gcc  openmpi 

export OMP_NUM_THREADS=2
srun  -n 4 ./fhostone -F
srun  -n 4 ./phostone -F

To run this, replace <myaccount> with the appropriate account and ensure that Slurm is in your path by running:

module load slurm

Then submit the sbatch script (it already specifies the debug partition) with:

sbatch runopenmpi

Building the 'hello world' example#

For the script above to work, you must first build the application. You need to:

  1. Load the modules
  2. Run make

Loading the modules#

We are going to use gnu compilers with OpenMPI.

ml gcc openmpi

Run make#

make

Full procedure#

[nrmc2l@swift-login-1 ~]$ cd ~
[nrmc2l@swift-login-1 ~]$ mkdir example
[nrmc2l@swift-login-1 ~]$ cd ~/example
[nrmc2l@swift-login-1 ~]$ cp -r /nopt/nrel/apps/210928a/example/* .

[nrmc2l@swift-login-1 ~ example]$ cat runopenmpi 
#!/bin/bash
#SBATCH --job-name="install"
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --exclusive
#SBATCH --account=<myaccount>
#SBATCH --partition=debug
#SBATCH --time=00:01:00


cat $0

#These should be loaded before doing a make:
module load gcc  openmpi 

export OMP_NUM_THREADS=2
srun  -n 4 ./fhostone -F
srun  -n 4 ./phostone -F


[nrmc2l@swift-login-1 ~ example]$ module load gcc  openmpi
[nrmc2l@swift-login-1 ~ example]$ make
mpif90 -fopenmp fhostone.f90 -o fhostone
rm getit.mod  mympi.mod  numz.mod
mpicc -fopenmp phostone.c -o phostone
[nrmc2l@swift-login-1 ~ example]$ sbatch runopenmpi
Submitted batch job 187
[nrmc2l@swift-login-1 ~ example]$ 

Results#

[nrmc2l@swift-login-1 example]$ cat *312985*
#!/bin/bash
#SBATCH --job-name="install"
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --exclusive
#SBATCH --partition=debug
#SBATCH --time=00:01:00


cat $0

#These should be loaded before doing a make
module load gcc  openmpi 

export OMP_NUM_THREADS=2
srun  -n 4 ./fhostone -F
srun  -n 4 ./phostone -F

MPI Version:Open MPI v4.1.1, package: Open MPI nrmc2l@swift-login-1.swift.hpc.nrel.gov Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
task    thread             node name  first task    # on node  core
0002      0000                 c1-31        0002         0000   018
0000      0000                 c1-30        0000         0000   072
0000      0001                 c1-30        0000         0000   095
0001      0000                 c1-30        0000         0001   096
0001      0001                 c1-30        0000         0001   099
0002      0001                 c1-31        0002         0000   085
0003      0000                 c1-31        0002         0001   063
0003      0001                 c1-31        0002         0001   099
0001      0000                 c1-30        0000         0001  0097
0001      0001                 c1-30        0000         0001  0103
0003      0000                 c1-31        0002         0001  0062
0003      0001                 c1-31        0002         0001  0103
MPI VERSION Open MPI v4.1.1, package: Open MPI nrmc2l@swift-login-1.swift.hpc.nrel.gov Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021
task    thread             node name  first task    # on node  core
0000      0000                 c1-30        0000         0000  0072
0000      0001                 c1-30        0000         0000  0020
0002      0000                 c1-31        0002         0000  0000
0002      0001                 c1-31        0002         0000  0067
[nrmc2l@swift-login-1 example]$ 

Building with Intel Fortran or Intel C and OpenMPI#

You can build parallel programs using OpenMPI and the Intel Fortran ifort and Intel C icc compilers.

The example programs were built with the GNU compilers and OpenMPI using the lines:

[nrmc2l@swift-login-1 ~ example]$ mpif90 -fopenmp fhostone.f90 -o fhostone
[nrmc2l@swift-login-1 ~ example]$ mpicc -fopenmp phostone.c -o phostone

This gives us:

[nrmc2l@swift-login-1 ~ example]$ ls -l fhostone
-rwxrwxr-x. 1 nrmc2l nrmc2l 42128 Jul 30 13:36 fhostone
[nrmc2l@swift-login-1 ~ example]$ ls -l phostone
-rwxrwxr-x. 1 nrmc2l nrmc2l 32784 Jul 30 13:36 phostone

Note the sizes of the executable files.

If you want to use the Intel compilers, first load the appropriate modules:

module load openmpi intel-oneapi-compilers gcc

Then we can set the variables OMPI_FC=ifort and OMPI_CC=icc, and recompile:

[nrmc2l@swift-login-1 ~ example]$ export OMPI_FC=ifort
[nrmc2l@swift-login-1 ~ example]$ export OMPI_CC=icc
[nrmc2l@swift-login-1 ~ example]$ mpif90 -fopenmp fhostone.f90 -o fhostone
[nrmc2l@swift-login-1 ~ example]$ mpicc -fopenmp phostone.c -o phostone


[nrmc2l@swift-login-1 ~ example]$ ls -lt fhostone
-rwxrwxr-x. 1 nrmc2l nrmc2l 41376 Jul 30 13:37 fhostone
[nrmc2l@swift-login-1 ~ example]$ ls -lt phostone
-rwxrwxr-x. 1 nrmc2l nrmc2l 32200 Jul 30 13:37 phostone
[nrmc2l@swift-login-1 ~ example]$ 

Note that the sizes of the executable files have changed. You can also see the difference by running the commands:

nm fhostone | grep intel | wc
nm phostone | grep intel | wc

on the two versions of the program. This counts the Intel-specific symbols in each executable: 51 and 36, compared to 0 for the GNU-built versions.

Building and Running with Intel MPI#

We can also build with the Intel version of MPI, assuming icc and ifort as the backend compilers. Load the modules:

ml gcc
ml intel-oneapi-compilers
ml intel-oneapi-mpi

Then, build and run the same example as above:

make clean
make PFC=mpiifort PCC=mpiicc 

Giving us:

[nrmc2l@swift-login-1 example]$ ls -lt fhostone phostone
-rwxrwxr-x. 1 nrmc2l hpcapps 160944 Aug  5 16:14 phostone
-rwxrwxr-x. 1 nrmc2l hpcapps 952352 Aug  5 16:14 fhostone
[nrmc2l@swift-login-1 example]$ 

We need to make some changes to our batch script. Replace the module load line with:

module load intel-oneapi-mpi intel-oneapi-compilers gcc

Our Intel MPI batch script, runintel in the example directory, is:

#!/bin/bash
#SBATCH --job-name="install"
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --exclusive
#SBATCH --account=<myaccount>
#SBATCH --partition=debug
#SBATCH --time=00:01:00


cat $0

#These should be loaded before doing a make
module load intel-oneapi-mpi intel-oneapi-compilers gcc

export OMP_NUM_THREADS=2
srun  -n 4 ./fhostone -F
srun  -n 4 ./phostone -F

This produces the following output:

MPI Version:Intel(R) MPI Library 2021.3 for Linux* OS

task    thread             node name  first task    # on node  core
0000      0000                 c1-32        0000         0000   127
0000      0001                 c1-32        0000         0000   097
0001      0000                 c1-32        0000         0001   062
0001      0001                 c1-32        0000         0001   099

MPI VERSION Intel(R) MPI Library 2021.3 for Linux* OS

task    thread             node name  first task    # on node  core
0000      0000                 c1-32        0000         0000  0127
0000      0001                 c1-32        0000         0000  0097
0001      0000                 c1-32        0000         0001  0127
0001      0001                 c1-32        0000         0001  0099

VASP, Jupyter, Julia, and Other Applications on Swift#

Please see the relevant page in the Applications section for more information on using applications on Swift and other NREL clusters.