Running Multiple Sub-Jobs with One Job Script
If your workload consists of serial or modestly parallel programs, you can run multiple instances of your program at the same time using different processor cores on a single node. This makes better use of your allocation because cores on the node that would otherwise sit idle are put to work.
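The underlying shell pattern is to start each instance in the background with `&` and then call `wait`. The following is a minimal sketch of that pattern only: `run_copies` is a hypothetical helper name, and the `sleep`/`echo` workload is a placeholder for a real program.

```shell
#!/bin/bash
# run_copies is a hypothetical helper, not part of Slurm or the cluster software.
# It starts N copies of a placeholder workload in the background, then waits.
run_copies() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        # Placeholder workload; replace with your own program.
        ( sleep 1; echo "copy $i finished" ) &
        i=$((i + 1))
    done
    wait    # block until every background copy has exited
    echo "all $n copies done"
}

run_copies 4
```

The copies run concurrently, so the "finished" lines may appear in any order; only the final line is guaranteed to come last.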
Example
For illustration, we use a simple C program that calculates pi. The source code and instructions for building that program are provided below:
Sample Program
Copy and paste the following into a terminal window that's connected to the cluster. This will stream the pasted contents into a file called pi.c using the command cat << eof > pi.c.
cat << eof > pi.c
#include <stdio.h>
// pi.c: A sample C code calculating pi
int main(void) {
    double x, h, sum = 0;
    int i, N;
    printf("Input number of iterations: ");
    scanf("%d", &N);
    h = 1.0 / (double) N;
    /* Midpoint rule for the integral of 4/(1+x^2) over [0,1] */
    for (i = 0; i < N; i++) {
        x = h * ((double) i + 0.5);
        sum += 4.0 * h / (1.0 + x * x);
    }
    printf("\nN=%d, PI=%.15f\n", N, sum);
    return 0;
}
eof
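The loop in pi.c corresponds to the midpoint rule applied to an integral whose exact value is pi. Written out, with h, N, and the sample points matching the variables in the code:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx
\;\approx\; \sum_{i=0}^{N-1} \frac{4h}{1+x_i^2},
\qquad h = \frac{1}{N},
\qquad x_i = h\left(i + \tfrac{1}{2}\right)
```

Larger values of N give a closer approximation at the cost of a longer run time, which is why the sub-jobs in this example take different amounts of time to finish.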
Compile the Code
This example uses the Intel C compiler. Load the module, compile pi.c, and run the executable once to check that it works:
$ module purge
$ module load intel-mpi
$ icc -O2 pi.c -o pi_test
$ ./pi_test
A sample batch job script file to run 8 copies of the pi_test program on a node with 24 processor cores is given below. This script creates 8 directories and starts 8 jobs, each in the background. It waits for all 8 jobs to complete before finishing.
Copy and paste the following into a text file and place that batch file in one of your directories on the cluster. Make sure to change the allocation to a project handle you belong to.
#!/bin/bash
## Required Parameters ##############################################
#SBATCH --time 10:00 # WALLTIME limit of 10 minutes
## Double ## will cause SLURM to ignore the directive:
#SBATCH -A <handle> # Account (replace with appropriate)
#SBATCH -n 8 # ask for 8 tasks
#SBATCH -N 1 # ask for 1 node
## Optional Parameters ##############################################
#SBATCH --job-name wait_test # name to display in queue
#SBATCH --output std.out
#SBATCH --error std.err
JOBNAME=$SLURM_JOB_NAME # re-use the job-name specified above
# Run 1 job per task
N_JOB=$SLURM_NTASKS # create as many jobs as tasks
for ((i=1; i<=N_JOB; i++))
do
    mkdir -p "$JOBNAME.run$i"          # Make a subdirectory for each job
    cd "$JOBNAME.run$i" || exit 1      # Go to job directory; stop if that fails
    echo "10*10^$i" | bc > input       # Make input file (N = 10*10^i)
    time ../pi_test < input > log &    # Run your executable, note the "&"
    cd ..
done
#Wait for all
wait
echo
echo "All done. Checking results:"
grep "PI" $JOBNAME.*/log
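A bare `wait`, as used in the script above, does not tell you which sub-job failed. One possible extension, shown here as a sketch rather than as part of the original script, is to record each background PID and wait on them one at a time. `run_and_check` is a hypothetical helper name, and the commands passed to it are placeholders.

```shell
#!/bin/bash
# run_and_check is a hypothetical helper. It runs each argument as a command
# in the background, then waits on each PID to collect its exit status.
run_and_check() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"          # $! is the PID of the most recent background job
    done
    failed=0
    for pid in $pids; do
        # wait <pid> returns the exit status of that specific background job
        if ! wait "$pid"; then
            failed=$((failed + 1))
        fi
    done
    echo "$failed sub-job(s) failed"
    [ "$failed" -eq 0 ]          # function succeeds only if nothing failed
}

# Placeholder commands: two succeed, one exits with a non-zero status.
run_and_check "true" "exit 3" "true" || echo "at least one sub-job failed"
```

In a real job script, the placeholder commands would be replaced by the per-directory runs of your executable.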
Submit the Batch Script
Use the following Slurm sbatch command to submit the script. The job will be scheduled, and you can view the output once the job completes to confirm the results.
$ sbatch -A <project-handle> <batch_file>
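If you want to capture the job ID at submission time and check the queue right away, something like the following sketch may help. `submit_and_show` is a hypothetical helper name. It assumes the account is already set by the `#SBATCH -A` line in the script, and it relies on the `--parsable` option of `sbatch` and the `-j` option of `squeue`.

```shell
#!/bin/bash
# submit_and_show is a hypothetical helper; pass it the path to your batch file.
submit_and_show() {
    batch_file=$1
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on a cluster login node" >&2
        return 1
    fi
    # --parsable makes sbatch print the job ID (optionally followed by ";cluster")
    jobid=$(sbatch --parsable "$batch_file") || return 1
    jobid=${jobid%%;*}           # drop the optional ";cluster" suffix
    echo "Submitted job $jobid"
    squeue -j "$jobid"           # show the job's current state in the queue
}

# Usage (on a login node): submit_and_show <batch_file>
```

Once the job has left the queue, the results of the final grep step should appear in std.out, as set by the `--output` directive in the script.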