nsrdb.utilities.cli.SlurmManager
- class SlurmManager(user=None, queue_dict=None)
Bases: SLURM
GAPs-compliant SLURM manager
- Parameters:
user (str | None) – HPC username. None will get your username using getpass.getuser()
queue_dict (dict | None) – Parsed HPC queue dictionary (qstat for PBS or squeue for SLURM) from parse_queue_str(). None will get the queue from PBS or SLURM.
Methods
change_qos(arg, qos) – Change the priority (quality of service) for a job.
check_status([job_id, job_name]) – Check the status of an HPC job using the HPC queue.
check_status_using_job_id(job_id) – Check the status of a job using the HPC queue and job ID.
format_walltime(hours) – Get the SLURM walltime string in format "HH:MM:SS".
hold(arg) – Temporarily hold a job from submitting.
make_path(d) – Make a directory tree if it doesn't exist.
make_sh(fname, script) – Make a shell script (.sh file) to execute a subprocess.
parse_queue_str(queue_str[, keys]) – Parse the qstat or squeue output string into a dict format keyed by integer job id with nested dictionary of job properties (queue printout columns).
query_queue([job_name, user, qformat, skip_rows]) – Run the HPC queue command and return the raw stdout string.
release(arg) – Release a job that was previously on hold so it will be submitted to a compute node.
rm(fname) – Remove a file.
s(s) – Format input as str w/ appropriate quote types for python cli entry.
sbatch(cmd[, alloc, walltime, memory, ...]) – Submit a SLURM job via sbatch command and SLURM shell script.
scancel(arg) – Cancel a SLURM job.
scontrol(cmd) – Submit an scontrol command.
submit(cmd[, background, background_stdout]) – Open a subprocess and submit a command.
Attributes
MAX_NAME_LEN
QCOL_ID
QCOL_NAME
QCOL_STATUS
QSKIP
SQ_FORMAT
USER
queue – Get the HPC queue parsed into dict format keyed by integer job id.
queue_job_ids – Get a list of the job integer ids in the queue.
queue_job_names – Get a list of the job names in the queue.
- check_status_using_job_id(job_id)
Check the status of a job using the HPC queue and job ID.
- Parameters:
job_id (int) – Job integer ID number.
- Returns:
status (str | None) – Queue job status string or None if not found.
- change_qos(arg, qos)
Change the priority (quality of service) for a job.
- Parameters:
arg (int | list | str) – SLURM integer job id(s) to change qos for. Can be ‘all’ for all jobs.
qos (str) – New qos value
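change_qos (like hold, release and scancel) accepts a single integer job id, a list of ids, or 'all'. The sketch below shows one way to resolve that argument into per-job scontrol commands. It is an illustration only: the helper names expand_job_ids and qos_commands are hypothetical, 'all' is assumed to mean every job id in the parsed queue dict (the real method may filter further, for example to pending jobs), and the command uses the standard `scontrol update JobId=<id> QOS=<qos>` syntax without actually running it.

```python
from typing import Dict, Iterable, List, Union

JobArg = Union[int, str, Iterable[int]]


def expand_job_ids(arg: JobArg, queue: Dict[int, dict]) -> List[int]:
    """Resolve an int, a list of ints, or 'all' into a list of job ids."""
    if isinstance(arg, str):
        if arg.lower() != "all":
            raise ValueError(f"Unsupported job argument: {arg!r}")
        # Assumption: 'all' means every job currently in the parsed queue
        return sorted(queue)
    if isinstance(arg, int):
        return [arg]
    return [int(job_id) for job_id in arg]


def qos_commands(arg: JobArg, qos: str, queue: Dict[int, dict]) -> List[str]:
    """Build one 'scontrol update' command string per job (not executed)."""
    return [f"scontrol update JobId={job_id} QOS={qos}"
            for job_id in expand_job_ids(arg, queue)]
```

For example, `qos_commands([101, 102], "high", queue)` yields one update command for each of the two jobs.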
- check_status(job_id=None, job_name=None)
Check the status of an HPC job using the HPC queue.
- Parameters:
job_id (int | None) – Job integer ID number (preferred input)
job_name (str | None) – Job name string.
- Returns:
status (str | NoneType) – Queue job status str or None if not found. SLURM status strings: PD, R, CG (pending, running, completing). PBS status strings: Q, R, C (queued, running, complete).
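A status lookup against the parsed queue dict (the structure returned by parse_queue_str) can be sketched as follows, with the job id taking precedence over the job name as the parameter notes suggest. This is a minimal sketch, not the class's implementation: the column names "NAME" and "ST" are hypothetical stand-ins for whatever QCOL_NAME and QCOL_STATUS hold, and an id that is not found returns None without falling back to the name.

```python
from typing import Dict, Optional

# Hypothetical column names; the real class keeps these in QCOL_NAME / QCOL_STATUS
QCOL_NAME = "NAME"
QCOL_STATUS = "ST"


def check_status(queue: Dict[int, dict],
                 job_id: Optional[int] = None,
                 job_name: Optional[str] = None) -> Optional[str]:
    """Return a job's queue status string, or None if the job is not found."""
    if job_id is not None:
        # The id is the preferred key: a direct dict lookup
        attrs = queue.get(int(job_id))
        return None if attrs is None else attrs[QCOL_STATUS]
    if job_name is not None:
        # Names are not unique keys, so scan and return the first match
        for attrs in queue.values():
            if attrs[QCOL_NAME] == job_name:
                return attrs[QCOL_STATUS]
    return None
```

Name matching is only as reliable as the name column in the queue printout; if the queue display truncates names, a long job name may not match.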
- static format_walltime(hours)
Get the SLURM walltime string in format “HH:MM:SS”
- Parameters:
hours (float | int) – Requested number of job hours.
- Returns:
walltime (str) – SLURM walltime request in format “HH:MM:SS”
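The hours-to-walltime conversion can be sketched as below. The rounding rule is an assumption (round to the nearest whole minute, seconds always "00"); the real method may round differently. Converting to total minutes before splitting means a value such as 0.999 hours carries cleanly into "01:00:00" rather than producing a 60-minute field.

```python
def format_walltime(hours: float) -> str:
    """Convert a number of hours into a SLURM "HH:MM:SS" walltime string."""
    # Assumption: round to the nearest whole minute; seconds are always "00"
    total_minutes = round(hours * 60)
    hh, mm = divmod(total_minutes, 60)
    return f"{hh:02d}:{mm:02d}:00"
```

For example, `format_walltime(1.5)` gives "01:30:00" and `format_walltime(48)` gives "48:00:00".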
- hold(arg)
Temporarily hold a job from submitting. Held jobs will stay in queue but will not get nodes until released.
- Parameters:
arg (int | list | str) – SLURM integer job id(s) to hold. Can be ‘all’ to hold all jobs.
- static make_path(d)
Make a directory tree if it doesn’t exist.
- Parameters:
d (str) – Directory tree to check and potentially create.
- static make_sh(fname, script)
Make a shell script (.sh file) to execute a subprocess.
- Parameters:
fname (str) – Name of the .sh file to create.
script (str) – Contents to be written into the .sh file.
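The two file helpers map naturally onto the standard library. The sketch below is an assumption about their behavior, not the package source: it has make_sh create missing parent directories via make_path, and it does not set the executable bit, since the docs do not say whether the real method does.

```python
import os


def make_path(d: str) -> None:
    """Create directory tree d if it does not already exist."""
    # exist_ok=True makes this a no-op for directories that already exist
    os.makedirs(d, exist_ok=True)


def make_sh(fname: str, script: str) -> None:
    """Write script to fname, creating parent directories as needed."""
    parent = os.path.dirname(fname)
    if parent:
        make_path(parent)
    with open(fname, "w") as f:
        f.write(script)
```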
- classmethod parse_queue_str(queue_str, keys=0)
Parse the qstat or squeue output string into a dict format keyed by integer job id with nested dictionary of job properties (queue printout columns).
- Parameters:
queue_str (str) – HPC queue output string (qstat for PBS or squeue for SLURM). Typically a space-delimited string with line breaks.
keys (list | int) – Argument to set the queue job attributes (column headers). This defaults to an integer which says which row index contains the space-delimited column headers. Can also be a list to explicitly set the column headers.
- Returns:
queue_dict (dict) – HPC queue parsed into dictionary format keyed by integer job id with nested dictionary of job properties (queue printout columns).
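The parsing described above, with keys given either as a header-row index or as an explicit column list, can be sketched like this. Assumptions: columns are whitespace-delimited with no embedded spaces, the first column is a plain integer job id (rows that do not start with one, such as a repeated header, are skipped), and each value is kept as a string. The real method may locate the id column through QCOL_ID instead.

```python
from typing import Dict, List, Union


def parse_queue_str(queue_str: str,
                    keys: Union[int, List[str]] = 0) -> Dict[int, dict]:
    """Parse whitespace-delimited queue output into {job_id: {column: value}}."""
    lines = [line for line in queue_str.splitlines() if line.strip()]
    if isinstance(keys, int):
        # keys is the index of the header row; data starts on the next row
        columns = lines[keys].split()
        data_lines = lines[keys + 1:]
    else:
        # Explicit column headers; every line is a candidate data row
        columns = list(keys)
        data_lines = lines
    queue: Dict[int, dict] = {}
    for line in data_lines:
        values = line.split()
        # Assumption: the first column is an integer job id
        if not values[0].isdigit():
            continue
        queue[int(values[0])] = dict(zip(columns, values))
    return queue
```

Applied to a squeue-style printout whose first row is "JOBID PARTITION NAME USER ST TIME", each job row becomes a nested dict keyed by those header names.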
- classmethod query_queue(job_name=None, user=None, qformat=None, skip_rows=None)
Run the HPC queue command and return the raw stdout string.
- Parameters:
job_name (str | None) – Optional to check the squeue for a specific job name (not limited to the 8 shown characters) or None to show user’s whole queue.
user (str | None) – HPC username. None will get your username using getpass.getuser()
qformat (str | None) – Queue format string specification. Changing this from the default (None) could have adverse effects!
skip_rows (int | list | None) – Optional row index values to skip.
- Returns:
stdout (str) – HPC queue output string. Can be split on line breaks to get list.
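On SLURM, the queue query boils down to an squeue command filtered by user and, optionally, job name. The sketch below only builds the command string and post-processes raw output; it does not run squeue. The names build_squeue_cmd, drop_rows and DEFAULT_FORMAT are hypothetical, the format string is an illustrative stand-in for the class's SQ_FORMAT (it uses the real squeue specifiers %i, %P, %j, %u, %t, %M), and skip_rows is assumed to mean line indices to drop.

```python
import getpass
import shlex
from typing import List, Optional, Union

# Hypothetical default; the real default format lives in SQ_FORMAT
DEFAULT_FORMAT = "%.15i %.20P %.30j %.10u %.4t %.12M"


def build_squeue_cmd(job_name: Optional[str] = None,
                     user: Optional[str] = None,
                     qformat: Optional[str] = None) -> str:
    """Build an squeue command for one user's jobs, optionally by job name."""
    user = user or getpass.getuser()
    qformat = qformat or DEFAULT_FORMAT
    # shlex.quote protects the space-containing format string in a shell
    cmd = f"squeue -u {shlex.quote(user)} --format={shlex.quote(qformat)}"
    if job_name is not None:
        cmd += f" -n {shlex.quote(job_name)}"
    return cmd


def drop_rows(stdout: str, skip_rows: Union[int, List[int], None] = None) -> str:
    """Remove the given line indices from raw queue output."""
    if skip_rows is None:
        return stdout
    if isinstance(skip_rows, int):
        skip_rows = [skip_rows]
    lines = stdout.split("\n")
    return "\n".join(line for i, line in enumerate(lines) if i not in skip_rows)
```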
- property queue
Get the HPC queue parsed into dict format keyed by integer job id
- Returns:
queue (dict) – HPC queue parsed into dictionary format keyed by integer job id with nested dictionary of job properties (queue printout columns).
- property queue_job_ids
Get a list of the job integer ids in the queue
- property queue_job_names
Get a list of the job names in the queue
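Both list properties can be derived directly from the parsed queue dict. A minimal sketch, assuming the job name sits under a hypothetical "NAME" column (the real class keeps the column label in QCOL_NAME):

```python
from typing import Dict, List

# Hypothetical column label; the real class keeps this in QCOL_NAME
QCOL_NAME = "NAME"


def queue_job_ids(queue: Dict[int, dict]) -> List[int]:
    """Integer job ids present in the parsed queue."""
    return list(queue.keys())


def queue_job_names(queue: Dict[int, dict]) -> List[str]:
    """Job names in the parsed queue, in the same order as the ids."""
    return [attrs[QCOL_NAME] for attrs in queue.values()]
```

Because Python dicts preserve insertion order, the two lists line up index by index.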
- release(arg)
Release a job that was previously on hold so it will be submitted to a compute node.
- Parameters:
arg (int | list | str) – SLURM integer job id(s) to release. Can be ‘all’ to release all jobs.
- static rm(fname)
Remove a file.
- Parameters:
fname (str) – Filename (with path) to remove.
- static s(s)
Format input as str w/ appropriate quote types for python cli entry.
Examples
list, tuple -> "['one', 'two']"
dict -> "{'key': 'val'}"
int, float, None -> '0'
str, other -> 'string'
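The quoting rules in these examples can be reproduced with three type checks. This is a sketch consistent with the documented examples, not the library's source: containers are wrapped in double quotes so their single-quoted items survive as one shell argument, numbers and None are emitted bare, and everything else is wrapped in single quotes.

```python
def s(value) -> str:
    """Format a value for embedding in a python CLI call string."""
    if isinstance(value, (list, tuple, dict)):
        # Containers: double-quote the repr so single-quoted items survive
        return '"{}"'.format(value)
    if isinstance(value, (int, float, type(None))):
        # Numbers and None: emit bare, unquoted
        return "{}".format(value)
    # Strings and anything else: wrap in single quotes
    return "'{}'".format(value)
```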
- sbatch(cmd, alloc=None, walltime=None, memory=None, nodes=1, feature=None, name='reV', stdout_path='./stdout', keep_sh=False, conda_env=None, module=None, module_root='/shared-projects/rev/modulefiles')
Submit a SLURM job via sbatch command and SLURM shell script
- Parameters:
cmd (str) – Command to be submitted in SLURM shell script. Example: 'python -m reV.generation.cli_gen'
alloc (str) – HPC project (allocation) handle, e.g. 'rev'. The default (None) omits the allocation request, which does not work on Eagle's SLURM.
walltime (float) – Node walltime request in hours. The default (None) omits the walltime request, which does not work on Eagle's SLURM.
memory (int) – Node memory request in GB.
nodes (int) – Number of nodes to use for this sbatch job. Default is 1.
feature (str) – Additional flags for SLURM job. Format is "--qos=high" or "--depend=[state:job_id]". Default is None.
name (str) – SLURM job name.
stdout_path (str) – Path to print .stdout and .stderr files.
keep_sh (bool) – Boolean to keep the .sh files. Default is to remove these files after job submission.
conda_env (str) – Conda environment to activate
module (str) – Module to load.
module_root (str) – Path to module root to load
- Returns:
out (str) – sbatch standard output. If the job was submitted successfully, this is the SLURM job ID.
err (str) – sbatch standard error, this is typically an empty string if the job was submitted successfully.
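The shell script that sbatch submits is a bash script whose header lines are #SBATCH directives. The sketch below assembles such a script from a subset of the parameters above; it is not the package's script template. Assumptions: memory (GB) is passed with sbatch's "G" suffix, walltime is rounded to whole minutes, stdout/stderr file names follow a hypothetical "{name}_%j" pattern (%j is expanded by SLURM to the job id), and conda/module handling is omitted. The parse_job_id helper reads the id from sbatch's usual "Submitted batch job <id>" message, which is where the job id returned in out ultimately comes from.

```python
import re
from typing import Optional


def build_slurm_script(cmd: str, name: str = "reV",
                       alloc: Optional[str] = None,
                       walltime: Optional[float] = None,
                       memory: Optional[int] = None, nodes: int = 1,
                       feature: Optional[str] = None,
                       stdout_path: str = "./stdout") -> str:
    """Assemble a SLURM batch script from sbatch-style arguments."""
    lines = ["#!/bin/bash"]
    if alloc:
        lines.append(f"#SBATCH --account={alloc}")
    if walltime:
        # Convert hours to HH:MM:SS, rounding to whole minutes
        hh, mm = divmod(round(walltime * 60), 60)
        lines.append(f"#SBATCH --time={hh:02d}:{mm:02d}:00")
    if memory:
        # Assumption: memory is in GB, passed with sbatch's G suffix
        lines.append(f"#SBATCH --mem={memory}G")
    lines.append(f"#SBATCH --nodes={nodes}")
    lines.append(f"#SBATCH --job-name={name}")
    # Hypothetical naming pattern; SLURM replaces %j with the job id
    lines.append(f"#SBATCH --output={stdout_path}/{name}_%j.o")
    lines.append(f"#SBATCH --error={stdout_path}/{name}_%j.e")
    if feature:
        # Extra flags such as "--qos=high" become their own directive
        lines.append(f"#SBATCH {feature}")
    lines.append(cmd)
    return "\n".join(lines) + "\n"


def parse_job_id(sbatch_stdout: str) -> Optional[int]:
    """Extract the job id from output like 'Submitted batch job 123'."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    return int(match.group(1)) if match else None
```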
- scancel(arg)
Cancel a SLURM job.
- Parameters:
arg (int | list | str) – SLURM integer job id(s) to cancel. Can be a list of integer job ids, ‘all’ to cancel all jobs, or a feature (e.g. -p short) to cancel all jobs with a given feature.
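The three accepted forms of arg translate into scancel invocations roughly as sketched below. The helper scancel_command is hypothetical and only builds a string. Assumptions: a list becomes a single scancel call (scancel accepts several job ids), 'all' becomes `scancel -u <user>`, and any other string is treated as a filter such as "-p short" and scoped to the current user with -u so it cannot reach other users' jobs. The real method may instead iterate over the parsed queue.

```python
import getpass
from typing import Iterable, Optional, Union


def scancel_command(arg: Union[int, str, Iterable[int]],
                    user: Optional[str] = None) -> str:
    """Translate the scancel argument convention into a command string."""
    if isinstance(arg, int):
        return f"scancel {arg}"
    if isinstance(arg, str):
        user = user or getpass.getuser()
        if arg.lower() == "all":
            # Assumption: 'all' means all of this user's jobs
            return f"scancel -u {user}"
        # Assumption: other strings are filter flags, scoped to the user
        return f"scancel -u {user} {arg}"
    job_ids = [str(int(job_id)) for job_id in arg]
    if not job_ids:
        raise ValueError("No job ids given to cancel")
    return "scancel " + " ".join(job_ids)
```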
- static scontrol(cmd)
Submit an scontrol command.
- Parameters:
cmd (str) – Command string that follows the word “scontrol”.
- static submit(cmd, background=False, background_stdout=False)
Open a subprocess and submit a command.
- Parameters:
cmd (str) – Command to be submitted using python subprocess.
background (bool) – Flag to submit subprocess in the background. stdout and stderr will be empty strings if this is True.
background_stdout (bool) – Flag to capture the stdout/stderr from the background process in a nohup.out file.
- Returns:
stdout (str) – Subprocess standard output. This is decoded from the subprocess stdout with rstrip.
stderr (str) – Subprocess standard error. This is decoded from the subprocess stderr with rstrip. After decoding/rstrip, this will be empty if the subprocess doesn’t return an error.
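The foreground/background behavior described above can be sketched with the standard subprocess module. This is an approximation, not the package's implementation: commands run through the shell (shell=True, so only pass trusted strings), foreground output is decoded and right-stripped as documented, and the background branch discards output and returns two empty strings. The background_stdout option (capturing output in a nohup.out file) is not modeled.

```python
import subprocess
from typing import Tuple


def submit(cmd: str, background: bool = False) -> Tuple[str, str]:
    """Run a shell command and return (stdout, stderr), decoded and rstripped."""
    if background:
        # Fire and forget: output is discarded, so both strings are empty
        subprocess.Popen(cmd, shell=True,
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return "", ""
    proc = subprocess.run(cmd, shell=True,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return (proc.stdout.decode("utf-8").rstrip(),
            proc.stderr.decode("utf-8").rstrip())
```

For example, `submit("echo hello")` returns ("hello", ""), and an empty stderr string is the usual sign that the command did not report an error.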