sup3r.utilities.cli.SlurmManager

class SlurmManager(user=None, queue_dict=None)

Bases: SLURM

GAPs-compliant SLURM manager

Parameters:
  • user (str | None) – HPC username. None will get your username using getpass.getuser()

  • queue_dict (dict | None) – Parsed HPC queue dictionary (qstat for PBS or squeue for SLURM) from parse_queue_str(). None will get the queue from PBS or SLURM.

Methods

change_qos(arg, qos)

Change the priority (quality of service) for a job.

check_status([job_id, job_name])

Check the status of an HPC job using the HPC queue.

check_status_using_job_id(job_id)

Check the status of a job using the HPC queue and job ID.

format_walltime(hours)

Get the SLURM walltime string in format "HH:MM:SS"

hold(arg)

Temporarily hold a job from submitting.

make_path(d)

Make a directory tree if it doesn't exist.

make_sh(fname, script)

Make a shell script (.sh file) to execute a subprocess.

parse_queue_str(queue_str[, keys])

Parse the qstat or squeue output string into a dict format keyed by integer job id with nested dictionary of job properties (queue printout columns).

query_queue([job_name, user, qformat, skip_rows])

Run the HPC queue command and return the raw stdout string.

release(arg)

Release a job that was previously on hold so it will be submitted to a compute node.

rm(fname)

Remove a file.

s(s)

Format input as a str with appropriate quote types for Python CLI entry.

sbatch(cmd[, alloc, walltime, memory, ...])

Submit a SLURM job via sbatch command and SLURM shell script

scancel(arg)

Cancel a slurm job.

scontrol(cmd)

Submit an scontrol command.

submit(cmd[, background, background_stdout])

Open a subprocess and submit a command.

Attributes

MAX_NAME_LEN

QCOL_ID

QCOL_NAME

QCOL_STATUS

QSKIP

SQ_FORMAT

USER

queue

Get the HPC queue parsed into dict format keyed by integer job id

queue_job_ids

Get a list of the job integer ids in the queue

queue_job_names

Get a list of the job names in the queue

check_status_using_job_id(job_id)

Check the status of a job using the HPC queue and job ID.

Parameters:

job_id (int) – Job integer ID number.

Returns:

status (str | None) – Queue job status string or None if not found.

change_qos(arg, qos)

Change the priority (quality of service) for a job.

Parameters:
  • arg (int | list | str) – SLURM integer job id(s) to change qos for. Can be ‘all’ for all jobs.

  • qos (str) – New qos value

check_status(job_id=None, job_name=None)

Check the status of an HPC job using the HPC queue.

Parameters:
  • job_id (int | None) – Job integer ID number (preferred input)

  • job_name (str | None) – Job name string. Used when no job_id is given.

Returns:

status (str | None) – Queue job status string, or None if not found. SLURM status strings: PD, R, CG (pending, running, completing). PBS status strings: Q, R, C (queued, running, complete).
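The lookup described above can be sketched on a plain dictionary. This is a minimal illustration, not the class's implementation: the function name check_status_sketch and the column names "NAME" and "ST" are assumptions, and the sketch simply prefers job_id when both inputs are given.

```python
def check_status_sketch(queue, job_id=None, job_name=None):
    """Look up a job status in a parsed queue dict (illustrative only).

    `queue` is assumed to be keyed by integer job id, with each value
    holding the assumed columns "NAME" and "ST".
    """
    if job_id is not None:
        # The job id is the preferred input: a direct dictionary lookup.
        job = queue.get(int(job_id))
        return None if job is None else job["ST"]
    # Fall back to scanning the queue for a matching job name.
    for job in queue.values():
        if job["NAME"] == job_name:
            return job["ST"]
    return None


example_queue = {
    101: {"NAME": "gen_2012", "ST": "R"},
    102: {"NAME": "gen_2013", "ST": "PD"},
}
print(check_status_sketch(example_queue, job_id=102))
print(check_status_sketch(example_queue, job_name="gen_2012"))
print(check_status_sketch(example_queue, job_id=999))
```

A missing job returns None rather than raising, which matches the documented return type.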

static format_walltime(hours)

Get the SLURM walltime string in format “HH:MM:SS”

Parameters:

hours (float | int) – Requested number of job hours.

Returns:

walltime (str) – SLURM walltime request in format “HH:MM:SS”
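The conversion from fractional hours to "HH:MM:SS" can be sketched as follows. This is a standalone illustration under the assumption that the request is rounded to the nearest second; the function name format_walltime_sketch is hypothetical and the library's own rounding may differ.

```python
def format_walltime_sketch(hours):
    """Convert a number of hours into an "HH:MM:SS" string (illustrative only)."""
    # Work in whole seconds so fractional hours map cleanly onto MM and SS.
    total_seconds = int(round(hours * 3600))
    hh, remainder = divmod(total_seconds, 3600)
    mm, ss = divmod(remainder, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}"


print(format_walltime_sketch(1.5))   # 01:30:00
print(format_walltime_sketch(0.25))  # 00:15:00
```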

hold(arg)

Temporarily hold a job from submitting. Held jobs will stay in queue but will not get nodes until released.

Parameters:

arg (int | list | str) – SLURM integer job id(s) to hold. Can be ‘all’ to hold all jobs.

static make_path(d)

Make a directory tree if it doesn’t exist.

Parameters:

d (str) – Directory tree to check and potentially create.

static make_sh(fname, script)

Make a shell script (.sh file) to execute a subprocess.

Parameters:
  • fname (str) – Name of the .sh file to create.

  • script (str) – Contents to be written into the .sh file.

classmethod parse_queue_str(queue_str, keys=0)

Parse the qstat or squeue output string into a dict format keyed by integer job id with nested dictionary of job properties (queue printout columns).

Parameters:
  • queue_str (str) – HPC queue output string (qstat for PBS or squeue for SLURM). Typically a space-delimited string with line breaks.

  • keys (list | int) – Argument to set the queue job attributes (column headers). This defaults to an integer which says which row index contains the space-delimited column headers. Can also be a list to explicitly set the column headers.

Returns:

queue_dict (dict) – HPC queue parsed into dictionary format keyed by integer job id with nested dictionary of job properties (queue printout columns).
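The parsing described above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the class's code: the function name parse_queue_sketch and the "JOBID" column name are hypothetical, and every job id is assumed to be a plain integer (array job ids such as 123_4 are not handled).

```python
def parse_queue_sketch(queue_str, keys=0):
    """Parse space-delimited queue output into a dict keyed by integer job id."""
    rows = [line.split() for line in queue_str.strip().splitlines() if line.strip()]
    if isinstance(keys, int):
        # An integer says which row holds the column headers.
        header = rows.pop(keys)
    else:
        # A list sets the column headers explicitly; every row is then data.
        header = list(keys)
    id_col = header.index("JOBID")  # assumed name of the job id column
    queue = {}
    for row in rows:
        queue[int(row[id_col])] = dict(zip(header, row))
    return queue


sample = """JOBID NAME ST
101 gen_2012 R
102 gen_2013 PD
"""
queue = parse_queue_sketch(sample)
print(queue[102]["ST"])  # PD
```

Passing keys as a list covers queue strings that were requested without a header row.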

classmethod query_queue(job_name=None, user=None, qformat=None, skip_rows=None)

Run the HPC queue command and return the raw stdout string.

Parameters:
  • job_name (str | None) – Job name to look for in the squeue output (not limited to the 8 characters that squeue displays), or None to show the user’s whole queue.

  • user (str | None) – HPC username. None will get your username using getpass.getuser()

  • qformat (str | None) – Queue format string specification. Changing this from the default (None) could have adverse effects!

  • skip_rows (int | list | None) – Optional row index values to skip.

Returns:

stdout (str) – HPC queue output string. Can be split on line breaks to get list.
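For orientation, a queue query of this kind can be assembled from standard squeue options (-u for the user, -n for the job name, --format for the output columns). The sketch below only builds the command string. The function name build_squeue_cmd and the format string "%i %j %t" (job id, name, compact state) are assumptions chosen for illustration, not the value of SQ_FORMAT.

```python
import getpass


def build_squeue_cmd(job_name=None, user=None, qformat=None):
    """Build an squeue command string (illustrative only)."""
    # Fall back to the current login name, as the parameter docs describe.
    user = getpass.getuser() if user is None else user
    # Assumed format: job id, job name, compact state code.
    qformat = "%i %j %t" if qformat is None else qformat
    cmd = f'squeue -u {user} --format="{qformat}"'
    if job_name is not None:
        cmd += f" -n {job_name}"
    return cmd


print(build_squeue_cmd(job_name="gen_2012", user="alice"))
```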

property queue

Get the HPC queue parsed into dict format keyed by integer job id

Returns:

queue (dict) – HPC queue parsed into dictionary format keyed by integer job id with nested dictionary of job properties (queue printout columns).

property queue_job_ids

Get a list of the job integer ids in the queue

property queue_job_names

Get a list of the job names in the queue

release(arg)

Release a job that was previously on hold so it will be submitted to a compute node.

Parameters:

arg (int | list | str) – SLURM integer job id(s) to release. Can be ‘all’ to release all jobs.

static rm(fname)

Remove a file.

Parameters:

fname (str) – Filename (with path) to remove.

static s(s)

Format input as a str with appropriate quote types for Python CLI entry.

Examples

  • list, tuple -> "['one', 'two']"

  • dict -> "{'key': 'val'}"

  • int, float, None -> '0'

  • str, other -> 'string'
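One way to read the examples above is sketched below. It is an interpretation, not the library's code: the function name quote_for_cli is hypothetical, and the sketch assumes that containers are wrapped in double quotes, that numbers and None are returned as bare text with no added quotes, and that everything else is wrapped in single quotes.

```python
def quote_for_cli(value):
    """Format a value as a string suitable for a CLI argument (illustrative only)."""
    if isinstance(value, (list, tuple, dict)):
        # Containers already contain single quotes, so wrap them in double quotes.
        return '"{}"'.format(value)
    if isinstance(value, (int, float, type(None))):
        # Assumption: numbers and None need no extra quoting.
        return "{}".format(value)
    # Strings and any other type are wrapped in single quotes.
    return "'{}'".format(value)


print(quote_for_cli(["one", "two"]))
print(quote_for_cli({"key": "val"}))
print(quote_for_cli(0))
print(quote_for_cli("string"))
```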

sbatch(cmd, alloc=None, walltime=None, memory=None, nodes=1, feature=None, name='reV', stdout_path='./stdout', keep_sh=False, conda_env=None, module=None, module_root='/shared-projects/rev/modulefiles')

Submit a SLURM job via sbatch command and SLURM shell script

Parameters:
  • cmd (str) –

    Command to be submitted in SLURM shell script. Example:

    ‘python -m reV.generation.cli_gen’

  • alloc (str) – HPC project (allocation) handle. Example: ‘rev’. Default is not to state an allocation (does not work on Eagle slurm).

  • walltime (float) – Node walltime request in hours. Default is not to state a walltime (does not work on Eagle slurm).

  • memory (int) – Node memory request in GB.

  • nodes (int) – Number of nodes to use for this sbatch job. Default is 1.

  • feature (str) – Additional flags for SLURM job. Format is “--qos=high” or “--depend=[state:job_id]”. Default is None.

  • name (str) – SLURM job name.

  • stdout_path (str) – Path to print .stdout and .stderr files.

  • keep_sh (bool) – Boolean to keep the .sh files. Default is to remove these files after job submission.

  • conda_env (str) – Conda environment to activate

  • module (str | None) – Module to load

  • module_root (str) – Path to module root to load

Returns:

  • out (str) – sbatch standard output. If the job was submitted successfully, this is the SLURM job ID.

  • err (str) – sbatch standard error. This is typically an empty string if the job was submitted successfully.
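To show how the parameters above could map onto a SLURM shell script, here is a minimal sketch that only builds the script text. It is not the method's implementation: the function name build_sbatch_script and the output file naming are assumptions, and the conda_env, module, and keep_sh handling is left out. The #SBATCH options used (--account, --time, --job-name, --nodes, --mem, --output, --error) are standard sbatch options.

```python
def build_sbatch_script(cmd, alloc=None, walltime=None, memory=None,
                        nodes=1, feature=None, name="reV",
                        stdout_path="./stdout"):
    """Assemble the text of a SLURM batch script (illustrative only)."""
    lines = ["#!/bin/bash"]
    if alloc is not None:
        lines.append(f"#SBATCH --account={alloc}")
    if walltime is not None:
        # Convert hours to the "HH:MM:SS" form that --time accepts.
        minutes, seconds = divmod(int(round(walltime * 3600)), 60)
        hours, minutes = divmod(minutes, 60)
        lines.append(f"#SBATCH --time={hours:02d}:{minutes:02d}:{seconds:02d}")
    lines.append(f"#SBATCH --job-name={name}")
    lines.append(f"#SBATCH --nodes={nodes}")
    # Assumed file naming; %j expands to the job id when SLURM runs the script.
    lines.append(f"#SBATCH --output={stdout_path}/{name}_%j.o")
    lines.append(f"#SBATCH --error={stdout_path}/{name}_%j.e")
    if memory is not None:
        lines.append(f"#SBATCH --mem={memory}G")  # memory request in GB
    if feature is not None:
        lines.append(f"#SBATCH {feature}")  # e.g. "--qos=high"
    lines.append(cmd)
    return "\n".join(lines) + "\n"


script = build_sbatch_script("python -m reV.generation.cli_gen", alloc="rev",
                             walltime=1.5, memory=90, feature="--qos=high")
print(script)
```

Writing this text to a .sh file and passing the file to sbatch is then a separate step, which keeps the script contents easy to inspect before submission.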

scancel(arg)

Cancel a slurm job.

Parameters:

arg (int | list | str) – SLURM integer job id(s) to cancel. Can be a list of integer job ids, ‘all’ to cancel all jobs, or a feature (-p short) to cancel all jobs with a given feature

static scontrol(cmd)

Submit an scontrol command.

Parameters:

cmd (str) – Command string after “scontrol” word
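As an illustration of the kind of string that follows the “scontrol” word, the sketch below builds argument strings for holding, releasing, and changing the QOS of jobs, using the standard scontrol forms hold, release, and update JobId=... QOS=.... The helper name scontrol_args is hypothetical, and the handling of ‘all’ is not covered.

```python
def scontrol_args(action, job_ids, qos=None):
    """Build the text that follows "scontrol" for each job id (illustrative only)."""
    if isinstance(job_ids, int):
        job_ids = [job_ids]
    cmds = []
    for job_id in job_ids:
        if action in ("hold", "release"):
            cmds.append(f"{action} {job_id}")
        elif action == "qos":
            cmds.append(f"update JobId={job_id} QOS={qos}")
        else:
            raise ValueError(f"Unknown action: {action}")
    return cmds


print(scontrol_args("hold", [101, 102]))
print(scontrol_args("qos", 101, qos="high"))
```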

static submit(cmd, background=False, background_stdout=False)

Open a subprocess and submit a command.

Parameters:
  • cmd (str) – Command to be submitted using python subprocess.

  • background (bool) – Flag to submit subprocess in the background. stdout and stderr will be empty strings if this is True.

  • background_stdout (bool) – Flag to capture the stdout/stderr from the background process in a nohup.out file.

Returns:

  • stdout (str) – Subprocess standard output. This is decoded from the subprocess stdout with rstrip.

  • stderr (str) – Subprocess standard error. This is decoded from the subprocess stderr with rstrip. After decoding/rstrip, this will be empty if the subprocess doesn’t return an error.
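The foreground case described above (run a command, then decode and rstrip both streams) can be sketched with the standard library. This is a minimal illustration, not the method's code: the function name run_command is hypothetical, UTF-8 decoding and a POSIX-style shell syntax are assumed, and the background options are not covered.

```python
import shlex
import subprocess
import sys


def run_command(cmd):
    """Run a command and return its decoded, right-stripped stdout and stderr."""
    process = subprocess.Popen(shlex.split(cmd),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    # Both streams arrive as bytes; decode and drop trailing whitespace.
    return stdout.decode("utf-8").rstrip(), stderr.decode("utf-8").rstrip()


# Use the current Python interpreter so the example does not need SLURM.
out, err = run_command(f"{shlex.quote(sys.executable)} -c \"print('hello')\"")
print(out)
```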