rex.utilities.hpc.SLURM

class SLURM(user=None, queue_dict=None)[source]

Bases: HpcJobManager

Subclass for SLURM subprocess jobs.

Parameters:
  • user (str | None) – HPC username. None will get your username using getpass.getuser()

  • queue_dict (dict | None) – Parsed HPC queue dictionary (qstat for PBS or squeue for SLURM) from parse_queue_str(). None will get the queue from PBS or SLURM.

Methods

change_qos(arg, qos)

Change the priority (quality of service) for a job.

check_status([job_id, job_name])

Check the status of an HPC job using the HPC queue.

format_walltime(hours)

Get the SLURM walltime string in format "HH:MM:SS"

hold(arg)

Temporarily hold a job from submitting.

make_path(d)

Make a directory tree if it doesn't exist.

make_sh(fname, script)

Make a shell script (.sh file) to execute a subprocess.

parse_queue_str(queue_str[, keys])

Parse the qstat or squeue output string into a dict format keyed by integer job id with nested dictionary of job properties (queue printout columns).

query_queue([job_name, user, qformat, skip_rows])

Run the HPC queue command and return the raw stdout string.

release(arg)

Release a job that was previously on hold so it will be submitted to a compute node.

rm(fname)

Remove a file.

s(s)

Format input as str w/ appropriate quote types for python cli entry.

sbatch(cmd[, alloc, walltime, memory, ...])

Submit a SLURM job via sbatch command and SLURM shell script

scancel(arg)

Cancel a slurm job.

scontrol(cmd)

Submit an scontrol command.

submit(cmd[, background, background_stdout])

Open a subprocess and submit a command.

Attributes

MAX_NAME_LEN

QCOL_ID

QCOL_NAME

QCOL_STATUS

QSKIP

SQ_FORMAT

USER

queue

Get the HPC queue parsed into dict format keyed by integer job id

queue_job_ids

Get a list of the job integer ids in the queue

queue_job_names

Get a list of the job names in the queue

classmethod query_queue(job_name=None, user=None, qformat=None, skip_rows=None)[source]

Run the HPC queue command and return the raw stdout string.

Parameters:
  • job_name (str | None) – Optional job name to look up in the squeue output (not limited to the 8 characters shown), or None to show the user’s whole queue.

  • user (str | None) – HPC username. None will get your username using getpass.getuser()

  • qformat (str | None) – Queue format string specification. Changing this from the default (None) could have adverse effects!

  • skip_rows (int | list | None) – Optional row index values to skip.

Returns:

stdout (str) – HPC queue output string. Can be split on line breaks to get list.
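As a rough illustration of what a queue query of this kind could involve, the sketch below assembles an squeue command line and drops selected rows from a raw output string. The helpers build_squeue_cmd and drop_rows and the default format string are hypothetical and are not taken from the SLURM class; only the squeue options -u, -n and --format are standard.

```python
import getpass
import shlex


def build_squeue_cmd(job_name=None, user=None, qformat=None):
    """Assemble an squeue command string (illustrative, not the rex code)."""
    if user is None:
        # Same fallback the parameter description mentions.
        user = getpass.getuser()
    if qformat is None:
        # Assumed format: job id, job name, compact job state.
        qformat = "%.15i %.50j %.10t"
    cmd = "squeue -u {} --format={}".format(shlex.quote(user),
                                            shlex.quote(qformat))
    if job_name is not None:
        cmd += " -n {}".format(shlex.quote(job_name))
    return cmd


def drop_rows(stdout, skip_rows=None):
    """Remove rows (non-negative indices) from a raw queue output string."""
    if skip_rows is None:
        return stdout
    if isinstance(skip_rows, int):
        skip_rows = [skip_rows]
    skip = set(skip_rows)
    lines = stdout.split("\n")
    return "\n".join(row for i, row in enumerate(lines) if i not in skip)


cmd = build_squeue_cmd(job_name="gen_run", user="alice")
trimmed = drop_rows("header\njob_a\njob_b", skip_rows=0)
```

The command string is only built here, never executed, so the sketch can be tried on a machine without SLURM installed.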

static scontrol(cmd)[source]

Submit an scontrol command.

Parameters:

cmd (str) – Command string after “scontrol” word

scancel(arg)[source]

Cancel a slurm job.

Parameters:

arg (int | list | str) – SLURM integer job id(s) to cancel. Can be a list of integer job ids, ‘all’ to cancel all jobs, or a feature (-p short) to cancel all jobs with a given feature

change_qos(arg, qos)[source]

Change the priority (quality of service) for a job.

Parameters:
  • arg (int | list | str) – SLURM integer job id(s) to change qos for. Can be ‘all’ for all jobs.

  • qos (str) – New qos value

hold(arg)[source]

Temporarily hold a job from submitting. Held jobs will stay in queue but will not get nodes until released.

Parameters:

arg (int | list | str) – SLURM integer job id(s) to hold. Can be ‘all’ to hold all jobs.

release(arg)[source]

Release a job that was previously on hold so it will be submitted to a compute node.

Parameters:

arg (int | list | str) – SLURM integer job id(s) to release. Can be ‘all’ to release all jobs.
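The job-control methods above (change_qos, hold, release) all accept a single job id, a list of ids, or ‘all’. The sketch below shows one way such an argument could be expanded into the text that follows the word “scontrol”. The helper scontrol_args and its all_job_ids parameter are hypothetical; how the class actually builds its commands is not stated in this reference. The subcommands hold, release and update JobId=... QOS=... are standard scontrol syntax.

```python
def scontrol_args(action, arg, qos=None, all_job_ids=()):
    """Return a list of strings to pass after the word 'scontrol'.

    action: 'hold', 'release' or 'qos' (names chosen for this sketch).
    arg: int job id, list of job ids, or 'all'.
    all_job_ids: ids to use when arg == 'all' (assumed to come from the queue).
    """
    if arg == "all":
        job_ids = list(all_job_ids)
    elif isinstance(arg, (list, tuple)):
        job_ids = list(arg)
    else:
        job_ids = [arg]

    if action in ("hold", "release"):
        return ["{} {}".format(action, job_id) for job_id in job_ids]
    if action == "qos":
        if qos is None:
            raise ValueError("qos must be given when action is 'qos'")
        return ["update JobId={} QOS={}".format(job_id, qos)
                for job_id in job_ids]
    raise ValueError("Unknown action: {}".format(action))


hold_cmds = scontrol_args("hold", 101)
release_cmds = scontrol_args("release", [101, 102])
qos_cmds = scontrol_args("qos", "all", qos="high", all_job_ids=[7, 8])
```

One command per job keeps the sketch simple; a real implementation might batch job ids instead.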

check_status(job_id=None, job_name=None)

Check the status of an HPC job using the HPC queue.

Parameters:
  • job_id (int | None) – Job integer ID number (preferred input)

  • job_name (str) – Job name string.

Returns:

status (str | NoneType) – Queue job status str or None if not found. SLURM status strings: PD, R, CG (pending, running, completing). PBS status strings: Q, R, C (queued, running, complete).
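To make the lookup order concrete, here is a minimal sketch of a status check against an already parsed queue dictionary. The function lookup_status and the column names "NAME" and "ST" are assumptions for illustration; the class keeps its own column names in QCOL_NAME and QCOL_STATUS, whose values are not listed here.

```python
def lookup_status(queue, job_id=None, job_name=None,
                  name_col="NAME", status_col="ST"):
    """Return the status string for a job, or None if it is not queued."""
    if job_id is not None:
        # Preferred path: direct lookup by integer job id.
        job = queue.get(int(job_id))
        return job.get(status_col) if job is not None else None
    if job_name is not None:
        # Fallback: scan the queue for the first matching job name.
        for job in queue.values():
            if job.get(name_col) == job_name:
                return job.get(status_col)
    return None


# Hypothetical parsed queue, keyed by integer job id.
queue = {
    1001: {"JOBID": "1001", "NAME": "gen_run", "ST": "R"},
    1002: {"JOBID": "1002", "NAME": "agg_run", "ST": "PD"},
}
status_by_id = lookup_status(queue, job_id=1001)
status_by_name = lookup_status(queue, job_name="agg_run")
status_missing = lookup_status(queue, job_id=9999)
```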

static format_walltime(hours)

Get the SLURM walltime string in format “HH:MM:SS”

Parameters:

hours (float | int) – Requested number of job hours.

Returns:

walltime (str) – SLURM walltime request in format “HH:MM:SS”
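The conversion from fractional hours to an “HH:MM:SS” string can be sketched as follows. The function hours_to_walltime is a hypothetical stand-in, and rounding to the nearest second is an assumption; the actual method may round differently.

```python
def hours_to_walltime(hours):
    """Convert a number of hours to an 'HH:MM:SS' walltime string."""
    total_seconds = int(round(hours * 3600))  # assumed rounding rule
    h, remainder = divmod(total_seconds, 3600)
    m, s = divmod(remainder, 60)
    return "{:02d}:{:02d}:{:02d}".format(h, m, s)


short_job = hours_to_walltime(1.5)
long_job = hours_to_walltime(48)
```

Hours are not wrapped into days here, so a 48 hour request is written as “48:00:00”.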

static make_path(d)

Make a directory tree if it doesn’t exist.

Parameters:

d (str) – Directory tree to check and potentially create.

static make_sh(fname, script)

Make a shell script (.sh file) to execute a subprocess.

Parameters:
  • fname (str) – Name of the .sh file to create.

  • script (str) – Contents to be written into the .sh file.

classmethod parse_queue_str(queue_str, keys=0)

Parse the qstat or squeue output string into a dict format keyed by integer job id with nested dictionary of job properties (queue printout columns).

Parameters:
  • queue_str (str) – HPC queue output string (qstat for PBS or squeue for SLURM). Typically a space-delimited string with line breaks.

  • keys (list | int) – Argument to set the queue job attributes (column headers). This defaults to an integer which says which row index contains the space-delimited column headers. Can also be a list to explicitly set the column headers.

Returns:

queue_dict (dict) – HPC queue parsed into dictionary format keyed by integer job id with nested dictionary of job properties (queue printout columns).
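The parsing described above can be illustrated with a small stand-alone sketch. The function parse_queue_sketch, the id_col argument and the sample queue text are all hypothetical; the sketch assumes the job id sits in the first column and that every row has the same number of space-delimited fields.

```python
def parse_queue_sketch(queue_str, keys=0, id_col=0):
    """Parse a space-delimited queue printout into {job_id: {column: value}}."""
    rows = [line.split() for line in queue_str.splitlines() if line.strip()]
    if isinstance(keys, int):
        # keys is the row index holding the column headers.
        header = rows[keys]
        data = rows[keys + 1:]
    else:
        # keys explicitly lists the column headers; every row is data.
        header = list(keys)
        data = rows
    return {int(row[id_col]): dict(zip(header, row)) for row in data}


# Made-up queue output for illustration only.
sample = """JOBID NAME ST
1001 gen_run R
1002 agg_run PD
"""
parsed = parse_queue_sketch(sample)
parsed_explicit = parse_queue_sketch("1003 econ_run R",
                                     keys=["JOBID", "NAME", "ST"])
```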

property queue

Get the HPC queue parsed into dict format keyed by integer job id

Returns:

queue (dict) – HPC queue parsed into dictionary format keyed by integer job id with nested dictionary of job properties (queue printout columns).

property queue_job_ids

Get a list of the job integer ids in the queue

property queue_job_names

Get a list of the job names in the queue

static rm(fname)

Remove a file.

Parameters:

fname (str) – Filename (with path) to remove.

static s(s)

Format input as str w/ appropriate quote types for python cli entry.

Examples

  • list, tuple -> "['one', 'two']"

  • dict -> "{'key': 'val'}"

  • int, float, None -> '0'

  • str, other -> 'string'
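One possible reading of these quoting rules is sketched below. The function quote_for_cli is hypothetical, and the choice to leave numbers and None unquoted is an assumption about what the examples mean, not a statement of how s() is implemented.

```python
def quote_for_cli(value):
    """Format a value as a string suitable for a python CLI argument."""
    if isinstance(value, (list, tuple, dict)):
        # Containers hold single-quoted strings, so wrap in double quotes.
        return '"{}"'.format(value)
    if isinstance(value, (int, float)) or value is None:
        # Assumed: numbers and None are passed through without quotes.
        return "{}".format(value)
    # Strings and anything else are wrapped in single quotes.
    return "'{}'".format(value)


list_arg = quote_for_cli(["one", "two"])
dict_arg = quote_for_cli({"key": "val"})
int_arg = quote_for_cli(0)
str_arg = quote_for_cli("string")
```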

sbatch(cmd, alloc=None, walltime=None, memory=None, nodes=1, feature=None, name='reV', stdout_path='./stdout', keep_sh=False, conda_env=None, module=None, module_root='/shared-projects/rev/modulefiles')[source]

Submit a SLURM job via sbatch command and SLURM shell script

Parameters:
  • cmd (str) –

    Command to be submitted in SLURM shell script. Example:

    ‘python -m reV.generation.cli_gen’

  • alloc (str) – HPC project (allocation) handle. Example: ‘rev’. Default is not to state an allocation (does not work on Eagle slurm).

  • walltime (float) – Node walltime request in hours. Default is not to state a walltime (does not work on Eagle slurm).

  • memory (int) – Node memory request in GB.

  • nodes (int) – Number of nodes to use for this sbatch job. Default is 1.

  • feature (str) – Additional flags for SLURM job. Format is “--qos=high” or “--depend=[state:job_id]”. Default is None.

  • name (str) – SLURM job name.

  • stdout_path (str) – Path to print .stdout and .stderr files.

  • keep_sh (bool) – Boolean to keep the .sh files. Default is to remove these files after job submission.

  • conda_env (str) – Conda environment to activate

  • module (str) – Module to load

  • module_root (str) – Path to module root to load

Returns:

  • out (str) – sbatch standard output. If the job was submitted successfully, this is the SLURM job id.

  • err (str) – sbatch standard error. This is typically an empty string if the job was submitted successfully.
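To show how the parameters above might map onto a SLURM shell script, the sketch below builds the script text without submitting anything. The function build_sbatch_script, the output file naming and the use of a G suffix for memory are assumptions made for illustration; the script that sbatch() actually writes may differ. The #SBATCH options used (--account, --time, --job-name, --nodes, --output, --error, --mem) are standard sbatch options.

```python
def build_sbatch_script(cmd, alloc=None, walltime=None, memory=None, nodes=1,
                        feature=None, name="reV", stdout_path="./stdout"):
    """Return the text of a SLURM shell script (illustrative layout only)."""
    lines = ["#!/bin/bash"]
    if alloc is not None:
        lines.append("#SBATCH --account={}".format(alloc))
    if walltime is not None:
        # Convert hours to HH:MM:SS, rounding to the nearest second.
        total = int(round(walltime * 3600))
        h, rem = divmod(total, 3600)
        m, s = divmod(rem, 60)
        lines.append("#SBATCH --time={:02d}:{:02d}:{:02d}".format(h, m, s))
    lines.append("#SBATCH --job-name={}".format(name))
    lines.append("#SBATCH --nodes={}".format(nodes))
    # %j is replaced by the job id when SLURM writes the files.
    lines.append("#SBATCH --output={}/{}_%j.o".format(stdout_path, name))
    lines.append("#SBATCH --error={}/{}_%j.e".format(stdout_path, name))
    if memory is not None:
        lines.append("#SBATCH --mem={}G".format(memory))  # memory in GB
    if feature is not None:
        # Extra flag such as "--qos=high" is passed through unchanged.
        lines.append("#SBATCH {}".format(feature))
    lines.append(cmd)
    return "\n".join(lines) + "\n"


script = build_sbatch_script("python -m reV.generation.cli_gen",
                             alloc="rev", walltime=0.5, memory=90,
                             feature="--qos=high")
```

Writing this text to a .sh file and passing that file to sbatch would be the remaining steps, which is roughly the role make_sh() and submit() play in this class.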

static submit(cmd, background=False, background_stdout=False)

Open a subprocess and submit a command.

Parameters:
  • cmd (str) – Command to be submitted using python subprocess.

  • background (bool) – Flag to submit subprocess in the background. stdout and stderr will be empty strings if this is True.

  • background_stdout (bool) – Flag to capture the stdout/stderr from the background process in a nohup.out file.

Returns:

  • stdout (str) – Subprocess standard output. This is decoded from the subprocess stdout with rstrip.

  • stderr (str) – Subprocess standard error. This is decoded from the subprocess stderr with rstrip. After decoding/rstrip, this will be empty if the subprocess doesn’t return an error.
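The foreground case described above (run a command, then decode and rstrip its output) can be sketched with the standard subprocess module. The function run_command is a hypothetical stand-in that does not cover the background options, and the use of shlex.split to tokenize the command is an assumption.

```python
import shlex
import subprocess
import sys


def run_command(cmd):
    """Run a command in the foreground and return (stdout, stderr) strings."""
    process = subprocess.Popen(shlex.split(cmd),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    # Decode the raw bytes and strip trailing whitespace and newlines.
    return stdout.decode("utf-8").rstrip(), stderr.decode("utf-8").rstrip()


# Use the current interpreter so the example runs without SLURM installed.
cmd = "{} -c {}".format(shlex.quote(sys.executable),
                        shlex.quote("print('hello')"))
out, err = run_command(cmd)
```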