gaps.hpc.SLURM
- class SLURM(user=None, queue_dict=None)
Bases: HpcJobManager
Subclass for SLURM subprocess jobs.
- Parameters:
user (str | None, optional) – HPC username. If None, your username is obtained using getpass.getuser(). By default, None.
queue_dict (dict | None, optional) – Parsed HPC queue dictionary from parse_queue_str(). If None, the queue info is queried from the hardware. By default, None.
Methods
cancel(arg) – Cancel a job.
check_status_using_job_id(job_id) – Check the status of a job using the HPC queue and job ID.
check_status_using_job_name(job_name) – Check the status of a job using the HPC queue and job name.
make_script_str(name, cmd, allocation, walltime) – Generate the SLURM submission script.
parse_queue_str(queue_str) – Parse the hardware queue string into a nested dictionary.
query_queue() – Run the HPC queue command and return the raw stdout string.
reset_query_cache() – Reset the query dict cache so that hardware is queried again.
submit(name[, keep_sh]) – Submit a job on the HPC.
Attributes
COLUMN_HEADERS
COMMANDS
MAX_NAME_LEN
Q_SUBMITTED_STATUS
SHELL_FILENAME_FMT
USER
HPC queue keyed by job ids with values as job properties.
- query_queue()
Run the HPC queue command and return the raw stdout string.
- Returns:
stdout (str) – HPC queue output string. Can be split on line breaks to get a list.
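The pattern behind query_queue() (run a queue command, capture its stdout as text, split on line breaks) can be sketched with the standard library. This is a minimal sketch, not the GAPs source: run_queue_command is a hypothetical helper, and the real command GAPs issues is not shown here. On a SLURM system it would be something like ["squeue", "-u", "jdoe"], so a stand-in Python command is used so the sketch runs anywhere.

```python
import subprocess
import sys


def run_queue_command(cmd):
    """Run a queue command and return its raw stdout as a string.

    Hypothetical helper: on SLURM, cmd might look like ["squeue", "-u", "jdoe"];
    the exact command used by GAPs is an assumption here.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in command that prints a tiny fake queue, so no SLURM install is needed.
fake_cmd = [sys.executable, "-c", "print('JOBID NAME ST'); print('1001 gen_run R')"]
stdout = run_queue_command(fake_cmd)

# Split on line breaks to get one list entry per queue row.
rows = stdout.splitlines()
print(rows)  # ['JOBID NAME ST', '1001 gen_run R']
```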
- make_script_str(name, cmd, allocation, walltime, qos='normal', memory=None, feature=None, stdout_path='./stdout', conda_env=None, sh_script=None)
Generate the SLURM submission script.
- Parameters:
name (str) – SLURM job name.
cmd (str) – Command to be submitted in SLURM shell script. Example: ‘python -m reV.generation.cli_gen’.
allocation (str) – HPC allocation account. Example: ‘rev’.
walltime (int | float) – Node walltime request in hours. Example: 4.
qos ({“normal”, “high”}) – Quality of service specification for the job. Jobs with “high” priority are charged at 2x the rate. By default, “normal”.
memory (int, optional) – Node memory request in GB. By default, None.
feature (str, optional) – Additional flags for the SLURM job. Format is “--partition=debug” or “--depend=[state:job_id]”. Do not use this input to specify QOS; use the qos input instead. By default, None.
stdout_path (str, optional) – Path to which the .stdout and .stderr files are written. By default, DEFAULT_STDOUT_PATH.
conda_env (str, optional) – Conda environment to activate. By default, None.
sh_script (str, optional) – Script to run before executing the command. By default, None.
- Returns:
str – SLURM script to submit.
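To show how these parameters could map onto an sbatch script, here is a hypothetical stand-in, build_slurm_script, that mirrors the make_script_str signature. It is a sketch under stated assumptions, not the GAPs implementation: the exact #SBATCH lines, the output file naming, and the use of `source activate` for conda_env are all assumptions. The #SBATCH options themselves (--job-name, --account, --time, --qos, --mem, --output, --error) are standard sbatch flags.

```python
def build_slurm_script(name, cmd, allocation, walltime, qos="normal",
                       memory=None, feature=None, stdout_path="./stdout",
                       conda_env=None, sh_script=None):
    """Hypothetical stand-in for make_script_str (not the GAPs source)."""
    # Convert walltime in hours to the HH:MM:SS format accepted by sbatch.
    total_seconds = int(round(walltime * 3600))
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    time_str = f"{hours:02d}:{minutes:02d}:{seconds:02d}"

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={name}",
        f"#SBATCH --account={allocation}",
        f"#SBATCH --time={time_str}",
        f"#SBATCH --qos={qos}",
        # %j is expanded by SLURM to the job ID; file naming is an assumption.
        f"#SBATCH --output={stdout_path}/{name}_%j.o",
        f"#SBATCH --error={stdout_path}/{name}_%j.e",
    ]
    if memory is not None:
        lines.append(f"#SBATCH --mem={memory}G")  # memory request in GB
    if feature:
        lines.append(f"#SBATCH {feature}")  # e.g. "--partition=debug"
    if conda_env:
        lines.append(f"source activate {conda_env}")  # activation style assumed
    if sh_script:
        lines.append(sh_script)  # runs before the main command
    lines.append(cmd)
    return "\n".join(lines) + "\n"


script = build_slurm_script("gen_run", "python -m reV.generation.cli_gen",
                            "rev", 4.5, memory=90, feature="--partition=debug")
print(script)
```

A 4.5 hour walltime becomes `--time=04:30:00`, which is one of the time formats sbatch accepts.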
- cancel(arg)
Cancel a job.
- Parameters:
arg (int | list | str) – Job ID(s) to cancel. Can be a single integer job ID, a list of integer job IDs, ‘all’ to cancel all jobs, or a feature (e.g., -p short) to cancel all jobs with that feature.
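The accepted forms of arg can be illustrated by translating each one into scancel arguments. This is a hypothetical sketch: build_scancel_args is not part of GAPs, and the exact command cancel() issues is an assumption. The scancel options used (-u for a user, -p for a partition) are standard SLURM flags.

```python
def build_scancel_args(arg, user):
    """Translate a cancel()-style argument into scancel CLI arguments.

    Hypothetical sketch; the command GAPs actually runs is an assumption.
    """
    if isinstance(arg, int):
        return ["scancel", str(arg)]
    if isinstance(arg, (list, tuple)):
        return ["scancel", *(str(job_id) for job_id in arg)]
    if isinstance(arg, str) and arg.lower() == "all":
        # Cancel every job owned by this user.
        return ["scancel", "-u", user]
    if isinstance(arg, str):
        # Treat the string as a filter such as "-p short", limited to this user.
        return ["scancel", *arg.split(), "-u", user]
    raise TypeError(f"Unsupported cancel argument: {arg!r}")


print(build_scancel_args([1001, 1002], "jdoe"))  # ['scancel', '1001', '1002']
print(build_scancel_args("-p short", "jdoe"))    # ['scancel', '-p', 'short', '-u', 'jdoe']
```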
- check_status_using_job_id(job_id)
Check the status of a job using the HPC queue and job ID.
- Parameters:
job_id (int) – Job integer ID number.
- Returns:
status (str | None) – Queue job status string or None if not found.
- check_status_using_job_name(job_name)
Check the status of a job using the HPC queue and job name.
- Parameters:
job_name (str) – Job name string.
- Returns:
status (str | None) – Queue job status string or None if not found.
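Both status checks amount to a lookup in the parsed queue dictionary (keyed by integer job IDs, values are dicts of queue columns), returning None when the job is absent. A minimal sketch, assuming squeue-style column names "NAME" and "ST"; these keys are illustrative and may not match the class's COLUMN_HEADERS, and status_by_id/status_by_name are hypothetical helpers.

```python
# Hypothetical parsed queue; the column keys "NAME" and "ST" are assumptions.
queue_dict = {
    1001: {"NAME": "gen_run", "ST": "R"},
    1002: {"NAME": "collect", "ST": "PD"},
}


def status_by_id(queue, job_id, status_key="ST"):
    """Return the status for job_id, or None if the job is not in the queue."""
    job = queue.get(int(job_id))
    return None if job is None else job.get(status_key)


def status_by_name(queue, job_name, name_key="NAME", status_key="ST"):
    """Return the status of the first job whose name matches, else None."""
    for job in queue.values():
        if job.get(name_key) == job_name:
            return job.get(status_key)
    return None


print(status_by_id(queue_dict, 1002))         # PD
print(status_by_name(queue_dict, "gen_run"))  # R
print(status_by_id(queue_dict, 9999))         # None
```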
- classmethod parse_queue_str(queue_str)
Parse the hardware queue string into a nested dictionary.
This function parses the queue output string into a dictionary keyed by integer job ids with values as dictionaries of job properties (queue printout columns).
- Parameters:
queue_str (str) – HPC queue output string. Typically a space-delimited string with line breaks.
- Returns:
queue_dict (dict) – HPC queue parsed into dictionary format keyed by integer job ids with values as dictionaries of job properties (queue printout columns).
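The parsing described above can be sketched as follows. This is not the GAPs parser: parse_queue is a hypothetical function that assumes the first line is a header row, the first column is the integer job ID, and no value contains spaces. The real method may instead rely on fixed column positions or the class's COLUMN_HEADERS.

```python
def parse_queue(queue_str):
    """Parse a space-delimited queue printout into {job_id: {column: value}}.

    Sketch only: assumes a header row, an integer job ID in the first
    column, and no spaces inside individual values.
    """
    lines = [line for line in queue_str.splitlines() if line.strip()]
    if not lines:
        return {}
    headers = lines[0].split()
    queue = {}
    for line in lines[1:]:
        values = line.split()
        # Map each column header to its value for this job row.
        queue[int(values[0])] = dict(zip(headers, values))
    return queue


sample = """JOBID PARTITION NAME USER ST
1001 short gen_run jdoe R
1002 standard collect jdoe PD
"""
parsed = parse_queue(sample)
print(parsed[1002]["ST"])  # PD
```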
- reset_query_cache()
Reset the query dict cache so that hardware is queried again.
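A query cache with a reset hook typically stores the parsed queue on first access and clears it on reset, so the next access queries again. The class below is a hypothetical illustration of that pattern; CachedQueue and its queue property name are illustrative, not confirmed GAPs names.

```python
class CachedQueue:
    """Hypothetical illustration of a lazily cached queue with a reset hook."""

    def __init__(self, query_func):
        self._query_func = query_func  # callable returning the parsed queue
        self._queue = None             # None means "not queried yet"

    @property
    def queue(self):
        # Query only on the first access after construction or a reset.
        if self._queue is None:
            self._queue = self._query_func()
        return self._queue

    def reset_query_cache(self):
        # Drop the cached result so the next access re-queries.
        self._queue = None


calls = []


def fake_query():
    calls.append(1)
    return {1001: {"ST": "R"}}


cached = CachedQueue(fake_query)
_ = cached.queue
_ = cached.queue
print(len(calls))  # 1: the second access hit the cache
cached.reset_query_cache()
_ = cached.queue
print(len(calls))  # 2: the reset forced a fresh query
```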
- submit(name, keep_sh=False, **kwargs)
Submit a job on the HPC.
- Parameters:
name (str) – HPC job name.
keep_sh (bool, optional) – Option to keep the submission script on disk. By default, False.
**kwargs – Extra keyword-argument pairs to be passed to make_script_str().
- Returns:
out (str) – Standard output from submission. If submitted successfully, this is the Job ID.
err (str) – Standard error. This is an empty string if the job was submitted successfully.
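The submit flow can be sketched as: write the script to disk, run the submit command on it, remove the script unless keep_sh is set, and reduce the output to the job ID. This is a sketch under assumptions, not the GAPs code: submit_script and its directory argument are hypothetical, the "<name>.sh" file name is assumed, and a stand-in command replaces sbatch. The "Submitted batch job <id>" message is sbatch's standard success output.

```python
import os
import re
import subprocess
import sys
import tempfile


def submit_script(name, script, submit_cmd, keep_sh=False, directory="."):
    """Write a job script, run a submit command on it, and return (out, err).

    Hypothetical sketch; on SLURM, submit_cmd would typically be ["sbatch"].
    """
    sh_path = os.path.join(directory, f"{name}.sh")
    with open(sh_path, "w") as f:
        f.write(script)
    try:
        result = subprocess.run([*submit_cmd, sh_path],
                                capture_output=True, text=True)
    finally:
        if not keep_sh:
            os.remove(sh_path)  # clean up unless the caller wants the script
    out, err = result.stdout.strip(), result.stderr.strip()
    # sbatch prints "Submitted batch job <id>"; keep only the ID on success.
    match = re.search(r"Submitted batch job (\d+)", out)
    if match:
        out = match.group(1)
    return out, err


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for sbatch so the sketch runs without SLURM.
    fake_sbatch = [sys.executable, "-c", "print('Submitted batch job 4242')"]
    out, err = submit_script("demo", "#!/bin/bash\necho hi\n", fake_sbatch,
                             directory=tmp)
    script_kept = os.path.exists(os.path.join(tmp, "demo.sh"))

print(out, repr(err), script_kept)  # 4242 '' False
```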