jade.hpc.pbs_manager.PbsManager
- class jade.hpc.pbs_manager.PbsManager(config)
Bases: HpcManagerInterface
Manages PBS jobs.
Methods
Return True if the current node is the manager node.
cancel_job(job_id) – Cancel job.
check_status([name, job_id]) – Check the status of a job.
check_statuses() – Check the statuses of all user jobs.
check_storage_configuration() – Checks if the storage configuration is appropriate for execution.
create_cluster() – Create a Dask cluster.
create_local_cluster() – Create a Dask local cluster.
create_submission_script(name, script, filename) – Create the script to queue the jobs to the HPC.
Get HPC configuration parameters.
Get the job ID for the local compute node.
get_job_stats(job_id) – Get stats for job ID.
Get path to local storage space.
get_node_id() – Return the node ID of the current system.
Return the number of CPUs in the system.
list_active_nodes(job_id) – Return the nodes currently participating in the job.
Logs all relevant HPC environment variables.
submit(filename) – Submit the work to the HPC queue.
Attributes
USER
- cancel_job(job_id)
Cancel job.
- Parameters:
job_id (str)
- Returns:
return code
- Return type:
int
- check_status(name=None, job_id=None)
Check the status of a job. Either name or job_id must be passed. Handles transient errors for up to one minute.
- Parameters:
name (str) – job name
job_id (str) – job ID
- Return type:
- Raises:
ExecutionError – Raised if statuses cannot be retrieved.
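The "handles transient errors for up to one minute" behavior can be pictured as a retry loop with a deadline. The following is a minimal sketch under stated assumptions, not JADE's implementation: `retry_for`, `TransientError`, the five-second interval, and the local `ExecutionError` class are all hypothetical stand-ins introduced for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable scheduler failure."""


class ExecutionError(Exception):
    """Local stand-in for the ExecutionError named in this reference."""


def retry_for(func, timeout=60.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call func until it succeeds or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        try:
            return func()
        except TransientError as exc:
            # Give up once the deadline has passed; otherwise wait and retry.
            if clock() >= deadline:
                raise ExecutionError("status could not be retrieved") from exc
            sleep(interval)


attempts = []


def flaky_status():
    """Fail twice with a transient error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("scheduler busy")
    return "running"


result = retry_for(flaky_status, timeout=60.0, interval=0.0)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without waiting in real time.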
- check_statuses()
Check the statuses of all user jobs. Handles transient errors for up to one minute.
- Returns:
key is job_id, value is HpcJobStatus
- Return type:
dict
- Raises:
ExecutionError – Raised if statuses cannot be retrieved.
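A mapping of job ID to status is convenient for summarizing or filtering jobs. The sketch below works on a plain dictionary shaped like the documented return value. `JobStatus` is a hypothetical stand-in for HpcJobStatus, and its member names are assumptions rather than the real enum's values.

```python
from collections import Counter
from enum import Enum


class JobStatus(Enum):
    """Hypothetical stand-in for HpcJobStatus; member names are assumed."""

    QUEUED = "queued"
    RUNNING = "running"
    COMPLETE = "complete"


def summarize(statuses):
    """Count jobs per status in a {job_id: status} mapping."""
    return Counter(statuses.values())


def pending_job_ids(statuses):
    """Return the sorted IDs of jobs that have not completed."""
    return sorted(
        job_id for job_id, status in statuses.items() if status is not JobStatus.COMPLETE
    )


# Example data in the documented shape: key is job_id, value is a status.
statuses = {
    "1001": JobStatus.RUNNING,
    "1002": JobStatus.QUEUED,
    "1003": JobStatus.COMPLETE,
}
counts = summarize(statuses)
pending = pending_job_ids(statuses)
```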
- check_storage_configuration()
Checks if the storage configuration is appropriate for execution.
- Raises:
InvalidConfiguration – Raised if the configuration is not valid
- create_cluster()
Create a Dask cluster.
- Returns:
SLURM: SLURMCluster
- Return type:
Dask cluster
- create_local_cluster()
Create a Dask local cluster.
- Return type:
dask.distributed.LocalCluster
- create_submission_script(name, script, filename, path='.')
Create the script to queue the jobs to the HPC.
- Parameters:
name (str) – job name
script (str) – script to execute on HPC
filename (str) – submission script filename
path (str) – path for stdout and stderr files
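To make the four parameters concrete, here is a rough sketch of what writing a PBS submission script could look like. It is not JADE's template: `write_pbs_script` is a hypothetical helper, and the `.o`/`.e` output file names are assumptions. The `#PBS -N`, `-o`, and `-e` directives are standard PBS options for job name, stdout path, and stderr path.

```python
import os
import tempfile


def write_pbs_script(name, script, filename, path="."):
    """Write a minimal PBS submission script and return its filename."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                            # job name
        f"#PBS -o {os.path.join(path, name + '.o')}",  # stdout file (assumed naming)
        f"#PBS -e {os.path.join(path, name + '.e')}",  # stderr file (assumed naming)
        "",
        script,                                       # command(s) to run on the HPC
        "",
    ]
    with open(filename, "w") as f:
        f.write("\n".join(lines))
    return filename


# Write a script into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    target = write_pbs_script("demo", "echo hello", os.path.join(tmp, "demo.sh"), path=tmp)
    with open(target) as f:
        content = f.read()
```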
- list_active_nodes(job_id)
Return the nodes currently participating in the job. Order should be deterministic.
- Parameters:
job_id (str)
- Returns:
list of node hostnames
- Return type:
list
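One way to satisfy the deterministic-order requirement is to deduplicate and sort hostnames. The sketch below assumes the input is node-file text with one hostname per line, possibly repeated, as in the file PBS names in the `PBS_NODEFILE` environment variable. Whether PbsManager obtains its node list this way is an assumption, and `unique_sorted_nodes` is a hypothetical helper.

```python
def unique_sorted_nodes(nodefile_text):
    """Return the distinct hostnames in node-file text, in sorted order."""
    hosts = {line.strip() for line in nodefile_text.splitlines() if line.strip()}
    return sorted(hosts)


# Hostnames may repeat (for example, once per allocated core).
sample = "node02\nnode01\nnode02\n\nnode01\n"
nodes = unique_sorted_nodes(sample)
```

Sorting gives every process that calls the function the same ordering, which matters when nodes are assigned roles by index.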
- submit(filename)
Submit the work to the HPC queue. Handles transient errors for up to one minute.
- Parameters:
filename (str) – HPC script filename
- Returns:
(Status, job_id, stderr)
- Return type:
tuple of Status, str, str
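A caller would typically unpack the three-element tuple and act on the status. The sketch below uses a stub in place of a real manager so it runs without a scheduler. `Status` with members `GOOD` and `ERROR`, `FakeManager`, and `submit_or_raise` are hypothetical names introduced for illustration; only the `(Status, job_id, stderr)` shape comes from the reference above.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical stand-in for the Status type; member names are assumed."""

    GOOD = "good"
    ERROR = "error"


class FakeManager:
    """Stub that mimics only the documented return shape of submit()."""

    def submit(self, filename):
        if not filename.endswith(".sh"):
            return Status.ERROR, "", f"unsupported script: {filename}"
        return Status.GOOD, "12345", ""


def submit_or_raise(manager, filename):
    """Submit a script and return the job ID, raising on failure."""
    status, job_id, stderr = manager.submit(filename)
    if status is not Status.GOOD:
        raise RuntimeError(f"submission failed: {stderr}")
    return job_id


job_id = submit_or_raise(FakeManager(), "job.sh")
```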
- abstract get_node_id()
Return the node ID of the current system.
- Return type:
str