jade.hpc.hpc_manager_interface.HpcManagerInterface
- class jade.hpc.hpc_manager_interface.HpcManagerInterface
Bases: ABC
Defines the implementation interface for managing an HPC.
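Because the class derives from ABC and its methods are abstract, a concrete manager has to override every one of them before it can be instantiated. The sketch below illustrates that pattern with a trimmed, hypothetical stand-in (`ManagerInterface`, `FakeManager`) that declares only two of the methods; it is not JADE's own code.

```python
from abc import ABC, abstractmethod


class ManagerInterface(ABC):
    """Trimmed, hypothetical stand-in for HpcManagerInterface."""

    @abstractmethod
    def am_i_manager(self):
        """Return True if the current node is the manager node."""

    @abstractmethod
    def cancel_job(self, job_id):
        """Cancel job and return a return code."""


class FakeManager(ManagerInterface):
    """Toy in-memory implementation, for illustration only."""

    def __init__(self):
        self.cancelled = []

    def am_i_manager(self):
        return True

    def cancel_job(self, job_id):
        # Record the request instead of talking to a real scheduler.
        self.cancelled.append(job_id)
        return 0


manager = FakeManager()
ret = manager.cancel_job("12345")

# The abstract base itself cannot be instantiated.
try:
    ManagerInterface()
    instantiated = True
except TypeError:
    instantiated = False
```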
Methods
am_i_manager() – Return True if the current node is the manager node.
cancel_job(job_id) – Cancel job.
check_status([name, job_id]) – Check the status of a job.
check_statuses() – Check the statuses of all user jobs.
check_storage_configuration() – Checks if the storage configuration is appropriate for execution.
create_cluster() – Create a Dask cluster.
create_local_cluster() – Create a Dask local cluster.
create_submission_script(name, script, ...) – Create the script to queue the jobs to the HPC.
Get HPC configuration parameters.
Get the job ID for the local compute node.
get_job_stats(job_id) – Get stats for job ID.
Get path to local storage space.
Return the node ID of the current system.
Return the number of CPUs in the system.
list_active_nodes(job_id) – Return the nodes currently participating in the job.
Logs all relevant HPC environment variables.
submit(filename) – Submit the work to the HPC queue.
Attributes
USER
- abstract am_i_manager()
Return True if the current node is the manager node.
- Return type:
bool
- abstract cancel_job(job_id)
Cancel job.
- Parameters:
job_id (str)
- Returns:
return code
- Return type:
int
- abstract check_status(name=None, job_id=None)
Check the status of a job. Either name or job_id must be passed. Handles transient errors for up to one minute.
- Parameters:
name (str) – job name
job_id (str) – job ID
- Return type:
- Raises:
ExecutionError – Raised if statuses cannot be retrieved.
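The documentation does not say how transient errors are handled. One plausible reading is a bounded retry loop, sketched below. `retry_for`, `TransientError`, and the local `ExecutionError` class are hypothetical stand-ins defined only for this example, and the retry interval is an assumption.

```python
import time


class ExecutionError(Exception):
    """Local stand-in for the ExecutionError named in the docs."""


class TransientError(Exception):
    """Hypothetical marker for a failure that is worth retrying."""


def retry_for(func, timeout=60.0, interval=5.0):
    """Call func until it succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return func()
        except TransientError as exc:
            if time.monotonic() >= deadline:
                raise ExecutionError("statuses cannot be retrieved") from exc
            time.sleep(interval)


attempts = []


def flaky_query():
    # Fails twice, then succeeds, to mimic a briefly unavailable scheduler.
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("scheduler busy")
    return "running"


result = retry_for(flaky_query, timeout=60.0, interval=0.0)
```

A concrete implementation could wrap its scheduler query in a helper like this so that callers only ever see ExecutionError once the time budget is exhausted.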
- abstract check_statuses()
Check the statuses of all user jobs. Handles transient errors for up to one minute.
- Returns:
key is job_id, value is HpcJobStatus
- Return type:
dict
- Raises:
ExecutionError – Raised if statuses cannot be retrieved.
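Since the return value maps each job_id to an HpcJobStatus, callers will often want to invert it. The sketch below groups job IDs by status. The `HpcJobStatus` enum here is a local stand-in whose member names are assumptions, and `group_by_status` is a hypothetical helper.

```python
import enum
from collections import defaultdict


class HpcJobStatus(enum.Enum):
    """Local stand-in; member names are assumptions for this sketch."""

    QUEUED = "queued"
    RUNNING = "running"
    COMPLETE = "complete"


def group_by_status(statuses):
    """Invert a {job_id: status} dict into {status: [job_id, ...]}."""
    grouped = defaultdict(list)
    for job_id, status in statuses.items():
        grouped[status].append(job_id)
    return dict(grouped)


# Example of the shape check_statuses() is documented to return.
statuses = {
    "101": HpcJobStatus.RUNNING,
    "102": HpcJobStatus.QUEUED,
    "103": HpcJobStatus.RUNNING,
}
grouped = group_by_status(statuses)
```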
- abstract check_storage_configuration()
Checks if the storage configuration is appropriate for execution.
- Raises:
InvalidConfiguration – Raised if the configuration is not valid.
- abstract create_submission_script(name, script, filename, path)
Create the script to queue the jobs to the HPC.
- Parameters:
name (str) – job name
script (str) – script to execute on HPC
filename (str) – submission script filename
path (str) – path for stdout and stderr files
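To show how the four parameters might fit together, here is a minimal sketch of an implementation that assumes a SLURM scheduler. The `#SBATCH` directives and the `%j` job-ID placeholder are standard sbatch features, but the script layout and the output file naming are assumptions, not JADE's actual template.

```python
import os
import tempfile


def create_submission_script(name, script, filename, path):
    """Write a bash script that an sbatch-style scheduler could queue."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={name}",
        # %j is replaced by SLURM with the job ID at run time.
        f"#SBATCH --output={os.path.join(path, name)}_%j.o",
        f"#SBATCH --error={os.path.join(path, name)}_%j.e",
        "",
        script,
        "",
    ]
    with open(filename, "w") as f_out:
        f_out.write("\n".join(lines))


# Write the script to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "submit.sh")
    create_submission_script("my_job", "echo hello", filename, tmp)
    with open(filename) as f_in:
        content = f_in.read()
```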
- abstract create_cluster()
Create a Dask cluster.
- Returns:
the Dask cluster; on SLURM, a SLURMCluster
- Return type:
Dask cluster
- abstract create_local_cluster()
Create a Dask local cluster.
- Return type:
dask.distributed.LocalCluster