jade.hpc.hpc_manager.HpcManager
- class jade.hpc.hpc_manager.HpcManager(submission_groups, output)
Bases: object
Manages HPC job submission and monitoring.
Methods
Return True if the current node is the manager node.
cancel_job(job_id) – Cancel job.
check_status([name, job_id]) – Return the status of a job by name or ID.
check_statuses() – Check the statuses of all user jobs.
create_hpc_interface(config) – Returns an HPC implementation instance appropriate for the current environment.
get_hpc_config(submission_group_name) – Returns the HPC interface instance.
get_job_stats(job_id) – Get path to local storage space.
get_manager_node(job_id) – Return the first node in the job.
list_active_nodes(job_id) – Return the nodes currently participating in the job.
submit(directory, name, script, ...[, wait, ...]) – Submits scripts to the queue for execution.
Attributes
Return the type of HPC management system.
- cancel_job(job_id)
Cancel job.
- Parameters:
job_id (str)
- Returns:
return code
- Return type:
int
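Because cancel_job reports its outcome as an integer return code, a caller cancelling several jobs has to inspect each code itself. A minimal sketch, assuming (as is conventional for return codes, though the reference above does not state it) that 0 means success and any other value means failure; `StubManager` and `cancel_jobs` are hypothetical stand-ins, not part of JADE:

```python
from typing import Iterable, List


class StubManager:
    """Hypothetical stand-in exposing the documented cancel_job(job_id) -> int signature."""

    def __init__(self, known_jobs: Iterable[str]):
        self._known_jobs = set(known_jobs)

    def cancel_job(self, job_id: str) -> int:
        # Assumption: 0 means the cancellation succeeded, nonzero means it failed.
        if job_id in self._known_jobs:
            self._known_jobs.discard(job_id)
            return 0
        return 1


def cancel_jobs(manager, job_ids: Iterable[str]) -> List[str]:
    """Cancel each job in turn and return the IDs whose return code was nonzero."""
    failed = []
    for job_id in job_ids:
        if manager.cancel_job(job_id) != 0:
            failed.append(job_id)
    return failed


if __name__ == "__main__":
    manager = StubManager(["1001", "1002"])
    print(cancel_jobs(manager, ["1001", "1002", "9999"]))  # ['9999']
```

Collecting the failed IDs instead of stopping at the first failure lets the caller retry or report all problem jobs at once.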
- check_status(name=None, job_id=None)
Return the status of a job by name or ID.
- Parameters:
name (str) – job name
job_id (str) – job ID
- Return type:
HpcJobStatus
- check_statuses()
Check the statuses of all user jobs.
- Returns:
key is job_id, value is HpcJobStatus
- Return type:
dict
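The dictionary returned by check_statuses maps each job ID to an HpcJobStatus, which makes it straightforward to summarize or filter. The sketch below uses a hypothetical `HpcJobStatus` enum as a stand-in (its members and string values are assumptions; JADE's real enum may differ), and `summarize` and `unfinished` are illustrative helpers:

```python
from collections import Counter
from enum import Enum
from typing import Dict, List


class HpcJobStatus(Enum):
    """Hypothetical stand-in for JADE's HpcJobStatus; real members may differ."""
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETE = "complete"


def summarize(statuses: Dict[str, HpcJobStatus]) -> Dict[str, int]:
    """Count jobs per status in a check_statuses()-style mapping."""
    return dict(Counter(status.value for status in statuses.values()))


def unfinished(statuses: Dict[str, HpcJobStatus]) -> List[str]:
    """Return the sorted IDs of jobs whose status is not COMPLETE."""
    return sorted(
        job_id for job_id, status in statuses.items()
        if status is not HpcJobStatus.COMPLETE
    )


if __name__ == "__main__":
    # Example mapping shaped like the documented return value: job_id -> HpcJobStatus.
    statuses = {
        "1001": HpcJobStatus.RUNNING,
        "1002": HpcJobStatus.QUEUED,
        "1003": HpcJobStatus.COMPLETE,
        "1004": HpcJobStatus.RUNNING,
    }
    print(summarize(statuses))   # {'running': 2, 'queued': 1, 'complete': 1}
    print(unfinished(statuses))  # ['1001', '1002', '1004']
```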
- get_hpc_config(submission_group_name)
Returns the HPC interface instance.
- Parameters:
submission_group_name (str)
- Return type:
- get_manager_node(job_id)
Return the first node in the job.
- Parameters:
job_id (str)
- Returns:
hostname of the first node in the job
- Return type:
str
- list_active_nodes(job_id)
Return the nodes currently participating in the job. Order should be deterministic.
- Parameters:
job_id (str)
- Returns:
list of node hostnames
- Return type:
list
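get_manager_node is described as returning the first node in the job, and list_active_nodes states that its order should be deterministic; together these let every node independently agree on which one is the manager. A minimal sketch of that reasoning on a hypothetical node list, assuming the manager check compares hostnames via `socket.gethostname()` (whether JADE does exactly this is an assumption; `manager_node` and `is_manager` are illustrative names):

```python
import socket
from typing import List, Optional


def manager_node(active_nodes: List[str]) -> str:
    """Pick the first node as manager; only safe if the node order is deterministic."""
    if not active_nodes:
        raise ValueError("job has no active nodes")
    return active_nodes[0]


def is_manager(active_nodes: List[str], hostname: Optional[str] = None) -> bool:
    """Return True if hostname (default: this machine's hostname) is the manager node."""
    if hostname is None:
        hostname = socket.gethostname()
    return hostname == manager_node(active_nodes)


if __name__ == "__main__":
    # Hypothetical hostnames, shaped like a list_active_nodes(job_id) result.
    nodes = ["r1i0n1", "r1i0n2", "r1i0n3"]
    print(manager_node(nodes))                   # r1i0n1
    print(is_manager(nodes, hostname="r1i0n2"))  # False
```

If the node order were not deterministic, two nodes could each see themselves first and both act as manager, which is why the ordering guarantee matters.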
- submit(directory, name, script, submission_group_name, wait=False, keep_submission_script=True, dry_run=False)
Submits scripts to the queue for execution.
- Parameters:
directory (str) – directory to contain the submission script
name (str) – job name
script (str) – script to execute
submission_group_name (str)
wait (bool) – wait for execution to complete
keep_submission_script (bool) – if True, do not delete the submission script
dry_run (bool) – create the submission files but do not submit the jobs
- Returns:
(job_id, submission status)
- Return type:
tuple
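With wait=False, submit returns its (job_id, submission status) tuple right away, and a caller that needs the result must poll for completion. The sketch below shows that pattern against a hypothetical `StubManager` whose `submit` and `check_status` mirror the documented signatures; the status strings, the poll interval, the timeout, and the `wait_for_job` helper are all assumptions, not JADE behavior:

```python
import time


class StubManager:
    """Hypothetical stand-in: each check_status call advances the job by one state."""

    _STATES = ["queued", "running", "complete"]  # hypothetical status values

    def __init__(self):
        self._polls = {}

    def submit(self, directory, name, script, submission_group_name,
               wait=False, keep_submission_script=True, dry_run=False):
        # Mirrors the documented signature and returns (job_id, submission status).
        job_id = str(1000 + len(self._polls))
        self._polls[job_id] = 0
        return job_id, "submitted"  # hypothetical submission status

    def check_status(self, name=None, job_id=None):
        index = min(self._polls[job_id], len(self._STATES) - 1)
        self._polls[job_id] += 1
        return self._STATES[index]


def wait_for_job(manager, job_id, poll_interval=0.01, timeout=5.0):
    """Poll check_status until the job reports 'complete' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = manager.check_status(job_id=job_id)
        if status == "complete":
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not complete within {timeout} s")


if __name__ == "__main__":
    manager = StubManager()
    job_id, submit_status = manager.submit("output", "my_job", "run.sh", "default")
    print(job_id, submit_status)          # 1000 submitted
    print(wait_for_job(manager, job_id))  # complete
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while the job runs.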