gaps.cli.preprocessing.preprocess_script_config

preprocess_script_config(config, cmd)

Pre-process script config.

Parameters:
  • config (dict) – Script config. This config is updated so that the value of its “cmd” key is always a list.

  • cmd (str | list) – A single command string, or a list of command strings, to execute. A single string runs on one node; if the input is a list, each command string in the list is executed on a separate node. For example, to run a Python script, specify:

    "cmd": "python my_script.py"
    

    This runs the Python file “my_script.py” (located in the project directory) on a single node.

    Important

    Running scripts that use only a single processor on HPC nodes for extended periods of time is inefficient. Make sure your long-running scripts use Python’s multiprocessing library wherever possible to make the most of shared HPC resources.

    To run multiple commands in parallel, supply them as a list:

    "cmd": [
        "python /path/to/my_script.py -a -out out_file.txt",
        "wget https://website.org/latest.zip"
    ]
    

    This input runs two commands in parallel as part of this pipeline step, each on its own node: a Python script with the specified arguments, and a wget command that downloads a file from the web. Note that commands are always executed from the project directory.
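The note on single-processor scripts recommends multiprocessing for long-running work. A minimal sketch of what a script like “my_script.py” might look like under that advice, assuming its work splits into independent items. `process_item` and the input range are hypothetical placeholders, and nothing here is part of the gaps API; it only uses the standard-library `multiprocessing.Pool`, which starts one worker process per available CPU when called with no arguments.

```python
"""Hypothetical my_script.py: spread independent work items across CPU cores."""
from multiprocessing import Pool


def process_item(x: int) -> int:
    # Hypothetical placeholder for CPU-heavy work on one independent item.
    return x * x


def main() -> list:
    items = list(range(100))  # hypothetical inputs
    # Pool() with no arguments starts one worker per available CPU.
    with Pool() as pool:
        # map() splits the items across workers and preserves input order.
        results = pool.map(process_item, items)
    return results


if __name__ == "__main__":
    # The guard is needed because start methods such as "spawn" re-import
    # the main module in each worker process.
    print(sum(main()))
```

Running `python my_script.py` on a node prints 328350, the sum of the squares of 0 through 99.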

Returns:

dict – Updated script config.
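The documented “cmd” normalization can be sketched as follows. This is an assumption inferred from the parameter descriptions above, not the gaps source code: a single command string is wrapped in a one-element list, and the result is stored under the “cmd” key of the returned config. The name `preprocess_script_config_sketch` is hypothetical.

```python
from typing import Any, Dict, List, Union


def preprocess_script_config_sketch(
    config: Dict[str, Any], cmd: Union[str, List[str]]
) -> Dict[str, Any]:
    """Hypothetical sketch: store ``cmd`` in ``config`` as a list of strings.

    Mirrors the documented behavior only; this is not the gaps source code.
    """
    if isinstance(cmd, str):
        # A single command string becomes a one-element list (one node).
        cmd = [cmd]
    # Each entry in the list is a separate command for its own node.
    config["cmd"] = list(cmd)
    return config


if __name__ == "__main__":
    single = preprocess_script_config_sketch({}, "python my_script.py")
    print(single["cmd"])  # ['python my_script.py']
    multi = preprocess_script_config_sketch(
        {}, ["python my_script.py", "wget https://website.org/latest.zip"]
    )
    print(len(multi["cmd"]))  # 2
```

Normalizing to a list means downstream code can treat every script step the same way, launching one job per list entry, without special-casing the single-command form.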