gaps.cli.preprocessing.preprocess_collect_config
- preprocess_collect_config(config, project_dir, command_name, collect_pattern='PIPELINE')
Pre-process collection config.
Specifically, the “collect_pattern” key is resolved into a list of 2-tuples, where the first element in each tuple is a path to the collection output file, while the second element is the corresponding filepath-pattern representing the files to be collected into the output file.
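As an illustration of the resolved shape only, the value stored under “collect_pattern” after pre-processing might look like the list below. The paths are hypothetical and are not taken from the gaps documentation.

```python
# Hypothetical resolved "collect_pattern" value: a list of 2-tuples of
# (collection output file, pattern of the files to collect into it).
resolved = [
    ("/project/out/generation.h5", "/project/out/generation_*.h5"),
    ("/project/out/econ.h5", "/project/out/econ_*.h5"),
]

# Each entry unpacks into one output path and one input pattern.
for out_path, pattern in resolved:
    print(f"collect {pattern} -> {out_path}")
```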
- Parameters:
config (dict) – Collection config. This config will be updated to include a “collect_pattern” key if it doesn’t already have one. If the “collect_pattern” key exists, it can be a string, a collection, or a mapping with a .items() method.
project_dir (path-like) – Path to project directory. This path is used to resolve the output filepath supplied by the user.
command_name (str) – Name of the command being run. This is used to parse the pipeline status for output files if collect_pattern="PIPELINE" in the input config.
collect_pattern (str | list | dict, optional) – Unix-style /filepath/pattern*.h5 representing the files to be collected into a single output HDF5 file. If no output file path is specified (i.e. this input is a single pattern or a list of patterns), the output file path will be inferred from the pattern itself (specifically, the wildcard will be removed and the result will be the output file path). If a list of patterns is provided, each pattern will be collected into a separate output file. To specify the name of the output file(s), set this input to a dictionary where the keys are paths to the output file (including the filename itself; relative paths are allowed) and the values are patterns representing the files that should be collected into the output file. If running a collect job as part of a pipeline, this input can be set to "PIPELINE", which will parse the output of the previous step and generate the input file pattern and output file name automatically. By default, "PIPELINE".
- Returns:
dict – Updated collection config.
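The three accepted forms of collect_pattern (a single pattern, a list of patterns, or a mapping of output path to pattern) can be pictured with the minimal sketch below. It is not the gaps implementation: sketch_resolve is a hypothetical helper, the file names are made up, and it assumes that "removing the wildcard" means deleting the * character and that only the output path is resolved against the project directory. The "PIPELINE" case is left out because it depends on the pipeline status files.

```python
from pathlib import PurePosixPath


def sketch_resolve(collect_pattern, project_dir):
    """Hypothetical helper: normalize a pattern spec into (out_path, pattern) pairs."""
    if collect_pattern == "PIPELINE":
        # The real function parses the previous pipeline step; not sketched here.
        raise NotImplementedError("PIPELINE resolution is outside this sketch")

    if isinstance(collect_pattern, str):
        # A single pattern is treated as a one-element list.
        collect_pattern = [collect_pattern]

    if hasattr(collect_pattern, "items"):
        # Mapping form: keys are output paths, values are patterns.
        pairs = list(collect_pattern.items())
    else:
        # No output path given: infer it by dropping the wildcard (assumption).
        pairs = [(pattern.replace("*", ""), pattern) for pattern in collect_pattern]

    project_dir = PurePosixPath(project_dir)
    # Relative output paths are joined onto the project directory.
    return [(str(project_dir / out), pattern) for out, pattern in pairs]


print(sketch_resolve("out/gen_*.h5", "/project"))
print(sketch_resolve(["a_*.h5", "b_*.h5"], "/project"))
print(sketch_resolve({"combined.h5": "out/gen_*.h5"}, "/project"))
```

Note that deleting the wildcard from out/gen_*.h5 leaves out/gen_.h5 in this sketch; whether the library tidies the leftover separator is not stated in the text above, so the dictionary form is the safer choice when the output file name matters.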