gaps.cli.preprocessing.preprocess_collect_config

preprocess_collect_config(config, project_dir, command_name, collect_pattern='PIPELINE')

Pre-process collection config.

Specifically, the “collect_pattern” key is resolved into a list of 2-tuples, where the first element in each tuple is a path to the collection output file, while the second element is the corresponding filepath-pattern representing the files to be collected into the output file.
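To make the shape concrete, a resolved value might look like the snippet below. The file names are hypothetical and chosen only for illustration; they are not produced by gaps.

```python
# Hypothetical file names, chosen only to illustrate the shape.
resolved = [
    ("/project/out/results.h5", "/project/out/results_*.h5"),
]

# Each entry unpacks into an output file and the pattern feeding it.
for out_path, pattern in resolved:
    print(f"{pattern} -> {out_path}")
```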

Parameters:

config (dict) – Collection config. This config will be updated to include a “collect_pattern” key if it doesn’t already have one. If the “collect_pattern” key exists, it can be a string, a collection, or a mapping with a .items() method.
project_dir (path-like) – Path to project directory. This path is used to resolve the output file paths supplied by the user.
command_name (str) – Name of the command being run. This is used to parse the pipeline status for output files if collect_pattern="PIPELINE" in the input config.
collect_pattern (str | list | dict, optional) – Unix-style /filepath/pattern*.h5 representing the files to be collected into a single output HDF5 file. If no output file path is specified (i.e. this input is a single pattern or a list of patterns), the output file path will be inferred from the pattern itself (specifically, the wildcard will be removed and the result will be the output file path). If a list of patterns is provided, each pattern will be collected into a separate output file. To specify the name of the output file(s), set this input to a dictionary where the keys are paths to the output file (including the filename itself; relative paths are allowed) and the values are patterns representing the files that should be collected into the output file. If running a collect job as part of a pipeline, this input can be set to "PIPELINE", which will parse the output of the previous step and generate the input file pattern and output file name automatically. By default, "PIPELINE".

Returns:

dict – Updated collection config.
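The three accepted forms of collect_pattern (string, list, dictionary) can be sketched as a small normalization routine. This is a minimal illustration under stated assumptions, not the gaps implementation: the helper name sketch_resolve_collect_pattern is hypothetical, the output name is inferred by literally deleting the "*" character, relative output paths are joined onto project_dir, and the "PIPELINE" case (which depends on the pipeline status files) is left out.

```python
from collections.abc import Mapping
from pathlib import PurePosixPath


def sketch_resolve_collect_pattern(collect_pattern, project_dir):
    """Illustrative only; NOT the gaps implementation.

    Normalizes a string, list, or dict of patterns into a list of
    (output_path, pattern) 2-tuples.
    """
    if collect_pattern == "PIPELINE":
        # Resolving this case needs the pipeline status, which is
        # outside the scope of this sketch.
        raise NotImplementedError("PIPELINE is not handled in this sketch")

    # A single pattern is treated like a one-element list.
    if isinstance(collect_pattern, str):
        collect_pattern = [collect_pattern]

    # Assumption: with no explicit output path, drop the wildcard
    # character from the pattern to get the output file name.
    if not isinstance(collect_pattern, Mapping):
        collect_pattern = {p.replace("*", ""): p for p in collect_pattern}

    # Assumption: relative output paths are joined onto project_dir;
    # absolute output paths are kept unchanged by the "/" operator.
    return [
        (str(PurePosixPath(project_dir) / out_path), pattern)
        for out_path, pattern in collect_pattern.items()
    ]


# A single pattern: the output name is inferred from the pattern.
print(sketch_resolve_collect_pattern("chunks/out_*.h5", "/proj"))

# A dictionary: the keys name the output files explicitly.
print(sketch_resolve_collect_pattern({"final.h5": "chunks/out_*.h5"}, "/proj"))
```

Under these assumptions, the string form yields an output file named chunks/out_.h5 inside the project directory, while the dictionary form yields final.h5. The exact inferred name in gaps may differ, so an explicit dictionary is the safer choice when the output file name matters.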