sup3r.postprocessing.collectors.nc.CollectorNC

class CollectorNC(file_paths)

Bases: BaseCollector

Sup3r NetCDF file collection framework.

Parameters:

file_paths (list | str) – Either an explicit list of file paths (str), which will be sorted and collected, or a single string with a unix-style search pattern such as /search/patt*ern.<ext>. Files should have non-overlapping time_index and spatial domains.
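
A minimal sketch of how a file_paths argument like this can be resolved, assuming only the behavior described above (a list is sorted, a single string is treated as a glob pattern). The helper resolve_file_paths is hypothetical and uses the standard-library glob module; it is an illustration, not sup3r's internal implementation.

```python
import glob


def resolve_file_paths(file_paths):
    """Return a sorted list of paths from a list or a unix-style glob pattern.

    Hypothetical helper mirroring the documented file_paths behavior;
    not sup3r's actual code.
    """
    if isinstance(file_paths, str):
        # A single string is treated as a glob pattern, e.g. "/data/sup3r_*.nc"
        paths = glob.glob(file_paths)
    else:
        # Copy an explicit list so the caller's list is not mutated
        paths = list(file_paths)
    # Sorting gives a deterministic collection order
    return sorted(paths)


print(resolve_file_paths(["b_000001.nc", "a_000000.nc"]))
# ['a_000000.nc', 'b_000001.nc']
```

Because zero-padded indices sort lexicographically in numeric order, a plain sorted() call is enough to order chunk files whose names embed such indices.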

Methods

collect(file_paths, out_file[, features, ...])

Collect data files from a directory into one output file.

get_chunk_indices(file)

Get spatial and temporal chunk indices from the given file name.

get_dset_attrs(feature)

Get attributes for output feature

get_node_cmd(config)

Get a CLI call to collect data.

get_time_dim_name(filepath)

Get the name of the time dimension in the given file

group_spatial_chunks()

Group spatial chunks together so that all chunks in a group share the same spatial footprint but cover different times

write_data(out_file, dsets, time_index, ...)

Write list of datasets to out_file.

classmethod collect(file_paths, out_file, features='all', log_level=None, log_file=None, overwrite=True, res_kwargs=None)

Collect data files from a directory into one output file.

Filename requirements:
  • File names should end with “.nc”.

Parameters:
  • file_paths (list | str) – Either an explicit list of file paths (str), which will be sorted and collected, or a single string with a unix-style search pattern such as /search/patt*ern.nc.

  • out_file (str) – File path of final output file.

  • features (list | str) – List of datasets (features) to collect. If ‘all’, all data_vars will be collected.

  • log_level (str | None) – Desired log level. None will not initialize logging.

  • log_file (str | None) – Target log file. None logs to stdout.

  • overwrite (bool) – Whether to overwrite an existing output file.

  • res_kwargs (dict | None) – Dictionary of kwargs to pass to xarray.open_mfdataset.
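
The features and overwrite parameters can be illustrated with a small sketch. Both helpers below, select_features and should_write, are hypothetical: the handling of a single non-‘all’ string as one feature name and the "keep the existing file when overwrite=False" behavior are assumptions, not confirmed sup3r behavior.

```python
import os


def select_features(data_vars, features="all"):
    """Resolve the features argument against the variables found in the inputs.

    Hypothetical helper: 'all' selects every variable; a single string other
    than 'all' is assumed to name one feature.
    """
    if features == "all":
        return list(data_vars)
    if isinstance(features, str):
        features = [features]
    missing = [f for f in features if f not in data_vars]
    if missing:
        raise KeyError(f"Requested features not found in inputs: {missing}")
    return list(features)


def should_write(out_file, overwrite=True):
    """Return True if out_file may be written.

    Assumption: with overwrite=False an existing output file is kept as-is.
    """
    return overwrite or not os.path.exists(out_file)


print(select_features(["u_100m", "v_100m", "temperature_2m"]))
# ['u_100m', 'v_100m', 'temperature_2m']
print(select_features(["u_100m", "v_100m"], "v_100m"))
# ['v_100m']
```

Failing early on unknown feature names avoids opening every chunk file only to discover a typo at write time.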

group_spatial_chunks()

Group spatial chunks together so that all chunks in a group share the same spatial footprint but cover different times
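
A minimal sketch of the grouping idea. It assumes, hypothetically, that chunk files are named <prefix>_<temporal>_<spatial>.nc with zero-padded indices; sup3r's real naming scheme and the actual return type of group_spatial_chunks may differ. This stand-alone function returns a dict mapping each spatial index to its files, ordered by temporal index.

```python
import os
from collections import defaultdict


def group_spatial_chunks(file_paths):
    """Group chunk files by spatial index so each group shares one footprint.

    Assumes hypothetical file names ending in "_{temporal}_{spatial}.nc",
    e.g. "out_000002_000001.nc".
    """
    groups = defaultdict(list)
    for path in file_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        # Last two underscore-separated tokens: temporal index, spatial index
        t_idx, s_idx = stem.split("_")[-2:]
        groups[s_idx].append((t_idx, path))
    # Within each spatial group, order files by temporal chunk index
    return {
        s_idx: [path for _, path in sorted(items)]
        for s_idx, items in sorted(groups.items())
    }


files = ["out_000001_000000.nc", "out_000000_000001.nc",
         "out_000000_000000.nc", "out_000001_000001.nc"]
print(group_spatial_chunks(files))
# {'000000': ['out_000000_000000.nc', 'out_000001_000000.nc'],
#  '000001': ['out_000000_000001.nc', 'out_000001_000001.nc']}
```

Each group can then be concatenated along time before the spatial groups are combined, which is why a shared footprint within a group matters.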

static get_chunk_indices(file)

Get spatial and temporal chunk indices from the given file name.

Returns:

  • temporal_chunk_index (str) – Zero padded integer for the temporal chunk index

  • spatial_chunk_index (str) – Zero padded integer for the spatial chunk index
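
A sketch of parsing the two indices from a file name, under the hypothetical assumption that names follow <prefix>_<temporal>_<spatial>.nc. The return order (temporal, then spatial) and the string types follow the documented return values; the naming pattern itself is not confirmed by this page.

```python
import os


def get_chunk_indices(file):
    """Return (temporal_chunk_index, spatial_chunk_index) parsed from a name.

    Assumes the hypothetical pattern "<prefix>_{temporal}_{spatial}.nc".
    Indices stay as zero-padded strings, matching the documented returns.
    """
    stem = os.path.splitext(os.path.basename(file))[0]
    temporal_chunk_index, spatial_chunk_index = stem.split("_")[-2:]
    return temporal_chunk_index, spatial_chunk_index


print(get_chunk_indices("/scratch/sup3r_out_000003_000012.nc"))
# ('000003', '000012')
```

Keeping the indices as strings preserves the zero padding so they can be reused in output file names; int() converts them when a numeric comparison is needed.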

static get_dset_attrs(feature)

Get attributes for output feature

Parameters:

feature (str) – Name of feature to write

Returns:

  • attrs (dict) – Dictionary of attributes for requested dset

  • dtype (str) – Data type for the requested dset. Defaults to float32.
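
The (attrs, dtype) contract can be sketched as a table lookup with a float32 fallback. FEATURE_ATTRS and all of its entries, including the uint8 override, are hypothetical placeholders; sup3r's real feature metadata is not shown on this page.

```python
# Hypothetical attribute table; sup3r's real feature metadata may differ.
FEATURE_ATTRS = {
    "windspeed_100m": {"units": "m s-1", "long_name": "100m wind speed"},
    "land_mask": {"long_name": "land mask", "dtype": "uint8"},
}


def get_dset_attrs(feature, default_dtype="float32"):
    """Return (attrs, dtype) for a feature, falling back to float32."""
    # Copy so popping "dtype" does not mutate the shared table
    attrs = dict(FEATURE_ATTRS.get(feature, {}))
    # Use a per-feature dtype override if present, else the default
    dtype = attrs.pop("dtype", default_dtype)
    return attrs, dtype


print(get_dset_attrs("windspeed_100m"))
# ({'units': 'm s-1', 'long_name': '100m wind speed'}, 'float32')
print(get_dset_attrs("unknown_feature"))
# ({}, 'float32')
```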

classmethod get_node_cmd(config)

Get a CLI call to collect data.

Parameters:

config (dict) – sup3r collection config with all necessary args and kwargs to run data collection.
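
One way to turn a config dict into a CLI call is to serialize it as JSON inside a python -c command. This sketch is hypothetical: it assumes the config holds only keyword arguments for collect(), and the command string sup3r actually generates may look quite different. The import path is taken from this page's module name.

```python
import json
import shlex


def get_node_cmd(config):
    """Build a shell command that runs a collection from a config dict.

    Hypothetical sketch: assumes config holds only keyword arguments for
    CollectorNC.collect() and embeds them as JSON in a python -c call.
    """
    config_json = json.dumps(config)
    script = (
        "from sup3r.postprocessing.collectors.nc import CollectorNC; "
        "import json; "
        f"CollectorNC.collect(**json.loads({config_json!r}))"
    )
    # shlex.quote turns the whole script into one shell-safe argument
    return f"python -c {shlex.quote(script)}"


cmd = get_node_cmd({"file_paths": "/scratch/chunks/*.nc",
                    "out_file": "/scratch/final.nc"})
print(cmd)
```

Quoting the script as a single argument also keeps the shell from expanding the glob pattern, so the pattern reaches collect() unchanged.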

static get_time_dim_name(filepath)

Get the name of the time dimension in the given file

Parameters:

filepath (str) – Path to the file

Returns:

time_key (str) – Name of the time dimension in the given file
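
A sketch of the lookup, written against a list of dimension names rather than a file path so it runs without a NetCDF library. The candidate names ("time", "Time", "time_index") are assumptions, not sup3r's actual list.

```python
def get_time_dim_name(dim_names, candidates=("time", "Time", "time_index")):
    """Return the first dimension name that matches a known time name.

    Hypothetical: operates on dimension names instead of opening the file;
    the candidate names are assumptions.
    """
    for name in dim_names:
        if name in candidates:
            return name
    raise ValueError(f"No time dimension found among {list(dim_names)}")


print(get_time_dim_name(["south_north", "west_east", "Time"]))
# Time
```

With netCDF4, the dimension names of an open file are the keys of Dataset(filepath).dimensions; with xarray, list(ds.dims) yields them.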

classmethod write_data(out_file, dsets, time_index, data_list, meta, global_attrs=None)

Write list of datasets to out_file.

Parameters:
  • out_file (str) – Pre-existing H5 file output path

  • dsets (list) – List of dataset names to write to out_file.

  • time_index (pd.DatetimeIndex) – Pandas datetime index to use for the file time_index.

  • data_list (list) – List of np.ndarray objects to write to out_file

  • meta (pd.DataFrame) – Full meta dataframe for the final output data.

  • global_attrs (dict) – Dictionary of global (file-level) attributes for the final output data.