sup3r.utilities.regridder.RegridOutput

class RegridOutput(source_files, out_pattern, target_meta, heights, cache_pattern=None, leaf_size=4, k_neighbors=4, incremental=False, n_chunks=100, max_nodes=1, worker_kwargs=None)[source]

Bases: OutputMixIn, DistributedProcess

Output regridded data as it is interpolated. Takes source data from windspeed and winddirection h5 files and uses this data to interpolate onto a new target grid. The interpolated data is then written to new files, with one file for each field (e.g. windspeed_100m).

Parameters:
  • source_files (str | list) – Path to source files to regrid to target_meta

  • out_pattern (str) – Pattern to use for naming the output files that store the regridded data. This must include a {file_id} format key, e.g. ./chunk_{file_id}.h5

  • target_meta (str) – Path to dataframe of final grid coordinates on which to regrid

  • heights (list) – List of wind field heights to regrid, e.g. if heights = [100] then windspeed_100m and winddirection_100m will be regridded and stored in the output file.

  • cache_pattern (str | None) – Pattern for cached indices and distances for the ball tree

  • leaf_size (int, optional) – leaf size for BallTree

  • k_neighbors (int, optional) – number of nearest neighbors to use for interpolation

  • incremental (bool) – Whether to keep already written output chunks or overwrite them

  • n_chunks (int) – Number of spatial chunks to use for interpolation. The total number of points in the target_meta will be split into n_chunks and the points in each chunk will be interpolated at the same time.

  • max_nodes (int) – Number of nodes to distribute chunks across.

  • worker_kwargs (dict | None) – Dictionary of worker arguments. Optional keys include regrid_workers (max number of workers to use for regridding and output)
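As a rough illustration of how heights, out_pattern, and n_chunks relate, the sketch below expands heights into the field names described above and fills the {file_id} key of out_pattern once per spatial chunk. The helper names (feature_names, chunk_file_names) and the zero-padded chunk index used as file_id are assumptions made for illustration; they are not taken from the sup3r source.

```python
def feature_names(heights):
    """Build the windspeed / winddirection field names for each height."""
    names = []
    for h in heights:
        names.append(f"windspeed_{h}m")
        names.append(f"winddirection_{h}m")
    return names


def chunk_file_names(out_pattern, n_chunks):
    """Fill the {file_id} key once per spatial chunk.

    Assumption: file_id is a zero-padded chunk index. The naming scheme
    actually used by sup3r may differ.
    """
    if "{file_id}" not in out_pattern:
        raise ValueError("out_pattern must include a {file_id} format key")
    return [out_pattern.format(file_id=str(i).zfill(6)) for i in range(n_chunks)]


features = feature_names([100])
files = chunk_file_names("./chunk_{file_id}.h5", n_chunks=3)
print(features)
print(files)
```

Checking for the {file_id} key up front gives a clear error instead of silently writing every chunk to the same path.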

Methods

chunk_finished(chunk_index)

Check if process for given chunk_index has already been run.

get_dset_attrs(feature)

Get attributes for output feature

get_node_cmd(config)

Get a CLI call to regrid data.

get_time_dim_name(filepath)

Get the name of the time dimension in the given file

node_finished(node_index)

Check if all out files for a given node have been saved

run(node_index)

Run regridding and output write in either serial or parallel

write_coordinates(source_files, chunk_index)

Write regridded coordinate data to the output file

write_data(out_file, dsets, time_index, ...)

Write list of datasets to out_file.

Attributes

all_finished

Check if all out files have been saved

chunks

Get the number of process chunks for this distributed routine.

distance_chunks

Get list of distance chunks to use for chunking data extraction and interpolation.

failed_chunks

Check whether any processes have failed.

index_chunks

Get list of index chunks to use for chunking data extraction and interpolation.

max_memory

Check max memory usage (in GB)

max_nodes

Get uncapped max number of nodes to distribute processes across

meta_chunks

Get meta chunks corresponding to the spatial chunks of the target_meta

node_chunks

Get the chunk indices for different nodes

node_files

Get the file lists for different nodes

nodes

Get the max number of nodes to distribute chunks across, limited by the number of process chunks

out_files

Get list of output files for each spatial chunk

output_features

Get list of dsets to write to output files

spatial_slices

Get the list of slices which select index and distance chunks

property spatial_slices

Get the list of slices which select index and distance chunks

property max_memory

Check max memory usage (in GB)

property index_chunks

Get list of index chunks to use for chunking data extraction and interpolation. indices[i] is the set of indices for the i-th coordinate in the target grid which select the neighboring points in the source grid

property distance_chunks

Get list of distance chunks to use for chunking data extraction and interpolation. distances[i] is the set of distances from the i-th coordinate in the target grid to the neighboring points in the source grid
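To make the roles of indices and distances concrete, here is a minimal NumPy sketch of nearest-neighbor interpolation driven by those two arrays. It assumes inverse-distance weighting, which this page does not confirm as the scheme used by RegridOutput; the function name idw_interpolate and the eps guard are hypothetical.

```python
import numpy as np


def idw_interpolate(source_data, indices, distances, eps=1e-12):
    """Inverse-distance-weighted interpolation onto target points.

    source_data : (n_source, n_time) values on the source grid
    indices     : (n_target, k) neighbor indices into the source grid
    distances   : (n_target, k) distances to those neighbors
    returns     : (n_target, n_time) interpolated values
    """
    weights = 1.0 / np.maximum(distances, eps)     # guard against zero distance
    weights /= weights.sum(axis=1, keepdims=True)  # normalize per target point
    neighbors = source_data[indices]               # (n_target, k, n_time)
    return np.einsum("tk,tkn->tn", weights, neighbors)


# Three source points, two time steps, two target points, k = 2 neighbors.
source_data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
indices = np.array([[0, 1], [1, 2]])
distances = np.array([[1.0, 1.0], [1.0, 3.0]])

full = idw_interpolate(source_data, indices, distances)

# Each target point depends only on its own row of indices and distances,
# so interpolating slice by slice gives the same result as one full pass.
slices = [slice(0, 1), slice(1, 2)]
chunked = np.concatenate(
    [idw_interpolate(source_data, indices[s], distances[s]) for s in slices]
)
```

Because the rows are independent, splitting indices and distances into spatial chunks changes memory use but not the interpolated values.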

property meta_chunks

Get meta chunks corresponding to the spatial chunks of the target_meta

property out_files

Get list of output files for each spatial chunk

property output_features

Get list of dsets to write to output files

classmethod get_node_cmd(config)[source]

Get a CLI call to regrid data.

Parameters:

config (dict) – sup3r collection config with all necessary args and kwargs to run regridding.

run(node_index)[source]

Run regridding and output write in either serial or parallel

Parameters:

node_index (int) – Node index to run. e.g. if node_index=0 then only the chunks for node_chunks[0] will be run.
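The relationship between max_nodes, nodes, and node_chunks can be sketched as follows. This is an illustrative, assumed partitioning (contiguous, near-equal groups); the helper split_chunks_across_nodes is hypothetical and the actual split used by sup3r may differ.

```python
def split_chunks_across_nodes(n_chunks, max_nodes):
    """Assign chunk indices to nodes in contiguous, near-equal groups.

    The node count is capped by the number of chunks, mirroring the
    description of the nodes attribute.
    """
    if n_chunks < 1 or max_nodes < 1:
        raise ValueError("n_chunks and max_nodes must be positive")
    nodes = min(max_nodes, n_chunks)
    base, extra = divmod(n_chunks, nodes)
    node_chunks = []
    start = 0
    for node in range(nodes):
        # The first `extra` nodes take one additional chunk each.
        size = base + (1 if node < extra else 0)
        node_chunks.append(list(range(start, start + size)))
        start += size
    return node_chunks


node_chunks = split_chunks_across_nodes(n_chunks=10, max_nodes=3)
print(node_chunks[0])  # the chunk indices that node_index=0 would process
```

Under this assumed split, calling run(node_index=0) would process only the chunk indices in node_chunks[0].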

property all_finished

Check if all out files have been saved

chunk_finished(chunk_index)

Check if process for given chunk_index has already been run.

Parameters:

chunk_index (int) – Index of the process chunk to check for completion. A chunk is considered finished if its output file already exists and incremental is True, i.e. already written output chunks are being kept.

Returns:

bool – Whether the process for the given chunk has finished

property chunks

Get the number of process chunks for this distributed routine.

property failed_chunks

Check whether any processes have failed.

static get_dset_attrs(feature)

Get attributes for output feature

Parameters:

feature (str) – Name of feature to write

Returns:

  • attrs (dict) – Dictionary of attributes for requested dset

  • dtype (str) – Data type for requested dset. Defaults to float32

static get_time_dim_name(filepath)

Get the name of the time dimension in the given file

Parameters:

filepath (str) – Path to the file

Returns:

time_key (str) – Name of the time dimension in the given file

property max_nodes

Get uncapped max number of nodes to distribute processes across

property node_chunks

Get the chunk indices for different nodes

property node_files

Get the file lists for different nodes

node_finished(node_index)

Check if all out files for a given node have been saved

Parameters:

node_index (int) – Index of node to check for completed processes

Returns:

bool – Whether all processes for the given node have finished

property nodes

Get the max number of nodes to distribute chunks across, limited by the number of process chunks

write_coordinates(source_files, chunk_index)[source]

Write regridded coordinate data to the output file

Parameters:
  • source_files (list) – List of paths to source files

  • chunk_index (int) – Index of spatial chunk to regrid and write to output file

classmethod write_data(out_file, dsets, time_index, data_list, meta, global_attrs=None)

Write list of datasets to out_file.

Parameters:
  • out_file (str) – Pre-existing H5 file output path

  • dsets (list) – list of datasets to write to out_file

  • time_index (pd.DatetimeIndex) – Pandas datetime index to use for file time_index.

  • data_list (list) – List of np.ndarray objects to write to out_file

  • meta (pd.DataFrame) – Full meta dataframe for the final output data.

  • global_attrs (dict) – Namespace of file-global attributes for the final output data.