sup3r.utilities.regridder.RegridOutput

class RegridOutput(source_files, out_pattern, target_meta, heights, cache_pattern=None, leaf_size=4, k_neighbors=4, incremental=False, n_chunks=100, max_nodes=1, worker_kwargs=None)

Bases: OutputMixIn, DistributedProcess

Output regridded data as it is interpolated. Reads source data from windspeed and winddirection h5 files and interpolates it onto a new target grid. The interpolated data is then written to new files, with one file for each field (e.g. windspeed_100m).

Parameters:

source_files (str | list) – Path or list of paths to the source files to regrid onto the target_meta grid
out_pattern (str) – Pattern to use for naming the output files that store the regridded data. This must include a {file_id} format key, e.g. ./chunk_{file_id}.h5
target_meta (str) – Path to dataframe of final grid coordinates on which to regrid
heights (list) – List of wind field heights to regrid, e.g. if heights = [100] then windspeed_100m and winddirection_100m will be regridded and stored in the output file.
cache_pattern (str) – Pattern for cached indices and distances for ball tree
leaf_size (int, optional) – Leaf size for BallTree
k_neighbors (int, optional) – Number of nearest neighbors to use for interpolation
incremental (bool) – Whether to keep already written output chunks or overwrite them
n_chunks (int) – Number of spatial chunks to use for interpolation. The total number of points in the target_meta will be split into n_chunks and the points in each chunk will be interpolated at the same time.
max_nodes (int) – Number of nodes to distribute chunks across.
worker_kwargs (dict | None) – Dictionary of worker args. Optional keys include regrid_workers (max number of workers to use for regridding and output)
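
As a rough illustration of how these arguments fit together, the sketch below assembles a keyword dictionary that mirrors the signature above and checks the one documented constraint on out_pattern. The file paths, the `check_out_pattern` helper, and the zero-padded chunk id are assumptions made for this example and are not taken from sup3r.

```python
# Hypothetical keyword arguments mirroring the documented signature.
# All paths are placeholders chosen for illustration.
regrid_kwargs = {
    "source_files": ["./source_2010.h5", "./source_2011.h5"],
    "out_pattern": "./chunk_{file_id}.h5",
    "target_meta": "./target_meta.csv",
    "heights": [100],
    "cache_pattern": None,
    "leaf_size": 4,
    "k_neighbors": 4,
    "incremental": False,
    "n_chunks": 100,
    "max_nodes": 1,
    "worker_kwargs": {"regrid_workers": 1},
}


def check_out_pattern(out_pattern):
    """Raise if the pattern lacks the required {file_id} format key."""
    if "{file_id}" not in out_pattern:
        raise ValueError("out_pattern must include a {file_id} format key")
    return out_pattern


pattern = check_out_pattern(regrid_kwargs["out_pattern"])

# Assumption: each output file is named by filling {file_id} with a
# zero-padded id. The real id format may differ.
example_files = [pattern.format(file_id=str(i).zfill(6)) for i in range(3)]
```

With sup3r installed and real paths in place, a dictionary like this could presumably be passed as `RegridOutput(**regrid_kwargs)`, since its keys match the documented parameter names.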

Methods

`chunk_finished`(chunk_index)	Check if process for given chunk_index has already been run.
`get_dset_attrs`(feature)	Get attributes for output feature
`get_node_cmd`(config)	Get a CLI call to regrid data.
`get_time_dim_name`(filepath)	Get the name of the time dimension in the given file
`node_finished`(node_index)	Check if all out files for a given node have been saved
`run`(node_index)	Run regridding and output write in either serial or parallel
`write_coordinates`(source_files, chunk_index)	Write regridded coordinate data to the output file
`write_data`(out_file, dsets, time_index, ...)	Write list of datasets to out_file.

Attributes

`all_finished`	Check if all out files have been saved
`chunks`	Get the number of process chunks for this distributed routine.
`distance_chunks`	Get list of distance chunks to use for chunking data extraction and interpolation.
`failed_chunks`	Check whether any processes have failed.
`index_chunks`	Get list of index chunks to use for chunking data extraction and interpolation.
`max_memory`	Check max memory usage (in GB)
`max_nodes`	Get uncapped max number of nodes to distribute processes across
`meta_chunks`	Get meta chunks corresponding to the spatial chunks of the target_meta
`node_chunks`	Get the chunk indices for different nodes
`node_files`	Get the file lists for different nodes
`nodes`	Get the max number of nodes to distribute chunks across, limited by the number of process chunks
`out_files`	Get list of output files for each spatial chunk
`output_features`	Get list of dsets to write to output files
`spatial_slices`	Get the list of slices which select index and distance chunks

property spatial_slices: Get the list of slices which select index and distance chunks
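
To make the idea of spatial slices concrete, here is a minimal sketch of splitting the rows of a target grid into n_chunks contiguous slices. The `make_spatial_slices` helper and the near-even split are assumptions for illustration; the actual property may divide the points differently.

```python
def make_spatial_slices(n_points, n_chunks):
    """Split range(n_points) into at most n_chunks contiguous slices."""
    # Never create more chunks than there are points, and at least one.
    n_chunks = max(1, min(n_chunks, n_points))
    base, extra = divmod(n_points, n_chunks)
    slices = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional point.
        stop = start + base + (1 if i < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


# Ten target points split into three chunks.
slices = make_spatial_slices(n_points=10, n_chunks=3)
```

Each slice can then select the matching rows of the index and distance arrays, so that every chunk is interpolated independently.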

property max_memory: Check max memory usage (in GB)

property index_chunks: Get list of index chunks to use for chunking data extraction and interpolation. indices[i] is the set of indices for the i-th coordinate in the target grid which select the neighboring points in the source grid.

property distance_chunks: Get list of distance chunks to use for chunking data extraction and interpolation. distances[i] is the set of distances from the i-th coordinate in the target grid to the neighboring points in the source grid.
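
The sketch below shows one way neighbor indices and distances of this shape can drive an interpolation. It uses inverse-distance weighting, which is an assumption here: this page does not state which weighting scheme is applied. The `idw_interpolate` helper and the sample values are hypothetical.

```python
def idw_interpolate(source_values, indices, distances, min_distance=1e-12):
    """Interpolate source values onto target points by inverse distance.

    source_values: list of floats, one per source grid point.
    indices[i]: source indices of the neighbors of target point i.
    distances[i]: distances from target point i to those neighbors.
    """
    out = []
    for idx_row, dist_row in zip(indices, distances):
        # Clip tiny distances so an exact match does not divide by zero.
        weights = [1.0 / max(d, min_distance) for d in dist_row]
        total = sum(weights)
        value = sum(w * source_values[j] for w, j in zip(weights, idx_row))
        out.append(value / total)
    return out


# Three source points, two target points with two neighbors each.
source_values = [10.0, 20.0, 30.0]
indices = [[0, 1], [1, 2]]
distances = [[1.0, 1.0], [1.0, 3.0]]
interpolated = idw_interpolate(source_values, indices, distances)
```

This kind of weighted average suits scalar fields such as windspeed. A circular quantity such as winddirection needs extra care (for example, averaging vector components), which the sketch does not attempt.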

property meta_chunks: Get meta chunks corresponding to the spatial chunks of the target_meta

property out_files: Get list of output files for each spatial chunk

property output_features: Get list of dsets to write to output files
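
Based on the naming shown for the heights parameter (windspeed_100m and winddirection_100m for heights = [100]), the feature list can be sketched as follows. The `build_output_features` helper and the ordering of the names are assumptions for illustration.

```python
def build_output_features(heights):
    """Return windspeed and winddirection dataset names for each height."""
    features = []
    for height in heights:
        features.append(f"windspeed_{height}m")
        features.append(f"winddirection_{height}m")
    return features


features = build_output_features([100, 200])
```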

classmethod get_node_cmd(config)

Get a CLI call to regrid data.

Parameters: config (dict) – sup3r collection config with all necessary args and kwargs to run regridding.

run(node_index)

Run regridding and output write in either serial or parallel

Parameters: node_index (int) – Node index to run, e.g. if node_index=0 then only the chunks for node_chunks[0] will be run.

property all_finished: Check if all out files have been saved

chunk_finished(chunk_index)

Check if process for given chunk_index has already been run.

Parameters: chunk_index (int) – Index of the process chunk to check for completion. Considered finished if there is already an output file and incremental is False.
Returns: bool – Whether the process for the given chunk has finished

property chunks: Get the number of process chunks for this distributed routine.

property failed_chunks: Check whether any processes have failed.

static get_dset_attrs(feature)

Get attributes for output feature

Parameters:

feature (str) – Name of feature to write

Returns:

attrs (dict) – Dictionary of attributes for requested dset
dtype (str) – Data type for requested dset. Defaults to float32

static get_time_dim_name(filepath)

Get the name of the time dimension in the given file

Parameters: filepath (str) – Path to the file
Returns: time_key (str) – Name of the time dimension in the given file

property max_nodes: Get uncapped max number of nodes to distribute processes across

property node_chunks: Get the chunk indices for different nodes

property node_files: Get the file lists for different nodes

node_finished(node_index)

Check if all out files for a given node have been saved

Parameters: node_index (int) – Index of node to check for completed processes
Returns: bool – Whether all processes for the given node have finished

property nodes: Get the max number of nodes to distribute chunks across, limited by the number of process chunks
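
The relationship between max_nodes, nodes, and node_chunks can be sketched as below: the node count is capped by the number of process chunks, and the chunk indices are then divided among the nodes. The `split_chunks_across_nodes` helper and the contiguous, near-even split are assumptions for illustration, not the confirmed sup3r behavior.

```python
def split_chunks_across_nodes(n_chunks, max_nodes):
    """Assign chunk indices to nodes, using at most n_chunks nodes."""
    # Cap the node count by the number of chunks, with a floor of one.
    nodes = max(1, min(max_nodes, n_chunks))
    base, extra = divmod(n_chunks, nodes)
    node_chunks = []
    start = 0
    for i in range(nodes):
        # The first `extra` nodes each take one additional chunk.
        stop = start + base + (1 if i < extra else 0)
        node_chunks.append(list(range(start, stop)))
        start = stop
    return node_chunks


# Ten chunks over three nodes, then four chunks with twenty nodes requested.
three_nodes = split_chunks_across_nodes(n_chunks=10, max_nodes=3)
capped_nodes = split_chunks_across_nodes(n_chunks=4, max_nodes=20)
```

Under this scheme, `run(node_index=0)` would correspond to processing only the chunk indices in the first sublist.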

write_coordinates(source_files, chunk_index)

Write regridded coordinate data to the output file

Parameters:

source_files (list) – List of paths to source files
chunk_index (int) – Index of spatial chunk to regrid and write to output file

classmethod write_data(out_file, dsets, time_index, data_list, meta, global_attrs=None)

Write list of datasets to out_file.

Parameters:

out_file (str) – Pre-existing H5 file output path
dsets (list) – List of datasets to write to out_file
time_index (pd.DatetimeIndex) – Pandas datetime index to use for file time_index.
data_list (list) – List of np.ndarray objects to write to out_file
meta (pd.DataFrame) – Full meta dataframe for the final output data.
global_attrs (dict) – Namespace of file-global attributes for the final output data.