
class RegridOutput(source_files, out_pattern, target_meta, heights, cache_pattern=None, leaf_size=4, k_neighbors=4, incremental=False, n_chunks=100, max_nodes=1, worker_kwargs=None)[source]

Bases: OutputMixIn, DistributedProcess

Output regridded data as it is interpolated. Takes source data from windspeed and winddirection h5 files and uses this data to interpolate onto a new target grid. The interpolated data is then written to new files, with one file for each field (e.g. windspeed_100m).

  • source_files (str | list) – Path to source files to regrid to target_meta

  • out_pattern (str) – Pattern used to name the output files that store the regridded data. This must include a {file_id} format key, e.g. ./chunk_{file_id}.h5

  • target_meta (str) – Path to dataframe of final grid coordinates on which to regrid

  • heights (list) – List of wind field heights to regrid, e.g. if heights = [100] then windspeed_100m and winddirection_100m will be regridded and stored in the output file.

  • cache_pattern (str) – Pattern for cached indices and distances for ball tree

  • leaf_size (int, optional) – leaf size for BallTree

  • k_neighbors (int, optional) – number of nearest neighbors to use for interpolation

  • incremental (bool) – Whether to keep already written output chunks or overwrite them

  • n_chunks (int) – Number of spatial chunks to use for interpolation. The total number of points in the target_meta will be split into n_chunks and the points in each chunk will be interpolated at the same time.

  • max_nodes (int) – Number of nodes to distribute chunks across.

  • worker_kwargs (dict | None) – Dictionary of worker arguments. Optional keys include regrid_workers (maximum number of workers to use for regridding and output).




property spatial_slices

Get the list of slices which select index and distance chunks
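As a rough illustration of how a set of target points might be divided into n_chunks contiguous slices, here is a minimal sketch. The helper name `make_spatial_slices` and the near-equal splitting rule are assumptions made for this example, not the library's implementation.

```python
def make_spatial_slices(n_points, n_chunks):
    """Split range(n_points) into contiguous, near-equal slices.

    Hypothetical helper: the splitting rule is an assumption for illustration.
    """
    # Never create more chunks than there are points, and always at least one.
    n_chunks = max(1, min(n_chunks, n_points))
    base, extra = divmod(n_points, n_chunks)
    slices = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional point.
        size = base + (1 if i < extra else 0)
        slices.append(slice(start, start + size))
        start += size
    return slices


slices = make_spatial_slices(n_points=10, n_chunks=4)
```

Each slice can then be applied to the index and distance arrays to select the rows belonging to one spatial chunk.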

property max_memory

Check max memory usage (in GB)

property index_chunks

Get list of index chunks to use for chunking data extraction and interpolation. indices[i] is the set of indices for the i-th coordinate in the target grid which select the neighboring points in the source grid

property distance_chunks

Get list of distance chunks to use for chunking data extraction and interpolation. distances[i] is the set of distances from the i-th coordinate in the target grid to the neighboring points in the source grid
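To show how per-target indices and distances of this shape could be combined, the sketch below uses inverse-distance weighting. The weighting scheme and the name `idw_interpolate` are assumptions for this example; the documentation above does not state which weighting is used.

```python
def idw_interpolate(source_values, indices, distances, eps=1e-12):
    """Interpolate source values onto target points by inverse-distance weighting.

    Hypothetical helper: indices[i] and distances[i] describe the neighbors of
    the i-th target point, as in the property descriptions above.
    """
    result = []
    for idx_row, dist_row in zip(indices, distances):
        # Clamp tiny distances so a coincident point does not divide by zero.
        weights = [1.0 / max(d, eps) for d in dist_row]
        total = sum(weights)
        value = sum(w * source_values[j] for w, j in zip(weights, idx_row))
        result.append(value / total)
    return result


source_values = [10.0, 20.0, 30.0]
indices = [[0, 1], [1, 2]]            # neighbors of each target point
distances = [[1.0, 1.0], [1.0, 3.0]]  # matching neighbor distances
interpolated = idw_interpolate(source_values, indices, distances)
```

Closer neighbors dominate the result, so the second target point lands nearer to 20.0 than to 30.0.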

property meta_chunks

Get meta chunks corresponding to the spatial chunks of the target_meta

property out_files

Get list of output files for each spatial chunk
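A minimal sketch of how one output path per spatial chunk could be derived from out_pattern follows. The helper name `build_out_files` and the six-digit zero-padded file_id are assumptions for illustration; the actual file_id format is not specified here.

```python
def build_out_files(out_pattern, n_chunks):
    """Return one output path per spatial chunk.

    Hypothetical helper: the zero-padded file_id is an assumed convention.
    """
    if "{file_id}" not in out_pattern:
        raise ValueError("out_pattern must include a {file_id} format key")
    return [out_pattern.format(file_id=str(i).zfill(6)) for i in range(n_chunks)]


out_files = build_out_files("./chunk_{file_id}.h5", n_chunks=3)
```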

property output_features

Get list of dsets to write to output files
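The dataset names follow from the heights argument, as in the windspeed_100m example in the class description. The sketch below shows one plausible way to build them; the helper name `build_output_features` and the ordering of the names are assumptions.

```python
def build_output_features(heights):
    """Build windspeed and winddirection dataset names for each height.

    Hypothetical helper: the ordering of the returned names is an assumption.
    """
    features = []
    for height in heights:
        features.append(f"windspeed_{height}m")
        features.append(f"winddirection_{height}m")
    return features


features = build_output_features([100, 200])
```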

classmethod get_node_cmd(config)[source]

Get a CLI call to regrid data.


config (dict) – sup3r collection config with all necessary args and kwargs to run regridding.
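For orientation, a config dictionary might mirror the constructor arguments listed in the class signature. This is only a sketch: all paths are hypothetical placeholders, and get_node_cmd may expect additional keys that are not shown on this page.

```python
# Keys mirror the RegridOutput constructor signature; values are placeholders.
config = {
    "source_files": ["./source_2015.h5"],      # hypothetical path
    "out_pattern": "./chunk_{file_id}.h5",     # must contain {file_id}
    "target_meta": "./target_meta.csv",        # hypothetical path
    "heights": [100],
    "cache_pattern": None,
    "leaf_size": 4,
    "k_neighbors": 4,
    "incremental": False,
    "n_chunks": 100,
    "max_nodes": 1,
    "worker_kwargs": {"regrid_workers": 1},
}
```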


Run regridding and output write in either serial or parallel


node_index (int) – Node index to run. e.g. if node_index=0 then only the chunks for node_chunks[0] will be run.

property all_finished

Check if all out files have been saved


Check if process for given chunk_index has already been run.


chunk_index (int) – Index of the process chunk to check for completion. Considered finished if there is already an output file and incremental is False.


bool – Whether the process for the given chunk has finished

property chunks

Get the number of process chunks for this distributed routine.

property failed_chunks

Check whether any processes have failed.

static get_dset_attrs(feature)

Get attributes for output feature


feature (str) – Name of feature to write


  • attrs (dict) – Dictionary of attributes for requested dset

  • dtype (str) – Data type for requested dset. Defaults to float32

static get_time_dim_name(filepath)

Get the name of the time dimension in the given file


filepath (str) – Path to the file


time_key (str) – Name of the time dimension in the given file

property max_nodes

Get uncapped max number of nodes to distribute processes across

property node_chunks

Get the chunk indices for different nodes

property node_files

Get the file lists for different nodes


Check if all out files for a given node have been saved


node_index (int) – Index of node to check for completed processes


bool – Whether all processes for the given node have finished

property nodes

Get the max number of nodes to distribute chunks across, limited by the number of process chunks
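The relationship between max_nodes, the number of process chunks, and the per-node chunk lists can be sketched as below. The helper name `split_chunks_across_nodes` and the contiguous, near-equal assignment are assumptions for illustration rather than the library's scheme.

```python
def split_chunks_across_nodes(n_chunks, max_nodes):
    """Assign chunk indices to nodes, capping the node count at n_chunks.

    Hypothetical helper: contiguous near-equal assignment is an assumption.
    """
    # The node count is limited by the number of process chunks.
    n_nodes = max(1, min(max_nodes, n_chunks))
    base, extra = divmod(n_chunks, n_nodes)
    node_chunks = []
    start = 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        node_chunks.append(list(range(start, start + size)))
        start += size
    return node_chunks


node_chunks = split_chunks_across_nodes(n_chunks=5, max_nodes=2)
```

With this layout, a call that runs node_index=0 would process only the chunk indices in the first list.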

write_coordinates(source_files, chunk_index)[source]

Write regridded coordinate data to the output file

  • source_files (list) – List of paths to source files

  • chunk_index (int) – Index of spatial chunk to regrid and write to output file

classmethod write_data(out_file, dsets, time_index, data_list, meta, global_attrs=None)

Write list of datasets to out_file.

  • out_file (str) – Pre-existing H5 file output path

  • dsets (list) – list of datasets to write to out_file

  • time_index (pd.DatetimeIndex) – Pandas datetime index to use for the file time_index.

  • data_list (list) – List of np.ndarray objects to write to out_file

  • meta (pd.DataFrame) – Full meta dataframe for the final output data.

  • global_attrs (dict) – Namespace of file-global attributes for the final output data.