nsrdb.aggregation.aggregation.Aggregation

class Aggregation(var, data_fpath, nn, w, final_ti)

Bases: object

Framework for performing spatiotemporal aggregation.

Parameters:
  • var (str) – Variable (dataset) name being aggregated.

  • data_fpath (str) – Filepath to h5 file containing source var data.

  • nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.

  • w (int) – Window size for temporal aggregation.

  • final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).

Methods

cloud_property(var, data_fpath, nn, w, ...)

Run cloud property aggregation, returning the mean cloud property only for timesteps that match the most common (mode) cloud type.

cloud_property_avg(cprop_source, ...)

Run cloud property aggregation based on output cloud type.

cloud_type(var, data_fpath, nn, w, final_ti)

Run cloud type aggregation, returning the most common cloud type.

cloud_type_mode(data, w)

Get the mode of a 2D cloud type array using a rolling time window.

dhi(var, i, fout)

Calculate the aggregated DHI from an aggregated output file.

fill_flag(var, data_fpath, nn, w, final_ti)

Run fill flag aggregation, returning the percentage of timesteps that were filled.

format_out_arr(arr)

Format the output array (round and flatten).

mean(var, data_fpath, nn, w, final_ti)

Run agg using a spatial average and temporal moving window average.

point(var, data_fpath, nn, w, final_ti)

Run agg by selecting just the closest site and timestep.

reduce_timeseries(arr)

Reduce a high res timeseries to a coarse timeseries.

spatial_avg(data)

Average the source data across the spatial extent.

spatial_sum(data)

Sum the source data across the spatial extent.

time_avg(inp)

Calculate the rolling time average for an input array or df.

time_sum(inp)

Calculate the rolling sum for an input array or df.

Attributes

data

Get the timeseries data for the specified var and sites.

source_time_index

Get the time index of the source data.

property source_time_index

Get the time index of the source data.

Returns:

time_index (pd.DatetimeIndex) – Datetime index of the source dataset.

property data

Get the timeseries data for the specified var and sites.

Returns:

_data (np.ndarray) – Unscaled float data array with shape (ti, nn) where ti is the native time index length and nn is the number of neighbors in the self.nn attr.

static spatial_avg(data)

Average the source data across the spatial extent.

Returns:

data (np.ndarray) – Unscaled float data array with shape (ti, ), where ti is the native time index length. The data is averaged across all nn neighbors.

static spatial_sum(data)

Sum the source data across the spatial extent.

Returns:

data (np.ndarray) – Unscaled float data array with shape (ti, ), where ti is the native time index length. The data is summed across all nn neighbors.
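The spatial reductions described above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the function names `spatial_avg_sketch` and `spatial_sum_sketch` are hypothetical, and the only thing taken from the documentation is that a (ti, nn) array is collapsed to shape (ti, ).

```python
import numpy as np


def spatial_avg_sketch(data):
    """Average a (ti, nn) array over the site axis, returning shape (ti,)."""
    return np.asarray(data, dtype=float).mean(axis=1)


def spatial_sum_sketch(data):
    """Sum a (ti, nn) array over the site axis, returning shape (ti,)."""
    return np.asarray(data, dtype=float).sum(axis=1)


# Three timesteps (rows) for two neighboring sites (columns).
data = np.array([[1.0, 3.0], [2.0, 4.0], [10.0, 20.0]])
avg = spatial_avg_sketch(data)
total = spatial_sum_sketch(data)
```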

time_avg(inp)

Calculate the rolling time average for an input array or df.

Parameters:

inp (np.ndarray | pd.DataFrame) – Input array/df with data to average.

Returns:

out (np.ndarray | pd.DataFrame) – Array or dataframe with the same size as the input, where each value is a moving average.

time_sum(inp)

Calculate the rolling sum for an input array or df.

Parameters:

inp (np.ndarray | pd.DataFrame) – Input array/df with data to sum.

Returns:

out (np.ndarray | pd.DataFrame) – Array or dataframe with the same size as the input, where each value is a moving sum.
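A rolling average or sum that preserves the input size can be sketched with pandas. The helper `rolling_time_stat` below is hypothetical, and two choices in it are assumptions rather than documented behavior: the window is centered on each timestep, and partial windows at the edges are allowed (`min_periods=1`). The window size is passed explicitly here, whereas the methods above take it from the instance.

```python
import numpy as np
import pandas as pd


def rolling_time_stat(inp, w, how="mean"):
    """Rolling mean or sum over time with output the same size as the input."""
    is_df = isinstance(inp, pd.DataFrame)
    df = inp if is_df else pd.DataFrame(np.asarray(inp, dtype=float))
    # Assumption: centered window, partial windows allowed at the edges.
    roller = df.rolling(w, center=True, min_periods=1)
    out = roller.mean() if how == "mean" else roller.sum()
    # Return the same container type and shape that was passed in.
    return out if is_df else out.to_numpy().reshape(np.shape(inp))


series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
avg = rolling_time_stat(series, w=3, how="mean")
total = rolling_time_stat(series, w=3, how="sum")
```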

static cloud_type_mode(data, w)

Get the mode of a 2D cloud type array using a rolling time window.

Parameters:
  • data (np.ndarray) – 2D array of integer cloud types.

  • w (int) – Temporal window over which to take the mode.

Returns:

data (np.ndarray) – Mode of cloud type.
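One way to picture a rolling mode over a 2D cloud type array is shown below. This is a sketch under stated assumptions, not the library's code: the function name `rolling_mode` is hypothetical, the window is assumed to be centered, the mode is taken over every site within the window, and ties go to the smallest cloud type value.

```python
import numpy as np


def rolling_mode(data, w):
    """Most common value in a centered time window, taken across all sites."""
    data = np.asarray(data)
    half = w // 2
    out = np.empty(data.shape[0], dtype=data.dtype)
    for i in range(data.shape[0]):
        # Rows inside the window, truncated at the start and end of the array.
        window = data[max(0, i - half): i + half + 1]
        values, counts = np.unique(window, return_counts=True)
        # np.unique sorts values, so argmax breaks ties toward the smallest.
        out[i] = values[np.argmax(counts)]
    return out


# Five timesteps (rows) for two sites (columns) of integer cloud types.
ctype = np.array([[0, 0], [0, 3], [3, 3], [3, 3], [7, 3]])
mode = rolling_mode(ctype, w=3)
```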

reduce_timeseries(arr)

Reduce a high res timeseries to a coarse timeseries.

Parameters:

arr (np.ndarray) – 2D numpy array

Returns:

arr (np.ndarray) – Shortened 2D numpy array with length equal to the final ti.
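As an illustration of shortening a high-resolution array to the final time index length, the sketch below keeps every n-th row. The function `reduce_timeseries_sketch` and its `final_len` argument are hypothetical, and the sketch assumes the source length is an exact multiple of the final length; how the library selects timesteps is not stated here.

```python
import numpy as np


def reduce_timeseries_sketch(arr, final_len):
    """Keep every n-th row so the first axis has length final_len."""
    arr = np.asarray(arr)
    step, remainder = divmod(arr.shape[0], final_len)
    if remainder:
        raise ValueError("source length must be a multiple of final_len")
    return arr[::step]


# Six high-resolution timesteps reduced to three for two sites.
high_res = np.arange(12).reshape(6, 2)
coarse = reduce_timeseries_sketch(high_res, final_len=3)
```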

static format_out_arr(arr)

Format the output array (round and flatten).

classmethod point(var, data_fpath, nn, w, final_ti)

Run agg by selecting just the closest site and timestep.

Parameters:
  • var (str) – Variable (dataset) name being aggregated.

  • data_fpath (str) – Filepath to h5 file containing source var data.

  • nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.

  • w (int) – Window size for temporal aggregation.

  • final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n, ) array of unscaled and rounded data from the nn sites, with a time series matching final_ti.

classmethod dhi(var, i, fout)

Calculate the aggregated DHI from an aggregated output file.

Parameters:
  • var (str) – Variable name, either “dhi” or “clearsky_dhi”.

  • i (int) – Site index in fout.

  • fout (str) – Filepath to the output file containing aggregated GHI, DNI, and SZA to calculate aggregated DHI.

Returns:

dhi (np.ndarray) – DHI calculated from vars in fout.
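DHI is commonly derived from GHI, DNI, and SZA through the closure relation GHI = DNI * cos(SZA) + DHI. The sketch below applies that relation directly; the function name `dhi_from_closure` is hypothetical, SZA is assumed to be in degrees, and any further handling the library may apply (for example at night or for negative values) is not represented.

```python
import numpy as np


def dhi_from_closure(ghi, dni, sza_deg):
    """Diffuse horizontal irradiance from the closure relation."""
    ghi = np.asarray(ghi, dtype=float)
    dni = np.asarray(dni, dtype=float)
    # Assumption: solar zenith angle is given in degrees.
    cos_sza = np.cos(np.radians(np.asarray(sza_deg, dtype=float)))
    return ghi - dni * cos_sza


ghi = np.array([500.0, 100.0])
dni = np.array([600.0, 0.0])
sza = np.array([60.0, 80.0])
dhi = dhi_from_closure(ghi, dni, sza)
```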

classmethod fill_flag(var, data_fpath, nn, w, final_ti)

Run fill flag aggregation, returning the percentage of timesteps that were filled.

Parameters:
  • var (str) – Variable (dataset) name being aggregated (fill_flag).

  • data_fpath (str) – Filepath to h5 file containing source var data.

  • nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.

  • w (int) – Window size for temporal aggregation.

  • final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n, ) array of unscaled and rounded data from the nn sites, with a time series matching final_ti.
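The idea of a "percentage of timesteps that were filled" can be sketched for a single block of source flags. The helper `percent_filled` is hypothetical, and it assumes that a flag value of 0 means "not filled" and any nonzero value means "filled"; the windowing and rounding performed by the method above are left out.

```python
import numpy as np


def percent_filled(flags):
    """Percentage of entries in a block of fill flags that are nonzero."""
    flags = np.asarray(flags)
    # Assumption: 0 marks an unfilled timestep, anything else a filled one.
    return 100.0 * np.count_nonzero(flags) / flags.size


# Four timesteps (rows) for two sites (columns); two entries were filled.
flags = np.array([[0, 1], [0, 0], [2, 0], [0, 0]])
pct = percent_filled(flags)
```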

classmethod cloud_type(var, data_fpath, nn, w, final_ti)

Run cloud type aggregation, returning the most common cloud type.

Parameters:
  • var (str) – Variable (dataset) name being aggregated (cloud_type).

  • data_fpath (str) – Filepath to h5 file containing source var data.

  • nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.

  • w (int) – Window size for temporal aggregation.

  • final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n, ) array of unscaled and rounded data from the nn sites, with a time series matching final_ti.

static cloud_property_avg(cprop_source, ctype_source, ctype_out_full, w)

Run cloud property aggregation based on output cloud type.

Parameters:
  • cprop_source (np.ndarray) – Source (full resolution) cloud property data.

  • ctype_source (np.ndarray) – Source (full resolution) cloud type data.

  • ctype_out_full (np.ndarray) – Output (reduced resolution) cloud type data, interpolated to the same length as the source resolution.

  • w (int) – Window size.

Returns:

cprop_out (np.ndarray) – Average cloud property data in the window surrounding each timestep, using only source values whose cloud type equals the output cloud type for that timestep. Shape is the same as ctype_out_full.
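The masked averaging described above can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the function name `masked_window_avg` is hypothetical, the window is assumed to be centered, the source arrays are assumed to be (ti, nn) with a 1D `ctype_out_full` of length ti, and timesteps with no matching source cloud type are left as NaN.

```python
import numpy as np


def masked_window_avg(cprop_source, ctype_source, ctype_out_full, w):
    """Window mean of cloud property where source type matches output type."""
    cprop_source = np.asarray(cprop_source, dtype=float)
    ctype_source = np.asarray(ctype_source)
    ctype_out_full = np.asarray(ctype_out_full)
    half = w // 2
    out = np.full(ctype_out_full.shape, np.nan)
    for i, ctype in enumerate(ctype_out_full):
        rows = slice(max(0, i - half), i + half + 1)
        # Keep only source entries whose cloud type equals the output type.
        match = ctype_source[rows] == ctype
        if match.any():
            out[i] = cprop_source[rows][match].mean()
    return out


cprop = np.array([[10, 20], [30, 40], [50, 60], [70, 80]])
ctype = np.array([[3, 3], [3, 0], [0, 0], [0, 7]])
ctype_out = np.array([3, 3, 0, 0])
cprop_out = masked_window_avg(cprop, ctype, ctype_out, w=3)
```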

classmethod cloud_property(var, data_fpath, nn, w, final_ti, gid, fout)

Run cloud property aggregation, returning the mean cloud property only for timesteps that match the most common (mode) cloud type.

Parameters:
  • var (str) – Variable (dataset) name being aggregated (a cloud property dataset).

  • data_fpath (str) – Filepath to h5 file containing source var data.

  • nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.

  • w (int) – Window size for temporal aggregation.

  • final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).

  • gid (int) – Site index in fout.

  • fout (str) – Filepath to the output file containing aggregated cloud type.

Returns:

data (np.ndarray) – Average cloud property data in the window surrounding each timestep, using only source values whose cloud type equals the output cloud type for that timestep. Array is (n, ), unscaled and rounded, with a time series matching final_ti.

classmethod mean(var, data_fpath, nn, w, final_ti)

Run agg using a spatial average and temporal moving window average.

Parameters:
  • var (str) – Variable (dataset) name being aggregated.

  • data_fpath (str) – Filepath to h5 file containing source var data.

  • nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.

  • w (int) – Window size for temporal aggregation.

  • final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n, ) array of unscaled and rounded data from the nn sites, with a time series matching final_ti.
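To show how a spatial average and a temporal moving-window average might compose into one aggregated profile, here is a small end-to-end sketch on an in-memory array. The function `mean_agg_sketch` and its `final_len` argument are hypothetical; the centered window, the partial windows at the edges, and the every-n-th-row reduction are assumptions, and reading from the h5 file and rounding are omitted.

```python
import numpy as np
import pandas as pd


def mean_agg_sketch(data, w, final_len):
    """Spatial mean, then rolling time mean, then reduction to final_len."""
    # Step 1: collapse the (ti, nn) array to (ti,) by averaging over sites.
    spatial = np.asarray(data, dtype=float).mean(axis=1)
    # Step 2: moving-window average in time (assumed centered).
    smooth = pd.Series(spatial).rolling(w, center=True, min_periods=1).mean()
    # Step 3: keep every n-th timestep to match the final profile length.
    step = len(smooth) // final_len
    return smooth.to_numpy()[::step][:final_len]


# Four source timesteps for two sites, aggregated to two output timesteps.
data = np.array([[0.0, 2.0], [2.0, 4.0], [4.0, 6.0], [6.0, 8.0]])
agg = mean_agg_sketch(data, w=3, final_len=2)
```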