nsrdb.file_handlers.surfrad.Surfrad
- class Surfrad(h5_file, unscale=True, str_decode=True, group=None, hsds=False, hsds_kwargs=None)
Bases: Resource
Framework to open SURFRAD ground measurement h5 files and format dataframes for easy validation against NSRDB data.
- Parameters:
h5_file (str) – Path to .h5 resource file
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None
hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False
hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
Methods
close() – Close h5 instance
df_str_decode(df) – Decode a dataframe with byte string columns into ordinary str cols.
get_SAM_df(site) – Placeholder for get_SAM_df method that is resource specific
get_attrs([dset]) – Get h5 attributes either from file or dataset
get_df([dt_out, window_minutes]) – Get rolling avg irradiance df to benchmark against
get_dset_properties(dset) – Get dataset properties (shape, dtype, chunks)
get_meta_arr(rec_name[, rows]) – Get a meta array by name (faster than DataFrame extraction).
get_rolling(df[, window]) – Get a rolling avg dataset sampled at a given timestep interval.
get_scale_factor(dset) – Get dataset scale factor
get_units(dset) – Get dataset units
get_window_size(df[, window_minutes]) – Calculate the index window size to take a moving average over.
open_dataset(ds_name) – Open resource dataset
preload_SAM(h5_file, sites, tech[, unscale, ...]) – Pre-load project_points for SAM
Attributes
ADD_ATTR
OLD_ADD_ATTR
OLD_SCALE_ATTR
OLD_UNIT_ATTR
SCALE_ATTR
UNIT_ATTR
adders – Dictionary of all dataset add offset factors
attrs – Dictionary of all dataset attributes
chunks – Dictionary of all dataset chunk sizes
coordinates – (lat, lon) pairs
data_version – Get the version attribute of the data.
datasets – Datasets available
dsets – Datasets available
dtypes – Dictionary of all dataset dtypes
global_attrs – Global (file) attributes
groups – Groups available
h5 – Open h5py File instance.
lat_lon – Extract (latitude, longitude) pairs
meta – Resource meta data DataFrame
native_df – Get the native measurement data in dataframe format.
res_dsets – Available resource datasets
resource_datasets – Available resource datasets
scale_factors – Dictionary of all dataset scale factors
shape – Resource shape (timesteps, sites); shape = (len(time_index), len(meta))
shapes – Dictionary of all dataset shapes
time_index – Resource DatetimeIndex
units – Dictionary of all dataset units
- static get_rolling(df, window=61)
Get a rolling avg dataset sampled at a given timestep interval.
Rolling average is centered and ignores nan values.
- Parameters:
df (pd.DataFrame) – Timeseries data.
window (int) – Timesteps that the moving average window will be over.
- Returns:
df (pd.DataFrame) – Dataframe with same datetimeindex as input df. Each value is a moving average of the input df.
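A centered rolling average that ignores nan values can be sketched with plain pandas. This is only an illustration under stated assumptions, not the library's implementation: the helper name `rolling_mean_sketch`, the `ghi` column, and the sample values are hypothetical, and `min_periods=1` is an assumed way to let the mean use whatever non-nan values fall inside the window.

```python
import numpy as np
import pandas as pd


def rolling_mean_sketch(df, window=61):
    """Hypothetical helper: centered moving average that skips nan values."""
    # center=True places each output value at the middle of its window.
    # min_periods=1 yields a mean whenever at least one non-nan value is present.
    return df.rolling(window, center=True, min_periods=1).mean()


# Made-up 1-minute series with one missing measurement.
index = pd.date_range("2020-01-01", periods=5, freq="1min")
df = pd.DataFrame({"ghi": [1.0, np.nan, 3.0, 4.0, 5.0]}, index=index)

smoothed = rolling_mean_sketch(df, window=3)
print(smoothed)
```

The output keeps the same datetime index as the input, and the gap at the second timestep is filled by the mean of its two neighbours.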
- static get_window_size(df, window_minutes=61)
Calculate the index window size to take a moving average over.
- Parameters:
df (pd.DataFrame) – Timeseries data frame with datetime index.
window_minutes (int) – Minutes that the moving average window will be over.
- Returns:
window (int) – Number of index values that the window will be over
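One plausible way to turn a window in minutes into a number of index values is to divide by the spacing of the datetime index. The sketch below assumes the median spacing between consecutive timestamps is representative and rounds to the nearest integer; the helper name `window_size_sketch` is hypothetical and the actual calculation in `get_window_size` may differ.

```python
import pandas as pd


def window_size_sketch(df, window_minutes=61):
    """Hypothetical helper: index steps covered by a window_minutes window."""
    # Median spacing between consecutive timestamps, expressed in minutes.
    step = df.index.to_series().diff().median()
    step_minutes = step.total_seconds() / 60
    # Never return a window smaller than one index value.
    return max(1, int(round(window_minutes / step_minutes)))


# Made-up frames at 1-minute and 3-minute resolution.
one_min = pd.DataFrame(
    {"ghi": [0.0] * 10},
    index=pd.date_range("2020-01-01", periods=10, freq="1min"),
)
three_min = pd.DataFrame(
    {"ghi": [0.0] * 10},
    index=pd.date_range("2020-01-01", periods=10, freq="3min"),
)

print(window_size_sketch(one_min), window_size_sketch(three_min))
```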
- property native_df
Get the native measurement data in dataframe format.
- Returns:
native_df (pd.DataFrame) – Time series dataframe with irradiance data.
- get_df(dt_out='5min', window_minutes=61)
Get rolling avg irradiance df to benchmark against
The output time index will be the full year index with dt_out, but any missing data in the surfrad file will be passed through as nan.
- Parameters:
dt_out (str) – Pandas timestep size (30min, 5min, 1min) for the output from this method.
window_minutes (int) – Minutes that the moving average window will be over. This will be calculated while considering the source time resolution of the SURFRAD measurements.
- Returns:
df_out (pd.DataFrame) – Dataframe with datetimeindex with dt_out timestep size. Each value is a moving average of the measurement data.
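The "full year index with missing data passed through as nan" behaviour can be illustrated with a pandas reindex. This is a sketch under assumptions, not the method's actual code: the helper name `to_full_year_sketch`, the explicit `year` argument, and the sample measurements are all hypothetical.

```python
import pandas as pd


def to_full_year_sketch(df, year, dt_out="5min"):
    """Hypothetical helper: place df on a complete yearly index at dt_out."""
    # Build every timestep of the year; drop the final entry, which is
    # the first timestep of the following year.
    full_index = pd.date_range(f"{year}-01-01", f"{year + 1}-01-01", freq=dt_out)[:-1]
    # Timesteps absent from df become nan rows in the result.
    return df.reindex(full_index)


# Made-up measurements with a gap at 00:05.
obs_index = pd.to_datetime(["2021-01-01 00:00", "2021-01-01 00:10"])
obs = pd.DataFrame({"ghi": [0.0, 12.5]}, index=obs_index)

out = to_full_year_sketch(obs, 2021)
print(len(out), out.head(3))
```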
- property adders
Dictionary of all dataset add offset factors
- Returns:
adders (dict)
- property attrs
Dictionary of all dataset attributes
- Returns:
attrs (dict)
- property chunks
Dictionary of all dataset chunk sizes
- Returns:
chunks (dict)
- close()
Close h5 instance
- property coordinates
(lat, lon) pairs
- Returns:
lat_lon (ndarray)
- Type:
Coordinates
- property data_version
Get the version attribute of the data. None if not available.
- Returns:
version (str | None)
- property datasets
Datasets available
- Returns:
list
- static df_str_decode(df)
Decode a dataframe with byte string columns into ordinary str cols.
- Parameters:
df (pd.DataFrame) – Dataframe with some columns being byte strings.
- Returns:
df (pd.DataFrame) – DataFrame with str columns instead of byte str columns.
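Decoding byte string columns can be sketched as below. The helper name `decode_bytes_sketch`, the UTF-8 encoding, and the sample meta data are assumptions made for illustration; `df_str_decode` itself may be implemented differently.

```python
import pandas as pd


def decode_bytes_sketch(df):
    """Hypothetical helper: return a copy with bytes values decoded to str."""
    out = df.copy()
    for col in out.columns:
        # Only touch columns that actually hold bytes objects.
        if out[col].map(lambda v: isinstance(v, bytes)).any():
            out[col] = out[col].map(
                lambda v: v.decode("utf-8") if isinstance(v, bytes) else v
            )
    return out


# Made-up meta data as it might look when read straight from an h5 file.
meta = pd.DataFrame({"station": [b"site_a", b"site_b"], "elevation": [100, 200]})

decoded = decode_bytes_sketch(meta)
print(decoded)
```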
- property dsets
Datasets available
- Returns:
list
- property dtypes
Dictionary of all dataset dtypes
- Returns:
dtypes (dict)
- get_SAM_df(site)
Placeholder for get_SAM_df method that is resource specific
- Parameters:
site (int) – Site to extract SAM DataFrame for
- get_attrs(dset=None)
Get h5 attributes either from file or dataset
- Parameters:
dset (str) – Dataset to get attributes for; if None, get file (global) attributes
- Returns:
attrs (dict) – Dataset or file attributes
- get_dset_properties(dset)
Get dataset properties (shape, dtype, chunks)
- Parameters:
dset (str) – Dataset to get properties for
- Returns:
shape (tuple) – Dataset array shape
dtype (str) – Dataset array dtype
chunks (tuple) – Dataset chunk size
- get_meta_arr(rec_name, rows=slice(None, None, None))
Get a meta array by name (faster than DataFrame extraction).
- Parameters:
rec_name (str) – Named record from the meta data to retrieve.
rows (slice) – Rows of the record to extract.
- Returns:
meta_arr (np.ndarray) – Extracted array from the meta data record name.
- get_scale_factor(dset)
Get dataset scale factor
- Parameters:
dset (str) – Dataset to get scale factor for
- Returns:
float – Dataset scale factor, used to unscale int values to floats
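As a rough illustration of what "unscale int values to floats" means, the sketch below assumes the stored integers are divided by the scale factor to recover physical values. That convention, the helper name `unscale_sketch`, and the sample numbers are assumptions; any add offset handling is left out.

```python
import numpy as np


def unscale_sketch(raw, scale_factor):
    """Hypothetical helper: convert scaled integer storage to float values."""
    # Assumed convention: physical value = stored integer / scale_factor.
    return raw.astype(np.float32) / scale_factor


# Made-up integers stored with an assumed scale factor of 10.
raw = np.array([215, 220, -15], dtype=np.int16)
values = unscale_sketch(raw, 10)
print(values)
```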
- get_units(dset)
Get dataset units
- Parameters:
dset (str) – Dataset to get units for
- Returns:
str – Dataset units, None if not defined
- property global_attrs
Global (file) attributes
- Returns:
global_attrs (dict)
- property groups
Groups available
- Returns:
groups (list) – List of groups
- property h5
Open h5py File instance. If _group is not None return open Group
- Returns:
h5 (h5py.File | h5py.Group)
- property lat_lon
Extract (latitude, longitude) pairs
- Returns:
lat_lon (ndarray)
- property meta
Resource meta data DataFrame
- Returns:
meta (pandas.DataFrame)
- open_dataset(ds_name)
Open resource dataset
- Parameters:
ds_name (str) – Dataset name to open
- Returns:
ds (ResourceDataset) – Resource for open resource dataset
- classmethod preload_SAM(h5_file, sites, tech, unscale=True, str_decode=True, group=None, hsds=False, hsds_kwargs=None, time_index_step=None, means=False)
Pre-load project_points for SAM
- Parameters:
h5_file (str) – h5_file to extract resource from
sites (list) – List of sites to be provided to SAM (sites is synonymous with gids aka spatial indices)
tech (str) – Technology to be run by SAM
unscale (bool) – Boolean flag to automatically unscale variables on extraction
str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.
group (str) – Group within .h5 resource file to open
hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False
hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
time_index_step (int, optional) – Step size for time_index, used to reduce temporal resolution, by default None
means (bool, optional) – Boolean flag to compute mean resource when res_array is set, by default False
- Returns:
SAM_res (SAMResource) – Instance of SAMResource pre-loaded with Solar resource for sites in project_points
- property res_dsets
Available resource datasets
- Returns:
list
- property resource_datasets
Available resource datasets
- Returns:
list
- property scale_factors
Dictionary of all dataset scale factors
- Returns:
scale_factors (dict)
- property shape
Resource shape (timesteps, sites), where shape = (len(time_index), len(meta))
- Returns:
shape (tuple)
- property shapes
Dictionary of all dataset shapes
- Returns:
shapes (dict)
- property time_index
Resource DatetimeIndex
- Returns:
time_index (pandas.DatetimeIndex)
- property units
Dictionary of all dataset units
- Returns:
units (dict)