nsrdb.data_model.clouds.CloudVar

class CloudVar(name, var_meta, date, freq=None, dsets=('cloud_type', 'cld_opd_dcomp', 'cld_reff_dcomp', 'cld_press_acha'), parallax_correct=True, solar_shading=True, remap_pc=True, **kwargs)[source]

Bases: AncillaryVarHandler

Framework for cloud data extraction (GOES data processed by UW).

Parameters:

name (str) – NSRDB var name.
var_meta (str | pd.DataFrame | None) – CSV file or dataframe containing meta data for all NSRDB variables. Defaults to the NSRDB var meta csv in git repo.
date (datetime.date) – Single day to extract data for.
freq (str | None) – Optional timeseries frequency to force cloud files to (time_index.freqstr). If None, the frequency of the cloud file list will be inferred.
dsets (tuple | list) – Source datasets to extract. It is more efficient to extract all required datasets at once from each cloud file, so that only one kdtree is built for each unique coordinate set in each cloud file.
parallax_correct (bool) – Flag to adjust cloud coordinates so clouds are located directly above their coordinates rather than at the apparent location seen from the sensor.
solar_shading (bool) – Flag to adjust cloud coordinates so clouds are assigned to the coordinates they shade.
remap_pc (bool) – Flag to remap the parallax-corrected and solar-shading-corrected data back onto the original semi-regular GOES coordinates.

Methods

`get_handler`(fp_cloud, **kwargs)	Get a single cloud timestep data handler for one cloud file.
`get_timestamp`(fstr[, integer])	Extract the cloud file timestamp.
`infer_data_freq`(flist)	Infer the cloud data timestep frequency from the file list.
`infer_data_time_index`(flist)	Get the actual time index of the file set based on the timestamps.
`pre_flight`()	Perform pre-flight checks - source pattern check.
`save_obj`(cloud_var_single)	Save a single cloud object to a cache for later use.
`scale_data`(array)	Perform a safe data scaling operation on a source data array.
`unscale_data`(array)	Perform a safe data unscaling operation on a source data array.

Attributes

`DEFAULT_DIR`
`NN_METHOD`
`attrs`	Return a dictionary of dataset attributes for HDF5 dataset attrs.
`cache_file`	Get the nearest neighbor result cache csv file for this var.
`chunks`	Get the variable's intended storage chunk shape.
`data_source`	Get the data source.
`date`	Get the date for this handler.
`description`	Long variable description.
`doy`	Get the day of year string, e.g. 001 for Jan 1 and 365 for Dec 31.
`dset_name`	Get the source dataset name for the NSRDB variable.
`dtype`	Get the data type attribute.
`elevation_correct`	Get the elevation correction preference.
`file`	Alias for cloudvar file list.
`file_df`	Get a dataframe with nominal time index and available cloud files.
`file_set`	Get the source file set name for the NSRDB variable.
`final_dtype`	Get the variable's intended storage datatype.
`flist`	List of cloud data file paths for one day.
`freq`	Get the file list timeseries frequency.
`inferred_freq`	Get the inferred frequency from the file list.
`mask`	Get a boolean mask to locate the current variable in the meta data.
`name`	Get the NSRDB variable name.
`next_date`	Get the date after the date for this handler.
`next_file`	Get the file path for the date for the target NSRDB variable name based on the glob self.next_pattern.
`next_file_exists`	Check if the file for the next date exists.
`next_pattern`	Get the next date source file pattern which is sent to glob().
`pattern`	Get the source file pattern which is sent to glob().
`physical_max`	Get the variable's physical maximum value.
`physical_min`	Get the variable's physical minimum value.
`scale_factor`	Get the variable's intended storage scale factor.
`single_handler_kwargs`	Get a kwargs dict to initialize a single cloud timestep data handler.
`source_dir`	Get the source directory containing the variable data files.
`spatial_method`	Get the spatial interpolation method.
`temporal_method`	Get the temporal interpolation method.
`time_index`	Get the GOES cloud data time index.
`units`	Get the units attribute.
`var_meta`	Return the meta data for NSRDB variables.

property doy

Get the day of year string, e.g. 001 for Jan 1 and 365 for Dec 31.

Returns:: str
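As a minimal sketch of the string format described above (not necessarily how this property is implemented), a zero-padded day of year can be produced from a `datetime.date` with the standard `%j` format code. The helper name `doy_string` is hypothetical and is not part of the NSRDB API.

```python
from datetime import date


def doy_string(day):
    """Return the zero-padded, three-character day of year for a date.

    Hypothetical helper for illustration only.
    """
    # %j is the day of the year as a zero-padded decimal number (001-366).
    return day.strftime("%j")


print(doy_string(date(2019, 1, 1)))    # 001
print(doy_string(date(2019, 12, 31)))  # 365
```

In a leap year, Dec 31 yields 366 rather than 365.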

property pattern

Get the source file pattern which is sent to glob().

Returns:: str | None

pre_flight()[source]

Perform pre-flight checks - source pattern check.

Returns:: missing (str) – The source pattern string if no matching source files are found. If nothing is missing, an empty string.

static get_timestamp(fstr, integer=True)[source]

Extract the cloud file timestamp.

Parameters:

fstr (str) – File path or file name with timestamp.
integer (bool) – Flag to convert string match to integer.

Returns:

time (int | str | None) – Timestamp of format YYYYDDDHHMM (YYYY DDD HH MM), where DDD is the day of year (1 through 366). Returned as an integer if integer is True, otherwise as the matched string. None if not found.
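The YYYYDDDHHMM layout can be illustrated with a short sketch. It assumes the timestamp is the first standalone run of exactly 11 digits in the file name, which may differ from the pattern `get_timestamp` actually matches. The functions `find_timestamp` and `timestamp_to_datetime` and the file name below are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: the timestamp is a standalone run of exactly 11 digits.
TIMESTAMP_RE = re.compile(r"(?<!\d)(\d{11})(?!\d)")


def find_timestamp(fstr, integer=True):
    """Return the YYYYDDDHHMM timestamp found in fstr, or None if absent."""
    match = TIMESTAMP_RE.search(fstr)
    if match is None:
        return None
    stamp = match.group(1)
    return int(stamp) if integer else stamp


def timestamp_to_datetime(stamp):
    """Convert a YYYYDDDHHMM timestamp (int or str) to a datetime."""
    stamp = str(stamp)
    year, doy = int(stamp[:4]), int(stamp[4:7])
    hour, minute = int(stamp[7:9]), int(stamp[9:11])
    # Day of year is 1-based, so day 001 adds zero days to January 1.
    return datetime(year, 1, 1) + timedelta(
        days=doy - 1, hours=hour, minutes=minute
    )


fname = "cloud_20190011230.nc"  # hypothetical file name
stamp = find_timestamp(fname)
print(stamp)                         # 20190011230
print(timestamp_to_datetime(stamp))  # 2019-01-01 12:30:00
```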

property file

Alias for cloudvar file list.

Returns:: list

property flist

List of cloud data file paths for one day. Each file is a timestep.

Note that this is the raw parsed file list, which may not match the self.file_df DataFrame, which is the final file list based on the desired timestep frequency.

Returns:: flist (list) – List of .h5 or .nc full file paths sorted by timestamp. An exception is raised if no files are found.

property inferred_freq

Get the inferred frequency from the file list.

Returns:: freq (str) – Pandas datetime frequency.

property freq

Get the file list timeseries frequency.

The frequency is forced if this object is initialized with a freq other than None. Otherwise, it is inferred from the file list.

Returns:: freq (str) – Nominal pandas datetimeindex frequency of the cloud file list.

property file_df

Get a dataframe with nominal time index and available cloud files.

Returns:: _file_df (pd.DataFrame) – Timeseries of available cloud file paths. The datetimeindex is created from the inferred timestep frequency of the cloud files. The data column is the file paths. Timesteps with missing data files have NaN file paths.
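The shape of such a dataframe can be sketched with plain pandas. This is an illustration of the idea only: the file names, the column name `flist`, and the exact-match reindexing are assumptions, and the property itself may match files to nominal timesteps differently.

```python
import pandas as pd

# Hypothetical files keyed by their actual timestamps; 01:00 is missing.
available = pd.Series(
    ["cloud_0000.nc", "cloud_0030.nc", "cloud_0130.nc"],
    index=pd.to_datetime(
        ["2019-01-01 00:00", "2019-01-01 00:30", "2019-01-01 01:30"]
    ),
    name="flist",
)

# Nominal time index at an assumed 30-minute timestep.
nominal = pd.date_range(
    "2019-01-01 00:00", "2019-01-01 01:30", freq="30min"
)

# Reindexing leaves NaN wherever no file exists for a nominal timestep.
file_df = available.reindex(nominal).to_frame()
print(file_df)
```

The row for 01:00 holds NaN, which marks the timestep as having no source file.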

static infer_data_time_index(flist)[source]

Get the actual time index of the file set based on the timestamps.

Parameters:: flist (list) – List of strings of cloud files (with or without full file path).
Returns:: time_index (pd.DatetimeIndex) – Pandas datetime index based on the actual file timestamps.

static infer_data_freq(flist)[source]

Infer the cloud data timestep frequency from the file list.

Parameters:: flist (list) – List of strings of cloud files (with or without full file path).
Returns:: freq (str) – Pandas datetime frequency.
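One plausible heuristic for this kind of inference is to take the most common gap between consecutive file timestamps, which tolerates a few missing files. The sketch below uses only the standard library and returns a `timedelta` rather than a pandas frequency string; it is not taken from the `infer_data_freq` source, and `most_common_step` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_common_step(times):
    """Return the most frequent gap between consecutive sorted timestamps."""
    ordered = sorted(times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("Need at least two timestamps to infer a frequency.")
    return Counter(gaps).most_common(1)[0][0]


times = [
    datetime(2019, 1, 1, 0, 0),
    datetime(2019, 1, 1, 0, 30),
    datetime(2019, 1, 1, 1, 30),  # the 01:00 file is missing
    datetime(2019, 1, 1, 2, 0),
]
step = most_common_step(times)
print(step)  # 0:30:00
```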

property time_index

Get the GOES cloud data time index.

Returns:: cloud_ti (pd.DatetimeIndex) – Pandas datetime index for the current day at the cloud temporal resolution (should match the NSRDB resolution).

property single_handler_kwargs

Get a kwargs dict to initialize a single cloud timestep data handler.

Returns:: dict

property attrs

Return a dictionary of dataset attributes for HDF5 dataset attrs.

Returns:: attrs (dict) – Namespace of attributes to define the dataset.

property cache_file

Get the nearest neighbor result cache csv file for this var.

Returns:: _cache_file (False | str) – False for no caching, or a string filename (no path).

property chunks

Get the variable’s intended storage chunk shape.

Returns:: chunks (tuple) – Data storage chunk shape (row_chunk, col_chunk).

property data_source

Get the data source.

Returns:: data_source (str) – Data source.

property date

Get the date for this handler.

Returns:: datetime.date

property description

Long variable description.

Returns:: description (str) – Description of the variable to provide more info than the sometimes opaque dset names.

property dset_name

Get the source dataset name for the NSRDB variable. This is typically the netcdf or h5 source dataset name for the variable, such as T2M or TOTANGSTR (for MERRA temperature and alpha).

Returns:: str

property dtype

Get the data type attribute.

Returns:: dtype (str) – Intended NSRDB disk data type.

property elevation_correct

Get the elevation correction preference.

Returns:: elevation_correct (bool) – Whether or not to use elevation correction for the current var.

property file_set

Get the source file set name for the NSRDB variable. This is typically used for MERRA source filesets such as tavg1_2d_aer_Nx or tavg1_2d_slv_Nx.

Returns:: str

property final_dtype

Get the variable’s intended storage datatype.

Returns:: dtype (str) – Data type for the current variable.

static get_handler(fp_cloud, **kwargs)[source]

Get a single cloud timestep data handler for one cloud file.

Parameters:

fp_cloud (str) – Single cloud source file, either .nc or .h5.
kwargs (dict) – Kwargs for the initialization of CloudVarSingleH5 or CloudVarSingleNC, passed along with fp_cloud.

Returns:

obj (None | CloudVarSingleNC | CloudVarSingleH5) – Handler for a single cloud data file.

property mask: Get a boolean mask to locate the current variable in the meta data.

property name: Get the NSRDB variable name.

property next_date

Get the date after the date for this handler. This is used to get the data for the next date for temporal interpolation.

Returns:: datetime.date

property next_file

Get the file path for the date for the target NSRDB variable name based on the glob self.next_pattern. The file is used to get the data for the next date for temporal interpolation.

Returns:: str

property next_file_exists: Check if the file for the next date exists.

property next_pattern

Get the next date source file pattern which is sent to glob().

Returns:: str | None

property physical_max

Get the variable’s physical maximum value.

Returns:: physical_max (float) – Physical maximum value for the variable. Variable range can be truncated at this value. Must be consistent with the final dtype and scale factor.

property physical_min

Get the variable’s physical minimum value.

Returns:: physical_min (float) – Physical minimum value for the variable. Variable range can be truncated at this value. Must be consistent with the final dtype and scale factor.

scale_data(array)

Perform a safe data scaling operation on a source data array.

Steps:

Enforce physical range limits
Apply scale factor (multiply)
Round if integer
Enforce dtype bit range limits
Perform dtype conversion
Return manipulated array

Parameters:: array (np.ndarray) – Source data array with full precision (likely float32).
Returns:: array (np.ndarray) – Source data array with final datatype.
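The steps listed above can be sketched in NumPy as follows. This is an illustrative reading of the documented sequence, not the method's source: the function `scale_sketch` and its explicit arguments are hypothetical (the real method takes only `array` and presumably reads the other values from the variable's meta data), and the sketch does not handle NaN values.

```python
import numpy as np


def scale_sketch(array, scale_factor, final_dtype, physical_min, physical_max):
    """Scale a float array for storage, following the documented steps."""
    final_dtype = np.dtype(final_dtype)
    # 1. Enforce physical range limits.
    out = np.clip(
        np.asarray(array, dtype=np.float64), physical_min, physical_max
    )
    # 2. Apply the scale factor (multiply).
    out = out * scale_factor
    if np.issubdtype(final_dtype, np.integer):
        # 3. Round if the storage dtype is an integer type.
        out = np.round(out)
        # 4. Enforce the dtype bit range so the cast cannot wrap around.
        info = np.iinfo(final_dtype)
        out = np.clip(out, info.min, info.max)
    # 5. Convert to the storage dtype, and 6. return the result.
    return out.astype(final_dtype)


scaled = scale_sketch(
    [-5.0, 12.34, 1000.0],
    scale_factor=100,
    final_dtype="uint16",
    physical_min=0,
    physical_max=160,
)
print(scaled.tolist())  # [0, 1234, 16000]
```

Clipping to the dtype range before the cast matters because an out-of-range value cast to an integer type can wrap around rather than saturate.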

property scale_factor

Get the variable’s intended storage scale factor.

Returns:: scale_factor (float) – Scale factor for the current variable. Data are multiplied by this scale factor before being stored.

property source_dir

Get the source directory containing the variable data files.

Returns:: source_dir (str) – Directory containing source data files (with possible sub folders).

property spatial_method

Get the spatial interpolation method.

Returns:: spatial_method (str) – NN or IDW

property temporal_method

Get the temporal interpolation method.

Returns:: temporal_method (str) – linear or nearest

property units

Get the units attribute.

Returns:: units (str) – NSRDB variable units.

unscale_data(array)

Perform a safe data unscaling operation on a source data array.

Parameters:: array (np.ndarray) – Scaled source data array with integer precision.
Returns:: array (np.ndarray) – Unscaled source data array with float32 precision.
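Since data is multiplied by the scale factor before storage, unscaling can be sketched as a cast to float32 followed by a division. This assumes unscaling is the plain inverse of that multiplication. The function `unscale_sketch` and its explicit `scale_factor` argument are hypothetical.

```python
import numpy as np


def unscale_sketch(array, scale_factor):
    """Undo storage scaling and return float32 data."""
    # Cast before dividing so integer inputs are not truncated.
    return np.asarray(array).astype(np.float32) / np.float32(scale_factor)


stored = np.array([0, 1234, 16000], dtype=np.uint16)
unscaled = unscale_sketch(stored, 100)
print(unscaled.dtype)  # float32
```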

property var_meta

Return the meta data for NSRDB variables.

Returns:: _var_meta (pd.DataFrame) – Meta data for NSRDB variables.

save_obj(cloud_var_single)[source]

Save a single cloud object to a cache for later use.

Parameters:: cloud_var_single (CloudVarSingleH5 | CloudVarSingleNC) – Single-timestep cloud variable data handler.