reVX.rpm.rpm_clusters.RPMClusters

class RPMClusters(cf_fpath, gen_gids, n_clusters, region=None)

Bases: object

Base class for RPM clusters

Examples

>>> from reV import Resource
>>> from reVX.rpm.rpm_clusters import RPMClusters
>>>
>>> fname = '$TESTDATADIR/reV_gen/gen_pv_2012.h5'
>>> with Resource(fname) as res:
...     gen_gids = res.meta.index.values
>>>
>>> clusters = RPMClusters(fname, gen_gids, n_clusters=6)
>>> clusters._cluster(**kwargs)
>>> clusters.meta
        gen_gid   latitude  longitude  cluster_id   geometry
0         0  41.290001 -71.860001           0  POINT (-71.86000 41.29000)
1         1  41.290001 -71.820000           0  POINT (-71.82000 41.29000)
2         2  41.250000 -71.820000           4  POINT (-71.82000 41.25000)
3         3  41.330002 -71.820000           0  POINT (-71.82000 41.33000)
4         4  41.369999 -71.820000           0  POINT (-71.82000 41.37000)
..      ...        ...        ...         ...                         ...
95       95  41.250000 -71.660004           4  POINT (-71.66000 41.25000)
96       96  41.889999 -71.660004           5  POINT (-71.66000 41.89000)
97       97  41.450001 -71.660004           3  POINT (-71.66000 41.45000)
98       98  41.610001 -71.660004           1  POINT (-71.66000 41.61000)
99       99  41.410000 -71.660004           3  POINT (-71.66000 41.41000)

Generate Shape File of Cluster

>>> RPMClusters.generate_shapefile(clusters.meta, fpath='./test.shp')

Parameters:

cf_fpath (str) – Path to reV .h5 files containing desired capacity factor profiles
gen_gids (list | ndarray) – List or vector of gen_gids to cluster on
n_clusters (int) – Number of clusters to identify
region (str | None) – Optional region identifier that you are clustering on for better debugging.

Methods

`cluster`(cf_h5_path, region_gen_gids, n_clusters)	Entry point for RPMClusters to get clusters for a given region defined as a list | array of gen_gids
`generate_shapefile`(meta, fpath[, beautify, ...])	Generate cluster polygons and save to shapefile

Attributes

`cluster_coefficients`	returns: cluster_coeffs (ndarray) -- Representative coefficients for each cluster
`cluster_coordinates`	returns: cluster_coords (ndarray) -- lon, lat coordinates of the centroid of each cluster
`cluster_ids`	returns: cluster_ids (ndarray) -- Cluster ID (cluster_id) for each gen_gid
`coefficients`	returns: _coefficients (ndarray) -- Array of wavelet coefficients for each gen_gid
`coordinates`	returns: coords (ndarray) -- lon, lat coordinates for each gen_gid
`meta`	returns: _meta (pandas.DataFrame) -- DataFrame of meta data (gen_gid, latitude, longitude, cluster_id, rank)
`n_clusters`	returns: _n_clusters (int) -- Number of clusters

property coefficients

Returns: _coefficients (ndarray) – Array of wavelet coefficients for each gen_gid

property meta

Returns: _meta (pandas.DataFrame) – DataFrame of meta data with columns: gen_gid, latitude, longitude, cluster_id, rank

property n_clusters

Returns: _n_clusters (int) – Number of clusters

property cluster_coefficients

Returns: cluster_coeffs (ndarray) – Representative coefficients for each cluster

property cluster_ids

Returns: cluster_ids (ndarray) – Cluster ID (cluster_id) for each gen_gid

property cluster_coordinates

Returns: cluster_coords (ndarray) – lon, lat coordinates of the centroid of each cluster

property coordinates

Returns: coords (ndarray) – lon, lat coordinates for each gen_gid
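
To make the relationship between `coordinates`, `cluster_ids` and `cluster_coordinates` concrete, here is a minimal sketch that averages lon, lat pairs per cluster. It assumes a centroid is the plain arithmetic mean of its members' coordinates, which this page does not confirm; `cluster_centroids` is a hypothetical helper, not part of reVX. The sample points are rounded values for gen_gids 0-4 from the class example output.

```python
from collections import defaultdict


def cluster_centroids(coords, cluster_ids):
    """Mean (lon, lat) per cluster id. Hypothetical helper, not a reVX API."""
    # Per cluster: running lon sum, running lat sum, member count
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (lon, lat), cid in zip(coords, cluster_ids):
        acc = sums[cid]
        acc[0] += lon
        acc[1] += lat
        acc[2] += 1
    return {cid: (s[0] / s[2], s[1] / s[2]) for cid, s in sorted(sums.items())}


# (lon, lat) pairs and cluster ids for gen_gids 0-4 in the example output
coords = [(-71.86, 41.29), (-71.82, 41.29), (-71.82, 41.25),
          (-71.82, 41.33), (-71.82, 41.37)]
cluster_ids = [0, 0, 4, 0, 0]

centroids = cluster_centroids(coords, cluster_ids)
for cid, (lon, lat) in centroids.items():
    print(f"cluster {cid}: lon={lon:.4f}, lat={lat:.4f}")
```

With real data the inputs would come from the `coordinates` and `cluster_ids` properties; whether the result matches `cluster_coordinates` exactly depends on how reVX defines the centroid.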

classmethod generate_shapefile(meta, fpath, beautify=True, source_crs='EPSG:4326', target_crs=None): Generate cluster polygons and save to shapefile

classmethod cluster(cf_h5_path, region_gen_gids, n_clusters, method='kmeans', method_kwargs=None, dist_rank_filter=True, dist_rmse_kwargs=None, contiguous_filter=True, contiguous_kwargs=None, region=None)

Entry point for RPMClusters to get clusters for a given region defined as a list | array of gen_gids

Parameters:

cf_h5_path (str) – Path to reV .h5 files containing desired capacity factor profiles
region_gen_gids (list | ndarray) – List or vector of gen_gids to cluster on
n_clusters (int) – Number of clusters to identify
method (str) – Method to use to cluster coefficients
method_kwargs (dict) – Kwargs for running _cluster_coefficients
dist_rank_filter (bool) – Re-cluster data by minimizing the sum of the: - distance between each point and each cluster centroid
dist_rmse_kwargs (dict) – Kwargs for running _dist_rank_optimization
contiguous_filter (bool) – Re-classify clusters by making contiguous cluster polygons
contiguous_kwargs (dict) – Kwargs for _contiguous_filter
region (str | None) – Optional region identifier that you are clustering on for better debugging.

Returns:

out (pandas.DataFrame) – Cluster results: (gen_gid, lon, lat, cluster_id, rank)

Examples

>>> from reV import Resource
>>> from reVX.rpm.rpm_clusters import RPMClusters
>>>
>>> fname = '$TESTDATADIR/reV_gen/gen_pv_2012.h5'
>>> with Resource(fname) as res:
...     gen_gids = res.meta.index.values
>>>
>>> RPMClusters.cluster(fname, gen_gids, n_clusters=6)
        gen_gid   latitude  longitude  cluster_id   geometry
0         0  41.290001 -71.860001       0  POINT (-71.86000 41.29000)
1         1  41.290001 -71.820000       0  POINT (-71.82000 41.29000)
2         2  41.250000 -71.820000       4  POINT (-71.82000 41.25000)
3         3  41.330002 -71.820000       0  POINT (-71.82000 41.33000)
4         4  41.369999 -71.820000       0  POINT (-71.82000 41.37000)
..      ...        ...        ...     ...                         ...
95       95  41.250000 -71.660004       4  POINT (-71.66000 41.25000)
96       96  41.889999 -71.660004       5  POINT (-71.66000 41.89000)
97       97  41.450001 -71.660004       3  POINT (-71.66000 41.45000)
98       98  41.610001 -71.660004       1  POINT (-71.66000 41.61000)
99       99  41.410000 -71.660004       3  POINT (-71.66000 41.41000)
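
The default `method='kmeans'` groups gen_gids by their coefficient vectors. The sketch below is a plain-Python Lloyd's k-means on toy 2-D vectors, meant only to illustrate the general technique. It is not how reVX runs the clustering, whose implementation is not shown on this page; `kmeans`, its arguments, and the toy data are all hypothetical.

```python
def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means. Illustrative helper, not a reVX API."""
    # Naive deterministic start: the first k points (real code seeds smarter)
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


# Toy 2-D vectors standing in for per-gen_gid coefficient rows (made up)
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(points, k=2)
print(labels)
```

In this analogy, the labels play the role of `cluster_id` and the centers the role of `cluster_coefficients`. The `dist_rank_filter` and `contiguous_filter` options describe re-clustering steps that this sketch does not attempt.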