buildings_bench.transforms
BoxCoxTransform
buildings_bench.transforms.BoxCoxTransform
A class that computes and applies the Box-Cox transform to data.
__init__(max_datapoints: int = 1000000)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_datapoints | int | If the number of datapoints is greater than this, subsample. | 1000000 |
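The effect of `max_datapoints` can be illustrated with a short NumPy sketch. This is not the library's code: the helper name `subsample`, the uniform sampling without replacement, and the fixed seed are all assumptions made for illustration.

```python
import numpy as np


def subsample(data: np.ndarray, max_datapoints: int, seed: int = 0) -> np.ndarray:
    """Flatten data to shape (n, 1) and keep at most max_datapoints rows.

    Hypothetical helper: the sampling strategy is an assumption,
    not taken from buildings_bench.
    """
    flat = data.reshape(-1, 1)
    if flat.shape[0] <= max_datapoints:
        return flat  # small enough, nothing to drop
    rng = np.random.default_rng(seed)
    idx = rng.choice(flat.shape[0], size=max_datapoints, replace=False)
    return flat[idx]


# A (b, n, 1) batch with 100 values in total, capped at 10 datapoints.
batch = np.arange(100.0).reshape(4, 25, 1)
capped = subsample(batch, max_datapoints=10)
```

Capping the number of datapoints keeps fitting time bounded on very large datasets, at the cost of estimating the transform parameters from a sample.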
train(data: np.array) -> None
Train the Box-Cox transform on the data with sklearn.preprocessing.PowerTransformer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | np.array | of shape (n, 1) or (b,n,1) | required |
save(output_path: Path) -> None
Save the Box-Cox transform
load(saved_path: Path) -> None
Load the Box-Cox transform
transform(sample: np.ndarray) -> np.ndarray
Transform a sample via Box-Cox. Not run on the GPU, so input and output are NumPy arrays.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | np.ndarray | of shape (n, 1) or (b,n,1) | required |
Returns:
Name | Type | Description |
---|---|---|
transformed_sample | np.ndarray | of shape (n, 1) or (b,n,1) |
undo_transform(sample: Union[np.ndarray, torch.Tensor]) -> Union[np.ndarray, torch.Tensor]
Undo the transformation of a sample via Box-Cox
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | Union[np.ndarray, torch.Tensor] | of shape (n, 1) or (b,n,1). NumPy array if device is cpu, torch Tensor if device is cuda. | required |
Returns:
Name | Type | Description |
---|---|---|
unscaled_sample | Union[np.ndarray, torch.Tensor] | of shape (n, 1) or (b,n,1). |
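To make the forward and inverse Box-Cox mappings concrete, here is a minimal NumPy sketch of the textbook formulas. It is not the library's implementation: the function names `box_cox` and `inv_box_cox` and the fixed `lam` value are assumptions. In the class above, the parameter is fitted by `sklearn.preprocessing.PowerTransformer`, which can also standardize its output; this sketch omits that step.

```python
import numpy as np


def box_cox(x: np.ndarray, lam: float) -> np.ndarray:
    """Box-Cox transform; x must be strictly positive."""
    if lam == 0.0:
        return np.log(x)
    return (np.power(x, lam) - 1.0) / lam


def inv_box_cox(y: np.ndarray, lam: float) -> np.ndarray:
    """Inverse of box_cox for the same lam."""
    if lam == 0.0:
        return np.exp(y)
    return np.power(lam * y + 1.0, 1.0 / lam)


loads = np.array([[0.5], [1.0], [2.0], [4.0]])  # shape (n, 1), strictly positive
lam = 0.5  # hypothetical value; normally estimated from training data

transformed = box_cox(loads, lam)
restored = inv_box_cox(transformed, lam)  # recovers `loads` up to rounding
```

Because Box-Cox is only defined for strictly positive inputs, zero or negative values would need an offset or a different transform before applying it.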
StandardScalerTransform
buildings_bench.transforms.StandardScalerTransform
A class that standardizes data by removing the mean and scaling to unit variance.
__init__(max_datapoints = 1000000, device = 'cpu')
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_datapoints | int | If the number of datapoints is greater than this, subsample. | 1000000 |
device | str | 'cpu' or 'cuda' | 'cpu' |
train(data: np.array) -> None
Train the StandardScaler transform on the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | np.array | of shape (n, 1) or (b,n,1) | required |
save(output_path: Path) -> None
Save the StandardScaler transform
load(saved_path: Path) -> None
Load the StandardScaler transform
transform(sample: Union[np.ndarray, torch.Tensor]) -> torch.Tensor
Transform a sample via StandardScaler
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | Union[np.ndarray, torch.Tensor] | of shape (n, 1) or (b,n,1) | required |
Returns:
Name | Type | Description |
---|---|---|
transformed_samples | torch.Tensor | of shape (n, 1) or (b,n,1) |
undo_transform(sample: Union[np.ndarray, torch.Tensor]) -> torch.Tensor
Undo the transformation of a sample via StandardScaler
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | Union[np.ndarray, torch.Tensor] | of shape (n, 1) or (b,n,1) | required |
Returns:
Name | Type | Description |
---|---|---|
unscaled_sample | torch.Tensor | of shape (n, 1) or (b,n,1) |
undo_transform_std(scaled_std: torch.Tensor) -> torch.Tensor
Undo transform for standard deviation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scaled_std | torch.Tensor | of shape (n, 1) or (b,n,1) | required |
Returns:
Name | Type | Description |
---|---|---|
unscaled_std | torch.Tensor | of shape (n, 1) or (b,n,1) |
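The relationship between `transform`, `undo_transform`, and `undo_transform_std` follows from the standardization formula, sketched below in NumPy. This is an illustration under assumptions, not the library's code: the function names are hypothetical, and NumPy arrays stand in for the torch Tensors the class returns.

```python
import numpy as np

train_data = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n, 1)
mean = train_data.mean()
std = train_data.std()


def standardize(x: np.ndarray) -> np.ndarray:
    """z = (x - mean) / std"""
    return (x - mean) / std


def unstandardize(z: np.ndarray) -> np.ndarray:
    """x = z * std + mean"""
    return z * std + mean


def unstandardize_std(scaled_std: np.ndarray) -> np.ndarray:
    """A standard deviation is unaffected by the mean shift,
    so only the scale factor is undone."""
    return scaled_std * std


z = standardize(train_data)
x_back = unstandardize(z)
sigma_back = unstandardize_std(np.array([[1.0]]))  # unit std in scaled space
```

The separate method for standard deviations matters when a model predicts a distribution in the scaled space: the predicted mean needs both the scale and the shift undone, while the predicted spread needs only the scale.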
LatLonTransform
buildings_bench.transforms.LatLonTransform
Pre-processes lat/lon data with standard normalization, using statistics from the Buildings-900K training set.
transform_latlon(latlon: np.ndarray) -> np.ndarray
Transform a raw Lat/Lon sample into a normalized Lat/Lon sample
Parameters:
Name | Type | Description | Default |
---|---|---|---|
latlon | np.ndarray | of shape (2,). | required |
Returns:
Name | Type | Description |
---|---|---|
transformed_latlon | np.ndarray | of shape (2,). |
undo_transform(normalized_latlon: np.ndarray) -> np.ndarray
Undo the transformation of a sample
Parameters:
Name | Type | Description | Default |
---|---|---|---|
normalized_latlon | np.ndarray | of shape (n, 2) or (b,n,2). | required |
Returns:
Name | Type | Description |
---|---|---|
unnormalized_latlon | np.ndarray | of shape (n, 2) or (b,n,2). |
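The shapes above can be illustrated with a small NumPy sketch of per-coordinate normalization. The mean and standard deviation values here are placeholders chosen for readability, not the Buildings-900K statistics, and the function names are hypothetical.

```python
import numpy as np

# Placeholder statistics (assumed for illustration only).
LATLON_MEAN = np.array([40.0, -100.0])
LATLON_STD = np.array([5.0, 15.0])


def normalize_latlon(latlon: np.ndarray) -> np.ndarray:
    """Normalize a raw (lat, lon) pair of shape (2,)."""
    return (latlon - LATLON_MEAN) / LATLON_STD


def denormalize_latlon(normalized: np.ndarray) -> np.ndarray:
    """Undo the normalization; broadcasts over (n, 2) or (b, n, 2)."""
    return normalized * LATLON_STD + LATLON_MEAN


raw = np.array([45.0, -85.0])            # shape (2,)
norm = normalize_latlon(raw)             # one std above the mean on both axes
batch = np.tile(norm, (2, 3, 1))         # shape (b, n, 2)
restored = denormalize_latlon(batch)     # shape (b, n, 2)
```

Broadcasting over the trailing axis is what lets one pair of statistics serve both the single-sample and the batched shapes.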
transform(puma_id: str) -> np.ndarray
Look up a PUMA ID's normalized Lat/Lon centroid.
This is used in the Buildings-900K Dataset to look up a lat/lon for each building's PUMA.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
puma_id | str | PUMA ID | required |
Returns:
Name | Type | Description |
---|---|---|
centroid | np.ndarray | of shape (1,2) |
TimestampTransform
buildings_bench.transforms.TimestampTransform
Extract timestamp features from a Pandas timestamp Series.
__init__(is_leap_year: bool = False) -> None
Parameters:
Name | Type | Description | Default |
---|---|---|---|
is_leap_year | bool | Whether the year of the building data is a leap year or not. | False |
transform(timestamp_series: pd.DataFrame) -> np.ndarray
Extract timestamp features from a Pandas timestamp Series.
- Day of week (0-6)
- Day of year (0-364)
- Hour of day (0-23)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timestamp_series | pd.DataFrame | of shape (n,) or (b,n) | required |
Returns:
Name | Type | Description |
---|---|---|
time_features | np.ndarray | of shape (n,3) or (b,n,3) |
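The three features listed above can be sketched with the standard library alone. This is an illustration, not the library's code: the helper name `time_features` is hypothetical, the Monday = 0 convention is an assumption, and the sketch returns raw integer features without any normalization the class may apply.

```python
from datetime import datetime, timedelta
from typing import Tuple


def time_features(ts: datetime) -> Tuple[int, int, int]:
    """Return (day of week, day of year, hour of day), all zero-based."""
    day_of_week = ts.weekday()                # Monday = 0 ... Sunday = 6 (assumed)
    day_of_year = ts.timetuple().tm_yday - 1  # tm_yday is 1-based
    hour_of_day = ts.hour                     # 0-23
    return day_of_week, day_of_year, hour_of_day


# Two days of hourly timestamps starting on Monday, 1 January 2018.
start = datetime(2018, 1, 1, 0)
features = [time_features(start + timedelta(hours=h)) for h in range(48)]
```

Stacking `features` row by row gives an array of shape (n, 3), matching the return shape in the table above.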
undo_transform(time_features: np.ndarray) -> np.ndarray
Convert normalized time features back to original time features
Parameters:
Name | Type | Description | Default |
---|---|---|---|
time_features | np.ndarray | of shape (n, 3) or (b,n,3) | required |
Returns:
Name | Type | Description |
---|---|---|
unnormalized_time_features | np.ndarray | of shape (n, 3) or (b,n,3) |