buildings_bench.transforms

BoxCoxTransform

buildings_bench.transforms.BoxCoxTransform

A class that computes and applies the Box-Cox transform to data.

__init__(max_datapoints = 1000000)

Parameters:

Name Type Description Default
max_datapoints int

If the number of datapoints is greater than this, subsample.

1000000
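
The subsampling that max_datapoints controls can be illustrated with a short sketch. This is not the library's code: the helper name subsample, the seed argument, and the choice of uniform sampling without replacement are all assumptions made for illustration.

```python
import numpy as np


def subsample(data: np.ndarray, max_datapoints: int = 1_000_000, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: return at most max_datapoints rows of data as (n, 1)."""
    # Accept (n, 1) or (b, n, 1) by flattening to a single column.
    flat = data.reshape(-1, 1)
    if flat.shape[0] <= max_datapoints:
        return flat
    # Assumption: uniform sampling without replacement.
    rng = np.random.default_rng(seed)
    idx = rng.choice(flat.shape[0], size=max_datapoints, replace=False)
    return flat[idx]


data = np.arange(20, dtype=np.float64).reshape(2, 10, 1)
small = subsample(data, max_datapoints=5)      # subsampled to 5 rows
unchanged = subsample(data, max_datapoints=100)  # below the limit, kept whole
```

Capping the number of datapoints keeps fitting time and memory bounded when the training set is very large.
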
train(data: np.array) -> None

Train the Box-Cox transform on the data with sklearn.preprocessing.PowerTransformer.

Parameters:

Name Type Description Default
data np.array

of shape (n, 1) or (b,n,1)

required
save(output_path: Path) -> None

Save the Box-Cox transform

load(saved_path: Path) -> None

Load the Box-Cox transform

transform(sample: np.ndarray) -> np.ndarray

Transform a sample via Box-Cox. This does not run on the GPU, so the input and output are numpy arrays.

Parameters:

Name Type Description Default
sample np.ndarray

of shape (n, 1) or (b,n,1)

required

Returns:

Name Type Description
transformed_sample np.ndarray

of shape (n, 1) or (b,n,1)
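
scikit-learn transformers operate on 2-D arrays of shape (n_samples, n_features), so a batched (b,n,1) input presumably has to be flattened before the transform and reshaped afterwards. The sketch below shows one way to do that; apply_2d is a hypothetical helper, not part of buildings_bench.

```python
import numpy as np


def apply_2d(fn, sample: np.ndarray) -> np.ndarray:
    """Hypothetical helper: apply fn, which expects (n_samples, 1), to (n, 1) or (b, n, 1) data."""
    original_shape = sample.shape
    # Collapse any leading batch dimension into the sample dimension.
    out = fn(sample.reshape(-1, 1))
    # Restore the caller's original shape.
    return np.asarray(out).reshape(original_shape)


batch = np.ones((4, 24, 1))
# A stand-in for a fitted transformer's transform method.
doubled = apply_2d(lambda x: 2.0 * x, batch)
```
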

undo_transform(sample: Union[np.ndarray, torch.Tensor]) -> Union[np.ndarray, torch.Tensor]

Undo the transformation of a sample via Box-Cox

Parameters:

Name Type Description Default
sample np.ndarray or torch.Tensor

of shape (n, 1) or (b,n,1). A numpy array if the device is cpu, or a torch Tensor if the device is cuda.

required

Returns:

Name Type Description
unscaled_sample np.ndarray or torch.Tensor

of shape (n, 1) or (b,n,1).
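
To make the transform/undo_transform pair concrete, here is a minimal numpy sketch of the Box-Cox formula and its inverse for a fixed lambda. It is an illustration only: the function names and the value of lmbda are hypothetical, and sklearn.preprocessing.PowerTransformer additionally estimates lambda from the training data and, by default, standardizes its output, which this sketch omits. Box-Cox is defined only for strictly positive inputs.

```python
import numpy as np


def boxcox(x: np.ndarray, lmbda: float) -> np.ndarray:
    """Box-Cox transform for a fixed lambda; x must be strictly positive."""
    x = np.asarray(x, dtype=np.float64)
    if lmbda == 0.0:
        return np.log(x)
    return (np.power(x, lmbda) - 1.0) / lmbda


def inv_boxcox(y: np.ndarray, lmbda: float) -> np.ndarray:
    """Inverse of boxcox for the same lambda."""
    y = np.asarray(y, dtype=np.float64)
    if lmbda == 0.0:
        return np.exp(y)
    return np.power(lmbda * y + 1.0, 1.0 / lmbda)


lmbda = 0.5  # hypothetical value; in practice lambda is fitted on training data
load = np.array([[0.5], [1.0], [4.0], [9.0]])  # shape (n, 1)
transformed = boxcox(load, lmbda)
recovered = inv_boxcox(transformed, lmbda)  # matches load up to rounding
```
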

StandardScalerTransform

buildings_bench.transforms.StandardScalerTransform

A class that standardizes data by removing the mean and scaling to unit variance.

__init__(max_datapoints = 1000000, device = 'cpu')

Parameters:

Name Type Description Default
max_datapoints int

If the number of datapoints is greater than this, subsample.

1000000
device str

'cpu' or 'cuda'

'cpu'
train(data: np.array) -> None

Train the StandardScaler transform on the data.

Parameters:

Name Type Description Default
data np.array

of shape (n, 1) or (b,n,1)

required
save(output_path: Path) -> None

Save the StandardScaler transform

load(saved_path: Path) -> None

Load the StandardScaler transform

transform(sample: Union[np.ndarray, torch.Tensor]) -> torch.Tensor

Transform a sample via StandardScaler

Parameters:

Name Type Description Default
sample np.ndarray or torch.Tensor

shape (n, 1) or (b,n,1)

required

Returns:

Name Type Description
transformed_samples torch.Tensor

shape (n, 1) or (b,n,1)

undo_transform(sample: Union[np.ndarray, torch.Tensor]) -> torch.Tensor

Undo the transformation of a sample via StandardScaler

Parameters:

Name Type Description Default
sample np.ndarray or torch.Tensor

of shape (n, 1) or (b,n,1)

required

Returns:

Name Type Description
unscaled_sample torch.Tensor

of shape (n, 1) or (b,n,1)
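
The train / transform / undo_transform cycle can be sketched as follows. This is a simplified stand-in, not the library's class: SimpleStandardScaler is a hypothetical name, and it works on numpy arrays to stay dependency-free, whereas the documented methods return torch.Tensor.

```python
import numpy as np


class SimpleStandardScaler:
    """Hypothetical numpy-only stand-in for a standard scaler."""

    def train(self, data: np.ndarray) -> None:
        # Accept (n, 1) or (b, n, 1) by pooling every value.
        flat = np.asarray(data, dtype=np.float64).reshape(-1)
        self.mean_ = flat.mean()
        self.std_ = flat.std()

    def transform(self, sample: np.ndarray) -> np.ndarray:
        # Remove the mean and scale to unit variance.
        return (sample - self.mean_) / self.std_

    def undo_transform(self, sample: np.ndarray) -> np.ndarray:
        # Invert transform: rescale, then shift back.
        return sample * self.std_ + self.mean_


data = np.array([[1.0], [2.0], [3.0], [4.0]])
scaler = SimpleStandardScaler()
scaler.train(data)
scaled = scaler.transform(data)
restored = scaler.undo_transform(scaled)  # matches data up to rounding
```
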

undo_transform_std(scaled_std: torch.Tensor) -> torch.Tensor

Undo transform for standard deviation.

Parameters:

Name Type Description Default
scaled_std torch.Tensor

of shape (n, 1) or (b,n,1)

required

Returns:

Name Type Description
unscaled_std torch.Tensor

of shape (n, 1) or (b,n,1)
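
A standard deviation is unaffected by a constant shift, so mapping one from the scaled space back to the original units presumably needs only the scale factor, not the mean. The sketch below checks this numerically; the mean and std values are hypothetical, and the function bodies are an assumption about the behavior rather than the library's implementation.

```python
import numpy as np

# Hypothetical statistics of a fitted scaler.
mean, std = 10.0, 2.0


def undo_transform(scaled: np.ndarray) -> np.ndarray:
    """Map scaled values back to the original units."""
    return scaled * std + mean


def undo_transform_std(scaled_std: np.ndarray) -> np.ndarray:
    """Map a standard deviation back to the original units (scale only, no shift)."""
    return scaled_std * std


scaled = np.array([-1.0, 0.0, 1.0, 2.0])
scaled_std = scaled.std()
unscaled_std = undo_transform_std(scaled_std)
# The result agrees with the spread of the unscaled values themselves.
direct_std = undo_transform(scaled).std()
```

This is the kind of conversion needed when a model predicts a mean and a standard deviation in the scaled space and both must be reported in the original units.
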

LatLonTransform

buildings_bench.transforms.LatLonTransform

Pre-processes lat/lon data with standard normalization, using statistics from the Buildings-900K training set.

transform_latlon(latlon: np.ndarray) -> np.ndarray

Transform a raw Lat/Lon sample into a normalized Lat/Lon sample

Parameters:

Name Type Description Default
latlon np.ndarray

of shape (2,).

required

Returns:

Name Type Description
transformed_latlon np.ndarray

of shape (2,).

undo_transform(normalized_latlon: np.ndarray) -> np.ndarray

Undo the transformation of a sample

Parameters:

Name Type Description Default
normalized_latlon np.ndarray

of shape (n, 2) or (b,n,2).

required

Returns:

Name Type Description
unnormalized_latlon np.ndarray

of shape (n, 2) or (b,n,2).

transform(puma_id: str) -> np.ndarray

Look up a PUMA ID's normalized Lat/Lon centroid.

This is used in the Buildings-900K Dataset to look up a lat/lon for each building's PUMA.

Parameters:

Name Type Description Default
puma_id str

PUMA ID

required

Returns:

Name Type Description
centroid np.ndarray

of shape (1,2)
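
The three LatLonTransform methods can be sketched together. Everything concrete below is hypothetical: the mean and standard deviation, the PUMA IDs, and the centroid table are placeholder values chosen for illustration, not the Buildings-900K statistics.

```python
import numpy as np

# Hypothetical normalization statistics (lat, lon).
LATLON_MEAN = np.array([38.0, -95.0])
LATLON_STD = np.array([5.0, 15.0])

# Hypothetical lookup table: PUMA ID -> raw (lat, lon) centroid.
PUMA_CENTROIDS = {
    "puma_a": np.array([40.0, -105.0]),
    "puma_b": np.array([33.0, -80.0]),
}


def transform_latlon(latlon: np.ndarray) -> np.ndarray:
    """Normalize a raw lat/lon of shape (2,)."""
    return (latlon - LATLON_MEAN) / LATLON_STD


def undo_transform(normalized_latlon: np.ndarray) -> np.ndarray:
    """Invert the normalization; broadcasts over (n, 2) or (b, n, 2)."""
    return normalized_latlon * LATLON_STD + LATLON_MEAN


def transform(puma_id: str) -> np.ndarray:
    """Look up a PUMA centroid and return it normalized, with shape (1, 2)."""
    return transform_latlon(PUMA_CENTROIDS[puma_id]).reshape(1, 2)


centroid = transform("puma_a")
raw = undo_transform(centroid)  # recovers the raw centroid as shape (1, 2)
```
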

TimestampTransform

buildings_bench.transforms.TimestampTransform

Extract timestamp features from a Pandas timestamp Series.

__init__(is_leap_year: bool = False)

Parameters:

Name Type Description Default
is_leap_year bool

Whether the year of the building data is a leap year or not.

False
transform(timestamp_series: pd.DataFrame) -> np.ndarray

Extract timestamp features from a Pandas timestamp Series.

  • Day of week (0-6)
  • Day of year (0-364)
  • Hour of day (0-23)

Parameters:

Name Type Description Default
timestamp_series pd.DataFrame

of shape (n,) or (b,n)

required

Returns:

Name Type Description
time_features np.ndarray

of shape (n,3) or (b,n,3)
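
As an illustration of the three features, the sketch below extracts them with the standard library's datetime instead of pandas, to stay dependency-free. Two details are assumptions rather than documented behavior: the features are scaled to [-1, 1], and they are ordered as in the list above (day of week, day of year, hour of day). Python's weekday() and pandas' dayofweek both count Monday as 0.

```python
from datetime import datetime, timedelta

import numpy as np


def time_features(timestamps, is_leap_year: bool = False) -> np.ndarray:
    """Hypothetical helper: return an (n, 3) array of normalized time features."""
    day_of_year_max = 365.0 if is_leap_year else 364.0
    raw = np.array(
        [[ts.weekday(), ts.timetuple().tm_yday - 1, ts.hour] for ts in timestamps],
        dtype=np.float64,
    )
    # Assumption: scale each feature from [0, max] to [-1, 1].
    maxima = np.array([6.0, day_of_year_max, 23.0])
    return raw / maxima * 2.0 - 1.0


start = datetime(2018, 1, 1, 0)  # a Monday, first hour of a non-leap year
stamps = [start + timedelta(hours=h) for h in range(24)]
features = time_features(stamps)
```
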

undo_transform(time_features: np.ndarray) -> np.ndarray

Convert normalized time features back to original time features

Parameters:

Name Type Description Default
time_features np.ndarray

of shape (n, 3) or (b,n,3)

required

Returns:

Name Type Description
unnormalized_time_features np.ndarray

of shape (n, 3) or (b,n,3)
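
Undoing the normalization amounts to inverting the scaling applied in transform. The sketch below assumes the features were scaled to [-1, 1] in the order day of week, day of year, hour of day; neither the range nor the rounding to integers is confirmed by this page, and undo_time_features is a hypothetical name.

```python
import numpy as np


def undo_time_features(time_features: np.ndarray, is_leap_year: bool = False) -> np.ndarray:
    """Hypothetical helper: map [-1, 1] features back to integer calendar values."""
    maxima = np.array([6.0, 365.0 if is_leap_year else 364.0, 23.0])
    # Invert x / max * 2 - 1, then round away floating-point error.
    return np.rint((time_features + 1.0) / 2.0 * maxima).astype(np.int64)


normalized = np.array([[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]])
original = undo_time_features(normalized)
# The last axis holds the features, so (b, n, 3) inputs broadcast the same way.
batched = undo_time_features(np.zeros((2, 5, 3)) - 1.0)
```
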