buildings_bench.transforms

BoxCoxTransform

buildings_bench.transforms.BoxCoxTransform

A class that computes and applies the Box-Cox transform to data.

__init__(max_datapoints=1000000)

Parameters:

Name Type Description Default
max_datapoints int

If the number of datapoints is greater than this, subsample.

1000000

load(saved_path)

Load the Box-Cox transform from saved_path.

save(output_path)

Save the Box-Cox transform to output_path.

train(data)

Train the Box-Cox transform on the data with sklearn.preprocessing.PowerTransformer.

Parameters:

Name Type Description Default
data array

of shape (n, 1) or (b,n,1)

required

transform(sample)

Transform a sample via Box-Cox. This is not run on the GPU, so the input and output are numpy arrays.

Parameters:

Name Type Description Default
sample ndarray

of shape (n, 1) or (b,n,1)

required

Returns:

Name Type Description
transformed_sample ndarray

of shape (n, 1) or (b,n,1)

undo_transform(sample)

Undo the transformation of a sample via Box-Cox

Parameters:

Name Type Description Default
sample Union[ndarray, LongTensor]

of shape (n, 1) or (b,n,1). A numpy array if device is cpu, or a torch Tensor if device is cuda.

required

Returns:

Name Type Description
unscaled_sample Union[ndarray, LongTensor]

of shape (n, 1) or (b,n,1).
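
The train, transform, and undo_transform steps above can be approximated with a short sketch built directly on sklearn.preprocessing.PowerTransformer. This is not the BuildingsBench implementation: the helper names (fit_boxcox, apply_boxcox, undo_boxcox), the random subsampling strategy, the standardize=True setting, and the synthetic data are all assumptions made for illustration. Note that Box-Cox is only defined for strictly positive inputs.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer


def fit_boxcox(data, max_datapoints=1_000_000, seed=0):
    """Fit a Box-Cox PowerTransformer on data of shape (n, 1) or (b, n, 1).

    Hypothetical helper: flattens to a single feature column and randomly
    subsamples when there are more than max_datapoints values.
    """
    flat = np.asarray(data, dtype=np.float64).reshape(-1, 1)
    if flat.shape[0] > max_datapoints:
        rng = np.random.default_rng(seed)
        idx = rng.choice(flat.shape[0], size=max_datapoints, replace=False)
        flat = flat[idx]
    # standardize=True is an assumption; it also rescales to zero mean, unit variance.
    return PowerTransformer(method="box-cox", standardize=True).fit(flat)


def apply_boxcox(pt, sample):
    """Transform a numpy sample, preserving its original shape."""
    arr = np.asarray(sample, dtype=np.float64)
    return pt.transform(arr.reshape(-1, 1)).reshape(arr.shape)


def undo_boxcox(pt, sample):
    """Invert the transform, preserving the sample's original shape."""
    arr = np.asarray(sample, dtype=np.float64)
    return pt.inverse_transform(arr.reshape(-1, 1)).reshape(arr.shape)


# Synthetic, strictly positive "load" values of shape (b, n, 1).
rng = np.random.default_rng(0)
loads = rng.lognormal(mean=1.0, sigma=0.5, size=(4, 24, 1))

pt = fit_boxcox(loads)
scaled = apply_boxcox(pt, loads)
restored = undo_boxcox(pt, scaled)
```

The reshape to a single column lets one fitted transformer serve both the (n, 1) and (b,n,1) layouts described above.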

StandardScalerTransform

buildings_bench.transforms.StandardScalerTransform

A class that standardizes data by removing the mean and scaling to unit variance.

__init__(max_datapoints=1000000, device='cpu')

Parameters:

Name Type Description Default
max_datapoints int

If the number of datapoints is greater than this, subsample.

1000000
device str

'cpu' or 'cuda'

'cpu'

load(saved_path)

Load the StandardScaler transform from saved_path.

save(output_path)

Save the StandardScaler transform to output_path.

train(data)

Train the StandardScaler transform on the data.

Parameters:

Name Type Description Default
data array

of shape (n, 1) or (b,n,1)

required

transform(sample)

Transform a sample via StandardScaler

Parameters:

Name Type Description Default
sample Union[ndarray, LongTensor]

shape (n, 1) or (b,n,1)

required

Returns:

Name Type Description
transformed_samples Tensor

of shape (n, 1) or (b,n,1)

undo_transform(sample)

Undo the transformation of a sample via StandardScaler

Parameters:

Name Type Description Default
sample ndarray

ndarray or torch.Tensor of shape (n, 1) or (b,n,1)

required

Returns:

Name Type Description
unscaled_sample Tensor

of shape (n, 1) or (b,n,1)

undo_transform_std(scaled_std)

Undo transform for standard deviation.

Parameters:

Name Type Description Default
scaled_std Tensor

of shape (n, 1) or (b,n,1)

required

Returns:

Name Type Description
unscaled_std Tensor

of shape (n, 1) or (b,n,1)
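
To make the relationship between transform, undo_transform, and undo_transform_std concrete, here is a minimal numpy-only sketch. The class SimpleStandardScaler is a hypothetical stand-in, not the library class: it ignores the device argument, returns numpy arrays instead of torch Tensors, and skips subsampling. It assumes that undoing the transform for a standard deviation means multiplying by the fitted standard deviation without adding the mean back, since a standard deviation is unaffected by shifts.

```python
import numpy as np


class SimpleStandardScaler:
    """Hypothetical stand-in for illustration; not the BuildingsBench class."""

    def train(self, data):
        # Pool every value in (n, 1) or (b, n, 1) data into one mean and std.
        flat = np.asarray(data, dtype=np.float64).reshape(-1)
        self.mean_ = flat.mean()
        self.std_ = flat.std()

    def transform(self, sample):
        return (np.asarray(sample, dtype=np.float64) - self.mean_) / self.std_

    def undo_transform(self, sample):
        return np.asarray(sample, dtype=np.float64) * self.std_ + self.mean_

    def undo_transform_std(self, scaled_std):
        # Only the scale is restored; the mean offset does not apply to a spread.
        return np.asarray(scaled_std, dtype=np.float64) * self.std_


data = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n, 1)

scaler = SimpleStandardScaler()
scaler.train(data)

scaled = scaler.transform(data)
restored = scaler.undo_transform(scaled)
unit_std = scaler.undo_transform_std(np.array([[1.0]]))
```

In this sketch, a predicted standard deviation of 1.0 in scaled space maps back to the standard deviation of the training data in the original units.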

LatLonTransform

buildings_bench.transforms.LatLonTransform

Pre-processes lat/lon data with standard normalization, using statistics from the Buildings-900K training set.

transform(puma_id)

Look up a PUMA ID's normalized Lat/Lon centroid.

This is used in the Buildings-900K Dataset to look up a lat/lon for each building's PUMA.

Parameters:

Name Type Description Default
puma_id str

PUMA ID

required

Returns:

Name Type Description
centroid ndarray

of shape (1,2)

transform_latlon(latlon)

Transform a raw Lat/Lon sample into a normalized Lat/Lon sample

Parameters:

Name Type Description Default
latlon ndarray

of shape (2,).

required

Returns:

Name Type Description
transformed_latlon ndarray

of shape (2,).

undo_transform(normalized_latlon)

Undo the transformation of a sample

Parameters:

Name Type Description Default
normalized_latlon ndarray

of shape (n, 2) or (b,n,2).

required

Returns:

Name Type Description
unnormalized_latlon ndarray

of shape (n, 2) or (b,n,2).
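
As a rough illustration of transform_latlon and undo_transform, standard normalization of a lat/lon pair can be sketched as below. The mean and standard deviation constants are made-up placeholders, not the Buildings-900K statistics used by the library, and the two functions are standalone stand-ins rather than the class methods themselves.

```python
import numpy as np

# Hypothetical statistics for illustration only, NOT the library's values.
LATLON_MEAN = np.array([38.0, -96.0])
LATLON_STD = np.array([5.0, 15.0])


def transform_latlon(latlon):
    """Normalize a raw (lat, lon) array whose last dimension has size 2."""
    return (np.asarray(latlon, dtype=np.float64) - LATLON_MEAN) / LATLON_STD


def undo_transform(normalized_latlon):
    """Map normalized values of shape (2,), (n, 2) or (b, n, 2) back to degrees."""
    return np.asarray(normalized_latlon, dtype=np.float64) * LATLON_STD + LATLON_MEAN


raw = np.array([40.0, -105.0])           # shape (2,)
normalized = transform_latlon(raw)       # shape (2,)

# Broadcasting over the last axis handles batched input of shape (b, n, 2).
batch = np.tile(normalized, (2, 3, 1))
restored = undo_transform(batch)
```

Because the statistics have shape (2,), numpy broadcasting applies them along the last axis, which is why the same functions cover every shape listed above.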

TimestampTransform

buildings_bench.transforms.TimestampTransform

Extract timestamp features from a Pandas timestamp Series.

__init__(is_leap_year=False)

Parameters:

Name Type Description Default
is_leap_year bool

Whether the year of the building data is a leap year or not.

False

transform(timestamp_series)

Extract timestamp features from a Pandas timestamp Series.

  • Day of week (0-6)
  • Day of year (0-364)
  • Hour of day (0-23)

Parameters:

Name Type Description Default
timestamp_series DataFrame

of shape (n,) or (b,n)

required

Returns:

Name Type Description
time_features ndarray

of shape (n,3) or (b,n,3)

undo_transform(time_features)

Convert normalized time features back to original time features

Parameters:

Name Type Description Default
time_features ndarray

of shape (n, 3) or (b,n,3)

required

Returns:

Name Type Description
unnormalized_time_features ndarray

of shape (n, 3) or (b,n,3)
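
The three features listed for transform can be pulled from a pandas Series with the .dt accessor, as in the sketch below. The function extract_time_features is a hypothetical helper, not the library method: it handles only the (n,) case, and it stops at the raw integer features because this page does not state how the library normalizes them.

```python
import numpy as np
import pandas as pd


def extract_time_features(timestamps: pd.Series) -> np.ndarray:
    """Return an (n, 3) array of [day of week, day of year, hour of day]."""
    ts = pd.to_datetime(timestamps)
    return np.stack(
        [
            ts.dt.dayofweek.to_numpy(),      # 0 = Monday ... 6 = Sunday
            ts.dt.dayofyear.to_numpy() - 1,  # pandas is 1-based; shift to 0-based
            ts.dt.hour.to_numpy(),           # 0-23
        ],
        axis=-1,
    )


stamps = pd.Series(
    pd.to_datetime(["2018-01-01 00:00", "2018-01-03 13:00", "2018-12-31 23:00"])
)
features = extract_time_features(stamps)
```

For these three timestamps in 2018 (a non-leap year), the rows are [0, 0, 0], [2, 2, 13], and [0, 364, 23], matching the ranges given above.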