buildings_bench.tokenizer

Tokenizer Quick Start

Instantiate a LoadQuantizer

import os
from pathlib import Path

from buildings_bench.tokenizer import LoadQuantizer

# The BUILDINGS_BENCH environment variable must be set
transform_path = Path(os.environ.get('BUILDINGS_BENCH')) / 'metadata' / 'transforms'

load_transform = LoadQuantizer(
    with_merge=True,  # Default vocabulary has merged KMeans centroids
    num_centroids=2274,  # Default vocabulary has 2,274 tokens
    # `args` is assumed to be parsed command-line arguments with a `device` field
    device='cuda:0' if 'cuda' in args.device else 'cpu')

# Load the saved faiss KMeans state from disk
load_transform.load(transform_path)

Quantize a load time series

batch['load'] = load_transform.transform(batch['load'])
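
To make the quantization step concrete, here is a minimal NumPy sketch of nearest-centroid tokenization. It is not the library's implementation (which uses a faiss KMeans index); the `quantize` helper and the toy four-token vocabulary are hypothetical and only illustrate how shapes of (n, 1) or (b, n, 1) map to index arrays of the same shape.

```python
import numpy as np

def quantize(load, centroids):
    """Map each load value to the index of its nearest centroid.

    load: array of shape (n, 1) or (b, n, 1)
    centroids: 1-D array of shape (k,)
    """
    load = np.asarray(load, dtype=np.float64)
    # Broadcasting (..., 1) against (k,) gives distances of shape (..., k)
    dists = np.abs(load - centroids)
    # Pick the closest centroid and restore the trailing singleton dimension
    return dists.argmin(axis=-1)[..., None]

# Hypothetical toy vocabulary of four load values (kWh)
centroids = np.array([0.0, 0.5, 1.0, 2.0])
load = np.array([[0.1], [0.6], [1.7], [5.0]])

tokens = quantize(load, centroids)
print(tokens.ravel())
```

Values beyond the largest centroid (5.0 here) fall onto the last token, so a sketch like this clips out-of-range loads rather than raising an error.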

Dequantize transformer predictions

# predictions is a Tensor of shape [batch_size, pred_len, 1] of quantized values (token indices)
# distribution_params is a Tensor of shape [batch_size, pred_len, num_centroids] of logits
predictions, distribution_params = model.predict(batch)

# Dequantize the predictions
predictions = load_transform.undo_transform(predictions)
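
Conceptually, dequantization is a table lookup: each integer token selects one load value from the vocabulary. The sketch below assumes a hypothetical four-entry centroid table and uses plain NumPy indexing; it illustrates the idea and is not the library's code.

```python
import numpy as np

# Hypothetical toy vocabulary of four load values (kWh)
centroids = np.array([0.0, 0.5, 1.0, 2.0])

# Token indices of shape (batch_size=1, pred_len=3, 1)
tokens = np.array([[[0], [3], [1]]])

# Integer-array indexing keeps the shape of `tokens`
load = centroids[tokens]
print(load.shape)
```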

Extract the categorical distribution

# First, apply softmax to the logits to normalize them into a categorical distribution
distribution_params = torch.softmax(distribution_params, dim=-1)

# The merged centroid values are the load values corresponding
# to each token. Note that the merged centroids are already sorted
# in increasing order.

# If using merge:
load_values = load_transform.merged_centroids
# Otherwise:
# load_values = load_transform.kmeans.centroids.squeeze()

# Now, distribution_params[..., i] is the probability
# assigned to load_values[i].
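
Once probabilities are paired with load values, summary statistics follow directly. The sketch below is a NumPy stand-in for the torch code above, with a hypothetical four-token vocabulary and hand-picked logits; it computes the expected load and the median of the categorical distribution. The median step relies on `load_values` being sorted in increasing order, which the source states only for the merged centroids.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sorted vocabulary (kWh) and logits of shape
# (batch_size=1, pred_len=1, num_centroids=4)
load_values = np.array([0.0, 0.5, 1.0, 2.0])
logits = np.log(np.array([[[0.1, 0.2, 0.3, 0.4]]]))

probs = softmax(logits)

# Expected load: probability-weighted sum over the vocabulary axis
expected = (probs * load_values).sum(axis=-1)

# Median: first token whose cumulative probability reaches 0.5
cdf = probs.cumsum(axis=-1)
median_idx = (cdf >= 0.5).argmax(axis=-1)
median = load_values[median_idx]

print(expected, median)
```

Other quantiles can be read off the same cumulative sum by changing the 0.5 threshold.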

LoadQuantizer

buildings_bench.tokenizer.LoadQuantizer

Quantizes load time series with KMeans. Optionally merges centroids that lie within a threshold of each other.

__init__(seed=1, num_centroids=2274, with_merge=False, merge_threshold=0.01, device='cpu')

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| seed | int | Random seed. | 1 |
| num_centroids | int | Number of centroids. | 2274 |
| with_merge | bool | Whether to merge centroids that are within a threshold. | False |
| merge_threshold | float | Threshold for merging centroids, in kWh. | 0.01 |
| device | str | cpu or cuda. | 'cpu' |

train(sample)

Fit KMeans to a subset of the data.

Optionally, merge centroids that are within a threshold.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sample | ndarray | Array of shape [num_samples, 1]. | required |

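
The source does not spell out the merge rule, so the following is only one plausible scheme, written as a hypothetical `merge_close_centroids` helper: sort the centroids, chain together neighbours whose gap is below the threshold, and replace each chain with its mean. The library's actual procedure may differ.

```python
import numpy as np

def merge_close_centroids(centroids, threshold=0.01):
    """Merge sorted neighbours closer than `threshold` into their mean.

    Assumed merging scheme for illustration only.
    """
    c = np.sort(np.asarray(centroids, dtype=np.float64).ravel())
    groups = [[c[0]]]
    for value in c[1:]:
        if value - groups[-1][-1] < threshold:
            # Close to the previous centroid: extend the current group
            groups[-1].append(value)
        else:
            # Gap is large enough: start a new group
            groups.append([value])
    return np.array([np.mean(g) for g in groups])

merged = merge_close_centroids([0.0, 0.004, 0.5, 0.505, 0.509, 1.0])
print(len(merged))
```

Because groups are built by chaining, a long run of closely spaced centroids can collapse into a single value even when its endpoints are farther apart than the threshold. Either way, the result is sorted and can be shorter than the input, so the effective vocabulary size after merging may be smaller than the number of KMeans centroids that were fitted.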
transform(sample)

Quantize a sample of load values into a sequence of indices.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sample | Union[ndarray, Tensor] | Shape (n, 1) or (b, n, 1). A NumPy array if device is cpu, a torch Tensor if device is cuda. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| sample | Union[ndarray, Tensor] | Shape (n, 1) or (b, n, 1). |

undo_transform(sample)

Dequantize a sample of integer indices into a sequence of load values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sample | Union[ndarray, Tensor] | Shape (n, 1) or (b, n, 1). A NumPy array if device is cpu, a torch Tensor if device is cuda. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| sample | Union[ndarray, Tensor] | Shape (n, 1) or (b, n, 1). |