buildings_bench.tokenizer
Tokenizer Quick Start
Instantiate a LoadQuantizer
```python
import os
from pathlib import Path
import torch
from buildings_bench.tokenizer import LoadQuantizer

# The BUILDINGS_BENCH environment variable must be set; the transforms live under metadata/transforms
transform_path = Path(os.environ['BUILDINGS_BENCH']) / 'metadata' / 'transforms'
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
load_transform = LoadQuantizer(
    with_merge=True,     # Default vocabulary has merged KMeans centroids
    num_centroids=2274,  # Default vocabulary has 2,274 tokens
    device=device)
# Load the saved faiss KMeans state from disk
load_transform.load(transform_path)
```
Quantize a load time series
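With a loaded quantizer on the CPU, `load_transform.transform(load)` maps a NumPy array `load` of shape `(n, 1)` or `(b, n, 1)` to token indices of the same shape (see `transform` below). To show what that quantization step amounts to, here is a minimal NumPy sketch that assigns each load value to its nearest centroid. The `centroids` array and the `quantize` helper are hypothetical stand-ins; the library's faiss-based implementation, especially with merged centroids, may number tokens differently.

```python
import numpy as np

# Hypothetical stand-in for fitted centroids: sorted load values in kWh.
centroids = np.array([0.0, 0.5, 1.2, 2.0, 3.5])

def quantize(load: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each load value to the index of its nearest centroid.

    Accepts shape (n, 1) or (b, n, 1); the output has the same shape.
    """
    # Add a trailing axis so every value is compared with every centroid:
    # (..., 1) -> (..., 1, num_centroids), then pick the closest centroid.
    distances = np.abs(load[..., np.newaxis] - centroids)
    return np.argmin(distances, axis=-1)

load = np.array([[0.1], [1.0], [3.0]])  # shape (3, 1)
tokens = quantize(load, centroids)
print(tokens.ravel().tolist())  # -> [0, 2, 4]

batch = np.array([[[0.1], [1.0]], [[2.1], [3.0]]])  # shape (2, 2, 1)
print(quantize(batch, centroids).shape)  # -> (2, 2, 1)
```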
Dequantize transformer predictions
```python
# predictions is a Tensor of shape [batch_size, pred_len, 1] of token indices
# distribution_params is a Tensor of shape [batch_size, pred_len, num_centroids] of logits
predictions, distribution_params = model.predict(batch)
# Dequantize the predicted token indices back into load values
predictions = load_transform.undo_transform(predictions)
```
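Conceptually, dequantization is a table lookup: each predicted token index is replaced by the load value of its centroid. The minimal NumPy sketch below assumes a hypothetical sorted `centroids` array and a hypothetical `dequantize` helper; it illustrates the idea and is not the library's `undo_transform` implementation.

```python
import numpy as np

# Hypothetical sorted centroid values (kWh). In the library, the per-token load
# values come from the fitted quantizer (e.g. load_transform.merged_centroids).
centroids = np.array([0.0, 0.5, 1.2, 2.0, 3.5])

def dequantize(tokens: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Replace each token index with its centroid's load value (shape preserved)."""
    # Integer-array indexing returns an array with the same shape as `tokens`.
    return centroids[tokens]

# Predicted token indices, shape [batch_size, pred_len, 1]
tokens = np.array([[[0], [2]], [[4], [3]]])
loads = dequantize(tokens, centroids)
print(loads.shape)     # -> (2, 2, 1)
print(loads[0, 1, 0])  # -> 1.2
```

Because quantization keeps only the nearest centroid, a value lying between two adjacent centroids comes back from a round trip with an error of at most half the gap between them.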
Extract the categorical distribution
```python
import torch

# First, apply softmax to the logits to normalize them into a categorical distribution
distribution_params = torch.softmax(distribution_params, dim=-1)
# The centroid values are the load values corresponding to each token.
# Note that the merged centroids are already sorted in increasing order.
# With with_merge=True (the default vocabulary):
load_values = load_transform.merged_centroids
# Otherwise:
# load_values = load_transform.kmeans.centroids.squeeze()
# Now, distribution_params[..., i] is the probability
# assigned to load_values[i].
```
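Because `load_values` is sorted, the categorical distribution can be summarized directly: its mean is the probability-weighted sum of load values, and a quantile is the first load value whose cumulative probability reaches the target level. The sketch below uses NumPy rather than torch so it runs standalone; `softmax`, `summarize`, and the toy inputs are hypothetical, and the quantile step assumes `load_values` is in increasing order.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def summarize(logits: np.ndarray, load_values: np.ndarray, q: float = 0.5):
    """Return the mean and the q-quantile of each categorical distribution.

    logits: [..., num_centroids]; load_values: [num_centroids], sorted ascending.
    """
    probs = softmax(logits)
    mean = probs @ load_values  # expected load: sum_i p_i * v_i
    cdf = np.cumsum(probs, axis=-1)
    # Index of the first token whose cumulative probability reaches q.
    idx = np.argmax(cdf >= q, axis=-1)
    return mean, load_values[idx]

load_values = np.array([0.0, 1.0, 2.0, 3.0])
logits = np.log(np.array([[0.1, 0.2, 0.3, 0.4]]))  # shape [1, 4]
mean, median = summarize(logits, load_values)
print(mean, median)  # mean ≈ [2.0], median == [2.0]
```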
LoadQuantizer
buildings_bench.tokenizer.LoadQuantizer
Quantize load time series with KMeans, optionally merging centroids that are within a threshold of each other.
__init__(seed=1, num_centroids=2274, with_merge=False, merge_threshold=0.01, device='cpu')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `seed` | `int` | Random seed. | `1` |
| `num_centroids` | `int` | Number of centroids. | `2274` |
| `with_merge` | `bool` | Whether to merge centroids that are within a threshold of each other. | `False` |
| `merge_threshold` | `float` | Threshold for merging centroids, in kWh. | `0.01` |
| `device` | `str` | `'cpu'` or a CUDA device (e.g. `'cuda:0'`). | `'cpu'` |
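As a hedged illustration of what `with_merge=True` and `merge_threshold` (in kWh) control, here is one plausible merging rule: sort the 1-D centroids and collapse runs of neighbors whose gaps are at most the threshold into their mean. `merge_centroids` is a hypothetical helper, not the library's implementation, and its rule may differ from the one the library uses.

```python
import numpy as np

def merge_centroids(centroids: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Hypothetical greedy merge of non-empty 1-D centroids.

    Sorts the centroids, groups each value with its predecessor when their gap
    is at most `threshold`, and replaces every group with its mean.
    Returns the merged centroids in increasing order.
    """
    c = np.sort(np.asarray(centroids, dtype=float).ravel())
    merged, group = [], [c[0]]
    for value in c[1:]:
        if value - group[-1] <= threshold:
            group.append(value)           # close to the previous centroid: same group
        else:
            merged.append(np.mean(group))
            group = [value]               # gap too large: start a new group
    merged.append(np.mean(group))
    return np.array(merged)

print(merge_centroids(np.array([0.0, 0.005, 0.5, 0.508, 0.52]), threshold=0.01))
# -> approximately [0.0025, 0.504, 0.52]
```

Comparing each value with its immediate predecessor lets a chain of closely spaced centroids merge into a group wider than `threshold`; comparing against the group's first member would cap the group width instead. Both are plausible design choices.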
train(sample)
Fit KMeans to a subset of the data.
Optionally, merge centroids that are within a threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sample` | `ndarray` | Load values of shape `[num_samples, 1]`. | required |
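The quick start loads a saved faiss KMeans state. As a dependency-free sketch of what fitting a 1-D quantizer on a sample of shape `[num_samples, 1]` involves, the code below runs plain Lloyd's iterations in NumPy instead of faiss. `fit_kmeans_1d` and its `iters` parameter are hypothetical, and the result will not reproduce the library's faiss-trained vocabulary.

```python
import numpy as np

def fit_kmeans_1d(sample: np.ndarray, num_centroids: int, seed: int = 1,
                  iters: int = 50) -> np.ndarray:
    """Plain Lloyd's k-means on load values of shape [num_samples, 1].

    Returns the fitted centroids sorted in increasing order.
    """
    x = np.asarray(sample, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    # Initialize from random samples drawn without replacement
    # (requires num_centroids <= num_samples).
    centroids = rng.choice(x, size=num_centroids, replace=False)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every sample.
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Update step: move each centroid to the mean of its members;
        # a centroid with no members stays where it is.
        for k in range(num_centroids):
            members = x[labels == k]
            if members.size:
                centroids[k] = members.mean()
    return np.sort(centroids)

sample = np.concatenate([np.full(50, 1.0), np.full(50, 5.0)]).reshape(-1, 1)
print(fit_kmeans_1d(sample, num_centroids=2))  # -> [1. 5.]
```

The assignment step builds a `num_samples × num_centroids` distance matrix, which is one reason to fit on a subset of the data when the vocabulary is as large as 2,274 tokens.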
transform(sample)
Quantize a sample of load values into a sequence of indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sample` | `Union[ndarray, Tensor]` | Load values of shape `(n, 1)` or `(b, n, 1)`: a NumPy array if `device` is `'cpu'`, a torch `Tensor` if `device` is CUDA. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `sample` | `Union[ndarray, Tensor]` | Token indices of shape `(n, 1)` or `(b, n, 1)`. |
undo_transform(sample)
Dequantize a sample of integer indices into a sequence of load values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sample` | `Union[ndarray, Tensor]` | Integer token indices of shape `(n, 1)` or `(b, n, 1)`: a NumPy array if `device` is `'cpu'`, a torch `Tensor` if `device` is CUDA. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `sample` | `Union[ndarray, Tensor]` | Load values of shape `(n, 1)` or `(b, n, 1)`. |