buildings_bench.tokenizer
Tokenizer Quick Start
Instantiate a LoadQuantizer
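import os
from pathlib import Path

from buildings_bench.tokenizer import LoadQuantizer

# The BUILDINGS_BENCH environment variable must point at the dataset root
transform_path = Path(os.environ.get('BUILDINGS_BENCH', '')) / 'metadata' / 'transforms'
load_transform = LoadQuantizer(
    with_merge=True,     # Default vocabulary has merged KMeans centroids
    num_centroids=2274,  # Default vocabulary has 2,274 tokens
    # args.device is assumed to come from the calling script's parsed arguments
    device='cuda:0' if 'cuda' in args.device else 'cpu')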
# Load the saved faiss KMeans state from disk
load_transform.load(transform_path)
Quantize a load time series
Dequantize transformer predictions
# predictions are a Tensor of shape [batch_size, pred_len, 1] of quantized values
# distribution_params is a Tensor of shape [batch_size, pred_len, num_centroids] of logits
predictions, distribution_params = model.predict(batch)
# Dequantize the predictions
predictions = load_transform.undo_transform(predictions)
Extract the categorical distribution
# First, apply softmax to the logits to normalize them into a categorical distribution
distribution_params = torch.softmax(distribution_params, dim=-1)
# The merged centroid values are the load values corresponding
# to each token. Note that the merged centroids are already sorted
# in increasing order.
# if using merge...
load_values = load_transform.merged_centroids
# else, load_values = load_transform.kmeans.centroids.squeeze()
# Now, distribution_params[..., i] is the probability
# assigned to load_values[i].
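Once the probabilities and their load values are paired up, summary statistics such as the expected load or the most likely load follow directly. The sketch below is a minimal pure-Python illustration for a single time step; the three-token vocabulary, the logits, and the helper `softmax` are hypothetical stand-ins, not part of the library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-token vocabulary (load values in kWh), sorted ascending
load_values = [0.0, 1.0, 2.0]
# Hypothetical logits for one predicted time step
logits = [0.0, math.log(2.0), 0.0]

probs = softmax(logits)

# Expected load under the categorical distribution
expected_load = sum(p * v for p, v in zip(probs, load_values))
# Load value of the most probable token
most_likely_load = load_values[probs.index(max(probs))]
```

With tensors, the same expectation is a weighted sum over the last dimension of the probabilities against the load values.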
LoadQuantizer
buildings_bench.tokenizer.LoadQuantizer
Quantize load time series with KMeans. Optionally, merge centroids that are within a threshold of each other.
__init__(seed: int = 1, num_centroids: int = 2274, with_merge: bool = False, merge_threshold: float = 0.01, device: str = 'cpu')
Parameters:
Name | Type | Description | Default |
---|---|---|---|
seed | int | Random seed. | 1 |
num_centroids | int | Number of centroids. | 2274 |
with_merge | bool | Whether to merge centroids that are within a threshold. | False |
merge_threshold | float | Threshold for merging centroids, in kWh. | 0.01 |
device | str | cpu or cuda. | 'cpu' |
train(sample: np.ndarray) -> None
Fit KMeans to a subset of the data.
Optionally, merge centroids that are within a threshold.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | np.ndarray | Load values of shape [num_samples, 1]. | required |
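The optional merge step can be pictured with a small one-dimensional sketch. This is not the library's implementation: the function name `merge_close_centroids`, the greedy neighbor-gap rule, and replacing each group by its mean are all assumptions made for illustration. Only the idea of merging centroids that lie within a threshold comes from the description above.

```python
from statistics import fmean

def merge_close_centroids(centroids, threshold=0.01):
    """Greedy 1-D merge (illustrative only).

    Sort the centroids, start a new group whenever the gap to the
    previous centroid is at least `threshold`, and replace each group
    by its mean. The result is sorted in increasing order.
    """
    ordered = sorted(centroids)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for value in ordered[1:]:
        if value - groups[-1][-1] < threshold:
            groups[-1].append(value)   # close to its neighbor: same group
        else:
            groups.append([value])     # far enough: new group
    return [fmean(group) for group in groups]

# Hypothetical centroids in kWh; two pairs are closer than 0.01
merged = merge_close_centroids([0.5, 0.0, 0.004, 2.0, 0.505], threshold=0.01)
```

Under this rule the five example centroids collapse to three, which shows why a merged vocabulary can hold fewer tokens than the number of centroids originally fitted.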
transform(sample: Union[np.ndarray, torch.Tensor]) -> Union[np.ndarray, torch.Tensor]
Quantize a sample of load values into a sequence of indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | Union[np.ndarray, torch.Tensor] | Load values of shape (n, 1) or (b, n, 1). A NumPy array if device is cpu, or a torch Tensor if device is cuda. | required |
Returns:
Name | Type | Description |
---|---|---|
sample | Union[np.ndarray, torch.Tensor] | Indices of shape (n, 1) or (b, n, 1). |
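Conceptually, quantization assigns each load value the index of its nearest centroid. The pure-Python sketch below illustrates that idea for sorted one-dimensional centroids. The helper `quantize` and the four example centroids are hypothetical; the library works from saved faiss KMeans state and operates on arrays or tensors rather than lists.

```python
from bisect import bisect_left

def quantize(values, centroids):
    """Map each value to the index of its nearest centroid.

    `centroids` must be sorted in increasing order. Ties go to the
    lower index; values outside the range clamp to the end tokens.
    """
    tokens = []
    for value in values:
        pos = bisect_left(centroids, value)
        if pos == 0:
            tokens.append(0)
        elif pos == len(centroids):
            tokens.append(len(centroids) - 1)
        else:
            left, right = centroids[pos - 1], centroids[pos]
            tokens.append(pos - 1 if value - left <= right - value else pos)
    return tokens

# Hypothetical vocabulary of load values in kWh
centroids = [0.0, 0.5, 1.0, 2.0]
tokens = quantize([0.1, 0.8, 1.6, 5.0], centroids)
```

Note that 5.0 lies beyond the largest centroid and is clamped to the last token, so very large loads lose resolution in this simplified scheme.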
undo_transform(sample: Union[np.ndarray, torch.Tensor]) -> Union[np.ndarray, torch.Tensor]
Dequantize a sample of integer indices into a sequence of load values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | Union[np.ndarray, torch.Tensor] | Integer indices of shape (n, 1) or (b, n, 1). A NumPy array if device is cpu, or a torch Tensor if device is cuda. | required |
Returns:
Name | Type | Description |
---|---|---|
sample | Union[np.ndarray, torch.Tensor] | Load values of shape (n, 1) or (b, n, 1). |
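Dequantization can be thought of as a table lookup from token index back to a representative load value. The sketch below is a minimal illustration with a hypothetical four-token vocabulary; `dequantize` is an assumed helper name, not a library function.

```python
def dequantize(tokens, centroids):
    """Replace each integer token with the load value it indexes."""
    return [centroids[token] for token in tokens]

# Hypothetical vocabulary of load values in kWh
centroids = [0.0, 0.5, 1.0, 2.0]
# Hypothetical token sequence, e.g. model predictions
tokens = [0, 2, 3, 3]

loads = dequantize(tokens, centroids)
```

Because several original load values share one token, the recovered loads are approximations of the original series rather than exact copies.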