Data
Given the complexity of the ComStock software workflow, and the big-data skills and computing hardware it requires, the most practical way for professionals and researchers to use ComStock is to work with the published results rather than run the ComStock modeling tool themselves. This section explains how to access ComStock data and lists the published datasets.
Published Datasets
These datasets describe the timeseries energy consumption of the U.S. commercial building stock at the end-use level. For details on how they were created and validated, see the project’s final report. See the Data Access Platforms, Structure and Contents section for more details about the data.
Each dataset release is introduced with a webinar presentation. Access the webinar recordings and slides on the Upgrade Measures page.
ComStock dataset releases are summarized in the following table with links for accessing the aggregate results.
Dataset Notice: ComStock Standard Dataset Release 2024 Release 2 uses an improved sampling method. The OEDI file structure has been modified starting with this release. For details about the new sampling method and file structure, please read the New ComStock Sampling Method explanation.
Dataset | ComStock Standard Dataset Release 2024 Release 2 - 2018 Weather | ComStock End Use Savings Shape 2024 Release 1 - 2018 Weather | ComStock End Use Savings Shape 2023 Release 2 - 2018 Weather | ComStock End Use Savings Shape 2023 Release 1 - 2018 Weather | ComStock End Use Load Profiles - 2018 Weather | ComStock End Use Load Profiles - Typical Weather |
---|---|---|---|---|---|---|
OEDI Name | 2024/comstock_amy2018_release_2 | 2024/comstock_amy2018_release_1 | 2023/comstock_amy2018_release_2 | 2023/comstock_amy2018_release_1 | 2021/comstock_amy2018_release_1 | 2021/comstock_tmy3_release_1 |
Data Viewer | Data viewer links expected by 1/17/25 | by_state, by_puma_northeast, by_puma_midwest, by_puma_south, by_puma_west | by_state | by_state | by_state, by_puma_northeast, by_puma_midwest, by_puma_south, by_puma_west | by_state, by_puma_northeast, by_puma_midwest, by_puma_south, by_puma_west |
OEDI Data Lake | OEDI Data Lake | OEDI Data Lake | OEDI Data Lake | OEDI Data Lake | OEDI Data Lake | OEDI Data Lake |
Publication Date | Dec. 2024 | March 2024 | Sept. 2023 | March 2023 | Oct. 2021 | Oct. 2021 |
Release # | 2024_2 | 2024_1 | 2023_2 | 2023_1 | 2021_1 | 2021_1 |
Building Stock Represented | U.S. commercial sector circa 2018 | U.S. commercial sector circa 2018 | U.S. commercial sector circa 2018 | U.S. commercial sector circa 2018 | U.S. commercial sector circa 2018 | U.S. commercial sector circa 2018 |
Upgrades Available* | 39 | 30 | 17 | 9 | None | None |
Weather Year | amy2018 | amy2018 | amy2018 | amy2018 | amy2018 | tmy3 |
*Visit the Upgrade Measures page for a list of available upgrade measures and measure documentation.
Data Access Platforms, Structure and Contents
At the most fundamental level, the ComStock dataset is a collection of end-use load profiles of approximately 350,000 building energy models. The output of each building energy model is 1 year of energy consumption in 15-minute intervals, separated into end-use categories.
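As a rough illustration of why the full dataset calls for big-data tools, the sketch below multiplies out the numbers above. It assumes a non-leap year (2018 had 365 days) and exactly 350,000 models. The real model count is approximate, and each upgrade adds another full set of profiles on top of the baseline.

```python
# Rough size estimate for the full ComStock baseline timeseries.
# Assumptions: 365-day year at 15-minute resolution, ~350,000 models.
MODELS = 350_000
INTERVALS_PER_HOUR = 60 // 15  # 4 fifteen-minute intervals per hour
HOURS_PER_YEAR = 365 * 24      # 8,760 hours in a non-leap year

intervals_per_model = INTERVALS_PER_HOUR * HOURS_PER_YEAR
total_rows = MODELS * intervals_per_model

print(f"{intervals_per_model:,} intervals per model")  # 35,040 intervals per model
print(f"{total_rows:,} timeseries rows in total")      # 12,264,000,000 timeseries rows in total
```

Each of those rows also carries one column per end-use category, so the raw volume grows further with the number of end uses reported.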
Working with the national ComStock building load profiles in the full dataset requires big-data skills that put it out of reach for most users. To support a wide range of use cases, aggregate load profiles are published for ComStock releases at the following geographic resolutions:
- 16 ASHRAE/International Energy Conservation Code climate zones
- 5 U.S. Department of Energy Building America climate zones
- 8 electric system independent system operator (ISO) and regional transmission organization (RTO) regions
- 2,400+ U.S. Census Public Use Microdata Areas (PUMAs)
- 3,000+ U.S. counties
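The idea behind a geographic aggregate can be sketched as a per-interval sum over every building model located in a geography. The sketch below is illustrative only and is not necessarily how the published aggregates were produced. It assumes each model carries a scaling weight (the number of real buildings it stands in for) and a geography field; the building IDs, the `county` and `weight` field names, and the values are all hypothetical.

```python
# Hypothetical per-building profiles: kWh per 15-minute interval
# (only three intervals shown to keep the example small).
profiles = {
    "bldg_1": [1.0, 2.0, 3.0],
    "bldg_2": [0.5, 0.5, 0.5],
    "bldg_3": [4.0, 0.0, 1.0],
}
# Hypothetical metadata: geography and scaling weight per model.
metadata = {
    "bldg_1": {"county": "County A", "weight": 10.0},
    "bldg_2": {"county": "County A", "weight": 20.0},
    "bldg_3": {"county": "County B", "weight": 5.0},
}


def aggregate_by_geography(profiles, metadata, geo_field):
    """Sum weight * load for each interval across all models in each geography."""
    totals = {}
    for bldg_id, series in profiles.items():
        info = metadata[bldg_id]
        acc = totals.setdefault(info[geo_field], [0.0] * len(series))
        for i, value in enumerate(series):
            acc[i] += info["weight"] * value
    return totals


print(aggregate_by_geography(profiles, metadata, "county"))
# {'County A': [20.0, 30.0, 40.0], 'County B': [20.0, 0.0, 5.0]}
```

Swapping `geo_field` for another geography column would produce the other resolutions listed above, provided the metadata records that geography for each model.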
Data Access Platforms
The dataset is formatted to be accessible in four main ways to meet the needs of different users and use cases. The following table summarizes these ways to access and use ComStock data.
Access Method | Description |
---|---|
Metadata | Files of individual model characteristics together with annual results, commonly referred to as the “metadata” file |
Load Profiles | Timeseries load profiles (individual building and pre-aggregated) in downloadable spreadsheets |
Data Viewer | A web-based data viewer with customizable time scales and aggregations |
Full Database | A detailed format that can be queried with big data tools |
Aggregate ComStock datasets can be accessed through the Open Energy Data Initiative (OEDI) Data Lake and the ComStock data viewer. ComStock datasets are published with actual meteorological year (AMY) weather data. The initial public release (2021_1) was published in two versions: one with AMY weather and one with typical meteorological year (TMY3) weather. Do not use the TMY3 15-minute energy data for larger geographies: each TMY3 location’s typical year is assembled from months drawn from different historical years, so weather events are not aligned across regions.
For information on how to query the full ComStock dataset, please refer to this documentation.
Note that separate public datasets are available for the residential (ResStock) and commercial (ComStock) building stocks.
ComStock Data Viewer
The ComStock data viewer lets users quickly filter, slice, combine, visualize, and download the results in custom ways. It is available at comstock.nrel.gov. Datasets on the data viewer are offered in up to two geographic views: by state, and by PUMA within each Census region (Northeast, Midwest, South, West).
Open Energy Data Initiative (OEDI) Data Lake
OEDI is an energy information portal developed and maintained by the National Renewable Energy Laboratory (NREL), with funding and support from the U.S. Department of Energy and a network of International Partners & Sponsors. The OEDI data lake contains comprehensive aggregate data for ComStock releases, including metadata and timeseries energy consumption results (baseline and upgrades, if applicable), individual building energy models, weather files, geographic information, and data dictionaries.
The directory structure of each ComStock release in the data lake is summarized in the table below. For more detailed information about the contents of the ComStock OEDI data lake, see the README.
OEDI Directory Structure and Contents
Name | Contents |
---|---|
building_energy_models | Building energy models, in OpenStudio format, that were run to create the dataset. |
geographic_information | Information on various geographies used in the dataset provided for convenience. Includes map files showing the shapes of the geographies (states, PUMAs) used for partitioning and a lookup table mapping between census tracts and various other geographies. |
metadata | Building characteristics (age, area, HVAC system type, etc.) and annual energy results for each of the building energy models run to create the timeseries data. Descriptions of the characteristics are included in data_dictionary.tsv, enumeration_dictionary.tsv, and upgrade_dictionary.tsv. |
timeseries_aggregates | Aggregate end-use load profiles by building type and geography that can be opened and analyzed in Excel, Python, or other common data analysis tools. |
timeseries_aggregates_metadata | Building characteristics for the timeseries_aggregates building energy models. Follows the same format as metadata. |
timeseries_individual_buildings | The raw timeseries data for each individual building. This directory contains a very large number of files. |
weather | Key weather data used as an input to run the building energy models to create the dataset. |
citation.txt | Citation to use when referencing this work. |
data_dictionary.tsv | Describes the column names found in the metadata and timeseries data files. |
enumeration_dictionary.tsv | Expands the definitions of the enumerations used in the metadata files. |
upgrade_dictionary.tsv | Expands the definitions of the upgrades. |
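The sketch below shows how an OEDI Name from the release table combines with the entries in the directory table to form a data lake location. The S3 prefix is an assumption based on the public OEDI bucket layout and should be checked against the README. Because the OEDI file structure changed starting with 2024 Release 2, these entry names may not apply to that release. The helper name `oedi_uri` is hypothetical.

```python
# Assumed OEDI data lake prefix; verify against the OEDI README before use.
OEDI_PREFIX = (
    "s3://oedi-data-lake/nrel-pds-building-stock/"
    "end-use-load-profiles-for-us-building-stock"
)

# Entries from the OEDI directory table above.
RELEASE_DIRECTORIES = {
    "building_energy_models",
    "geographic_information",
    "metadata",
    "timeseries_aggregates",
    "timeseries_aggregates_metadata",
    "timeseries_individual_buildings",
    "weather",
}
RELEASE_FILES = {
    "citation.txt",
    "data_dictionary.tsv",
    "enumeration_dictionary.tsv",
    "upgrade_dictionary.tsv",
}


def oedi_uri(oedi_name: str, entry: str) -> str:
    """Build the S3 URI of a top-level entry in a ComStock release (hypothetical helper)."""
    if entry in RELEASE_DIRECTORIES:
        return f"{OEDI_PREFIX}/{oedi_name}/{entry}/"  # directories end with "/"
    if entry in RELEASE_FILES:
        return f"{OEDI_PREFIX}/{oedi_name}/{entry}"
    raise ValueError(f"unknown release entry: {entry!r}")


print(oedi_uri("2021/comstock_amy2018_release_1", "data_dictionary.tsv"))
```

Raising an error for unknown entries catches typos before any network request is attempted.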
Dataset Naming Convention
ComStock releases on OEDI and the data viewer use the following naming convention.
<dataset type>_<weather data>_<year of publication>_release_<release number>
example: comstock _ amy2018 _ 2021 _release_ 1
result: comstock_amy2018_2021_release_1
- dataset type
  - resstock = residential building stock
  - comstock = commercial building stock
- weather data
  - amy2018 = actual meteorological year 2018 (2018 weather data from NOAA ISD, NSRDB, and MesoWest)
  - tmy3 = typical weather from 1991-2005 (see this publication for details)
- year of publication
  - 2021 = dataset was published in 2021
  - 2022 = dataset was published in 2022
  - etc.
- release
  - release_1 = first release of the dataset during the year of publication
  - release_2 = second release of the dataset during the year of publication
  - etc.
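The naming convention above is regular enough to parse mechanically. The sketch below is a minimal parser under two assumptions: weather values follow the listed forms (`amy` plus a four-digit year, or `tmy3`), and names appear in the documented form. The OEDI Names in the release table place the year as a directory prefix instead (for example, 2021/comstock_amy2018_release_1), so those would need rearranging before parsing. The function name `parse_dataset_name` is hypothetical.

```python
import re

# <dataset type>_<weather data>_<year of publication>_release_<release number>
NAME_PATTERN = re.compile(
    r"^(?P<dataset_type>resstock|comstock)"
    r"_(?P<weather>amy\d{4}|tmy3)"
    r"_(?P<year>\d{4})"
    r"_release_(?P<release>\d+)$"
)


def parse_dataset_name(name: str) -> dict:
    """Split a dataset name into its four documented components."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    parts = match.groupdict()
    parts["year"] = int(parts["year"])        # year of publication
    parts["release"] = int(parts["release"])  # release number within that year
    return parts


print(parse_dataset_name("comstock_amy2018_2021_release_1"))
# {'dataset_type': 'comstock', 'weather': 'amy2018', 'year': 2021, 'release': 1}
```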
Field Naming Convention
The field naming convention is fairly simple. At the highest level, field names begin with a prefix: “in.” for inputs, “out.” for outputs, and “calc.” for calculated fields. A handful of additional columns provide simulation information.
Fields with the “out.” prefix have a second level that indicates fuel type, emissions, model parameter and statistic fields, or site energy. The “in.” prefix does not have a second level.
The third level of “out.” is where you’ll find the end uses.
Finally, units are denoted by a “..” with the unit following.
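The rules above can be applied programmatically to split a column name into its parts. The sketch below does not assume a fixed number of levels. The example column names (`out.electricity.cooling.energy_consumption..kwh`, `in.sqft`, `bldg_id`) are assumed for illustration and should be checked against data_dictionary.tsv. The function name `parse_field_name` is hypothetical.

```python
KNOWN_PREFIXES = {"in", "out", "calc"}


def parse_field_name(field: str) -> dict:
    """Split a field name into prefix, hierarchy levels, and unit.

    The unit follows "..", and levels are separated by ".". Names without a
    known prefix (e.g. simulation-information columns) get prefix None.
    """
    name, sep, unit = field.partition("..")
    levels = name.split(".")
    if levels[0] in KNOWN_PREFIXES:
        prefix, levels = levels[0], levels[1:]
    else:
        prefix = None
    return {"prefix": prefix, "levels": levels, "unit": unit if sep else None}


print(parse_field_name("out.electricity.cooling.energy_consumption..kwh"))
# {'prefix': 'out', 'levels': ['electricity', 'cooling', 'energy_consumption'], 'unit': 'kwh'}
```

For an “out.” field, `levels[0]` is the second level (here a fuel type) and `levels[1]` is the third level (the end use).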