Frequently Asked Questions

Expand the sections below for answers to frequently asked questions. If you have additional questions, please email us at ComStock@nrel.gov.

ComStock Essentials

Are these load profiles measured or simulated?

The profiles are simulated using the ResStock and ComStock modeling tools, which are informed by the best available data and have been validated against an array of empirical datasets. ResStock and ComStock use the EnergyPlus simulation engine. The validation results and uncertainty for quantities of interest are presented in the End-Use Load Profiles final report.

ResStock generally simulates 550,000 individual building energy models, and ComStock simulates 150,000 building energy models.
What building types does ComStock model?

ComStock models 15 commercial building types. Compared to the Commercial Building Energy Consumption Survey (CBECS) 2018 estimation, ComStock datasets account for 63% of both the energy use and floor area of commercial buildings in the United States. The ComStock development team is actively working on adding more building types to the model. See the explanation titled "Building Types Not Included in ComStock" for more detail.
What year does the baseline stock represent?

The ComStock and ResStock datasets represent, as closely as possible, the 2018 U.S. commercial and residential building stock characteristics. The energy consumption results depend on the weather data used in the simulations. When modeled with AMY2018 weather, the datasets represent energy use for the year 2018. When TMY3 weather is used, they represent typical or average energy consumption under typical climate conditions.

Emissions and utility bills in the ComStock and ResStock datasets use input data from several years, depending on the dataset release. See the ComStock reference documentation or ResStock reference documentation for more details.
Are ComStock and ResStock credible?

Yes. The models underwent extensive calibration as part of the End-Use Load Profiles (EULP) project, which concluded in 2021. In that project, we compared model load profiles to advanced metering infrastructure (AMI) data from around the country and updated baseline model schedules, power densities, and other inputs using various data sources. See the final report for more details.

For every baseline update and upgrade measure since EULP, ComStock compares energy consumption and energy use intensity (EUI) to available data sources, such as CBECS and EIA. These comparisons are available on the OEDI Data Lake for each dataset. You can find links to OEDI in the Published Datasets section of the Data page.

For details about how to determine whether the models are appropriate for a specific analysis, reference the explanation titled "Considerations for ComStock Calibration, Validation, and Uncertainty."
Which dataset release should I use? And can I compare upgrades from different dataset releases?

ComStock publishes datasets on a regular basis, and we recommend using the latest release. See the Data page for a list of available datasets and access links.

It is not necessary to compare upgrades across ComStock dataset releases because all datasets include both new upgrade measures and all measures from previous releases, as well as any improvements made to the baseline model. Information about upgrade measures included in dataset releases can be found on the Upgrade Measures page. Baseline model improvements are captured in the release change log on our public GitHub repository. Note that we re-sample our input characteristic distributions for every release and as a result, the building IDs between releases will not match.
What are weights in ComStock and how are they used?

Weights in ComStock represent the number of real buildings in the U.S. building stock that a ComStock model represents. Weights are determined using national floor area by building type from CBECS. Use the weights by multiplying the energy consumption column by the weight for the model. Some results columns already have the weight applied. These have the word “weighted” in the name. See the explanation titled "Sampling and Weighting in ComStock" for more information.
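
As a minimal sketch of the multiplication described above, the following pandas snippet applies weights to an unweighted energy column. The column names and values are illustrative assumptions, not taken from a specific release; check "data_dictionary.tsv" for the names in your dataset.

```python
import pandas as pd

# Hypothetical rows standing in for metadata_and_annual_results data.
# Column names and values are illustrative; confirm them in data_dictionary.tsv.
energy_col = "out.electricity.total.energy_consumption..kwh"
df = pd.DataFrame(
    {
        "bldg_id": [1, 2, 3],
        "weight": [10.0, 2.5, 4.0],  # real buildings represented by each model
        energy_col: [1000.0, 4000.0, 500.0],
    }
)

# Scale each model's consumption by the number of buildings it represents.
df["weighted_kwh"] = df[energy_col] * df["weight"]

# Stock-level total for the segment covered by these rows.
stock_total_kwh = df["weighted_kwh"].sum()
print(stock_total_kwh)
```

Columns that already have “weighted” in the name should be summed directly rather than multiplied by the weight again.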
How many profiles or models should be used for an analysis, and how does the number used affect uncertainty of results?

The minimum sample count required for a given geography in ComStock is a function of the number of commercial buildings present in that area, as well as the quality of available input data for the ComStock model. To ensure statistical robustness in your analysis using ComStock, you may need additional building models depending on the specificity of your segmentation. A good rule of thumb is to include at least six models per segment (e.g., building type, sub-type, size, vintage, or operation hours). For example, if you’re analyzing small office buildings open more than 18 hours a day, make sure you have at least six such models.

Also, cross-check ComStock’s building representation with external sources (like Google Maps or local datasets) to ensure the dataset reflects your target geography. For more detail, see the explanation titled "Sample Size Considerations."

Queries in sparsely populated areas or with filters applied may have relatively few samples available. In these cases, samples from nearby locations can be grouped to increase the sample size. See the tutorial titled "Perform an analysis by blending ComStock and local data" for an example of incorporating local floor area estimates to improve representation of ComStock data at specific geographic resolutions.

Users should estimate standard error for metrics of interest using the standard deviation divided by the square root of the number of samples (i.e., profiles or models). See Section 5.1.3 in the End-Use Load Profiles methodology report for a discussion on uncertainty calculations.
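
The calculation above can be sketched in a few lines of Python. This is an illustrative example only: it assumes the unweighted sample standard deviation, the function name `standard_error` is hypothetical, and the input values are made up. See the methodology report for the full treatment of uncertainty.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical metric values for six models in one segment.
metric = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
se = standard_error(metric)
print(round(se, 3))
```

Because the standard error shrinks with the square root of the sample count, quadrupling the number of models in a segment roughly halves it.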
How should I cite the datasets?

ComStock and ResStock can be cited according to the suggestions here for ComStock and here for ResStock.

Datasets and Data Access

How do I access the dataset?

Several platforms are available for accessing ComStock and ResStock datasets. See the ComStock Data page and ResStock Data page for more detail about dataset access and links to the public datasets.
Are descriptions available for the end-use categories and fields available for filtering?

Descriptions of each of the building characteristics and the end-use categories can be found in the “data_dictionary.tsv” file. Descriptions of the values used in those filters can be found in the “enumeration_dictionary.tsv”. Both files can be downloaded from the OEDI Data Lake and are unique to each dataset release. Use the correct data dictionary for the relevant dataset. They can be opened with Excel or a text editor.

Links to the OEDI Data Lake for each dataset release can be found on the ComStock Data page and ResStock Data page.
What are the data units?

ComStock and ResStock data have multiple units. For annual results data downloaded from the Open Energy Data Initiative (OEDI) data lake, units can be found in the "data_dictionary.tsv" file. Some fields will also have the units in the column header at the end of the name (e.g., "out.electricity.total.jan.energy_consumption..kwh"). Timeseries energy consumption data on OEDI are provided in kWh. Natural gas, fuel oil, and propane are also output in kWh; this is intentional, though unconventional.

The Data Viewer provides energy data in metric units, visible in the y-axis label. Depending on the scale of energy being shown, the metric prefix will automatically adjust (T for tera, G for giga, M for mega, etc.).

For Tableau dashboards, use the relevant column headers or the graph axis to see the units.
What is the timezone of the timestamps?

The timestamps of all load profiles have been converted to Eastern Standard Time, to prevent issues when aggregating across time zones.

The underlying modeling was conducted using local standard time for each location, with occupant schedules adjusted for daylight saving time as applicable. All EnergyPlus timeseries outputs were converted from local standard time to Eastern Standard Time for publication in the web Data Viewer, Data Viewer exports, timeseries aggregates, and individual timeseries parquet files. In converting from local standard time to Eastern Standard Time, the last few hours of each dataset were moved to the beginning of the timeseries where necessary. For example, the first two hours of data from Colorado in Eastern Standard Time (Jan 1, midnight to 2 AM) were originally modeled as the last two hours of the year in Mountain Standard Time (Dec 31, 10 PM to midnight) using the corresponding weather.
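
If you need local standard time for a single location, one possible approach is to attach a fixed Eastern Standard Time offset and convert. The sketch below assumes pandas and uses made-up timestamps; fixed UTC offsets are used because standard time has no daylight saving shift.

```python
from datetime import timedelta, timezone

import pandas as pd

# Fixed UTC offsets for standard time (no daylight saving adjustment).
EST = timezone(timedelta(hours=-5))
MST = timezone(timedelta(hours=-7))

# Hypothetical published timestamps (Eastern Standard Time) for a Colorado profile.
published = pd.DatetimeIndex(["2018-01-01 00:15:00", "2018-01-01 02:15:00"])

# Attach the EST offset, then express the same instants in Mountain Standard Time.
local = published.tz_localize(EST).tz_convert(MST)
print(local)
```

Note that the earliest converted timestamps fall on Dec 31 of the previous calendar year, while the underlying values were modeled as the last hours of the simulated year, as described above.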
Does the timestamp represent the beginning, middle, or end of each 15-minute interval?

The timestamp indicates the end of each 15-minute interval. So "12:15" represents the energy use between 12:00 and 12:15.
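
Some analyses are easier with interval-start labels, for example when grouping by calendar day, because the last interval of a day is stamped at midnight of the following day. A minimal pandas sketch, using made-up timestamps:

```python
import pandas as pd

# Hypothetical interval-ending timestamps, as published.
interval_end = pd.DatetimeIndex(["2018-01-01 12:15:00", "2018-01-02 00:00:00"])

# Shift back by one 15-minute step to label each interval by its start.
interval_start = interval_end - pd.Timedelta(minutes=15)
print(interval_start)
```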
Do the timeseries aggregates have the sample weighting factors applied?

Yes. The aggregates represent the total relevant building stock with all relevant weights applied (e.g., all small office buildings in the state of Colorado), not just the sum of the model results.
Are there load profiles available for the 16 California Climate Zones?

ComStock includes commercial buildings in California, and the datasets provide California Energy Commission (CEC) climate zones in the field “in.cec_climate_zone” in the metadata_and_annual_results and metadata_and_annual_results_aggregates files on the OEDI data lake.

There are a few known issues with California models in ComStock. Please see the "California Models Known Issues" explanation for more information.
What do the codes used to describe "county_id" and other geographic fields mean?

ComStock and ResStock use the National Historical GIS (NHGIS) GISJOIN standard codes for county, census PUMA, and census tract, which are based on Federal Information Processing Standards (FIPS) codes. The datasets use the 2010 version of the GISJOIN codes; the 2020 codes are not available at this time. For more information about the geospatial fields available in the datasets, see this explanation for ComStock, and this explanation for ResStock.
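
To join the datasets with other sources keyed on FIPS codes, a county GISJOIN code can be converted as sketched below. This assumes the NHGIS county layout of "G", the 2-digit state FIPS code, a padding "0", the 3-digit county FIPS code, and a final padding "0". The function name is hypothetical; verify a few results against the county name field before relying on it.

```python
def gisjoin_county_to_fips(gisjoin: str) -> str:
    """Convert a county GISJOIN code (e.g., 'G0800310') to a 5-digit FIPS code."""
    if len(gisjoin) != 8 or not gisjoin.startswith("G"):
        raise ValueError(f"unexpected county GISJOIN format: {gisjoin!r}")
    state_fips = gisjoin[1:3]   # 2-digit state FIPS, followed by a padding '0'
    county_fips = gisjoin[4:7]  # 3-digit county FIPS, followed by a padding '0'
    return state_fips + county_fips


print(gisjoin_county_to_fips("G0800310"))
```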

In most ComStock and ResStock datasets, county name is available in addition to the GISJOIN county code. For both tools, the column in the metadata_and_annual_results files on OEDI is called "in.county_name."
Where can I find documentation on what technologies are available in the upgrade measures?

See the Upgrade Measures page for a complete list of available upgrade measures and packages in ComStock datasets, including a link to their documentation, and in which dataset release the measure was first included.
Are weather data files available in EPW format?

Weather data used for the modeling have been provided in .csv format for regression modeling, forecasting, or other analyses. The TMY3 weather files in EnergyPlus input format (EPW) can be downloaded from the NLR Data Catalog, with filenames that correspond to county IDs in the ResStock and ComStock metadata. EPW format weather files for 2018 or other actual meteorological years (AMY) have not been publicly released. These files can be purchased from private sector vendors. See here for a list of providers.
Are the EnergyPlus model input files (.idf) or OpenStudio (.osm) files available?

OpenStudio model input files (.osm) are available in the dataset on the OEDI data lake in the "building_energy_models" directory. Files are named by the building ID ("bldg_id"). The EnergyPlus model input files are not available.
Is there an API to access data without downloading locally?

Currently, there is no API. However, we have posted a tutorial example showing how to load the datasets into cloud services such as Amazon Web Services (AWS) so the data can be queried by analytic tools like Athena.

Example notebooks and SQL queries are also available on the "Access ComStock datasets programmatically" page, and more will be added as we develop them. The queries and example notebooks are a good starting point for accessing ResStock programmatically, too.
How do I access the timeseries data for a specific building model?

To download results for a few IDs, you can use a manual approach. First, use the metadata_and_annual_results file to find the IDs you want to access. Then, note the download URL for any easy-to-access ID and edit it to reflect the ID you want.

For example, right-clicking on the first ID under ResStock dataset 2022.1.1, AMY 2018, upgrade 02, and choosing “copy link” provides this URL: https://oedi-data-lake.s3.amazonaws.com/nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_amy2018_release_1.1/timeseries_individual_buildings/by_state/upgrade=2/state=WA/100025-2.parquet. To access ID 813 instead of 100025, change the “100025-2” to “813-2” in the URL, and paste it into a web browser. That will download the data for ID 813.
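
The same URL edit can be scripted. The sketch below infers the path pattern from the single example URL above, so treat it as an assumption: other releases may use a different layout, and the `state=` segment presumably needs to match the state listed for that building in the metadata. The function name `timeseries_url` is hypothetical.

```python
# Base path copied from the example URL above (ResStock 2022.1.1, AMY 2018).
BASE_URL = (
    "https://oedi-data-lake.s3.amazonaws.com/nrel-pds-building-stock/"
    "end-use-load-profiles-for-us-building-stock/2022/"
    "resstock_amy2018_release_1.1/timeseries_individual_buildings/by_state"
)


def timeseries_url(bldg_id: int, upgrade: int, state: str) -> str:
    """Build a download URL following the pattern of the example URL."""
    return f"{BASE_URL}/upgrade={upgrade}/state={state}/{bldg_id}-{upgrade}.parquet"


print(timeseries_url(100025, 2, "WA"))
```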

What software can I use to open the .parquet files?

Parquet files can be read with programming languages such as Python, for example with the pyarrow package. For other options, see here. There are a few third-party graphical tools for viewing parquet files, but we have not tested them and the third-party support is limited.

See below for example Python code to convert a parquet file to CSV.


        import os

        import pandas as pd

        # Folder containing the downloaded parquet file (example path)
        folder_path = 'C:/Users/username/Documents/EUSS/Results'
        file_name = '813-2'

        # Read the parquet file into a DataFrame
        df = pd.read_parquet(os.path.join(folder_path, file_name + '.parquet'))

        # Write the same data to a CSV file in the same folder
        df.to_csv(os.path.join(folder_path, file_name + '.csv'), index=False)

I am trying to match buildings between releases. Why do the building IDs not match between them?

The building IDs and exact building characteristics between releases will not match because we re-sample our input characteristic distributions for every release. However, you can filter the building models using building characteristics to identify similar samples between releases. For instance, you can use building type, size, location, and wall construction type to identify similar models. The fields with the prefix “in.” show the available model inputs that you can use to do the comparison. You can see a complete list and description of available fields in the “data_dictionary.tsv” file on the OEDI Data Lake. Links to the datasets on OEDI are in the "Published Datasets" section of the ComStock Data page and ResStock Data page.

Data Viewer

What is the Data Viewer?

The Data Viewer is a web-based visualization platform that allows users to easily filter, aggregate, view, and download ComStock end-use energy data in a web browser.

Links to Data Viewer visualizations for each dataset release are on the Data page.

For Data Viewer trainings, visit NLR’s Building Stock Analysis YouTube channel.
In the Data Viewer, what does "sum" or "average" mean?

The "sum" aggregation is the total energy consumption for all buildings that meet the filter criteria, summed across all the occurrences of the given time step within the selected month(s). For example, in a day timeseries range for a specific state for the month of July, the 7:00-7:15 AM time step shows the sum of all statewide energy consumption between 7:00 and 7:15 AM on every day in July, from buildings that meet the filter criteria. The "sum" view has fewer uses than the "average" view.

The "average" aggregation is the total energy consumption for all buildings that meet the filter criteria, averaged across all the occurrences of the given time step within the selected month(s). For example, in a day timeseries range for a specific state for the month of July, the 7:00-7:15 AM time step shows the average statewide energy consumption between 7:00 and 7:15 AM in July, from buildings that meet the filter criteria. The "average" aggregation provides a view of the average day of total energy consumption in the state. This is the more logical view for most use cases. Note that while each time step within a day or a year has the same number of occurrences within each dataset, each time step for a week does not: some days of the week occur more times than others in each year or month range (except for February).
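
The difference between the two aggregations can be illustrated with a small pandas sketch. The values are made up, and this is only an illustration of the definitions, not the Data Viewer's implementation.

```python
import pandas as pd

# Hypothetical statewide totals (kWh) for the 7:00-7:15 AM time step on three July days.
step = pd.Series(
    [100.0, 120.0, 110.0],
    index=pd.to_datetime(
        ["2018-07-01 07:00", "2018-07-02 07:00", "2018-07-03 07:00"]
    ),
)

# Group all occurrences of the same time of day within the selected range.
by_time_of_day = step.groupby(step.index.time)

total = by_time_of_day.sum()     # "sum": total over every occurrence
average = by_time_of_day.mean()  # "average": typical value for that time step
print(total.iloc[0], average.iloc[0])
```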
In the Data Viewer, how are the peak day and min peak day defined?

The peak day is the day with the highest single-hour (peak) energy consumption within the selected months.

The min peak day is the day with the lowest single-hour energy consumption within the selected months.
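
To reproduce the peak day definition offline from downloaded 15-minute data, one possible approach is to sum to hourly values and find the day containing the highest hour. The sketch below uses made-up data and labels each interval by its start to keep the grouping simple; it is not the Data Viewer's implementation.

```python
import pandas as pd

# Hypothetical 15-minute energy values (kWh) for three July days,
# labeled by interval start (an assumption made for this example).
index = pd.date_range("2018-07-01", periods=3 * 96, freq="15min")
kwh = pd.Series(1.0, index=index)

# Create one high-consumption hour on July 2 (3:00-4:00 PM).
high_start = pd.Timestamp("2018-07-02 15:00")
high_end = pd.Timestamp("2018-07-02 15:45")
kwh.loc[high_start:high_end] = 5.0

# Energy in each clock hour, then the day that contains the highest hour.
hourly = kwh.resample("60min").sum()
peak_hour = hourly.idxmax()
peak_day = peak_hour.date()
print(peak_day)
```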
Why is the time series data sometimes slow to load after I click the update button?

We query data in real time to produce the time series graphs you see on the webpage, and this can involve scanning terabytes (TB) of data. Running a baseline-only query for California, Texas, New York, or Illinois takes around a minute, while running a query for a state like Colorado or Massachusetts takes about 10-20 seconds. However, if the graphs have previously been generated we have the data cached and can typically load the data in a few seconds. That's why the load time varies.
Why can’t I click on “Explore Timeseries”?

The “Explore Timeseries” option is available once a specific geography (e.g., state or PUMA region) is selected.
How do I see a profile for just one, or just a few, end uses?

Clicking on the end uses in the legend will highlight the end use in the visualization.
Can I aggregate over multiple locations?

The viewer allows aggregations of up to six locations (states or PUMAs, depending on the dataset). When viewing a single location, choose the “+ More Locations” option, add up to five additional locations, and choose “Update Search”.

Additionally, sums of more than six locations can be created manually by downloading sums of up to six locations and summing further on your local computer.

TMY3 weather is not aligned between locations. This does not affect our recommendations for working with annual data. However, if your application requires timeseries data and therefore would benefit from aligned weather, we recommend either using an AMY dataset, or filtering by weather station and summing only within a single weather station’s PUMAs.
How can I filter the data based on building characteristics?

The "+ Filter" button enables users to filter the data by characteristics, such as vintage, floor area, and building type. This feature also enables aggregations of locations, including by PUMA and county.

See our YouTube training video on the Data Viewer, around 3:50, to learn how to add multiple filters.
How can I see the building characteristics associated with an aggregate load profile from the data viewer?

The building characteristics are available on the Open Energy Data Initiative (OEDI) data lake. Visit the Data page for links to the OEDI pages for each dataset. In the "metadata_and_annual_results_aggregates" directory on OEDI, navigate to the national file: metadata_and_annual_results_aggregates > national > full > csv > baseline_agg.csv.gz. Download the file, unzip it, and open it in Microsoft Excel. Then apply the same filters you used in the Data Viewer to the spreadsheet.

Note that the national file is an “aggregate,” meaning that the data in the file is consolidated by merging duplicate building models within a geography (in this case state), so each building ID appears only once with a combined weight. Columns that cannot be meaningfully aggregated from the tract level—such as Cambium grid region and CEJST designation—are excluded from the resulting low-resolution, “aggregate” files. For more information about the updated OEDI file structure as a result of the new sampling method, please see the "New ComStock Sampling Method" explanation.

Analysis

Can I run ComStock or ResStock myself?

The code required to run ComStock and ResStock is available on the ComStock and ResStock public GitHub repositories. Other related code repositories are provided on the "For Developers" page for ComStock and ResStock.

While these resources are available, ComStock and ResStock are complex modeling tools. There is no documentation for running the models other than what exists in the codebase, and we are not able to support running the models at this time. We generally do not recommend running the models unless you have a deep understanding of the methodology and objectives. Please email us at ComStock@nrel.gov or ResStock@nrel.gov if you have suggestions for improvements or specific needs.
I am interested in an upgrade measure combination that is not currently available as an upgrade package in the public datasets. Can I combine results from the individual measures?

Our general guidance is to NOT combine measure results. There are interactions between most upgrade measures that affect the amount of savings and make results of multiple measures together misleading.

For an explanation and examples on this topic, see the linked ComStock and ResStock resources.

If you have questions about combining specific measures, please email us at ComStock@nrel.gov or ResStock@nrel.gov.
