File Types API Reference
This document describes the file type system in r2x-core, which is used to validate and handle different data file formats.
Overview
The FileType class hierarchy provides a type-safe way to work with different file formats. Each file type declares whether it can hold time series data through its supports_timeseries attribute.
Base Class
FileType
@dataclass(slots=True)
class FileType:
    """Base class for file data types."""
    supports_timeseries: bool = False
    model_config = ConfigDict(arbitrary_types_allowed=True)
Attributes:
supports_timeseries (bool): Whether this file type can store time series data. Default is False.
File Type Classes
TableFile
Represents tabular data files (CSV, TSV).
class TableFile(FileType):
    """Data model for tabular data (CSV, TSV, etc.)."""
    supports_timeseries: bool = True
Supported Extensions:
.csv - Comma-separated values
.tsv - Tab-separated values
Time Series Support: ✅ Yes
Use Cases:
Component definitions (generators, buses, lines)
Time series profiles (hourly generation, load)
Human-readable data interchange
Example:
from r2x_core import DataFile
from r2x_core.file_types import TableFile
# Component data
components = DataFile(
    name="generators",
    file_path="data/generators.csv",
)
assert isinstance(components.file_type, TableFile)
# Time series data
profiles = DataFile(
    name="profiles",
    file_path="data/profiles.csv",
    is_timeseries=True,
)
assert isinstance(profiles.file_type, TableFile)
H5File
Represents HDF5 (Hierarchical Data Format) files.
class H5File(FileType):
    """Data model for HDF5 data."""
    supports_timeseries: bool = True
Supported Extensions:
.h5 - HDF5 format
.hdf5 - HDF5 format (alternate extension)
Time Series Support: ✅ Yes
Use Cases:
Large time series datasets
Multi-year profiles in hierarchical structure
High-performance data storage
Complex nested data structures
Example:
from r2x_core import DataFile
from r2x_core.file_types import H5File
# Multi-year time series in HDF5
profiles = DataFile(
    name="generation_profiles",
    file_path="data/profiles_2020_2050.h5",
    is_timeseries=True,
)
assert isinstance(profiles.file_type, H5File)
assert profiles.file_type.supports_timeseries
ParquetFile
Represents Apache Parquet columnar storage files.
class ParquetFile(FileType):
    """Data model for Parquet data."""
    supports_timeseries: bool = True
Supported Extensions:
.parquet - Apache Parquet format
Time Series Support: ✅ Yes
Use Cases:
Large time series with excellent compression
Wide tables with many columns
Data interchange with analytics tools
Efficient columnar queries
Example:
from r2x_core import DataFile
from r2x_core.file_types import ParquetFile
# Load profiles in Parquet format
load_data = DataFile(
    name="load_profiles",
    file_path="data/load.parquet",
    is_timeseries=True,
)
assert isinstance(load_data.file_type, ParquetFile)
JSONFile
Represents JSON (JavaScript Object Notation) files.
class JSONFile(FileType):
    """Data model for JSON data."""
    supports_timeseries: bool = False
Supported Extensions:
.json - JSON format
Time Series Support: ❌ No
Use Cases:
Component definitions
Configuration files
Metadata
Hierarchical component relationships
Example:
from r2x_core import DataFile
from r2x_core.file_types import JSONFile
# Component metadata in JSON
metadata = DataFile(
    name="model_metadata",
    file_path="data/metadata.json",
    is_timeseries=False,  # Default
)
assert isinstance(metadata.file_type, JSONFile)
assert not metadata.file_type.supports_timeseries
# Accessing file_type on this DataFile would raise ValueError
# bad = DataFile(
#     name="profiles",
#     file_path="data/profiles.json",
#     is_timeseries=True,  # ERROR! JSON doesn't support time series
# )
# bad.file_type
XMLFile
Represents XML (eXtensible Markup Language) files.
class XMLFile(FileType):
    """Data model for XML data."""
    supports_timeseries: bool = False
Supported Extensions:
.xml - XML format
Time Series Support: ❌ No
Use Cases:
Legacy model formats
Hierarchical component definitions
Configuration with complex nesting
Example:
from r2x_core import DataFile
from r2x_core.file_types import XMLFile
# Component definitions in XML
components = DataFile(
    name="network",
    file_path="data/network.xml",
)
assert isinstance(components.file_type, XMLFile)
assert not components.file_type.supports_timeseries
Extension Mapping
The EXTENSION_MAPPING dictionary maps file extensions to their corresponding FileType classes:
EXTENSION_MAPPING: dict[str, type[FileType]] = {
    ".csv": TableFile,
    ".tsv": TableFile,
    ".h5": H5File,
    ".hdf5": H5File,
    ".parquet": ParquetFile,
    ".json": JSONFile,
    ".xml": XMLFile,
}
This mapping is used internally by DataFile.file_type to determine the file type from the file extension.
Type Alias
TableDataFileType
A type alias for file types that represent tabular data:
TableDataFileType: TypeAlias = TableFile | H5File
Usage:
from r2x_core.file_types import H5File, TableDataFileType, TableFile
def process_table_data(file_type: TableDataFileType) -> None:
    """Process tabular data files."""
    match file_type:
        case TableFile():
            # Handle CSV/TSV
            ...
        case H5File():
            # Handle HDF5
            ...
Validation
File types are validated automatically when accessing DataFile.file_type:
from r2x_core import DataFile
# Valid: CSV supports time series
valid = DataFile(
    name="profiles",
    file_path="data/profiles.csv",
    is_timeseries=True,
)
print(valid.file_type)  # TableFile()
# Invalid: Unknown extension
try:
    invalid_ext = DataFile(
        name="data",
        file_path="data/file.xyz",
    )
    _ = invalid_ext.file_type  # Raises ValueError
except ValueError as e:
    print(e)  # "Unsupported file extension: .xyz"
# Invalid: JSON doesn't support time series
try:
    invalid_ts = DataFile(
        name="profiles",
        file_path="data/profiles.json",
        is_timeseries=True,
    )
    _ = invalid_ts.file_type  # Raises ValueError
except ValueError as e:
    print(e)  # "File type JSONFile does not support time series data..."
Adding New File Types
To add support for a new file format:
Create a new FileType subclass:
@dataclass(slots=True)
class NetCDFFile(FileType):
    """Data model for NetCDF data."""
    supports_timeseries: bool = True  # If it supports time series
Add to EXTENSION_MAPPING:
EXTENSION_MAPPING: dict[str, type[FileType]] = {
    # ... existing mappings ...
    ".nc": NetCDFFile,
    ".netcdf": NetCDFFile,
}
Update TableDataFileType if needed:
# If the new type represents tabular data
TableDataFileType: TypeAlias = TableFile | H5File | NetCDFFile
That’s it! The validation and type checking will work automatically.
Best Practices
Set supports_timeseries correctly: Validation relies on this flag to decide whether a file of this format may be declared with is_timeseries=True, so set it to True only for formats that can actually hold time series data.
Use type hints: When writing functions that work with specific file types, use type hints for better IDE support:
def process_csv(file_type: TableFile) -> None: ...
Pattern matching: Use structural pattern matching to handle different file types:
match datafile.file_type:
    case TableFile():
        ...
    case H5File():
        ...
    case ParquetFile():
        ...
Check supports_timeseries: Before processing time series, verify the file type supports it:
if datafile.is_timeseries:
    assert datafile.file_type.supports_timeseries
    # Safe to process as time series
See Also
DataFile Reference - Complete DataFile API
Working with Time Series Files - Time series guide
Parser Basics - Using file types in parsers