compass.extraction.solar.ordinance.SolarOrdinanceTextExtractor

class SolarOrdinanceTextExtractor(llm_caller)

Bases: BaseTextExtractor

Extract succinct ordinance text from input

Purpose:

Extract relevant ordinance text from document.

Responsibilities:
  1. Extract the portions of chunked document text that are relevant to a particular ordinance type (e.g., solar zoning for utility-scale systems).

Key Relationships:

Uses a StructuredLLMCaller for LLM queries.

Parameters:

llm_caller (LLMCaller) – LLM caller instance used to extract ordinance info.

Methods

extract_solar_energy_system_section(text_chunks)

Extract ordinance text from input text chunks for SEF

Attributes

SOLAR_ENERGY_SYSTEM_FILTER_PROMPT

Prompt to extract ordinance text for SEF

SYSTEM_MESSAGE

System message for text extraction LLM calls

parsers

Iterable of parsers provided by this extractor

SOLAR_ENERGY_SYSTEM_FILTER_PROMPT = "# CONTEXT #\nWe want to reduce the provided excerpt to only contain information about **solar energy systems**. The extracted text will be used for structured data extraction, so it must be both **comprehensive** (retaining all relevant details) and **focused** (excluding unrelated content), with **zero rewriting or paraphrasing**. Ensure that all retained information is **directly applicable to solar energy systems** while preserving full context and accuracy.\n\n# OBJECTIVE #\nExtract all text **pertaining to solar energy systems** from the provided excerpt.\n\n# RESPONSE #\nFollow these guidelines carefully:\n\n1. ## Scope of Extraction ##:\n- Include **all** text that pertains to** solar energy systems**, even if they are referred to by different names such as:\n\tSolar panels, solar energy conversion systems (secs), solar energy facilities (sef), solar energy farms (sef), solar farms (sf), utility-scale solar energy systems (uses), commercial solar energy systems (cses), ground-mounted solar energy systems (gses), alternate energy systems (aes), commercial energy production systems (cepcs), or similar.\n- Explicitly include any text related to **bans or prohibitions** on solar energy systems.\n- Explicitly include any text related to the adoption or enactment date of the ordinance (if any).\n\n2. ## Exclusions ##:\n- Do **not** include text that does not pertain to solar energy systems.\n\n3. ## Formatting & Structure ##:\n- **Preserve _all_ section titles, headers, and numberings** for reference.\n- **Maintain the original wording, formatting, and structure** to ensure accuracy.\n\n4. ## Output Handling ##:\n- This is a strict extraction task act like a text filter, **not** a summarizer or writer.\n- Do not add, explain, reword, or summarize anything.\n- The output must be a **copy-paste** of the original excerpt.\n**Absolutely no paraphrasing or rewriting.**\n- The output must consist **only** of contiguous or discontiguous verbatim blocks copied from the input.\n- If **no relevant text** is found, return the response: 'No relevant text.'"

Prompt to extract ordinance text for SEF
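
The prompt asks the LLM to return either verbatim blocks from the input or the literal response 'No relevant text.'. A caller that wants to sanity-check a response could do something like the following. This is a minimal sketch, not part of compass: the names NO_RELEVANT_TEXT and is_verbatim_extraction are hypothetical, and the line-by-line comparison is an assumed heuristic that will reject responses whose whitespace or line wrapping differs from the source.

```python
NO_RELEVANT_TEXT = "No relevant text."  # sentinel string quoted in the prompt


def is_verbatim_extraction(source, response):
    """Return True if every non-empty line of `response` occurs in `source`.

    Hypothetical helper (not part of compass): a rough check that the
    response was copied from the input rather than paraphrased.
    """
    if response.strip() == NO_RELEVANT_TEXT:
        return True  # the sentinel is an allowed, non-verbatim response
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    # An empty response is treated as invalid; otherwise every line must match.
    return bool(lines) and all(line in source for line in lines)


source = (
    "Section 4.2 Solar Energy Systems\n"
    "Setbacks shall be 50 feet from property lines.\n"
    "Section 4.3 Signage\n"
    "Signs shall not exceed 10 square feet.\n"
)
copied = (
    "Section 4.2 Solar Energy Systems\n"
    "Setbacks shall be 50 feet from property lines."
)
reworded = "Solar setbacks are 50 feet."

print(is_verbatim_extraction(source, copied))
print(is_verbatim_extraction(source, reworded))
print(is_verbatim_extraction(source, NO_RELEVANT_TEXT))
```

A stricter check could compare whole blocks instead of single lines; the per-line version is used here only because it tolerates discontiguous blocks without extra bookkeeping.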

async extract_solar_energy_system_section(text_chunks)

Extract ordinance text from input text chunks for SEF

Parameters:

text_chunks (list of str) – List of strings, each of which represents a chunk of text. The order of the strings should match the order of the text chunks in the document.

Returns:

str – Ordinance text extracted from text chunks.
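
Because the method is a coroutine, it has to be awaited from an event loop. The sketch below shows the calling pattern only. StubSolarExtractor is a hypothetical stand-in that mirrors the documented signature so the example runs offline; its keyword filter is a placeholder and says nothing about how the real class behaves. With compass installed, one would presumably construct SolarOrdinanceTextExtractor(llm_caller) instead.

```python
import asyncio


class StubSolarExtractor:
    """Stand-in with the documented method signature (NOT the compass class)."""

    async def extract_solar_energy_system_section(self, text_chunks):
        # Placeholder logic: the real method queries an LLM. Here we keep
        # chunks that mention "solar" so the example runs without one.
        kept = [chunk for chunk in text_chunks if "solar" in chunk.lower()]
        return "\n".join(kept)


async def main():
    # Chunks are passed in document order, as the parameter docs require.
    text_chunks = [
        "Section 1. Solar energy systems require a conditional use permit.",
        "Section 2. Fences shall not exceed six feet in height.",
        "Section 3. Ground-mounted solar arrays shall be set back 100 feet.",
    ]
    extractor = StubSolarExtractor()
    # The method is a coroutine, so the call must be awaited.
    return await extractor.extract_solar_energy_system_section(text_chunks)


ordinance_text = asyncio.run(main())
print(ordinance_text)
```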

property parsers

Iterable of parsers provided by this extractor

Yields:
  • name (str) – Name describing the type of text output by the parser.

  • parser (callable) – Async function that takes a text_chunks input and outputs parsed text.
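
One way to consume this property is to loop over it and await each parser in turn. The sketch below assumes each yielded item is a (name, parser) pair, which is an interpretation of the Yields description above. StubExtractor, the parser name "solar_energy_systems_text", and run_parsers are all hypothetical and exist only to make the example runnable without compass.

```python
import asyncio


class StubExtractor:
    """Stand-in mimicking the documented `parsers` property (NOT compass)."""

    async def _keep_solar(self, text_chunks):
        # Placeholder for an LLM-backed parser.
        return "\n".join(c for c in text_chunks if "solar" in c.lower())

    @property
    def parsers(self):
        # Assumed shape: each item is a (name, async callable) pair.
        yield "solar_energy_systems_text", self._keep_solar


async def run_parsers(extractor, text_chunks):
    """Run every parser on the same chunks and collect output by name."""
    results = {}
    for name, parser in extractor.parsers:
        results[name] = await parser(text_chunks)
    return results


chunks = ["Solar farms are permitted in A-1 districts.", "Parking requirements."]
results = asyncio.run(run_parsers(StubExtractor(), chunks))
print(results)
```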

SYSTEM_MESSAGE = 'You are a text extraction assistant. Your job is to extract only verbatim, **unmodified** excerpts from provided legal or policy documents. Do not interpret or paraphrase. Do not summarize. Only return exactly copied segments that match the specified scope. If the relevant content appears within a table, return the entire table, including headers and footers, exactly as formatted.'

System message for text extraction LLM calls