compass.extraction.solar.ordinance.SolarPermittedUseDistrictsTextExtractor

class SolarPermittedUseDistrictsTextExtractor(llm_caller)

Bases: BaseTextExtractor

Extract succinct ordinance text from input

Purpose:

Extract relevant ordinance text from document.

Responsibilities:
  1. Extract portions of chunked document text that are relevant to a particular ordinance type (e.g. solar zoning for utility-scale systems).

Key Relationships:

Uses a StructuredLLMCaller for LLM queries.

Parameters:

llm_caller (LLMCaller) – LLM caller instance used to extract ordinance info.

Methods

extract_permitted_uses(text_chunks)

Extract permitted uses text from input text chunks

extract_sef_permitted_uses(text_chunks)

Extract permitted uses text for large SEF from input text

Attributes

PERMITTED_USES_FILTER_PROMPT

Prompt to extract ordinance text for permitted uses

SEF_PERMITTED_USES_FILTER_PROMPT

Prompt to extract ordinance text for permitted uses for SEF

SYSTEM_MESSAGE

System message for text extraction LLM calls

parsers

Iterable of parsers provided by this extractor

SYSTEM_MESSAGE = 'You are a text extraction assistant. Your job is to extract only verbatim, **unmodified** excerpts from provided legal or policy documents. Do not interpret or paraphrase. Do not summarize. Only return exactly copied segments that match the specified scope. If the relevant content appears within a table, return the entire table, including headers and footers, exactly as formatted.'

System message for text extraction LLM calls

PERMITTED_USES_FILTER_PROMPT = "# CONTEXT #\nWe want to reduce the provided excerpt to only contain information detailing permitted use(s) for a district. The extracted text will be used for structured data extraction, so it must be both **comprehensive** (retaining all relevant details) and **focused** (excluding unrelated content), with **zero rewriting or paraphrasing**. Ensure that all retained information is **directly applicable** to permitted use(s) for one or more districts while preserving full context and accuracy.\n\n# OBJECTIVE #\nRemove all text **not directly pertinent** to permitted use(s) for a district.\n\n# RESPONSE #\nFollow these guidelines carefully:\n\n1. ## Scope of Extraction ##:\n- Retain all text defining permitted use(s) for a district, including:\n\t- **Primary, Special, Conditional, Accessory, Prohibited, and any other use types.**\n\t- **District names and zoning classifications.**\n- Pay extra attention to any references to **solar energy facilities** or related terms.\n- Ensure that **tables, lists, and structured elements** are preserved as they may contain relevant details.\n\n2. ## Exclusions ##:\n- Do **not** include unrelated regulations, procedural details, or non-use-based restrictions.\n\n3. ## Formatting & Structure ##:\n- **Preserve _all_ section titles, headers, and numberings** for reference, **especially if they contain the district name**.\n- **Maintain the original wording, formatting, and structure** to ensure accuracy.\n\n4. ## Output Handling ##:\n- This is a strict extraction task act like a text filter, **not** a summarizer or writer.\n- Do not add, explain, reword, or summarize anything.\n- The output must be a **copy-paste** of the original excerpt.\n**Absolutely no paraphrasing or rewriting.**\n- The output must consist **only** of contiguous or discontiguous verbatim blocks copied from the input.\n- If **no relevant text** is found, return the response: 'No relevant text.'"#

Prompt to extract ordinance text for permitted uses

SEF_PERMITTED_USES_FILTER_PROMPT = "# CONTEXT #\nWe want to reduce the provided excerpt to only contain information detailing **solar energy system** permitted use(s) for a district. The extracted text will be used for structured data extraction, so it must be both **comprehensive** (retaining all relevant details) and **focused** (excluding unrelated content), with **zero rewriting or paraphrasing**. Ensure that all retained information is **directly applicable** to permitted use(s) for solar energy systems in one or more districts while preserving full context and accuracy.\n\n# OBJECTIVE #\nRemove all text **not directly pertinent** to solar energy conversion system permitted use(s) for a district.\n\n# RESPONSE #\nFollow these guidelines carefully:\n\n1. ## Scope of Extraction ##:\n- Retain all text defining permitted use(s) for a district, including:\n\t- **Primary, Special, Conditional, Accessory, Prohibited, and any other use types.**\n\t- **District names and zoning classifications.**\n- Ensure that **tables, lists, and structured elements** are preserved as they may contain relevant details.\n\n2. ## Exclusions ##:\n- Do not include text that does not pertain at all to solar energy systems.\n\n3. ## Formatting & Structure ##:\n- **Preserve _all_ section titles, headers, and numberings** for reference, **especially if they contain the district name**.\n- **Maintain the original wording, formatting, and structure** to ensure accuracy.\n\n4. ## Output Handling ##:\n- This is a strict extraction task act like a text filter, **not** a summarizer or writer.\n- Do not add, explain, reword, or summarize anything.\n- The output must be a **copy-paste** of the original excerpt.\n**Absolutely no paraphrasing or rewriting.**\n- The output must consist **only** of contiguous or discontiguous verbatim blocks copied from the input.\n- If **no relevant text** is found, return the response: 'No relevant text.'"#

Prompt to extract ordinance text for permitted uses for SEF
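Both filter prompts instruct the model to answer 'No relevant text.' when nothing in the excerpt matches. Whether the extractor handles that reply internally is not stated here, so the following is only a sketch of how calling code might detect it. The names `NO_RELEVANT_TEXT` and `has_relevant_text` are hypothetical and not part of the documented API.

```python
# Sentinel string quoted from the two filter prompts above.
NO_RELEVANT_TEXT = "No relevant text."


def has_relevant_text(response):
    """Return False when a response is empty or is only the sentinel.

    Hypothetical helper: tolerates surrounding whitespace, quotes,
    letter case, and a missing trailing period.
    """
    cleaned = response.strip().strip("'\"").strip()
    if not cleaned:
        return False
    sentinel = NO_RELEVANT_TEXT.rstrip(".").lower()
    return cleaned.rstrip(".").lower() != sentinel
```

The comparison is deliberately lenient because a model may echo the sentinel with or without the quotes shown in the prompt.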

async extract_permitted_uses(text_chunks)

Extract permitted uses text from input text chunks

Parameters:

text_chunks (list of str) – List of strings, each of which represents a chunk of text. The strings must appear in the same order as the chunks appear in the source document.

Returns:

str – Ordinance text extracted from text chunks.
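A minimal sketch of how this coroutine might be awaited, assuming only the documented signature (an async method that takes an ordered list of strings and returns a string). Constructing a real LLM caller is outside the scope of this page, so the example uses a hypothetical stand-in, `StubExtractor`, that filters by keyword instead of querying a model. The blank-line chunking in `split_into_chunks` is likewise an assumption for illustration, not the library's chunking strategy.

```python
import asyncio


class StubExtractor:
    """Hypothetical stand-in mirroring only the documented method signature."""

    async def extract_permitted_uses(self, text_chunks):
        # A real extractor would query an LLM; this stub keeps chunks that
        # mention "permitted" so the example runs offline.
        kept = [chunk for chunk in text_chunks if "permitted" in chunk.lower()]
        return "\n".join(kept)


def split_into_chunks(document_text):
    """Split on blank lines, preserving document order (assumed chunking)."""
    parts = document_text.split("\n\n")
    return [part.strip() for part in parts if part.strip()]


async def run_extraction(extractor, document_text):
    """Chunk the document in order and await the extraction coroutine."""
    text_chunks = split_into_chunks(document_text)
    return await extractor.extract_permitted_uses(text_chunks)


document = (
    "Section 4.1 A-1 Agricultural District\n"
    "Permitted uses: farming, solar energy facilities.\n\n"
    "Section 9.3 Fees\n"
    "Application fees are set by the board."
)
extracted = asyncio.run(run_extraction(StubExtractor(), document))
print(extracted)
```

With a real `SolarPermittedUseDistrictsTextExtractor` instance, the same `run_extraction` helper should work unchanged as long as the instance is passed in place of the stub.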

async extract_sef_permitted_uses(text_chunks)

Extract permitted uses text for large SEF from input text

Parameters:

text_chunks (list of str) – List of strings, each of which represents a chunk of text. The strings must appear in the same order as the chunks appear in the source document.

Returns:

str – Ordinance text extracted from text chunks.

property parsers

Iterable of parsers provided by this extractor

Yields:
  • name (str) – Name describing the type of text output by the parser.

  • parser (callable()) – Async function that takes a text_chunks input and outputs parsed text.
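The following sketch shows one way the yielded items could be consumed, assuming each item unpacks as a `(name, parser)` pair and that every parser is an async callable accepting `text_chunks`. The class `FakeExtractor`, its parser names, and the helper `run_all_parsers` are hypothetical. A real pipeline may chain parsers rather than run each one on the same chunks.

```python
import asyncio


class FakeExtractor:
    """Hypothetical stand-in exposing a `parsers` property like the one above."""

    async def _keep_permitted(self, text_chunks):
        return "\n".join(c for c in text_chunks if "permitted" in c.lower())

    async def _keep_solar(self, text_chunks):
        return "\n".join(c for c in text_chunks if "solar" in c.lower())

    @property
    def parsers(self):
        # Parser names are invented for illustration only.
        yield "permitted_uses_text", self._keep_permitted
        yield "sef_permitted_uses_text", self._keep_solar


async def run_all_parsers(extractor, text_chunks):
    """Await every parser on the same chunks and collect results by name."""
    results = {}
    for name, parser in extractor.parsers:
        results[name] = await parser(text_chunks)
    return results


chunks = [
    "A-1 District: solar energy facilities are a permitted use.",
    "R-1 District: single-family dwellings are a permitted use.",
    "Application fees are set by the board.",
]
results = asyncio.run(run_all_parsers(FakeExtractor(), chunks))
for name, text in results.items():
    print(name, "->", repr(text))
```

Collecting results in a dictionary keyed by parser name keeps each output traceable to the parser that produced it.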