compass.extraction.apply.extract_ordinance_text_with_llm

async extract_ordinance_text_with_llm(doc, text_splitter, extractor, original_text_key)

Extract ordinance text from a document using an LLM.

Parameters:
  • doc (elm.web.document.BaseDocument) – A document known to contain ordinance information. Its attrs must include an "ordinance_text" key; an error is raised if the key is missing. Run check_for_ordinance_info() to populate this attribute automatically for documents that are found to contain ordinance data.

  • text_splitter (obj) – Instance of an object that implements a split_text method. The method should take text (str) as input and return a list of text chunks (list of str). LangChain’s text splitters should work for this input.

  • extractor (compass.extraction.ordinance.WindOrdinanceTextExtractor) – Instance of WindOrdinanceTextExtractor used for ordinance text extraction.

  • original_text_key (str) – String corresponding to the doc.attrs key containing the original text (before extraction).
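
The text_splitter contract described above only requires a split_text method that maps a string to a list of strings. As a minimal sketch, assuming you do not want to depend on LangChain, a splitter could look like the following. The class name ParagraphSplitter and its max_chars parameter are hypothetical and are not part of compass or elm.

```python
class ParagraphSplitter:
    """Hypothetical splitter (not part of compass): packs paragraphs into chunks."""

    def __init__(self, max_chars=1000):
        # Soft upper bound on chunk length, in characters.
        self.max_chars = max_chars

    def split_text(self, text):
        """Split text on blank lines and return a list of str chunks."""
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        chunks, current = [], ""
        for para in paragraphs:
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > self.max_chars:
                # Adding this paragraph would overflow: close the current chunk.
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
        # A single paragraph longer than max_chars is kept whole, not cut.
        return chunks


splitter = ParagraphSplitter(max_chars=20)
chunks = splitter.split_text("aaaa aaaa\n\nbbbb bbbb\n\ncccc cccc cccc cccc cccc")
```

Any object with an equivalent split_text method should be usable in the same position; whether a given chunk size works well with the extractor's LLM is something to check empirically.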

Returns:

  • elm.web.document.BaseDocument – Document that has been parsed for ordinance text. The results of the extraction are stored in the document’s attrs.

  • str – Key corresponding to the cleaned ordinance text stored in the doc.attrs dictionary.
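
The two return values are listed separately above, which suggests the coroutine is awaited and its result unpacked as a (document, key) pair. The sketch below illustrates only that calling pattern. Everything in it is a stand-in: stand_in_extract mirrors the documented signature but is not the real function, the "cleaned_text" key name is invented for illustration, and SimpleNamespace replaces a real BaseDocument.

```python
import asyncio
from types import SimpleNamespace


async def stand_in_extract(doc, text_splitter, extractor, original_text_key):
    """Stand-in with the documented signature. NOT the real implementation."""
    chunks = text_splitter.split_text(doc.attrs[original_text_key])
    cleaned_key = "cleaned_text"  # hypothetical key name, for illustration only
    doc.attrs[cleaned_key] = " ".join(chunk.strip() for chunk in chunks)
    return doc, cleaned_key


class LineSplitter:
    """Hypothetical splitter: one chunk per non-empty line."""

    def split_text(self, text):
        return [line for line in text.splitlines() if line.strip()]


async def main():
    # SimpleNamespace stands in for a document object exposing an attrs dict.
    doc = SimpleNamespace(attrs={"ordinance_text": "Setback: 500 ft\n\nNoise: 45 dBA"})
    # Unpack the pair: the updated document and the key of the cleaned text.
    doc, key = await stand_in_extract(
        doc, LineSplitter(), extractor=None, original_text_key="ordinance_text"
    )
    return doc.attrs[key]


result = asyncio.run(main())
```

With the real function, the same pattern would apply: await the call inside an event loop, then read the cleaned text from doc.attrs using the returned key rather than hard-coding a key name.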