compass.extraction.apply.extract_ordinance_text_with_llm
- async extract_ordinance_text_with_llm(doc, text_splitter, extractor, original_text_key)
Extract ordinance text from a document using an LLM.
- Parameters:
doc (elm.web.document.BaseDocument) – A document known to contain ordinance information. This means it must contain an "ordinance_text" key in the attrs. You can run check_for_ordinance_info() to have this attribute populated automatically for documents that are found to contain ordinance data. Note that if the document’s attrs does not contain the "ordinance_text" key, you will get an error.
text_splitter (obj) – Instance of an object that implements a split_text method. The method should take text as input (str) and return a list of text chunks. Langchain’s text splitters should work for this input.
extractor (compass.extraction.ordinance.WindOrdinanceTextExtractor) – Instance of WindOrdinanceTextExtractor used for ordinance text extraction.
original_text_key (str) – String corresponding to the doc.attrs key containing the original text (before extraction).
- Returns:
elm.web.document.BaseDocument – Document that has been parsed for ordinance text. The results of the extraction are stored in the document’s attrs.
str – Key corresponding to the cleaned ordinance text stored in the doc.attrs dictionary.
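The text_splitter contract described above (a split_text method that takes a str and returns a list of text chunks) can be met by a small hand-written class. The sketch below is illustrative only: ParagraphSplitter, its max_chars parameter, and the clean_ordinance_text helper are hypothetical names that are not part of compass. The helper also assumes that the two documented return values arrive as a (document, key) tuple.

```python
class ParagraphSplitter:
    """Hypothetical splitter that satisfies the documented split_text contract."""

    def __init__(self, max_chars=1000):
        self.max_chars = max_chars

    def split_text(self, text):
        """Split text (str) into a list of chunks of at most max_chars characters,
        breaking on blank lines where possible."""
        chunks, current = [], ""
        for paragraph in text.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            candidate = f"{current}\n\n{paragraph}" if current else paragraph
            if len(candidate) <= self.max_chars:
                current = candidate
                continue
            if current:
                chunks.append(current)
            # Hard-wrap any single paragraph that is longer than max_chars.
            while len(paragraph) > self.max_chars:
                chunks.append(paragraph[: self.max_chars])
                paragraph = paragraph[self.max_chars:]
            current = paragraph
        if current:
            chunks.append(current)
        return chunks


async def clean_ordinance_text(doc, extractor, original_text_key):
    """Hypothetical helper: run the extraction and return the cleaned text.

    Assumes doc.attrs already contains "ordinance_text" and that the function
    returns a (document, key) tuple, as the Returns section suggests.
    """
    # Imported lazily so the splitter above can be used without compass installed.
    from compass.extraction.apply import extract_ordinance_text_with_llm

    doc, cleaned_key = await extract_ordinance_text_with_llm(
        doc, ParagraphSplitter(), extractor, original_text_key
    )
    return doc.attrs[cleaned_key]
```

Because the function is a coroutine, it has to be awaited from async code or driven with asyncio.run(). Any Langchain text splitter should be usable in place of ParagraphSplitter, as noted for the text_splitter parameter.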