compass.extraction.apply.extract_ordinance_text_with_llm
- async extract_ordinance_text_with_llm(doc, text_splitter, extractor, original_text_key)
Extract ordinance text from a document using an LLM.
- Parameters:
doc (elm.web.document.BaseDocument) – A document known to contain ordinance information. This means it must contain an "ordinance_text" key in the attrs. You can run check_for_ordinance_info() to have this attribute populated automatically for documents that are found to contain ordinance data. Note that if the document’s attrs does not contain the "ordinance_text" key, you will get an error.
text_splitter (TextSplitter, optional) – Optional Langchain text splitter (or subclass instance), or any object that implements a split_text method. The method should take text as input (str) and return a list of text chunks.
extractor (WindOrdinanceTextExtractor) – Object used for ordinance text extraction.
original_text_key (str) – String corresponding to the doc.attrs key containing the original text (before extraction).
- Returns:
elm.web.document.BaseDocument – Document that has been parsed for ordinance text. The results of the extraction are stored in the document’s attrs.
str – Key corresponding to the cleaned ordinance text stored in the doc.attrs dictionary.
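The text_splitter parameter accepts any object with a split_text method that takes a str and returns a list of text chunks. As a minimal sketch of that contract, the class below packs blank-line-separated paragraphs into chunks. The name ParagraphSplitter, its max_chars argument, and the packing strategy are hypothetical and are not part of compass or Langchain. The commented call at the end assumes that doc and extractor were built elsewhere and that the function returns the document and key as a pair, as the Returns entry suggests.

```python
class ParagraphSplitter:
    """Hypothetical duck-typed splitter: only ``split_text`` is required."""

    def __init__(self, max_chars=1000):
        # Soft limit on chunk size; a single paragraph longer than
        # this is kept whole rather than cut mid-sentence.
        self.max_chars = max_chars

    def split_text(self, text):
        """Return a list of chunks built from blank-line-separated paragraphs."""
        chunks, current = [], ""
        for paragraph in text.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue  # skip empty pieces left by runs of blank lines
            candidate = f"{current}\n\n{paragraph}" if current else paragraph
            if current and len(candidate) > self.max_chars:
                # Adding this paragraph would exceed the limit, so
                # close the current chunk and start a new one.
                chunks.append(current)
                current = paragraph
            else:
                current = candidate
        if current:
            chunks.append(current)
        return chunks


splitter = ParagraphSplitter(max_chars=20)
chunks = splitter.split_text("Section 1 setbacks.\n\nSection 2 noise.")

# Assumed call pattern (inside a coroutine; ``doc`` and ``extractor``
# are placeholders created elsewhere, and the key name is illustrative):
# doc, key = await extract_ordinance_text_with_llm(
#     doc, splitter, extractor, "ordinance_text"
# )
```

A Langchain splitter instance can be passed in the same position; the sketch only shows that a custom object is enough when the split_text contract is met.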