elm.ords.extraction.apply.extract_ordinance_text_with_llm
- async extract_ordinance_text_with_llm(doc, text_splitter, extractor)
Extract ordinance text from document using LLM.
- Parameters:
doc (elm.web.document.BaseDocument) – A document known to contain ordinance information. This means it must contain an "ordinance_text" key in the metadata. You can run check_for_ordinance_info() to have this attribute populated automatically for documents that are found to contain ordinance data. Note that if the document’s metadata does not contain the "ordinance_text" key, you will get an error.
text_splitter (obj) – Instance of an object that implements a split_text method. The method should take text as input (str) and return a list of text chunks. Langchain’s text splitters should work for this input.
extractor (elm.ords.extraction.ordinance.OrdinanceExtractor) – Instance of OrdinanceExtractor used for ordinance text extraction.
- Returns:
elm.web.document.BaseDocument – Document that has been parsed for ordinance text. The results of the extraction are stored in the document’s metadata. In particular, the metadata will contain a "cleaned_ordinance_text" key that will contain the cleaned ordinance text.