elm.ords.extraction.apply.extract_ordinance_text_with_llm

async extract_ordinance_text_with_llm(doc, text_splitter, extractor)

Extract ordinance text from document using LLM.

Parameters:

doc (elm.web.document.BaseDocument) – A document known to contain ordinance information, meaning its metadata must contain an "ordinance_text" key. Running check_for_ordinance_info() populates this key automatically for documents found to contain ordinance data. If the key is missing from the document’s metadata, this function raises an error.
text_splitter (obj) – Instance of an object that implements a split_text method. The method should take the text (str) as input and return a list of text chunks. LangChain’s text splitters should work for this input.
extractor (elm.ords.extraction.ordinance.OrdinanceExtractor) – Instance of OrdinanceExtractor used for ordinance text extraction.
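
The text_splitter requirement above is purely structural: any object with a split_text(str) method that returns a list of strings should do. The following is a minimal sketch of such an object, assuming paragraph-based chunking is acceptable. ParagraphSplitter and its max_chars parameter are hypothetical names used only for illustration and are not part of elm or LangChain.

```python
class ParagraphSplitter:
    """Hypothetical splitter: groups paragraphs into chunks of limited size."""

    def __init__(self, max_chars=1000):
        self.max_chars = max_chars

    def split_text(self, text):
        """Take text (str) and return a list of text chunks."""
        chunks, current = [], ""
        for para in text.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            # An oversized single paragraph is kept whole rather than cut.
            if len(candidate) <= self.max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = para
        if current:
            chunks.append(current)
        return chunks


chunks = ParagraphSplitter(max_chars=20).split_text(
    "Section 1.\n\nSetbacks apply.\n\nNoise limits apply."
)
print(chunks)
```

An instance of a class like this could be passed as text_splitter wherever a LangChain splitter would otherwise be used, since only the split_text method is called.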

Returns:

elm.web.document.BaseDocument – Document that has been parsed for ordinance text. The extraction results are stored in the document’s metadata (attrs); in particular, the "cleaned_ordinance_text" key holds the cleaned ordinance text.
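
Because the function errors out when "ordinance_text" is missing, a caller may want to guard the call and then read "cleaned_ordinance_text" back from the returned document. The sketch below illustrates that pattern under stated assumptions: extract_if_ready is a hypothetical helper, and _fake_extract is a stand-in coroutine with the same (doc, text_splitter, extractor) signature so the example runs without elm installed or an LLM available. In real use, extract_ordinance_text_with_llm would be passed as extract_fn instead.

```python
import asyncio
from types import SimpleNamespace


async def extract_if_ready(doc, text_splitter, extractor, extract_fn):
    """Hypothetical guard: run extraction only if the doc has ordinance text."""
    if "ordinance_text" not in doc.attrs:
        return None  # avoid the error raised for unprepared documents
    out = await extract_fn(doc, text_splitter, extractor)
    return out.attrs.get("cleaned_ordinance_text")


async def _fake_extract(doc, text_splitter, extractor):
    """Stand-in for extract_ordinance_text_with_llm (assumption, no LLM call)."""
    doc.attrs["cleaned_ordinance_text"] = doc.attrs["ordinance_text"].strip()
    return doc


# Minimal stand-ins for documents: only the attrs mapping matters here.
ready = SimpleNamespace(attrs={"ordinance_text": "  Setback: 500 ft.  "})
not_ready = SimpleNamespace(attrs={})

cleaned = asyncio.run(extract_if_ready(ready, None, None, _fake_extract))
skipped = asyncio.run(extract_if_ready(not_ready, None, None, _fake_extract))
print(cleaned, skipped)
```

Returning None for unprepared documents is one design choice among several; a caller could equally run check_for_ordinance_info() first or let the error propagate.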