compass.extraction.apply.check_for_ordinance_info
- async check_for_ordinance_info(doc, model_config, heuristic, tech, ordinance_text_collector_class, permitted_use_text_collector_class=None, usage_tracker=None)
Parse a single document for ordinance information
- Parameters:
doc (elm.web.document.BaseDocument) – A document instance (PDF, HTML, etc.) potentially containing ordinance information. Note that if the document’s attrs has the "contains_ord_info" key, it will not be processed. To force a document to be processed by this function, remove that key from the document’s attrs.
tech (str) – Technology of interest (e.g. “solar”, “wind”, etc.). This is used to set up some document validation decision trees.
text_splitter (TextSplitter, optional) – Optional Langchain text splitter (or subclass instance), or any object that implements a split_text method. The method should take text as input (str) and return a list of text chunks.
usage_tracker (UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.
- Returns:
elm.web.document.BaseDocument – Document that has been parsed for ordinance text. The results of the parsing are stored in the document’s attrs. In particular, the attrs will contain a "contains_ord_info" key that is set to True if ordinance info was found in the text, and False otherwise. If True, the attrs will also contain a "date" key containing the most recent date that the ordinance was enacted (or a tuple of None if not found), and an "ordinance_text" key containing the ordinance text snippet. Note that the snippet may contain other info as well, but should encapsulate all of the ordinance text.
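The attrs handling described above can be sketched as follows. This is a minimal illustration only: it does not call check_for_ordinance_info itself (that requires a model configuration and collector classes not described here), and it uses a SimpleNamespace as a stand-in for a real BaseDocument. The helper names force_reprocessing and summarize_result are hypothetical, and the date value is a made-up placeholder, since the exact date format is not specified on this page.

```python
from types import SimpleNamespace


def force_reprocessing(doc):
    """Remove the cached flag so the document is no longer skipped."""
    # Hypothetical helper: the docs only say to remove this key from attrs.
    doc.attrs.pop("contains_ord_info", None)
    return doc


def summarize_result(doc):
    """Collect the documented result keys from a parsed document's attrs."""
    if not doc.attrs.get("contains_ord_info", False):
        return {"contains_ord_info": False}
    # "date" and "ordinance_text" are only documented when the flag is True.
    return {
        "contains_ord_info": True,
        "date": doc.attrs.get("date"),
        "ordinance_text": doc.attrs.get("ordinance_text"),
    }


# Stand-in for a document that has already been parsed (assumed values).
parsed = SimpleNamespace(
    attrs={
        "contains_ord_info": True,
        "date": (2021, 6, 15),  # placeholder; real format not specified here
        "ordinance_text": "Setback: ...",
    }
)
summary = summarize_result(parsed)

# Stand-in for a previously checked document that should be parsed again.
fresh = force_reprocessing(SimpleNamespace(attrs={"contains_ord_info": False}))
```

After force_reprocessing, the stand-in document no longer carries the "contains_ord_info" key, which is the condition the parameter description gives for the function to process it again.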