compass.extraction.date.DateExtractor

class DateExtractor(structured_llm_caller, text_splitter=None)

Bases: object

Helper class to extract date information from a document.

Parameters:
  • structured_llm_caller (StructuredLLMCaller) – Instance used for structured validation queries.

  • text_splitter (TextSplitter, optional) – Optional text splitter (or subclass instance, or any object that implements a split_text method) to attach to doc (used for splitting out pages in an HTML document). By default, None.
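Because text_splitter only needs a split_text method, a minimal stand-in can be written without subclassing TextSplitter. The sketch below assumes split_text takes a string and returns a list of strings (the usual text-splitter convention); the class name ParagraphSplitter is hypothetical and simply splits on blank lines.

```python
class ParagraphSplitter:
    """Hypothetical duck-typed splitter that provides only ``split_text``.

    Assumes the caller passes a str and expects a list of str back.
    """

    def split_text(self, text: str) -> list[str]:
        # Split on blank lines and drop empty or whitespace-only chunks
        chunks = text.split("\n\n")
        return [chunk.strip() for chunk in chunks if chunk.strip()]


splitter = ParagraphSplitter()
pages = splitter.split_text("Section 1\n\nAdopted March 3, 2021.\n\n\n")
print(pages)  # ['Section 1', 'Adopted March 3, 2021.']
```

Passing text_splitter=ParagraphSplitter() would satisfy the documented duck-typing requirement; exactly how DateExtractor uses the resulting chunks is not specified on this page.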

Methods

parse(doc)

Extract the date (year, month, day) from doc.

Attributes

SYSTEM_MESSAGE

System message for date extraction LLM calls

SYSTEM_MESSAGE = "You are a legal scholar that reads ordinance text and extracts structured date information. Return your answer as a dictionary in JSON format (not markdown). Your JSON file must include exactly four keys. The first key is 'explanation', which contains a short summary of the most relevant date information you found in the text. The second key is 'year', which should contain an integer value that represents the latest year this ordinance was enacted/updated, or null if that information cannot be found in the text. The third key is 'month', which should contain an integer value that represents the latest month of the year this ordinance was enacted/updated, or null if that information cannot be found in the text. The fourth key is 'day', which should contain an integer value that represents the latest day of the month this ordinance was enacted/updated, or null if that information cannot be found in the text. Only provide values if you are confident that they represent the latest date this ordinance was enacted/updated"

System message for date extraction LLM calls
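The system message asks the model for a JSON object with exactly four keys: explanation, year, month, and day, where the date fields are integers or null. The sketch below shows one way such a reply could be checked. parse_date_reply is a hypothetical helper, not part of compass, and the range checks (month 1 to 12, day 1 to 31) are assumptions added for illustration.

```python
import json
from typing import Optional, Tuple

EXPECTED_KEYS = {"explanation", "year", "month", "day"}


def parse_date_reply(reply: str) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Hypothetical helper: validate a reply shaped as SYSTEM_MESSAGE requests.

    Raises ValueError unless the reply has exactly the four expected keys.
    Returns (year, month, day); any value that is null, not an int, or
    out of range comes back as None.
    """
    data = json.loads(reply)
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError(f"Expected exactly the keys {sorted(EXPECTED_KEYS)}")

    def as_int(key: str, low: int, high: int) -> Optional[int]:
        value = data[key]
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, int) and not isinstance(value, bool) and low <= value <= high:
            return value
        return None

    return as_int("year", 1, 9999), as_int("month", 1, 12), as_int("day", 1, 31)


reply = '{"explanation": "Amended May 2019.", "year": 2019, "month": 5, "day": null}'
result = parse_date_reply(reply)
print(result)  # (2019, 5, None)
```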

async parse(doc)

Extract the date (year, month, day) from doc.

Parameters:

doc (elm.web.document.BaseDocument) – Document with a raw_pages attribute.

Returns:

tuple – 3-tuple of (year, month, day); each element is None if that value is not found.
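Since parse is a coroutine, callers must await it, and because parts of the date may be missing, the result usually needs some handling before display. The sketch below assumes each element of the returned tuple is None when that part of the date is not found. It does not call DateExtractor: fake_parse is a hypothetical stand-in for `await extractor.parse(doc)` so the example runs without an LLM, and format_partial_date is likewise a hypothetical helper.

```python
import asyncio
from typing import Optional, Tuple

DateTuple = Tuple[Optional[int], Optional[int], Optional[int]]


async def fake_parse(doc) -> DateTuple:
    """Hypothetical stand-in for ``await extractor.parse(doc)``; no LLM call."""
    return (2021, 3, None)


def format_partial_date(date: DateTuple) -> Optional[str]:
    """Render (year, month, day) as ISO 8601 text, stopping at the first None."""
    parts = []
    for value, width in zip(date, (4, 2, 2)):
        if value is None:
            break
        parts.append(f"{value:0{width}d}")
    # An empty result (year missing) is reported as None
    return "-".join(parts) or None


async def main() -> None:
    date = await fake_parse(doc=None)
    print(format_partial_date(date))  # 2021-03


asyncio.run(main())
```

Truncating at the first missing field avoids producing strings such as "2021--05" when the month is unknown but a day was reported.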