compass.extraction.date.DateExtractor
- class DateExtractor(structured_llm_caller, text_splitter=None)
Bases: object
Helper class to extract date info from a document.
- Parameters:
structured_llm_caller (StructuredLLMCaller) – Instance used for structured validation queries.
text_splitter (TextSplitter, optional) – Optional text splitter (or subclass instance, or any object that implements a split_text method) to attach to doc (used for splitting out pages in an HTML document). By default, None.
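The text_splitter parameter accepts any object that implements a split_text method. A minimal sketch of such an object is shown below; the class name PageBreakSplitter and the form-feed separator are illustrative assumptions, not part of compass, and the real splitter interface may expect more than this.

```python
class PageBreakSplitter:
    """Hypothetical duck-typed splitter (not part of compass)."""

    def __init__(self, separator="\f"):
        # Assumption: pages are delimited by form-feed characters
        self.separator = separator

    def split_text(self, text):
        # Return the non-empty chunks with surrounding whitespace removed
        chunks = (chunk.strip() for chunk in text.split(self.separator))
        return [chunk for chunk in chunks if chunk]


splitter = PageBreakSplitter()
pages = splitter.split_text("Page one\fPage two\f")
```

An object like this could presumably be passed as text_splitter, since only the split_text method is required by the description above.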
Methods
parse(doc) – Extract date (year, month, day) from doc
Attributes
SYSTEM_MESSAGE – System message for date extraction LLM calls
- SYSTEM_MESSAGE = "You are a legal scholar that reads ordinance text and extracts structured date information. Return your answer as a dictionary in JSON format (not markdown). Your JSON file must include exactly four keys. The first key is 'explanation', which contains a short summary of the most relevant date information you found in the text. The second key is 'year', which should contain an integer value that represents the latest year this ordinance was enacted/updated, or null if that information cannot be found in the text. The third key is 'month', which should contain an integer value that represents the latest month of the year this ordinance was enacted/updated, or null if that information cannot be found in the text. The fourth key is 'day', which should contain an integer value that represents the latest day of the month this ordinance was enacted/updated, or null if that information cannot be found in the text. Only provide values if you are confident that they represent the latest date this ordinance was enacted/updated"
System message for date extraction LLM calls
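The system message asks the model for a JSON dictionary with exactly four keys: explanation, year, month, and day. The sketch below shows what a conforming reply might look like and how it could be decoded; the helper read_date_response and the sample reply are hypothetical and are not taken from the compass source.

```python
import json

# The four keys the system message requires in every reply
EXPECTED_KEYS = {"explanation", "year", "month", "day"}


def read_date_response(raw):
    """Hypothetical helper: decode a reply and return (year, month, day)."""
    data = json.loads(raw)
    if set(data) != EXPECTED_KEYS:
        raise ValueError(f"Unexpected keys in reply: {sorted(data)}")
    # JSON null decodes to None, matching the "not found" case
    return data["year"], data["month"], data["day"]


# Made-up reply for illustration only
sample = (
    '{"explanation": "Adopted in June 2021; day not stated.", '
    '"year": 2021, "month": 6, "day": null}'
)
result = read_date_response(sample)
```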
- async parse(doc)
Extract date (year, month, day) from doc
- Parameters:
doc (elm.web.document.BaseDocument) – Document with a raw_pages attribute.
- Returns:
tuple – 3-tuple containing year, month, day, or None if any of those are not found.
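Because parse is a coroutine, it has to be awaited, and the returned values may be None. The sketch below illustrates that calling pattern with a stand-in class so it runs without an LLM; StubExtractor and its fixed return value are assumptions for illustration, and it assumes each missing component comes back as None.

```python
import asyncio
from datetime import date


class StubExtractor:
    """Stand-in for DateExtractor; returns a fixed result, makes no LLM call."""

    async def parse(self, doc):
        # Assumed shape: (year, month, day), with None for a missing part
        return 2021, 6, None


async def main():
    extractor = StubExtractor()
    year, month, day = await extractor.parse(doc=None)
    # Build a full date only when every component was found
    found_all = None not in (year, month, day)
    full = date(year, month, day) if found_all else None
    return year, month, day, full


result = asyncio.run(main())
```

Checking for None before constructing a date avoids a TypeError when the document states only a year or a year and month.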