compass.extraction.date.DateExtractor

class DateExtractor(structured_llm_caller, text_splitter=None)

Bases: object

Helper class to extract date information from a document.

Parameters:
  • structured_llm_caller (compass.llm.StructuredLLMCaller) – StructuredLLMCaller instance. Used for structured validation queries.

  • text_splitter (langchain.text_splitter.TextSplitter, optional) – Optional text splitter instance to attach to doc (used for splitting out pages in an HTML document). By default, None.

Methods

parse(doc)

Extract date (year, month, day) from doc

Attributes

SYSTEM_MESSAGE

async parse(doc)

Extract date (year, month, day) from doc

Parameters:

doc (elm.web.document.BaseDocument) – Document with a raw_pages attribute.

Returns:

tuple – 3-tuple of (year, month, day). Any component that is not found is None.