compass.extraction.date.DateExtractor
- class DateExtractor(structured_llm_caller, text_splitter=None)
Bases: object
Helper class to extract date information from a document.
- Parameters:
structured_llm_caller (compass.llm.StructuredLLMCaller) – StructuredLLMCaller instance. Used for structured validation queries.
text_splitter (langchain.text_splitter.TextSplitter, optional) – Optional text splitter instance to attach to doc (used for splitting out pages in an HTML document). By default, None.
Methods
parse(doc) – Extract date (year, month, day) from doc
Attributes
SYSTEM_MESSAGE
- async parse(doc)
Extract date (year, month, day) from doc
- Parameters:
doc (elm.web.document.BaseDocument) – Document with a raw_pages attribute.
- Returns:
tuple – 3-tuple containing year, month, day, or None if any of those are not found.
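Because parse is a coroutine, it has to be awaited, and the caller should be prepared for missing date parts. The following is a minimal sketch of that calling pattern, not the library's own code: StubDateExtractor is a hypothetical stand-in that returns a fixed value so the snippet runs without an LLM, and format_date is an illustrative helper that does not exist in compass. The sketch assumes that either the whole result or individual entries of the tuple may be None.

```python
import asyncio


class StubDateExtractor:
    """Hypothetical stand-in for DateExtractor (NOT the real class)."""

    async def parse(self, doc):
        # The real method queries an LLM; this stub returns a fixed
        # (year, month, day) tuple with the day missing.
        return (2023, 5, None)


def format_date(date):
    """Illustrative helper: render a (year, month, day) result.

    Assumes the result may be None as a whole, or may contain None
    entries for parts that were not found.
    """
    if date is None:
        return "unknown"
    year, month, day = date
    if year is None:
        return "unknown"
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
        # A day is only meaningful when the month is known.
        if day is not None:
            parts.append(f"{day:02d}")
    return "-".join(parts)


async def main():
    extractor = StubDateExtractor()
    # With the real class, doc would be a document with raw_pages.
    date = await extractor.parse(doc=None)
    return format_date(date)


result = asyncio.run(main())
print(result)
```

With the real class, the stub would be replaced by a DateExtractor built from a StructuredLLMCaller instance, and doc would be a document exposing raw_pages.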