elm.web.document.BaseDocument
- class BaseDocument(pages, metadata=None)
Bases: ABC
Base ELM web document representation.
- Parameters:
pages (iterable) – Iterable of strings, where each string is a page of a document.
metadata (dict, optional) – Optional dict containing metadata for the document. By default, None.
Methods
Attributes
Cleaned document file extension.
Dict of kwargs to pass to open when writing this doc.
True if the document contains no pages.
List of (a limited count of) raw pages.
Cleaned text from document.
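
The constructor signature and attribute descriptions above can be illustrated with a small, self-contained sketch. This is not the ELM implementation: the class names SketchDocument and PlainTextSketch, the attribute names empty and text, the cleaning logic, and the handling of metadata=None are all assumptions made for illustration only. It shows how an abstract base that takes pages and optional metadata might leave text cleaning to subclasses.

```python
from abc import ABC, abstractmethod


class SketchDocument(ABC):
    """Hypothetical stand-in that mirrors the documented signature (not ELM code)."""

    def __init__(self, pages, metadata=None):
        # Materialize the iterable so the pages can be counted and re-read.
        self.pages = list(pages)
        # Assumption: treat a missing metadata argument as an empty dict.
        self.metadata = metadata or {}

    @property
    def empty(self):
        """True if the document contains no pages."""
        return len(self.pages) == 0

    @property
    @abstractmethod
    def text(self):
        """Cleaned text from the document; each subclass defines the cleaning."""


class PlainTextSketch(SketchDocument):
    """Hypothetical concrete subclass with a trivial cleaning rule."""

    @property
    def text(self):
        # Assumed cleaning: strip surrounding whitespace, join pages by newline.
        return "\n".join(page.strip() for page in self.pages)


doc = PlainTextSketch(["  first page ", "second page\n"], metadata={"source": "example"})
print(doc.empty)
print(doc.text)
```

Because the base class declares an abstract property, instantiating SketchDocument directly raises TypeError; only subclasses that supply text can be created.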