elm.web.document.BaseDocument

class BaseDocument(pages, metadata=None)[source]

Bases: ABC

Base ELM web document representation.

Parameters:
  • pages (iterable) – Iterable of strings, where each string is a page of a document.

  • metadata (dict, optional) – Optional dict containing metadata for the document. By default, None.
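Because BaseDocument is abstract, it is used by subclassing and supplying the abstract properties. The sketch below illustrates that pattern without importing ELM: `BaseDocumentSketch` is a simplified stand-in that mirrors only the interface documented on this page, and `PlainTextDocument`, the cleaning rule, and the example values are all hypothetical. The real class may define additional abstract hooks that are not listed here.

```python
from abc import ABC, abstractmethod


class BaseDocumentSketch(ABC):
    """Stand-in mirroring only the interface documented on this page."""

    def __init__(self, pages, metadata=None):
        self.pages = list(pages)
        self.metadata = metadata or {}

    @property
    def empty(self):
        # True if the document contains no pages
        return not self.pages

    @property
    def raw_pages(self):
        # The documented property returns a limited count of pages;
        # this stand-in simply returns all of them.
        return list(self.pages)

    @property
    def text(self):
        # Assumption: "cleaning" here means stripping each page and
        # joining pages with newlines; the real logic may differ.
        return "\n".join(page.strip() for page in self.pages)

    @property
    @abstractmethod
    def WRITE_KWARGS(self):
        """Dict of kwargs to pass to open() when writing this doc."""

    @property
    @abstractmethod
    def FILE_EXTENSION(self):
        """Cleaned document file extension."""


class PlainTextDocument(BaseDocumentSketch):
    """Hypothetical concrete document; class attributes satisfy the ABC."""

    WRITE_KWARGS = {"mode": "w", "encoding": "utf-8"}
    FILE_EXTENSION = "txt"


doc = PlainTextDocument(
    ["  first page ", "second page"], metadata={"source": "example"}
)
```

Instantiating the abstract stand-in directly raises TypeError, which is the behavior to expect from any ABC with unimplemented abstract properties.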

Attributes

  • FILE_EXTENSION – File extension to use for the cleaned document.

  • WRITE_KWARGS – Dict of keyword arguments to pass to open() when writing this document.

  • empty – True if the document contains no pages.

  • raw_pages – List of raw pages; only a limited number of pages is included.

  • text – Cleaned text from the document.

property empty

True if the document contains no pages.

Type:

bool

property raw_pages

List of raw pages; only a limited number of pages is included.

Type:

list

property text

Cleaned text from the document.

Type:

str

abstract property WRITE_KWARGS

Dict of keyword arguments to pass to open() when writing this document.

Type:

dict

abstract property FILE_EXTENSION

File extension to use for the cleaned document.

Type:

str
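The two abstract properties are meant to be used together when a document is saved. A minimal sketch of that idea follows; `write_document` is a hypothetical helper, not part of ELM, and the stand-in object only carries the three attributes the helper reads. It assumes WRITE_KWARGS includes the file mode and that FILE_EXTENSION may or may not carry a leading dot.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def write_document(doc, out_dir, stem):
    """Write doc.text to out_dir/<stem>.<FILE_EXTENSION>; return the path."""
    # Tolerate extensions given either as "txt" or ".txt"
    ext = doc.FILE_EXTENSION.lstrip(".")
    out_path = Path(out_dir) / f"{stem}.{ext}"
    # WRITE_KWARGS is forwarded to open() as documented
    with open(out_path, **doc.WRITE_KWARGS) as fh:
        fh.write(doc.text)
    return out_path


# Hypothetical stand-in for a concrete BaseDocument subclass instance
doc = SimpleNamespace(
    text="first page\nsecond page",
    FILE_EXTENSION="txt",
    WRITE_KWARGS={"mode": "w", "encoding": "utf-8"},
)

with tempfile.TemporaryDirectory() as tmp:
    path = write_document(doc, tmp, "example")
    written = path.read_text(encoding="utf-8")
```

Keeping the open() arguments on the document class lets each subclass decide between text and binary output without the caller needing to know which one applies.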