elm.utilities.parse.read_pdf

read_pdf(pdf_bytes, verbose=True)

Read PDF contents from bytes.

This function automatically tries to detect a multi-column format and, when one is detected, loads the text without preserving the physical layout.

Parameters:
  • pdf_bytes (bytes) – Bytes corresponding to a PDF file.

  • verbose (bool, optional) – Whether to log errors encountered during parsing. By default, True.

Returns:

iterable – Iterable containing pages of the PDF document. This iterable may be empty if there was an error reading the PDF file.
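Example:

The following is a minimal usage sketch, not part of the library. It assumes the PDF is stored on disk and that the pages are wanted as a list. The helper name `load_pdf_pages` and its `reader` parameter are hypothetical and exist only for illustration; `reader` falls back to `read_pdf` when not given, so a stand-in can be supplied where ELM is not installed.

```python
from pathlib import Path


def load_pdf_pages(path, reader=None, verbose=True):
    """Read a PDF file from disk and return its pages as a list.

    `load_pdf_pages` and `reader` are hypothetical names used for this
    sketch. When `reader` is None, `read_pdf` is imported lazily.
    """
    if reader is None:
        # Imported lazily so the helper can be exercised without ELM.
        from elm.utilities.parse import read_pdf as reader

    # read_pdf expects the raw bytes of the PDF, not a file path.
    pdf_bytes = Path(path).read_bytes()

    # The documented return value is an iterable of pages; materialize
    # it so that an empty result (a read error) is easy to check for.
    return list(reader(pdf_bytes, verbose=verbose))
```

Because the returned iterable may be empty when the file cannot be read, a caller would typically check the result before using it, for example `pages = load_pdf_pages("report.pdf")` followed by `if not pages: ...`, where "report.pdf" is a placeholder path.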