elm.utilities.parse.read_pdf
- read_pdf(pdf_bytes, verbose=True)
Read PDF contents from bytes.
This function automatically tries to detect a multi-column layout and, if it finds one, loads the text without preserving the physical layout.
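The multi-column fallback described above can be sketched as follows. This is a hypothetical illustration, not ELM's actual implementation: the `extract` callable stands in for whatever PDF-to-text backend is used, and the column-detection heuristic (a wide run of spaces separating two pieces of text on the same line) is an assumption.

```python
import re
from typing import Callable, List

# Hypothetical backend signature: (pdf_bytes, physical) -> list of page strings.
Extractor = Callable[[bytes, bool], List[str]]

# Assumed heuristic: a gap of 4+ spaces between two non-space characters
# suggests text laid out side by side in separate columns.
_COLUMN_GAP = re.compile(r"\S {4,}\S")


def looks_multi_column(pages: List[str], threshold: float = 0.5) -> bool:
    """Return True if at least `threshold` of the non-empty lines contain a column gap."""
    lines = [ln for page in pages for ln in page.splitlines() if ln.strip()]
    if not lines:
        return False
    gapped = sum(1 for ln in lines if _COLUMN_GAP.search(ln))
    return gapped / len(lines) >= threshold


def load_pages(pdf_bytes: bytes, extract: Extractor) -> List[str]:
    """Load with the physical layout first; reload without it if columns are detected."""
    pages = extract(pdf_bytes, True)
    if looks_multi_column(pages):
        # Side-by-side columns would be interleaved line by line, so
        # fall back to reading-order extraction instead.
        pages = extract(pdf_bytes, False)
    return pages
```

Loading with the physical layout first keeps tables and indentation intact for ordinary single-column documents; the second pass is only paid for when the first result looks like interleaved columns.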
- Parameters:
pdf_bytes (bytes) – Bytes corresponding to a PDF file.
verbose (bool, optional) – Option to log errors during parsing. By default, True.
- Returns:
iterable – Iterable containing pages of the PDF document. This iterable may be empty if there was an error reading the PDF file.
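Because a parsing failure yields an empty iterable rather than an exception, callers may want to check for that case explicitly. The following is a minimal sketch, assuming the import path given in this page's title; `pdf_to_text` and its `reader` and `sep` parameters are hypothetical, and the reader is injectable so the helper can be exercised without a real PDF.

```python
from typing import Callable, Iterable, Optional


def pdf_to_text(pdf_bytes: bytes,
                reader: Optional[Callable[..., Iterable[str]]] = None,
                sep: str = "\n") -> Optional[str]:
    """Join the pages returned by `reader` into one string, or return None if there are none."""
    if reader is None:
        # Import lazily so the helper can also be used with a stand-in reader.
        from elm.utilities.parse import read_pdf as reader
    pages = list(reader(pdf_bytes, verbose=True))
    if not pages:
        # An empty result signals that the PDF could not be read.
        return None
    return sep.join(pages)
```

With ELM installed, a call such as `pdf_to_text(open("report.pdf", "rb").read())` (file name hypothetical) would return the joined page text, or `None` if the file could not be parsed.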