compass.services.cpu.read_pdf_file_ocr
- async read_pdf_file_ocr(pdf_fp, **kwargs)
Read a local PDF file using OCR (pytesseract).
Note that pytesseract must be set up properly for this function to work. In particular, the pytesseract.pytesseract.tesseract_cmd attribute must be set to point to the Tesseract executable.
- Parameters:
pdf_fp (path-like) – Path to the PDF file to read with OCR.
**kwargs – Keyword-value arguments to pass to the
elm.web.document.PDFDocument initializer.
- Returns:
elm.web.document.PDFDocument – PDFDocument instances with pages loaded as text.
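
The setup note and the async signature above can be sketched as follows. This is a minimal illustration, not the library's recommended usage: the helper names locate_tesseract and ocr_pdf are hypothetical, the executable is assumed to be discoverable on PATH under the name "tesseract", and the coroutine is assumed to be awaitable directly. Depending on how the COMPASS services are configured, the call may need to run inside a running service context, which is not shown here.

```python
import asyncio
import shutil


def locate_tesseract(candidates=("tesseract",)):
    """Return the first matching executable found on PATH, or None.

    Hypothetical helper: the candidate names are an assumption and may
    differ on your system (e.g. a full path on Windows).
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


async def ocr_pdf(pdf_fp, tesseract_cmd=None, **kwargs):
    """Point pytesseract at Tesseract, then read the PDF with OCR.

    Hypothetical wrapper around read_pdf_file_ocr; extra keyword
    arguments are forwarded unchanged.
    """
    # Imported lazily so this module loads even without the OCR stack.
    import pytesseract
    from compass.services.cpu import read_pdf_file_ocr

    tesseract_cmd = tesseract_cmd or locate_tesseract()
    if tesseract_cmd is None:
        raise FileNotFoundError("Tesseract executable not found on PATH")

    # This is the attribute the note above says must be configured.
    pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # read_pdf_file_ocr is declared async, so it must be awaited.
    return await read_pdf_file_ocr(pdf_fp, **kwargs)
```

From synchronous code, the wrapper could be driven with asyncio.run(ocr_pdf("scanned.pdf")), where "scanned.pdf" is a placeholder path. Resolving the executable before the call makes a missing Tesseract install fail early with a clear error rather than partway through the OCR step.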