elm.web.file_loader.BaseAsyncFileLoader
- class BaseAsyncFileLoader(pdf_read_coroutine, html_read_coroutine, pdf_read_kwargs=None, html_read_kwargs=None, pdf_ocr_read_coroutine=None, file_cache_coroutine=None, **__)
Bases: ABC
Base class for async file loading.
- Parameters:
pdf_read_coroutine (callable) – PDF file read coroutine. Must be an async function. Must return an elm.web.document.PDFDocument.
html_read_coroutine (callable, optional) – HTML file read coroutine. Must be an async function. Must return an elm.web.document.HTMLDocument.
pdf_read_kwargs (dict, optional) – Keyword-value argument pairs to pass to the pdf_read_coroutine. By default, None.
html_read_kwargs (dict, optional) – Keyword-value argument pairs to pass to the html_read_coroutine. By default, None.
pdf_ocr_read_coroutine (callable, optional) – PDF OCR file read coroutine. Must be an async function. Should accept PDF bytes as the first argument and kwargs as the rest. Must return an elm.web.document.PDFDocument. If None, PDF OCR parsing is not attempted, and any scanned PDF URLs will return a blank document. By default, None.
file_cache_coroutine (callable, optional) – File caching coroutine. Can be used to cache files downloaded by this class. Must accept a Document instance as the first argument and the file content to be written as the second argument. If this coroutine is not provided, no document caching is performed. By default, None.
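The coroutine contracts described above can be illustrated with a minimal sketch. It does not import ELM: StandInDocument, read_pdf, read_html, cache_file and CACHE are hypothetical names used only for illustration. It also assumes that the PDF reader receives raw bytes and the HTML reader receives page text as the first argument, which the parameter descriptions state only for the OCR reader.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StandInDocument:
    """Hypothetical stand-in for an ELM document; not the real class."""
    pages: list
    attrs: dict = field(default_factory=dict)


async def read_pdf(pdf_bytes, **kwargs):
    """Hypothetical PDF reader: takes bytes (assumed), returns a document."""
    text = pdf_bytes.decode("utf-8", errors="ignore")
    return StandInDocument(pages=[text], attrs={"kind": "pdf", **kwargs})


async def read_html(html_text, **kwargs):
    """Hypothetical HTML reader: takes page text (assumed), returns a document."""
    return StandInDocument(pages=[html_text], attrs={"kind": "html", **kwargs})


CACHE = {}  # in-memory stand-in for a real file cache


async def cache_file(doc, file_content):
    """Hypothetical cache coroutine: document first, file content second."""
    CACHE[doc.attrs.get("source", "unknown")] = file_content


async def demo():
    # Read a fake PDF payload, then cache the raw content for that document.
    doc = await read_pdf(b"hello", source="example.pdf")
    await cache_file(doc, b"hello")
    return doc


doc = asyncio.run(demo())
```

In real use, coroutines of this shape would presumably be passed to a concrete subclass as pdf_read_coroutine, html_read_coroutine and file_cache_coroutine, returning the actual PDFDocument and HTMLDocument types instead of the stand-in.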
Methods
fetch(source) – Fetch a document for the given source.
fetch_all(*sources) – Fetch documents for all requested sources.
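The calling pattern for these two methods might look like the sketch below. StandInLoader is a hypothetical class that only mirrors the method names and signatures listed above. Using asyncio.gather inside fetch_all is an assumption about one reasonable way to fetch concurrently, not a description of how the real class is implemented.

```python
import asyncio


class StandInLoader:
    """Hypothetical loader exposing the same two method names."""

    async def fetch(self, source):
        await asyncio.sleep(0)  # placeholder for real network or disk I/O
        return f"document for {source}"

    async def fetch_all(self, *sources):
        # Assumed approach: run one fetch per source concurrently.
        return await asyncio.gather(*(self.fetch(s) for s in sources))


async def main():
    loader = StandInLoader()
    one = await loader.fetch("a.pdf")
    many = await loader.fetch_all("a.pdf", "b.html")
    return one, many


one, many = asyncio.run(main())
```

Both methods are awaited from inside a coroutine; asyncio.run is only needed at the outermost entry point of a script.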