elm.web.file_loader.AsyncLocalFileLoader
- class AsyncLocalFileLoader(pdf_read_kwargs=None, html_read_kwargs=None, pdf_read_coroutine=None, html_read_coroutine=None, pdf_ocr_read_coroutine=None, file_cache_coroutine=None, doc_attrs=None, **__)
Bases: BaseAsyncFileLoader
Async local file (PDF or HTML) loader
- Parameters:
pdf_read_kwargs (dict, optional) – Keyword-value argument pairs to pass to the pdf_read_coroutine. By default, None.
html_read_kwargs (dict, optional) – Keyword-value argument pairs to pass to the html_read_coroutine. By default, None.
pdf_read_coroutine (callable, optional) – PDF file read coroutine. Must be an async function. Should accept a PDF filepath as the first argument and kwargs as the rest. Must return an elm.web.document.PDFDocument along with the raw PDF bytes (for caching purposes). If None, a default function that runs in the main thread is used. By default, None.
html_read_coroutine (callable, optional) – HTML file read coroutine. Must be an async function. Should accept an HTML filepath as the first argument and kwargs as the rest. Must return an elm.web.document.HTMLDocument along with the raw text (for caching purposes). If None, a default function that runs in the main thread is used. By default, None.
pdf_ocr_read_coroutine (callable, optional) – PDF OCR file read coroutine. Must be an async function. Should accept a PDF filepath as the first argument and kwargs as the rest. Must return an elm.web.document.PDFDocument along with the raw PDF bytes (for caching purposes). If None, PDF OCR parsing is not attempted, and any scanned PDF URLs will return a blank document. By default, None.
file_cache_coroutine (callable, optional) – File caching coroutine. Can be used to cache files downloaded by this class. Must accept a Document instance as the first argument and the file content to be written as the second argument. If this coroutine is not provided, no document caching is performed. By default, None.
doc_attrs (dict, optional) – Additional document attributes to add to each loaded document. By default, None.
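The contracts described for html_read_coroutine and file_cache_coroutine can be sketched without the library itself. The example below is only an illustration of the documented shapes: StubHTMLDocument, read_html_file, cache_file, the CACHE dict, the "encoding" keyword, and the "source" attribute key are all hypothetical names chosen for this sketch, not part of elm. A real read coroutine would return an elm.web.document.HTMLDocument instead of the stub.

```python
import asyncio
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class StubHTMLDocument:
    """Hypothetical stand-in for elm.web.document.HTMLDocument."""
    text: str
    attrs: dict = field(default_factory=dict)


async def read_html_file(html_fp, **kwargs):
    """Read an HTML file; return the document plus the raw text."""
    # "encoding" is an assumed keyword for this sketch only.
    encoding = kwargs.get("encoding", "utf-8")
    # Run the blocking read in a worker thread so the event loop stays free.
    raw_text = await asyncio.to_thread(
        Path(html_fp).read_text, encoding=encoding
    )
    return StubHTMLDocument(text=raw_text), raw_text


CACHE = {}  # in-memory cache used only for this illustration


async def cache_file(doc, file_content):
    """Accept a document first and the content to be written second."""
    # The "source" attribute key is an assumption of this sketch.
    CACHE[doc.attrs.get("source")] = file_content


async def demo():
    with tempfile.TemporaryDirectory() as tmp:
        fp = Path(tmp) / "page.html"
        fp.write_text("<p>hello</p>", encoding="utf-8")
        doc, raw = await read_html_file(fp, encoding="utf-8")
        doc.attrs["source"] = str(fp)
        await cache_file(doc, raw)
        return doc, raw


doc, raw = asyncio.run(demo())
```

Going by the constructor signature, coroutines of this shape would be supplied as html_read_coroutine=read_html_file and file_cache_coroutine=cache_file, with any extra keyword arguments given through html_read_kwargs.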
Methods
fetch(source) – Fetch a document for the given source.
fetch_all(*sources) – Fetch documents for all requested sources.
- async fetch(source)
Fetch a document for the given source.
- Parameters:
source (str) – Source used to load the document.
- Returns:
elm.web.document.Document – Document instance containing text, if the load was successful.
- async fetch_all(*sources)
Fetch documents for all requested sources.
- Parameters:
*sources – Iterable of sources (as strings) used to fetch the documents.
- Returns:
list – List of documents, one per requested source.
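The calling pattern for fetch and fetch_all can be illustrated with a stand-in class. StubLoader below is hypothetical and mirrors only the two method names and signatures documented above; its use of asyncio.gather is an assumption made for this sketch, not a statement about how the library implements fetch_all, and the dict it returns stands in for a real Document. Note that the signature is variadic, so a list of paths is unpacked with `*` at the call site.

```python
import asyncio


class StubLoader:
    """Hypothetical stand-in mirroring the documented method signatures."""

    async def fetch(self, source):
        # Yield to the event loop once, as a real file read would.
        await asyncio.sleep(0)
        return {"source": source, "text": f"contents of {source}"}

    async def fetch_all(self, *sources):
        # gather preserves the order of its inputs in the returned list.
        return await asyncio.gather(*(self.fetch(s) for s in sources))


async def main():
    loader = StubLoader()
    paths = ["a.pdf", "b.html"]  # placeholder file names
    # Unpack the list because fetch_all takes sources as positional args.
    return await loader.fetch_all(*paths)


docs = asyncio.run(main())
```

With the real class, the same `await loader.fetch_all(*paths)` call would be made from inside a running event loop, or wrapped in asyncio.run as shown here.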