elm.ords.services.cpu.PDFLoader
- class PDFLoader(**kwargs)
Bases: ProcessPoolService
Class to load PDFs in a ProcessPoolExecutor.
- Parameters:
**kwargs – Keyword-value argument pairs to pass to concurrent.futures.ProcessPoolExecutor. By default, None.
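As a minimal sketch of what forwarding the keyword arguments means, the hypothetical helper below (make_pool is not part of ELM) passes them unchanged to concurrent.futures.ProcessPoolExecutor, so an option such as max_workers behaves as it does for the standard-library executor:

```python
from concurrent.futures import ProcessPoolExecutor


def make_pool(**kwargs):
    """Hypothetical helper: build an executor from forwarded keyword arguments."""
    # Nothing is interpreted here; every option goes straight to the executor.
    return ProcessPoolExecutor(**kwargs)


if __name__ == "__main__":
    # max_workers is a standard ProcessPoolExecutor argument.
    with make_pool(max_workers=2) as pool:
        # len is a picklable built-in, so it can run in a worker process.
        print(pool.submit(len, b"%PDF-1.7").result())
```

Passing no keyword arguments at all leaves the executor on its defaults.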
Methods
acquire_resources() – Open thread pool and temp directory
call(*args, **kwargs) – Call the service.
process(fn, pdf_bytes, **kwargs) – Write URL doc to file asynchronously.
process_using_futures(fut, *args, **kwargs) – Process a call to the service.
release_resources() – Shutdown thread pool and cleanup temp directory
Attributes
MAX_CONCURRENT_JOBS – Max number of concurrent job submissions.
Always True (limiting is handled by asyncio)
Service name used to pull the correct queue object.
- async process(fn, pdf_bytes, **kwargs)
Write URL doc to file asynchronously.
- Parameters:
doc (elm.web.document.Document) – Document containing meta information about the file. Must have a “source” key in the metadata dict containing the URL, which will be converted to a file name using compute_fn_from_url().
file_content (str | bytes) – File content, typically string text for HTML files and bytes for PDF files.
make_name_unique (bool, optional) – Option to make file name unique by adding a UUID at the end of the file name. By default, False.
- Returns:
Path – Path to output file.
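The signature process(fn, pdf_bytes, **kwargs) suggests that a callable fn is applied to raw PDF bytes inside the process pool. A minimal sketch of that pattern follows. It assumes the hand-off goes through loop.run_in_executor with functools.partial, which this page does not confirm. run_in_pool is a hypothetical stand-in, and zlib.compress merely plays the role of a PDF-parsing function:

```python
import asyncio
import zlib
from concurrent.futures import ProcessPoolExecutor
from functools import partial


async def run_in_pool(pool, fn, pdf_bytes, **kwargs):
    """Hypothetical stand-in for process(): run fn(pdf_bytes, **kwargs) in pool."""
    loop = asyncio.get_running_loop()
    # run_in_executor accepts positional arguments only, so bind kwargs with partial.
    return await loop.run_in_executor(pool, partial(fn, pdf_bytes, **kwargs))


async def main():
    # zlib.compress is a picklable module-level function, so a worker can run it.
    with ProcessPoolExecutor(max_workers=1) as pool:
        packed = await run_in_pool(pool, zlib.compress, b"%PDF-1.7 " * 100, level=9)
    print(len(packed))


if __name__ == "__main__":
    asyncio.run(main())
```

Whatever is passed as fn must be picklable (a module-level function, for example), because the callable and its arguments are sent to a separate process.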
- MAX_CONCURRENT_JOBS = 10000
Max number of concurrent job submissions.
- acquire_resources()
Open thread pool and temp directory
- async classmethod call(*args, **kwargs)
Call the service.
- Parameters:
*args, **kwargs – Positional and keyword arguments to be passed to the underlying service processing function.
- Returns:
obj – A response object from the underlying service.
- async process_using_futures(fut, *args, **kwargs)
Process a call to the service.
- Parameters:
fut (asyncio.Future) – A future object that should get the result of the processing operation. If the processing function returns answer, this method should call fut.set_result(answer).
**kwargs – Keyword arguments to be passed to the underlying processing function.
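The future-based contract described above can be sketched with plain asyncio. Both function names below are hypothetical, the extra fn argument exists only to keep the sketch self-contained, and forwarding exceptions through fut.set_exception is an assumption added here so that a failing call does not leave the caller waiting:

```python
import asyncio


async def process_using_futures_sketch(fut, fn, *args, **kwargs):
    """Run the processing function and publish its answer on fut."""
    try:
        answer = fn(*args, **kwargs)
    except Exception as exc:
        # Assumption: failures are forwarded so the awaiting caller is released.
        fut.set_exception(exc)
        return
    # The contract from the docs: hand the answer back through the future.
    fut.set_result(answer)


async def call_sketch(fn, *args, **kwargs):
    """Hypothetical caller: create a future, schedule the work, await the result."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    task = asyncio.create_task(
        process_using_futures_sketch(fut, fn, *args, **kwargs)
    )
    result = await fut
    await task
    return result


if __name__ == "__main__":
    print(asyncio.run(call_sketch(len, b"%PDF-1.7")))
```

The caller never receives a return value from the processing coroutine directly. It only awaits the future, which is what allows the work to be scheduled elsewhere.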
- release_resources()
Shutdown thread pool and cleanup temp directory
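The paired acquire_resources() / release_resources() lifecycle can be sketched as follows. PoolResources is a hypothetical stand-in, not the ELM class, and the use of tempfile.TemporaryDirectory for the temp directory is an assumption:

```python
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


class PoolResources:
    """Hypothetical stand-in showing a paired acquire/release lifecycle."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.pool = None
        self.temp_dir = None

    def acquire_resources(self):
        # Create the executor and a scratch directory for intermediate files.
        self.pool = ProcessPoolExecutor(**self.kwargs)
        self.temp_dir = tempfile.TemporaryDirectory()

    def release_resources(self):
        # Wait for pending work, then delete the scratch directory.
        self.pool.shutdown(wait=True)
        self.temp_dir.cleanup()
        self.pool = self.temp_dir = None


if __name__ == "__main__":
    res = PoolResources(max_workers=1)
    res.acquire_resources()
    try:
        print(Path(res.temp_dir.name).is_dir())
        print(res.pool.submit(len, b"%PDF-1.7").result())
    finally:
        # Release in a finally block so the pool and directory are always cleaned up.
        res.release_resources()
```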