compass.services.cpu.PDFLoader
- class PDFLoader(**kwargs)
Bases: ProcessPoolService
Class to load PDFs in a ProcessPoolExecutor
- Parameters:
**kwargs – Keyword-value argument pairs to pass to concurrent.futures.ProcessPoolExecutor. By default, None.
Methods
acquire_resources() – Open the process pool and temp directory.
call(*args, **kwargs) – Call the service.
process(fn, pdf_bytes, **kwargs) – Write URL doc to file asynchronously.
process_using_futures(fut, *args, **kwargs) – Process a call to the service.
release_resources() – Shut down the process pool and clean up the temp directory.
Attributes
MAX_CONCURRENT_JOBS – Max number of concurrent job submissions.
Always True (limiting is handled by asyncio).
Service name used to pull the correct queue object.
- async process(fn, pdf_bytes, **kwargs)
Write URL doc to file asynchronously
- Parameters:
doc (elm.web.document.Document) – Document containing meta information about the file. Must have a “source” key in the attrs dict containing the URL, which will be converted to a file name using compute_fn_from_url().
file_content (str or bytes) – File content, typically string text for HTML files and bytes for PDF files.
make_name_unique (bool, optional) – Option to make the file name unique by adding a UUID at the end of the file name. By default, False.
- Returns:
Path – Path to output file.
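The signature process(fn, pdf_bytes, **kwargs) suggests that a loading function fn is applied to raw PDF bytes inside the service's process pool. A minimal sketch of that pattern, assuming the work is offloaded with loop.run_in_executor; process_in_pool and the toy loader read_pdf_version are hypothetical stand-ins, not the compass implementation:

```python
import asyncio
from concurrent.futures import Executor
from functools import partial
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


def read_pdf_version(pdf_bytes: bytes, default: str = "unknown") -> str:
    """Hypothetical loader: read the version from a '%PDF-x.y' header."""
    if pdf_bytes.startswith(b"%PDF-"):
        return pdf_bytes[5:8].decode("ascii", errors="replace")
    return default


async def process_in_pool(
    pool: Optional[Executor],
    fn: Callable[..., T],
    pdf_bytes: bytes,
    **kwargs: Any,
) -> T:
    """Run fn(pdf_bytes, **kwargs) in `pool` without blocking the event loop.

    run_in_executor only forwards positional arguments, so keyword
    arguments are bound with functools.partial first. With a
    ProcessPoolExecutor, `fn` must be picklable (a module-level function).
    Passing None uses the event loop's default executor.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, partial(fn, pdf_bytes, **kwargs))
```

In a real script, pool would be the service's ProcessPoolExecutor. On platforms whose default start method re-imports the main module in each worker (for example spawn on Windows and macOS), the top-level call belongs under an if __name__ == "__main__": guard.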
- MAX_CONCURRENT_JOBS = 10000
Max number of concurrent job submissions.
- acquire_resources()
Open the process pool and temp directory.
- async classmethod call(*args, **kwargs)
Call the service
- Parameters:
*args, **kwargs – Positional and keyword arguments to be passed to the underlying service processing function.
- Returns:
obj – A response object from the underlying service.
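Because call is a classmethod that returns the response of the underlying service, the request has to reach a running service somehow. One plausible wiring, assumed here rather than taken from the compass source, is a request queue: call creates a future, enqueues it with its arguments, and awaits it while a consumer resolves the future. QueuedService and multiply_worker are hypothetical names:

```python
import asyncio
import contextlib
from typing import Any, List


class QueuedService:
    """Hypothetical service whose classmethod `call` is served by a queue."""

    queue: "asyncio.Queue[Any]"  # assigned once an event loop is running

    @classmethod
    async def call(cls, *args: Any, **kwargs: Any) -> Any:
        # Package the request with a fresh future, enqueue it, and
        # suspend until whoever processes the request resolves the future.
        fut = asyncio.get_running_loop().create_future()
        await cls.queue.put((fut, args, kwargs))
        return await fut


async def multiply_worker(queue: "asyncio.Queue[Any]") -> None:
    """Hypothetical consumer standing in for process_using_futures."""
    while True:
        fut, args, kwargs = await queue.get()
        value, scale = args[0], kwargs.get("scale", 1)
        if not fut.cancelled():
            fut.set_result(value * scale)
        queue.task_done()


async def demo() -> List[Any]:
    QueuedService.queue = asyncio.Queue()
    worker = asyncio.create_task(multiply_worker(QueuedService.queue))
    results = await asyncio.gather(
        QueuedService.call(2, scale=3), QueuedService.call(5)
    )
    # Stop the consumer once every request has been answered.
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return list(results)
```

The caller only ever sees the awaited result; which task does the work, and when, stays hidden behind the future.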
- async process_using_futures(fut, *args, **kwargs)
Process a call to the service
- Parameters:
fut (asyncio.Future) – A future object that should get the result of the processing operation. If the processing function returns answer, this method should call fut.set_result(answer).
**kwargs – Keyword arguments to be passed to the underlying processing function.
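The contract described above (when the processing function returns answer, call fut.set_result(answer)) can be sketched as a standalone coroutine. This is an illustration under assumptions: the extra process parameter stands in for the service's own process method, and forwarding errors with fut.set_exception and skipping futures that are already done are choices made for the sketch, not documented behavior:

```python
import asyncio
from typing import Any, Awaitable, Callable


async def process_using_futures(
    fut: "asyncio.Future[Any]",
    process: Callable[..., Awaitable[Any]],
    *args: Any,
    **kwargs: Any,
) -> None:
    """Hypothetical helper: resolve `fut` with the outcome of `process`."""
    try:
        answer = await process(*args, **kwargs)
    except Exception as exc:
        # Hand the error to whoever is awaiting the future.
        if not fut.done():
            fut.set_exception(exc)
        return
    # The caller may have given up (cancelled the future) in the meantime;
    # setting a result on a done future would raise InvalidStateError.
    if not fut.done():
        fut.set_result(answer)
```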
- release_resources()
Shut down the process pool and clean up the temp directory.
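acquire_resources() and release_resources() bracket the service's lifetime. A minimal sketch of such a pair, assuming a ProcessPoolExecutor built from the constructor kwargs and a tempfile.TemporaryDirectory as the temp directory; the class PoolLifecycleSketch and its attribute names are hypothetical:

```python
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Any, Optional


class PoolLifecycleSketch:
    """Hypothetical service that owns a process pool and a temp directory."""

    def __init__(self, **kwargs: Any) -> None:
        # Keyword arguments are stored and later forwarded to the executor.
        self._pool_kwargs = kwargs
        self.pool: Optional[ProcessPoolExecutor] = None
        self._tmp: Optional[tempfile.TemporaryDirectory] = None

    @property
    def temp_dir(self) -> Optional[Path]:
        return Path(self._tmp.name) if self._tmp is not None else None

    def acquire_resources(self) -> None:
        # Open the process pool and a scratch directory.
        self.pool = ProcessPoolExecutor(**self._pool_kwargs)
        self._tmp = tempfile.TemporaryDirectory(prefix="pdf_loader_")

    def release_resources(self) -> None:
        # Let already submitted work finish, then delete the scratch directory.
        if self.pool is not None:
            self.pool.shutdown(wait=True)
            self.pool = None
        if self._tmp is not None:
            self._tmp.cleanup()
            self._tmp = None
```

Shutting down with wait=True before removing the directory means no in-flight load is left writing into a folder that no longer exists.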