elm.web.google_search.filter_documents
- async filter_documents(documents, validation_coroutine, task_name=None, **kwargs)
Filter documents by applying a filter function to each.
- Parameters:
documents (iter of elm.web.document.BaseDocument) – Iterable of documents to filter.
validation_coroutine (coroutine) – A coroutine that returns False if the document should be discarded and True otherwise. This function should take a single elm.web.document.BaseDocument instance as the first argument. The function may have other arguments, which will be passed down using **kwargs.
task_name (str, optional) – Optional task name to use in asyncio.create_task(). By default, None.
**kwargs – Keyword-argument pairs to pass to validation_coroutine. This should not include the document instance itself, which will be independently passed in as the first argument.
- Returns:
list of elm.web.document.BaseDocument – List of documents that passed the validation check, sorted by text length, with PDF documents taking the highest precedence.
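
The calling contract described above can be illustrated with a minimal, self-contained sketch. It does not import elm: FakeDocument, mentions_keyword, and filter_documents_sketch are hypothetical names introduced only for illustration, and the stand-in filter is an assumption about the general shape of such a function, not the library's implementation. It shows a validation coroutine that takes the document as its first argument and receives an extra keyword argument through **kwargs. The sorting step described under Returns is deliberately left out, so the sketch keeps the input order.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in for elm.web.document.BaseDocument."""

    text: str


async def mentions_keyword(doc, keyword):
    """Validation coroutine: return True to keep the document, False to discard it."""
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return keyword.lower() in doc.text.lower()


async def filter_documents_sketch(documents, validation_coroutine,
                                  task_name=None, **kwargs):
    """Illustrative stand-in only (assumption, not the elm implementation)."""
    documents = list(documents)
    # One task per document; the document is passed first, **kwargs after it.
    tasks = [
        asyncio.create_task(validation_coroutine(doc, **kwargs), name=task_name)
        for doc in documents
    ]
    results = await asyncio.gather(*tasks)
    # Keep only documents whose validation returned True (input order preserved).
    return [doc for doc, keep in zip(documents, results) if keep]


async def main():
    docs = [
        FakeDocument("Wind energy ordinance"),
        FakeDocument("Meeting minutes"),
        FakeDocument("Solar and WIND setbacks"),
    ]
    return await filter_documents_sketch(
        docs, mentions_keyword, task_name="keyword_check", keyword="wind"
    )


kept = asyncio.run(main())
print([doc.text for doc in kept])
```

Going by the signature documented above, the same validation coroutine would presumably be passed to the real function as `await filter_documents(docs, mentions_keyword, task_name="keyword_check", keyword="wind")`, with `keyword` forwarded through **kwargs.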