elm.web.google_search.filter_documents

async filter_documents(documents, validation_coroutine, task_name=None, **kwargs)

Filter documents by applying an asynchronous validation coroutine to each one and keeping only those that pass.

Parameters:
  • documents (iterable of elm.web.document.BaseDocument) – Iterable of documents to filter.

  • validation_coroutine (coroutine function) – An async function that returns False if a document should be discarded and True if it should be kept. It must accept a single elm.web.document.BaseDocument instance as its first argument. It may accept additional arguments, which are supplied through **kwargs.

  • task_name (str, optional) – Optional task name to use in asyncio.create_task(). By default, None.

  • **kwargs – Keyword-argument pairs to pass to validation_coroutine. Do not include the document instance itself; it is passed separately as the first argument.
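The parameters above describe a common asyncio pattern: schedule one validation task per document, await them together, and keep the documents whose result is True. The sketch below illustrates that pattern under stated assumptions; it is not the ELM implementation. FakeDocument, contains_keyword, and filter_documents_sketch are hypothetical names, and the use of asyncio.gather is an assumption about how the tasks are awaited.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in for elm.web.document.BaseDocument."""
    text: str
    is_pdf: bool = False


async def contains_keyword(doc, keyword):
    """Hypothetical validation coroutine: keep documents mentioning `keyword`."""
    await asyncio.sleep(0)  # placeholder for real async work (e.g. an LLM call)
    return keyword.lower() in doc.text.lower()


async def filter_documents_sketch(documents, validation_coroutine,
                                  task_name=None, **kwargs):
    """Assumed pattern: validate every document concurrently, keep the passes."""
    documents = list(documents)  # materialize so generators can be zipped later
    tasks = [
        # The document is the first argument; extra kwargs are forwarded as-is.
        asyncio.create_task(validation_coroutine(doc, **kwargs), name=task_name)
        for doc in documents
    ]
    results = await asyncio.gather(*tasks)  # results keep input order
    return [doc for doc, keep in zip(documents, results) if keep]


docs = [FakeDocument("Wind ordinance setbacks"), FakeDocument("Unrelated news")]
kept = asyncio.run(filter_documents_sketch(docs, contains_keyword, keyword="wind"))
print([d.text for d in kept])  # ['Wind ordinance setbacks']
```

Note that keyword="wind" reaches contains_keyword only through **kwargs, while the document itself is supplied positionally, matching the split between validation_coroutine and **kwargs described above.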

Returns:

list of elm.web.document.BaseDocument – List of documents that passed the validation check, sorted by text length, with PDF documents taking the highest precedence.
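One way to express "sorted by text length, with PDF documents taking the highest precedence" is a tuple sort key whose first element ranks PDFs ahead of everything else. The sketch below is illustrative only: FakeDocument and its is_pdf flag are hypothetical, and the descending length order (longer texts first) is an assumption, since the description above does not state the direction.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in for elm.web.document.BaseDocument."""
    text: str
    is_pdf: bool = False


def precedence_key(doc):
    # First element: PDFs sort first because False < True.
    # Second element: ASSUMED descending text length (negate the length).
    return (not doc.is_pdf, -len(doc.text))


docs = [
    FakeDocument("short html"),
    FakeDocument("a much longer html page"),
    FakeDocument("pdf", is_pdf=True),
]
ordered = sorted(docs, key=precedence_key)
print([d.text for d in ordered])  # ['pdf', 'a much longer html page', 'short html']
```

Dropping the minus sign in precedence_key would sort ascending by length instead; the PDF-first behavior is unaffected because it is decided by the first tuple element.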