elm.web.google_search.google_results_as_docs

async google_results_as_docs(queries, num_urls=None, browser_semaphore=None, task_name=None, **file_loader_kwargs)

Retrieve the top N Google search results as document instances.

Parameters:
  • queries (collection of str) – Collection of strings representing Google queries. Documents for the top num_urls Google search results (from all of these queries combined) will be returned from this function.

  • num_urls (int, optional) – Number of unique top Google search results to return as docs. The Google search results from all queries are interleaved and the top num_urls unique URLs are downloaded as docs. If this number is less than len(queries), some of your queries may not contribute to the final output. By default, None, which sets num_urls = 3 * len(queries).

  • browser_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of playwright browsers open concurrently. If None, no limits are applied. By default, None.

  • task_name (str, optional) – Optional task name to use in asyncio.create_task(). By default, None.

  • **file_loader_kwargs – Keyword-argument pairs to initialize elm.web.file_loader.AsyncFileLoader with. If found, the “pw_launch_kwargs” key in these will also be used to initialize the elm.web.google_search.PlaywrightGoogleLinkSearch used for the google URL search. By default, None.
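
The interleaving described for num_urls can be illustrated with a small standalone sketch. This is not ELM's implementation: the helper name interleave_unique, the round-robin-by-rank ordering, and the sample URLs are all assumptions made for illustration. Only the default of 3 * len(queries) and the "unique URLs from all queries combined" behavior come from the description above.

```python
from itertools import zip_longest


def interleave_unique(results_per_query, num_urls=None):
    """Merge per-query URL lists round-robin, keeping unique URLs only.

    Hypothetical helper for illustration; the ordering is an assumption.
    """
    if num_urls is None:
        # Documented default: three URLs per query
        num_urls = 3 * len(results_per_query)

    merged = []
    seen = set()
    # Take the rank-1 hit of every query, then the rank-2 hits, and so on
    for rank_group in zip_longest(*results_per_query):
        for url in rank_group:
            # Skip padding from shorter lists and already-seen URLs
            if url is None or url in seen:
                continue
            seen.add(url)
            merged.append(url)

    return merged[:num_urls]


# Made-up search results for two queries
results = [
    ["a.com/1", "b.com/2", "c.com/3"],
    ["b.com/2", "d.com/4"],
]
print(interleave_unique(results))
print(interleave_unique(results, num_urls=1))
```

With num_urls=1 and two queries, only the first query's top hit survives, which is the situation the num_urls description warns about when the value is smaller than len(queries).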

Returns:

list of elm.web.document.BaseDocument – List of documents representing the top num_urls results from the google searches across all queries.
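
A minimal usage sketch follows, assuming ELM and Playwright are installed and that network access is available. The wrapper name fetch_docs, the max_browsers parameter, the num_urls value, and the task name are illustrative choices, not part of the library. The import is placed inside the wrapper only so the sketch can be loaded without ELM present.

```python
import asyncio


async def fetch_docs(queries, max_browsers=2):
    """Hypothetical wrapper around google_results_as_docs."""
    # Imported lazily so this sketch loads even when ELM is not installed
    from elm.web.google_search import google_results_as_docs

    # Cap the number of Playwright browsers that are open at the same time
    semaphore = asyncio.Semaphore(max_browsers)

    return await google_results_as_docs(
        queries,
        num_urls=4,  # illustrative; omit to use 3 * len(queries)
        browser_semaphore=semaphore,
        task_name="demo_search",  # illustrative task name
    )
```

The wrapper could then be driven with asyncio.run(fetch_docs(["first query", "second query"])), which should return a list of document instances as described under Returns.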