compass.scripts.download.find_jurisdiction_website

async find_jurisdiction_website(jurisdiction, model_configs, file_loader_kwargs=None, search_semaphore=None, browser_semaphore=None, usage_tracker=None, url_ignore_substrings=None, **kwargs)

Search for the main landing page of a given jurisdiction.

Parameters:
  • jurisdiction (Jurisdiction) – Jurisdiction instance representing the jurisdiction to find the main webpage for.

  • model_configs (dict) – Dictionary of LLMConfig instances. Should have at minimum a “default” key that is used as a fallback for all tasks.

  • file_loader_kwargs (dict, optional) – Dictionary of keyword-argument pairs used to initialize elm.web.file_loader.AsyncFileLoader. If present, the “pw_launch_kwargs” key in this dictionary is also used to initialize the elm.web.search.google.PlaywrightGoogleLinkSearch instance used for the Google URL search. By default, None.

  • search_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently for submitting search engine queries. If None, no limits are applied. By default, None.

  • browser_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently. If None, no limits are applied. By default, None.

  • usage_tracker (compass.services.usage.UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.
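
The “default” fallback described for model_configs can be pictured with a plain dictionary lookup. This is only a sketch of the idea, not the compass implementation: the helper pick_config, the task name "date_extraction", and the string values (standing in for LLMConfig instances) are all hypothetical.

```python
def pick_config(model_configs, task):
    """Return the entry for `task`, falling back to the "default" entry.

    Hypothetical helper; compass may resolve configs differently.
    """
    # Raises KeyError if "default" is missing, which matches the
    # documented requirement that the key be present.
    return model_configs.get(task, model_configs["default"])


# Strings stand in for LLMConfig instances; the task name is made up.
example_configs = {
    "default": "general-model",
    "date_extraction": "small-model",
}
```

Under these assumptions, pick_config(example_configs, "date_extraction") selects the task-specific entry, while any task without its own key resolves to the "default" entry.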

Returns:

str | None – URL for the jurisdiction website, if found; None otherwise.
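
The effect of passing an asyncio.Semaphore, as described for search_semaphore and browser_semaphore, can be illustrated without calling compass at all. The sketch below uses a hypothetical stand-in coroutine (fake_find_website) that simulates opening a browser; the probe class, the helper names, and the example.com URLs are assumptions made for illustration only.

```python
import asyncio


class ConcurrencyProbe:
    """Track how many simulated browsers are open at the same time."""

    def __init__(self):
        self.open_now = 0
        self.peak = 0


async def _open_browser(name, probe):
    # Simulate a browser session that stays open for a short time.
    probe.open_now += 1
    probe.peak = max(probe.peak, probe.open_now)
    await asyncio.sleep(0.01)
    probe.open_now -= 1
    return f"https://example.com/{name}"


async def fake_find_website(name, probe, browser_semaphore=None):
    """Hypothetical stand-in; NOT the compass function."""
    if browser_semaphore is None:
        # No semaphore: nothing limits concurrency.
        return await _open_browser(name, probe)
    async with browser_semaphore:
        # At most `limit` coroutines run this section at once.
        return await _open_browser(name, probe)


async def run_all(names, limit=None):
    """Run one lookup per name and report the peak concurrency."""
    probe = ConcurrencyProbe()
    semaphore = asyncio.Semaphore(limit) if limit is not None else None
    urls = await asyncio.gather(
        *(fake_find_website(name, probe, semaphore) for name in names)
    )
    return urls, probe.peak


if __name__ == "__main__":
    urls, peak = asyncio.run(run_all(["a", "b", "c", "d", "e"], limit=2))
    print(urls, peak)
```

Sharing one semaphore across all calls is what makes the limit global; creating a new semaphore per call would leave concurrency effectively unbounded.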