compass.scripts.download.find_jurisdiction_website
- async find_jurisdiction_website(jurisdiction, model_configs, file_loader_kwargs=None, search_semaphore=None, browser_semaphore=None, usage_tracker=None, url_ignore_substrings=None, **kwargs)
Search for the main landing page of a given jurisdiction.
- Parameters:
jurisdiction (Jurisdiction) – Jurisdiction instance representing the jurisdiction to find the main webpage for.
model_configs (dict) – Dictionary of LLMConfig instances. Should have at minimum a “default” key that is used as a fallback for all tasks.
file_loader_kwargs (dict, optional) – Dictionary of keyword-argument pairs used to initialize elm.web.file_loader.AsyncFileLoader. If found, the “pw_launch_kwargs” key in these will also be used to initialize the elm.web.search.google.PlaywrightGoogleLinkSearch used for the Google URL search. By default, None.
search_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently to submit search engine queries. If None, no limits are applied. By default, None.
browser_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently. If None, no limits are applied. By default, None.
usage_tracker (compass.services.usage.UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.
- Returns:
str | None – URL for the jurisdiction website, if found; None otherwise.
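
The semaphore parameters and the `str | None` return value suggest a calling pattern in which several lookups share one semaphore and missing results are filtered out. The following is a minimal sketch of that pattern only. `find_website_stub` and `find_all` are hypothetical stand-ins written for this illustration: they do not call compass, and the example.org URLs are made up. A real call would pass a Jurisdiction instance and a `model_configs` dictionary as described above.

```python
import asyncio


# Hypothetical stand-in that mimics only the semaphore handling and the
# "URL or None" return contract; it is NOT the real compass function.
async def find_website_stub(name, browser_semaphore=None):
    async def _search():
        await asyncio.sleep(0.01)  # simulates time spent in a browser
        if name.startswith("Unknown"):
            return None  # nothing found for this jurisdiction
        return "https://example.org/" + name.lower().replace(" ", "-")

    if browser_semaphore is None:
        return await _search()  # no concurrency limit applied
    async with browser_semaphore:  # at most N searches run at once
        return await _search()


async def find_all(names, max_browsers=2):
    # One semaphore shared by every lookup caps concurrent "browsers".
    semaphore = asyncio.Semaphore(max_browsers)
    urls = await asyncio.gather(
        *(find_website_stub(n, browser_semaphore=semaphore) for n in names)
    )
    # Drop jurisdictions for which no website was found (None results).
    return {n: u for n, u in zip(names, urls) if u is not None}


results = asyncio.run(
    find_all(["Story County", "Unknown Place", "Polk County"])
)
print(results)
```

Creating the semaphore once and passing it to every call is what makes the limit global across lookups; a semaphore created per call would not restrict anything.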