compass.scripts.download.download_jurisdiction_ordinance_using_search_engine
- async download_jurisdiction_ordinance_using_search_engine(question_templates, jurisdiction, num_urls=5, file_loader_kwargs=None, search_semaphore=None, browser_semaphore=None, url_ignore_substrings=None, **kwargs)
Download the ordinance document(s) for a single jurisdiction.
- Parameters:
jurisdiction (Jurisdiction) – Location objects representing the jurisdiction.
model_configs (dict) – Dictionary of LLMConfig instances. Should have at minimum a “default” key that is used as a fallback for all tasks.
num_urls (int, optional) – Number of unique Google search result URLs to check for ordinance documents. By default, 5.
file_loader_kwargs (dict, optional) – Dictionary of keyword-argument pairs to initialize elm.web.file_loader.AsyncFileLoader with. If found, the “pw_launch_kwargs” key in these will also be used to initialize the elm.web.search.google.PlaywrightGoogleLinkSearch used for the Google URL search. By default, None.
search_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently to submit search engine queries. If this input is None, the input from browser_semaphore will be used in its place (i.e. the searches and file downloads will be limited using the same semaphore). By default, None.
browser_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently to download content from the web. If None, no limits are applied. By default, None.
usage_tracker (compass.services.usage.UsageTracker, optional) – Optional tracker instance to monitor token usage during LLM calls. By default, None.
- Returns:
list or None – List of BaseDocument instances possibly containing ordinance information, or None if no ordinance document was found.
Notes
Requires the TempFileCachePB service to be running.
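Example

The semaphore fallback described for search_semaphore and browser_semaphore can be illustrated with a small, self-contained sketch. It does not call COMPASS at all: stand_in_download, run_limited, ConcurrencyProbe, and the jurisdiction names are hypothetical names invented for this illustration, and the sleep call merely stands in for a browser session. The sketch only mirrors the documented rule that searches reuse browser_semaphore when search_semaphore is None.

```python
import asyncio


class ConcurrencyProbe:
    """Tracks how many simulated browser sessions are open at once."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def open_browser(self):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)  # stands in for a search or a download
        self.active -= 1


async def run_limited(semaphore, probe):
    # A semaphore of None means no limit is applied.
    if semaphore is None:
        await probe.open_browser()
    else:
        async with semaphore:
            await probe.open_browser()


async def stand_in_download(jurisdiction, probe, search_semaphore=None,
                            browser_semaphore=None):
    """Hypothetical stand-in; NOT the COMPASS implementation."""
    # Documented fallback: searches reuse the browser semaphore when
    # no dedicated search semaphore is given.
    if search_semaphore is None:
        search_semaphore = browser_semaphore
    await run_limited(search_semaphore, probe)   # search step
    await run_limited(browser_semaphore, probe)  # download step
    # Placeholder strings; the real function returns documents or None.
    return [f"{jurisdiction} ordinance"]


async def main():
    probe = ConcurrencyProbe()
    shared = asyncio.Semaphore(2)  # created inside the running event loop
    names = ["County A", "County B", "County C",
             "County D", "County E", "County F"]
    results = await asyncio.gather(
        *(stand_in_download(name, probe, browser_semaphore=shared)
          for name in names)
    )
    return probe.peak, results


peak, documents = asyncio.run(main())
print(peak, len(documents))
```

Because only browser_semaphore is supplied, the simulated searches and downloads draw from the same pool of two slots, so the recorded peak cannot exceed two. With the real function, the same effect would presumably be obtained by passing one asyncio.Semaphore as browser_semaphore and leaving search_semaphore at its default; passing a second semaphore as search_semaphore would limit searches independently.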