compass.web.website_crawl

Custom COMPASS website crawler

Much simpler than the Crawl4AI crawler, but designed to reach some links that Crawl4AI cannot, such as those behind a button interface.

Module attributes

DOC_THRESHOLD

Default max documents to collect before terminating COMPASS crawl

Classes

COMPASSCrawler(validator, url_scorer[, ...])

A simple website crawler to search for ordinance documents

COMPASSLinkScorer([keyword_points])

Custom URL scorer for COMPASS website crawling

Link(*[, href, text, title, base_domain, ...])

Crawl4AI Link subclass with a few utilities
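
Example

To illustrate the idea of keyword-based link scoring combined with a document cap, here is a minimal standalone sketch. It does not use the COMPASS or Crawl4AI APIs. `ExampleLink`, `score_link`, `pick_links`, `EXAMPLE_KEYWORD_POINTS`, and `EXAMPLE_DOC_THRESHOLD` are hypothetical names and values chosen for illustration, and the actual scoring rules used by COMPASSLinkScorer may differ.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; the real defaults
# are not stated on this page.
EXAMPLE_DOC_THRESHOLD = 3
EXAMPLE_KEYWORD_POINTS = {"ordinance": 10, "zoning": 5, "pdf": 2}


@dataclass(frozen=True)
class ExampleLink:
    """Hypothetical stand-in for a crawled link (not the COMPASS Link class)."""

    href: str
    text: str = ""


def score_link(link, keyword_points):
    """Sum the points of every keyword found in the link URL or text."""
    haystack = f"{link.href} {link.text}".lower()
    return sum(pts for kw, pts in keyword_points.items() if kw in haystack)


def pick_links(links, keyword_points, max_docs):
    """Return up to ``max_docs`` links, highest score first.

    Links that match no keyword (score of zero) are dropped.
    """
    scored = [(score_link(link, keyword_points), link) for link in links]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [link for _, link in scored[:max_docs]]
```

In this sketch, a crawler would call `pick_links` on the links found on each page and stop following new ones once the number of collected documents reaches the threshold.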