elm.web.website_crawl

ELM document retrieval from a website

Module attributes

BEST_ZONING_ORDINANCE_KEYWORDS

Keywords and their associated point values for scoring URLs.

ELM_URL_FILTER

Filter used to exclude URLs that are not relevant to the search
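
To illustrate what "keywords and their associated point values" could mean in practice, here is a minimal sketch of keyword-based URL scoring. It is not the ELM implementation: the table `EXAMPLE_KEYWORD_POINTS`, its values, and the helper `score_url` are all hypothetical, and the real contents of BEST_ZONING_ORDINANCE_KEYWORDS are not shown in this summary.

```python
from urllib.parse import urlparse

# Hypothetical keyword-to-points table (assumed for illustration only;
# not the actual contents of BEST_ZONING_ORDINANCE_KEYWORDS).
EXAMPLE_KEYWORD_POINTS = {"zoning": 3, "ordinance": 3, "code": 1}


def score_url(url, keyword_points=None):
    """Sum the points of every keyword that appears in the URL path."""
    if keyword_points is None:
        keyword_points = EXAMPLE_KEYWORD_POINTS
    # Only the path is inspected, so keywords in the host name are ignored.
    path = urlparse(url).path.lower()
    return sum(
        points
        for keyword, points in keyword_points.items()
        if keyword in path
    )
```

Under this scheme a crawler would visit higher-scoring links first. Whether ELM matches on the path only, on the full URL, or on link text as well is an assumption here and should be checked against the class documentation.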

Classes

ContentTypeExcludeFilter([exclude_extensions])

Filter that excludes specified content types using fast lookups

ELMLinkScorer([keyword_points])

Custom URL scorer for ELM website crawling

ELMWebsiteCrawler(validator[, ...])

Crawl a website for documents of interest

ELMWebsiteCrawlingStrategy(max_depth, ...)

Custom crawling strategy for ELM website searching

PeekablePriorityQueue([maxsize])

A priority queue that allows peeking at the next item
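
The idea of a priority queue that supports peeking can be sketched with the standard library alone. The class below is a hypothetical stand-in built on `heapq`; the name `SimplePeekableQueue` and its methods are assumptions made for illustration and do not reflect the actual PeekablePriorityQueue interface.

```python
import heapq
import itertools


class SimplePeekableQueue:
    """Min-priority queue with a non-destructive peek (illustrative only)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities keep
        # insertion order and items themselves are never compared.
        self._counter = itertools.count()

    def put(self, priority, item):
        """Add an item; lower priority values are returned first."""
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def peek(self):
        """Return the next (priority, item) pair without removing it."""
        if not self._heap:
            raise IndexError("peek from an empty queue")
        priority, _, item = self._heap[0]
        return priority, item

    def get(self):
        """Remove and return the next (priority, item) pair."""
        if not self._heap:
            raise IndexError("get from an empty queue")
        priority, _, item = heapq.heappop(self._heap)
        return priority, item

    def __len__(self):
        return len(self._heap)
```

Peeking lets a crawler inspect the best remaining candidate, for example to stop early when its score falls below a threshold, without disturbing the queue order.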