compass.extraction.solar.ordinance.SolarHeuristic

class SolarHeuristic

Bases: Heuristic

Perform a heuristic check for mention of solar farms in text

Methods

check(text[, match_count_threshold])

Check for mention of a tech in text

Attributes

GOOD_TECH_ACRONYMS

Acronyms for solar farms that we want to capture

GOOD_TECH_KEYWORDS

Words that indicate we should keep a chunk for analysis

GOOD_TECH_PHRASES

Phrases that indicate text is about solar farms

NOT_TECH_WORDS

Words and phrases that indicate text is NOT about solar farms

NOT_TECH_WORDS = ['concentrated solar', 'csp', 'micro secs', 'small secs', 'mini secs', 'private secs', 'personal secs', 'psecs', 'solaris', 'small solar', 'micro solar', 'mini solar', 'private solar', 'personal solar', 'swecs', 'solar break', 'solar damage', 'solar data', 'solar resource']

Words and phrases that indicate text is NOT about solar farms

GOOD_TECH_KEYWORDS = ['solar', 'setback']

Words that indicate we should keep a chunk for analysis

GOOD_TECH_ACRONYMS = ['secs', 'sef', 'ses', 'cses']

Acronyms for solar farms that we want to capture

GOOD_TECH_PHRASES = ['commercial solar energy system', 'solar energy conversion', 'solar energy system', 'solar panel', 'solar farm', 'solar energy farm', 'utility solar energy system']

Phrases that indicate text is about solar farms

check(text, match_count_threshold=1)

Check for mention of a tech in text

This check first strips the text of any tech “look-alike” words (e.g. “window”, “windshield”, etc. for “wind” technology). It then checks the text for particular keywords, acronyms, and phrases that pertain to the tech. If enough of these are mentioned (as dictated by match_count_threshold), this check returns True.

Parameters:
  • text (str) – Input text that may or may not mention the technology of interest.

  • match_count_threshold (int, optional) – Number of keywords that must match for the text to pass this heuristic check. Count must be strictly greater than this value. By default, 1.

Returns:

bool – True if the number of keywords/acronyms/phrases detected exceeds the match_count_threshold.
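
The behaviour described for check can be illustrated with a small standalone sketch. This is not the library's implementation: the function name sketch_check and the helper _count_whole_word_hits are hypothetical, and the matching details (lower-casing the text, whole-word matching, counting each distinct term once) are assumptions made for illustration. Only the word lists and the strictly-greater-than threshold rule come from the documentation above.

```python
import re

# Word lists copied from the attributes documented above.
NOT_TECH_WORDS = [
    "concentrated solar", "csp", "micro secs", "small secs", "mini secs",
    "private secs", "personal secs", "psecs", "solaris", "small solar",
    "micro solar", "mini solar", "private solar", "personal solar", "swecs",
    "solar break", "solar damage", "solar data", "solar resource",
]
GOOD_TECH_KEYWORDS = ["solar", "setback"]
GOOD_TECH_ACRONYMS = ["secs", "sef", "ses", "cses"]
GOOD_TECH_PHRASES = [
    "commercial solar energy system", "solar energy conversion",
    "solar energy system", "solar panel", "solar farm",
    "solar energy farm", "utility solar energy system",
]


def _count_whole_word_hits(terms, text):
    """Count how many distinct terms occur in text as whole words.

    Hypothetical helper; whole-word matching is an assumption.
    """
    return sum(
        1 for term in terms
        if re.search(rf"\b{re.escape(term)}\b", text)
    )


def sketch_check(text, match_count_threshold=1):
    """Illustrative stand-in for the documented check (not the real one)."""
    lowered = text.lower()  # assumption: matching is case-insensitive

    # Step 1: strip look-alike terms so they cannot contribute matches.
    for term in NOT_TECH_WORDS:
        lowered = re.sub(rf"\b{re.escape(term)}\b", " ", lowered)

    # Step 2: count the keywords, acronyms, and phrases that remain.
    total = sum(
        _count_whole_word_hits(terms, lowered)
        for terms in (GOOD_TECH_KEYWORDS, GOOD_TECH_ACRONYMS, GOOD_TECH_PHRASES)
    )

    # Step 3: the count must be strictly greater than the threshold.
    return total > match_count_threshold


# "solar", "setback", and "solar farm" all match: 3 > 1
print(sketch_check("The solar farm must observe a 50 ft setback."))  # True
# "concentrated solar" is stripped first, leaving nothing to match: 0 > 1
print(sketch_check("Concentrated solar plants are excluded."))  # False
```

Because the comparison is strict, text with exactly one remaining match (for example only “setback” after “small solar” is stripped) fails at the default threshold of 1 and passes only when match_count_threshold is lowered to 0.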