compass.validation.content.LegalTextValidator

- class LegalTextValidator(tech, *args, score_threshold=0.8, doc_is_from_ocr=False, **kwargs)[source]

  Bases: StructuredLLMCaller

  Parse chunks to determine if they contain legal text

  - Parameters:
    - tech (str) – Technology of interest (e.g. “solar”, “wind”, etc.). This is used to set up some document validation decision trees.
    - score_threshold (float, optional) – Minimum fraction of text chunks that have to pass the legal check for the whole document to be considered legal text. By default, 0.8.
    - *args – Parameters to pass to the StructuredLLMCaller initializer.
    - **kwargs – Parameters to pass to the StructuredLLMCaller initializer.
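The role of score_threshold can be sketched as a simple aggregation rule: a document counts as legal text when the fraction of chunks that pass the per-chunk check meets or exceeds the threshold. The snippet below is an illustrative stand-alone sketch of that rule, not the library's actual implementation; the helper function and its inputs are hypothetical.

```python
def document_is_legal(chunk_results, score_threshold=0.8):
    """Hypothetical sketch: aggregate per-chunk legal-check flags.

    ``chunk_results`` is a list of booleans, one per chunk, as a
    per-chunk validator might produce. The document passes when the
    passing fraction is at least ``score_threshold`` (the class
    default is 0.8).
    """
    if not chunk_results:
        # No chunks to judge; treat the document as not legal text.
        return False
    return sum(chunk_results) / len(chunk_results) >= score_threshold


# Example: 4 of 5 chunks look legal -> fraction 0.8 meets the default
print(document_is_legal([True, True, True, True, False]))
```

Under this rule, lowering score_threshold makes the whole-document decision more permissive; raising it toward 1.0 requires nearly every chunk to pass.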
  Methods

  - call(sys_msg, content[, usage_sub_label]) – Call LLM for structured data retrieval
  - check_chunk(chunk_parser, ind) – Check a chunk at a given ind to see if it contains legal text
  Attributes

  - SYSTEM_MESSAGE – System message for legal text validation LLM calls
  - True if text was found to be from a legal source

  - SYSTEM_MESSAGE = 'You are an AI designed to classify text excerpts based on their source type. The goal is to identify text that is extracted from **legally binding regulations (such as zoning ordinances or enforceable bans)** and filter out text that was extracted from anything other than a legal statute for an existing jurisdiction.'

    System message for legal text validation LLM calls
- async check_chunk(chunk_parser, ind)[source]

  Check a chunk at a given ind to see if it contains legal text

  - Parameters:
    - chunk_parser (ParseChunksWithMemory) – Instance that contains a parse_from_ind method.
    - ind (int) – Index of the chunk to check.
  - Returns:
    bool – Boolean flag indicating whether or not the text in the chunk resembles legal text.
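Because check_chunk is a coroutine that takes an independent chunk index, several chunks can be validated concurrently with asyncio.gather. The sketch below uses stand-in stub classes that only mirror the documented interface (an async check_chunk(chunk_parser, ind) returning bool, and a parser exposing parse_from_ind); the stubs' internals are hypothetical and do not reflect the real compass classes.

```python
import asyncio


class StubChunkParser:
    """Hypothetical stand-in for ParseChunksWithMemory."""

    def __init__(self, chunks):
        self.chunks = chunks

    async def parse_from_ind(self, ind):
        # The real method consults an LLM; this stub just pattern-matches.
        return "ordinance" in self.chunks[ind].lower()


class StubLegalTextValidator:
    """Hypothetical stand-in mirroring the documented signature."""

    async def check_chunk(self, chunk_parser, ind):
        return await chunk_parser.parse_from_ind(ind)


async def validate_all(chunks):
    """Run the per-chunk check concurrently, preserving chunk order."""
    parser = StubChunkParser(chunks)
    validator = StubLegalTextValidator()
    flags = await asyncio.gather(
        *(validator.check_chunk(parser, i) for i in range(len(chunks)))
    )
    return list(flags)


results = asyncio.run(
    validate_all(["Zoning ordinance: setbacks...", "Visit our blog!"])
)
```

The per-chunk flags collected this way are what a whole-document decision (e.g. comparing the passing fraction against score_threshold) would consume.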
- async call(sys_msg, content, usage_sub_label=LLMUsageCategory.DEFAULT)

  Call LLM for structured data retrieval

  - Parameters:
    - sys_msg (str) – The LLM system message. If this text does not contain the instruction text “Return your answer as a dictionary in JSON format”, it will be added.
    - content (str) – LLM call content (typically some text to extract info from).
    - usage_sub_label (str, optional) – Label to store token usage under. By default, "default".
  - Returns:
    dict – Dictionary containing the LLM-extracted features. Dictionary may be empty if there was an error during the LLM call.
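The documented guarantee that the system message always carries the JSON-dictionary instruction can be sketched as a small guard; this is an assumption about the behavior described above, not the library's code, and the helper name is hypothetical.

```python
# Instruction text quoted in the docs for ``call``.
JSON_INSTRUCTION = "Return your answer as a dictionary in JSON format"


def ensure_json_instruction(sys_msg):
    """Append the JSON instruction if the system message lacks it.

    Hypothetical sketch of the documented ``call`` behavior: the check
    is a plain substring test, so a message already containing the
    instruction is returned unchanged.
    """
    if JSON_INSTRUCTION not in sys_msg:
        sys_msg = f"{sys_msg} {JSON_INSTRUCTION}."
    return sys_msg
```

A guard like this keeps the LLM response parseable as a dict, which is consistent with the return contract above: a dictionary of extracted features, possibly empty on error.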