compass.validation.content.LegalTextValidator
- class LegalTextValidator(*args, score_threshold=0.8, doc_is_from_ocr=False, **kwargs)
Bases: StructuredLLMCaller
Parse chunks to determine whether they contain legal text.
- Parameters:
score_threshold (float, optional) – Minimum fraction of text chunks that have to pass the legal check for the whole document to be considered legal text. By default, 0.8.
*args, **kwargs – Parameters to pass to the StructuredLLMCaller initializer.
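The `score_threshold` rule can be illustrated with a small standalone function. This is a sketch, not the compass implementation: `is_document_legal` is a hypothetical helper, and it assumes that each checked chunk contributes one boolean verdict, that a fraction exactly equal to the threshold passes (the parameter is described as a "minimum fraction"), and that a document with no checked chunks is not considered legal text.

```python
from typing import Sequence


def is_document_legal(chunk_flags: Sequence[bool], score_threshold: float = 0.8) -> bool:
    """Return True if the fraction of chunks flagged as legal meets the threshold.

    Hypothetical helper illustrating the documented ``score_threshold`` rule;
    it is not part of the compass API.
    """
    if not chunk_flags:
        # Assumption: with no checked chunks there is no evidence of legal text.
        return False
    fraction_legal = sum(chunk_flags) / len(chunk_flags)
    # "Minimum fraction": reaching the threshold exactly counts as passing.
    return fraction_legal >= score_threshold


# 4 of 5 chunks pass -> 0.8, which meets the default threshold
print(is_document_legal([True, True, True, True, False]))
# 3 of 5 chunks pass -> 0.6, which falls short of 0.8
print(is_document_legal([True, True, False, True, False]))
```

Lowering `score_threshold` makes the validator more tolerant of documents that mix legal text with boilerplate such as headers, tables of contents, or scanned noise.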
Methods
call(sys_msg, content[, usage_sub_label]) – Call LLM for structured data retrieval.
check_chunk(chunk_parser, ind) – Check a chunk at a given ind to see if it contains legal text.
Attributes
SYSTEM_MESSAGE
True if text was found to be from a legal source.
- async check_chunk(chunk_parser, ind)
Check the chunk at index ind to determine whether it contains legal text.
- Parameters:
chunk_parser (ParseChunksWithMemory) – Instance of ParseChunksWithMemory that contains a parse_from_ind method.
ind (int) – Index of the chunk to check.
- Returns:
bool – Boolean flag indicating whether or not the text in the chunk resembles legal text.
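A minimal sketch of a per-chunk check, under stated assumptions. The documented method delegates to `chunk_parser.parse_from_ind`, whose signature is not given here, so this sketch reads the chunk directly from a hypothetical `StubChunkParser` instead. The `"legal_text"` response key, `check_chunk_sketch`, and `fake_llm_call` are illustrative names, not compass API, and the fake LLM uses a toy keyword heuristic in place of a real model call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class StubChunkParser:
    """Hypothetical stand-in for ParseChunksWithMemory holding text chunks."""

    def __init__(self, chunks: List[str]) -> None:
        self.text_chunks = chunks


async def check_chunk_sketch(
    chunk_parser: StubChunkParser,
    ind: int,
    llm_call: Callable[[str, str], Awaitable[Dict]],
    sys_msg: str = "Does this text look like legal text?",
) -> bool:
    """Ask the LLM about one chunk and return a boolean verdict.

    Assumptions: the LLM answer is a dict with a boolean ``"legal_text"`` key,
    and a missing key counts as "not legal".
    """
    response = await llm_call(sys_msg, chunk_parser.text_chunks[ind])
    return bool(response.get("legal_text", False))


async def fake_llm_call(sys_msg: str, content: str) -> Dict:
    # Toy heuristic standing in for a real LLM call.
    return {"legal_text": "shall" in content.lower()}


async def main() -> List[bool]:
    parser = StubChunkParser(["The applicant shall submit a permit.", "Lunch menu"])
    # Check every chunk concurrently; gather preserves the input order.
    return await asyncio.gather(
        *(check_chunk_sketch(parser, i, fake_llm_call) for i in range(2))
    )


print(asyncio.run(main()))  # [True, False]
```

Running the checks concurrently with `asyncio.gather` fits the `async` signature of `check_chunk`; the resulting list of booleans is the kind of per-chunk evidence a document-level threshold would be computed from.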
- async call(sys_msg, content, usage_sub_label=LLMUsageCategory.DEFAULT)
Call the LLM for structured data retrieval.
- Parameters:
sys_msg (str) – The LLM system message. If this text does not contain the instruction text “Return your answer as a dictionary in JSON format”, it will be added.
content (str) – LLM call content (typically some text to extract info from).
usage_sub_label (str, optional) – Label to store token usage under. By default, "default".
- Returns:
dict – Dictionary containing the LLM-extracted features. Dictionary may be empty if there was an error during the LLM call.
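The documented behavior of `call` (append the JSON instruction when it is missing, and return an empty dictionary on failure) can be sketched as below. `ensure_json_instruction`, `call_sketch`, `good_llm`, and `bad_llm` are hypothetical names. How compass detects and appends the instruction, and how it records token usage under `usage_sub_label`, is not shown in this reference, so the sketch uses a plain substring check and omits usage tracking.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict

JSON_INSTRUCTION = "Return your answer as a dictionary in JSON format"


def ensure_json_instruction(sys_msg: str) -> str:
    """Append the JSON instruction if the system message lacks it (assumed format)."""
    if JSON_INSTRUCTION in sys_msg:
        return sys_msg
    return f"{sys_msg.rstrip()}\n{JSON_INSTRUCTION}"


async def call_sketch(
    sys_msg: str,
    content: str,
    send: Callable[[str, str], Awaitable[str]],
) -> Dict:
    """Hypothetical structured-retrieval call that returns {} on any error."""
    sys_msg = ensure_json_instruction(sys_msg)
    try:
        raw = await send(sys_msg, content)
        result = json.loads(raw)
    except Exception:
        # Sketch choice: treat any failure (transport error, bad JSON) as "no data".
        return {}
    # A valid JSON value that is not an object is also treated as "no data".
    return result if isinstance(result, dict) else {}


async def good_llm(sys_msg: str, content: str) -> str:
    return '{"legal_text": true}'


async def bad_llm(sys_msg: str, content: str) -> str:
    return "Sure! The text looks legal."


print(asyncio.run(call_sketch("Is this legal text?", "Section 1...", good_llm)))  # {'legal_text': True}
print(asyncio.run(call_sketch("Is this legal text?", "Section 1...", bad_llm)))  # {}
```

Returning an empty dictionary instead of raising lets callers such as a chunk checker treat a failed call as a missing verdict rather than aborting the whole document.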