compass.validation.content.LegalTextValidator

class LegalTextValidator(*args, score_threshold=0.8, doc_is_from_ocr=False, **kwargs)

Bases: StructuredLLMCaller

Parse chunks to determine whether they contain legal text.

Parameters:
  • score_threshold (float, optional) – Minimum fraction of text chunks that must pass the legal-text check for the whole document to be considered legal text. By default, 0.8.

  • *args, **kwargs – Parameters to pass to the StructuredLLMCaller initializer.
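The score_threshold rule described above can be illustrated with a minimal sketch. The helper name `meets_threshold`, the use of `>=` for the comparison, and the choice to treat an empty list of results as "not legal text" are assumptions made for illustration; this page does not show how the library implements the check internally.

```python
def meets_threshold(chunk_results, score_threshold=0.8):
    """Return True if enough chunks passed the legal-text check.

    Hypothetical helper for illustration only; not part of compass.
    """
    if not chunk_results:
        # Assumption: with no checked chunks, do not call the document legal text.
        return False
    # Fraction of chunks whose check came back truthy.
    passed_fraction = sum(bool(r) for r in chunk_results) / len(chunk_results)
    return passed_fraction >= score_threshold


nine_of_ten = meets_threshold([True] * 9 + [False])
three_of_five = meets_threshold([True] * 3 + [False] * 2)
print(nine_of_ten, three_of_five)
```

With the default threshold of 0.8, a document where 9 of 10 chunks pass clears the bar, while one where 3 of 5 chunks pass does not.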

Methods

call(sys_msg, content[, usage_sub_label])

Call LLM for structured data retrieval.

check_chunk(chunk_parser, ind)

Check the chunk at a given index (ind) to see if it contains legal text.

Attributes

SYSTEM_MESSAGE

is_legal_text

True if text was found to be from a legal source

Type:

bool

async check_chunk(chunk_parser, ind)

Check the chunk at a given index (ind) to see if it contains legal text.

Parameters:
  • chunk_parser (ParseChunksWithMemory) – Instance of ParseChunksWithMemory that contains a parse_from_ind method.

  • ind (int) – Index of the chunk to check.

Returns:

bool – Boolean flag indicating whether or not the text in the chunk resembles legal text.
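Because check_chunk is a coroutine, it has to be awaited. The sketch below shows one possible calling pattern using a stand-in class rather than the real validator. Several parts are assumptions for illustration: `StubValidator`, the keyword test that replaces the LLM call, the plain list used in place of a ParseChunksWithMemory instance, and the idea that is_legal_text is derived from the accumulated per-chunk results.

```python
import asyncio


class StubValidator:
    """Stand-in that mimics the documented interface; not the real class."""

    def __init__(self, score_threshold=0.8):
        self.score_threshold = score_threshold
        self._results = []  # assumed memory of per-chunk outcomes

    async def check_chunk(self, chunk_parser, ind):
        # The real method queries an LLM; a keyword test stands in for it here.
        # `chunk_parser` is a plain list in this sketch, not ParseChunksWithMemory.
        is_legal = "shall" in chunk_parser[ind].lower()
        self._results.append(is_legal)
        return is_legal

    @property
    def is_legal_text(self):
        # Assumption: the flag is the passing fraction compared to the threshold.
        if not self._results:
            return False
        return sum(self._results) / len(self._results) >= self.score_threshold


async def validate(chunks, validator):
    # Await each check in turn, then read the document-level flag.
    for ind in range(len(chunks)):
        await validator.check_chunk(chunks, ind)
    return validator.is_legal_text


chunks = [
    "Section 1. The applicant shall obtain a permit.",
    "Section 2. Setbacks shall be at least 500 feet.",
]
result = asyncio.run(validate(chunks, StubValidator()))
print(result)
```

With the real class, the same loop would pass a ParseChunksWithMemory instance as chunk_parser, and each awaited call would involve an LLM request.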

async call(sys_msg, content, usage_sub_label=LLMUsageCategory.DEFAULT)

Call LLM for structured data retrieval.

Parameters:
  • sys_msg (str) – The LLM system message. If this text does not contain the instruction text “Return your answer as a dictionary in JSON format”, it will be added.

  • content (str) – LLM call content (typically some text to extract info from).

  • usage_sub_label (str, optional) – Label to store token usage under. By default, "default".

Returns:

dict – Dictionary containing the LLM-extracted features. Dictionary may be empty if there was an error during the LLM call.
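Two behaviors described above lend themselves to a short sketch: adding the JSON instruction to a system message that lacks it, and falling back to an empty dictionary when the response cannot be used. The helper names `ensure_json_instruction` and `parse_llm_response`, the way the instruction is appended, and the `legal_text` key in the usage lines are hypothetical; the library's own handling may differ.

```python
import json

# Instruction text quoted in the sys_msg parameter description.
JSON_INSTRUCTION = "Return your answer as a dictionary in JSON format"


def ensure_json_instruction(sys_msg):
    """Append the JSON instruction when the system message lacks it."""
    if JSON_INSTRUCTION in sys_msg:
        return sys_msg
    # Assumption: the instruction is appended as a final sentence.
    return f"{sys_msg.rstrip()} {JSON_INSTRUCTION}."


def parse_llm_response(raw_response):
    """Parse a raw LLM reply into a dict, returning {} on any failure."""
    try:
        parsed = json.loads(raw_response)
    except (TypeError, json.JSONDecodeError):
        # Missing or malformed replies become an empty dictionary.
        return {}
    # Valid JSON that is not an object is also treated as a failure.
    return parsed if isinstance(parsed, dict) else {}


message = ensure_json_instruction("You are a legal expert.")
good = parse_llm_response('{"legal_text": true}')  # hypothetical key
bad = parse_llm_response("not json")
print(message, good, bad)
```

Since the returned dictionary may be empty, callers are safer reading values with `dict.get` and a default than indexing keys directly.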