compass.validation.content.parse_by_chunks
- async parse_by_chunks(chunk_parser, heuristic, legal_text_validator, callbacks=None, min_chunks_to_process=3)
Parse text by chunks, passing each chunk to the callbacks if it is legal text.
This method iterates over the chunks one by one and passes a chunk to the callback parsers only if it passes the legal_text_validator check (chunks that fail the heuristic check are also never passed to the callbacks). If min_chunks_to_process chunks fail the legal text check, parsing is aborted.
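The control flow described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the COMPASS implementation: `StubChunkParser`, `KeywordHeuristic`, `StubLegalValidator` (and its `is_legal` coroutine), and `parse_by_chunks_sketch` are hypothetical names, and the real LegalTextValidator interface may differ. The sketch reads the abort rule literally (stop once min_chunks_to_process chunks have failed the legal check), ignores num_to_recall, and returns a dict of per-chunk callback results only so the behavior is easy to inspect.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional


@dataclass
class StubChunkParser:
    """Hypothetical stand-in for ParseChunksWithMemory.

    Only text_chunks is used by this sketch; num_to_recall mirrors the
    documented attribute but is ignored here.
    """

    text_chunks: List[str]
    num_to_recall: int = 2


class KeywordHeuristic:
    """Hypothetical fast heuristic: keep chunks that contain a keyword."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword.lower()

    def check(self, text: str) -> bool:
        return self.keyword in text.lower()


class StubLegalValidator:
    """Hypothetical validator; the real LegalTextValidator API may differ."""

    async def is_legal(self, text: str) -> bool:
        # Toy rule standing in for a real legal-text check.
        return "section" in text.lower()


# Documented callback contract: async, takes (chunk_parser, index), returns bool.
Callback = Callable[[StubChunkParser, int], Awaitable[bool]]


async def parse_by_chunks_sketch(
    chunk_parser: StubChunkParser,
    heuristic: KeywordHeuristic,
    legal_text_validator: StubLegalValidator,
    callbacks: Optional[List[Callback]] = None,
    min_chunks_to_process: int = 3,
) -> Dict[int, bool]:
    callbacks = callbacks or []
    num_failed_legal = 0
    results: Dict[int, bool] = {}
    for ind, text in enumerate(chunk_parser.text_chunks):
        # Legal-text check on every chunk so each failure counts toward the limit.
        if not await legal_text_validator.is_legal(text):
            num_failed_legal += 1
            if num_failed_legal >= min_chunks_to_process:
                break  # abort: too many chunks failed the legal check
            continue
        # Fast heuristic: chunks failing it are never passed to the callbacks.
        if not heuristic.check(text):
            continue
        # Run all callbacks concurrently on this chunk; each returns a bool.
        outcomes = await asyncio.gather(*(cb(chunk_parser, ind) for cb in callbacks))
        results[ind] = any(outcomes)
    return results
```

Running the validator before the heuristic ensures every illegal chunk counts toward the abort threshold; a real implementation might order or limit these checks differently, for example to avoid calling an expensive validator on every chunk.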
- Parameters:
  - chunk_parser (ParseChunksWithMemory) – Instance of ParseChunksWithMemory that contains the attributes text_chunks and num_to_recall. The chunks in the text_chunks attribute are iterated over.
  - heuristic (Heuristic) – Instance of Heuristic with a check method. This should be a fast check meant to quickly discard chunks of text. Any chunk that fails this check will NOT be passed to the callback parsers.
  - legal_text_validator (LegalTextValidator) – Instance of LegalTextValidator used to validate each chunk for legal text.
  - callbacks (list, optional) – List of async callbacks that take a chunk_parser and an index as inputs and return a boolean indicating whether the text chunk was parsed successfully. By default, None, which does not use any callbacks.
  - min_chunks_to_process (int, optional) – Minimum number of chunks to process before aborting due to text not being legal. By default, 3.
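The parameter contracts above can be written down as typing.Protocol classes, together with a callback that matches the documented signature (an async function taking chunk_parser and an index and returning a bool). This is a hedged sketch: `ChunkParserLike`, `HeuristicLike`, `DemoParser`, `DemoHeuristic`, and `mentions_setback` are hypothetical names, and the `check(text) -> bool` signature is an assumption, since the documentation only states that Heuristic has a check method.

```python
import asyncio
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class ChunkParserLike(Protocol):
    """Hypothetical protocol for the attributes documented on ParseChunksWithMemory."""

    text_chunks: Sequence[str]
    num_to_recall: int


@runtime_checkable
class HeuristicLike(Protocol):
    """Hypothetical protocol: a Heuristic exposes check(); (text) -> bool is assumed."""

    def check(self, text: str) -> bool: ...


class DemoParser:
    """Hypothetical minimal object that satisfies ChunkParserLike."""

    def __init__(self, text_chunks: Sequence[str], num_to_recall: int = 2) -> None:
        self.text_chunks = text_chunks
        self.num_to_recall = num_to_recall


class DemoHeuristic:
    """Hypothetical fast check: does the chunk mention a setback?"""

    def check(self, text: str) -> bool:
        return "setback" in text.lower()


async def mentions_setback(chunk_parser: ChunkParserLike, index: int) -> bool:
    """Hypothetical callback matching the documented contract.

    Receives the chunk parser and a chunk index and returns True if the
    chunk was "parsed successfully" (here: it mentions a setback).
    """
    return "setback" in chunk_parser.text_chunks[index].lower()
```

Because each callback only needs the chunk parser and an index, it can read the current chunk (or neighboring ones) directly from text_chunks without the caller passing the text separately.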