compass.validation.content.parse_by_chunks

async parse_by_chunks(chunk_parser, heuristic, legal_text_validator, callbacks=None, min_chunks_to_process=3)

Parse text by chunks, passing each chunk to the callbacks if it is legal text.

This function iterates over the text chunks one by one and passes each chunk to the callback parsers only if the legal_text_validator check passes. Chunks that fail the heuristic check are not passed to the callbacks. If min_chunks_to_process chunks fail the legal text check, parsing is aborted.
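The flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the compass implementation: sketch_parse_by_chunks, is_legal, passes_heuristic, and demo are hypothetical stand-ins, and both the order of the two checks and the way failed chunks are counted are assumptions based on one reading of the description.

```python
import asyncio


async def sketch_parse_by_chunks(chunks, is_legal, passes_heuristic,
                                 callbacks=None, min_chunks_to_process=3):
    """Hypothetical stand-in for the flow described above."""
    callbacks = callbacks or []
    num_failed = 0
    for index, text in enumerate(chunks):
        # Assumption: the legal-text check gates every chunk.
        if not await is_legal(text):
            num_failed += 1
            # Assumption: abort once this many chunks have failed.
            if num_failed >= min_chunks_to_process:
                break
            continue
        # Chunks that fail the fast heuristic never reach the callbacks.
        if not passes_heuristic(text):
            continue
        # Run all callbacks for this chunk concurrently.
        await asyncio.gather(*(cb(chunks, index) for cb in callbacks))
    return num_failed


async def demo():
    chunks = [
        "Setbacks shall be at least 500 feet.",
        "Accept cookies to continue.",
        "Towers shall not exceed 200 feet.",
        "Home | About | Contact",
        "Share this page",
        "Fences shall be 6 feet tall.",  # never reached: parsing aborts first
    ]
    seen = []

    async def is_legal(text):
        return "shall" in text  # toy stand-in for a legal text validator

    def passes_heuristic(text):
        return "feet" in text  # toy stand-in for a heuristic check

    async def record(chunks, index):
        seen.append(index)
        return True

    await sketch_parse_by_chunks(chunks, is_legal, passes_heuristic,
                                 callbacks=[record])
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy run the third failed chunk triggers the abort, so the last chunk is never handed to the callback even though it would have passed both checks.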

Parameters:
  • chunk_parser (ParseChunksWithMemory) – Instance of ParseChunksWithMemory that contains the attributes text_chunks and num_to_recall. The chunks in the text_chunks attribute will be iterated over.

  • heuristic (Heuristic) – Instance of Heuristic with a check method. This should be a fast check meant to quickly dispose of chunks of text. Any chunk that fails this check will NOT be passed to the callback parsers.

  • legal_text_validator (LegalTextValidator) – Instance of LegalTextValidator that can be used to validate each chunk for legal text.

  • callbacks (list, optional) – List of async callbacks that take a chunk_parser and a chunk index as inputs and return a boolean indicating whether the text chunk was parsed successfully. By default, None, in which case no callbacks are used.

  • min_chunks_to_process (int, optional) – Minimum number of chunks to process before aborting due to text not being legal. By default, 3.
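As an illustration of the callback contract described above, the sketch below defines an async callable that takes a chunk parser and an index and returns a boolean. StandInChunkParser and make_keyword_callback are hypothetical names introduced only for this example. The stand-in class exposes just the two attributes named in the parameter description and is not the real ParseChunksWithMemory.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StandInChunkParser:
    """Hypothetical stand-in exposing text_chunks and num_to_recall."""
    text_chunks: list
    num_to_recall: int = 2


def make_keyword_callback(keyword, hits):
    """Build an async callback that records chunks containing `keyword`."""

    async def callback(chunk_parser, index):
        text = chunk_parser.text_chunks[index]
        if keyword not in text.lower():
            return False  # chunk was not parsed successfully
        hits.append(index)
        return True

    return callback


async def demo_callback():
    parser = StandInChunkParser(
        ["Setbacks shall be at least 500 feet.", "Unrelated text."]
    )
    hits = []
    find_setbacks = make_keyword_callback("setback", hits)
    # Call the callback directly for each chunk index.
    results = [
        await find_setbacks(parser, index)
        for index in range(len(parser.text_chunks))
    ]
    return results, hits


if __name__ == "__main__":
    print(asyncio.run(demo_callback()))
```

A callable shaped like find_setbacks is what would be passed in the callbacks list, for example callbacks=[find_setbacks].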