elm.embed.ChunkAndEmbed

class ChunkAndEmbed(text, model=None, **chunk_kwargs)[source]

Bases: ApiBase

Class to chunk text data and create embeddings

Parameters:

text (str) – Single continuous piece of text to chunk up by paragraph and embed or filepath to .txt file containing one piece of text.
model (None | str) – Optional specification of OpenAI model to use. Default is cls.DEFAULT_MODEL
chunk_kwargs (dict | None) – kwargs for initialization of elm.chunk.Chunker

Methods

`call_api`(url, headers, request_json)	Make an asyncronous OpenAI API call.
`call_api_async`(url, headers, all_request_jsons)	Use GPT to clean raw pdf text in parallel calls to the OpenAI API.
`chat`(query[, temperature])	Have a continuous chat with the LLM including context from previous chat() calls stored as attributes in this class.
`clean_tables`(text)	Make sure that table headers are in the same paragraph as the table itself.
`clear`()	Clear chat history and reduce messages to just the initial model role message.
`count_tokens`(text, model[, fallback_model])	Return the number of tokens in a string.
`generic_async_query`(queries[, model_role, ...])	Run a number of generic single queries asynchronously (not conversational)
`generic_query`(query[, model_role, temperature])	Ask a generic single query without conversation
`get_embedding`(text)	Get the 1D array (list) embedding of a text string.
`run`([rate_limit])	Run text embedding in serial
`run_async`([rate_limit])	Run text embedding on chunks asynchronously

Attributes

`DEFAULT_MODEL`	Default model to do embeddings.
`EMBEDDING_MODEL`	Default model to do text embeddings.
`EMBEDDING_URL`	OpenAI embedding API URL
`HEADERS`	OpenAI API Headers
`MODEL_ROLE`	High level model role
`TOKENIZER_ALIASES`	Optional mappings for unusual Azure names to tiktoken/openai names.
`TOKENIZER_PATTERNS`	Order-prioritized list of model sub-strings to look for in model name to send to tokenizer.
`URL`	OpenAI API URL to be used with environment variable OPENAI_API_KEY.
`all_messages_txt`	Get a string printout of the full conversation with the LLM

DEFAULT_MODEL = 'text-embedding-ada-002': Default model to do embeddings.

EMBEDDING_MODEL = 'text-embedding-ada-002': Default model to do text embeddings.

EMBEDDING_URL = 'https://api.openai.com/v1/embeddings': OpenAI embedding API URL

HEADERS = {'Authorization': 'Bearer None', 'Content-Type': 'application/json', 'api-key': 'None'}: OpenAI API Headers

MODEL_ROLE = 'You are a research assistant that answers questions.': High level model role

TOKENIZER_ALIASES = {'gpt-35-turbo': 'gpt-3.5-turbo', 'gpt-4-32k': 'gpt-4-32k-0314', 'llmev-gpt-4-32k': 'gpt-4-32k-0314', 'wetosa-gpt-4': 'gpt-4', 'wetosa-gpt-4-standard': 'gpt-4', 'wetosa-gpt-4o': 'gpt-4o'}: Optional mappings for unusual Azure names to tiktoken/openai names.

TOKENIZER_PATTERNS = ('gpt-5', 'gpt-4o', 'gpt-4-32k', 'gpt-4'): Order-prioritized list of model sub-strings to look for in model name to send to tokenizer. As an alternative to alias lookup, this will use the tokenizer pattern if found in the model string

URL = 'https://api.openai.com/v1/chat/completions': OpenAI API URL to be used with environment variable OPENAI_API_KEY. Use an Azure API endpoint to trigger Azure usage along with environment variables AZURE_OPENAI_KEY, AZURE_OPENAI_VERSION, and AZURE_OPENAI_ENDPOINT

property all_messages_txt

Get a string printout of the full conversation with the LLM

Returns:: str

async static call_api(url, headers, request_json)

Make an asyncronous OpenAI API call.

Parameters:

url (str) –

OpenAI API url, typically either:
https://api.openai.com/v1/embeddings https://api.openai.com/v1/chat/completions
headers (dict) –

OpenAI API headers, typically:

{“Content-Type”: “application/json”,
“Authorization”: f”Bearer {openai.api_key}”}
request_json (dict) –

API data input, typically looks like this for chat completion:

{“model”: “gpt-3.5-turbo”,

“messages”: [{“role”: “system”, “content”: “You do this…”},
{“role”: “user”, “content”: “Do this: {}”}],

“temperature”: 0.0}

Returns:

out (dict) – API response in json format

async call_api_async(url, headers, all_request_jsons, ignore_error=None, rate_limit=40000.0)

Use GPT to clean raw pdf text in parallel calls to the OpenAI API.

NOTE: you need to call this using the await command in ipython or jupyter, e.g.: out = await PDFtoTXT.clean_txt_async()

Parameters:

url (str) –

OpenAI API url, typically either:
https://api.openai.com/v1/embeddings https://api.openai.com/v1/chat/completions
headers (dict) –

OpenAI API headers, typically:

{“Content-Type”: “application/json”,
“Authorization”: f”Bearer {openai.api_key}”}
all_request_jsons (list) – List of API data input, one entry typically looks like this for chat completion:

{“model”: “gpt-3.5-turbo”,

“messages”: [{“role”: “system”, “content”: “You do this…”},
{“role”: “user”, “content”: “Do this: {}”}],

“temperature”: 0.0}
ignore_error (None | callable) – Optional callable to parse API error string. If the callable returns True, the error will be ignored, the API call will not be tried again, and the output will be an empty string.
rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the gpt-3.5-turbo limit is 90k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.

Returns:

out (list) – List of API outputs where each list entry is a GPT answer from the corresponding message in the all_request_jsons input.

chat(query, temperature=0)

Have a continuous chat with the LLM including context from previous chat() calls stored as attributes in this class.

Parameters:

query (str) – Question to ask ChatGPT
temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 will give the model more creative freedom and may not return as factual of results.

Returns:

response (str) – Model response

static clean_tables(text)[source]: Make sure that table headers are in the same paragraph as the table itself. Typically, tables are looked for with pipes and hyphens, which is how GPT cleans tables in text.

clear(): Clear chat history and reduce messages to just the initial model role message.

classmethod count_tokens(text, model, fallback_model='gpt-5')

Return the number of tokens in a string.

Parameters:

text (str) – Text string to get number of tokens for
model (str) – specification of OpenAI model to use (e.g., “gpt-3.5-turbo”)
fallback_model (str, default=’gpt-5’) – Model to be used for tokenizer if input model can’t be found in TOKENIZER_ALIASES and doesn’t have any easily noticeable patterns.

Returns:

n (int) – Number of tokens in text

async generic_async_query(queries, model_role=None, temperature=0, ignore_error=None, rate_limit=40000.0)

Run a number of generic single queries asynchronously (not conversational)

NOTE: you need to call this using the await command in ipython or jupyter, e.g.: out = await Summary.run_async()

Parameters:

query (list) – Questions to ask ChatGPT (list of strings)
model_role (str | None) – Role for the model to take, e.g.: “You are a research assistant”. This defaults to self.MODEL_ROLE
temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 will give the model more creative freedom and may not return as factual of results.
ignore_error (None | callable) – Optional callable to parse API error string. If the callable returns True, the error will be ignored, the API call will not be tried again, and the output will be an empty string.
rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the gpt-3.5-turbo limit is 90k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.

Returns:

response (list) – Model responses with same length as query input.

generic_query(query, model_role=None, temperature=0)

Ask a generic single query without conversation

Parameters:

query (str) – Question to ask ChatGPT
model_role (str | None) – Role for the model to take, e.g.: “You are a research assistant”. This defaults to self.MODEL_ROLE
temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 will give the model more creative freedom and may not return as factual of results.

Returns:

response (str) – Model response

classmethod get_embedding(text)

Get the 1D array (list) embedding of a text string.

Parameters:: text (str) – Text to embed
Returns:: embedding (list) – List of float that represents the numerical embedding of the text

run(rate_limit=175000.0)[source]

Run text embedding in serial

Parameters:: rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the embedding limit is 350k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.
Returns:: embedding (list) – List of 1D arrays representing the embeddings for all text chunks

async run_async(rate_limit=175000.0)[source]

Run text embedding on chunks asynchronously

NOTE: you need to call this using the await command in ipython or jupyter, e.g.: out = await ChunkAndEmbed.run_async()

Parameters:: rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the embedding limit is 350k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.
Returns:: embedding (list) – List of 1D arrays representing the embeddings for all text chunks