elm.summary.Summary

class Summary(text, model=None, n_words=500, **chunk_kwargs)[source]

Bases: ApiBase

Interface to perform Recursive Summarization and Distillation of research text

Parameters:
  • text (str | list) – Single body of text to chunk up using elm.Chunker or a pre-chunked list of strings. Works well if this is a single document with empty lines between paragraphs.

  • model (str) – GPT model name, default is the DEFAULT_MODEL global var

  • n_words (int) – Desired length of the output text. Note that this is never perfect but helps guide the LLM to an approximate desired output length. 400-600 words seems to work quite well with GPT-4. This gets formatted into the MODEL_INSTRUCTION attribute.

  • chunk_kwargs (dict | None) – kwargs for initialization of elm.chunk.Chunker

Methods

call_api(url, headers, request_json)

Make an asyncronous OpenAI API call.

call_api_async(url, headers, all_request_jsons)

Use GPT to clean raw pdf text in parallel calls to the OpenAI API.

chat(query[, temperature])

Have a continuous chat with the LLM including context from previous chat() calls stored as attributes in this class.

clear()

Clear chat history and reduce messages to just the initial model role message.

combine(text_summary)

Combine separate chunk summaries into one more comprehensive narrative

count_tokens(text, model)

Return the number of tokens in a string.

generic_async_query(queries[, model_role, ...])

Run a number of generic single queries asynchronously (not conversational)

generic_query(query[, model_role, temperature])

Ask a generic single query without conversation

get_embedding(text)

Get the 1D array (list) embedding of a text string.

run([temperature, fancy_combine])

Use GPT to do a summary of input text.

run_async([temperature, ignore_error, ...])

Run text summary asynchronously for all text chunks

Attributes

DEFAULT_MODEL

Default model to do pdf text cleaning.

EMBEDDING_MODEL

Default model to do text embeddings.

EMBEDDING_URL

OpenAI embedding API URL

HEADERS

OpenAI API Headers

MODEL_INSTRUCTION

Prefix to the engineered prompt.

MODEL_ROLE

High level model role, somewhat redundant to MODEL_INSTRUCTION

TOKENIZER_ALIASES

Optional mappings for unusual Azure names to tiktoken/openai names.

URL

OpenAI API URL to be used with environment variable OPENAI_API_KEY.

all_messages_txt

Get a string printout of the full conversation with the LLM

MODEL_ROLE = 'You are an energy scientist summarizing prior research'

High level model role, somewhat redundant to MODEL_INSTRUCTION

MODEL_INSTRUCTION = 'Can you please summarize the text quoted above in {n_words} words?\n\n"""\n{text_chunk}\n"""'

Prefix to the engineered prompt. The format args text_chunk and n_words will be formatted by the Summary class at runtime. text_chunk will be provided by the Summary text chunks, n_words is an initialization argument for the Summary class.

DEFAULT_MODEL = 'gpt-3.5-turbo'

Default model to do pdf text cleaning.

EMBEDDING_MODEL = 'text-embedding-ada-002'

Default model to do text embeddings.

EMBEDDING_URL = 'https://api.openai.com/v1/embeddings'

OpenAI embedding API URL

HEADERS = {'Authorization': 'Bearer None', 'Content-Type': 'application/json', 'api-key': 'None'}

OpenAI API Headers

TOKENIZER_ALIASES = {'gpt-35-turbo': 'gpt-3.5-turbo', 'gpt-4-32k': 'gpt-4-32k-0314', 'llmev-gpt-4-32k': 'gpt-4-32k-0314'}

Optional mappings for unusual Azure names to tiktoken/openai names.

URL = 'https://api.openai.com/v1/chat/completions'

OpenAI API URL to be used with environment variable OPENAI_API_KEY. Use an Azure API endpoint to trigger Azure usage along with environment variables AZURE_OPENAI_KEY, AZURE_OPENAI_VERSION, and AZURE_OPENAI_ENDPOINT

property all_messages_txt

Get a string printout of the full conversation with the LLM

Returns:

str

async static call_api(url, headers, request_json)

Make an asyncronous OpenAI API call.

Parameters:
  • url (str) –

    OpenAI API url, typically either:

    https://api.openai.com/v1/embeddings https://api.openai.com/v1/chat/completions

  • headers (dict) –

    OpenAI API headers, typically:
    {“Content-Type”: “application/json”,

    “Authorization”: f”Bearer {openai.api_key}”}

  • request_json (dict) –

    API data input, typically looks like this for chat completion:
    {“model”: “gpt-3.5-turbo”,
    “messages”: [{“role”: “system”, “content”: “You do this…”},

    {“role”: “user”, “content”: “Do this: {}”}],

    “temperature”: 0.0}

Returns:

out (dict) – API response in json format

async call_api_async(url, headers, all_request_jsons, ignore_error=None, rate_limit=40000.0)

Use GPT to clean raw pdf text in parallel calls to the OpenAI API.

NOTE: you need to call this using the await command in ipython or jupyter, e.g.: out = await PDFtoTXT.clean_txt_async()

Parameters:
  • url (str) –

    OpenAI API url, typically either:

    https://api.openai.com/v1/embeddings https://api.openai.com/v1/chat/completions

  • headers (dict) –

    OpenAI API headers, typically:
    {“Content-Type”: “application/json”,

    “Authorization”: f”Bearer {openai.api_key}”}

  • all_request_jsons (list) – List of API data input, one entry typically looks like this for chat completion:

    {“model”: “gpt-3.5-turbo”,
    “messages”: [{“role”: “system”, “content”: “You do this…”},

    {“role”: “user”, “content”: “Do this: {}”}],

    “temperature”: 0.0}

  • ignore_error (None | callable) – Optional callable to parse API error string. If the callable returns True, the error will be ignored, the API call will not be tried again, and the output will be an empty string.

  • rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the gpt-3.5-turbo limit is 90k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.

Returns:

out (list) – List of API outputs where each list entry is a GPT answer from the corresponding message in the all_request_jsons input.

chat(query, temperature=0)

Have a continuous chat with the LLM including context from previous chat() calls stored as attributes in this class.

Parameters:
  • query (str) – Question to ask ChatGPT

  • temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 will give the model more creative freedom and may not return as factual of results.

Returns:

response (str) – Model response

clear()

Clear chat history and reduce messages to just the initial model role message.

classmethod count_tokens(text, model)

Return the number of tokens in a string.

Parameters:
  • text (str) – Text string to get number of tokens for

  • model (str) – specification of OpenAI model to use (e.g., “gpt-3.5-turbo”)

Returns:

n (int) – Number of tokens in text

async generic_async_query(queries, model_role=None, temperature=0, ignore_error=None, rate_limit=40000.0)

Run a number of generic single queries asynchronously (not conversational)

NOTE: you need to call this using the await command in ipython or jupyter, e.g.: out = await Summary.run_async()

Parameters:
  • query (list) – Questions to ask ChatGPT (list of strings)

  • model_role (str | None) – Role for the model to take, e.g.: “You are a research assistant”. This defaults to self.MODEL_ROLE

  • temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 will give the model more creative freedom and may not return as factual of results.

  • ignore_error (None | callable) – Optional callable to parse API error string. If the callable returns True, the error will be ignored, the API call will not be tried again, and the output will be an empty string.

  • rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the gpt-3.5-turbo limit is 90k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.

Returns:

response (list) – Model responses with same length as query input.

generic_query(query, model_role=None, temperature=0)

Ask a generic single query without conversation

Parameters:
  • query (str) – Question to ask ChatGPT

  • model_role (str | None) – Role for the model to take, e.g.: “You are a research assistant”. This defaults to self.MODEL_ROLE

  • temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 will give the model more creative freedom and may not return as factual of results.

Returns:

response (str) – Model response

classmethod get_embedding(text)

Get the 1D array (list) embedding of a text string.

Parameters:

text (str) – Text to embed

Returns:

embedding (list) – List of float that represents the numerical embedding of the text

combine(text_summary)[source]

Combine separate chunk summaries into one more comprehensive narrative

Parameters:

summary (str) – Summary of text. May be several disjointed paragraphs

Returns:

summary (str) – Summary of text. Paragraphs will be more cohesive.

run(temperature=0, fancy_combine=True)[source]

Use GPT to do a summary of input text.

Parameters:
  • temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 will give the model more creative freedom and may not return as factual of results.

  • fancy_combine (bool) – Flag to use the GPT model to combine the separate outputs into a cohesive summary.

Returns:

summary (str) – Summary of text.

async run_async(temperature=0, ignore_error=None, rate_limit=40000.0, fancy_combine=True)[source]

Run text summary asynchronously for all text chunks

NOTE: you need to call this using the await command in ipython or jupyter, e.g.: out = await Summary.run_async()

Parameters:
  • temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 will give the model more creative freedom and may not return as factual of results.

  • ignore_error (None | callable) – Optional callable to parse API error string. If the callable returns True, the error will be ignored, the API call will not be tried again, and the output will be an empty string.

  • rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the gpt-3.5-turbo limit is 90k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.

  • fancy_combine (bool) – Flag to use the GPT model to combine the separate outputs into a cohesive summary.

Returns:

summary (str) – Summary of text.