elm.wizard.EnergyWizardPostgres

class EnergyWizardPostgres(db_host, db_port, db_name, db_schema, db_table, probes=25, meta_columns=None, cursor=None, boto_client=None, model=None, token_budget=3500, tag=False)[source]

Bases: EnergyWizardBase

Interface to ask OpenAI LLMs about energy research.

This class is for execution with a Postgres vector database. Querying the database requires the psycopg2 and boto3 Python packages, the environment variables ‘EWIZ_DB_USER’ and ‘EWIZ_DB_PASSWORD’ storing the database user and password, and the specification of other connection parameters such as host, port, and name. The database has the following columns: id, embedding, chunks, and metadata.

This class is designed as follows:

  • Vector database: PostgreSQL, accessed using psycopg2.

  • Query embedding: AWS Bedrock, accessed using boto3.

  • LLM application: GPT-4 via an Azure deployment.

A minimal instantiation sketch follows the parameter list below.

Parameters:
  • db_host (str) – Host url for postgres database.

  • db_port (str) – Port for Postgres database, e.g. ‘5432’.

  • db_name (str) – Postgres database name.

  • db_schema (str) – Schema name for Postgres database.

  • db_table (str) – Table to query in Postgres database. Necessary columns: id, chunks, embedding, title, and url.

  • probes (int) – Number of lists to search in vector database. Recommended value is sqrt(n_lists).

  • meta_columns (list) – List of metadata columns to retrieve from database. Default query returns title, url, nrel_id, and id. nrel_id and id are necessary to correctly format references.

  • cursor (psycopg2.extensions.cursor) – PostgreSQL database cursor used to execute queries.

  • boto_client (botocore.client.BedrockRuntime) – AWS boto3 client used to access embedding model.

  • model (str) – GPT model name, default is the DEFAULT_MODEL global var

  • token_budget (int) – Number of tokens that can be embedded in the prompt. Note that the default budget for GPT-3.5-Turbo is 4096, but you want to subtract some tokens to account for the response budget.

  • tag (bool) – Flag to add tag/metadata to text chunks before sending query to GPT.
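
A minimal instantiation sketch, assuming placeholder credentials and connection details (the host, database, schema, and table names below are hypothetical; replace them with your own):

    import os
    from elm.wizard import EnergyWizardPostgres

    # The class reads the database user and password from these
    # environment variables (values here are placeholders).
    os.environ['EWIZ_DB_USER'] = 'wizard_user'
    os.environ['EWIZ_DB_PASSWORD'] = 'wizard_password'

    wizard = EnergyWizardPostgres(
        db_host='example-host.rds.amazonaws.com',  # hypothetical host
        db_port='5432',
        db_name='ewiz_db',        # hypothetical database name
        db_schema='ewiz_schema',  # hypothetical schema name
        db_table='ewiz_kb',       # table with id, chunks, embedding, title, url
        probes=25,
        token_budget=3500,
    )

The cursor and boto_client arguments default to None; pre-built psycopg2 and boto3 objects can also be passed explicitly as described in the parameter list above.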

Methods

call_api(url, headers, request_json)

Make an asynchronous OpenAI API call.

call_api_async(url, headers, all_request_jsons)

Use GPT to clean raw pdf text in parallel calls to the OpenAI API.

chat(query[, debug, stream, temperature, ...])

Answers a query by doing a semantic search of relevant text with embeddings and then sending engineered query to the LLM.

clear()

Clear chat history and reduce messages to just the initial model role message.

count_tokens(text, model)

Return the number of tokens in a string.

engineer_query(query[, token_budget, ...])

Engineer a query for GPT using the corpus of information

generic_async_query(queries[, model_role, ...])

Run a number of generic single queries asynchronously (not conversational)

generic_query(query[, model_role, temperature])

Ask a generic single query without conversation

get_embedding(text)

Get the 1D array (list) embedding of a text string as generated by specified AWS model.

make_ref_list(ids)

Make a reference list.

query_vector_db(query[, limit])

Returns a list of strings and relatednesses, sorted from most related to least.

Attributes

DEFAULT_META_COLS

Default columns to retrieve for metadata

DEFAULT_MODEL

Default model to do pdf text cleaning.

EMBEDDING_MODEL

Default model to do text embeddings.

EMBEDDING_URL

OpenAI embedding API URL

HEADERS

OpenAI API Headers

MODEL_INSTRUCTION

Prefix to the engineered prompt

MODEL_ROLE

High-level model role, somewhat redundant with MODEL_INSTRUCTION

TOKENIZER_ALIASES

Optional mappings from nonstandard Azure deployment names to tiktoken/OpenAI model names.

URL

OpenAI API URL to be used with environment variable OPENAI_API_KEY.

all_messages_txt

Get a string printout of the full conversation with the LLM

EMBEDDING_MODEL = 'amazon.titan-embed-text-v1'

Default model to do text embeddings.

TOKENIZER_ALIASES = {'ewiz-gpt-4': 'gpt-4', 'gpt-35-turbo': 'gpt-3.5-turbo', 'gpt-4-32k': 'gpt-4-32k-0314', 'llmev-gpt-4-32k': 'gpt-4-32k-0314'}

Optional mappings from nonstandard Azure deployment names to tiktoken/OpenAI model names.

DEFAULT_META_COLS = ['title', 'url', 'nrel_id', 'id']

Default columns to retrieve for metadata

DEFAULT_MODEL = 'gpt-3.5-turbo'

Default model to do pdf text cleaning.

EMBEDDING_URL = 'https://api.openai.com/v1/embeddings'

OpenAI embedding API URL

HEADERS = {'Authorization': 'Bearer None', 'Content-Type': 'application/json', 'api-key': 'None'}

OpenAI API Headers

MODEL_INSTRUCTION = 'Use the information below to answer the subsequent question. If the answer cannot be found in the text, write "I could not find an answer."'

Prefix to the engineered prompt

MODEL_ROLE = 'You parse through articles to answer questions.'

High-level model role, somewhat redundant with MODEL_INSTRUCTION

URL = 'https://api.openai.com/v1/chat/completions'

OpenAI API URL to be used with environment variable OPENAI_API_KEY. Use an Azure API endpoint to trigger Azure usage along with environment variables AZURE_OPENAI_KEY, AZURE_OPENAI_VERSION, and AZURE_OPENAI_ENDPOINT
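
A sketch of the environment setup described above for targeting Azure (all values are placeholders, and the API version shown is just an example; overriding the class URL attribute is one assumed way to point requests at the Azure endpoint):

    import os
    from elm.wizard import EnergyWizardPostgres

    os.environ['AZURE_OPENAI_KEY'] = '<your-azure-key>'  # placeholder
    os.environ['AZURE_OPENAI_VERSION'] = '2023-05-15'    # example API version
    os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://<resource>.openai.azure.com/'

    # Assumption: pointing URL at the Azure endpoint triggers Azure usage.
    EnergyWizardPostgres.URL = os.environ['AZURE_OPENAI_ENDPOINT']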

property all_messages_txt

Get a string printout of the full conversation with the LLM

Returns:

str

async static call_api(url, headers, request_json)

Make an asynchronous OpenAI API call.

Parameters:
  • url (str) –

    OpenAI API url, typically either:

    https://api.openai.com/v1/embeddings
    https://api.openai.com/v1/chat/completions

  • headers (dict) –

    OpenAI API headers, typically:

        {"Content-Type": "application/json",
         "Authorization": f"Bearer {openai.api_key}"}

  • request_json (dict) –

    API data input, typically looks like this for chat completion:

        {"model": "gpt-3.5-turbo",
         "messages": [{"role": "system", "content": "You do this…"},
                      {"role": "user", "content": "Do this: {}"}],
         "temperature": 0.0}

Returns:

out (dict) – API response in json format
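
A usage sketch of this static method from a regular script (the question text is made up and OPENAI_API_KEY must be set):

    import asyncio
    import os
    from elm.wizard import EnergyWizardPostgres

    headers = {'Content-Type': 'application/json',
               'Authorization': f"Bearer {os.environ['OPENAI_API_KEY']}"}
    request_json = {'model': 'gpt-3.5-turbo',
                    'messages': [{'role': 'system',
                                  'content': 'You parse through articles '
                                             'to answer questions.'},
                                 {'role': 'user',
                                  'content': 'What is an energy community?'}],
                    'temperature': 0.0}

    # call_api is an async static method, so it needs an event loop.
    out = asyncio.run(EnergyWizardPostgres.call_api(
        'https://api.openai.com/v1/chat/completions', headers, request_json))
    print(out)  # raw API response as a dict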

async call_api_async(url, headers, all_request_jsons, ignore_error=None, rate_limit=40000.0)

Use GPT to clean raw pdf text in parallel calls to the OpenAI API.

NOTE: you need to call this using the await keyword in ipython or jupyter, e.g.: out = await PDFtoTXT.clean_txt_async()

Parameters:
  • url (str) –

    OpenAI API url, typically either:

    https://api.openai.com/v1/embeddings
    https://api.openai.com/v1/chat/completions

  • headers (dict) –

    OpenAI API headers, typically:

        {"Content-Type": "application/json",
         "Authorization": f"Bearer {openai.api_key}"}

  • all_request_jsons (list) – List of API data input, one entry typically looks like this for chat completion:

        {"model": "gpt-3.5-turbo",
         "messages": [{"role": "system", "content": "You do this…"},
                      {"role": "user", "content": "Do this: {}"}],
         "temperature": 0.0}

  • ignore_error (None | callable) – Optional callable to parse API error string. If the callable returns True, the error will be ignored, the API call will not be tried again, and the output will be an empty string.

  • rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the gpt-3.5-turbo limit is 90k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.

Returns:

out (list) – List of API outputs where each list entry is a GPT answer from the corresponding message in the all_request_jsons input.
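
A sketch of a batch of chat-completion requests from a Jupyter cell (per the NOTE above, use await directly; wizard and headers come from the earlier sketches, and the questions are made up):

    all_request_jsons = [
        {'model': 'gpt-3.5-turbo',
         'messages': [{'role': 'system',
                       'content': 'You parse through articles '
                                  'to answer questions.'},
                      {'role': 'user', 'content': question}],
         'temperature': 0.0}
        for question in ['What is solar PV?', 'What drives wind LCOE?']]

    out = await wizard.call_api_async(
        'https://api.openai.com/v1/chat/completions', headers,
        all_request_jsons, rate_limit=40000.0)
    # out[i] is the GPT answer for all_request_jsons[i]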

chat(query, debug=True, stream=True, temperature=0, convo=False, token_budget=None, new_info_threshold=0.7, print_references=False, return_chat_obj=False)

Answers a query by doing a semantic search of relevant text with embeddings and then sending engineered query to the LLM.

Parameters:
  • query (str) – Question being asked of EnergyWizard

  • debug (bool) – Flag to return extra diagnostics on the engineered question.

  • stream (bool) – Flag to print subsequent chunks of the response in a streaming fashion

  • temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 gives the model more creative freedom and may return less factual results.

  • convo (bool) – Flag to perform semantic search with full conversation history (True) or just the single query (False). Call EnergyWizard.clear() to reset the chat history.

  • token_budget (int) – Option to override the class init token budget.

  • new_info_threshold (float) – New text added to the engineered query must contain at least this much new information. This helps prevent (for example) the table of contents being added multiple times.

  • print_references (bool) – Flag to print references if EnergyWizard is initialized with a valid ref_col.

  • return_chat_obj (bool) – Flag to only return the ChatCompletion from OpenAI API.

Returns:

  • response (str) – GPT output / answer.

  • query (str) – If debug is True, the engineered query asked of GPT will also be returned here

  • references (list) – If debug is True, the list of references (strs) used in the engineered prompt is returned here
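
A minimal usage sketch, assuming the wizard object from the instantiation example above (the question is made up):

    response, engineered_query, references = wizard.chat(
        'What are the benefits of agrivoltaics?',
        debug=True,    # also return the engineered query and references
        stream=False,  # capture the answer instead of streaming chunks
        temperature=0,
        print_references=True)
    print(response)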

clear()

Clear chat history and reduce messages to just the initial model role message.

classmethod count_tokens(text, model)

Return the number of tokens in a string.

Parameters:
  • text (str) – Text string to get number of tokens for

  • model (str) – specification of OpenAI model to use (e.g., “gpt-3.5-turbo”)

Returns:

n (int) – Number of tokens in text
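
Since this is a classmethod, it can be called without an instance; a small sketch of using it for budget checks (the text is arbitrary):

    from elm.wizard import EnergyWizardPostgres

    text = 'Behind-the-meter storage can reduce demand charges.'
    n = EnergyWizardPostgres.count_tokens(text, model='gpt-3.5-turbo')
    assert n < 3500, 'chunk alone exceeds the default token budget'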

engineer_query(query, token_budget=None, new_info_threshold=0.7, convo=False)

Engineer a query for GPT using the corpus of information

Parameters:
  • query (str) – Question being asked of GPT

  • token_budget (int) – Option to override the class init token budget.

  • new_info_threshold (float) – New text added to the engineered query must contain at least this much new information. This helps prevent (for example) the table of contents being added multiple times.

  • convo (bool) – Flag to perform semantic search with full conversation history (True) or just the single query (False). Call EnergyWizard.clear() to reset the chat history.

Returns:

  • message (str) – Engineered question to GPT including information from corpus and the original query

  • references (list) – The list of references (strs) used in the engineered prompt is returned here
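
A usage sketch, assuming the wizard object from the instantiation example above:

    message, references = wizard.engineer_query(
        'How efficient are perovskite solar cells?',
        token_budget=3000,  # override the class init budget
        new_info_threshold=0.7)
    print(message)     # corpus context plus the original question
    print(references)  # reference strings used to build the prompt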

async generic_async_query(queries, model_role=None, temperature=0, ignore_error=None, rate_limit=40000.0)

Run a number of generic single queries asynchronously (not conversational)

NOTE: you need to call this using the await keyword in ipython or jupyter, e.g.: out = await Summary.run_async()

Parameters:
  • queries (list) – Questions to ask ChatGPT (list of strings)

  • model_role (str | None) – Role for the model to take, e.g.: “You are a research assistant”. This defaults to self.MODEL_ROLE

  • temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 gives the model more creative freedom and may return less factual results.

  • ignore_error (None | callable) – Optional callable to parse API error string. If the callable returns True, the error will be ignored, the API call will not be tried again, and the output will be an empty string.

  • rate_limit (float) – OpenAI API rate limit (tokens / minute). Note that the gpt-3.5-turbo limit is 90k as of 4/2023, but we’re using a large factor of safety (~1/2) because we can only count the tokens on the input side and assume the output is about the same count.

Returns:

response (list) – Model responses with same length as query input.
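
A sketch from a Jupyter cell (per the NOTE above, use await directly; wizard is from the instantiation example and the questions are made up):

    queries = ['Summarize the pros and cons of offshore wind.',
               'Summarize the pros and cons of rooftop solar.']
    responses = await wizard.generic_async_query(
        queries, model_role='You are a research assistant', temperature=0)
    # responses has the same length and order as queries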

generic_query(query, model_role=None, temperature=0)

Ask a generic single query without conversation

Parameters:
  • query (str) – Question to ask ChatGPT

  • model_role (str | None) – Role for the model to take, e.g.: “You are a research assistant”. This defaults to self.MODEL_ROLE

  • temperature (float) – GPT model temperature, a measure of response entropy from 0 to 1. 0 is more reliable and nearly deterministic; 1 gives the model more creative freedom and may return less factual results.

Returns:

response (str) – Model response

get_embedding(text)[source]

Get the 1D array (list) embedding of a text string as generated by specified AWS model.

Parameters:

text (str) – Text to embed

Returns:

embedding (list) – List of floats representing the numerical embedding of the text
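
A usage sketch, assuming wizard was built with working AWS credentials so the boto3 Bedrock client can reach the Titan model:

    embedding = wizard.get_embedding('Geothermal heat pumps for homes')
    print(len(embedding))  # e.g., 1536 for amazon.titan-embed-text-v1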

query_vector_db(query, limit=100)[source]

Returns a list of strings and relatednesses, sorted from most related to least. The SQL query uses a context handler and rollback to ensure a failed query does not interrupt future questions from the user. For example, a user submitting a new question before the first one completes would otherwise close the cursor, preventing future database access.

Parameters:
  • query (str) – Question being asked of GPT

  • limit (int) – Number of top results to return.

Returns:

  • strings (np.ndarray) – 1D array of related strings

  • score (np.ndarray) – 1D array of float scores of strings

  • ids (np.ndarray) – 1D array of IDs in the text corpus corresponding to the ranked strings/scores outputs.
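
A usage sketch, assuming the wizard object from the instantiation example above (the query text is made up):

    strings, scores, ids = wizard.query_vector_db(
        'transmission interconnection queues', limit=10)
    for text, score in zip(strings[:3], scores[:3]):
        print(f'{score:.3f}  {text[:80]}')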

make_ref_list(ids)[source]

Make a reference list. The SQL query uses a context handler and rollback to ensure a failed query does not interrupt future questions from the user. For example, a user submitting a new question before the first one completes would otherwise close the cursor, preventing future database access.

Parameters:

ids (np.ndarray) – IDs of the used text from the text corpus

Returns:

ref_list (list) – A list of references (strs) used. Ideally, this is something like: [“{ref_title} ({ref_url})”]
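
This method is typically chained with the ids output of query_vector_db(); a usage sketch, assuming the wizard object from the instantiation example above:

    strings, scores, ids = wizard.query_vector_db('grid-scale batteries',
                                                  limit=5)
    ref_list = wizard.make_ref_list(ids)
    for ref in ref_list:
        print(ref)  # ideally something like "{ref_title} ({ref_url})"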