The Energy Wizard
This example demonstrates how to download a set of PDFs, convert to text, chunk, embed, and then run a streamlit app that interfaces an LLM with the text corpus.
Notes:
In this example, we use the optional popper PDF utility which you will have to install separately. You can also use the python-native
PyPDF2
package when calling usingelm.pdf.PDFtoTXT
but we have found that poppler works better.Streamlit is required to run this app, which is not an explicit requirement of this repo (
pip install streamlit
)You need to set up your own OpenAI or Azure-OpenAI API keys to run the scripts.
Downloading and Embedding PDFs
Run python ./retrieve_docs.py
to retrieve 20 of the latest NREL technical
reports from OSTI. The script then converts the PDFs to text and then runs the
text through the OpenAI embedding model.
Running the Streamlit App
Run streamlit run ./run_app.py
to start the streamlit app. You can now chat
with the Energy Wizard, which will interface with the downloaded text corpus to
answer your questions.