Update 7/31 - Two new samples added for OpenAI
- Forward-Looking Active REtrieval augmented generation (FLARE)
-
- Sample: /SearchWithOpenAI/pages/150_FLARE_AOI.py
- Self-Ask prompting (building on Chain of Thought COT)
-
- Sample: /SearchWithOpenAI/pages/160_Self_Ask_AOI.py
Learn more about Direct Ask, Chain of Thought, Self Ask, and Forward Looking Active REtrieval aumented generation (FLARE) - Medium article - 7 min easy read
This project will help you learn the basics to -
- Index multiple documents in a repository using HuggingFace embeddings.
- Save the indexes in Chroma & FAISS for recall.
- Cleanup / delete the indexes to start fresh.
- Query the FAISS and Chroma index / db.
- Search using embeddings (without having to sign up for any OpenAI) for similarity text snippets.
- Search with either OpenAI or Azure OpenAI. You will use the "Which API would you like to use?" page to choose the API to use.
- Bonus: Get details on cost of the call (AI tokens and cost) and also get similar information document search on the store.
Link to the video recording - https://youtu.be/VaShd-0UoGg
- Clean up indexes, add your documents to store (options: FAISS / Chroma / both)
- Choose your OpenAI (options: OpenAI / Azure OpenAI) See how
- Q&A from the set of uploaded documents (options: FAISS / Chroma) See how
- Your own chatgpt (options: prompt engineer with your customized prompts) See how
- Search a specific website / whole web and feed to OpenAI for extracting informaiton (options: OpenAI / Azure OpenAI :: sample uses Bing Search) See how
You can choose between OpenAI and Azure OpenAI for running the samples. Remember too update .env with keys and values. Back to all features
Now go ahead, and ask you questions about the two "State of the Union" speech pdfs and txt you had indexed (some ideas - when were the speeches made? What were the dates that the speeches were made? Summarize both speeches in 100 words or less. What did the president say about affordable care act?) Back to all features
NEW Sample - Your own ChatGPT will be as below Back to all features
New Sample - search your own website of the whole web. Feed the results to ChatGPT for summarizing or answering questions. Back to all features
To use the script, you will need to follow these steps:
- Clone the repository via
git clone https://github.com/ushakrishnan/SearchWithOpenAI.git
andcd SearchWithOpenAI
into the cloned repository. - Install the required packages:
pip install -r requirements.txt
- Run
playwright install
to install playwright - Copy the .env.template file to .env:
cp .env.template .env
. This is where you will set the following variables. - Set your OpenAI API key in the OPENAI_API_KEY and Azure OpenAI details in the env file.
- Run the script:
streamlit run Home.py
- Streamlit - https://github.com/streamlit/streamlit
- LangChain - https://github.com/hwchase17/langchain
- OpenAI - https://github.com/openai/openai-python
- Chroma - https://github.com/chroma-core/chroma
- TikToken - https://github.com/openai/tiktoken
- pypdf - https://github.com/py-pdf/pypdf
- Unstructured - https://github.com/Unstructured-IO/unstructured
- Cython - https://github.com/cython/cython
- Faiss - https://github.com/facebookresearch/faiss
- Streamlit-Pages - https://github.com/blackary/st_pages
- Requests - https://pypi.org/project/requests/
Have fun creating making your own searchable library of PDFs.
This repository is for educational purposes only. However, if you find an error in the code, feel free to open an issue or submit a pull request.
If you encounter any issues or have suggestions for improvements, please open an issue in the GitHub repository. We appreciate your feedback and will do our best to address any problems.
This project is licensed under the MIT License. Feel free to modify and distribute the code as per the terms of the license.