This project implements a PDF document question-answering system using LangChain, OpenAI, and a FAISS vector store. It loads a PDF document, splits it into chunks, creates embeddings, and answers questions based on the document's content.
- PDF document loading and processing
- Text splitting with overlap for better context preservation
- Vector embeddings using OpenAI
- In-memory and persistent vector storage using FAISS
- Question-answering capabilities using LangChain and OpenAI
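
To illustrate why overlap helps preserve context, here is a minimal, standard-library sketch of character-based splitting. It is not the project's own splitter (the project presumably relies on a LangChain text splitter), and the function name `split_with_overlap` and the default sizes are hypothetical choices made for this example.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character chunks where neighbours share `overlap` characters.

    The defaults are illustrative assumptions, not values taken from the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

Because each chunk repeats the tail of the previous one, a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.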
- Python 3.8+
- OpenAI API key
- Clone the repository
- Install dependencies using Pipenv:
    pipenv install
- Create a .env file in the project root and add your OpenAI API key:
    OPENAI_API_KEY=your_api_key_here
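
The source does not say how `main.py` reads the key; many projects use a helper package for this. As a hedged, standard-library-only sketch, the key could be looked up in the process environment first and then in the `.env` file. The helper names `parse_env_text` and `load_api_key` are hypothetical.

```python
import os
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            key = parse_env_text(handle.read()).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later, in the middle of the embedding step.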
- main.py - Main application file containing the PDF processing and QA logic
- faiss_index_react/ - Directory containing the saved FAISS vector store
- react-paper.pdf - Sample PDF document for testing
- .env - Environment variables file (not tracked in git)
- Pipfile and Pipfile.lock - Python dependency management files
- Activate the virtual environment:
    pipenv shell
- Run the main script:
    python main.py

The script will:
- Load the PDF document
- Split it into manageable chunks
- Create embeddings using OpenAI
- Store the vectors in FAISS
- Set up a question-answering chain using LangChain
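
The retrieval idea behind these steps can be sketched without any external services. The example below is a toy stand-in, not the project's code: word counts replace OpenAI embeddings, a plain Python list replaces the FAISS index, and the names `embed`, `cosine`, and `ToyVectorStore` are hypothetical. It stops at retrieval; in the real pipeline the retrieved chunks would be passed to the language model together with the question.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: stands in for an OpenAI embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a FAISS index: keeps (chunk, vector) pairs in a list."""

    def __init__(self, chunks: List[str]) -> None:
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 2) -> List[Tuple[str, float]]:
        """Return the k chunks most similar to the query, best first."""
        query_vector = embed(query)
        scored = [(chunk, cosine(query_vector, vector)) for chunk, vector in self.entries]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Chunks that would normally come from the split PDF.
chunks = [
    "FAISS provides efficient similarity search and clustering of dense vectors.",
    "PyPDF handles PDF document processing.",
    "LangChain is a framework for developing applications powered by language models.",
]
store = ToyVectorStore(chunks)

if __name__ == "__main__":
    for chunk, score in store.search("Which library does similarity search?", k=1):
        print(round(score, 3), chunk)
```

A real FAISS index answers the same "nearest vectors to this query" question, but over dense embedding vectors and with data structures designed to scale far beyond a linear scan.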
- LangChain - Framework for developing applications powered by language models
- OpenAI - Provides the embeddings and the language model
- FAISS - Efficient similarity search and clustering of dense vectors
- PyPDF - PDF document processing
This project is open source and available under the MIT License.