samuelcastro/pdf-rag-faiss


PDF Question Answering with Vector Store

This project implements a PDF document question-answering system using LangChain, OpenAI, and FAISS vector store. It allows you to load a PDF document, split it into chunks, create embeddings, and perform question-answering tasks using the document's content.

Features

  • PDF document loading and processing
  • Text splitting with overlap for better context preservation
  • Vector embeddings using OpenAI
  • In-memory and persistent vector storage using FAISS
  • Question-answering capabilities using LangChain and OpenAI
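As a rough illustration of the "text splitting with overlap" feature, here is a minimal standard-library sketch. The function name `split_with_overlap` and the default sizes (1000 characters, 200 overlap) are assumptions for illustration only; the project itself presumably relies on one of LangChain's text splitters, whose exact settings are not stated here.

```python
def split_with_overlap(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts `overlap` characters before the previous chunk ended,
    so a sentence cut at a boundary still appears whole in one of the two.
    The default sizes are illustrative assumptions, not the project's values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

A larger overlap preserves more context across chunk boundaries at the cost of embedding more duplicated text.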

Prerequisites

  • Python 3.8+
  • OpenAI API key

Installation

  1. Clone the repository
  2. Install dependencies using Pipenv:
       pipenv install
  3. Create a .env file in the project root and add your OpenAI API key:
       OPENAI_API_KEY=your_api_key_here
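To check that the key is actually visible to Python before running anything that calls OpenAI, a small standard-library sketch like the following may help. The helper names `load_env_file` and `get_openai_api_key` are hypothetical; the project may instead rely on Pipenv's automatic loading of .env or on a package such as python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Blank lines, comment lines, and lines without '=' are skipped.
    This is a simplified parser, not a full .env implementation.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised deep inside an API call.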

Project Structure

  • main.py - Main application file containing the PDF processing and QA logic
  • faiss_index_react/ - Directory containing the saved FAISS vector store
  • react-paper.pdf - Sample PDF document for testing
  • .env - Environment variables file (not tracked in git)
  • Pipfile and Pipfile.lock - Python dependency management files
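The saved index directory suggests a "load if present, otherwise build and save" pattern, which avoids re-embedding the PDF on every run. The sketch below shows that pattern using a JSON file as a stand-in; FAISS uses its own on-disk format, and the function name `load_or_build_index` and the file name `index.json` are assumptions for illustration only.

```python
import json
from pathlib import Path


def load_or_build_index(index_dir, build_fn):
    """Return (index, loaded_from_disk).

    If a saved index exists in index_dir it is reused; otherwise build_fn()
    is called once and its JSON-serializable result is written to disk.
    JSON stands in here for the real FAISS index files.
    """
    index_file = Path(index_dir) / "index.json"
    if index_file.exists():
        return json.loads(index_file.read_text(encoding="utf-8")), True

    index = build_fn()  # the expensive step: chunking plus embedding
    index_file.parent.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(index), encoding="utf-8")
    return index, False
```

With this pattern, deleting the index directory is enough to force a rebuild after the source PDF changes.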

Usage

  1. Activate the virtual environment:
       pipenv shell
  2. Run the main script:
       python main.py

The script will:

  1. Load the PDF document
  2. Split it into manageable chunks
  3. Create embeddings using OpenAI
  4. Store the vectors in FAISS
  5. Set up a question-answering chain using LangChain
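The retrieval part of these steps can be sketched without any external services. The code below is a toy stand-in, not the contents of main.py: word counts replace OpenAI embeddings, a brute-force cosine search replaces FAISS, and `build_prompt` only assembles the text that a language model would receive. All names (`embed`, `ToyVectorStore`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for an OpenAI embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Brute-force store; FAISS fills this role in the project."""

    def __init__(self, chunks):
        # Embed every chunk once, up front.
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
```

In a question-answering chain, the prompt built from the top-ranked chunks is what gets sent to the language model, so retrieval quality directly bounds answer quality.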

Technologies Used

  • LangChain - Framework for developing applications powered by language models
  • OpenAI - For embeddings and language model
  • FAISS - Efficient similarity search and clustering of dense vectors
  • PyPDF - PDF document processing

License

This project is open source and available under the MIT License.
