Retrieval-Augmented Generation (RAG) System

Complete workflow from document ingestion to response generation

1. Document Processing Pipeline

Input Documents

PDF
TXT

Document Loading

TextFileLoader and PDFFileLoader handle the plain-text and PDF formats respectively
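
A minimal sketch of the text-loading side, assuming UTF-8 .txt files; the class name TextFileLoader comes from the source, while the method name and directory-walking behavior are assumptions:

    import os

    class TextFileLoader:
        """Loads plain-text documents from a single .txt file or a directory of them."""

        def __init__(self, path: str, encoding: str = "utf-8"):
            self.path = path
            self.encoding = encoding

        def load_documents(self) -> list[str]:
            if os.path.isfile(self.path) and self.path.endswith(".txt"):
                paths = [self.path]
            else:
                # Walk a directory and collect every .txt file it contains.
                paths = [
                    os.path.join(root, name)
                    for root, _, names in os.walk(self.path)
                    for name in names
                    if name.endswith(".txt")
                ]
            documents = []
            for p in paths:
                with open(p, "r", encoding=self.encoding) as f:
                    documents.append(f.read())
            return documents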

Text Splitting

CharacterTextSplitter
Chunk size: 1000 chars
Overlap: 200 chars
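
A sketch of the splitter with the stated 1000/200 parameters; CharacterTextSplitter is named above, but this particular sliding-window implementation is an assumption:

    class CharacterTextSplitter:
        """Splits text into fixed-size character chunks with overlap between neighbours."""

        def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
            assert chunk_size > chunk_overlap, "overlap must be smaller than the chunk size"
            self.chunk_size = chunk_size
            self.chunk_overlap = chunk_overlap

        def split(self, text: str) -> list[str]:
            # Advance by chunk_size - chunk_overlap (800 chars) per chunk so
            # consecutive chunks share 200 characters of context.
            step = self.chunk_size - self.chunk_overlap
            return [text[i : i + self.chunk_size] for i in range(0, len(text), step)]

        def split_texts(self, texts: list[str]) -> list[str]:
            return [chunk for text in texts for chunk in self.split(text)]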

Embedding Generation

text-embedding-3-small
1536-dimensional vectors
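
A sketch of the embedding call using the OpenAI Python client; the model name and vector size are from the source, while the helper name and batch size are assumptions (OPENAI_API_KEY must be set in the environment):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed_texts(texts: list[str]) -> list[list[float]]:
        """Embed a batch of chunks; text-embedding-3-small returns 1536-dimensional vectors."""
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=texts,
        )
        # One embedding per input, returned in the same order as the inputs.
        return [item.embedding for item in response.data]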

Vector Database

In-memory dictionary of numpy arrays (one embedding vector per chunk)
Async processing for performance
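
A sketch of the store, assuming the dictionary is keyed by chunk text and the index is built with an asynchronous embedding request; the class and method names here are assumptions, not the source's API:

    import asyncio
    import numpy as np
    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    class VectorDatabase:
        """In-memory vector store mapping chunk text to its numpy embedding vector."""

        def __init__(self):
            self.vectors: dict[str, np.ndarray] = {}

        def insert(self, text: str, vector: np.ndarray) -> None:
            self.vectors[text] = vector

        async def abuild_from_list(self, chunks: list[str]) -> "VectorDatabase":
            # Embed all chunks in a single awaited request and store the results.
            response = await client.embeddings.create(
                model="text-embedding-3-small", input=chunks
            )
            for chunk, item in zip(chunks, response.data):
                self.insert(chunk, np.array(item.embedding))
            return self

    # Usage: db = asyncio.run(VectorDatabase().abuild_from_list(chunks))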

2. Query Processing Pipeline

User Query

"What is RAG?"

Query Embedding

The query is embedded with the same text-embedding-3-small model used for the documents

Similarity Search

Cosine similarity between the query and chunk embeddings as the relevance metric

Retrieval

Top k=4 most relevant chunks
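
A sketch covering both the similarity computation and the top-k retrieval; cosine similarity and k=4 are from the source, the function names are not:

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity: dot product divided by the product of the norms."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def search(query_vector: np.ndarray,
               vectors: dict[str, np.ndarray],
               k: int = 4) -> list[tuple[str, float]]:
        """Return the k chunks whose embeddings score highest against the query."""
        scored = [(text, cosine_similarity(query_vector, vec))
                  for text, vec in vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]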

3. Response Generation Pipeline

Prompt Construction

System: "Use the provided context..."

User: Query + Retrieved Context
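
A sketch of the prompt assembly; the system instruction is paraphrased from the excerpt above, and the exact template wording is an assumption:

    SYSTEM_PROMPT = (
        "Use the provided context to answer the user's question. "
        "If the answer is not in the context, say that you do not know."
    )

    def build_messages(query: str, context_chunks: list[str]) -> list[dict]:
        """Combine the retrieved chunks and the user's query into chat messages."""
        context = "\n\n".join(context_chunks)
        user_content = f"Context:\n{context}\n\nQuestion: {query}"
        return [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ]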

LLM Processing

gpt-4o-mini
Zero-shot in-context learning
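
A sketch of the generation call via the OpenAI chat completions API; the model name is from the source, and because the prompt contains retrieved context but no worked examples, the call is zero-shot in-context learning:

    from openai import OpenAI

    client = OpenAI()

    def generate_answer(messages: list[dict]) -> str:
        """Send the constructed messages to gpt-4o-mini and return the reply text."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
        )
        return response.choices[0].message.content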

Response Generation

"RAG combines retrieval with generation..."

Final Answer

Factual response grounded in retrieved context

PDF Processing Enhancement

File Type Detection

DocumentLoader routes PDFs to PDFFileLoader based on file extension

Page-by-Page Processing

PDFs are processed one page at a time with page markers for reference

Content Combination

Processed pages are combined into a single document before splitting
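
A sketch of the PDF path covering file type detection, page-by-page extraction with markers, and combination into a single document; it assumes the pypdf library, and the "[Page N]" marker format and the routing helper are assumptions:

    from pypdf import PdfReader

    class PDFFileLoader:
        """Extracts text from a PDF one page at a time, tagging each page for reference."""

        def __init__(self, path: str):
            self.path = path

        def load_documents(self) -> list[str]:
            reader = PdfReader(self.path)
            pages = []
            for number, page in enumerate(reader.pages, start=1):
                text = page.extract_text() or ""
                # Page markers let answers point back to the source page.
                pages.append(f"[Page {number}]\n{text}")
            # Combine all pages into one document so splitting sees continuous text.
            return ["\n\n".join(pages)]

    def load_any(path: str) -> list[str]:
        """Route a file to the right loader based on its extension."""
        if path.lower().endswith(".pdf"):
            return PDFFileLoader(path).load_documents()
        with open(path, "r", encoding="utf-8") as f:  # plain-text fallback
            return [f.read()]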

RAG System Implementation - Combining retrieval with generation for factual responses
