Retrieval-Augmented Generation (RAG) System

Complete workflow from document ingestion to response generation

1. Document Processing Pipeline

Input Documents

PDF
TXT

Document Loading

TextFileLoader and PDFFileLoader handle the plain-text and PDF formats respectively
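
A minimal sketch of the text-loading side, assuming UTF-8 .txt files; the class name TextFileLoader comes from the source, while the method name and directory-walking behavior are assumptions:

    import os

    class TextFileLoader:
        """Loads plain-text documents from a single .txt file or a directory of them."""

        def __init__(self, path: str, encoding: str = "utf-8"):
            self.path = path
            self.encoding = encoding

        def load_documents(self) -> list[str]:
            if os.path.isfile(self.path) and self.path.endswith(".txt"):
                paths = [self.path]
            else:
                # Walk a directory and collect every .txt file it contains.
                paths = [
                    os.path.join(root, name)
                    for root, _, names in os.walk(self.path)
                    for name in names
                    if name.endswith(".txt")
                ]
            documents = []
            for p in paths:
                with open(p, "r", encoding=self.encoding) as f:
                    documents.append(f.read())
            return documents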

Text Splitting

CharacterTextSplitter
Chunk size: 1000 chars
Overlap: 200 chars
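
A sketch of the splitter with the stated 1000/200 parameters; CharacterTextSplitter is named above, but this particular sliding-window implementation is an assumption:

    class CharacterTextSplitter:
        """Splits text into fixed-size character chunks with overlap between neighbours."""

        def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
            assert chunk_size > chunk_overlap, "overlap must be smaller than the chunk size"
            self.chunk_size = chunk_size
            self.chunk_overlap = chunk_overlap

        def split(self, text: str) -> list[str]:
            # Advance by chunk_size - chunk_overlap (800 chars) per chunk so
            # consecutive chunks share 200 characters of context.
            step = self.chunk_size - self.chunk_overlap
            return [text[i : i + self.chunk_size] for i in range(0, len(text), step)]

        def split_texts(self, texts: list[str]) -> list[str]:
            return [chunk for text in texts for chunk in self.split(text)]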

Embedding Generation

text-embedding-3-small
1536-dimensional vectors
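
A sketch of the embedding call using the OpenAI Python client; the model name and vector size are from the source, while the helper name and batch size are assumptions (OPENAI_API_KEY must be set in the environment):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed_texts(texts: list[str]) -> list[list[float]]:
        """Embed a batch of chunks; text-embedding-3-small returns 1536-dimensional vectors."""
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=texts,
        )
        # One embedding per input, returned in the same order as the inputs.
        return [item.embedding for item in response.data]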

Vector Database

In-memory dictionary of numpy arrays (one embedding vector per chunk)
Async processing for performance
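
A sketch of the store, assuming the dictionary is keyed by chunk text and the index is built with an asynchronous embedding request; the class and method names here are assumptions, not the source's API:

    import asyncio
    import numpy as np
    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    class VectorDatabase:
        """In-memory vector store mapping chunk text to its numpy embedding vector."""

        def __init__(self):
            self.vectors: dict[str, np.ndarray] = {}

        def insert(self, text: str, vector: np.ndarray) -> None:
            self.vectors[text] = vector

        async def abuild_from_list(self, chunks: list[str]) -> "VectorDatabase":
            # Embed all chunks in a single awaited request and store the results.
            response = await client.embeddings.create(
                model="text-embedding-3-small", input=chunks
            )
            for chunk, item in zip(chunks, response.data):
                self.insert(chunk, np.array(item.embedding))
            return self

    # Usage: db = asyncio.run(VectorDatabase().abuild_from_list(chunks))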

2. Query Processing Pipeline

User Query

"What is RAG?"

Query Embedding

The query is embedded with the same text-embedding-3-small model used for the documents

Similarity Search

Cosine similarity between the query and chunk embeddings as the relevance metric

Retrieval

Top k=4 most relevant chunks
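
A sketch covering both the similarity computation and the top-k retrieval; cosine similarity and k=4 are from the source, the function names are not:

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity: dot product divided by the product of the norms."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def search(query_vector: np.ndarray,
               vectors: dict[str, np.ndarray],
               k: int = 4) -> list[tuple[str, float]]:
        """Return the k chunks whose embeddings score highest against the query."""
        scored = [(text, cosine_similarity(query_vector, vec))
                  for text, vec in vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]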

3. Response Generation Pipeline

Prompt Construction

System: "Use the provided context..."

User: Query + Retrieved Context
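
A sketch of the prompt assembly; the system instruction is paraphrased from the excerpt above, and the exact template wording is an assumption:

    SYSTEM_PROMPT = (
        "Use the provided context to answer the user's question. "
        "If the answer is not in the context, say that you do not know."
    )

    def build_messages(query: str, context_chunks: list[str]) -> list[dict]:
        """Combine the retrieved chunks and the user's query into chat messages."""
        context = "\n\n".join(context_chunks)
        user_content = f"Context:\n{context}\n\nQuestion: {query}"
        return [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ]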

LLM Processing

gpt-4o-mini
Zero-shot in-context learning
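
A sketch of the generation call via the OpenAI chat completions API; the model name is from the source, and because the prompt contains retrieved context but no worked examples, the call is zero-shot in-context learning:

    from openai import OpenAI

    client = OpenAI()

    def generate_answer(messages: list[dict]) -> str:
        """Send the constructed messages to gpt-4o-mini and return the reply text."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
        )
        return response.choices[0].message.content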

Response Generation

"RAG combines retrieval with generation..."

Final Answer

Factual response grounded in retrieved context

PDF Processing Enhancement

File Type Detection

DocumentLoader routes PDFs to PDFFileLoader based on file extension

Page-by-Page Processing

PDFs are processed one page at a time with page markers for reference

Content Combination

Processed pages are combined into a single document before splitting
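
A sketch of the PDF path covering file type detection, page-by-page extraction with markers, and combination into a single document; it assumes the pypdf library, and the "[Page N]" marker format and the routing helper are assumptions:

    from pypdf import PdfReader

    class PDFFileLoader:
        """Extracts text from a PDF one page at a time, tagging each page for reference."""

        def __init__(self, path: str):
            self.path = path

        def load_documents(self) -> list[str]:
            reader = PdfReader(self.path)
            pages = []
            for number, page in enumerate(reader.pages, start=1):
                text = page.extract_text() or ""
                # Page markers let answers point back to the source page.
                pages.append(f"[Page {number}]\n{text}")
            # Combine all pages into one document so splitting sees continuous text.
            return ["\n\n".join(pages)]

    def load_any(path: str) -> list[str]:
        """Route a file to the right loader based on its extension."""
        if path.lower().endswith(".pdf"):
            return PDFFileLoader(path).load_documents()
        with open(path, "r", encoding="utf-8") as f:  # plain-text fallback
            return [f.read()]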

RAG System Implementation - Combining retrieval with generation for factual responses
