The RAG Architecture Challenge
Retrieval-Augmented Generation (RAG) systems need to find relevant documents before generating answers. The typical approach involves converting documents into vector embeddings and performing similarity searches to find the most relevant content for a given query.
Many teams immediately reach for dedicated vector databases: Pinecone, Weaviate, Milvus, Chroma. These are excellent tools, but they add operational complexity. You now have two databases to manage, sync, and secure. For many applications, there's a simpler path.
Why PostgreSQL with pgvector
PostgreSQL 16 with the pgvector extension provides native vector data types and similarity search operations. For our document management platform, this was the winning choice for several reasons:
1. Native Vector Support
The pgvector extension adds a vector data type and operators for cosine similarity, L2 distance, and inner product searches—directly in SQL:
-- Create table with vector column
CREATE TABLE document_versions (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT,
embedding_vector vector(384) -- 384-dimensional embeddings
);
-- Create index for fast similarity search
CREATE INDEX ON document_versions
USING ivfflat (embedding_vector vector_cosine_ops)
WITH (lists = 100); 2. Laravel Ecosystem Compatibility
PostgreSQL integrates seamlessly with Laravel's Eloquent ORM. We can combine vector searches with standard relational queries—filtering by user permissions, date ranges, or document types while searching semantically:
-- Semantic search with business logic filters
SELECT id, title,
1 - (embedding_vector <=> $query_vector) AS similarity_score
FROM document_versions
WHERE case_id = $user_case_id
AND embedding_vector IS NOT NULL
AND created_at > NOW() - INTERVAL '1 year'
ORDER BY embedding_vector <=> $query_vector
LIMIT 20; 3. Developer Familiarity
SQL syntax remains accessible to developers experienced with relational databases. There's no new query language to learn. The team can debug, optimize, and extend vector queries using familiar tools like EXPLAIN ANALYZE.
4. Single Source of Truth
Documents, metadata, access controls, and embeddings all live in the same database. No synchronization issues. No eventual consistency problems. When a document is deleted, its embedding is deleted in the same transaction.
The RAG Implementation
Our RAG system processes natural language questions through four stages:
- Query Processing: Natural language questions are expanded with domain context
- Vector Retrieval: Semantic search retrieves top 50 candidate documents
- AI Re-ranking: Gemini Flash evaluates candidates for relevance
- Answer Generation: Contextual answers with citations and confidence scores
Generating Embeddings
We use Sentence Transformers (all-MiniLM-L6-v2) running as a Python microservice. Documents are chunked and embedded when created or updated:
# Python embedding service
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
def embed_document(text: str) -> list[float]:
# Returns 384-dimensional vector
return model.encode(text).tolist() Performance Targets
We set aggressive targets and met them:
- Vector search: < 50ms (p95)
- Cache hit rate: > 80%
- End-to-end query: < 2 seconds
Kubernetes Deployment Considerations
The platform runs on Kubernetes with PostgreSQL in a StatefulSet. This choice matters for databases:
Why StatefulSets for Databases
StatefulSets are used instead of Deployments for PostgreSQL because databases require:
- Stable Network Identity: Each pod gets a predictable hostname (e.g.,
postgres-0) that persists across restarts, enabling reliable connection strings - Persistent Storage Binding: StatefulSets ensure each pod is always reconnected to the same PersistentVolumeClaim, preventing data loss when pods are rescheduled
- Ordered Deployment/Scaling: Pods are created and terminated in order (0, 1, 2...), critical for database replication where primary must exist before replicas
- Graceful Shutdown: StatefulSets wait for pods to fully terminate, allowing databases to flush writes and close connections cleanly
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: postgres
replicas: 1
template:
spec:
containers:
- name: postgres
image: pgvector/pgvector:pg16
resources:
requests:
cpu: 1500m
memory: 2Gi
limits:
cpu: 3000m
memory: 4Gi
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 100Gi Microservices Architecture
The embedding generation runs as a separate Python service, keeping the main Laravel application focused on business logic:
| Service | Purpose | Scaling |
|---|---|---|
| API | Laravel backend | HPA (2+ replicas) |
| Horizon | Queue workers | CPU-based autoscaling |
| Embeddings | Vector generation | Request-based |
| PaddleOCR | Document OCR | GPU-accelerated |
| Whisper | Audio transcription | GPU-accelerated |
When to Use a Dedicated Vector Database
PostgreSQL with pgvector isn't always the right choice. Consider dedicated vector databases when:
- You have billions of vectors (pgvector works well up to tens of millions)
- You need distributed vector search across multiple regions
- Your embedding dimensions exceed 2000 (pgvector supports up to 16,000, but performance degrades)
- You're already managing a vector database for other purposes
For our document management platform with millions of documents and 384-dimensional embeddings, PostgreSQL handles everything with room to spare.
Key Takeaways
- Simplicity wins: One database is easier to manage than two
- SQL is powerful: Combining vector search with relational queries in a single statement is incredibly useful
- Laravel works great: No special ORM or driver needed—just raw queries for vector operations
- StatefulSets matter: Databases need stable identities and persistent storage
- Start simple: You can always migrate to a dedicated vector database if you outgrow PostgreSQL
The best architecture is the one that solves your problem with the least operational complexity. For RAG systems at moderate scale, that's often PostgreSQL with pgvector.