Back to Work

Why PostgreSQL with pgvector for RAG Systems

Building semantic search and AI-powered document retrieval without a separate vector database. How we chose PostgreSQL with pgvector over dedicated solutions like Pinecone or Weaviate—and why it was the right call.

The RAG Architecture Challenge

Retrieval-Augmented Generation (RAG) systems need to find relevant documents before generating answers. The typical approach involves converting documents into vector embeddings and performing similarity searches to find the most relevant content for a given query.

Many teams immediately reach for dedicated vector databases: Pinecone, Weaviate, Milvus, Chroma. These are excellent tools, but they add operational complexity. You now have two databases to manage, sync, and secure. For many applications, there's a simpler path.

Why PostgreSQL with pgvector

PostgreSQL 16 with the pgvector extension provides native vector data types and similarity search operations. For our document management platform, this was the winning choice for several reasons:

1. Native Vector Support

The pgvector extension adds a vector data type and operators for cosine similarity, L2 distance, and inner product searches—directly in SQL:

-- Create table with vector column
CREATE TABLE document_versions (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding_vector vector(384)  -- 384-dimensional embeddings
);

-- Create index for fast similarity search
CREATE INDEX ON document_versions
USING ivfflat (embedding_vector vector_cosine_ops)
WITH (lists = 100);

2. Laravel Ecosystem Compatibility

PostgreSQL integrates seamlessly with Laravel's Eloquent ORM. We can combine vector searches with standard relational queries—filtering by user permissions, date ranges, or document types while searching semantically:

-- Semantic search with business logic filters
SELECT id, title,
       1 - (embedding_vector <=> $query_vector) AS similarity_score
FROM document_versions
WHERE case_id = $user_case_id
  AND embedding_vector IS NOT NULL
  AND created_at > NOW() - INTERVAL '1 year'
ORDER BY embedding_vector <=> $query_vector
LIMIT 20;

3. Developer Familiarity

SQL syntax remains accessible to developers experienced with relational databases. There's no new query language to learn. The team can debug, optimize, and extend vector queries using familiar tools like EXPLAIN ANALYZE.

4. Single Source of Truth

Documents, metadata, access controls, and embeddings all live in the same database. No synchronization issues. No eventual consistency problems. When a document is deleted, its embedding is deleted in the same transaction.

The RAG Implementation

Our RAG system processes natural language questions through four stages:

  1. Query Processing: Natural language questions are expanded with domain context
  2. Vector Retrieval: Semantic search retrieves top 50 candidate documents
  3. AI Re-ranking: Gemini Flash evaluates candidates for relevance
  4. Answer Generation: Contextual answers with citations and confidence scores

Generating Embeddings

We use Sentence Transformers (all-MiniLM-L6-v2) running as a Python microservice. Documents are chunked and embedded when created or updated:

# Python embedding service
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def embed_document(text: str) -> list[float]:
    # Returns 384-dimensional vector
    return model.encode(text).tolist()

Performance Targets

We set aggressive targets and met them:

  • Vector search: < 50ms (p95)
  • Cache hit rate: > 80%
  • End-to-end query: < 2 seconds

Kubernetes Deployment Considerations

The platform runs on Kubernetes with PostgreSQL in a StatefulSet. This choice matters for databases:

Why StatefulSets for Databases

StatefulSets are used instead of Deployments for PostgreSQL because databases require:

  1. Stable Network Identity: Each pod gets a predictable hostname (e.g., postgres-0) that persists across restarts, enabling reliable connection strings
  2. Persistent Storage Binding: StatefulSets ensure each pod is always reconnected to the same PersistentVolumeClaim, preventing data loss when pods are rescheduled
  3. Ordered Deployment/Scaling: Pods are created and terminated in order (0, 1, 2...), critical for database replication where primary must exist before replicas
  4. Graceful Shutdown: StatefulSets wait for pods to fully terminate, allowing databases to flush writes and close connections cleanly
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  template:
    spec:
      containers:
      - name: postgres
        image: pgvector/pgvector:pg16
        resources:
          requests:
            cpu: 1500m
            memory: 2Gi
          limits:
            cpu: 3000m
            memory: 4Gi
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

Microservices Architecture

The embedding generation runs as a separate Python service, keeping the main Laravel application focused on business logic:

Service Purpose Scaling
API Laravel backend HPA (2+ replicas)
Horizon Queue workers CPU-based autoscaling
Embeddings Vector generation Request-based
PaddleOCR Document OCR GPU-accelerated
Whisper Audio transcription GPU-accelerated

When to Use a Dedicated Vector Database

PostgreSQL with pgvector isn't always the right choice. Consider dedicated vector databases when:

  • You have billions of vectors (pgvector works well up to tens of millions)
  • You need distributed vector search across multiple regions
  • Your embedding dimensions exceed 2000 (pgvector supports up to 16,000, but performance degrades)
  • You're already managing a vector database for other purposes

For our document management platform with millions of documents and 384-dimensional embeddings, PostgreSQL handles everything with room to spare.

Key Takeaways

  • Simplicity wins: One database is easier to manage than two
  • SQL is powerful: Combining vector search with relational queries in a single statement is incredibly useful
  • Laravel works great: No special ORM or driver needed—just raw queries for vector operations
  • StatefulSets matter: Databases need stable identities and persistent storage
  • Start simple: You can always migrate to a dedicated vector database if you outgrow PostgreSQL
Key Takeaway

The best architecture is the one that solves your problem with the least operational complexity. For RAG systems at moderate scale, that's often PostgreSQL with pgvector.

Alex McGlothlin

Senior Software Engineer specializing in Laravel, system architecture, and high-traffic infrastructure. 18+ years of experience building scalable solutions.