October 2025 • 12 min read • Infrastructure

Building a High-Performance Reverse Proxy: 20k req/sec with 65ms Latency

How we built a multi-tenant reverse proxy system that intercepts HTTP traffic, applies real-time content transformations, and serves modified responses while adding about 65ms of latency, roughly one-sixth of the 400ms+ overhead typical of competing products.

The Performance Challenge

When building a reverse proxy that sits between users and origin servers, every millisecond counts. The proxy intercepts every request, processes it, fetches from the origin, transforms the response, and delivers it back to the user. Any inefficiency is multiplied across millions of requests.

Our target was ambitious: add less than 100ms of latency while performing license validation, dynamic routing, content transformation, and caching—all at 20,000+ requests per second. Competitors in this space typically add 400ms or more.

Performance Benchmarks

Here's what we achieved compared to the competition:

Metric	Our System	Competitor Avg
Added Latency	65ms	400ms+
Throughput	20,000+ req/sec	Varies
Concurrent Connections	10,000	Varies
Cache Hit Rate	95%+	N/A
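
To put the latency figures in perspective, Little's law (average requests in flight = arrival rate × time spent in the system) gives a rough feel for what added latency costs at this throughput. The snippet below is illustrative arithmetic in Python, not a measurement from the system, and it counts only the latency the proxy adds, not time spent at the origin.

```python
def in_flight(requests_per_sec: float, added_latency_ms: float) -> float:
    """Little's law: average number of requests held inside the proxy at once."""
    return requests_per_sec * (added_latency_ms / 1000.0)


ours = in_flight(20_000, 65)          # figures taken from the benchmark table
competitor = in_flight(20_000, 400)

print(f"in flight at 65 ms added latency:  {ours:.0f}")
print(f"in flight at 400 ms added latency: {competitor:.0f}")
print(f"latency ratio: {400 / 65:.1f}x")
```

Under these assumptions, the slower proxy has to hold several thousand more requests open at any moment for the same traffic, which shows up directly as memory and connection pressure.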

Technology Choice: OpenResty

We chose OpenResty (Nginx + LuaJIT) as our foundation. Why not Node.js, Go, or a traditional application server? The answer comes down to the event-driven architecture and the ability to execute Lua code at specific phases of the Nginx request lifecycle.

# Performance configuration
worker_processes auto;
events {
    worker_connections 10000;
    use epoll;
}

# Shared memory caching (these directives belong in the http {} block)
lua_shared_dict license_cache 10m;
lua_shared_dict toolbar_cache 5m;
lua_shared_dict dns_cache 10m;
lua_shared_dict cache_metrics 10m;

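With worker_connections 10000 and epoll for efficient I/O multiplexing, each worker process can hold thousands of concurrent connections on a single event loop, with no per-connection thread. Note that worker_connections is a per-worker limit and counts connections to upstream servers as well as connections from clients.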
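
As a back-of-the-envelope check on that setting, the sketch below estimates how many proxied clients one Nginx instance could serve at once. It assumes each in-flight request holds one client connection and one upstream connection, and the four-worker pod is a hypothetical example, not a figure from our deployment.

```python
def max_proxied_clients(workers: int, worker_connections: int,
                        conns_per_request: int = 2) -> int:
    """Upper bound on simultaneous proxied clients for one Nginx instance.

    Assumes every in-flight request holds `conns_per_request` connections
    (client side plus upstream side), all counted against worker_connections.
    """
    return (workers * worker_connections) // conns_per_request


# Hypothetical 4-core pod: `worker_processes auto` would start 4 workers.
print(max_proxied_clients(workers=4, worker_connections=10_000))
```

The operating system's file-descriptor limit also has to be at least this high (see the worker_rlimit_nofile directive), otherwise the limit is hit there first.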

Request Processing Pipeline

Every request flows through a carefully optimized pipeline:

License Lookup - Redis cache check, 2ms average
Origin Resolution - DNS lookup with caching
Upstream Fetch - Connection pooling to origin servers
Cache Lookup - S3/Redis translation cache, <50ms
Content Transformation - Real-time HTML modification
Response Delivery - Compressed response to client
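
The License Lookup step can be pictured as a two-tier cache: a small in-process cache in front of a slower shared store. The Python model below is a minimal sketch of that idea, not the production Lua code. The class name, the TTL, and the dictionary standing in for Redis are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TieredCache:
    """In-process TTL cache in front of a slower lookup (for example Redis)."""

    def __init__(self, backend: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._backend = backend
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._local: Dict[str, Tuple[float, Optional[dict]]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        entry = self._local.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self._backend(key)  # slow path
        # Negative results are cached too, so unknown domains stay cheap.
        self._local[key] = (self._clock() + self._ttl, value)
        return value


# Hypothetical backend standing in for the Redis license store.
LICENSES = {"shop.example.com": {"status": "active"}}
cache = TieredCache(LICENSES.get, ttl_seconds=60)

cache.get("shop.example.com")  # miss: goes to the backend
cache.get("shop.example.com")  # hit: served from local memory
print(cache.hits, cache.misses)
```

Caching misses as well as hits is a design choice worth noting: without it, requests for unlicensed domains would reach the backing store every time.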

Dynamic Multi-Tenant Routing

Unlike traditional reverse proxies with hardcoded upstreams, our system dynamically routes each request based on the requesting domain. This enables true multi-tenancy where thousands of customer domains all flow through the same proxy infrastructure.

# Dynamic upstream selection
# ($upstream_target and $upstream_host must already be declared with `set`)
access_by_lua_block {
    local license_lookup = require "license_lookup"
    local origin_resolver = require "origin_resolver"

    -- Lookup license for this domain
    local license = license_lookup.lookup(ngx.var.host)

    if not license then
        return ngx.exit(ngx.HTTP_FORBIDDEN)
    end

    -- Resolve origin dynamically
    local origin = origin_resolver.get_origin_target(
        license,
        license.origin_protocol or "https"
    )

    -- Assuming the resolver returns nil on failure, answer 502 instead of erroring
    if not origin then
        return ngx.exit(ngx.HTTP_BAD_GATEWAY)
    end

    ngx.var.upstream_target = origin.target
    ngx.var.upstream_host = ngx.var.host
}
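
The routing decision itself is simple enough to model outside Nginx. The following Python sketch mirrors the flow of the Lua block (look up the license by host, reject unknown hosts, default the protocol to https) using made-up data. The License fields, the port table, and the example hostnames are hypothetical and are not the real license schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class License:
    origin_host: str                       # hypothetical field name
    origin_protocol: Optional[str] = None  # falls back to "https" when unset
    status: str = "active"


DEFAULT_PORTS = {"http": 80, "https": 443}


def route(host: str, licenses: Dict[str, License]) -> Tuple[int, Optional[str]]:
    """Return (status, upstream_url); status 403 means the host has no license."""
    lic = licenses.get(host.lower())
    if lic is None:
        return 403, None
    protocol = lic.origin_protocol or "https"  # same default as the Lua block
    port = DEFAULT_PORTS[protocol]
    return 200, f"{protocol}://{lic.origin_host}:{port}"


licenses = {
    "shop.example.com": License(origin_host="origin.shop.example.net"),
    "blog.example.org": License(origin_host="10.0.0.12", origin_protocol="http"),
}

print(route("shop.example.com", licenses))
print(route("unknown.example.com", licenses))
```

Because the table is consulted per request, adding a customer domain is a data change rather than a proxy configuration reload.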

Connection Pooling for Performance

One of the biggest performance wins came from aggressive connection pooling. Instead of establishing a new TCP connection for each upstream request, we maintain pools of keepalive connections:

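local httpc = require("resty.http").new()
local res, err = httpc:request_uri(upstream_url, {
    keepalive_timeout = 60000,  -- 60 second keepalive (value is in milliseconds)
    keepalive_pool = 50,        -- Pool of 50 connections
    ssl_verify = false,         -- NOTE: skips origin certificate verification
    ssl_server_name = ngx.var.upstream_host  -- SNI
})
if not res then
    ngx.log(ngx.ERR, "upstream request failed: ", err)
    return ngx.exit(ngx.HTTP_BAD_GATEWAY)
end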

This eliminates the TCP handshake and TLS negotiation overhead for the vast majority of requests.
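
To make the effect of pooling concrete, here is a toy Python model of a per-origin keepalive pool. It opens no real sockets and is not how lua-resty-http is implemented. It only counts how often a request would pay for a fresh handshake versus reuse an idle connection, with a pool size and idle timeout chosen to match the settings above.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class KeepalivePool:
    """Toy model of a per-origin keepalive pool (no real sockets)."""

    def __init__(self, pool_size: int = 50, idle_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.pool_size = pool_size
        self.idle_timeout = idle_timeout
        self._clock = clock
        # (host, port) -> idle connections as (released_at, conn_id)
        self._idle: Dict[Tuple[str, int], Deque[Tuple[float, int]]] = defaultdict(deque)
        self._next_id = 0
        self.handshakes = 0  # connections opened from scratch
        self.reuses = 0

    def acquire(self, host: str, port: int) -> int:
        idle = self._idle[(host, port)]
        while idle:
            released_at, conn_id = idle.pop()  # most recently used first
            if self._clock() - released_at <= self.idle_timeout:
                self.reuses += 1
                return conn_id
            # Otherwise the connection idled out: drop it and keep looking.
        self.handshakes += 1  # this request pays for TCP + TLS setup
        self._next_id += 1
        return self._next_id

    def release(self, host: str, port: int, conn_id: int) -> None:
        idle = self._idle[(host, port)]
        if len(idle) < self.pool_size:  # connections beyond the pool size are closed
            idle.append((self._clock(), conn_id))


pool = KeepalivePool(pool_size=50, idle_timeout=60.0)
for _ in range(1000):  # sequential requests to a single origin
    conn = pool.acquire("origin.example.net", 443)
    pool.release("origin.example.net", 443, conn)

print(pool.handshakes, pool.reuses)  # prints: 1 999
```

Real traffic is concurrent, so the reuse rate depends on how the pool size compares with peak parallelism per origin, but the shape of the saving is the same.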

Auto-SSL Certificate Management

Managing SSL certificates for thousands of customer domains manually would be impossible. We implemented automatic certificate provisioning using Let's Encrypt:

auto_ssl:set("allow_domain", function(domain)
    -- Only issue certificates for domains with valid licenses
    local license = license_lookup.lookup(domain)
    -- Return an explicit boolean rather than nil when no license exists
    return license ~= nil and license.status == "active"
end)

The system validates that each domain has an active license before issuing a certificate, stores certificates in Redis for fast lookup with S3 for persistence, and handles automatic renewal in the background.

SSL Passthrough for End-to-End Encryption

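A recent enhancement added what we call SSL passthrough: the hop between the proxy and customer origin servers is now encrypted as well. The proxy still terminates the client's TLS session, because it has to read and transform the response, so in stricter terminology this is TLS re-encryption rather than true passthrough. Rather than terminating SSL at the proxy and making unencrypted requests to origins, the system now: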

Maintains TLS connections to origin servers using SNI (Server Name Indication)
Verifies origin certificates against trusted CAs (configurable per license)
Supports mutual TLS (mTLS) for origins requiring client certificate authentication
Preserves the full chain of trust from end user through proxy to origin

This eliminates the security gap where traffic between the proxy and origin could potentially be intercepted, meeting enterprise security requirements for sensitive data transmission.
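
The three origin-side options (SNI, certificate verification, and optional client certificates) map onto settings that most TLS libraries expose. The sketch below shows them with Python's standard ssl module as a neutral illustration. It is not the proxy's Lua code, and the function name and parameters are assumptions chosen for the example.

```python
import ssl
from typing import Optional


def origin_tls_context(verify: bool = True,
                       ca_file: Optional[str] = None,
                       client_cert: Optional[str] = None,
                       client_key: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context for proxy-to-origin connections."""
    # Default behaviour: verify the certificate chain and the hostname.
    ctx = ssl.create_default_context(cafile=ca_file)
    if not verify:
        # Per-license opt-out; check_hostname has to be disabled first.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    if client_cert:
        # mTLS: present a client certificate to the origin.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


strict = origin_tls_context()
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)

# SNI is sent when the socket is wrapped with a server_hostname, e.g.:
#   strict.wrap_socket(sock, server_hostname="origin.example.net")
```

Making verification the default and the opt-out explicit keeps a misconfigured license from silently downgrading security.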

Kubernetes Deployment

The proxy runs on Kubernetes with horizontal pod autoscaling based on CPU utilization:

# Fragment: metadata and spec.scaleTargetRef omitted
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 4
  maxReplicas: 16
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
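
For intuition about how this configuration behaves, the Kubernetes documentation describes the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The Python function below is a simplified sketch of that rule using the min, max, and target values from the manifest. The real controller also handles missing metrics, pod readiness, and stabilization windows, which are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float = 70.0,
                     min_replicas: int = 4, max_replicas: int = 16,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale proportionally to metric / target."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 95))    # CPU well above target: scale out
print(desired_replicas(8, 30))    # CPU well below target: scale in, floor at min
print(desired_replicas(4, 72))    # within tolerance: unchanged
```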

Our node pool configuration includes 2-4 nodes with 4 vCPUs and 8 GB of RAM each, dedicated node pools with taints for workload isolation, and pod anti-affinity rules for high availability across nodes.

Monitoring & Observability

Every response includes custom timing headers for debugging:

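-- Add timing headers to every response (must run before the headers are sent)
-- request_start_time and origin_time_ms are recorded earlier in the request
ngx.update_time()  -- ngx.now() returns a cached timestamp; refresh it first
local total_time_ms = (ngx.now() - request_start_time) * 1000
ngx.header["X-Origin-Time"] = string.format("%.2fms", origin_time_ms)
ngx.header["X-Total-Time"] = string.format("%.2fms", total_time_ms)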

Combined with Prometheus metrics, Grafana dashboards, and Loki log aggregation, we have complete visibility into system performance.
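
The same per-stage timing pattern can be sketched in a few lines of Python. This is an illustrative model rather than the proxy's code: the RequestTimer class and the stage name are assumptions, and the sleep stands in for the upstream fetch.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class RequestTimer:
    """Collects per-stage timings and renders them as response headers."""

    def __init__(self) -> None:
        self._start = time.perf_counter()
        self.stages_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        began = time.perf_counter()
        try:
            yield
        finally:
            elapsed = (time.perf_counter() - began) * 1000
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed

    def headers(self) -> Dict[str, str]:
        total_ms = (time.perf_counter() - self._start) * 1000
        out = {f"X-{name.title()}-Time": f"{ms:.2f}ms"
               for name, ms in self.stages_ms.items()}
        out["X-Total-Time"] = f"{total_ms:.2f}ms"
        return out


timer = RequestTimer()
with timer.stage("origin"):
    time.sleep(0.01)  # stand-in for the upstream fetch
print(timer.headers())
```

Subtracting the origin time from the total gives the overhead attributable to the proxy itself, which is the number worth tracking over time.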

Key Takeaways

Building a high-performance reverse proxy taught us several important lessons:

Choose the right tool: OpenResty's event-driven architecture and LuaJIT performance were essential for meeting our latency targets
Cache aggressively: Shared memory dictionaries, Redis, and S3 caching layers work together to minimize redundant work
Pool connections: Connection reuse eliminates the biggest source of latency in proxy systems
Measure everything: Per-request timing headers and comprehensive metrics enable continuous optimization
Design for multi-tenancy: Dynamic routing based on request context enables massive scale

Key Takeaway

The difference between 65ms and 400ms latency might seem small, but at 20,000 requests per second, it's the difference between a snappy user experience and a sluggish one.

Alex McGlothlin

Senior Software Engineer specializing in Laravel, system architecture, and high-traffic infrastructure. 18+ years of experience building scalable solutions.
