The Performance Challenge
When building a reverse proxy that sits between users and origin servers, every millisecond counts. The proxy intercepts every request, processes it, fetches from the origin, transforms the response, and delivers it back to the user. Any inefficiency is multiplied across millions of requests.
Our target was ambitious: add less than 100ms of latency while performing license validation, dynamic routing, content transformation, and caching—all at 20,000+ requests per second. Competitors in this space typically add 400ms or more.
Performance Benchmarks
Here's what we achieved compared to the competition:
| Metric | Our System | Competitor Avg |
|---|---|---|
| Added Latency | 65ms | 400ms+ |
| Throughput | 20,000+ req/sec | Varies |
| Concurrent Connections | 10,000 | Varies |
| Cache Hit Rate | 95%+ | N/A |
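As a rough sanity check on what these numbers imply, Little's law (average in-flight work = arrival rate × time in system) estimates how many requests sit inside the proxy's added-latency window at any moment. The sketch below is illustrative arithmetic on the table's figures only; the function name is our own and nothing here was measured.

```lua
-- Little's law: mean in-flight requests = arrival rate * mean time in system.
-- Only the latency *added* by the proxy is counted here.
local function in_flight(requests_per_sec, added_latency_ms)
  return requests_per_sec * added_latency_ms / 1000
end

print(in_flight(20000, 65))   -- in-flight requests at 65 ms added latency
print(in_flight(20000, 400))  -- in-flight requests at 400 ms added latency
```

At 20,000 requests per second this works out to roughly 1,300 requests in flight at 65ms versus 8,000 at 400ms, which is one way to see how added latency turns into connection and memory pressure.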
Technology Choice: OpenResty
We chose OpenResty (Nginx + LuaJIT) as our foundation. Why not Node.js, Go, or a traditional application server? The answer comes down to the event-driven architecture and the ability to execute Lua code at specific phases of the Nginx request lifecycle.
# Performance configuration
worker_processes auto;
events {
    worker_connections 10000;
    use epoll;
}
# Shared memory caching (lua_shared_dict directives belong in the http block)
lua_shared_dict license_cache 10m;
lua_shared_dict toolbar_cache 5m;
lua_shared_dict dns_cache 10m;
lua_shared_dict cache_metrics 10m;
With worker_connections 10000 and epoll for efficient I/O multiplexing, each worker process can handle thousands of concurrent connections without threading overhead.
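To show how a shared dictionary is typically used, here is a minimal cache-aside sketch. It assumes a dictionary object exposing `get(key)` and `set(key, value, ttl_seconds)`, which matches the shape of `ngx.shared.license_cache` inside OpenResty; `cached_lookup`, `fake_dict`, and `load_status` are hypothetical names introduced so the example runs outside Nginx, not part of our codebase.

```lua
-- Cache-aside helper. `dict` needs get(key) and set(key, value, ttl_seconds);
-- inside OpenResty that could be ngx.shared.license_cache.
local function cached_lookup(dict, key, ttl_seconds, loader)
  local value = dict:get(key)
  if value ~= nil then
    return value, "hit"
  end
  value = loader(key)              -- slow path, e.g. a Redis query
  if value ~= nil then
    dict:set(key, value, ttl_seconds)
  end
  return value, "miss"
end

-- Stand-in dictionary so the sketch runs outside Nginx (it ignores the TTL).
local fake_dict = { store = {} }
function fake_dict:get(key) return self.store[key] end
function fake_dict:set(key, value, ttl) self.store[key] = value; return true end

-- Hypothetical loader standing in for the real license backend.
local function load_status(domain)
  return domain == "example.com" and "active" or nil
end

print(cached_lookup(fake_dict, "example.com", 300, load_status))
print(cached_lookup(fake_dict, "example.com", 300, load_status))
```

The first call reports a miss and fills the cache; the second is served from memory. Shared dictionaries only store scalar values (strings, numbers, booleans), so a full license record would need to be serialized, for example as JSON, before being cached.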
Request Processing Pipeline
Every request flows through a carefully optimized pipeline:
- License Lookup - Redis cache check, 2ms average
- Origin Resolution - DNS lookup with caching
- Upstream Fetch - Connection pooling to origin servers
- Cache Lookup - S3/Redis translation cache, <50ms
- Content Transformation - Real-time HTML modification
- Response Delivery - Compressed response to client
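The staged flow above can be sketched as an ordered list of functions in which any stage may stop the request early, the way a failed license lookup would. This is a simplified model, not our production code: the stage names, the `ctx` fields, and the hard-coded origin are all hypothetical.

```lua
-- Run stages in order; a stage returns nil to continue or a status to stop.
local function run_pipeline(stages, ctx)
  for _, stage in ipairs(stages) do
    local status = stage.run(ctx)
    ctx.trace[#ctx.trace + 1] = stage.name   -- record which stages ran
    if status then
      return status
    end
  end
  return 200
end

-- Hypothetical stages standing in for the real ones.
local stages = {
  { name = "license", run = function(ctx)
      if not ctx.licensed then return 403 end
    end },
  { name = "origin", run = function(ctx) ctx.origin = "origin.example.com" end },
  { name = "fetch", run = function(ctx) ctx.body = "<html></html>" end },
}

local ok_ctx = { licensed = true, trace = {} }
print(run_pipeline(stages, ok_ctx), table.concat(ok_ctx.trace, " > "))

local bad_ctx = { licensed = false, trace = {} }
print(run_pipeline(stages, bad_ctx), table.concat(bad_ctx.trace, " > "))
```

A licensed request passes through every stage, while an unlicensed one stops after the first, so the cheapest check protects the expensive ones.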
Dynamic Multi-Tenant Routing
Unlike traditional reverse proxies with hardcoded upstreams, our system dynamically routes each request based on the requesting domain. This enables true multi-tenancy where thousands of customer domains all flow through the same proxy infrastructure.
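Conceptually, per-domain routing is a lookup from the request host to an origin target. The sketch below models that with a plain Lua table; the `licenses` table, the `origin_host` field, and the example domains are hypothetical stand-ins for whatever the real license store returns.

```lua
-- Hypothetical in-memory license table keyed by customer domain.
local licenses = {
  ["shop.example.com"] = { origin_host = "origin.shop.example.net", origin_protocol = "http" },
  ["blog.example.org"] = { origin_host = "origin.blog.example.net" },
}

-- Normalize a raw Host header: lowercase and drop any ":port" suffix.
local function normalize_host(host)
  return (host:lower():gsub(":%d+$", ""))
end

-- Return the upstream URL for a request host, or nil if unlicensed.
local function resolve_target(host)
  local license = licenses[normalize_host(host)]
  if not license then
    return nil
  end
  return (license.origin_protocol or "https") .. "://" .. license.origin_host
end

print(resolve_target("Shop.Example.com:8080"))
print(resolve_target("blog.example.org"))
print(resolve_target("unknown.example"))
```

Nginx's `$host` variable is already lowercased and stripped of the port, so the normalization step mainly matters when working from a raw `Host` header.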
# Dynamic upstream selection
# ($upstream_target and $upstream_host must be declared with `set` beforehand)
access_by_lua_block {
    local license_lookup = require "license_lookup"
    local origin_resolver = require "origin_resolver"

    -- Look up the license for this domain
    local license = license_lookup.lookup(ngx.var.host)
    if not license then
        return ngx.exit(403)
    end

    -- Resolve the origin dynamically
    local origin = origin_resolver.get_origin_target(
        license,
        license.origin_protocol or "https"
    )
    if not origin then
        -- No usable origin for a licensed domain: fail as a bad gateway
        return ngx.exit(502)
    end

    ngx.var.upstream_target = origin.target
    ngx.var.upstream_host = ngx.var.host
}
Connection Pooling for Performance
One of the biggest performance wins came from aggressive connection pooling. Instead of establishing a new TCP connection for each upstream request, we maintain pools of keepalive connections:
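local http = require "resty.http"
local httpc = http.new()

local res, err = httpc:request_uri(upstream_url, {
    keepalive_timeout = 60000,  -- keep idle connections for 60 seconds (value in ms)
    keepalive_pool = 50,        -- pool of up to 50 idle connections
    ssl_verify = false,         -- note: disables origin certificate verification
    ssl_server_name = ngx.var.upstream_host,  -- SNI
})
if not res then
    ngx.log(ngx.ERR, "upstream request failed: ", err)
    return ngx.exit(502)
end
This eliminates the TCP handshake and TLS negotiation overhead for the vast majority of requests.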
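To see why reuse matters, here is a toy model of a keepalive pool that counts how many new connections (and therefore handshakes) are needed for a run of requests. It is a simplified illustration in plain Lua, not how lua-resty-http implements pooling internally; the `Pool` class and its fields are hypothetical.

```lua
-- Toy keepalive pool: idle connections are reused until the pool is empty,
-- and at most `max_idle` connections are kept when released.
local Pool = {}
Pool.__index = Pool

function Pool.new(max_idle)
  return setmetatable({ idle = {}, max_idle = max_idle, handshakes = 0 }, Pool)
end

function Pool:acquire()
  local conn = table.remove(self.idle)      -- reuse the most recently released
  if not conn then
    self.handshakes = self.handshakes + 1   -- new TCP + TLS handshake needed
    conn = { id = self.handshakes }
  end
  return conn
end

function Pool:release(conn)
  if #self.idle < self.max_idle then
    self.idle[#self.idle + 1] = conn
  end                                       -- otherwise the connection is dropped
end

local pool = Pool.new(50)
for _ = 1, 1000 do                          -- 1000 sequential requests
  local conn = pool:acquire()
  pool:release(conn)
end
print("requests: 1000, handshakes: " .. pool.handshakes)
```

With strictly sequential requests a single connection serves the whole run. Under real concurrency the number of handshakes grows with the number of simultaneous requests, and idle timeouts force occasional reconnects, so the savings are large but not total.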
Auto-SSL Certificate Management
Managing SSL certificates for thousands of customer domains manually would be impossible. We implemented automatic certificate provisioning using Let's Encrypt:
auto_ssl:set("allow_domain", function(domain)
    -- Only issue certificates for domains with valid licenses
    local license = license_lookup.lookup(domain)
    return license and license.status == "active"
end)
The system validates that each domain has an active license before issuing a certificate, stores certificates in Redis for fast lookup with S3 for persistence, and handles automatic renewal in the background.
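One detail worth making explicit is what happens when the license lookup itself fails. The sketch below wraps any lookup function in a predicate that fails closed, so a backend error never results in a certificate being issued. `make_allow_domain` and `fake_lookup` are hypothetical helpers written for this example; only the `status == "active"` check comes from the snippet above.

```lua
-- Build an allow_domain-style predicate from any lookup function.
-- pcall keeps a failing lookup (e.g. backend down) from allowing issuance.
local function make_allow_domain(lookup)
  return function(domain)
    local ok, license = pcall(lookup, domain)
    if not ok or type(license) ~= "table" then
      return false
    end
    return license.status == "active"
  end
end

-- Hypothetical lookup used only to exercise the predicate.
local function fake_lookup(domain)
  if domain == "broken.example" then error("backend unavailable") end
  local data = {
    ["active.example"] = { status = "active" },
    ["expired.example"] = { status = "expired" },
  }
  return data[domain]
end

local allow = make_allow_domain(fake_lookup)
print(allow("active.example"), allow("expired.example"),
      allow("missing.example"), allow("broken.example"))
```

Only the domain with an active license is allowed; expired, unknown, and erroring lookups all return `false`.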
SSL Passthrough for End-to-End Encryption
A recent enhancement added SSL passthrough capability, ensuring true end-to-end encryption between the proxy and customer origin servers. Rather than terminating SSL at the proxy and making unencrypted requests to origins, the system now:
- Maintains TLS connections to origin servers using SNI (Server Name Indication)
- Verifies origin certificates against trusted CAs (configurable per license)
- Supports mutual TLS (mTLS) for origins requiring client certificate authentication
- Preserves the full chain of trust from end user through proxy to origin
This eliminates the security gap where traffic between the proxy and origin could potentially be intercepted, meeting enterprise security requirements for sensitive data transmission.
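Per-license verification settings could be turned into upstream request options along these lines. This is a sketch under assumptions: `verify_origin_cert` and `origin_sni` are hypothetical license fields, and only the `ssl_verify` and `ssl_server_name` option names are taken from the connection pooling snippet earlier in this post. Client certificates for mTLS are left out.

```lua
-- Build TLS-related request options from a license record.
-- `verify_origin_cert` and `origin_sni` are hypothetical license fields.
local function tls_options(license, request_host)
  local verify = license.verify_origin_cert
  if verify == nil then
    verify = true                   -- default to verifying the origin
  end
  return {
    ssl_verify = verify,
    ssl_server_name = license.origin_sni or request_host,
  }
end

local strict = tls_options({}, "shop.example.com")
print(strict.ssl_verify, strict.ssl_server_name)

local relaxed = tls_options(
  { verify_origin_cert = false, origin_sni = "internal.example.net" },
  "shop.example.com"
)
print(relaxed.ssl_verify, relaxed.ssl_server_name)
```

Defaulting to verification means an origin must be explicitly opted out, which keeps a missing or incomplete license field from silently weakening the connection.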
Kubernetes Deployment
The proxy runs on Kubernetes with horizontal pod autoscaling based on CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 4
  maxReplicas: 16
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Our node pool configuration includes 2-4 nodes with 4 vCPU and 8 GB RAM each, dedicated node pools with taints for workload isolation, and pod anti-affinity rules for high availability across nodes.
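For intuition about how these settings behave, the core scaling rule documented for the Horizontal Pod Autoscaler is desired = ceil(current × currentUtilization / targetUtilization), clamped to the replica bounds. The sketch below applies that rule with our values of 70% target, 4 minimum, and 16 maximum. It deliberately ignores the autoscaler's tolerance band and stabilization windows, and the function name is illustrative.

```lua
-- Simplified HPA sizing rule: scale replicas in proportion to how far
-- observed utilization is from the target, then clamp to the bounds.
local function desired_replicas(current, utilization, target, min, max)
  local desired = math.ceil(current * utilization / target)
  return math.max(min, math.min(max, desired))
end

print(desired_replicas(4, 90, 70, 4, 16))    -- scale up under load
print(desired_replicas(8, 35, 70, 4, 16))    -- scale down when idle
print(desired_replicas(16, 100, 70, 4, 16))  -- capped at maxReplicas
```

Four replicas at 90% CPU grow to six, eight replicas at 35% shrink to four, and a fully loaded fleet of sixteen stays at the cap.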
Monitoring & Observability
Every request includes custom timing headers for debugging:
-- Add timing headers to every response
local total_time_ms = (ngx.now() - request_start_time) * 1000
ngx.header["X-Origin-Time"] = string.format("%.2fms", origin_time_ms)
ngx.header["X-Total-Time"] = string.format("%.2fms", total_time_ms)
Combined with Prometheus metrics, Grafana dashboards, and Loki log aggregation, we have complete visibility into system performance.
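The two timings also make it easy to derive the latency the proxy itself adds, which is the figure the benchmarks are about. The helper below is a sketch: `timing_headers` is a hypothetical function, and `X-Proxy-Time` is an extra header introduced purely for illustration, not one our system is stated to emit.

```lua
-- Build timing header values from millisecond measurements.
-- "X-Proxy-Time" is a hypothetical extra header showing proxy overhead.
local function timing_headers(total_ms, origin_ms)
  return {
    ["X-Origin-Time"] = string.format("%.2fms", origin_ms),
    ["X-Total-Time"] = string.format("%.2fms", total_ms),
    ["X-Proxy-Time"] = string.format("%.2fms", total_ms - origin_ms),
  }
end

local h = timing_headers(185.5, 120.25)
print(h["X-Origin-Time"], h["X-Total-Time"], h["X-Proxy-Time"])
```

Subtracting origin time from total time isolates the overhead we control, so a slow origin is not mistaken for a slow proxy.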
Key Takeaways
Building a high-performance reverse proxy taught us several important lessons:
- Choose the right tool: OpenResty's event-driven architecture and LuaJIT performance were essential for meeting our latency targets
- Cache aggressively: Shared memory dictionaries, Redis, and S3 caching layers work together to minimize redundant work
- Pool connections: Connection reuse removes repeated TCP and TLS handshakes, one of the largest avoidable sources of upstream latency in proxy systems
- Measure everything: Per-request timing headers and comprehensive metrics enable continuous optimization
- Design for multi-tenancy: Dynamic routing based on request context enables massive scale
The difference between 65ms and 400ms latency might seem small, but at 20,000 requests per second, it's the difference between a snappy user experience and a sluggish one.