
What is a reverse proxy server
Here's an interesting detail about the modern internet: when you visit Amazon or watch a series on Netflix, a complex system of reverse proxy servers is working between you and the content. You don't see it, but it's there.
What's a reverse proxy in technical terms? It's software that receives HTTP(S) requests and decides which backend server to forward them to. Sounds simple, right? But there's a lot going on inside. Take connection pooling: the proxy keeps persistent connections to backend servers and reuses them across different users. There's no need to establish a new TCP connection every time, and the time savings add up fast.
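A rough sketch of what that pooling looks like in nginx (the upstream name and addresses are illustrative):
upstream app_pool {
server 10.0.0.1:5000;
keepalive 16; # keep up to 16 idle connections to the backends for reuse
}
server {
location / {
proxy_pass http://app_pool;
proxy_http_version 1.1; # keepalive to backends requires HTTP/1.1
proxy_set_header Connection ""; # clear Connection: close so the link stays open
}
}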
And why is it called "reverse" anyway? A regular proxy hides the client from the server. A reverse proxy does the opposite — it hides the server from the client. To the user it looks like one server, though behind the scenes there could be a farm of hundreds of machines.
Forward and reverse proxy – what's the difference
How does a forward proxy work? You open your browser settings and find the proxy section. You enter a server address, say 192.168.1.100, and specify port 8080. Done. Now the browser goes online not directly, but through that server. To websites you now appear at the proxy server's address, not your home IP. Many people bypass geographic blocks this way.
A reverse proxy works from the opposite side. You type site.com, DNS returns IP 1.2.3.4. But that's not the server with the website, it's a reverse proxy. It receives the request, looks at its settings. Sees routing rules. Everything starting with /api/* goes to server A, images /images/* to CDN, everything else to server B.
The key difference is control. You set up a forward proxy yourself as a user. The site owner sets up the reverse proxy. A forward proxy can cache data to save your traffic. A reverse proxy caches to lighten the load on its servers.
How a reverse proxy works
Let's break down a real request step by step. You typed example.com/products/123. The browser does a DNS lookup and gets 5.6.7.8. That's nginx configured as a reverse proxy, and the browser sends it GET /products/123 HTTP/1.1.
Nginx parses the request. Checks location blocks in the config. Finds a match with location /products. It says proxy_pass http://products_backend. That's an upstream group of three servers. Nginx selects a server by round-robin (or another algorithm), say 10.0.0.2 port 5000.
Opens a connection to 10.0.0.2 on port 5000 (or takes from the keepalive pool). Modifies headers, adds X-Real-IP, X-Forwarded-For, changes Host. Sends the request. Gets a response. May cache it if configured. Returns to the client.
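Put together, a minimal config for that flow might look like this (addresses are the ones from the example):
upstream products_backend {
server 10.0.0.1:5000;
server 10.0.0.2:5000;
server 10.0.0.3:5000;
}
server {
listen 80;
server_name example.com;
location /products {
proxy_pass http://products_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}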
Why you need a reverse proxy
The first reason is horizontal scaling. You have a Rails application, and one process handles 50 req/sec. Need 500 req/sec? Run 10 processes (possibly on different machines) and put nginx in front of them. Done, without changing a line of code.
The second reason is related to specialization. Nginx serves static files more efficiently than Ruby/Python/PHP. You configure location /static with sendfile on, tcp_nopush on. Static files fly, the application isn't bothered.
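A minimal sketch of such a location (paths are assumptions):
location /static/ {
root /var/www; # files live in /var/www/static/
sendfile on; # the kernel copies the file straight to the socket
tcp_nopush on; # send headers and the start of the file in one packet
expires 7d; # let browsers cache static assets
}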
The third reason concerns a single point for cross-cutting concerns. CORS headers, rate limiting, gzip compression, SSL termination. All on the proxy. Backends stay simple, only deal with business logic.
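For instance, gzip and a CORS header configured once at the proxy (the origin is illustrative):
server {
gzip on;
gzip_types text/css application/javascript application/json;
add_header Access-Control-Allow-Origin "https://app.example.com" always;
location / {
proxy_pass http://backend;
}
}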
Main tasks and benefits of reverse proxy
Load balancing
The simplest case is round-robin. Requests go in a circle. First to server1, second to server2, third to server3, fourth back to server1. In nginx it looks like this.
upstream backend {
server 10.0.0.1:5000;
server 10.0.0.2:5000;
server 10.0.0.3:5000;
}
But round-robin is dumb. It doesn't account for the fact that server1 might be overloaded while server2 sits idle. That's why there's least_conn: a new request goes to whichever server has fewer active connections.
upstream backend {
least_conn;
server 10.0.0.1:5000;
server 10.0.0.2:5000;
}
Weights are even handier. Say you have server1 with 32 GB RAM and server2 with 8 GB. It makes sense to send four times more traffic to the first one.
upstream backend {
server 10.0.0.1:5000 weight=4;
server 10.0.0.2:5000 weight=1;
}
Caching and content delivery acceleration
Caching in nginx is a dissertation topic of its own. A basic configuration looks like this.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m max_size=1g inactive=60m;
server {
location / {
proxy_cache my_cache;
proxy_cache_valid 200 1h;
proxy_cache_valid 404 1m;
proxy_pass http://backend;
}
}
The levels=1:2 parameter means a two-level folder structure. Otherwise there'll be a million files in one directory. The file system will choke.
keys_zone designates an area in shared memory for storing cache keys.
max_size sets the maximum size on disk.
The trick is in proxy_cache_use_stale. Backend crashed? Nginx will serve stale cache.
proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
HTTPS encryption and security
The SSL/TLS handshake eats CPU like crazy. An RSA-2048 handshake takes roughly 0.5 ms on a modern CPU, so at 1000 new connections per second, half a core goes to cryptography alone.
An HTTPS reverse proxy solves the problem: SSL terminates at nginx, and plain HTTP travels on from there.
server {
listen 443 ssl http2;
ssl_certificate /etc/ssl/certs/cert.pem;
ssl_certificate_key /etc/ssl/private/key.pem;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
location / {
proxy_pass http://backend;
}
}
The ssl_session_cache parameter is critically important. Without it every connection requires a full handshake; with it, session resumption kicks in, and resumed handshakes are roughly an order of magnitude faster.
Masking internal infrastructure
Backends can run on anything. You have a legacy PHP4 application? Hide it behind nginx and no one will know. API on Go, admin panel on Django, reports on .NET? From the outside it all looks like a single service.
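A sketch of that kind of routing, with made-up backend addresses:
server {
listen 80;
server_name example.com;
location /api/ {
proxy_pass http://10.0.2.10:8080; # Go service
}
location /admin/ {
proxy_pass http://10.0.2.11:8000; # Django admin
}
location /reports/ {
proxy_pass http://10.0.2.12:5000; # .NET reports
}
}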
Headers can be cleaned up, too. Backend returns Server: Apache/2.2.15 (CentOS)? Hide it, and nginx will answer with its own.
proxy_hide_header Server; # make sure the backend's Server header isn't passed through
server_tokens off; # nginx answers with a bare "Server: nginx", no version
And if you're working with platforms that scrutinize IP addresses, you'll have to think about the quality of your outgoing connections. Specialized services like GonzoProxy help here: they offer 20+ million verified residential IPs that don't get flagged as proxies. Especially relevant for scraping and multi-accounting.
When to use reverse proxy
For web application protection
Rate limiting saves you from dumb brute force. Someone is hammering POST /login? Limit them.
limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;
location /login {
limit_req zone=login burst=5 nodelay;
proxy_pass http://backend;
}
That's 5 requests per minute per IP. The burst=5 parameter allows a short burst of up to 5 extra requests. nodelay means requests within the burst are processed immediately instead of being queued; anything beyond it is rejected with 503 (changeable via limit_req_status).
ModSecurity (a WAF that plugs into nginx) filters practically everything: SQL injections, XSS, path traversal. The out-of-the-box rules catch most typical attacks. But false positives happen, so it needs tuning.
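With the ModSecurity-nginx connector built in, enabling it looks roughly like this (the rules file path is a common convention, not a requirement):
load_module modules/ngx_http_modsecurity_module.so; # at the top level of nginx.conf
server {
modsecurity on;
modsecurity_rules_file /etc/nginx/modsec/main.conf; # entry point that usually includes the OWASP CRS
location / {
proxy_pass http://backend;
}
}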
For scalability and fault tolerance
Adding a server to nginx is simple. Add one line and reload.
upstream backend {
server 10.0.0.1:5000;
server 10.0.0.2:5000;
server 10.0.0.3:5000; # new
}
No downtime. Nginx rereads the config; the old workers finish their current requests and exit, and new ones start with the new config.
Health checks in the open-source version of nginx are primitive. Only passive: if a server fails max_fails times within the fail_timeout window, it's taken out of rotation for fail_timeout seconds. Nginx Plus has active health checks, but that's paid.
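Tuning those passive checks looks like this (the thresholds are examples, not recommendations):
upstream backend {
server 10.0.0.1:5000 max_fails=3 fail_timeout=30s; # 3 failures within 30s takes the server out for 30s
server 10.0.0.2:5000 max_fails=3 fail_timeout=30s;
}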
For corporate network optimization
Single entry point for all services. Instead of a dozen DNS A-records you get one. Instead of a dozen SSL certificates you use one wildcard. Instead of firewall rules for each service, you configure them only for the proxy.
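Here's a sketch with one wildcard certificate covering two internal services (all names and paths are invented):
server {
listen 443 ssl;
server_name crm.corp.example;
ssl_certificate /etc/ssl/certs/wildcard.corp.example.pem; # one *.corp.example cert for every service
ssl_certificate_key /etc/ssl/private/wildcard.corp.example.key;
location / {
proxy_pass http://10.0.1.10:8080;
}
}
server {
listen 443 ssl;
server_name wiki.corp.example;
ssl_certificate /etc/ssl/certs/wildcard.corp.example.pem;
ssl_certificate_key /etc/ssl/private/wildcard.corp.example.key;
location / {
proxy_pass http://10.0.1.11:8080;
}
}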
Authentication can be moved to the proxy. Use nginx with the auth_request module.
location /internal/ {
auth_request /auth;
proxy_pass http://internal_service;
}
location = /auth {
internal;
proxy_pass http://auth_service/verify;
}
By the way, for distributed teams with lots of external integrations, quality proxies like GonzoProxy become a must-have. When you're hitting partner APIs 1000 times per minute, residential IPs with low fraud scores are critical.
Popular solutions
Nginx as reverse proxy
Event-driven architecture is nginx's killer feature. One worker process handles thousands of connections through epoll (Linux) or kqueue (BSD). Apache under load forks processes and eats gigabytes of RAM; nginx typically fits in about 100 MB.
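The relevant knobs live in nginx.conf (the values here are typical, not prescriptive):
worker_processes auto; # one worker per CPU core
events {
worker_connections 4096; # each worker multiplexes this many connections via epoll/kqueue
}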
Nginx config reads like a DSL: nested blocks, directive inheritance. Location blocks match by priority: first exact (=), then regexes (~), then prefixes.
location = /api/v1/status { # exact match, maximum priority
return 200 "OK";
}
location ~ \.php$ { # regex, medium priority
proxy_pass http://php_backend;
}
location /static/ { # prefix, low priority
root /var/www;
}
Apache as reverse proxy
Apache with mod_proxy works differently. Process-based model (prefork MPM) or thread-based (worker MPM). Each request gets a separate process/thread. Memory hungry, but isolation is better.
Configuration through .htaccess is a double-edged sword. It's convenient for shared hosting, but costs performance on every request, since Apache rereads the .htaccess files along the entire path.
<VirtualHost *:80>
ProxyPreserveHost On
ProxyPass / http://127.0.0.1:8080/
ProxyPassReverse / http://127.0.0.1:8080/
</VirtualHost>
ProxyPreserveHost is important. Without it, the backend will see Host: 127.0.0.1 instead of the original domain.
Setting up reverse proxy in Windows environment
IIS with ARR (Application Request Routing) is the standard for Windows. The GUI helps those who fear the console, and the URL Rewrite module is quite powerful and supports regexes.
<system.webServer>
<rewrite>
<rules>
<rule name="ReverseProxy" stopProcessing="true">
<match url="(.*)" />
<action type="Rewrite" url="http://backend/{R:1}" />
</rule>
</rules>
</rewrite>
</system.webServer>
The problem with IIS is performance: on the same hardware, nginx will typically handle several times more requests.
How to set up reverse proxy: step-by-step guide
Prerequisites (server, access, basic knowledge)
VPS/VDS minimum is 1 CPU, 512 MB RAM for tests. For production take 2+ CPU, 2+ GB RAM. SSD is mandatory if you're planning caching. Random I/O on HDD will kill performance.
You need root or sudo access. Without it nginx can't bind ports 80/443 (they're privileged). You can listen on 8080 instead, but then you'll have to include the port in the URL.
Commands that will definitely come in handy: ss -tulpn shows which ports are listening; journalctl -u nginx prints the systemd unit's logs; nginx -T dumps the full configuration including all includes; curl -I checks response headers.
Setting up reverse proxy in Nginx (configuration example)
Installing nginx. For Ubuntu/Debian use the command apt update && apt install -y nginx.
For RHEL/CentOS/Rocky run yum install -y epel-release && yum install -y nginx.
The config structure on Debian-based systems: /etc/nginx/sites-available/ holds the configs, /etc/nginx/sites-enabled/ holds symlinks to the active ones. On RHEL-based systems all configs live in /etc/nginx/conf.d/ with a .conf extension.
Now we write the config /etc/nginx/sites-available/myapp.
upstream app_backend {
keepalive 32; # keep 32 connections open
server 127.0.0.1:3000 max_fails=2 fail_timeout=10s;
server 127.0.0.1:3001 max_fails=2 fail_timeout=10s backup; # backup server
}
server {
listen 80;
server_name myapp.com;
# Buffer sizes are important for performance
client_body_buffer_size 128k;
client_max_body_size 10m;
proxy_buffer_size 4k;
proxy_buffers 32 4k;
location / {
proxy_pass http://app_backend;
proxy_http_version 1.1; # for keepalive
proxy_set_header Connection ""; # for keepalive
# Pass real client IP
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host $host;
# Timeouts
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
# Serve static directly
location ~* \.(jpg|jpeg|gif|png|css|js|ico|xml)$ {
expires 30d;
add_header Cache-Control "public, immutable";
root /var/www/static;
}
}
Activate with commands ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/ and then nginx -t && systemctl reload nginx.
Setting up reverse proxy for HTTPS
Certbot is the simplest way to get a free SSL certificate from Let's Encrypt. Install it with snap install --classic certbot.
Then run certbot --nginx -d myapp.com -d www.myapp.com --email admin@myapp.com --agree-tos --no-eff-email.
Certbot will add the SSL settings to the config on its own, but a few parameters are worth tweaking.
# Modern protocols and ciphers
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers off;
# OCSP stapling speeds up certificate verification
ssl_stapling on;
ssl_stapling_verify on;
# HTTP/2 for performance
listen 443 ssl http2;
Checking and testing operation
Test config before applying. This is sacred. Run nginx -t.
If you see "syntax is ok", you can reload. "test failed" means read what's wrong.
Check what's listening with ss -tulpn | grep nginx. You should see ports 80 and 443.
curl for checking works like this. For HTTP use curl -I http://myapp.com. For HTTPS run curl -I https://myapp.com. Follow redirects via curl -IL http://myapp.com. With custom header check curl -H "X-Test: 123" http://myapp.com.
Watch the logs with tail -f /var/log/nginx/access.log for general traffic. Errors show up in tail -f /var/log/nginx/error.log. Filter by IP with tail -f /var/log/nginx/access.log | grep "192.168".
FAQ
Reverse proxy vs VPN: when does reverse proxy solve the task better?
VPN creates an encrypted tunnel between two networks: all protocols, any ports. It fits when you need to give remote workers access to the internal network. The downside is that you need to install a client and configure routing.
A reverse proxy works only with HTTP(S). But no client is needed; it works in any browser. And it can be tuned finely: this URL is public, that one requires a password, a third is reachable only from the office network.
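That fine-grained tuning, sketched in nginx (the networks and paths are examples):
location /public/ {
proxy_pass http://backend;
}
location /account/ {
auth_basic "Restricted";
auth_basic_user_file /etc/nginx/.htpasswd;
proxy_pass http://backend;
}
location /admin/ {
allow 203.0.113.0/24; # office network only
deny all;
proxy_pass http://backend;
}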
In short, use VPN when you need network access, reverse proxy when you need web service access.
How does reverse proxy work with WebSocket and how is it different from regular HTTP?
WebSocket starts as an HTTP request with the headers Upgrade: websocket and Connection: Upgrade. The server responds with 101 Switching Protocols, and from then on WebSocket frames flow over the same TCP connection.
For nginx it's critical to pass headers correctly.
location /ws/ {
proxy_pass http://websocket_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 3600s; # keep connection for an hour
proxy_send_timeout 3600s;
}
Without proxy_read_timeout nginx will close the connection after 60 seconds by default.
What security headers should a reverse proxy add?
The minimum set that will satisfy most security audits includes the following headers.
# Clickjacking protection
add_header X-Frame-Options "SAMEORIGIN" always;
# Disable MIME type sniffing
add_header X-Content-Type-Options "nosniff" always;
# XSS filter for older browsers
add_header X-XSS-Protection "1; mode=block" always;
# Control referrer transmission
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
# HTTPS only (HSTS)
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
# Content Security Policy is the most powerful
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net; style-src 'self' 'unsafe-inline';" always;
The always parameter is important: it attaches the headers even to error responses.
Sticky-sessions behind reverse proxy: when are they needed and how do they work with JWT?
Sticky (a.k.a. persistent) sessions bind a client to one backend server. The classic case is PHP sessions stored in files: the user logged in on server1 and the session was written to /tmp/sess_abc123. If the next request lands on server2, that file isn't there, and the user is logged out.
Different solutions exist. ip_hash in nginx provides binding by client IP.
upstream backend {
ip_hash;
server srv1:80;
server srv2:80;
}
The problem is that behind NAT all clients have one IP.
Cookie-based stickiness is more reliable (nginx Plus, or a third-party sticky module for the open-source version).
upstream backend {
server srv1:80;
server srv2:80;
sticky cookie srv_id expires=1h;
}
Switching to stateless with JWT is the best solution. The token carries all the info, so any backend can serve the request and sticky sessions aren't needed. The tradeoff: a JWT is bigger, and each request can carry 1-2 KB of token.