This How-To explains how to configure Apache HTTPD to act as a reverse proxy with load-balancing capabilities. The instructions were tested on Debian GNU/Linux. They should work on any Debian-derived distribution and, with some tweaks here and there, on any GNU/Linux distribution.
Explanations
Reverse Proxy?
While a proxy is traditionally useful when connections leaving internal network areas have to be monitored, like for analyzing web requests from managed clients to prevent unwanted activities, a reverse proxy works the other way around: it handles incoming connections from an outside network area.
When a client establishes a connection to a server hosting the desired service(s), it actually connects to the reverse proxy first, without even realizing it. Depending on how it is configured, the reverse proxy can make some decisions before letting the request pass to the real server(s) behind it:
- If you configure your reverse proxy to analyze incoming requests for signs of exploitation attempts, it becomes functionally equivalent to a Web Application Firewall (or WAF). You can also ban the offending clients, meaning that they’ll never get the chance to reach the server(s) they were intending to scrutinize, like an Intrusion Prevention System (or IPS) would;
- If you configure your reverse proxy to distribute incoming requests across several servers, it becomes a load balancer.
Load balancing?
Load balancing is a technique that becomes particularly useful when the service you want to offer must not be disrupted during high-traffic periods or downtime (whether planned or not).
The server(s) hosting the service(s) are identically configured and grouped in one or more clusters, meaning they have to be functionally equivalent to one another.
Different algorithms, with different use cases and drawbacks, can be used to distribute the load between the members of a cluster.
Use Case
In my use case, an Apache HTTPD server is configured as a reverse proxy on a Debian VM sitting in the DMZ network area. This machine is reachable from the WAN and a public DNS has a record for it.
Two Debian VMs are running Apache HTTPD, configured as functionally identical web servers.
In this How-To, I’ll guide you through the additional HTTPD modules to activate, the configuration of the cluster containing the two web servers and the directives to add on both sides.
It is assumed that HTTP/2 is used, as well as HTTP/1.1. If that’s not your case, tweak the instructions accordingly.
Steps
Add modules to Apache HTTPD
Because Apache HTTPD is highly modular and Debian makes enabling modules incredibly easy, all you have to do is execute
# a2enmod http2 proxy proxy_balancer proxy_http proxy_http2 lbmethod_bybusyness headers status
to enable the required modules so HTTPD can handle its reverse proxy and load balancer role.
Some modules are also necessary on the web servers:
# a2enmod http2 status headers remoteip
Configure the reverse proxy
Because the machine hosting the reverse-proxy-configured HTTPD also manages virtual hosts for other purposes, all the configuration is done at the vhost level, unless otherwise mentioned.
Protocols
The reverse proxy is meant to handle both HTTP/2 and HTTP/1.1 connections. The following directive
Protocols h2 h2c http/1.1
makes sure that every connection is offered the chance to be upgraded to HTTP/2 before falling back to HTTP/1.1.
The proxy will only be talking to the web servers using H2C, which is HTTP/2 without SSL. This brings us the following advantages:
- The public web traffic can be encrypted if mod_ssl is set accordingly;
- The internal web traffic is plain, meaning it can be monitored and is a bit speedier;
- H2C is still HTTP/2, meaning that if a client connects over HTTP/2, the entire flow will be HTTP/2 down to the web server.
But it has a drawback: if an intruder manages to get in between the web servers and the reverse proxy, the unencrypted internal traffic is exposed to a man-in-the-middle (MITM) attack. If you don’t do SSL internally, consider using other mitigation methods.
Cluster
The definition of a cluster is made in the Proxy directive:
<Proxy "balancer://cluster-www-prod">
BalancerMember "h2c://www1.domain.dmz:8080" keepalive=on
BalancerMember "h2c://www2.domain.dmz:8080" keepalive=on
ProxySet lbmethod=bybusyness
</Proxy>
This defines a cluster named cluster-www-prod. The cluster has two members, www1 and www2, reachable through their FQDNs. It can be referenced using the balancer://cluster-www-prod URI. Each member has a web server listening on port 8080.
When the manager dispatches a request to a member, it does so using the H2C protocol, which is HTTP/2 without SSL.
Finally, ProxySet lbmethod instructs the balancer manager to distribute requests amongst the members following the bybusyness method. Other methods are available; make sure to enable the corresponding module if you use one of them.
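As a sketch of what switching methods could look like, the same cluster can use weighted request counting instead. The byrequests method and the loadfactor parameter are standard mod_proxy_balancer features, but the 2:1 weighting below is purely an assumption for illustration. The matching module would have to be enabled with a2enmod lbmethod_byrequests.

```apache
# Sketch only: same cluster, different balancing method.
# Requires: a2enmod lbmethod_byrequests
<Proxy "balancer://cluster-www-prod">
    # loadfactor is a relative weight (assumed values):
    # www1 should receive roughly twice as many requests as www2.
    BalancerMember "h2c://www1.domain.dmz:8080" keepalive=on loadfactor=2
    BalancerMember "h2c://www2.domain.dmz:8080" keepalive=on loadfactor=1
    ProxySet lbmethod=byrequests
</Proxy>
```

A weighted method like this may suit members with unequal hardware, whereas bybusyness tends to fit members that are truly identical.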
Proxy directives
Now that a cluster is defined, we can use the ProxyPass and ProxyPassReverse directives to instruct HTTPD when to forward the requests:
ProxyPass "/" "balancer://cluster-www-prod/"
ProxyPassReverse "/" "balancer://cluster-www-prod/"
Because we use the cluster’s URI defined earlier, it will be up to the balancer manager to determine which member will handle the request.
Important: do not forget to append "/" at the end of the cluster name, as it is the web root that is being proxied!
Also enable the following directives:
ProxyPreserveHost On
ProxyErrorOverride On
- ProxyPreserveHost: preserve the host name from the HTTP request, for vhost-based proxying;
- ProxyErrorOverride: if an error (4xx or 5xx) occurs at the web server level, the proxy presents its own error page for that status instead of the one generated by the web server.
And finally this one, particularly useful when the initial connection is made using HTTPS but forwarded to an HTTP-listening server (more on that in Bonus, down this page):
RequestHeader set X-Forwarded-Proto "https"
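If the same vhost can be reached over both HTTP and HTTPS, a hard-coded value would mislabel the plain HTTP requests. A possible variant, assuming Apache HTTPD 2.4.10 or later (where mod_headers accepts an expr= value), forwards the scheme the client actually used:

```apache
# Assumption: this vhost answers on both HTTP and HTTPS.
# REQUEST_SCHEME evaluates to "http" or "https" for the incoming request.
RequestHeader set X-Forwarded-Proto expr=%{REQUEST_SCHEME}
```

With an HTTPS-only vhost, the fixed "https" value shown above is equivalent and simpler.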
Status pages
Since I also want to have special status pages for both the balancer manager and the entire server, the following ProxyPass directives are added. Here, "!" tells HTTPD not to forward, but instead serve locally:
ProxyPass "/balancer-status" !
ProxyPass "/proxy-status" !
The locations are defined as follows:
<Location "/balancer-status">
SetHandler balancer-manager
</Location>
<Location "/proxy-status">
SetHandler server-status
</Location>
You should consider securing them to prevent unauthorized access, for example with Basic authentication or by only allowing a specific IP subnet.
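A minimal sketch of the subnet approach, assuming administrators connect from 192.0.2.0/24 (a placeholder documentation range, not an address from this setup):

```apache
<Location "/balancer-status">
    SetHandler balancer-manager
    # Placeholder subnet: replace with the network your administrators use
    Require ip 192.0.2.0/24
</Location>
<Location "/proxy-status">
    SetHandler server-status
    # Same placeholder subnet as above
    Require ip 192.0.2.0/24
</Location>
```

These blocks would take the place of the two unrestricted Location blocks shown above.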
I also want to review each proxied web server’s status, so a 1:1 proxying configuration is needed:
ProxyPass "/www1-status" "h2c://www1.domain.dmz:8080/status"
ProxyPassReverse "/www1-status" "h2c://www1.domain.dmz:8080/status"
ProxyPass "/www2-status" "h2c://www2.domain.dmz:8080/status"
ProxyPassReverse "/www2-status" "h2c://www2.domain.dmz:8080/status"
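One detail worth keeping in mind when assembling all of these rules in a single vhost: HTTPD checks ProxyPass rules in configuration order and the first match wins, so the exclusions and the more specific paths have to come before the catch-all "/" rule. A sketch of one possible ordering, using only the directives from this How-To:

```apache
# 1. Paths served locally by the reverse proxy (never forwarded)
ProxyPass "/balancer-status" !
ProxyPass "/proxy-status" !

# 2. 1:1 rules for the individual web servers
ProxyPass        "/www1-status" "h2c://www1.domain.dmz:8080/status"
ProxyPassReverse "/www1-status" "h2c://www1.domain.dmz:8080/status"
ProxyPass        "/www2-status" "h2c://www2.domain.dmz:8080/status"
ProxyPassReverse "/www2-status" "h2c://www2.domain.dmz:8080/status"

# 3. Catch-all rule handing everything else to the cluster, last
ProxyPass        "/" "balancer://cluster-www-prod/"
ProxyPassReverse "/" "balancer://cluster-www-prod/"
```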
Configure the web server(s)
Listening port
To better differentiate a self-sufficient web server from one that’s only a resource used by a bigger service, I prefer to change the listening port. For this, I’m using 8080.
In /etc/apache2/ports.conf, change 80 to 8080:
# If you just change the port or add more ports here, you will likely also
# have to change the VirtualHost statement in
# /etc/apache2/sites-enabled/000-default.conf
#Listen 80
Listen 8080
And now, at the vhost level:
#<VirtualHost *:80>
<VirtualHost *:8080>
Status page
Because each server also has to serve its status page, the following Location is declared:
<Location "/status">
SetHandler server-status
</Location>
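The status page can be restricted on the web servers as well. The sketch below assumes mod_remoteip is configured as described in the Remote IP section, in which case Require ip is evaluated against the real client address rather than the reverse proxy's. The subnet is a placeholder, not part of this setup.

```apache
<Location "/status">
    SetHandler server-status
    # Placeholder subnet: the network your administrators browse from.
    # With mod_remoteip active, this matches the real client address.
    Require ip 192.0.2.0/24
</Location>
```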
Remote IP
The mod_proxy_http module adds a few request headers, to let the proxied server know what’s happening.
One of them, X-Forwarded-For, gives the real IP address behind the proxied connection. Without that, the web server would only see connections originating from the reverse proxy’s IP address, defeating IP authorization directives, geographic location hints, etc.
To let Apache HTTPD switch the connection IP address with the real IP address, we use the mod_remoteip module.
This module adds new directives that we can configure:
RemoteIPHeader X-Forwarded-For
RemoteIPProxiesHeader X-Forwarded-By
RemoteIPInternalProxy reverseproxy.domain.dmz 127.0.0.1 ::1
- RemoteIPHeader: the header that contains the real IP address;
- RemoteIPProxiesHeader: the header in which mod_remoteip records the addresses of the trusted proxies the request went through;
- RemoteIPInternalProxy: list of FQDNs or IP addresses that we should trust when the header defined by RemoteIPHeader is set. Without that, a forged value in the header defined by RemoteIPHeader would be blindly trusted, whoever sent it, and HTTPD could believe that an innocent IP is doing malicious activities, which might result in actions being taken against the wrong “offender”.
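RemoteIPInternalProxy also accepts plain IP addresses and subnets, which avoids relying on name resolution for the trust decision. A sketch, assuming the reverse proxy sits at 10.0.2.15 (a made-up address used only for illustration):

```apache
RemoteIPHeader X-Forwarded-For
# Made-up address: replace with the reverse proxy's IP on the DMZ network.
# A subnet such as 10.0.2.0/24 is also accepted, but trusts every host in it.
RemoteIPInternalProxy 10.0.2.15
```

Trusting a single address keeps the list of hosts allowed to assert a client IP as small as possible.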
Testing
Navigate to | Expected result(s) | Indication |
---|---|---|
http(s)://reverseproxy.domain.dmz/balancer-status | Balancer Manager’s webpage is displayed, the two members should be in the OK state | The request wasn’t proxied, the handler handled the request |
http(s)://reverseproxy.domain.dmz/www1-status | server-status handler’s webpage from www1 is displayed, the IP addresses are from the real clients | The request was 1:1 proxied, www1 is up and running, RemoteIP is correctly configured |
http(s)://reverseproxy.domain.dmz/www2-status | server-status handler’s webpage from www2 is displayed, the IP addresses are from the real clients | The request was 1:1 proxied, www2 is up and running, RemoteIP is correctly configured |
http(s)://reverseproxy.domain.dmz/ | Web content is served | OK state |
Conclusion
Congratulations!
You should now have an Apache HTTPD server acting as a reverse proxy in front of two or more web servers.
Those web servers are configured in a cluster.
The reverse proxy intelligently distributes requests amongst the members of said cluster.
Additional directives exclude particular locations from proxying, and others set up 1:1 proxying rules.
The members of the cluster can retrieve the real IP address behind the proxied connection to apply IP-based rules.
Bonus: WordPress
If the web servers behind your reverse proxy are serving a WordPress website and it is configured to upgrade HTTP to HTTPS, you should consider adding this at the beginning of the .htaccess file at the webroot:
SetEnvIf X-Forwarded-Proto https HTTPS
Using the X-Forwarded-Proto header with the https value we defined earlier (at Steps->Configure the reverse proxy->Proxy directives), we instruct HTTPD to conditionally define the HTTPS environment variable.
This lets WordPress know that the originating connection was made using HTTPS, even though the actual connection (from the reverse proxy) isn’t.
Without that, WordPress would continually redirect from http://whatever to https://whatever, going through the reverse proxy over and over, resulting in an endless loop, because it is configured to enforce such an upgrade.
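Since SetEnvIf treats its second argument as a regular expression, the line above matches any header value that merely contains "https". A slightly stricter variant, offered only as a sketch, anchors the pattern and gives the variable the explicit value "on":

```apache
# Match the header value exactly and set HTTPS to "on"
SetEnvIf X-Forwarded-Proto "^https$" HTTPS=on
```

Either form should be enough to break the redirect loop; the anchored one just leaves less room for unexpected header values.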
Another modification prevents the mod_rewrite module from rewriting /status and outputting a 404, because the page doesn’t physically exist but should instead be dynamically generated by a handler:
# BEGIN WordPress
[...]
<IfModule mod_rewrite.c>
RewriteEngine On
[...]
RewriteCond %{REQUEST_URI} !=/status
RewriteRule . /index.php [L]
</IfModule>