How-To: set up Apache HTTPD as a load-balancing reverse proxy on Debian

This How-To explains how to configure Apache HTTPD as a reverse proxy with load-balancing capabilities. The instructions were tested on Debian GNU/Linux. They should work on any Debian-derived distribution and, with a few tweaks here and there, on any GNU/Linux distribution.

Explanations

Reverse Proxy?

A traditional (forward) proxy is useful for monitoring connections that leave internal network areas, for example to analyze web requests from managed clients and prevent unwanted activities. A reverse proxy works the other way around: it handles incoming connections from an outside network area.

When a client connects to a server hosting the desired service(s), it actually connects to the reverse proxy first, without even realizing it. Depending on its configuration, the reverse proxy can make some decisions before letting the request through to the real server(s) behind it:

If you configure your reverse proxy to analyze incoming requests for signs of exploitation attempts, it becomes functionally equivalent to a Web Application Firewall (or WAF). You can also ban offending clients so that they never get a chance to reach the server(s) they were targeting, as an Intrusion Prevention System (or IPS) would;
If you configure your reverse proxy to distribute incoming requests across several servers, it becomes a load balancer.

Load balancing?

Load balancing is a technique that becomes particularly useful when the service you offer must remain available during traffic peaks or while a server is down (whether planned or not).

The servers hosting the service(s) are grouped in one or more clusters. Since any member of a cluster may receive any request, the members must be identically configured, or at least functionally equivalent to one another.

Different algorithms, with different use cases and drawbacks, can be used to distribute the load between the members of a cluster.

Use Case

In my use case, an Apache HTTPD server is configured as a reverse proxy on a Debian VM sitting in the DMZ network area. This machine is reachable from the WAN and has a record in a public DNS zone.

Two Debian VMs are running Apache HTTPD, configured as functionally identical web servers.

In this How-To, I’ll guide you through the additional HTTPD modules to enable, the configuration of the cluster containing the two web servers, and the directives to add on both sides.

This How-To assumes that both HTTP/2 and HTTP/1.1 are used. If that’s not your case, adjust the instructions accordingly.

Steps

Add modules to Apache HTTPD

Because Apache HTTPD is highly modular and Debian makes enabling modules incredibly easy, all you have to do on the reverse proxy is execute

# a2enmod http2 proxy proxy_balancer proxy_http proxy_http2 lbmethod_bybusyness headers status

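to enable the modules HTTPD needs for its reverse proxy and load balancer role. a2enmod only enables them in the configuration; restart HTTPD (systemctl restart apache2) for them to be loaded.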

Some modules are also necessary on the web servers:

# a2enmod http2 status headers remoteip

Configure the reverse proxy

Because the machine hosting the reverse proxy also serves virtual hosts for other purposes, all the configuration is done at the vhost level, unless otherwise mentioned.

Protocols

The reverse proxy is meant to handle both HTTP/2 and HTTP/1.1 connections. The following directive

Protocols h2 h2c http/1.1

lists the accepted protocols in order of preference: HTTP/2 over TLS (h2), HTTP/2 over cleartext (h2c), then HTTP/1.1. Every client is therefore offered HTTP/2 before falling back to HTTP/1.1.
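For illustration, here is a minimal sketch of where Protocols could sit if the reverse proxy has a cleartext vhost on port 80 and a TLS vhost on port 443 (mod_ssl enabled). The host name www.example.com and the certificate paths are hypothetical placeholders. Since h2 is only negotiated over TLS and h2c only over cleartext, each vhost here lists just the variant that applies to it.

```apache
# Hypothetical cleartext vhost: HTTP/2 without TLS (h2c), then HTTP/1.1
<VirtualHost *:80>
        ServerName www.example.com
        Protocols h2c http/1.1
</VirtualHost>

# Hypothetical TLS vhost: HTTP/2 over TLS (h2), then HTTP/1.1
<VirtualHost *:443>
        ServerName www.example.com
        Protocols h2 http/1.1
        SSLEngine on
        # Placeholder paths: point these to your own certificate and key
        SSLCertificateFile /etc/ssl/certs/www.example.com.pem
        SSLCertificateKeyFile /etc/ssl/private/www.example.com.key
</VirtualHost>
```

Listing h2 h2c http/1.1 in both vhosts, as above, is also harmless: the variant that cannot apply to a given connection is simply never selected.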

The proxy only talks to the web servers using h2c, which is HTTP/2 without TLS. This brings the following advantages:

The public web traffic can be encrypted if mod_ssl is configured accordingly;
The internal web traffic is in cleartext, which means it can be monitored and saves the encryption overhead;
h2c is still HTTP/2, meaning that if a client connects over HTTP/2, the whole chain down to the web server is HTTP/2.

It has a drawback, though: if an attacker can position themselves between the reverse proxy and the web servers, you might be exposed to a man-in-the-middle (MITM) attack, in which the traffic is read or altered. If you don’t use TLS internally, consider other mitigation methods.
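If you would rather encrypt the internal hop, mod_proxy_http2 also accepts h2:// members (HTTP/2 over TLS). The sketch below assumes that mod_ssl is enabled on the reverse proxy, that the web servers serve TLS on port 8443, and that their certificates are signed by an internal CA stored at a hypothetical path; all three are assumptions, not part of the setup described here.

```apache
# Enable TLS for connections from the proxy to the members
SSLProxyEngine on
# Require valid member certificates, checked against a (hypothetical) internal CA
SSLProxyVerify require
SSLProxyCACertificateFile /etc/ssl/certs/internal-ca.pem
# Check that the certificate matches the member's host name
SSLProxyCheckPeerName on

<Proxy "balancer://cluster-www-prod">
        # h2:// = HTTP/2 over TLS; port 8443 is an assumption
        BalancerMember "h2://www1.domain.dmz:8443" keepalive=on
        BalancerMember "h2://www2.domain.dmz:8443" keepalive=on
        ProxySet lbmethod=bybusyness
</Proxy>
```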

Cluster

A cluster is defined in a Proxy section:

<Proxy "balancer://cluster-www-prod">
        BalancerMember "h2c://www1.domain.dmz:8080" keepalive=on
        BalancerMember "h2c://www2.domain.dmz:8080" keepalive=on
        ProxySet lbmethod=bybusyness
</Proxy>

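This defines a cluster named cluster-www-prod, reachable through the balancer://cluster-www-prod URI. It has two members, www1 and www2, addressed by their FQDNs, each with a web server listening on port 8080. keepalive=on tells the operating system to send TCP keepalive probes on idle connections to the members, so that a firewall in between does not silently drop them.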

When the balancer dispatches a request to a member, it does so over h2c, HTTP/2 without TLS (the h2c:// scheme is handled by mod_proxy_http2).

Finally, ProxySet lbmethod=bybusyness tells the balancer to send each request to the least busy member, i.e. the one with the fewest requests currently assigned. Other methods are available (byrequests, bytraffic, heartbeat); each one requires its corresponding lbmethod_* module to be enabled.
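As a sketch of these alternatives, the variant below switches to the byrequests method (which requires a2enmod lbmethod_byrequests), gives www1 twice the weight of www2 with loadfactor, and adds a hypothetical third member, www3, as a hot standby (status=+H) that only receives traffic when no other member is usable.

```apache
<Proxy "balancer://cluster-www-prod">
        # loadfactor: relative weight; www1 gets about twice as many requests as www2
        BalancerMember "h2c://www1.domain.dmz:8080" keepalive=on loadfactor=2
        BalancerMember "h2c://www2.domain.dmz:8080" keepalive=on loadfactor=1
        # Hypothetical hot standby, used only when the other members are unavailable
        BalancerMember "h2c://www3.domain.dmz:8080" keepalive=on status=+H
        # Weighted request counting; needs the lbmethod_byrequests module
        ProxySet lbmethod=byrequests
</Proxy>
```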

Proxy directives

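Now that the cluster is defined, ProxyPass tells HTTPD which requests to forward to it, while ProxyPassReverse rewrites the Location, Content-Location and URI headers of the members’ responses so that redirects point back to the proxy: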

ProxyPass "/" "balancer://cluster-www-prod/"
ProxyPassReverse "/" "balancer://cluster-www-prod/"

Because the cluster’s URI is used as the target, the balancer decides which member handles each request.

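Important: keep the trailing slashes consistent. Since the web root is proxied, the "/" on the left side must be matched by a "/" after the cluster name on the right side; otherwise, the requests sent to the members may be missing a slash.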
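The same rule applies when only a sub-path is proxied. In this sketch, the /app/ path is hypothetical; because both arguments end with a slash, a request for /app/index.html is forwarded to the cluster as /app/index.html.

```apache
# Correct: both sides end with "/"
ProxyPass "/app/" "balancer://cluster-www-prod/app/"
ProxyPassReverse "/app/" "balancer://cluster-www-prod/app/"

# Wrong: /app/index.html would be forwarded as /appindex.html
#ProxyPass "/app/" "balancer://cluster-www-prod/app"
```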

Also enable the following directives:

ProxyPreserveHost On
ProxyErrorOverride On

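ProxyPreserveHost: pass the Host header of the client’s request on to the members, instead of replacing it with the member’s host name (e.g. www1.domain.dmz), which name-based virtual hosts on the web servers rely on;
ProxyErrorOverride: if a member answers with an error status (4xx or 5xx), the proxy presents its own corresponding error page instead of the member’s.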
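With ProxyErrorOverride On, the error pages can be served by the reverse proxy itself. A minimal sketch, assuming hypothetical static pages stored in /var/www/errors on the proxy: the /errors/ path is excluded from proxying (this exclusion has to come before the catch-all ProxyPass "/") and mapped to that directory with Alias.

```apache
# Serve error pages locally instead of forwarding their URLs to the cluster
ProxyPass "/errors/" !
Alias "/errors/" "/var/www/errors/"
<Directory "/var/www/errors">
        Require all granted
</Directory>

# Displayed when a member (or the proxy itself) returns these codes
ErrorDocument 404 /errors/404.html
ErrorDocument 503 /errors/503.html
```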

Finally, the following directive is particularly useful when the client connects over HTTPS but the request is forwarded to a server listening in plain HTTP (more on that in the Bonus section at the end of this page):

RequestHeader set X-Forwarded-Proto "https"
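If the reverse proxy also accepts plain HTTP, the header should reflect the scheme the client actually used; otherwise a cleartext request would be announced to the web servers as HTTPS. A sketch with one vhost per scheme, where www.example.com is a hypothetical name and the other directives are omitted:

```apache
<VirtualHost *:80>
        ServerName www.example.com
        # Client reached the proxy over plain HTTP
        RequestHeader set X-Forwarded-Proto "http"
        # (proxy directives omitted)
</VirtualHost>

<VirtualHost *:443>
        ServerName www.example.com
        # Client reached the proxy over TLS
        RequestHeader set X-Forwarded-Proto "https"
        # (SSL and proxy directives omitted)
</VirtualHost>
```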

Status pages

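Since I also want status pages for both the balancer manager and the whole server, the following ProxyPass directives are added. Here, "!" tells HTTPD not to forward the request but to serve it locally. These exclusions must appear before ProxyPass "/", because ProxyPass rules are checked in configuration order and the first match wins: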

ProxyPass "/balancer-status" !
ProxyPass "/proxy-status" !

These locations are defined as follows:

<Location "/balancer-status">
        SetHandler balancer-manager
</Location>
<Location "/proxy-status">
        SetHandler server-status
</Location>

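Consider restricting access to these pages, for example with Basic authentication or by only allowing a specific IP subnet. This matters especially for the balancer manager, which can also change the members’ state at runtime.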
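A sketch of both options: the balancer manager is limited to a hypothetical admin subnet (192.0.2.0/24), and the server status requires a login checked against a hypothetical password file created with htpasswd. Since Basic authentication sends credentials in the clear, it is best used in the HTTPS vhost only.

```apache
<Location "/balancer-status">
        SetHandler balancer-manager
        # Only allow a (hypothetical) admin subnet
        Require ip 192.0.2.0/24
</Location>
<Location "/proxy-status">
        SetHandler server-status
        # Basic auth against a (hypothetical) password file, created for example with:
        #   htpasswd -c /etc/apache2/.htpasswd-status admin
        AuthType Basic
        AuthName "Proxy status"
        AuthUserFile /etc/apache2/.htpasswd-status
        Require valid-user
</Location>
```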

I also want to review each web server’s own status page, so a 1:1 proxying rule is needed for each member. Like the exclusions above, these rules must come before ProxyPass "/":

ProxyPass "/www1-status" "h2c://www1.domain.dmz:8080/status"
ProxyPassReverse "/www1-status" "h2c://www1.domain.dmz:8080/status"
ProxyPass "/www2-status" "h2c://www2.domain.dmz:8080/status"
ProxyPassReverse "/www2-status" "h2c://www2.domain.dmz:8080/status"
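Put together in that order, the proxy directives of the vhost could look like this sketch, with the exclusions and the 1:1 rules placed before the catch-all rule:

```apache
# Served locally: must come before the catch-all rule
ProxyPass "/balancer-status" !
ProxyPass "/proxy-status" !

# 1:1 proxying to each member's status page
ProxyPass "/www1-status" "h2c://www1.domain.dmz:8080/status"
ProxyPassReverse "/www1-status" "h2c://www1.domain.dmz:8080/status"
ProxyPass "/www2-status" "h2c://www2.domain.dmz:8080/status"
ProxyPassReverse "/www2-status" "h2c://www2.domain.dmz:8080/status"

# Everything else goes to the cluster
ProxyPass "/" "balancer://cluster-www-prod/"
ProxyPassReverse "/" "balancer://cluster-www-prod/"
```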

Configure the web server(s)

Listening port

To distinguish a self-sufficient web server from one that is only a resource used by a bigger service, I prefer to change the listening port. Here, I’m using 8080.

In /etc/apache2/ports.conf, replace Listen 80 with Listen 8080:

# If you just change the port or add more ports here, you will likely also
# have to change the VirtualHost statement in
# /etc/apache2/sites-enabled/000-default.conf

#Listen 80
Listen 8080

Then change the port in the VirtualHost declaration (in /etc/apache2/sites-enabled/000-default.conf on a default installation):

#<VirtualHost *:80>
<VirtualHost *:8080>
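Because the proxy reaches the members through h2c:// URLs, each web server also has to accept HTTP/2 over cleartext on this vhost; declaring it explicitly is a safe way to make sure it does. A minimal sketch of the resulting backend vhost, where ServerName and DocumentRoot are hypothetical values:

```apache
<VirtualHost *:8080>
        # Hypothetical public name, matching the Host header preserved by the proxy
        ServerName www.example.com
        DocumentRoot /var/www/html
        # Accept HTTP/2 over cleartext from the reverse proxy, HTTP/1.1 otherwise
        Protocols h2c http/1.1
</VirtualHost>
```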

Status page

Because each server also has to serve its status page, the following Location is declared:

<Location "/status">
        SetHandler server-status
</Location>
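The backend status page can be restricted too. Keep in mind that once mod_remoteip is configured (see Remote IP below), Require ip is evaluated against the original client’s address rather than the reverse proxy’s, so the allowed range should be the admin subnet (hypothetical here), not the proxy’s address.

```apache
<Location "/status">
        SetHandler server-status
        # With mod_remoteip active, this matches the real client IP
        # taken from X-Forwarded-For (hypothetical admin subnet)
        Require ip 192.0.2.0/24
</Location>
```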

Remote IP

The mod_proxy_http module adds a few request headers (X-Forwarded-For, X-Forwarded-Host and X-Forwarded-Server) to tell the proxied server about the original request.

One of them, X-Forwarded-For, carries the IP address of the original client. Without it, the web server would only see connections coming from the reverse proxy’s IP address, which defeats IP-based authorization directives, geolocation hints, etc.

To make Apache HTTPD replace the connection’s IP address with the client’s real IP address, we use the mod_remoteip module.

This module adds new directives that we can configure:

RemoteIPHeader X-Forwarded-For
RemoteIPProxiesHeader X-Forwarded-By
RemoteIPInternalProxy reverseproxy.domain.dmz 127.0.0.1 ::1

RemoteIPHeader: the header that contains the client’s real IP address;
RemoteIPProxiesHeader: the header into which mod_remoteip collects the intermediate IP addresses it trusted to resolve the client’s IP address;
RemoteIPInternalProxy: the FQDNs or IP addresses of the proxies trusted to set the header defined by RemoteIPHeader. Without it, a forged value of that header sent by anyone would be blindly trusted, and HTTPD could attribute malicious activity to an innocent IP address, leading to actions being taken against the wrong “offender”.
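mod_remoteip changes the client address seen by HTTPD’s modules, but Debian’s default combined log format starts with %h, which logs the address of the connection peer, i.e. the reverse proxy. A sketch of the same format with %h replaced by %a (the client address as resolved by mod_remoteip), to be placed where combined is defined, usually /etc/apache2/apache2.conf:

```apache
# Debian's "combined" format, with %h (connection peer, i.e. the proxy)
# replaced by %a (client address as set by mod_remoteip)
LogFormat "%a %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
```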

Testing

Navigate to	Expected result	What it indicates
http(s)://reverseproxy.domain.dmz/balancer-status	The Balancer Manager page is displayed and both members are in the OK state	The request was not proxied; the balancer-manager handler served it locally
http(s)://reverseproxy.domain.dmz/www1-status	The server-status page of www1 is displayed, showing the real clients’ IP addresses	The request was proxied 1:1, www1 is up and running, and mod_remoteip is correctly configured
http(s)://reverseproxy.domain.dmz/www2-status	The server-status page of www2 is displayed, showing the real clients’ IP addresses	The request was proxied 1:1, www2 is up and running, and mod_remoteip is correctly configured
http(s)://reverseproxy.domain.dmz/	The website’s content is served	The cluster serves requests through the reverse proxy

Conclusion

Congratulations!

You should now have an Apache HTTPD server acting as a reverse proxy in front of two or more web servers.

Those web servers are configured in a cluster.

The reverse proxy distributes requests among the members of that cluster using the bybusyness method.

Additional directives exclude some locations from proxying and map others 1:1 to a specific member.

The members of the cluster retrieve the client’s real IP address from the proxied request, so IP-based rules still apply.

Bonus: WordPress

If the web servers behind your reverse proxy serve a WordPress website configured to redirect HTTP to HTTPS, consider adding this line at the beginning of the .htaccess file in the web root:

SetEnvIf X-Forwarded-Proto https HTTPS

Based on the X-Forwarded-Proto header and the https value set earlier (in Steps > Configure the reverse proxy > Proxy directives), this directive conditionally defines the HTTPS environment variable.

This lets WordPress know that the original connection was made over HTTPS, even though the actual connection (from the reverse proxy) isn’t.

Without it, WordPress, which is configured to enforce HTTPS, would keep redirecting http://whatever to https://whatever, and each redirected request would reach it again over plain HTTP through the reverse proxy, resulting in an endless redirect loop.

One more modification prevents mod_rewrite from rewriting /status to WordPress and returning a 404: the page doesn’t physically exist, because it is generated dynamically by the server-status handler.

# BEGIN WordPress
[...]
<IfModule mod_rewrite.c>
RewriteEngine On
[...]
RewriteCond %{REQUEST_URI} !=/status
RewriteRule . /index.php [L]
</IfModule>
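WordPress regenerates the lines between # BEGIN WordPress and # END WordPress when the permalink settings are saved, which can silently drop the added RewriteCond. A possible alternative is this sketch, placed above the WordPress block: it stops rewriting for /status before WordPress’s own rules are reached.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
# In .htaccess, the leading "/" is stripped before matching,
# so ^status$ matches the /status URL at the web root.
# "-" means no substitution; [L] stops processing the remaining rules.
RewriteRule ^status$ - [L]
</IfModule>

# BEGIN WordPress
[...]
```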