What is a "reverse proxy" in webmastering?

A reverse proxy is a service (server) that:

  • Appears to be a web server to clients (i.e. end users)
  • Appears to be a web client (browser) to the web server.

An example of a reverse proxy would be something like Cloudflare or CloudFront.

They can serve a range of purposes, including:

  • Adding HTTPS to a site which can't handle it natively (or where HTTPS would be too compute-intensive)
  • Providing reliability/scalability by divvying up requests among multiple servers
  • Adding a layer of security by hiding the actual site location.
  • Unifying different sites under a single domain.

(Another way to look at it is literally: it acts like a proxy, but in reverse. Instead of acting on the web client's behalf, it acts on the web server's behalf.)
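To make this concrete, here is a minimal sketch of a reverse proxy in nginx config (the server names and backend address are placeholders, not from the original discussion): clients connect to this server, and every request is silently forwarded to an internal backend.

```nginx
# Minimal reverse proxy sketch: the client only ever sees this server.
server {
    listen 80;
    server_name www.example.com;

    location / {
        # Backend address is an assumed placeholder for illustration.
        proxy_pass http://10.0.0.10:8080;
        # Pass along the original Host header and client IP,
        # so the backend knows who actually made the request.
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

From the client's point of view, `www.example.com` is just a web server; the backend's real location never appears in any response.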


An HTTP proxy is a specific kind of server which will receive HTTP requests and forward them to another server.

The original use of proxies included:

  • Enable users on an internal network without full Internet access to browse the web: they would have no direct TCP/IP connection to the whole Internet, only to internal servers. One of these, the proxy, would have access to the rest of the Internet and act as a gateway.

  • Perform access control: only some users are allowed to access "the web" (HTTP servers on the Internet), or users are allowed to only access some web servers (using either whitelists: only listed servers can be accessed, or blacklists: all servers but those listed can be accessed), or a combination of both.

  • Perform content control: this can include checking for "sensitive" stuff, like porn or malware. It could also just block videos, or downloads of executable files, zips, or whatever.

  • Caching. This was very common back in the 90s to save on bandwidth: the proxy would cache the results of requests, so subsequent users (or the same one) requesting the same URL would get the data from the local cache. Useful when links to the Internet were slow and congested, though it requires many users to access the same resources to be useful.

  • Censorship and sniffing of communications. By terminating the HTTP(S) connections at the proxy, data can be decrypted there before being encrypted again on the other leg, which allows all sorts of legitimate and less legitimate stuff to happen.

  • A combination of the above, and probably a few more I forget.

This type of proxy is still relatively common in enterprise networks, mostly for access and content control. It is also a common tool in countries where the regime wants to know what you do, read, or say.

In most cases, the proxy is configured (though usually automatically) in the browser. In some cases, a transparent proxy can be set up (though this is a lot more difficult with SSL/TLS).

With this type of proxy, the browser can request any URL (provided the proxy is willing, of course), and the communication is usually thought of as "internal to external".

The other type of proxy, which came a tiny bit later, is the reverse proxy. In this case, we're doing the opposite: people from all over the Internet will go through this proxy before reaching the final server.

This is useful for:

  • Load balancing / fault tolerance: one or more proxies will receive the traffic, and then forward to farms of servers based on availability, load, etc.

  • Separation of traffic: some web servers may be specialised. For instance, mixing on the same server both static files (CSS, JS, images, videos, etc.) and dynamic content (PHP, .NET, Java...) may be suboptimal. A reverse proxy may redirect traffic to different servers based on the request.

  • Hiding internal architecture details. You could have lots of different resources (pages) on the same domain, but served by lots of different servers with different technologies. One bit could be PHP, another Java. Even with the same tech, they could use very different stacks, access different databases, etc.

  • Caching. Frequently requested resources can be cached on the proxy.

  • SSL/TLS termination. To reduce the CPU requirements for TLS encryption/decryption on the server, the proxy could do that instead.

  • CDN: a CDN can be caching + load balancing + fault tolerance pushed to the extreme: you put proxies all over the world, and they serve files from the local cache (a lot quicker), and if not, get the data from the origin server.

  • Security: the proxy could do all sorts of filtering on URLs and submitted data to block malicious requests.

  • And probably a lot more cases which I forgot about.
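Several of those bullets (load balancing, fault tolerance, TLS termination) can be sketched in a single nginx config. This is only an illustration under assumed names: the server addresses, certificate paths, and the `app_farm` upstream name are placeholders.

```nginx
# Load balancing sketch: requests are spread over a farm of app
# servers; nginx skips a server that stops responding.
upstream app_farm {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.13:8080 backup;   # only used if the others are down
}

server {
    # TLS terminates at the proxy; certificate paths are placeholders.
    listen 443 ssl;
    server_name www.example.com;
    ssl_certificate     /etc/nginx/example.crt;
    ssl_certificate_key /etc/nginx/example.key;

    location / {
        # The internal leg is plain HTTP: the backends never
        # pay the CPU cost of encryption.
        proxy_pass http://app_farm;
        proxy_set_header Host $host;
    }
}
```

By default nginx round-robins across the upstream servers; other strategies (least connections, IP hash) are a one-line change in the `upstream` block.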

Note that the "original" proxies are now often called "forward proxies" to differentiate them from reverse proxies.

Note that many HTTP server apps, including Apache and Nginx, can actually do both HTTP serving and proxying in one or both directions. You can have setups where the same server will both serve local resources for some paths, and proxy to another server for others.
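A hybrid setup like that might look like this in nginx (paths and the backend address are assumptions for illustration): static files come straight off the local disk, while everything else is proxied to an app server.

```nginx
server {
    listen 80;
    server_name www.example.com;

    # Static files served directly from local disk:
    # /static/logo.png maps to /var/www/static/logo.png
    location /static/ {
        root /var/www;
    }

    # Everything else is proxied to a (hypothetical) app server.
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```

This is also the common pattern for the "separation of traffic" case above: the cheap static requests never touch the dynamic backend at all.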


No, guys, I don't think he's talking about a third-party reverse proxy, given his mention of nginx.

People will set up reverse web proxies for various reasons, one of which is to spread the load over multiple web servers. Nginx would process the request and then decide where to send it.

Nginx was actually used extensively in corporate networks as an internal reverse proxy. So all your web requests would go through a local nginx server, and that server would process the request and decide (if it matches a filter or something) where to send it.
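That "match a filter, then dispatch" behaviour is just nginx `location` matching. A sketch, with purely hypothetical internal hostnames:

```nginx
# Internal reverse proxy sketch: requests are matched against path
# "filters" and dispatched to different internal servers.
server {
    listen 80;

    location /reports/ {
        proxy_pass http://reports.internal:8080;   # assumed hostname
    }

    location /wiki/ {
        proxy_pass http://wiki.internal:8080;      # assumed hostname
    }

    # Anything that matched no filter goes to a default backend.
    location / {
        proxy_pass http://default.internal:8080;
    }
}
```

To the users, all of this looks like one server on one address, which is exactly the "hiding internal architecture" point from the earlier answer.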