Is Round-Robin DNS "good enough" for load balancing static content?

Solution 1:

Jeff, I disagree: load balancing does not imply redundancy. It's quite the opposite, in fact. The more servers you have, the more likely it is that one of them has failed at any given instant. That's why redundancy IS mandatory when doing load balancing. Unfortunately, a lot of solutions provide load balancing without performing any health checks, which results in a less reliable service.

Round-robin DNS is excellent for increasing capacity by distributing the load across multiple points (potentially geographically distributed). But it does not provide fail-over. You must first describe what type of failure you are trying to cover. A server failure must be covered locally using a standard IP address takeover mechanism (VRRP, CARP, ...). A switch failure is covered by resilient links from the server to two switches. A WAN link failure can be covered by a multi-link setup between you and your provider, using either a routing protocol or a layer 2 solution (e.g. multi-link PPP). A site failure should be covered by BGP: your IP addresses are replicated over multiple sites and you announce them to the net only where they are available.

From your question, it seems that you only need a server fail-over solution, which is the easiest case since it involves neither extra hardware nor a contract with an ISP. You just have to set up the appropriate software on your servers, and it's by far the cheapest and most reliable solution.

You asked "what if an haproxy machine fails?". It's the same. Everyone I know who uses haproxy for load balancing and high availability has two machines and runs ucarp, keepalived or heartbeat on them to ensure that one of them is always available.

Hoping this helps!

Solution 2:

As load-balancing, it's ghetto but more-or-less effective. If you had one server that was falling over from the load, and wanted to spread it to multiple servers, that might be a good reason to do this, at least temporarily.

There are a number of valid criticisms of round-robin DNS as load "balancing," and I wouldn't recommend doing it for that other than as a short-term band-aid.

But you say your primary motivation is to avoid a single-server dependency. Without some automated way of taking dead servers out of rotation, it's not very valuable as a way of preventing downtime. (With an automated way of pulling servers from rotation and a short TTL, it becomes ghetto failover. Manually, it's not even that.)
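To make "an automated way of pulling servers from rotation" concrete, here is a minimal sketch of the checking half only. It assumes a plain TCP connect is an adequate health check; the names `tcp_check` and `records_to_publish` are made up for illustration, the addresses are documentation placeholders, and pushing the resulting list to your DNS provider is left out because that part depends entirely on the provider.

```python
import socket
from typing import Callable, Iterable, List


def tcp_check(address: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def records_to_publish(
    addresses: Iterable[str],
    is_healthy: Callable[[str], bool] = tcp_check,
) -> List[str]:
    """Keep only the addresses that pass the health check.

    If every check fails, fall back to the full list: publishing no
    records at all guarantees an outage, and the checker itself may be
    the thing that is broken.
    """
    pool = list(addresses)
    healthy = [addr for addr in pool if is_healthy(addr)]
    return healthy or pool
```

Run from cron or a small daemon, something like this only gets you the "ghetto failover" described above: clients still wait out the TTL before they stop hitting the dead address.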

If one of your two round-robined servers goes down, then 50% of your customers will get a failure. This is better than 100% failure with only one server, but almost any other solution that did real failover would be better than this.

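If the probability that one server is down at a given moment is N, then with two servers the probability that at least one of them is down is roughly 2N (assuming independent failures and small N). Without automated, fast failover, this scheme increases the probability that some of your users will experience failure.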
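For anyone who wants the arithmetic spelled out, here is a small sketch. It assumes servers fail independently and that round-robin spreads requests evenly; both function names are made up for illustration.

```python
def prob_any_down(p: float, servers: int) -> float:
    """Probability that at least one of `servers` independent servers is down,
    when each one is down with probability p at a given moment."""
    return 1.0 - (1.0 - p) ** servers


def share_of_requests_failing(dead: int, total: int) -> float:
    """Share of requests sent to dead servers under even round-robin
    with no health checks."""
    if total <= 0:
        raise ValueError("total must be positive")
    return dead / total
```

With p = 0.01 and two servers, `prob_any_down(0.01, 2)` is about 0.0199, which is where the "2N" rule of thumb comes from. `share_of_requests_failing(1, 2)` gives the 50% figure mentioned earlier.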

If you plan to take the dead server out of rotation manually, you're limited by the speed with which you can do that and the DNS TTL. What if the server dies at 4 AM? The best part of true failover is getting to sleep through the night. You already use HAProxy, so you should be familiar with it. I strongly suggest using it, as HAProxy is designed for exactly this situation.


Solution 3:

I've said it several times before, and I'll say it again - if resiliency is the problem then DNS tricks are not the answer.

The best HA systems will allow your clients to keep using the exact same IP address for every request. This is the only way to ensure that clients don't even notice the failure.

So the fundamental rule is that true resilience requires IP routing level trickery. Use a load-balancer appliance, or OSPF "equal cost multi-path", or even VRRP.

DNS on the other hand is an addressing technology. It exists solely to map from one namespace to another. It was not designed to permit very short term dynamic changes to that mapping, and hence when you try to make such changes many clients will either not notice them, or at best will take a long time to notice them.

I would also say that, since load isn't a problem for you, you might just as well have another server ready to run as a hot standby. If you use dumb round-robin you have to proactively change your DNS records when something breaks, so you might just as well proactively flip the hot standby server into action and not change your DNS.


Solution 4:

Round robin DNS is not what people think it is. As an author of DNS server software (namely, BIND), I hear from users who wonder why their round robin stops working as planned. They don't understand that even with a TTL of 0 seconds there will be some amount of caching out there, since some caches enforce a minimum time (often 30-300 seconds) no matter what.
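A rough way to reason about this is sketched below. It assumes a cache simply applies whichever is larger, the record's TTL or its own floor, and it ignores any extra caching in the browser or operating system. The function names are hypothetical and real resolvers vary.

```python
def effective_ttl(record_ttl: int, cache_min_ttl: int = 0) -> int:
    """TTL (seconds) a cache actually applies when it enforces a floor."""
    return max(record_ttl, cache_min_ttl)


def worst_case_stale_seconds(
    reaction_s: int, record_ttl: int, cache_min_ttl: int = 0
) -> int:
    """Upper bound (seconds) on how long a client behind such a cache may
    keep receiving a dead address: the time taken to remove the record,
    plus one full effective TTL for a cache that refreshed just before
    the removal."""
    return reaction_s + effective_ttl(record_ttl, cache_min_ttl)
```

So a TTL of 0 behind a cache with a 300-second floor still behaves like a TTL of 300, and if it takes you 10 minutes to pull the record, `worst_case_stale_seconds(600, 0, 300)` comes to 900 seconds.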

Also, while your AUTH servers may do round robin, there is no guarantee the ones you care about -- the caches your users speak to -- will. In short, round robin doesn't guarantee any ordering from the client's point of view, only what your auth servers provide to a cache.

If you want real failover, DNS is but one step. It's not a bad idea to list more than one IP address for two different clusters, but I'd use other technology there (such as simple anycast) to do the actual load balancing. I personally despise load-balancing hardware that mucks with DNS, as it usually gets it wrong. And don't forget DNSSEC is coming, so if you do choose something in this area, ask your vendor what happens when you sign your zone.


Solution 5:

I've read through all the answers, and one thing I didn't see mentioned is that most modern web browsers will try one of the alternative IP addresses if a server is not responding. If I remember correctly, Chrome will even try multiple IP addresses and continue with the server that responds first. So in my opinion, DNS round-robin load balancing is always better than nothing.
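The client-side fallback can be sketched roughly like this. It is a sequential version only: it does not model the "race several addresses and keep the fastest" behaviour, and it is not a description of what any particular browser does. The names `resolve_all` and `connect_first_available` are made up for illustration.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple

Address = Tuple[str, int]


def resolve_all(host: str, port: int) -> List[Address]:
    """Return every (ip, port) pair the resolver hands back for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # info[4] is the sockaddr; its first two fields are (ip, port)
    # for both IPv4 and IPv6.
    return [info[4][:2] for info in infos]


def connect_first_available(
    addresses: Iterable[Address],
    connect: Callable = socket.create_connection,
    timeout: float = 3.0,
):
    """Try each address in order and return the first successful connection."""
    last_error: Optional[OSError] = None
    for addr in addresses:
        try:
            return connect(addr, timeout)
        except OSError as exc:
            last_error = exc  # remember the failure, move on to the next IP
    raise OSError("no address accepted the connection") from last_error
```

Usage would look like `connect_first_available(resolve_all("www.example.com", 80))`. Python's own `socket.create_connection` already does this kind of in-turn retry when given a hostname, so the loop here is only spelled out to show the idea. Note that each dead address costs the user up to one full timeout before the next one is tried.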

BTW: I see DNS round robin more as a simple load-distribution solution.