Setup for a high-availability virtualized environment

As a general overview, to achieve High Availability you need:

  1. Multiple servers
  2. Multiple consistent copies of the data
  3. Consistent data that can be accessed between multiple servers
  4. A way of automatically booting a 2nd instance on the standby server

Number 1 is as simple as it sounds - buy two identical servers.

Number 2 can be achieved by a replicating SAN (expensive, very fast, very reliable), or a replicated filesystem on each of the servers (cheap, speed and reliability can depend on your knowledge of the chosen technology).

Number 3 can be achieved by a SAN (one storage LUN, accessed by two servers), or a replicated filesystem (two separate storage areas, each server can only see its own).

Number 4 can be achieved by a heartbeat application.
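
To make the heartbeat idea concrete, here is a minimal sketch of the logic such an application runs on the standby server. It is not how any particular heartbeat product is implemented; `HeartbeatMonitor`, `check_and_failover`, the 10-second timeout and the `start_standby` callback are all hypothetical names and values chosen for illustration.

```python
import time


class HeartbeatMonitor:
    """Treats the peer as dead once no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout=10.0, clock=time.monotonic):
        self.timeout = timeout      # assumed value, tune for your network
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = clock()

    def beat(self):
        """Record a heartbeat received from the peer."""
        self.last_seen = self.clock()

    def peer_alive(self):
        """True while the last heartbeat is younger than the timeout."""
        return (self.clock() - self.last_seen) < self.timeout


def check_and_failover(monitor, start_standby):
    """Run start_standby() if the peer looks dead; return True if it was run."""
    if monitor.peer_alive():
        return False
    start_standby()  # hypothetical hook, e.g. boot the 2nd instance
    return True
```

A real deployment also has to deal with the case where both servers are alive but cannot see each other, which a sketch this small does not attempt.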

To do this with a small budget, let's say with VMware vSphere, you can use either a SAN or the self-replicating storage appliance that VMware now offers, which presents two distinct datastores on two servers that can be used for high availability. vSphere also offers built-in heartbeats and high availability configurations.

To do this with no budget, you could go down the Xen path, and use DRBD to replicate the storage between the two nodes. Then you set up Heartbeat to switch the active DRBD storage node and have Xen boot up the VMs on the 2nd host when the first goes down.

You won't get five-nines (99.999%) uptime using these basic recommendations, but you could easily get three-nines (99.9%) by using the cheapest methods if you know what you're doing.
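
To put those nines in perspective, this short calculation converts an availability percentage into the downtime it allows per year. It assumes a 365-day year; the function name `allowed_downtime_hours` is just an illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Three-nines leaves you roughly 8.76 hours of downtime a year, while five-nines leaves a little over 5 minutes, which is why the cheap approach cannot realistically get there.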


You talk about "expense" in terms of "how much cash will this cost to buy" when discussing shared storage. That's a totally valid point of course, money's tight everywhere.

But if you're talking about High Availability then you also need to ask "why do we want high availability?" If the answer is, for example, "because the business turns over $2000 per hour in online sales, so if we're off for an hour then we've lost $2000", then the question of expense and affordability can become "Can we afford not to buy something that enables or greatly improves our high availability deployment?"
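
As a rough illustration of that question, the sketch below turns an availability figure into a yearly loss using the $2000 per hour example. The 99% and 99.9% figures are assumptions picked for the comparison, not measurements, and it treats every hour of downtime as a full hour of lost sales.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def annual_downtime_cost(availability_percent, revenue_per_hour):
    """Revenue lost per year if every hour of downtime loses revenue_per_hour."""
    downtime_hours = HOURS_PER_YEAR * (1 - availability_percent / 100)
    return downtime_hours * revenue_per_hour


REVENUE_PER_HOUR = 2000  # the example figure from the text

basic = annual_downtime_cost(99.0, REVENUE_PER_HOUR)    # assumed: no HA
better = annual_downtime_cost(99.9, REVENUE_PER_HOUR)   # assumed: basic HA

print(f"99.0% availability: ${basic:,.0f} lost per year")
print(f"99.9% availability: ${better:,.0f} lost per year")
print(f"Difference:         ${basic - better:,.0f} per year")
```

Under those assumptions the difference is about $157,680 a year, which is the number to hold up against the price of the shared storage.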

This is an important detail and it plays to your comment about budget - the IT 'tail' must not wag the business 'dog' by insisting on an overly complex and expensive solution to a small problem, but at the same time if the business has certain requirements of its IT infrastructure then it has to be prepared to either budget for them properly or to adjust its requirements.

I think virtualisation has a lot of potential in improving the availability of systems, but it's not a magic wand. The hardware side of things, while important, is very much secondary to the software requirements - it's no good having a SQL database cluster that fails over with no trouble when one of the SQL servers crashes if the front-end application that talks to the database chokes because it can't handle the failover.

And two "highly available" servers sitting next to one another in a datacentre are still vulnerable to power failures, theft, etc. Again, depending on the answer to "why are we doing this?", you might need to consider this aspect quite carefully as it can add expense and complexity to quite a few parts of your project.


Without knowing which DB and application server you use I would recommend:

  • Use Xen > 3.2 in PV mode for the VMs (just my personal favorite) - containers or other lightweight virtualization solutions might fit as well (OpenVZ to name one).
  • Build four VMs on each physical node
  • Use a local RAID 5 with 3.5" SAS disks - as many disks as locally possible (5 is good)
  • Use 15k RPM disks (your DBs will need it)
  • Use DRBD and OCFS2 to provide cheap "shared" storage. Use a fast, secure, reliable local network for this connection (bonded direct interconnects are fast and work well).
  • Do the HA at the application level
  • Use load-balancing between the pairs of machines, so you get 8 machines doing concurrent tasks

HA-Examples:

  • Application server: Use Tomcat in clustered active/active mode
  • LVS: Use concurrent master and slave replication of LVS
  • Oracle DB: Use RAC (I don't know if there is an equivalent solution for open-source DBs)

If you do HA at the application layer, that layer knows best how to replicate sessions. If one node goes down (planned or unplanned), the surviving node will take over - including sessions.
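
As a toy model of what "the surviving node takes over, including sessions" means, the sketch below has two nodes that copy every session write to each other. This is not how Tomcat or any other product implements clustering; `Node`, `put_session` and `route` are hypothetical names used only to show the idea.

```python
class Node:
    """A toy application node that replicates session writes to its peers."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}
        self.peers = []
        self.up = True

    def put_session(self, session_id, data):
        """Store a session locally and copy it to every peer that is up."""
        self.sessions[session_id] = dict(data)
        for peer in self.peers:
            if peer.up:
                peer.sessions[session_id] = dict(data)


def route(nodes, session_id):
    """Serve the session from the first node that is up (a stand-in for a load balancer)."""
    for node in nodes:
        if node.up:
            return node.name, node.sessions.get(session_id)
    raise RuntimeError("no node available")


a, b = Node("a"), Node("b")
a.peers, b.peers = [b], [a]

a.put_session("s1", {"user": "alice"})
a.up = False                      # node a goes down, planned or unplanned
print(route([a, b], "s1"))        # node b answers with the replicated session
```

Because the application decides what a session is and when it changes, it can replicate exactly that state, which block-level replication underneath cannot know about.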