Shared storage options for ESXi HA cluster

General notes (stream of consciousness):

  • Think really hard about what you're trying to protect.
  • Almost nobody uses VMware Fault Tolerance. Okay, maybe someone does, but it has too many restrictions, and the use case is particularly narrow.
  • Servers are more reliable than you expect, especially when working with quality systems like HP ProLiant. Supermicro would be another story...
  • Assess realistic failure modes. An HP ProLiant Gen9 server isn't just going to fail.
  • You may encounter individual component failures, but there are enough internal redundancies to deal with most issues gracefully.
    • Seriously, redundant power supplies, redundant fans, RAIDing of internal disks, the onboard NIC and FLR adapters rarely fail.
    • Add ILO monitoring, comprehensive hardware health checks, and the range of uptime-impacting items is reduced to DIMM failures and system board problems.

So now we come to shared storage. Shared storage can itself become a single point of failure, depending upon how it's architected.

  • Something like an MSA SAS-attached array is an option and can work with VMware and two hosts. You can buy them bare and add the requisite capacity.
  • A shared-nothing setup would be beneficial in some respects, but adds certain complexities.
  • There are hyperconverged options like VMware vSAN, HPE StoreVirtual VSA, or StarWind Virtual SAN.
  • The HPE VSA may be free for up to 1TB of storage for your setup.
  • An entry-level SAN isn't that compelling considering your space requirements are incredibly low.
  • It's possible to go with single-headed storage... possibly even just a normal HP server with a storage OS of your choice (Linux exporting NFS, Windows Storage Server, etc.)
  • I've documented and outlined a ZFS solution for Linux that can provide dual-head failover and clustering for storage: See: https://github.com/ewwhite/zfs-ha
  • Another solution that can do shared-nothing with a pair of servers is Zetavault.
  • Couple that with Veeam VM-level replication or something array-based, and you've covered 99% of the potential storage issues.
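To make the "Veeam VM-level replication covers most storage issues" point concrete: interval-based replication bounds how much data you can lose, not whether you lose any. A rough, hypothetical back-of-the-envelope sketch (the function name and numbers are mine, not from any Veeam documentation):

```python
# Toy illustration: with interval-based VM replication, the worst-case
# data loss (RPO) is roughly one replication interval plus the time a
# single replication pass takes to complete.
def worst_case_rpo_minutes(interval_min: float, transfer_min: float) -> float:
    # A change written just after a pass starts waits out the rest of
    # that pass, the full next interval, then the next transfer.
    return interval_min + transfer_min

# e.g. replicate every 15 minutes, each pass takes ~5 minutes:
print(worst_case_rpo_minutes(15, 5))  # up to ~20 minutes of lost changes
```

If a 20-minute RPO is acceptable to the business, VM-level replication really does close most of the remaining storage risk; if it isn't, you're back to synchronous/array-based options.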

But again, this is a function of your risk. People can easily go down the High Availability rabbit hole...

Dual hypervisor hosts... okay. Then do you need dual switching fabrics? Stacked switches? Multi-chassis link aggregation (MLAG/MC-LAG)? One SAN with dual controllers? Two SANs? SAN replication? VM replication? VM replication to diverse storage?

Do you have power diversity? Multiple PDUs? Multiple UPS units? Is the site generator-backed?

So, what are you left with?

I think it's best to have some options. Maybe contract additional help for coverage. Document the solution well enough that the customer can act on it without you. Make a DR or system-outage runbook/script.


If your company cannot withstand downtime for the users, then VMware FT is your choice. To implement this feature, you'll definitely need some kind of shared storage. In that case, I would recommend looking at software-defined storage (SDS) solutions, which are increasingly being used for building virtualized infrastructures. With this approach, you can virtualize the local physical storage resources of your ESXi hosts and turn them into a fully fledged virtual SAN. VMware vSAN springs immediately to mind, but I would point out some very interesting alternatives that should be much cheaper to implement in an ESXi environment. The first candidate is HPE VSA: a good level of functionality, but an annoying requirement of a third voting node for quorum. Yeah, I know, you can still go with 2 nodes, but if you're not OK with downtime, the quorum is a must. The second candidate, on the contrary, has a minimalistic hardware footprint with just two physical hosts, along with a set of features like caching, data compression, etc. It is StarWind vSAN. Both solutions have free versions, so just check and see how you would benefit from them.
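Why the third voting node matters: clusters avoid split-brain by requiring a strict majority of votes before serving data. A minimal sketch of that majority rule (toy code, not any vendor's actual quorum implementation):

```python
def has_quorum(votes_up: int, total_votes: int) -> bool:
    # Majority quorum: strictly more than half of all votes must be
    # reachable for a partition to keep running.
    return votes_up > total_votes // 2

# Two-node cluster, no witness: after losing its peer, the survivor
# holds 1 of 2 votes -- not a majority -- so it must stop (or risk
# split-brain if both sides keep writing independently).
print(has_quorum(1, 2))  # False

# Add a third voting node (the witness): one surviving data node plus
# the witness hold 2 of 3 votes, so service continues.
print(has_quorum(2, 3))  # True
```

This is the trade-off in the HPE VSA case above: the quorum node is annoying extra hardware, but it's what lets a node failure be survived automatically instead of requiring manual intervention.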


The technology you would be best served by is "software-defined storage": a VM that makes locally attached disks available to all VMs, ideally providing redundancy by using the local disks of multiple nodes at the same time (allowing you to lose a node without losing all your VMs). Since we're not talking about product recommendations, I'll leave it at this. It's still a nascent market, but there are some well-established options that would fit the bill.