Distributed, Parallel, Fault-tolerant File System

Actually, I don't think there are that many realistic options. In order of preference my picks would be:

  1. Amazon S3. It meets all your requirements, and your optional qualities too, and it has a very good track record of uptime and support. It is not in-house, but is that really a hard requirement? You could work around it, e.g. using VPN access or just good old HTTPS. S3 would really be my first choice, if the WAN latency and Amazon's pricing work for you. And if the pricing doesn't work for you, well, I doubt a DIY solution will really end up significantly less expensive...
  2. MogileFS seems to fit your requirements perfectly. There is not that much activity around MogileFS, but that's mostly because it's working as intended for its (relatively few) users.
  3. Lustre has really great technology behind it, is a regular local POSIX filesystem (if that is beneficial for you), and has been continuously updated over the years. The big question is whether the Sun-Oracle merger will impact Lustre. Long-term, if Sun plays its cards right, having ZFS and Lustre under one roof could lead to very nice things... Right now, I think Lustre is mostly used in academic and commercial HPC initiatives and not in Internet applications -- this may be untrue, but if Lustre is doing well in Internet applications, they sure aren't marketing that fact well...

Hadoop Distributed File System (HDFS) would not match your requirements, IMHO. HDFS is awesome, but its Bigtable-like approach means it's less accessible than the filesystems above. Of course, if you're really looking for massive scalability and a long-term perspective, then HDFS may be just right -- with Yahoo, Facebook and others invested in Hadoop's growth.

One comment: most of the above systems copy the whole file to 2-3 nodes to achieve redundancy. This takes up much more space than parity encoding / RAID schemes, but it is manageable at scale, and it seems to be the solution everyone has taken. So you will not get the 75% efficiency that you mention...
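To put numbers on that, here's a quick back-of-the-envelope comparison of whole-file replication versus a parity scheme (the 6+2 layout is just an illustrative example, not tied to any particular product):

```python
# Storage efficiency = usable data / raw capacity consumed.

def replication_efficiency(copies):
    """Whole-file replication: every byte is stored `copies` times."""
    return 1.0 / copies

def parity_efficiency(data_disks, parity_disks):
    """RAID-style parity: only the parity stripes are overhead."""
    return data_disks / (data_disks + parity_disks)

# 3-way replication (a typical distributed-FS default):
print(f"3x replication: {replication_efficiency(3):.0%} efficient")  # 33%

# A RAID-6-style 6+2 parity layout:
print(f"6+2 parity:     {parity_efficiency(6, 2):.0%} efficient")    # 75%
```

So hitting the 75% figure you mention would take something like a 6+2 parity layout, whereas 3-way replication only gives you 33% of raw capacity as usable space -- that's the trade-off these systems have accepted.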


If it were me, I would be using GlusterFS. The current release is pretty solid, and I know people at some very large installations in both the HPC and Internet space who are relying on it in their production systems. You can basically tailor it to your needs by laying out the components however you want. Unlike Lustre, there are no dedicated metadata servers, so central points of failure are minimized, and it's easier to scale the setup.

Unfortunately I don't think there's an easy way to meet your 75% criteria without throwing performance down the drain.

It does run on commodity hardware; however, performance really shines when using an InfiniBand interconnect. Fortunately, the price of IB is really quite low these days.

You might want to check out the guys at Scalable Informatics and their Jackrabbit products as a solution. They support GlusterFS on their hardware, and the price of their solution certainly rivals the cost of putting something together from scratch.