Straightforward Linux Clustering

There's the Rocks Linux distribution, which is made for clustering and is based on CentOS/RHEL.

The strong point of Rocks is that it'll, for the most part, manage and do a lot of the minutiae for you.

  • It'll do automatic installation and reinstallation, and if your machines can boot via PXE, the initial install consists of PXE booting your nodes. If you have a large number of compute nodes, it uses BitTorrent internally for distributing packages, which removes a significant bottleneck when (re)installing the entire thing.
  • It'll give you a very homogeneous compute-environment by default.
  • By default it'll set up and use NFS internally, and there's options for using PVFS2 (which I haven't tried).
  • As for queueing/batch systems, it'll set up and manage this for you. By default I believe it uses SGE, and there's also a roll (their software bundling format) for Torque.
  • It'll ensure consistency in users/groups/etc. across your cluster.
  • It'll graph resource utilization through Ganglia.
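To give a feel for the SGE side it sets up: a batch job is just a shell script with `#$` directives that the scheduler reads. A minimal sketch (the job name, output file, and commands are made-up examples):

```shell
#!/bin/bash
# Minimal SGE batch-job sketch. The "#$" lines are scheduler
# directives read by qsub; plain bash treats them as comments.
#$ -N hello_job      # job name (example)
#$ -cwd              # run from the submission directory
#$ -j y              # merge stderr into stdout
#$ -o hello_job.out  # output file (example)

echo "Running on host: $(hostname)"
echo "Job finished"
```

You'd submit it with `qsub`, watch it with `qstat`, and the output file lands wherever the shared filesystem puts your submission directory.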

If I were to dig up downsides:

  • Adding/removing software from the compute nodes involves reinstalling them (although this does ensure homogeneity).
  • Adding/removing software involves either adding a roll (their way of bundling RPMs/appliances) or editing XML files. However, it's fairly well documented, so if you're willing to put some effort into reading the documentation you should be OK. Plus, there's a mailing list if you get stuck.
  • It's based on CentOS/RHEL, which is a little behind "bleeding edge".
  • It'll (mostly) force you to do things "their way". For minor changes you might get away with modifying some of the XML config files; major changes may have to be implemented by creating, adding, or modifying rolls (their software/add-on format).
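To make the XML point concrete: adding packages to every compute node is typically done by dropping an extend-compute.xml into the site profiles and then reinstalling the nodes. A sketch, assuming a Rocks 5.x-style layout (the path placeholder and the package names are examples and vary by release):

```xml
<!-- /export/rocks/install/site-profiles/<version>/nodes/extend-compute.xml -->
<kickstart>
  <!-- example extra RPMs to install on every compute node -->
  <package>htop</package>
  <package>environment-modules</package>
</kickstart>
```

After editing you'd rebuild the distribution and reinstall the nodes so they pick up the change; check the Rocks documentation for the exact commands for your version.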

We have a small cluster that uses openSUSE as its base distro, but I don't think the choice matters much. Ubuntu looks like a viable alternative and has quite a bit of documentation and community support. On top of Linux we run Sun Grid Engine (and our cluster even includes Mac OS machines pretty seamlessly), but Slurm would probably work for a simple setup. We share home directories and /usr/local via NFS from a central server, and it works just fine for us. More details are available on our website (via the Internet Archive).
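For reference, the NFS sharing described above is just two small pieces of configuration. A sketch, assuming the central server is named head and the nodes sit on 10.1.1.0/24 (both names are made up):

```
# /etc/exports on the central server (re-export with: exportfs -ra)
/home       10.1.1.0/24(rw,sync,no_subtree_check)
/usr/local  10.1.1.0/24(ro,sync,no_subtree_check)

# /etc/fstab entries on each compute node
head:/home       /home       nfs  defaults     0 0
head:/usr/local  /usr/local  nfs  ro,defaults  0 0
```

Exporting /usr/local read-only keeps the nodes from modifying shared software, while /home stays writable so jobs can produce output anywhere.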