Ubuntu VM "read only file system" fix?

Although this is a relatively old question, the answer is still the same. You have a virtual machine (running on a physical host) and some sort of storage (either shared storage – an FC SAN, iSCSI storage, an NFS share – or local storage).

With virtualisation, many virtual machines try to access the same physical resources at the same time. Due to physical limitations (number of read/write operations – IOPS, throughput, latency), the storage may not be able to satisfy all requests from all virtual machines at the same time. What usually happens is that you see "SCSI retries" and failed SCSI operations in the operating systems of your virtual machines. If there are too many errors/retries within a certain period of time, the kernel sets the mounted filesystems read-only in order to prevent damage to the filesystem.
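A quick way to see whether this is what is happening inside a guest is to look at the kernel ring buffer (the exact message wording varies by kernel version and storage driver):

    # Look for SCSI retries/aborts, I/O errors and read-only remounts
    dmesg | grep -iE 'scsi|i/o error|read-only'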

To cut a long story short: your physical storage is not "powerful" enough. Too many processes (virtual machines) are accessing the storage system at the same time, your virtual machines do not get responses from the storage fast enough, and the filesystem goes read-only.

There are not terribly many things you can do. The obvious solution is better/additional storage. You can also tune the SCSI timeout parameters in the Linux kernel. Details are described, e.g., in:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009465

http://www.cyberciti.biz/tips/vmware-esx-server-scsi-timeout-for-linux-guest.html

However, this will only "postpone" your problems, because the kernel simply gets more time before the filesystem is set read-only. (I.e., you do not solve the cause of the problem.)
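For reference, on most recent Linux guests the relevant knob is the per-device SCSI command timeout exposed through sysfs. The device name (sda) below is just an example, and 180 seconds is the value commonly suggested for VMware guests (see the KB article above):

    # Show the current SCSI command timeout (in seconds) for one disk
    cat /sys/block/sda/device/timeout

    # Raise it to 180 seconds on the running system (not persistent across
    # reboots; the VMware KB article above describes how to make it permanent)
    echo 180 | sudo tee /sys/block/sda/device/timeout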

My experience (several years with VMware) is that this problem only shows up with Linux kernels (we're using RHEL and SLES), not with Windows servers. It also occurs on all sorts of storage – FC, iSCSI, local storage. For us, the most critical (and expensive) component in our virtual infrastructure is storage. (We're now using HP LeftHand with 1 Gbps iSCSI connections and have not had any storage issues since. We chose LeftHand over traditional FC solutions for its scalability.)


A likely explanation is that there is a hardware problem (partial disk failure) and that the kernel remounted the root filesystem read-only as soon as it detected the problem, in order to limit the damage. A more reliable¹ way to check the current mount options is cat /proc/mounts (grep ' / ' /proc/mounts for the root filesystem; ignore a rootfs / … line, which is an artefact of the boot process). You will presumably find that rw,errors=remount-ro has changed to ro (other options may be displayed as well).
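For example (the device name, filesystem type and extra options will differ on your system):

    # How is the root filesystem currently mounted?
    grep ' / ' /proc/mounts

    # Healthy:   /dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
    # Remounted: /dev/sda1 / ext4 ro,relatime,errors=remount-ro 0 0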

The kernel logs probably contain the message Remounting filesystem read-only, preceded by disk access errors. The logs normally live in /var/log/kern.log; however, if that file is on the now read-only filesystem, the message will not show up there, though the preceding errors should. You can also see the latest kernel errors with the dmesg command.
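For example (the exact message text varies between kernel versions):

    # Search the persistent kernel log for the remount message and preceding errors
    grep -iE 'error|remount' /var/log/kern.log | tail -20

    # Or look at the in-memory kernel ring buffer, which is still readable
    # even when the root filesystem has gone read-only
    dmesg | tail -50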

As an aside, under Ubuntu, the usual place for mount points (used by the desktop interface) is under /media (e.g. /media/cdrom0), though you can use /mnt or /mnt/cdrom if you like.
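If you do mount something by hand, for example (the device and directory names are placeholders):

    sudo mkdir -p /media/usbdisk
    sudo mount /dev/sdb1 /media/usbdisk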

¹ mount reports what is recorded in /etc/mtab. If the root filesystem is read-only, /etc/mtab can't be kept up to date.


What happened was that there was a power failure in the data center recently, and I hadn't touched my server since. When our data center loses power, vSphere leaves Ubuntu's filesystem read-only until the guest is restarted. I would have tried restarting earlier, but I didn't want all of the monitoring to go crazy. I have now silenced Nagios (our monitoring service), and everything is working fine since I restarted the system. Thanks for all of the input, it is much appreciated.