How do I make my disk unmap pending unreadable sectors

A pending unreadable sector is one that returned a read error and which the drive has marked for remapping at the first possible opportunity. However, it can't do the remapping until one of two things happens:

  1. The sector is reread successfully
  2. The sector is rewritten

Until then, the sector remains pending. So you have two corresponding ways to deal with this:

  1. Keep trying to reread the sector until you succeed
  2. Overwrite that sector with new data

Option (1) is non-destructive, so you should probably try it first. Keep in mind, though, that if the drive is starting to fail in a serious way, repeatedly reading from a bad area is likely to make it fail much more quickly. If you have a lot of pending sectors and other errors, and you care about the data on the drive, I recommend taking it out of service and using the excellent tool ddrescue to recover as much data as possible. Then discard the drive.

If the sector in question contains data you don't care about, or can restore from a backup, then overwriting it is probably the quickest and simplest solution. You can then view the reallocated and pending counts for the drive to make sure the sector was taken care of.
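As a rough sketch of how to view those counts, assuming smartmontools is installed: smartctl -A prints the drive's attribute table, and the filter below keeps only the rows with IDs 5, 197 and 198, which most drives use for reallocated, pending and offline-uncorrectable sectors. The function name is my own invention, /dev/sdX is a placeholder, and attribute IDs and names do vary between vendors, so check the full table if nothing shows up.

```shell
# Keep only the reallocation-related rows of a SMART attribute table.
# Reads `smartctl -A` output on stdin; prints "<attribute name> <raw value>".
# Assumption: the drive uses the conventional IDs 5, 197 and 198.
smart_sector_counts() {
    awk '$1 == "5" || $1 == "197" || $1 == "198" { print $2, $10 }'
}

# Usage (needs root; /dev/sdX is a placeholder for your drive):
#   smartctl -A /dev/sdX | smart_sector_counts
```

Run it before and after each attempt described here, so you can tell whether the pending count actually changed.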

How do you find out what the sector corresponds to in the filesystem? I found an excellent article on the smartmontools web site, although it's fairly technical and is specific to the ext2/3/4 and ReiserFS file systems.

A simpler approach, which I used on one of my own (Mac) drives, is to use find / -xdev -type f -print0 | xargs -0 ... to read every file on the system. Make a note of the pending count before running this. If the sector is inside a file, you will get an error message from the tool you used to read the files (e.g. md5sum) showing you the path to it. You can then focus your attention on re-reading just this file until it reads successfully. Often this will solve the problem, if it's an infrequently used file which just needed to be reread a few times. If the error goes away, or you don't encounter any errors in reading all the files, check the pending count to see if it has decreased. If it has, the problem was solved by reading.
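Here is one way the find pipeline could be filled in. It is only a sketch: I have swapped the checksum tool for plain cat to /dev/null, since all that matters is that every block gets read, and the function name and the "READ ERROR" prefix are made up for illustration.

```shell
# Read every regular file under a directory without crossing into other
# filesystems, and print the path of each file that cannot be read.
# $1: directory to scan (use / for the whole root filesystem)
read_all_files() {
    find "$1" -xdev -type f -print0 |
        xargs -0 sh -c '
            for f do
                # Discard the data; we only care whether the read succeeds.
                cat -- "$f" >/dev/null 2>&1 || printf "READ ERROR: %s\n" "$f"
            done' sh
}

# Usage (run as root so permission errors are not mistaken for bad sectors):
#   read_all_files / > read-errors.txt
```

No output means every file was readable. Run as an ordinary user, it will also flag files you simply lack permission to read.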

If the file cannot be read successfully after multiple tries (e.g. 20) then you need to overwrite the file, or the block within the file, to allow the drive to reallocate the sector. You can use ddrescue on the file (rather than the partition) to overwrite just the one sector, by copying to a temporary file and then copying back again. Note that just removing the file at this point is a bad idea, because the bad sector will go into the free list where it will be harder to find. Replacing the file wholesale with a fresh copy is bad too, because the filesystem will typically allocate new blocks for it and the old ones, bad sector included, again go into the free list. You need to rewrite the existing blocks in place. The conv=notrunc option of dd is one way to do this.
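The two steps above might look something like this. Both function names are hypothetical, the default of 20 attempts simply echoes the figure above, and the second function writes zeros over one block, so whatever was stored in that block is lost. It also assumes a filesystem that rewrites blocks in place. Copy-on-write filesystems such as APFS, btrfs or ZFS will put the new data somewhere else.

```shell
# Try to read a file up to N times (default 20); stop at the first success.
# $1: file, $2: optional number of attempts
retry_read() {
    file=$1
    tries=${2:-20}
    i=1
    while [ "$i" -le "$tries" ]; do
        if cat -- "$file" >/dev/null 2>&1; then
            echo "read OK on attempt $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "still unreadable after $tries attempts" >&2
    return 1
}

# Overwrite one block of an existing file with zeros, in place.
# conv=notrunc stops dd from truncating the file, so its length is
# unchanged and only the chosen block is rewritten.
# $1: file, $2: zero-based block index, $3: block size in bytes (default 4096)
zero_block_in_place() {
    dd if=/dev/zero of="$1" bs="${3:-4096}" seek="$2" count=1 conv=notrunc 2>/dev/null
}

# Usage (path and block number are placeholders):
#   retry_read /path/to/file || zero_block_in_place /path/to/file 12
```

Afterwards, restore the damaged file from a backup if you have one, and check the pending count again.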

If you encounter no errors, and the pending count did not decrease, then the sector must be in the free list or in part of the filesystem infrastructure (e.g. an inode table). You can try filling up all the free space with cat /dev/zero >tempfile, and then check the pending count (and delete tempfile afterwards). If it goes down, the problem was in the free list and has now gone away.
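A sketch of the free-space fill, wrapped up so the temporary file is always removed afterwards. The function name, the temporary file name and the optional size cap are my own additions. The cap is only there for a dry run, because a real pass has to run until the filesystem is full in order to touch every free block.

```shell
# Fill the free space of a filesystem with zeros, flush, then delete the file.
# $1: a directory on the affected filesystem
# $2: optional cap in MiB (hypothetical convenience for a dry run);
#     without it the file grows until the filesystem is full
fill_free_space() {
    tmp="$1/zero-fill.tmp"
    if [ -n "${2:-}" ]; then
        dd if=/dev/zero of="$tmp" bs=1048576 count="$2" 2>/dev/null
    else
        # Ends with a "no space left" error, which is the intended stop.
        cat /dev/zero >"$tmp" 2>/dev/null
    fi
    sync          # push the zeros out to the disk before deleting
    rm -f "$tmp"
}

# Usage (mount point is a placeholder):
#   fill_free_space /mnt/data
```

Don't run this on a filesystem that other running programs need to write to, since it will be completely full for a while.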

If the sector is in the infrastructure, you have a more serious problem, and you will probably encounter errors just walking the directory tree. In this situation, I think the only sensible solution is to reformat the drive, optionally using ddrescue to recover data if necessary.

Keep a very close eye on the drive. Sector reallocation is a very good canary in the coal mine, potentially giving you early warning of a drive that is failing. By taking early action you can prevent a later catastrophic and very painful failure. I'm not suggesting that a few sector reallocations are an indication that you should discard the drive. All modern drives need to do some reallocation. However, if the drive isn't very old (< 1 yr) or you are getting frequent new reallocations (> 1/month) then I recommend you replace it as soon as possible.

I don't have empirical evidence to prove it, but my experience suggests that disk problems can be reduced by reading the whole disk once in a while, either by a dd of the raw disk or by reading every file using find. Almost all the disk problems I've experienced in the past several years have cropped up first in rarely-used files, or on machines that are not used much. This makes sense heuristically, too, in that if a sector is being reread frequently the drive has a chance to reallocate it when it first detects a minor problem with that sector rather than waiting until the sector is completely unreadable. The drive is powerless to do anything with a sector unless the host accesses it somehow, either by reading or writing it or by conducting one of the SMART tests.

I'd like to experiment with the idea of a nightly or weekly cron job that reads the whole disk. Currently I'm using a "poor man's RAID" in which I have a second hard drive in the machine and I back up the main disk to it every night. In some ways, this is actually better than RAID mirroring, because if I goof and delete a file by mistake I can get yesterday's version immediately from the backup disk. On the other hand, I believe a hardware RAID controller does a lot of good work in the background to monitor, report and fix disk problems as they emerge. My current backup script uses rsync to avoid copying data that hasn't changed, but in view of the need to reread all sectors maybe it would be better to copy everything, or to have a separate script that reads the entire raw disk every week.
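For what it's worth, the separate weekly script could be as small as the sketch below. The function name, the script path in the cron line and /dev/sdX are all placeholders. dd stops at the first read error here, which is enough to raise the alarm but won't list every bad sector.

```shell
# Read a whole device (or any file, for a dry run) and report the outcome.
# $1: raw device such as /dev/sdX; reading a raw disk needs root
read_whole_device() {
    if dd if="$1" of=/dev/null bs=1048576 2>/dev/null; then
        echo "read OK: $1"
    else
        echo "READ ERROR on $1" >&2
        return 1
    fi
}

# Example root crontab entry, 03:00 every Sunday (script path is hypothetical):
#   0 3 * * 0  /usr/local/sbin/read-disk.sh /dev/sdX
```

An alternative I haven't compared against this is to schedule the drive's own long SMART self-test with smartctl -t long /dev/sdX, which scans the surface without sending the data to the host.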