Physically Identify the failed hard drive

Solution 1:

I had this exact problem on a (tower) server, just as you describe, and it was easy to solve:

smartctl -i /dev/sdX will output the serial number of the drive.

Vendors sometimes ship their own specific tools that will do the same, and hdparm -I reports the serial as well.

So output the serial of the bad drive, and then use a dentist's mirror and a flashlight to find the drive.
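If you want just the serial without reading the whole report, something like this rough sketch might help. It assumes smartctl -i prints a line labeled "Serial Number:" (SCSI/SAS drives print "Serial number:", hence the case-insensitive match). serial_from_smartctl is a helper name I made up, not part of smartmontools.

```shell
# Hypothetical helper: read `smartctl -i` output on stdin and print only the serial.
serial_from_smartctl() {
    # Split each line on the colon plus any following spaces,
    # then match the label case-insensitively and stop at the first hit.
    awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}

# Usage (needs root and a real device, so it is left commented out):
#   smartctl -i /dev/sda | serial_from_smartctl
```

Run it against each drive in turn until the printed serial matches the one on the label you see in the mirror.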

On a rackmount you'll usually have indicator lights, as other people have said, but I bet the same approach would work there too.

Solution 2:

Putting stickers on drives may not be feasible, depending on the design of the tray. By the time the drive dies, the stickers could have dried up and fallen off.

ledctl (from package ledmon) is really the way to go with this.

ledctl locate=/dev/disk/by-id/[drive-id]

or

ledctl locate=/dev/sda

will light the locate indicator for the specified drive on your chassis (on some backplanes this is the same LED as the fail light). I provided two examples to illustrate that it doesn't matter how you identify the drive. You can use the serial, the device name, or whatever other information is available to you, since the drives are referenced in multiple ways under the /dev/ and /dev/disk/ paths.

To turn the light back off, just execute it again, changing locate to locate_off like so:

ledctl locate_off=/dev/sda
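If all you have is a serial number (say, from smartctl), you can probably find the matching by-id link and hand that to ledctl. A minimal sketch, assuming the by-id names embed the serial, which is usually true for ata-* links but not for wwn-* ones. by_id_for_serial is a made-up helper name, and the serial in the usage line is a placeholder.

```shell
# Hypothetical helper: print by-id entries whose name contains the given serial.
# $1 = serial number, $2 = directory to search (assumed default: /dev/disk/by-id)
by_id_for_serial() {
    serial=$1
    dir=${2:-/dev/disk/by-id}
    for path in "$dir"/*"$serial"*; do
        [ -e "$path" ] || continue        # the glob matched nothing
        case $path in
            *-part[0-9]*) continue ;;     # skip partition links, keep whole disks
        esac
        printf '%s\n' "$path"
    done
}

# Usage (needs root and supported hardware, so it is left commented out):
#   ledctl locate="$(by_id_for_serial WD-ABC123 | head -n 1)"
```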

Solution 3:

Usually you have to hope that the connections are labeled in some fashion, then work from the identity of the failed device. For example (and someone please comment to correct me if this is wrong): with two IDE channels and up to two drives on each, you could have sda, sdb, sdc, and sdd (or hda through hdd with the older IDE driver). If sdd failed, it would be the second drive on the cable of the second IDE channel.

If it's SATA, the ports may be labeled for each of the drives, as they are on the system I have in the back room. Again, drive lettering typically starts at a on port 0 of the SATA connectors and moves up from there, although the kernel does not guarantee that order.

If the drives differ in make or model, dmesg | grep sd or dmesg | grep hd should yield some clues.
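Besides dmesg, sysfs can hint at which controller port a disk hangs off. A rough sketch, assuming the resolved path of each /sys/block/sdX entry includes an ataN or hostN component (it usually does for SATA disks, but treat that as an assumption for your hardware). list_disk_paths is a name I made up.

```shell
# Hypothetical helper: print each sd* disk with its fully resolved sysfs path.
# $1 = block directory (assumed default: /sys/block)
list_disk_paths() {
    sysblock=${1:-/sys/block}
    for dev in "$sysblock"/sd*; do
        [ -e "$dev" ] || continue         # no sd* disks present
        # ${dev##*/} strips the directory part, leaving e.g. "sda"
        printf '%s -> %s\n' "${dev##*/}" "$(readlink -f "$dev")"
    done
}

# Usage:
#   list_disk_paths
```

Look for the ataN or hostN part of each path and compare it with the port labels on the board.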

If the serial numbers are printed on the drives, I think the hdparm command can report them in software so you can trace the drive that way. If so, you might want to label the drives somewhere visible ahead of time so you don't have to hunt for the right one when there's an issue.

...I knew there was another reason I preferred hardware RAID over software RAID...blinky lights. Really like the blinky lights.

EDIT: smartctl -i is the tool I had in mind for the serial number, though hdparm -I reports it too. My bad.


Solution 4:

Some drives (more precisely, the enclosure slots they sit in) expose a locate "file" in /sys into which you can echo a 1 to turn the locate indicator light on or a 0 to turn it off.

$ find /sys -name locate -exec sh -c 'echo 1 > "$1"; sleep 10; echo 0 > "$1"' _ {} \;
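That lights every slot in turn. If you already know which disk failed, you may be able to go straight to its slot. A minimal sketch, assuming an SES-style layout where each slot directory under /sys/class/enclosure holds both the locate file and a device/block/<disk> entry; check your own /sys before relying on it. locate_file_for_disk is a hypothetical helper name.

```shell
# Hypothetical helper: print the locate file of the enclosure slot holding a disk.
# $1 = disk name such as sda, $2 = enclosure root (assumed default: /sys/class/enclosure)
locate_file_for_disk() {
    disk=$1
    root=${2:-/sys/class/enclosure}
    # The trailing slash restricts the glob to directories (enclosure/slot/).
    for slot in "$root"/*/*/; do
        if [ -e "${slot}device/block/$disk" ] && [ -e "${slot}locate" ]; then
            printf '%s\n' "${slot}locate"
            return 0
        fi
    done
    return 1                              # no slot claims this disk
}

# Usage (needs root, so it is left commented out):
#   f=$(locate_file_for_disk sdd) && echo 1 > "$f"
```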

Solution 5:

Short answer: lsscsi. Detailed answer: lshw -c disk will show you the disks and the SATA ports they are connected to.
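To pull the address for a single disk out of the lsscsi listing, something like this might do. It is only a sketch, assuming the default lsscsi layout where the first column is the [host:channel:target:lun] address and the last column is the device node. scsi_address_for is a name I made up.

```shell
# Hypothetical helper: read `lsscsi` output on stdin and print the
# [host:channel:target:lun] address of the given device node.
scsi_address_for() {
    dev=$1
    # Model names can contain spaces, so rely only on the first and last fields.
    awk -v dev="$dev" '$NF == dev { print $1 }'
}

# Usage:
#   lsscsi | scsi_address_for /dev/sdb
```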