The Blog of Justin Loutsch

I don't know the question, but the answer is 42!

About

I'm Justin, and I live in Boston. I'm a huge geek into process automation and work reduction, and am also an editor at Eat Your Serial. Thanks for dropping by!

Recently a hard drive failed in our RAID 5 QNAP NAS array. The CTO suggested a low-level format would make the drive usable again, and he was right. In my case it was a Seagate drive, so I used Seatools and its full erase feature. I found that Seatools wouldn’t run via USB, nor would it run on a spare box I had lying around, so it was easiest to burn Seatools to a CD and just plug the drive into my desktop. The erase took about 14 hours total, so I let it run overnight.

Once I plugged the drive back into the QNAP, I expected the RAID array to begin rebuilding. This was not the case, however, and is apparently a shortcoming of the QNAP system: their software RAID apparently can’t recover automatically from dead disks, only from ones incorrectly flagged as faulty.

I did a lot of googling to figure out how to fix this, and with a lot of help from this site figured out step 1.

The first step is to clone the partition structure of the healthy drives over to the newly formatted drive. To do this, first ssh into the QNAP server as the administrator.

Run fdisk -l (that’s a lowercase L) to get a listing of all disks in the system. My QNAP server has 4 drives, each labeled with a number over the drive bay. These numbers correspond to sda, sdb, sdc, and sdd. In my case, sdc failed, and after I replaced it, the new drive was still recognized as sdc.

Find a disk whose partition listing is complete, as this is the layout you’ll want to copy over to the new drive. The new drive will be the label with no partitions in the fdisk output (sdc was missing in my case).
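If you'd rather not eyeball the listing, here is a rough sketch that reports which drive labels have no partitions. The function name missing_disks is my own invention, and it assumes partitioned disks appear in the fdisk -l output as lines beginning with /dev/sda1, /dev/sdb1 and so on, which may differ on your firmware.

```shell
#!/bin/sh
# Print every expected drive label that has no partition lines in the
# `fdisk -l` output read from stdin.
# Assumption: a partitioned disk shows up as lines like "/dev/sda1 ...".
missing_disks() {
    listing=$(cat)
    for disk in "$@"; do
        # No line starting with /dev/<disk><digit> means no partitions found.
        if ! printf '%s\n' "$listing" | grep -q "^/dev/${disk}[0-9]"; then
            echo "$disk"
        fi
    done
}

# Hypothetical usage on the NAS:
#   fdisk -l | missing_disks sda sdb sdc sdd
```

On a 4-bay unit like mine, this should print only the label of the blank replacement drive.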

Next command: fdisk /dev/<drive label> (for me: fdisk /dev/sdc)

Type n and press Enter for a new partition. For the partition type, enter p for primary. Enter 1 for the partition number and 66 for the last cylinder. For every partition, stick with the default value for the first cylinder and only change the last cylinder. These values are from my drives; use the ones shown for your healthy disk in the fdisk -l output.
Type n and press Enter for a new partition. For the partition type, enter p for primary. Enter 2 for the partition number and 132 for the last cylinder.
Type n and press Enter for a new partition. For the partition type, enter p for primary. Enter 3 for the partition number and 243138 for the last cylinder.
Type n and press Enter for a new partition. For the partition type, enter p for primary. Enter 4 for the partition number and 243200 for the last cylinder.
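To sanity-check the numbers before typing them in, this small sketch prints the cylinder range each partition ends up with. The function name partition_ranges is made up for illustration, and it assumes the first partition starts at cylinder 1 and each later one starts right after the previous one, which is what accepting the fdisk defaults gave me.

```shell
#!/bin/sh
# Print the cylinder range and length of each partition, given the
# last-cylinder values entered at the fdisk prompts.
# Assumption: partition 1 starts at cylinder 1 and every later partition
# starts on the cylinder after the previous partition's last one.
partition_ranges() {
    start=1
    num=1
    for end in "$@"; do
        echo "partition $num: cylinders $start-$end ($((end - start + 1)) cylinders)"
        start=$((end + 1))
        num=$((num + 1))
    done
}

# The last-cylinder values from my drives:
partition_ranges 66 132 243138 243200
```

The third line of output makes it obvious which partition is the big one: it spans 243006 cylinders, while the others are 66, 66 and 62 cylinders.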

Next mark the first partition of your drive as bootable:

Command (m for help): a
Partition number (1-4): 1

Then change partition 2 to the ‘Linux swap / Solaris’ type:

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 82
Changed system type of partition 2 to 82 (Linux swap / Solaris)

Finally, save the new partition table:

Command (m for help): w
The partition table has been altered!

fdisk normally exits on its own after w. If you’re still at its prompt, press Ctrl-C to exit.

Eject the new disk and reinsert

The site I linked to then says that all you have to do after this is eject the disk and reinsert it, and the rebuilding will begin. However, this is not what happened for me.

The disk structure was cloned just fine, but the array was still running in degraded mode even after I restarted the NAS.

I had to begin the rebuild process manually, and actually ended up calling QNAP to find out exactly how to do this. They wanted to remote into my machine using TeamViewer, but I declined and asked them to walk me through the commands (since I’m comfortable with the terminal, this was fine, but YMMV).

This command must also be run while SSH’ed into the QNAP server: mdadm -a /dev/md0 /dev/sdc3

sdc3 is the largest partition in the QNAP layout and the one that is part of the array. The other, smaller partitions are for the OS, while this one contains the data stored in the array.
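Since adding the wrong device to the array would be a bad day, here is a cautious sketch of a wrapper around that command. The add_member function and the DRY_RUN variable are hypothetical names of my own; the wrapper uses mdadm's long form (--manage with --add), which should be equivalent to the -a shown above, and it only prints the command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Add a partition back into an md array, printing the command first.
# DRY_RUN defaults to 1, so nothing is changed unless you set DRY_RUN=0.
add_member() {
    array=$1
    member=$2
    # Refuse anything that is not a partition of an sdX disk (e.g. /dev/sdc).
    case "$member" in
        /dev/sd[a-z][0-9]) ;;
        *) echo "refusing: $member is not a partition" >&2; return 1 ;;
    esac
    echo "mdadm --manage $array --add $member"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        mdadm --manage "$array" --add "$member"
    fi
}

# Dry run with the names from my setup; prints the command only.
add_member /dev/md0 /dev/sdc3
```

Once the printed command looks right, rerun it with DRY_RUN=0 to actually add the partition.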

The rebuild process could take a few hours (I let mine run overnight). To follow the progress of the rebuild, use this command: cat /proc/mdstat (note: this did not work for me directly from the command prompt after SSHing into the QNAP; I had to run “cd ..” to go back to the root directory before it would work).
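If you only want the percentage rather than the whole status dump, something like this sketch should do it. The function name rebuild_progress is my own, and it assumes the kernel reports the rebuild on a line containing "recovery = NN.N%", which is the usual /proc/mdstat format but is worth checking against your own output.

```shell
#!/bin/sh
# Print the rebuild percentage (e.g. "12.6%") found in an mdstat file.
# Prints nothing if no recovery line is present.
# Assumption: the progress line contains "recovery = <number>%".
rebuild_progress() {
    file=${1:-/proc/mdstat}
    sed -n 's/.*recovery *= *\([0-9.]*%\).*/\1/p' "$file"
}

# Hypothetical usage on the NAS, checking once a minute:
#   while true; do rebuild_progress; sleep 60; done
```

An empty result means either the rebuild has finished or it never started, so check the full cat /proc/mdstat output before assuming the best.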

Once the rebuild is complete, reboot the NAS. Once it’s online again and you can log into the web GUI, you’ll see in the RAID management page that the status has gone from “degraded mode” to “ready” and you will see all 4 drives listed instead of just the 3 healthy ones.
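You can also confirm the result from the shell instead of the web GUI. This is a minimal sketch, assuming the usual /proc/mdstat layout where the line after the array name ends with a member map such as [UUUU], and an underscore (for example [UU_U]) marks a missing member. The function name is_degraded is made up for this example.

```shell
#!/bin/sh
# Exit 0 if the named array shows a missing member in mdstat, non-zero if
# it looks healthy (or the array was not found at all).
# Assumption: the status line after "md0 : ..." contains a map like [UU_U].
is_degraded() {
    array=$1
    file=${2:-/proc/mdstat}
    awk -v name="$array" '
        $1 == name { found = 1; next }
        found && /\[[U_]+\]/ {
            # An underscore in the member map means a slot is empty.
            if ($0 ~ /\[[U_]*_[U_]*\]/) bad = 1
            exit
        }
        END { exit (bad ? 0 : 1) }
    ' "$file"
}

# Hypothetical usage on the NAS:
#   if is_degraded md0; then echo "still degraded"; else echo "looks healthy"; fi
```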

One Response to “What to do when a RAID 5 drive fails in your QNAP NAS”

  1. Just a quick note to say thanks so much for posting this. The information got the array on my QNAP up and running again. QNAPs are great, but some of the things that you’d expect to work (like RAID rebuilds after failed disks) just plain don’t. Thankfully the required information is out there (or here in this case)!

    Graeme
