
Failed Raid5 array with 5 drives - 2 drives removed

1 vote
0 answers
49 views
# Synopsis

A healthy RAID 5 array had one drive removed, quickly reinserted, and it began rebuilding. Then a second drive was removed within 10 minutes. The original drive assignments (sda, sdb, etc.) have since changed due to further user errors (rebooting/swapping drives). I need advice on the next steps.

# Backstory

I am sorry this is so long, but here is the backstory if it helps.

My name is Mike. I am not a daily user of Linux, but I can usually work my way through what I need to get done by doing quick searches to remind me of syntax and by reading man pages. I thought I could figure this out with time (it's been months), but I now realize this is something I am not comfortable doing without help, since the data is invaluable to my friend's family. He has no other backups of the data: his backup drive also failed and he did not realize it, he just assumed it was working.

To start, this is a QNAP appliance with a five-drive RAID 5 array using 8 TB drives. He logged in and noticed that a drive was marked unhealthy due to bad blocks, but it was still a member and the array was still working fine, so he wanted to replace it with a new drive before it got worse. Unfortunately, he pulled out the wrong drive. He quickly realized the mistake and put it back in, and the array started rebuilding onto that drive (I saw that in the QNAP logs). Without knowing any better, less than 10 minutes later he pulled the drive he had actually meant to replace and put in a new one. He noticed the array was offline and his data was inaccessible, so he put the original drive back in and rebooted the QNAP, hoping that would fix it. Obviously, it didn't.

He then called, and I said we should not do anything until we back up the data on all of the original drives. He happened to have a few 12/18 TB external drives, so I used dd to clone the md member partitions (/dev/sdX3, not the whole /dev/sdX) to image files. Exact commands I used, with a note as to which external drive each image is on:

    dd if=/dev/sda3 of=/share/external/DEV3302_1/2024022_170502-sda3.img        (DST: 18TB-1)
    dd if=/dev/sdf3 of=/share/DiskImages/2024022_164848-sdf3.img                (DST: 18TB-2)
    dd if=/dev/sdb3 of=/share/external/DEV3302_1/2024022_170502-sdb3.img        (DST: 18TB-1)
    dd if=/dev/sdg3 of=/share/external/DEV3305_1/2024022_170502-sdg3.img        (DST: 12TB)
    dd if=/dev/sdd3 of=/share/DiskImages/2024022_170502-sdd3_Spare.img          (DST: 18TB-2)

These were just quick backups, and given the age of the drives (5+ years) we figured we would also replace all of the NAS drives with new ones. I then repeated the process for each drive, one by one, but this time cloning whole disks: insert a new drive in an empty slot (it got assigned sdh), then

    dd if=/dev/sda of=/dev/sdh

wait about 14 hours for it to complete, remove the drive, replace it with another new drive, and repeat:

    dd if=/dev/sdb of=/dev/sdh

and so on. So we should have exact copies of the drives.

I assumed (I think incorrectly) that we could power off the QNAP, swap the old drives out for the copies, and then start trying commands like:

    mdadm -CfR /dev/md1 --assume-clean -l 5 -n 5 -c 512 -e 1.0 /dev/sda3 /dev/sdb3 /dev/sdg3 missing /dev/sdd3

(I am not certain that command is correct even before the next paragraph.)

Unfortunately, after swapping the drives we now have two missing drives instead of one, and the assignments seem to have changed (e.g. sda is no longer sda). I figured I must have messed up a dd copy of one of the drives, so we were going to start the process over for the missing drive.
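Before redoing any copies, I was considering first checking whether the existing images actually match their source partitions. This is only a sketch of what I had in mind, assuming sha256sum (or at least cmp) is available in the QNAP shell, and assuming the disk currently at /dev/sdb3 is still the one that image was taken from (the Device UUIDs from mdadm --examine could confirm which disk is which):

    # Checksum the source partition and its image file; matching digests
    # mean the clone is byte-identical to the member partition.
    # Both commands only read, but they will take hours on 8 TB members.
    sha256sum /dev/sdb3 /share/external/DEV3302_1/2024022_170502-sdb3.img

    # Fallback if sha256sum is not installed: byte-for-byte comparison,
    # which exits 0 (and prints "identical") only if the two match.
    cmp /dev/sdb3 /share/external/DEV3302_1/2024022_170502-sdb3.img && echo identical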
I tracked which ones were showing as present or missing, and we reinserted the original disks. However, they now have different assignments yet again, although it is back to showing only a single missing drive. I am lost. I might be able to figure out the original order by comparing the drive UUIDs, but I do not want to touch anything before asking for advice.

# Technical description

Here is the output of the recommended commands that are supported on the QNAP.

    [QNAPUser@QNAP ~]$ uname -a
    Linux QNAP 5.10.60-qnap #1 SMP Mon Feb 19 12:14:12 CST 2024 x86_64 GNU/Linux

    [QNAPUser@QNAP ~]$ mdadm --version
    mdadm - v3.3.4 - 3rd August 2015

    [QNAPUser@QNAP ~]$ smartctl --xall /dev/sda
    -sh: smartctl: command not found

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdb
    /dev/sdb:
        MBR Magic : aa55
        Partition : 4294967295 sectors at 1 (type ee)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdc
    /dev/sdc:
        MBR Magic : aa55
        Partition : 4294967295 sectors at 1 (type ee)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdd
    /dev/sdd:
        MBR Magic : aa55
        Partition : 4294967295 sectors at 1 (type ee)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sde
    /dev/sde:
        MBR Magic : aa55
        Partition : 4294967295 sectors at 1 (type ee)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdf
    /dev/sdf:
        MBR Magic : aa55
        Partition : 4294967295 sectors at 1 (type ee)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdg
    /dev/sdg:
        MBR Magic : aa55
        Partition : 4294967295 sectors at 1 (type ee)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdh

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdb3
    /dev/sdb3:
        Magic : a92b4efc
        Version : 1.0
        Feature Map : 0x0
        Array UUID : 29f7c4cf:b6273e81:34f3f156:1cd1cfe2
        Name : 1
        Creation Time : Thu Aug 17 13:28:50 2017
        Raid Level : raid5
        Raid Devices : 5
        Avail Dev Size : 15608143240 (7442.54 GiB 7991.37 GB)
        Array Size : 31216285696 (29770.17 GiB 31965.48 GB)
        Used Dev Size : 15608142848 (7442.54 GiB 7991.37 GB)
        Super Offset : 15608143504 sectors
        Unused Space : before=0 sectors, after=648 sectors
        State : clean
        Device UUID : f49eadd1:661a76d3:6ed998ad:3a39f4a9
        Update Time : Thu Feb 29 17:05:02 2024
        Bad Block Log : 512 entries available at offset -8 sectors
        Checksum : d61a661f - correct
        Events : 89359
        Layout : left-symmetric
        Chunk Size : 512K
        Device Role : Active device 0
        Array State : AAAA. ('A' == active, '.' == missing, 'R' == replacing)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdc3
    /dev/sdc3:
        Magic : a92b4efc
        Version : 1.0
        Feature Map : 0x0
        Array UUID : 29f7c4cf:b6273e81:34f3f156:1cd1cfe2
        Name : 1
        Creation Time : Thu Aug 17 13:28:50 2017
        Raid Level : raid5
        Raid Devices : 5
        Avail Dev Size : 15608143240 (7442.54 GiB 7991.37 GB)
        Array Size : 31216285696 (29770.17 GiB 31965.48 GB)
        Used Dev Size : 15608142848 (7442.54 GiB 7991.37 GB)
        Super Offset : 15608143504 sectors
        Unused Space : before=0 sectors, after=648 sectors
        State : clean
        Device UUID : b50fdcc1:3024551b:e56c1e38:8f9bc7f8
        Update Time : Thu Feb 29 17:05:02 2024
        Bad Block Log : 512 entries available at offset -8 sectors
        Checksum : e780d676 - correct
        Events : 89359
        Layout : left-symmetric
        Chunk Size : 512K
        Device Role : Active device 1
        Array State : AAAA. ('A' == active, '.' == missing, 'R' == replacing)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sde3
    /dev/sde3:
        Magic : a92b4efc
        Version : 1.0
        Feature Map : 0x0
        Array UUID : 29f7c4cf:b6273e81:34f3f156:1cd1cfe2
        Name : 1
        Creation Time : Thu Aug 17 13:28:50 2017
        Raid Level : raid5
        Raid Devices : 5
        Avail Dev Size : 15608143240 (7442.54 GiB 7991.37 GB)
        Array Size : 31216285696 (29770.17 GiB 31965.48 GB)
        Used Dev Size : 15608142848 (7442.54 GiB 7991.37 GB)
        Super Offset : 15608143504 sectors
        Unused Space : before=0 sectors, after=648 sectors
        State : clean
        Device UUID : ae2c3578:723041ba:f06efdb1:7df6cbb2
        Update Time : Thu Feb 29 17:05:02 2024
        Bad Block Log : 512 entries available at offset -8 sectors
        Checksum : 70a95caf - correct
        Events : 89359
        Layout : left-symmetric
        Chunk Size : 512K
        Device Role : spare
        Array State : AAAA. ('A' == active, '.' == missing, 'R' == replacing)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdg3
    /dev/sdg3:
        Magic : a92b4efc
        Version : 1.0
        Feature Map : 0x0
        Array UUID : 29f7c4cf:b6273e81:34f3f156:1cd1cfe2
        Name : 1
        Creation Time : Thu Aug 17 13:28:50 2017
        Raid Level : raid5
        Raid Devices : 5
        Avail Dev Size : 15608143240 (7442.54 GiB 7991.37 GB)
        Array Size : 31216285696 (29770.17 GiB 31965.48 GB)
        Used Dev Size : 15608142848 (7442.54 GiB 7991.37 GB)
        Super Offset : 15608143504 sectors
        Unused Space : before=0 sectors, after=648 sectors
        State : clean
        Device UUID : cf03e7e1:2ad22385:41793b2c:4f93666c
        Update Time : Thu Feb 29 16:38:38 2024
        Bad Block Log : 512 entries available at offset -8 sectors
        Checksum : da1a5378 - correct
        Events : 80401
        Layout : left-symmetric
        Chunk Size : 512K
        Device Role : Active device 3
        Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

    [QNAPUser@QNAP ~]$ sudo mdadm --examine /dev/sdh3
    /dev/sdh3:
        Magic : a92b4efc
        Version : 1.0
        Feature Map : 0x0
        Array UUID : 29f7c4cf:b6273e81:34f3f156:1cd1cfe2
        Name : 1
        Creation Time : Thu Aug 17 13:28:50 2017
        Raid Level : raid5
        Raid Devices : 5
        Avail Dev Size : 15608143240 (7442.54 GiB 7991.37 GB)
        Array Size : 31216285696 (29770.17 GiB 31965.48 GB)
        Used Dev Size : 15608142848 (7442.54 GiB 7991.37 GB)
        Super Offset : 15608143504 sectors
        Unused Space : before=0 sectors, after=648 sectors
        State : clean
        Device UUID : a06d8a8d:965b58fe:360c43cd:e252a328
        Update Time : Thu Feb 29 17:05:02 2024
        Bad Block Log : 512 entries available at offset -8 sectors
        Checksum : 5b32c26d - correct
        Events : 89359
        Layout : left-symmetric
        Chunk Size : 512K
        Device Role : Active device 2
        Array State : AAAA. ('A' == active, '.' == missing, 'R' == replacing)
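Since the sdX letters keep changing between reboots, I was thinking the only reliable way to track each physical disk is by the Device UUID in its md superblock (shown in the --examine output above) rather than by its current letter. A small read-only sketch of what I had in mind; it only runs mdadm --examine in a loop, so it should not change anything, assuming grep -E is available in the QNAP shell:

    # Print the identifying superblock fields for every candidate member
    # partition, so each physical disk can be matched to its array role
    # regardless of which sdX letter it currently has.
    for p in /dev/sd[a-h]3; do
        echo "== $p =="
        sudo mdadm --examine "$p" 2>/dev/null \
            | grep -E 'Array UUID|Device UUID|Device Role|Events|Update Time|Array State'
    done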
The rest of the output:

    [QNAPUser@QNAP ~]$ sudo mdadm --detail /dev/md1    (this is the array that is broken)
    mdadm: cannot open /dev/md1: No such file or directory

    [QNAPUser@QNAP ~]$ git clone git://github.com/pturmel/lsdrv.git lsdrv
    -sh: git: command not found

    [QNAPUser@QNAP ~]$ cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
    md3 : active raid1 sdd3
          17568371520 blocks super 1.0 [1/1] [U]
    md2 : active raid1 sdf3
          7804071616 blocks super 1.0 [1/1] [U]
    md322 : active raid1 sdd5(S) sdf5(S) sde5(S) sdg5(S) sdh5(S) sdb5 sdc5
          6702656 blocks super 1.0 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk
    md256 : active raid1 sdd2(S) sdf2(S) sde2(S) sdg2(S) sdh2(S) sdb2 sdc2
          530112 blocks super 1.0 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk
    md13 : active raid1 sde4 sdg4 sdh4 sdb4 sdc4 sdf4
          458880 blocks super 1.0 [24/6] [_UUUUUU_________________]
          bitmap: 1/1 pages [4KB], 65536KB chunk
    md9 : active raid1 sde1 sdg1 sdh1 sdb1 sdc1 sdf1
          530048 blocks super 1.0 [24/6] [_UUUUUU_________________]
          bitmap: 1/1 pages [4KB], 65536KB chunk
    unused devices: <none>

    [QNAPUser@QNAP ~]$ sudo md_checker
    Welcome to MD superblock checker (v2.0) - have a nice day~
    Scanning system...

    RAID metadata found!
    UUID:           29f7c4cf:b6273e81:34f3f156:1cd1cfe2
    Level:          raid5
    Devices:        5
    Name:           md1
    Chunk Size:     512K
    md Version:     1.0
    Creation Time:  Aug 17 13:28:50 2017
    Status:         OFFLINE
    ===============================================================================================
     Enclosure | Port | Block Dev Name | # | Status |   Last Update Time   | Events | Array State
    ===============================================================================================
     NAS_HOST      8      /dev/sdb3      0   Active    Feb 29 17:05:02 2024   89359     AAAA.
     NAS_HOST      7      /dev/sdc3      1   Active    Feb 29 17:05:02 2024   89359     AAAA.
     NAS_HOST      9      /dev/sdh3      2   Active    Feb 29 17:05:02 2024   89359     AAAA.
     NAS_HOST     10      /dev/sdg3      3   Active    Feb 29 16:38:38 2024   80401     AAAAA
     ----------------------------------  4   Missing   -------------------------------------------
    ===============================================================================================

md_checker is a QNAP command, so you might not be familiar with it, but the output should be useful.

Based on the output above (**specifically the Last Update Time and Events**), I believe that sdg3 was the first drive to be temporarily pulled from the array and was in the process of rebuilding when the second drive was pulled (now showing as "4 Missing"?). I believe the second drive is now assigned to sde, which shows Device Role : spare. I am basing this on the fact that the Events count and Last Update Time of sdb3, sdc3, sdh3 and sde3 are identical.

My goal is to do the recovery using copies of the drives, not the originals, in case something happens to make the issue worse. We do not need the array to be "healthy" or writable, since we just need to make a copy/backup of the data.

What would be the best way to accomplish this? How can I be certain of the command and the order in which to reassemble the array, and what is the least destructive way to assemble it? I would greatly appreciate any advice, since I am starting to confuse myself and possibly making the issue worse.
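To make that last question concrete, this is roughly what I was considering trying first, on the cloned partition images only and strictly read-only. I have not run it; the image paths are the ones from the dd commands above, and I am assuming the QNAP's losetup supports the util-linux options shown and that --readonly/--run are the least destructive mdadm assemble options:

    # Attach each cloned member image as a read-only loop device.
    LOOPS=""
    for img in /share/external/DEV3302_1/2024022_170502-sda3.img \
               /share/external/DEV3302_1/2024022_170502-sdb3.img \
               /share/DiskImages/2024022_164848-sdf3.img \
               /share/external/DEV3305_1/2024022_170502-sdg3.img \
               /share/DiskImages/2024022_170502-sdd3_Spare.img; do
        LOOPS="$LOOPS $(sudo losetup --find --show --read-only "$img")"
    done

    # Attempt a read-only assembly from the loop devices. mdadm matches
    # members by the UUIDs in their superblocks, so the order listed here
    # should not matter, and with the loops read-only nothing should be
    # written back to the images.
    sudo mdadm --assemble --readonly --run /dev/md1 $LOOPS

    cat /proc/mdstat

If something assembles, even degraded, the plan would be to mount it read-only and copy the data off. But I am not sure whether this can work at all given that one member shows as a spare and another has an older event count, whether --force would also be needed, or whether --force is even safe here, which is why I am asking before touching anything.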
Asked by MikeD (11 rep)
Sep 3, 2024, 10:19 AM
Last activity: Sep 3, 2024, 10:21 AM