Data Recovery from RAID 1 - Impossible to mount disk

2 votes
0 answers
107 views
I have an Nvidia DGXA100 station that I use for my research. It has started to shut down abruptly, just a couple of minutes after startup, probably because a watercooling pump is failing (that would be the second time in as many years).

In any case, I'm on a deadline and I have important experimental data on it, maybe 100GB in total that I would like to transfer. I don't have enough time to extract it a few tens or hundreds of MB at a time, since after each shutdown I have to wait ~1h for the station to cool down naturally. So I have pulled the drives to try to extract the data from my laptop.

The disks are supposed to be in RAID 1. I think it's hardware RAID, but I'm not 100% sure. The station's OS is a fork of Ubuntu called DGXOS. After fiddling around with the drives I got stuck trying to extract the data, so I am reaching out here.

No partitions are detected in /dev, only /dev/sda. If I try sudo mount /dev/sda /mnt, I get
mount: /mnt: can't read superblock on /dev/sda.
       dmesg(1) may have more information after failed mount system call.
With fdisk -l or df, the drive is not detected. testdisk does not detect the drive either. lsblk only outputs
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0     0B  0 disk
Looking around this site, some people advised trying sudo mdadm --examine /dev/sda, which yields mdadm: No md superblock detected on /dev/sda. I then looked through the kernel log for sda, which yielded:
sudo dmesg | grep sda
[160532.911871] sd 0:0:0:0: [sda] 30515200 512-byte logical blocks: (15.6 GB/14.6 GiB)
[160532.912828] sd 0:0:0:0: [sda] Write Protect is off
[160532.912838] sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
[160532.913729] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[160532.923731]  sda: sda1
[160532.923883] sd 0:0:0:0: [sda] Attached SCSI removable disk
[160533.329382] FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[162600.133163] sda: detected capacity change from 30515200 to 0
[331437.482830] sd 0:0:0:0: [sda] Unit Not Ready
[331437.482842] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331437.482849] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331437.483450] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[331437.483454] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331437.483457] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331437.484134] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[331437.484138] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331437.484142] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331437.484302] sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
[331437.484305] sd 0:0:0:0: [sda] 0-byte physical blocks
[331437.484940] sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
[331437.485154] sd 0:0:0:0: [sda] Asking for cache data failed
[331437.485161] sd 0:0:0:0: [sda] Assuming drive cache: write through
[331437.485659] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
[331437.485662] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)
[331437.486262] sd 0:0:0:0: [sda] Attached SCSI disk
[331519.863944] sd 0:0:0:0: [sda] Unit Not Ready
[331519.863953] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331519.863962] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331519.864568] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[331519.864574] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331519.864579] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331519.865310] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[331519.865314] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331519.865318] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331519.865495] sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
[331519.865499] sd 0:0:0:0: [sda] 0-byte physical blocks
[331519.866028] sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
[331519.866204] sd 0:0:0:0: [sda] Asking for cache data failed
[331519.866206] sd 0:0:0:0: [sda] Assuming drive cache: write through
[331519.866728] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
[331519.866731] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)
[331519.867278] sd 0:0:0:0: [sda] Attached SCSI disk
[331991.085995] sd 0:0:0:0: [sda] Unit Not Ready
[331991.085999] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331991.086003] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331991.086291] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[331991.086294] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331991.086296] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331991.086661] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[331991.086664] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[331991.086666] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[331991.086755] sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
[331991.086757] sd 0:0:0:0: [sda] 0-byte physical blocks
[331991.087041] sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
[331991.087128] sd 0:0:0:0: [sda] Asking for cache data failed
[331991.087130] sd 0:0:0:0: [sda] Assuming drive cache: write through
[331991.087414] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
[331991.087417] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)
[331991.088014] sd 0:0:0:0: [sda] Attached SCSI disk
                sda: rw=4096, sector=2, nr_sectors = 2 limit=0
[332162.048953] EXT4-fs (sda): unable to read superblock
[332816.960674] sd 0:0:0:0: [sda] Unit Not Ready
[332816.960690] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332816.960700] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332816.961241] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[332816.961255] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332816.961264] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332816.961945] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[332816.961955] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332816.961960] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332823.835766] sd 0:0:0:0: [sda] Unit Not Ready
[332823.835783] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332823.835792] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332823.836349] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[332823.836359] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332823.836365] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332823.836892] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[332823.836896] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332823.836899] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332930.787061] sd 0:0:0:0: [sda] Unit Not Ready
[332930.787077] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332930.787086] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332930.788060] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[332930.788071] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332930.788077] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332930.788932] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[332930.788937] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[332930.788942] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[332930.789163] sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
[332930.789166] sd 0:0:0:0: [sda] 0-byte physical blocks
[332930.789758] sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
[332930.789955] sd 0:0:0:0: [sda] Asking for cache data failed
[332930.789959] sd 0:0:0:0: [sda] Assuming drive cache: write through
[332930.790552] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
[332930.790556] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)
[332930.791424] sd 0:0:0:0: [sda] Attached SCSI disk
                sda: rw=4096, sector=2, nr_sectors = 2 limit=0
[333074.300411] EXT4-fs (sda): unable to read superblock
Most of it didn't seem too helpful, besides maybe Volume was not properly unmounted. Some data may be corrupt. Please run fsck. So I ran sudo fsck /dev/sda, and the answer I got was (roughly translated):
fsck.ext2: invalid argument when trying to open /dev/sda

The superblock couldn't be read, or did not contain a correct ext2/ext3/ext4 filesystem.
If the peripheral is valid and truly contains an ext2/3/4 FS (and not a swapfs, ufs, or other), 
then the superblock is corrupted, and you could try running e2fsck with another superblock:
    e2fsck -b 8193 
 or
    e2fsck -b 32768
At this point I am reluctant to try my luck with e2fsck commands. I hope Nvidia support has a solution for this, but they haven't answered me yet and probably won't over the weekend.

I should add that when I put the drives back in the station, it boots normally, apart from the fact that it shuts down extremely quickly. So I would like to avoid any solution that could break the filesystem, as the data is still "technically" intact. I haven't tried booting the station from a bootable USB key to do the transfer, since I assume the problem would be identical, but maybe I will.

I apologize for the lack of some details. The station was bought and used "as is", and Nvidia puts a lot of proprietary things in it that are extremely prone to breaking if you try to tweak them, so I didn't dig very deep into the station's exact configuration.

Thanks a bunch for your time!
Frost

Edit: User @frostschutz suggested that I look a little more closely at the dmesg output. This is what I got when plugging the drive in, in case it is any help:
[397566.232752] usb 2-2: new SuperSpeed USB device number 5 using xhci_hcd
[397566.246802] usb 2-2: New USB device found, idVendor=152d, idProduct=0578, bcdDevice= 5.08
[397566.246818] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[397566.246823] usb 2-2: Product: USB
[397566.246827] usb 2-2: Manufacturer: jmicron
[397566.246831] usb 2-2: SerialNumber: 0000000000080
[397566.249988] scsi host0: uas
[397566.250953] scsi 0:0:0:0: Direct-Access     USB      3.0              0508 PQ: 0 ANSI: 6
[397566.254890] sd 0:0:0:0: Attached scsi generic sg0 type 0
[397572.213020] sd 0:0:0:0: [sda] Unit Not Ready
[397572.213036] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[397572.213047] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[397572.214115] sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[397572.214127] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[397572.214133] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[397572.215017] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[397572.215027] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current] 
[397572.215033] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81 
[397572.215227] sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
[397572.215231] sd 0:0:0:0: [sda] 0-byte physical blocks
[397572.215768] sd 0:0:0:0: [sda] Test WP failed, assume Write Enabled
[397572.215931] sd 0:0:0:0: [sda] Asking for cache data failed
[397572.215933] sd 0:0:0:0: [sda] Assuming drive cache: write through
[397572.216484] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes not a multiple of physical block size (0 bytes)
[397572.216488] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (0 bytes)
[397572.217257] sd 0:0:0:0: [sda] Attached SCSI disk
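The repeated blocks in the dmesg output above are easier to compare once condensed. Below is a minimal sketch in Python that pulls out the capacity each probe reported and the distinct ASC/ASCQ sense codes. The function name summarize, the two regular expressions, and the SAMPLE string (a few lines copied verbatim from the log above) are illustrative assumptions about the line format shown here, not part of any tool.

```python
import re

# A few lines copied verbatim from the dmesg output above (illustrative sample).
SAMPLE = """\
[160532.911871] sd 0:0:0:0: [sda] 30515200 512-byte logical blocks: (15.6 GB/14.6 GiB)
[162600.133163] sda: detected capacity change from 30515200 to 0
[331437.482842] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[331437.482849] sd 0:0:0:0: [sda] ASC=0x44 >ASCQ=0x81
[331437.484302] sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
"""

# Assumed line formats, based only on the log shown in this post.
BLOCKS = re.compile(
    r"\[\s*(\d+\.\d+)\] sd \S+ \[(\w+)\] (\d+) (\d+)-byte logical blocks"
)
SENSE = re.compile(
    r"\[\s*\d+\.\d+\] sd \S+ \[\w+\] ASC=(0x[0-9a-fA-F]+) >?ASCQ=(0x[0-9a-fA-F]+)"
)


def summarize(log):
    """Return (capacities, sense_codes) extracted from dmesg text.

    capacities  : list of (timestamp, device, size_in_bytes), in log order
    sense_codes : set of (asc, ascq) integer pairs seen anywhere in the log
    """
    capacities = []
    sense_codes = set()
    for line in log.splitlines():
        match = BLOCKS.search(line)
        if match:
            timestamp, device, count, block_size = match.groups()
            capacities.append((float(timestamp), device, int(count) * int(block_size)))
            continue
        match = SENSE.search(line)
        if match:
            sense_codes.add((int(match.group(1), 16), int(match.group(2), 16)))
    return capacities, sense_codes


if __name__ == "__main__":
    capacities, sense_codes = summarize(SAMPLE)
    for timestamp, device, size in capacities:
        print(f"{timestamp:>15.6f}  {device}  {size} bytes")
    for asc, ascq in sorted(sense_codes):
        print(f"sense code ASC=0x{asc:02x} ASCQ=0x{ascq:02x}")
```

On the full log, this would show a single 15.6 GB report at the earliest timestamp (the entry marked "removable disk" with a FAT sda1 partition), followed only by 0-byte reports accompanied by the same ASC=0x44/ASCQ=0x81 hardware error each time the drive was probed.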
Asked by Frost (21 rep)
Sep 14, 2024, 04:44 PM
Last activity: Sep 15, 2024, 08:34 AM