Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
0
votes
0
answers
29
views
How to recover data from unbootable Acer PC drive on Ubuntu?
My ACER PC would not start, so I removed the disk and tried to access it on another PC running Ubuntu 24.04.2, but got this error message:
> Error mounting /dev/sdc4 at /media/ubuntu/ACER: wrong fs type, bad option, bad superblock on /dev/sdc4, missing codepage or helper program, or other error.
Is there a way to access this disk?
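A rough first-aid sketch, assuming the Acer partition turns out to be NTFS (typical for a Windows preinstall; check with blkid first) and that /dev/sdc4 is the right device on your machine:

    # identify the filesystem actually on the partition
    sudo blkid /dev/sdc4

    # if it is NTFS, attempt the basic repairs ntfsfix offers (it fixes some common NTFS inconsistencies)
    sudo ntfsfix /dev/sdc4

    # then try a read-only mount so nothing more is written to a possibly failing disk
    sudo mkdir -p /mnt/acer
    sudo mount -o ro /dev/sdc4 /mnt/acer

    # if mounting still fails, image the partition and recover files from the copy instead
    sudo ddrescue -d /dev/sdc4 acer.img acer.map

ntfsfix comes from the ntfs-3g package and ddrescue from gddrescue; skip the ntfsfix step if the disk itself looks physically unhealthy, since it writes to the device.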
Sonja Levorsen
(1 rep)
Jul 30, 2025, 01:16 PM
• Last activity: Jul 30, 2025, 01:28 PM
1
votes
2
answers
52
views
Does badblock "fix" blocks on an SD card?
I have an old 16 GB SD card that started giving I/O errors. Knowing it had gone bad, I dumped all of its content to an image file to see what I could restore, and the Disk manager alerted me that 16.1 MB were unreadable and replaced with zeros.
To do some post-mortem analysis, I ran `badblocks -n /dev/sdx` (read-only analysis). After ~18 hours, it reported 309 blocks as bad. To do some further testing, I erased just the partition table (it held two partitions, FAT32 + ext4) and ran `badblocks -w /dev/sdx` (read/write analysis). To my surprise, this time it took less than 6 hours and reported no bad blocks. I ran the read-only analysis once again and it, too, very quickly reported no bad sectors.
How is that possible? I was under the impression that badblocks checks for _physical_ block damage, and that in a flash device, if a block is reported as "bad", it's because the flash chip has already run out of spare space to replace damaged blocks. But since removing the partition table seemingly fixed the issues, it looks as if it was some kind of software corruption.
After running badblocks these three times, I created a new partition table and a new partition, and filling it with data from /dev/zero reported no I/O errors whatsoever.
From what I can see, these are the possible explanations:
* badblocks checks for FS-level bad blocks and my filesystem was corrupted, not the SD card's flash itself
* badblocks somehow marked these blocks so that they are now automatically ignored by the hardware controller and/or the FS
* the SD card noticed the bad memory areas while badblocks was finding them and has now replaced them with fresh ones. If so, why didn't it do so when I got I/O errors while reading the files?
Which one is the most likely?
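For reference, a quick sketch of the badblocks modes involved here; /dev/sdX is a placeholder, and the -w run wipes whatever is on the device:

    # read-only scan (the default mode)
    sudo badblocks -sv -b 4096 /dev/sdX

    # non-destructive read-write scan (-n): reads each block, writes test data, restores the original contents
    sudo badblocks -nsv -b 4096 /dev/sdX

    # destructive write-mode scan (-w): overwrites the whole device with test patterns
    sudo badblocks -wsv -b 4096 /dev/sdX

The block size (-b 4096) is only an assumption; badblocks defaults to 1024-byte blocks.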
Mauro F.
(133 rep)
Jul 24, 2025, 10:40 AM
• Last activity: Jul 24, 2025, 11:51 AM
2
votes
2
answers
6045
views
ls: reading directory '.': Input/output error
Xubuntu, NTFS.
When I try to open a directory with files in it:
username@username-hp:/media/username/dir1/dir2/music$ la
ls: reading directory '.': Input/output error
Also, if I open the directory with Thunar, it appears empty, but if I open it from the console it works:
username@username-hp:/media/username/dir1/dir2/music$ cd KORN
username@username-hp:/media/username/dir1/dir2/music/KORN$ ls -alhis
total 92K
12484 4,0K drwxrwxrwx 1 username username 4,0K jan 29 2013 .
7386 88K drwxrwxrwx 1 username username 88K jan 7 17:53 ..
12485 0 drwxrwxrwx 1 username username 0 jan 29 2013 1994 - Korn
I have access to almost all files in the directory. However, torrent and media player programs have trouble with several files in the 'music' directory, and I can't access these files from the console:
username@username-hp:/media/username/dir1/dir2/music$ cd InternalDirectory
bash: cd: InternalDirectory: No such file or directory
Aaaaa, help!
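A hedged triage sketch for this kind of NTFS I/O error; the device name is a placeholder, and serious NTFS corruption is usually best repaired with chkdsk from Windows:

    # see what the kernel reports at the moment the I/O error happens
    dmesg | tail -n 30

    # unmount the volume, then try the basic repair that ships with ntfs-3g
    sudo umount /media/username/dir1    # adjust to the actual mount point
    sudo ntfsfix /dev/sdXN              # placeholder device; ntfsfix only handles some common inconsistencies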
MEAT ROLL
(21 rep)
May 21, 2018, 07:25 PM
• Last activity: Jun 9, 2025, 05:04 PM
2
votes
1
answers
1897
views
How to check integrity to HDD with UDF format
I'm having a problem with a USB external drive, which seems to be formatted with UDF and was being used with Mac and Windows (or at least that is what I was told).
When I attached the HDD to my Linux system, these were the `dmesg` entries:
[21784.312960] usb 2-1.2: new high-speed USB device number 5 using ehci-pci
**[21784.406283] usb 2-1.2: New USB device found, idVendor=1058, idProduct=1023**
[21784.406291] usb 2-1.2: New USB device strings: Mfr=1,Product=2,SerialNumber=3
[21784.406296] usb 2-1.2: Product: Elements 1023
**[21784.406299] usb 2-1.2: Manufacturer: Western Digital**
[21784.406303] usb 2-1.2: SerialNumber:
[21784.406815] scsi8 : usb-storage 2-1.2:1.0
[21785.403470] scsi 8:0:0:0: Direct-Access WD Elements 1023 2005 PQ: 0 ANSI: 4
[21785.404686] sd 8:0:0:0: Attached scsi generic sg2 type 0
[21785.409491] sd 8:0:0:0: [sdb] 1953519616 512-byte logical blocks: (1.00 TB/931 GiB)
[21785.410605] sd 8:0:0:0: [sdb] Test WP failed, assume Write Enabled
[21785.411723] sd 8:0:0:0: [sdb] Asking for cache data failed
[21785.411729] sd 8:0:0:0: [sdb] Assuming drive cache: write through
[21785.413600] sd 8:0:0:0: [sdb] Test WP failed, assume Write Enabled
[21785.414603] sd 8:0:0:0: [sdb] Asking for cache data failed
[21785.414609] sd 8:0:0:0: [sdb] Assuming drive cache: write through
**[21785.449997] sdb: sdb1 **
[21785.452466] sd 8:0:0:0: [sdb] Test WP failed, assume Write Enabled
[21785.453503] sd 8:0:0:0: [sdb] Asking for cache data failed
[21785.453515] sd 8:0:0:0: [sdb] Assuming drive cache: write through
[21785.453524] sd 8:0:0:0: [sdb] Attached SCSI disk
With that information the HDD seems to be OK. However, I'm unable to identify the type of partition (some commands from the Windows console show it is UDF, but I can't confirm that with the Linux counterparts).
Trying to get more information, `fdisk -l` outputs:
Disk /dev/sdb: 1000.2 GB, 1000202043392 bytes
255 heads, 63 sectors/track, 121600 cylinders, total 1953519616 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0bba88f0
Device Boot Start End Blocks Id System
/dev/sdb1 0 1953519615 976759808 5 Extended
fdisk: unable to read /dev/sdb1: Inappropriate ioctl for device
As the partition doesn't mount, tools like `testdisk` and `fsck` can't work here (well, `testdisk` and `photorec` just freeze trying to read the HDD). And due to the size of the disk, the `badblocks` command takes a lot of time (and is still running).
I can't find much material about the UDF format (and I don't understand why an HDD would use it), nor about what to do when no partition can be read at all.
Any suggestions?
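A small identification pass I would try before anything invasive, assuming nothing beyond what dmesg showed; UDF is sometimes written to the whole device rather than to a partition, so both are worth probing read-only:

    # ask the signature-based detectors what they see
    sudo blkid /dev/sdb /dev/sdb1
    sudo file -s /dev/sdb /dev/sdb1

    # try read-only mounts of both the whole disk and the partition
    sudo mount -t udf -o ro /dev/sdb  /mnt
    sudo mount -t udf -o ro /dev/sdb1 /mnt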
RFuentess
(21 rep)
Jan 17, 2014, 06:03 AM
• Last activity: May 4, 2025, 11:05 AM
6
votes
1
answers
1442
views
Interpreting the output of badblocks: when is it time to replace the MicroSD card?
## Context
I am running a Raspberry Pi Zero with a Micro SD as it is designed.
However, for this specific application,
I cannot use a read-only system as I usually do with a Raspberry Pi.
## Objective
Keep an eye on the health state of the Micro SD (and eventually get a notification somehow that it is time to replace the Micro SD).
## Questions
1. What is the best way to monitor the health state of the Micro SD (no `fsck`, as I cannot unmount the Micro SD while the system is running)?
2. Is the following proposed approach the best solution in terms of practicality and effectiveness?
## Proposed approach
I thought of using the output of `badblocks` to keep an eye on the state of the Micro SD, and eventually replacing it when it is time.
But when is it time to replace the Micro SD?
How many bad blocks are too many?
Should I look at write or read errors, or both?
I made the following consideration: Micro SD cards automagically reallocate data outside of bad blocks. So if the Micro SD is 80% full, there are 20% of blocks that can "go bad" and still keep the Micro SD running. Adding a little bit of a confidence interval, can we say that if `badblocks` outputs a number of bad blocks that is below, let's say, 50% of the free-space blocks, it is still safe to use the SD card?
To clarify:
- Total number of blocks: 100
- Free space (in blocks): 20
- Maximum number of blocks that are acceptable to be corrupted (or whatever): 10
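As a concrete version of the proposed approach, a minimal monitoring sketch: a read-only badblocks pass whose result count is logged over time. The device name, log paths and the idea of comparing counts are assumptions taken from the question, not an established practice:

    #!/bin/sh
    # read-only scan; badblocks only refuses write-mode tests on mounted devices
    DEV=/dev/mmcblk0
    LOG=/var/log/badblocks-$(date +%F).txt

    badblocks -sv -o "$LOG" "$DEV"
    COUNT=$(wc -l < "$LOG")
    echo "$(date): $COUNT bad blocks on $DEV" >> /var/log/sd-health.log

Comparing successive counts (rather than a single absolute threshold) is probably the more telling sign that the card is degrading.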
giovi321
(919 rep)
Apr 6, 2024, 12:35 PM
• Last activity: Apr 8, 2024, 03:48 PM
7
votes
3
answers
16069
views
What is the "-rescue" function of Clonezilla and when should I use it?
When running Clonezilla on an NTFS drive with bad sectors, the cloning is interrupted and I get a suggestion from Clonezilla to use the `-rescue` option to save as much as possible from the damaged drive.
What does the `-rescue` option do? How do I use it? When should I use it?
PetaspeedBeaver
(1398 rep)
Oct 2, 2015, 06:56 PM
• Last activity: Mar 25, 2024, 04:16 PM
24
votes
5
answers
25365
views
Linux - Repairing bad blocks on a RAID1 array with GPT
The tl;dr: how would I go about fixing a bad block on 1 disk in a RAID1 array?
But please read this whole thing for what I've tried already and possible errors in my methods. I've tried to be as detailed as possible, and I'm really hoping for some feedback.
This is my situation: I have two 2 TB disks (same model) set up in a RAID1 array managed by `mdadm`. About 6 months ago I noticed the first bad block when SMART reported it. Today I noticed more, and am now trying to fix them.
This HOWTO page seems to be the one article everyone links to for fixing bad blocks that SMART is reporting. It's a great page, full of info; however, it is fairly outdated and doesn't address my particular setup. Here is how my config is different:
* Instead of one disk, I'm using two disks in a RAID1 array. One disk is reporting errors while the other is fine. The HOWTO is written with only one disk in mind, which brings up various questions such as 'do I use this command on the disk device or the RAID device?'
* I'm using GPT, which fdisk does not support. I've been using gdisk instead, and I'm hoping that it is giving me the same info that I need.
So, let's get down to it. This is what I have done, however it doesn't seem to be working. Please feel free to double-check my calculations and method for errors. The disk reporting errors is /dev/sda:
# smartctl -l selftest /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.4.4-2-ARCH] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 12169 3212761936
With this, we gather that the error resides on LBA 3212761936. Following the HOWTO, I use gdisk to find the start sector to be used later in determining the block number (as I cannot use fdisk since it does not support GPT):
# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.5
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): CFB87C67-1993-4517-8301-76E16BBEA901
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
Number Start (sector) End (sector) Size Code Name
1 2048 3907029134 1.8 TiB FD00 Linux RAID
Using `tune2fs` I find the block size to be 4096. Using this info and the calculation from the HOWTO, I conclude that the block in question is ((3212761936 - 2048) * 512) / 4096 = 401594986.
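The same arithmetic as a tiny shell sketch, with the values copied from above (substitute your own LBA, partition start and filesystem block size):

    LBA=3212761936   # LBA_of_first_error from smartctl
    START=2048       # partition start sector from gdisk
    BS=4096          # filesystem block size from tune2fs -l
    echo $(( (LBA - START) * 512 / BS ))   # prints 401594986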
The HOWTO then directs me to `debugfs` to see if the block is in use (I use the RAID device as it needs an ext filesystem; this was one of the commands that confused me, as I did not at first know whether I should use /dev/sda or /dev/md0):
# debugfs
debugfs 1.42.4 (12-June-2012)
debugfs: open /dev/md0
debugfs: testb 401594986
Block 401594986 not in use
So block 401594986 is empty space, I should be able to write over it without problems. Before writing to it, though, I try to make sure that it, indeed, cannot be read:
# dd if=/dev/sda1 of=/dev/null bs=4096 count=1 seek=401594986
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000198887 s, 20.6 MB/s
If the block could not be read, I wouldn't expect this to work. However, it does. I repeat using /dev/sda, /dev/sda1, /dev/sdb, /dev/sdb1, /dev/md0, and +-5 to the block number to search around the bad block. It all works. I shrug my shoulders and go ahead and commit the write and sync (I use /dev/md0 because I figured modifying one disk and not the other might cause issues, this way both disks overwrite the bad block):
# dd if=/dev/zero of=/dev/md0 bs=4096 count=1 seek=401594986
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000142366 s, 28.8 MB/s
# sync
I would expect that writing to the bad block would have the disks reassign it to a good one; however, running another SMART test shows differently:
# 1 Short offline Completed: read failure 90% 12170 3212761936
Back to square 1. So basically, how would I fix a bad block on 1 disk in a RAID1 array? I'm sure I've not done something correctly...
Thanks for your time and patience.
----------
EDIT 1:
-------
I've tried to run a long SMART test, with the same LBA returning as bad (the only difference is it reports 30% remaining rather than 90%):
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 30% 12180 3212761936
# 2 Short offline Completed: read failure 90% 12170 3212761936
I've also used badblocks, with the following output. The output is strange and seems to be mis-formatted, but I tried to test the numbers it output as blocks, and debugfs gives an error:
# badblocks -sv /dev/sda
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): 1606380968ne, 3:57:08 elapsed. (0/0/0 errors)
1606380969ne, 3:57:39 elapsed. (1/0/0 errors)
1606380970ne, 3:58:11 elapsed. (2/0/0 errors)
1606380971ne, 3:58:43 elapsed. (3/0/0 errors)
done
Pass completed, 4 bad blocks found. (4/0/0 errors)
# debugfs
debugfs 1.42.4 (12-June-2012)
debugfs: open /dev/md0
debugfs: testb 1606380968
Illegal block number passed to ext2fs_test_block_bitmap #1606380968 for block bitmap for /dev/md0
Block 1606380968 not in use
Not sure where to go from here. `badblocks` definitely found something, but I'm not sure what to do with the information presented...
----------
EDIT 2
------
More commands and info.
I feel like an idiot for forgetting to include this originally. These are the SMART values for /dev/sda. I have 1 Current_Pending_Sector and 0 Offline_Uncorrectable.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 166
2 Throughput_Performance 0x0026 055 055 000 Old_age Always - 18345
3 Spin_Up_Time 0x0023 084 068 025 Pre-fail Always - 5078
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 75
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 12224
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 75
181 Program_Fail_Cnt_Total 0x0022 100 100 000 Old_age Always - 1646911
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 12
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 059 000 Old_age Always - 36 (Min/Max 22/41)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 252 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 30
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 77
# mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu May 5 06:30:21 2011
Raid Level : raid1
Array Size : 1953512383 (1863.01 GiB 2000.40 GB)
Used Dev Size : 1953512383 (1863.01 GiB 2000.40 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Tue Jul 3 22:15:51 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : server:0 (local to host server)
UUID : e7ebaefd:e05c9d6e:3b558391:9b131afb
Events : 67889
Number Major Minor RaidDevice State
2 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
As per one of the answers: it would seem I did switch `seek` and `skip` for `dd`. I was using seek as that's what is used in the HOWTO. Using this command causes `dd` to hang:
# dd if=/dev/sda1 of=/dev/null bs=4096 count=1 skip=401594986
Using blocks around that one (..84, ..85, ..87, ..88) seems to work just fine, and using /dev/sdb1 with block 401594986 reads just fine as well (as expected, as that disk passed SMART testing). Now, the question that I have is: when writing over this area to reassign the blocks, do I use /dev/sda1 or /dev/md0? I don't want to cause any issues with the RAID array by writing directly to one disk and not having the other disk update.
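To summarize the seek/skip distinction that caused the hang above (a generic sketch; the block number and devices are the ones from this question, and TARGET is left as a placeholder because choosing it is exactly the open question):

    # reading FROM a device: skip= selects the offset on the INPUT side
    dd if=/dev/sda1 of=/dev/null bs=4096 count=1 skip=401594986

    # writing TO a device: seek= selects the offset on the OUTPUT side (destructive at that offset)
    dd if=/dev/zero of=TARGET bs=4096 count=1 seek=401594986 oflag=direct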
EDIT 3
------
Writing to the block directly produced filesystem errors. I've chosen an answer that solved the problem quickly:
# 1 Short offline Completed without error 00% 14211 -
# 2 Extended offline Completed: read failure 30% 12244 3212761936
Thanks to everyone who helped. =)
blitzmann
(405 rep)
Jul 3, 2012, 10:24 PM
• Last activity: Feb 8, 2024, 02:00 PM
2
votes
4
answers
11392
views
badblocks utility keeps reporting "invalid last block"
I'm trying to run `badblocks` on a drive with a single partition. The drive contains a FreeBSD file system on it.
I boot up using a Linux live USB drive. The drive is unmounted. The output of `fdisk -l` is:
Device Boot Start End Id System
/dev/sda1 * 63 976773167+ a5 FreeBSD
So I run:
# badblocks -v /dev/sda1
And it says:
badblocks: invalid last block - /dev/sda1
I can't find any useful information about this. Am I using the `badblocks` utility correctly here? Or is this an indication that something is wrong with the drive?
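One hedged thing to try is giving badblocks its geometry explicitly, in case it cannot work out the partition's size on its own; the block size below is an assumption, and the last block is just size / block_size - 1:

    # size of the partition in bytes
    blockdev --getsize64 /dev/sda1

    # run badblocks with an explicit block size and an explicit last block
    badblocks -v -b 4096 /dev/sda1 $(( $(blockdev --getsize64 /dev/sda1) / 4096 - 1 ))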
Siler
(153 rep)
Oct 15, 2014, 06:19 PM
• Last activity: Dec 13, 2023, 06:31 AM
7
votes
2
answers
4848
views
Print list of bad blocks in NAND flash from user space
Is there any user space tool that can retrieve and dump the list of bad blocks in a NAND flash device? I've checked the `mtdinfo` command line utility, and also searched /proc and /sys, but couldn't find anything.
I am looking for something suitable for use from a shell script.
I could parse `dmesg`, as the kernel prints bad block information on init, but I am hoping there will be a better way.
Grodriguez
(1006 rep)
Feb 24, 2015, 03:44 PM
• Last activity: Nov 24, 2023, 07:20 AM
21
votes
4
answers
46830
views
How can I check for bad blocks on an LVM physical volume?
When you're using ext4, you can check for bad blocks with the command `e2fsck -c /dev/sda1 # or whatever`. This will "blacklist" the blocks by adding them to the bad block inode.
What is the equivalent of this for an LVM2 physical volume? The filesystem on it is ext4, but presumably the bad blocks that are detected will become invalid as the underlying LVM setup moves data around on the physical disk.
In other words, how can I check for bad blocks so that they are not used in LVM?
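A sketch of the two layers that can be scanned, with the caveat the question itself raises (block numbers found at the PV layer don't map cleanly to filesystem blocks once LVM sits in between):

    # scan the physical device (the PV) read-only -- results are raw device blocks, not fs blocks
    sudo badblocks -sv /dev/sda1

    # or scan at the filesystem layer: e2fsck -c runs badblocks against the logical volume
    # and records any hits in that filesystem's bad block inode (unmount the LV first)
    sudo e2fsck -fcc /dev/mapper/vg0-lv0   # -cc = non-destructive read-write test; vg0/lv0 are placeholders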
strugee
(15371 rep)
Sep 23, 2013, 08:19 PM
• Last activity: Oct 13, 2023, 10:02 PM
3
votes
1
answers
1135
views
I/O errors, but after running badblocks everything works again : how is that possible?
**TLDR;**
HDD seemed damaged. Unable to format a partition (`mkfs.ext4` I/O errors), even with a newly created GPT table. A SMART test showed some errors. I was about to throw the disk away. Before that, out of curiosity, I ran a full `badblocks` test. Big surprise: it didn't detect any bad blocks! Went back to GParted, created a GPT table + a few partitions. Everything works fine now! What did `badblocks` do?
**The full story**
I am trying to make sense of what just happened: I was about to throw an HDD away because I was unable to create partitions on it, and SMART showed some errors. Before throwing the disk away I just wanted to play a little with `badblocks`, and... big surprise: `badblocks` seemed to have repaired my disk! I didn't even know that it could do that! So I am happy now, I can indeed use my disk, it works fine, but I am still trying to figure out what just happened!
It's a 4 TB Seagate HDD that I hadn't used in a few years. I plugged it into a SATA ↔ USB adapter (the adapter works fine, I use it with several other HDDs). With GParted I created a new GPT partition table, and then a partition. It was unable to proceed to the end; there was a `mkfs.ext4` I/O error:
(...)
Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: 0/895
mke2fs 1.46.2 (28-Feb-2021)
mkfs.ext4: Input/output error while writing out and closing file system
I tried several times, with different USB adapters, different USB cables, different USB ports. Never worked.
I then did a SMART short test :
# smartctl -t short -C /dev/sde
(...)
# smartctl -a /dev/sde
(...)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed: read failure 90% 528 191105024
(...)
Obviously the HDD seems defective, right? So I was about to throw it away, but did a `badblocks` test first:
# badblocks -wvs -t random -b 4096 /dev/sde
Checking for bad blocks in read-write mode
From block 0 to 976754645
Testing with random pattern: done
Reading and comparing: done
Pass completed, 0 bad blocks found. (0/0/0 errors)
The test lasted about 19 hours (4 TB disk) and didn't show any errors. I was very surprised!
Back in GParted, I created a new GPT table and some partitions; everything went smoothly.
I ended up doing the copy tests I usually do in order to check a disk's performance, and everything seems normal (155 MB/s read/write when copying big files).
I also did another SMART short test; it completed without error this time:
# smartctl -t short -C /dev/sde
(...)
# smartctl -a /dev/sde
(...)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed without error 00% 549 -
# 2 Short captive Completed: read failure 90% 528 191105024
(...)
Can someone make sense of that? It's as if running `badblocks` somehow repaired my HDD. How is that possible? Is `badblocks` even supposed to do that?
Note: more info is available if needed (full SMART output and full GParted results).
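One way to test the "the drive quietly remapped sectors during the write pass" idea is to compare the remap-related counters before and after such a run (attribute names vary a little between vendors, so the grep is only indicative):

    sudo smartctl -A /dev/sde | grep -Ei 'Reallocated_Sector|Reallocated_Event|Current_Pending|Offline_Uncorrectable'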
ChennyStar
(1969 rep)
Sep 13, 2023, 01:15 PM
• Last activity: Sep 13, 2023, 01:22 PM
1
votes
1
answers
1182
views
SSD: `badblocks` / `e2fsck -c` vs reallocated/remapped sectors
The `badblocks` utility allows one to find bad blocks on a device, and `e2fsck -c` allows one to add such bad blocks to the bad block inode so that they will not be used for actual data. But for SSDs, it is known that bad sectors are normally reallocated (remapped) transparently by the drive (however, only when a write occurs). So, does it make any sense to use `badblocks` / `e2fsck -c` on an SSD?
I suppose that
* `badblocks` alone can make sense to get information on the health of the SSD, e.g. by considering the total number of bad blocks (I don't know whether `smartctl` from smartmontools can do the same thing... perhaps with a long test, `smartctl -t long`, but I haven't seen any clear documentation);
* it should be discouraged to use `e2fsck -c` (which adds bad blocks to the bad block inode), because due to the possible reallocation, the associated numbers (logical addresses?) may become obsolete.
But there isn't any warning about the case of SSD in the man pages of these utilities. So I'm wondering...
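For the health-monitoring half of the question, a sketch of what I would look at on an SSD instead of the bad block inode; the attribute names below are vendor-specific, so the grep is only indicative:

    # drive-reported wear and remap counters
    sudo smartctl -A /dev/sdX | grep -Ei 'Reallocated|Pending|Uncorrect|Wear'

    # let the drive run its own surface scan, then read back the result
    sudo smartctl -t long /dev/sdX
    sudo smartctl -l selftest /dev/sdX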
vinc17
(12504 rep)
Sep 11, 2023, 12:58 PM
• Last activity: Sep 11, 2023, 01:37 PM
0
votes
1
answers
897
views
New HDD: Number of Mechanical Start Failures: 6
I have bought a new HDD. Toshiba P300 2TB; HDWD120UZSVA
Data sheet: https://www.toshiba-storage.com/products/toshiba-internal-hard-drives-p300/
To be sure that this drive is in good health and will remain in good health for a long time, I decided to test it first, before using it.
This is what I did:
0. the HDD is empty
1. sudo smartctl -t long /dev/sdx
2. sudo badblocks -wvs /dev/sdx
3. I have copied data of about 188 GiB to it
4. I have read some data
Now the smartctl output looks like this:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.0-22-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba P300
Device Model: TOSHIBA HDWD120
Serial Number: 22MM8L6AS
LU WWN Device Id: 5 000039 fdbf309cc
Firmware Version: MX4OACF0
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Oct 30 19:04:09 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, NOT FROZEN [SEC1]
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (14439) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 241) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate PO-R-- 100 100 016 - 0
2 Throughput_Performance P-S--- 139 139 054 - 70
3 Spin_Up_Time POS--- 128 128 024 - 296 (Average 297)
4 Start_Stop_Count -O--C- 100 100 000 - 18
5 Reallocated_Sector_Ct PO--CK 100 100 005 - 0
7 Seek_Error_Rate PO-R-- 100 100 067 - 0
8 Seek_Time_Performance P-S--- 124 124 020 - 33
9 Power_On_Hours -O--C- 100 100 000 - 49
10 Spin_Retry_Count PO--C- 100 100 060 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 18
192 Power-Off_Retract_Count -O--CK 100 100 000 - 22
193 Load_Cycle_Count -O--C- 100 100 000 - 22
194 Temperature_Celsius -O---- 171 171 000 - 35 (Min/Max 21/42)
196 Reallocated_Event_Count -O--CK 100 100 000 - 0
197 Current_Pending_Sector -O---K 100 100 000 - 0
198 Offline_Uncorrectable ---R-- 100 100 000 - 0
199 UDMA_CRC_Error_Count -O-R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x03 GPL R/O 1 Ext. Comprehensive SMART error log
0x04 GPL R/O 7 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x08 GPL R/O 2 Power Conditions log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x20 GPL R/O 1 Streaming performance log [OBS-8]
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 5 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
Device State: SMART Off-line Data Collection executing in background (4)
Current Temperature: 35 Celsius
Power Cycle Min/Max Temperature: 29/35 Celsius
Lifetime Min/Max Temperature: 21/42 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -40/70 Celsius
Temperature History Size (Index): 128 (20)
Index Estimated Time Temperature Celsius
21 2022-10-30 16:57 36 *****************
22 2022-10-30 16:58 36 *****************
23 2022-10-30 16:59 36 *****************
24 2022-10-30 17:00 37 ******************
... ..( 8 skipped). .. ******************
33 2022-10-30 17:09 37 ******************
34 2022-10-30 17:10 38 *******************
35 2022-10-30 17:11 38 *******************
36 2022-10-30 17:12 37 ******************
37 2022-10-30 17:13 38 *******************
... ..( 2 skipped). .. *******************
40 2022-10-30 17:16 38 *******************
41 2022-10-30 17:17 37 ******************
42 2022-10-30 17:18 37 ******************
43 2022-10-30 17:19 38 *******************
... ..( 5 skipped). .. *******************
49 2022-10-30 17:25 38 *******************
50 2022-10-30 17:26 39 ********************
... ..( 13 skipped). .. ********************
64 2022-10-30 17:40 39 ********************
65 2022-10-30 17:41 38 *******************
66 2022-10-30 17:42 39 ********************
... ..( 2 skipped). .. ********************
69 2022-10-30 17:45 39 ********************
70 2022-10-30 17:46 38 *******************
... ..( 4 skipped). .. *******************
75 2022-10-30 17:51 38 *******************
76 2022-10-30 17:52 39 ********************
77 2022-10-30 17:53 38 *******************
... ..( 17 skipped). .. *******************
95 2022-10-30 18:11 38 *******************
96 2022-10-30 18:12 39 ********************
97 2022-10-30 18:13 39 ********************
98 2022-10-30 18:14 38 *******************
99 2022-10-30 18:15 39 ********************
100 2022-10-30 18:16 38 *******************
101 2022-10-30 18:17 39 ********************
102 2022-10-30 18:18 38 *******************
103 2022-10-30 18:19 39 ********************
104 2022-10-30 18:20 38 *******************
105 2022-10-30 18:21 39 ********************
106 2022-10-30 18:22 39 ********************
107 2022-10-30 18:23 38 *******************
... ..( 7 skipped). .. *******************
115 2022-10-30 18:31 38 *******************
116 2022-10-30 18:32 ? -
117 2022-10-30 18:33 30 ***********
... ..( 2 skipped). .. ***********
120 2022-10-30 18:36 30 ***********
121 2022-10-30 18:37 31 ************
... ..( 3 skipped). .. ************
125 2022-10-30 18:41 31 ************
126 2022-10-30 18:42 32 *************
... ..( 3 skipped). .. *************
2 2022-10-30 18:46 32 *************
3 2022-10-30 18:47 33 **************
... ..( 3 skipped). .. **************
7 2022-10-30 18:51 33 **************
8 2022-10-30 18:52 34 ***************
... ..( 3 skipped). .. ***************
12 2022-10-30 18:56 34 ***************
13 2022-10-30 18:57 35 ****************
... ..( 6 skipped). .. ****************
20 2022-10-30 19:04 35 ****************
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 18 --- Lifetime Power-On Resets
0x01 0x010 4 49 --- Power-on Hours
0x01 0x018 6 16155878806 --- Logical Sectors Written
0x01 0x020 6 122590468 --- Number of Write Commands
0x01 0x028 6 15677407836 --- Logical Sectors Read
0x01 0x030 6 122505183 --- Number of Read Commands
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 49 --- Spindle Motor Power-on Hours
0x03 0x010 4 49 --- Head Flying Hours
0x03 0x018 4 22 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 0 --- Read Recovery Attempts
0x03 0x030 4 6 --- Number of Mechanical Start Failures
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 4 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 35 --- Current Temperature
0x05 0x010 1 37 N-- Average Short Term Temperature
0x05 0x018 1 - N-- Average Long Term Temperature
0x05 0x020 1 42 --- Highest Temperature
0x05 0x028 1 21 --- Lowest Temperature
0x05 0x030 1 40 N-- Highest Average Short Term Temperature
0x05 0x038 1 25 N-- Lowest Average Short Term Temperature
0x05 0x040 1 - N-- Highest Average Long Term Temperature
0x05 0x048 1 - N-- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 60 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 65 --- Number of Hardware Resets
0x06 0x010 4 37 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0009 2 16 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 0 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
I am a bit shocked by this line:
0x03 0x030 4 6 --- Number of Mechanical Start Failures
Is this normal for a new HDD? Or should I send it back?
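If you decide to keep the drive, a minimal way to keep an eye on that counter over time (a sketch; smartctl -x prints the same Device Statistics log shown above):

    sudo smartctl -x /dev/sdX | grep -i 'Mechanical Start Failures'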
Wogehu
(123 rep)
Oct 30, 2022, 06:25 PM
• Last activity: Aug 4, 2023, 07:28 PM
7
votes
3
answers
7825
views
Are there error resilient filesystems for Linux?
Windows has had a "Resilient File System" since Windows 8.
Are there similarly resilient filesystems for Linux? What I expect from such a filesystem is that a bad block won't screw up either files or the journal. I'm no FS geek, so please explain if such error resilience is unfit for a desktop, is CPU or memory intensive, lowers the HDD's lifespan, is already in some FS like ext4, etc.
**Is there something like this available for Linux?**
Camilo Martin
(769 rep)
Dec 28, 2011, 10:25 AM
• Last activity: Jun 12, 2023, 03:18 PM
0
votes
0
answers
1018
views
How to resolve dd error reading?
I have a `dd` read error:
# sudo dd if=/dev/sda1 of=backup.img bs=4M status=progress
16478679040 bytes (16 GB, 15 GiB) copied, 154 s, 107 MB/s
dd: error reading '/dev/sda1': Input/output error
3928+1 records in
3928+1 records out
16478679040 bytes (16 GB, 15 GiB) copied, 156.816 s, 105 MB/s
I thought this probably means the disk has bad blocks. I ran:
# e2fsck -cfpv -C 0 /dev/sda1
/dev/sda1: Updating bad block inode.
700895 inodes used (3.66%, out of 19144704)
2596 non-contiguous files (0.4%)
1226 non-contiguous directories (0.2%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 638434/974/3
9995951 blocks used (13.05%, out of 76571648)
1 bad block
3 large files
515535 regular files
117354 directories
57 character device files
26 block device files
0 fifos
284 links
67853 symbolic links (61332 fast symbolic links)
61 sockets
------------
701170 files
Apparently it did not fix what matters for `dd`. How do I fix what is needed?
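If the goal is simply to get a usable image off a disk with unreadable sectors, a hedged alternative to plain dd (ddrescue is in the gddrescue package on Debian/Ubuntu):

    # keep reading past errors, padding unreadable input blocks with zeros
    sudo dd if=/dev/sda1 of=backup.img bs=4M conv=noerror,sync status=progress

    # or, better suited to failing media: ddrescue retries bad areas and keeps a map of them
    sudo ddrescue -d /dev/sda1 backup.img backup.map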
Dims
(3425 rep)
May 27, 2023, 07:37 AM
1
votes
0
answers
216
views
badblocks stuck in a loop?
I'm using a program called badblocks to scan disks for errors, and I'm finding that it sometimes appears to get stuck in a loop. I'm using it in read/write mode, and by default it tests four patterns. Here's an example:
badblocks -w -s /dev/sdf -b 4096
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: 1.79% done, 23:39:29 elapsed. (0/0/0 errors)
Okay, looks good, except that this is now the third time it's done the final pattern (0x00). When it reaches 100% it just resets the % counter and starts over. That job is not currently running, but I'm now running badblocks on another machine with another disk, and I've noticed that it's repeating the second pattern. Any ideas? In both cases no errors are reported (so far).
jyoung
(131 rep)
Feb 14, 2023, 08:22 PM
• Last activity: Feb 14, 2023, 08:50 PM
4
votes
3
answers
1357
views
Is there any way to prevent a bad drive from disappearing from /dev?
I'm trying to recover a partition on a bad drive with ddrescue. I run:
$ sudo ddrescue -r -1 -v /dev/sdd3 OUT.img dd_rescue_logfile
and it seems to do great for a while, but after about an hour the "current rate" drops to zero because the drive has disappeared from /dev. To bring the drive back, the only thing I can think of is to reboot the system and re-run the ddrescue command to resume from where I left off. This makes it very difficult to run the program, as I can't just leave it and forget it for a few days - I have to constantly monitor it to make sure the disk hasn't disappeared. I have seen this behavior on both Arch Linux and Fedora 22.
I assume that at some point, the kernel has trouble accessing the drive and removes it from /dev. Is there any way to avoid this? To tell the kernel to keep the device there even if it looks like it's broken or non-existent?
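Two sysfs knobs that may help, sketched with placeholder paths (the host number and disk name must match your system, and neither is a guaranteed fix for a dying drive):

    # give the kernel more time before it gives up on a slow, failing drive
    echo 180 | sudo tee /sys/block/sdd/device/timeout

    # if the disk has already been dropped, ask the SCSI layer to rescan instead of rebooting
    echo '- - -' | sudo tee /sys/class/scsi_host/host3/scan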
Tal
(2252 rep)
Jul 25, 2015, 07:20 AM
• Last activity: Dec 29, 2022, 05:40 PM
2
votes
2
answers
846
views
badblocks cannot find any badblocks
I have an external USB WD disk (with physical and logical block size 512) which reports a self-test failure in SMART at a certain LBA. I tried to use badblocks to locate all the failed sectors/blocks, but it always finishes the test with "Pass completed, 0 bad blocks found (0/0/0 errors)", regardless of whether I use the -w option.
Does this mean that badblocks may miss some errors?
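A more targeted check than a whole-disk badblocks pass is to read the exact LBA that the SMART self-test reported (a sketch; the LBA and device are placeholders):

    # read the single suspect sector reported by smartctl -l selftest
    sudo hdparm --read-sector 123456789 /dev/sdX

    # or with dd, using the 512-byte logical sector size this disk reports
    sudo dd if=/dev/sdX of=/dev/null bs=512 skip=123456789 count=1 iflag=direct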
Jin HUANG
(21 rep)
Jun 19, 2018, 06:29 AM
• Last activity: Oct 11, 2022, 01:36 PM
1
votes
0
answers
169
views
ZFS On Linux - recover lacked MBR after physical problem
We had a physical problem with our backup server (we suspect an electrical shock due to a thunderstorm). The system is Debian Linux on one disk, and the storage is on a RAID-1 ZFS (ZFS On Linux) pool of two 4 TB disks. The first symptom we discovered was a frozen system. After multiple erratic boots we could not get beyond the BIOS. So we moved the system disk to another computer, which booted without problems and seemed stable, but when we tried to move the ZFS storage into it we discovered that only one disk was detected as part of a ZFS pool; it could still be loaded/mounted with `zpool` and the data were there (`lsblk -f` simply indicated that the other disk is not partitioned). After several attempts to load the second disk, the first one turned out to be no longer loadable and was also detected as unpartitioned.
Note: the commands and their results are given below.
So we tried to test the health of the two disks with the SMART tool `smartctl`, but nothing wrong was returned; the disks seemed operational. Then we tried to read data with `dd`, with success, because no read errors were returned. We also tried `badblocks`, which likewise indicated that everything was OK. Finally we tried `gpart`, which for the moment has discovered a possible empty Windows NT/W2K partition, but the process has not finished, as the disks are big.
For now the only problem observed is the missing MBR, but we have not found a tool to recover a ZFS MBR. How can we do this?
Additionally, as we have an outdated external clone disk (every month we replace one disk with another, which "resilvers" itself, so we can take the replaced disk off-site), we wondered whether we can copy its MBR to replace the one on the faulty disks. We are not sure whether the MBR is exactly the same on a ZFS pool member disk and its mirror, or whether the differences appear only after the MBR is executed. If it is possible to clone it, how do we do this with `dd`?
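Before touching any boot sectors, it may be worth checking whether the ZFS labels themselves survive; ZFS keeps four copies per member device (two near the start, two at the end), and zdb can print them. A sketch, not a guaranteed recovery path:

    # dump whatever ZFS vdev labels remain on the raw disk
    zdb -l /dev/sda

    # ask ZFS itself to scan the devices and report what it could import
    zpool import -d /dev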
---
# The tests and the results
root@CZ-LIVE:~# lsblk -o NAME,SIZE,FSTYPE
NAME SIZE FSTYPE
...
sda 3,6T
...
=> no ZFS Filesystem detected
root@CZ-LIVE:~# smartctl -t long /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-2-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 54 minutes for test to complete.
Test will complete after Fri Sep 9 17:24:14 2022 UTC
Use smartctl -X to abort test.
root@CZ-LIVE:~# smartctl -l selftest /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-2-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 4660 -
root@CZ-LIVE:~# smartctl -A /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-2-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0003 100 100 006 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 16
4 Start_Stop_Count 0x0002 100 100 020 Old_age Always - 100
5 Reallocated_Sector_Ct 0x0003 100 100 036 Pre-fail Always - 0
9 Power_On_Hours 0x0003 100 100 000 Pre-fail Always - 1
12 Power_Cycle_Count 0x0003 100 100 000 Pre-fail Always - 0
190 Airflow_Temperature_Cel 0x0003 069 069 050 Pre-fail Always - 31 (Min/Max 31/31)
root@CZ-LIVE:~# smartctl -A /dev/sda | \
grep -iE "Power_On_Hours|G-Sense_Error_Rate|Reallocated|Pending|Uncorrectable"
5 Reallocated_Sector_Ct 0x0003 100 100 036 Pre-fail Always - 0
9 Power_On_Hours 0x0003 100 100 000 Pre-fail Always - 1
=> nothing special returned by the disk's internal components
`dd` shows if there are read errors (source):
root@CZ-LIVE:~# dd if=/dev/sda of=/dev/null bs=64k conv=noerror status=progress
4000784842752 octets (4,0 TB, 3,6 TiB) copiés, 104555 s, 38,3 MB/s
61047148+1 enregistrements lus
61047148+1 enregistrements écrits
4000785948160 octets (4,0 TB, 3,6 TiB) copiés, 104556 s, 38,3 MB/s
=> no read errors
root@CZ-LIVE:~# date ; badblocks -svn /dev/sda ; date
ven. 16 sept. 2022 17:00:06 UTC
Checking for bad blocks in non-destructive read-write mode
From block 0 to 3907017526
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done
Pass completed, 0 bad blocks found. (0/0/0 errors)
dim. 18 sept. 2022 01:54:49 UTC
=> no block error
root@CZ-LIVE:~# gpart /dev/sda
Begin scan...
Possible partition(Windows NT/W2K FS), size(0mb), offset(345079mb)
=> an empty partition ... not detected as ZFS
Le Nain Jaune
(167 rep)
Sep 20, 2022, 03:29 PM
• Last activity: Sep 21, 2022, 09:57 AM
7
votes
3
answers
4913
views
Really Remapping Bad Blocks on disk
I have a SATA drive which has a total of 8 bad blocks identified by the `badblocks` program. Supposedly the drive firmware should be able to remap them and substitute spares. I've run `badblocks` in `-n` mode to rewrite the partition in question, and run e2fsck multiple times. Nothing changes, always the same 8 bad blocks.
When I run `smartctl` it shows the Reallocated_Sector_Ct at 0.
How can I get the firmware to actually remap the 8 bad blocks?
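A common approach is to overwrite the exact failing LBAs reported by SMART and then re-check the counters; a sketch with a placeholder LBA, and note that the --write-sector step destroys that sector's contents:

    # confirm the sector really is unreadable
    sudo hdparm --read-sector 123456789 /dev/sdX

    # overwrite it with zeros so the firmware can remap it if the location is truly bad
    sudo hdparm --yes-i-know-what-i-am-doing --write-sector 123456789 /dev/sdX

    # see whether Reallocated_Sector_Ct / Current_Pending_Sector moved
    sudo smartctl -A /dev/sdX | grep -Ei 'Reallocated|Pending'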
kmand
(121 rep)
Jun 4, 2022, 05:44 PM
• Last activity: Jun 4, 2022, 10:34 PM
Showing page 1 of 20 total questions