Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

5 votes

1 answers

9217 views

"Non-medium error" in smartctl output

Can anyone explain what "Non-medium error" means in this output?

I think my hard disks have some problems.

**Disk1**

    root@nshost2:/home/david # smartctl -a -d cciss,0 /dev/ciss0 
    smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-RELEASE-p4 amd64] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Vendor:               HP
    Product:              EH0146FARWD
    Revision:             HPDC
    User Capacity:        146,815,737,856 bytes [146 GB]
    Logical block size:   512 bytes
    Rotation Rate:        15030 rpm
    Form Factor:          2.5 inches
    Logical Unit id:      0x5000cca00b7b4c54
    Serial number:        PLX5U32E
    Device type:          disk
    Transport protocol:   SAS (SPL-3)
    Local Time is:        Mon Aug 15 16:37:17 2016 AMT
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART Health Status: OK
    
    Current Drive Temperature:     28 C
    Drive Trip Temperature:        65 C
    
    Manufactured in week 05 of year 2012
    Specified cycle count over device lifetime:  50000
    Accumulated start-stop cycles:  31
    Elements in grown defect list: 0
    
    Vendor (Seagate) cache information
      Blocks sent to initiator = 18233185821261824
    
    Error counter log:
               Errors Corrected by           Total   Correction     Gigabytes    Total
                   ECC          rereads/    errors   algorithm      processed    uncorrected
               fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
    read:          0   168255         0    168255          0      21908.836           0
    write:         0  5365037         0   5365037          0      46145.893           0
    
    Non-medium error count:      691
    
    SMART Self-test log
    Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
         Description                              number   (hours)
    # 1  Background short  Completed                   -       7                 - [-   -    -]
    # 2  Background short  Completed                   -       3                 - [-   -    -]
    
    Long (extended) Self Test duration: 1394 seconds [23.2 minutes]

**Disk2**

    root@nshost2:/home/david # smartctl -a -d cciss,1 /dev/ciss0
    smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-RELEASE-p4 amd64] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Vendor:               HP
    Product:              EH0146FARWD
    Revision:             HPDC
    User Capacity:        146,815,737,856 bytes [146 GB]
    Logical block size:   512 bytes
    Rotation Rate:        15030 rpm
    Form Factor:          2.5 inches
    Logical Unit id:      0x5000cca00b7b2254
    Serial number:        PLX5R9BE
    Device type:          disk
    Transport protocol:   SAS (SPL-3)
    Local Time is:        Mon Aug 15 16:38:56 2016 AMT
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART Health Status: OK
    
    Current Drive Temperature:     26 C
    Drive Trip Temperature:        65 C
    
    Manufactured in week 05 of year 2012
    Specified cycle count over device lifetime:  50000
    Accumulated start-stop cycles:  31
    Elements in grown defect list: 0
    
    Vendor (Seagate) cache information
      Blocks sent to initiator = 18232624858267648
    
    Error counter log:
               Errors Corrected by           Total   Correction     Gigabytes    Total
                   ECC          rereads/    errors   algorithm      processed    uncorrected
               fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
    read:          0   204138         0    204138          0      21981.509           0
    write:         0  3646624         0   3646624          0      46146.250           0
    
    Non-medium error count:      693
    
    SMART Self-test log
    Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
         Description                              number   (hours)
    # 1  Background short  Completed                   -       7                 - [-   -    -]
    # 2  Background short  Completed                   -       3                 - [-   -    -]
    
    Long (extended) Self Test duration: 1394 seconds [23.2 minutes]
    
**Here is cciss output**

    
    root@nshost2:/home/david # cciss_vol_status -q /dev/ciss0
    /dev/ciss0: (Smart Array P410i) RAID 1(1+0) Volume 0 status: OK. 





                                

David (369 rep)

Aug 15, 2016, 12:45 PM • Last activity: Jul 22, 2025, 07:22 AM

0 votes

0 answers

36 views

Filesystem becomes read-only at random

debian ssd fsck smartctl smart

Debian crashed on my laptop (Acer Aspire 3, about 4 years old, HDD replaced with an ADATA SU650 240GB SSD) and started throwing console errors reading "failed to rotate /var/log/journal: read-only filesystem".

It rebooted fine, but a while later refused to load websites and eventually crashed again. Right now, it's working fine.

After a quick Google search I installed smartctl to figure out the problem. Although it prints an overall "PASSED", some attributes are marked "Pre-fail", and I'm not sure how to interpret the rest of the values.

Here's the output:

    smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-37-amd64] (local build)
    Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Silicon Motion based SSDs
    Device Model:     ADATA SU650
    Serial Number:    2N20292G46UJ
    LU WWN Device Id: 0 000000 000000000
    Firmware Version: XD0R6305
    User Capacity:    240,057,409,536 bytes [240 GB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    Solid State Device
    Form Factor:      2.5 inches
    TRIM Command:     Available, deterministic
    Device is:        In smartctl database 7.3/5319
    ATA Version is:   ACS-3, ATA8-ACS T13/1699-D revision 6
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Sun Jun 29 21:36:52 2025 -03
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)	Offline data collection activity
    					was never started.
    					Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		(    1) seconds.
    Offline data collection
    capabilities: 			 (0x59) SMART execute Offline immediate.
    					No Auto Offline data collection support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					No Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0002)	Does not save SMART data before
    					entering power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   1) minutes.
    Extended self-test routine
    recommended polling time: 	 (   2) minutes.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       929
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1439
    160 Uncorrectable_Error_Cnt 0x0032   100   100   050    Old_age   Always       -       0
    161 Valid_Spare_Block_Cnt   0x0032   100   100   050    Old_age   Always       -       100
    163 Initial_Bad_Block_Count 0x0032   100   100   000    Old_age   Always       -       48
    164 Total_Erase_Count       0x0032   100   100   000    Old_age   Always       -       87382
    165 Max_Erase_Count         0x0032   100   100   000    Old_age   Always       -       156
    166 Min_Erase_Count         0x0032   100   100   000    Old_age   Always       -       44
    167 Average_Erase_Count     0x0032   100   100   000    Old_age   Always       -       109
    148 Total_SLC_Erase_Ct      0x0032   100   100   000    Old_age   Always       -       262148
    149 Max_SLC_Erase_Ct        0x0032   100   100   000    Old_age   Always       -       468
    150 Min_SLC_Erase_Ct        0x0032   100   100   000    Old_age   Always       -       132
    151 Average_SLC_Erase_Ct    0x0032   100   100   000    Old_age   Always       -       329
    159 DRAM_1_Bit_Error_Count  0x0032   100   100   000    Old_age   Always       -       0
    168 Max_Erase_Count_of_Spec 0x0032   100   100   000    Old_age   Always       -       468
    169 Remaining_Lifetime_Perc 0x0032   100   100   000    Old_age   Always       -       98
    177 Wear_Leveling_Count     0x0032   100   100   000    Old_age   Always       -       1823
    181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
    182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       77
    194 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       26
    195 Hardware_ECC_Recovered  0x0032   100   100   000    Old_age   Always       -       403177
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
    232 Available_Reservd_Space 0x0032   100   100   000    Old_age   Always       -       100
    241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       139845
    242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       143114
    245 TLC_Writes_32MiB        0x0032   100   100   000    Old_age   Always       -       296002
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 0
    Note: revision number not 1 implies that no selective self-test has ever been run
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.

I'd greatly appreciate some advice on what these values mean and what can be done about them. My understanding is that "Old_age" means the device is worn and "Pre-fail" means it's about to give out, but I don't really know whether this reflects normal wear or lack of maintenance, or whether it is recoverable.

Thanks in advance!
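
For reading a table like the one above: in smartctl's ATA attribute table, the TYPE column (Pre-fail / Old_age) only classifies the kind of attribute. An actual threshold trip shows up in the WHEN_FAILED column, or as VALUE falling to THRESH. Below is a minimal sketch of a filter that picks out such rows, assuming the column layout shown above. `flag_failed_attrs` is a hypothetical helper name, not part of smartmontools, and it reads `smartctl -A` output on stdin.

```shell
#!/bin/sh
# flag_failed_attrs: hypothetical helper, not part of smartmontools.
# Reads `smartctl -A` (ATA) output on stdin and prints the ID, name and
# reason for every attribute that has tripped its threshold.
flag_failed_attrs() {
    awk '
        # Attribute rows: numeric ID, hex FLAG, at least 10 columns.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && NF >= 10 {
            value = $4 + 0; thresh = $6 + 0
            if ($9 != "-")
                print $1, $2, "WHEN_FAILED=" $9
            else if (thresh > 0 && value <= thresh)
                print $1, $2, "VALUE<=THRESH"
        }
    '
}
```

It could be used as `sudo smartctl -A /dev/sda | flag_failed_attrs`. An empty result only means that no attribute is at or below its threshold; it says nothing about other possible causes of a read-only remount.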
                                

geistofsttraft (1 rep)

Jun 30, 2025, 12:45 AM • Last activity: Jun 30, 2025, 12:46 AM

6 votes

2 answers

485 views

SMART test and suspend or reboot

hard-disk smartctl

What is the behaviour if I start a long test with smartctl (i.e., `sudo smartctl -t long /dev/...`), and then I suspend the machine or shut it down and restart it later? Will the test "suspend" too and "continue on" when the machine is started up again?

What my question mainly aims at: I would like to schedule regular SMART tests (e.g., with cron), but it may still happen that I suspend or shut down the machine while a test is running, and then I don't know whether the test will continue or is "not finished" and has to be restarted. With current 16+ TB HDDs, SMART tests can easily run for more than 20 hours.

The same question holds for external disks: can I detach a USB disk while a SMART test is running, and will the test be continued the next time I connect the device?
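
Whatever a given drive does on power loss, a scheduled job can at least check afterwards whether a test is still running. For ATA drives, `smartctl -c` prints a "Self-test execution status" line with a number in parentheses. In that byte the high 4 bits are the status code (15 means a test is in progress) and the low 4 bits are the remaining work in tenths. The sketch below only decodes that number. `decode_selftest_status` is a hypothetical helper name, and the decoding does not apply to NVMe or SAS devices.

```shell
#!/bin/sh
# decode_selftest_status: hypothetical helper, not part of smartmontools.
# Argument: the decimal number smartctl prints in parentheses after
# "Self-test execution status:" for an ATA drive.
decode_selftest_status() {
    code=$(( $1 / 16 ))               # high 4 bits: status code
    remaining=$(( ($1 % 16) * 10 ))   # low 4 bits: tenths remaining
    if [ "$code" -eq 15 ]; then
        echo "in progress, ${remaining}% remaining"
    elif [ "$code" -eq 0 ]; then
        echo "completed without error, or never run"
    else
        echo "not running (status code $code), ${remaining}% was remaining"
    fi
}
```

For example, `decode_selftest_status 249` reports a test in progress with 90% remaining. Finished and aborted tests are also listed by `smartctl -l selftest /dev/...`.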

D. Kovács (163 rep)

Jun 8, 2025, 05:56 AM • Last activity: Jun 9, 2025, 02:21 PM

0 votes

2 answers

2009 views

fsck is taking a lot of time? (buffer I/O error)

ubuntu boot busybox fsck smartctl

I had 4-5 irregular power cuts in a row within an hour. My Ubuntu system suddenly dropped into BusyBox mode and reported errors on /dev/sda5.

I then tried:

    fsck /dev/sda5 -y

It has been running for more than an hour and is still forcing rewrites. It seems that a lot of blocks are being repaired.

Can someone describe what is going on or suggest any fix?

Atom Store (101 rep)

Dec 28, 2020, 03:00 PM • Last activity: Jun 3, 2025, 05:07 AM

0 votes

0 answers

70 views

Weird failure and "Smartctl open device: /dev/nvme0 failed: Resource temporarily unavailable"

ssd smartctl

In Windows 11, my screen freezes while the mouse can still move.
I used my Debian boot stick to check whether hardware is failing. Memory seems to be OK, but the SSD is giving me some headaches.

*smartctl -a /dev/nvme0*
returns:

    === START OF INFORMATION SECTION ===
    Model Number:                       KINGSTON SNV2S2000G
    Serial Number:                      50026B7381B094D5
    Firmware Version:                   SBK00104
    PCI Vendor/Subsystem ID:            0x2646
    IEEE OUI Identifier:                0x0026b7
    Controller ID:                      1
    NVMe Version:                       1.4
    Number of Namespaces:               1
    Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
    Namespace 1 Formatted LBA Size:     512
    Namespace 1 IEEE EUI-64:            0026b7 381b094d55
    Local Time is:                      Sun Jun  1 14:22:24 2025 UTC
    Firmware Updates (0x12):            1 Slot, no Reset required
    Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
    Optional NVM Commands (0x009f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Verify
    Log Page Attributes (0x12):         Cmd_Eff_Lg Pers_Ev_Lg
    Maximum Data Transfer Size:         64 Pages
    Warning  Comp. Temp. Threshold:     83 Celsius
    Critical Comp. Temp. Threshold:     90 Celsius
    
    Supported Power States
    St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
     0 +     5.00W       -        -    0  0  0  0        0       0
     1 +     3.50W       -        -    1  1  1  1        0     200
     2 +     2.50W       -        -    2  2  2  2        0    1000
     3 -     1.50W       -        -    3  3  3  3     5000    5000
     4 -     1.50W       -        -    4  4  4  4    20000   70000
    
    Supported LBA Sizes (NSID 0x1)
    Id Fmt  Data  Metadt  Rel_Perf
     0 +     512       0         0
    
    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    SMART/Health Information (NVMe Log 0x02)
    Critical Warning:                   0x00
    Temperature:                        40 Celsius
    Available Spare:                    100%
    Available Spare Threshold:          10%
    Percentage Used:                    0%
    Data Units Read:                    7,508,693 [3.84 TB]
    Data Units Written:                 6,431,227 [3.29 TB]
    Host Read Commands:                 76,389,168
    Host Write Commands:                100,940,793
    Controller Busy Time:               5,706
    Power Cycles:                       196
    Power On Hours:                     292
    Unsafe Shutdowns:                   97
    Media and Data Integrity Errors:    0
    Error Information Log Entries:      0
    Warning  Comp. Temperature Time:    0
    Critical Comp. Temperature Time:    0
    
    Error Information (NVMe Log 0x01, 16 of 64 entries)
    No Errors Logged
    
    Self-test Log (NVMe Log 0x06)
    Self-test status: Extended self-test in progress (26% completed)
    Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
     0   Short             Completed without error                 292            -     -   -   -    -
     1   Extended          Completed without error                 292            -     -   -   -    -
     2   Short             Completed without error                 292            -     -   -   -    -


which looks OK. However, after
*sudo smartctl -t long /dev/nvme0*

I receive:

    Smartctl open device: /dev/nvme0 failed: Resource temporarily unavailable

after a while.

Do you think the SSD is failing, or the controller on the board? Or any other idea?
Unfortunately I do not have a spare SSD here to test.

Any hints?
Thanks for helping me!


                                

Timo Bularczyk (1 rep)

Jun 1, 2025, 04:18 PM

1 votes

1 answers

130 views

Wear level and total bytes written in SATA SSD

hardware ssd storage smartctl

On a Samsung SATA SSD (i.e. a non-NVMe disk), running `sudo smartctl -a /dev/sda` gives the following values:

    SMART Attributes Data Structure revision number: 1
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
    177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       18
    241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       10452411061


On the same disk, running `sudo skdump /dev/sda` gives the following output:

    Overall Status: GOOD
    ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
      5 reallocated-sector-count    100   100    10   0 sectors   0x000000000000 prefail online  yes  yes 
    177 wear-leveling-count          99    99     0   18          0x120000000000 prefail online  n/a  n/a 
    241 total-lbas-written           99    99     0   350725.752 TB 0x3d9b036f0200 old-age online  n/a  n/a 


This raises the following questions:

1) skdump reports 350725.752 TB written for total-lbas-written. Is this correct?
2) Based on the answer provided in another post, the smartctl value for Total_LBAs_Written, 10452411061, equates to 4.86 TB written (i.e. 10452411061/2/1024/1024/1024). This differs significantly from the value reported by skdump. Is this value accurate? The sector size is 512 bytes.
3) After looking at various posts on Super User and Stack Exchange, my understanding is that for Samsung SSDs the Wear_Leveling_Count attribute indicates how much wear leveling has occurred. But it is not clear which figure should be considered: the **RAW_VALUE** column or the **VALUE** column? And does a RAW_VALUE of 18 imply that only 18% of the SSD's life is left?
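
The arithmetic behind questions 1 and 2 can be checked with a few lines of shell. This is only a sketch: the helper names are hypothetical, and it assumes that skdump prints the six raw bytes least-significant first, which the numbers seem to support.

```shell
#!/bin/sh
# Hypothetical helper names; a sketch of the arithmetic only.

# LBA count -> bytes, for a given unit size in bytes (default 512).
lbas_to_bytes() {
    echo $(( $1 * ${2:-512} ))
}

# Bytes -> decimal terabytes and binary tebibytes.
bytes_to_tb() {
    awk -v b="$1" 'BEGIN { printf "%.2f TB, %.2f TiB\n", b / 1e12, b / 2^40 }'
}

# Reverse the byte order of a raw hex string such as 0x3d9b036f0200
# (assumption: the raw bytes are listed least-significant first)
# and print the result in decimal.
raw_le_to_dec() {
    rev=$(printf '%s\n' "${1#0x}" | awk '{
        s = ""
        for (i = length($0) - 1; i >= 1; i -= 2) s = s substr($0, i, 2)
        print s
    }')
    echo $(( 0x$rev ))
}

bytes_to_tb "$(lbas_to_bytes 10452411061)"   # smartctl RAW_VALUE, 512-byte sectors
raw_le_to_dec 0x3d9b036f0200                 # skdump raw field
# Same raw count, but treating each unit as 65536 sectors (32 MiB):
bytes_to_tb "$(lbas_to_bytes "$(raw_le_to_dec 0x3d9b036f0200)" $((65536 * 512)))"
```

The first call prints `5.35 TB, 4.87 TiB`, so the 4.86 figure in question 2 is in binary units (TiB). The second prints 10452441917, which is close to the smartctl count, suggesting both tools read the same counter. The third gives about 350725.75 TB, which suggests that skdump's figure comes from interpreting each count as a 32 MiB unit rather than as a 512-byte sector.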
                                

KDM (116 rep)

May 27, 2025, 08:12 AM • Last activity: May 27, 2025, 10:17 AM

8 votes

3 answers

34865 views

Run smartctl on all disks of a server

shell-script logs hard-disk raid smartctl

My question is quite simple: I want to run the command `smartctl -i -A` on all disks that the server has.
I have many servers with different numbers of disks and RAID controllers, so I need to scan all drives for a diagnosis.
I'm thinking of running `smartctl --scan | awk '{print $1}' >> test.log`, so that test.log contains the list of all the drives.
After that I need some `if` or `do` constructs to run smartctl on each of the drives.
I don't know if this is the best way to do it, since I need to identify the RAID controller too.
Am I heading in the right direction?
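
One way to sketch the idea above: each line of `smartctl --scan` output normally carries the device path followed by a `-d TYPE` option, so both can be fed back to smartctl instead of keeping only the first column. `scan_to_commands` is a hypothetical helper name. It only prints the commands, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# scan_to_commands: hypothetical helper name.
# Reads `smartctl --scan` output on stdin (lines such as
# "/dev/sda -d scsi # /dev/sda, SCSI device") and prints one
# `smartctl -i -A` command line per detected device.
scan_to_commands() {
    while read -r dev dflag dtype rest; do
        case $dev in
            ''|'#'*) continue ;;      # skip blank and comment-only lines
        esac
        if [ "$dflag" = "-d" ]; then
            printf 'smartctl -i -A -d %s %s\n' "$dtype" "$dev"
        else
            printf 'smartctl -i -A %s\n' "$dev"
        fi
    done
}
```

`smartctl --scan | scan_to_commands` shows what would be run, and appending `| sh >> test.log` runs it. Disks behind some RAID controllers may not show up in the scan and may still need an explicit `-d` option.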

##Edit:
I usually use these commands to troubleshoot:

###Without RAID Controller
    
    for i in {c..d}; do
        echo "Disk sd$i" $SN $MD
        smartctl -i -A /dev/sd$i | grep -E '^  5|^197|^198|FAILING_NOW|SERIAL'
    done

###PERC Controller
    
    for i in {0..12}; do
        echo "$i" $SN $MD
        smartctl -i -A -T permissive /dev/sda -d megaraid,$i | grep -E '^  5|^197|^198|FAILING_NOW|SERIAL'
    done
    /usr/sbin/megastatus --physical
    /usr/sbin/megastatus --logical

###3ware Controller
    
    for i in {0..10}; do
        echo "Disk $i" $SN $MD
        smartctl -i -A /dev/twa0 -d 3ware,$i | grep -E '^  5|^197|^198|FAILING_NOW|SERIAL'
    done
	
###SmartArray & Megaraid Controller:

    smartctl -a -d cciss,0 /dev/cciss/c0d0
    /opt/3ware/9500/tw_cli show
    cd /tmp

###DD (Rewrite disk block (DESTROY DATA)):

    dd if=/dev/zero of=/dev/HD* bs=4M
    # HD*: sda, sdb…

###Burning (Stress test (DESTROY DATA)):
    
	/opt/systems/bin/vs-burnin --destructive --time= /tmp/burninlog.txt

###Dmesg&kernerrors:

    tail /var/log/kernerrors
    dmesg | grep -i -E 'ata|fault|error'

So what I'm trying to do is automate these commands.
I want the script to check all disks that the host has and run the appropriate smartctl command for each case: something like a menu with options that lets me choose whether to run smartctl or a destructive command. If I choose smartctl, the script scans all disks and runs the command according to the host configuration (with or without a RAID controller); if I choose a destructive command, the script asks me for the number of the disk to run it on.


----------


##Edit 2:
I resolved my problem with the following script:

    #!/bin/bash
    # Troubleshoot.sh
    # A more elaborate version of Troubleshoot.sh.
    
    SUCCESS=0
    E_DB=99    # Error code for missing entry.
    
    declare -A address
    #       -A option declares associative array.
    
    
    
    if [ -f Troubleshoot.log ]
    then
    	rm Troubleshoot.log
    fi
    
    if [ -f HDs.log ]
    then
    	rm HDs.log
    fi
    
    smartctl --scan | awk '{print $1}' >> HDs.log
    lspci | grep -i raid >> HDs.log
    
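    getArray ()
    {
        i=0
        while read -r line # Read a line
        do
            array[i]=$line # Put it into the array
            i=$((i + 1))
        done < "$1"
    }

    # Assumed reconstruction: the lines between "done <" and ">> Troubleshoot.log" are missing from this copy.
    getArray HDs.log

    for e in "${array[@]}"
    do
        if [[ $e == /dev/* ]] # Assumed filter: skip the lspci lines, which are not devices
        then
            smartctl -i -A "$e" >> Troubleshoot.log # Run smartctl on every disk the host has
        fi
    done
    exit $?   # Exit with the status of the last command.

I don't know if this solution is the right or the best one, but it works for me!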

Appreciate all help!!
                                

ZeroNegative (103 rep)

Mar 27, 2014, 12:11 PM • Last activity: May 22, 2025, 05:45 PM

5 votes

1 answers

3346 views

How should I interpret this smartctl readout

smartctl

I'm running a Debian Jessie machine with an external 3TB hard drive. The drive has gone offline twice in the last few days; running `ls` on it displays nothing, and the same happens in Dolphin.

Running `smartctl -a /dev/sdc` results in the following. Note that this drive is only a few months old, but it holds my fairly large video collection, managed and viewed using Plex Media Server.


    smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
    Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Seagate Barracuda 7200.14 (AF)
    Device Model:     ST3000DM001-1E6166
    Serial Number:    Z1F4HXVG
    LU WWN Device Id: 5 000c50 065ca347a
    Firmware Version: SC48
    User Capacity:    3,000,592,982,016 bytes [3.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    7200 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS T13/1699-D revision 4
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Wed Dec 10 21:04:09 2014 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART Status command failed: scsi error medium or hardware error (serious)
    SMART overall-health self-assessment test result: PASSED
    Warning: This result is based on an Attribute check.
    See vendor-specific Attribute list for marginal Attributes.
    
    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever 
                                            been run.
    Total time to complete Offline 
    data collection:                (  592) seconds.
    Offline data collection
    capabilities:                    (0x73) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            No Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine 
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        ( 361) minutes.
    Conveyance self-test routine
    recommended polling time:        (   2) minutes.
    SCT capabilities:              (0x3081) SCT Status supported.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       55323488
      3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       55
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   058   052   030    Pre-fail  Always       -       150347021957
      9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       5157
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       48
    183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   056   037   045    Old_age   Always   In_the_past 44 (Min/Max 41/52 #7782)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5
    193 Load_Cycle_Count        0x0032   096   096   000    Old_age   Always       -       8971
    194 Temperature_Celsius     0x0022   044   063   000    Old_age   Always       -       44 (0 16 0 0 0)
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       3274h+33m+50.476s
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       6233106761
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       8133666728
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
                                

aSystemOverload (781 rep)

Dec 10, 2014, 09:11 PM • Last activity: May 5, 2025, 10:00 PM

5 votes

1 answers

3696 views

How do I prevent the Unsafe Shutdowns reported by smartctl?

ssd smartctl

Based on a [suggestion by eblock](https://unix.stackexchange.com/questions/621310/is-there-a-way-to-find-the-cause-of-unexpected-power-offs-by-inspecting-logfiles#comment1162035_621310), I have run `smartctl` several times over the last few days to check for issues. Below, as an example, is the output of `sudo smartctl -a /dev/nvme0n1p2`:

    smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.5.7-1-default] (SUSE RPM)
    Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Number:                       Samsung SSD 970 EVO Plus 500GB
    Serial Number:                      S4EVNZFN503427W
    Firmware Version:                   2B2QEXM7
    PCI Vendor/Subsystem ID:            0x144d
    IEEE OUI Identifier:                0x002538
    Total NVM Capacity:                 500,107,862,016 [500 GB]
    Unallocated NVM Capacity:           0
    Controller ID:                      4
    Number of Namespaces:               1
    Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
    Namespace 1 Utilization:            94,943,219,712 [94.9 GB]
    Namespace 1 Formatted LBA Size:     512
    Namespace 1 IEEE EUI-64:            002538 5501ad2a18
    Local Time is:                      Wed Dec  2 11:19:04 2020 CET
    Firmware Updates (0x16):            3 Slots, no Reset required
    Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
    Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
    Maximum Data Transfer Size:         512 Pages
    Warning  Comp. Temp. Threshold:     85 Celsius
    Critical Comp. Temp. Threshold:     85 Celsius
    
    Supported Power States
    St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
     0 +     7.80W       -        -    0  0  0  0        0       0
     1 +     6.00W       -        -    1  1  1  1        0       0
     2 +     3.40W       -        -    2  2  2  2        0       0
     3 -   0.0700W       -        -    3  3  3  3      210    1200
     4 -   0.0100W       -        -    4  4  4  4     2000    8000
    
    Supported LBA Sizes (NSID 0x1)
    Id Fmt  Data  Metadt  Rel_Perf
     0 +     512       0         0
    
    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    SMART/Health Information (NVMe Log 0x02)
    Critical Warning:                   0x00
    Temperature:                        38 Celsius
    Available Spare:                    100%
    Available Spare Threshold:          10%
    Percentage Used:                    0%
    Data Units Read:                    382,321 [195 GB]
    Data Units Written:                 695,579 [356 GB]
    Host Read Commands:                 4,525,857
    Host Write Commands:                9,680,786
    Controller Busy Time:               30
    Power Cycles:                       205
    Power On Hours:                     75
    Unsafe Shutdowns:                   73
    Media and Data Integrity Errors:    0
    Error Information Log Entries:      209
    Warning  Comp. Temperature Time:    0
    Critical Comp. Temperature Time:    0
    Temperature Sensor 1:               38 Celsius
    Temperature Sensor 2:               41 Celsius
    
    Error Information (NVMe Log 0x01, max 64 entries)
    No Errors Logged
    
The lines "SMART overall-health self-assessment test result: PASSED" and "No Errors Logged" look reassuring, but the following line doesn't:

    Unsafe Shutdowns:                   73

According to [Using NVMe Command Line Tools to Check NVMe Flash Health](https://www.percona.com/blog/2017/02/09/using-nvme-command-line-tools-to-check-nvme-flash-health/) by Peter Zaitsev (February 2017), "Unsafe Shutdowns" refers to:

> The number of times a power loss happened without a shutdown notification being sent. Depending on the NVMe device you’re using, an unsafe shutdown might corrupt user data.

There have been a few unexpected shutdowns on my Tuxedo notebook (see [Is there a way to find the cause of unexpected power offs by inspecting logfiles?](https://unix.stackexchange.com/q/621310/158988)), but nowhere near 73 of them.

According to [this forum post on Tom's Hardware (April 2019)](https://forums.tomshardware.com/threads/unsafe-shut-down-smart-data.3468029/), disabling fast boot might help. Is this correct, or is something else needed?
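
To see whether the counter really increases with each suspected event, the two relevant values can be extracted from saved `smartctl -a` output and compared between runs. The following is only a minimal Python sketch, assuming the text layout shown in the output above; the function name `nvme_counters` is made up for illustration and is not part of smartmontools.

```python
import re

def nvme_counters(smartctl_text):
    """Extract 'Power Cycles' and 'Unsafe Shutdowns' from smartctl -a text output."""
    counters = {}
    for line in smartctl_text.splitlines():
        # Lines look like "Unsafe Shutdowns:                   73"
        match = re.match(r"\s*(Power Cycles|Unsafe Shutdowns):\s+([\d,]+)\s*$", line)
        if match:
            # Strip thousands separators before converting to int
            counters[match.group(1)] = int(match.group(2).replace(",", ""))
    return counters

# Sample lines copied from the output in this question
sample = """\
Power Cycles:                       205
Power On Hours:                     75
Unsafe Shutdowns:                   73
"""
print(nvme_counters(sample))
```

If "Unsafe Shutdowns" goes up by one after an ordinary shutdown or reboot, that would suggest the drive is not being notified on the normal shutdown path, rather than 73 real power losses.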
                                

Tsundoku (838 rep)

Dec 2, 2020, 10:27 AM • Last activity: Apr 15, 2025, 09:01 PM

3 votes

3 answers

3253 views

File system suddenly became read-only; how to debug this?

systemd debugging ssd smartctl

My ext4 root and home filesystems suddenly became read-only. How can I find out the reason for this?

The system is Ubuntu 16.04 with systemd, installed on an SSD; the root and home partitions are encrypted with dm-crypt and formatted as ext4.

**Edit:** Just after I wrote this post, the system crashed again (twice) with a slightly blinking black/colored screen. Now it seems to work again.

In `/etc/fstab`, the root partition has the mount option `errors=remount-ro`.
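
With `errors=remount-ro`, the filesystem is remounted read-only when an error is detected, so one first check is which mounts are currently `ro`. Below is a minimal Python sketch of that check, assuming the usual Linux `/proc/mounts` layout (device, mount point, type, options, ...). The function name `read_only_mounts` and the device names in the sample are invented for illustration.

```python
def read_only_mounts(mounts_text):
    """Return (mount_point, fs_type) for every entry whose options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Compare whole options so that "errors=remount-ro" is not mistaken for "ro"
        if "ro" in options.split(","):
            result.append((mount_point, fs_type))
    return result

# Hypothetical sample in /proc/mounts format (device names are made up)
sample = (
    "/dev/mapper/root_crypt / ext4 ro,relatime,errors=remount-ro 0 0\n"
    "/dev/sda1 /boot ext4 rw,relatime 0 0\n"
)
print(read_only_mounts(sample))
```

On a live system the same function could be fed the contents of `/proc/mounts`; the kernel log from around the time of the remount would then be the place to look for the underlying error.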

`smartctl -a /dev/sda` gives:

    smartctl -a /dev/sda
    smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-21-generic] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Device Model:     SAMSUNG MZ7PC256HAFU-000L7
    Serial Number:    S0Y5NSAC602442
    Firmware Version: CXM72L1Q
    User Capacity:    256,060,514,304 bytes [256 GB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    Solid State Device
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   ATA8-ACS T13/1699-D revision 4c
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Mon May 23 17:07:40 2016 UTC
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x02)	Offline data collection activity
    					was completed without error.
    					Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		( 1020) seconds.
    Offline data collection
    capabilities: 			 (0x5b) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					No Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   2) minutes.
    Extended self-test routine
    recommended polling time: 	 (  17) minutes.
    
    SMART Attributes Data Structure revision number: 1
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       6093
     12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       2810
    175 Program_Fail_Count_Chip 0x0032   100   100   010    Old_age   Always       -       0
    176 Erase_Fail_Count_Chip   0x0032   100   100   010    Old_age   Always       -       0
    177 Wear_Leveling_Count     0x0013   095   095   017    Pre-fail  Always       -       169
    178 Used_Rsvd_Blk_Cnt_Chip  0x0013   094   094   010    Pre-fail  Always       -       230
    179 Used_Rsvd_Blk_Cnt_Tot   0x0013   094   094   010    Pre-fail  Always       -       450
    180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   094   094   010    Pre-fail  Always       -       7614
    181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
    182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
    183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
    184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0032   066   042   000    Old_age   Always       -       34
    195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   253   253   000    Old_age   Always       -       1
    233 Media_Wearout_Indicator 0x003a   200   200   000    Old_age   Always       -       0
    234 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
    235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       48
    236 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       48
    237 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       169
    238 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       450
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed without error       00%      6092         -
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.




                                

student (18865 rep)

May 23, 2016, 01:11 PM • Last activity: Mar 4, 2025, 03:09 AM

2 votes

1 answers

80 views

Reassemble Raid5 Array after disabling TPM

software-raid mdadm raid5 data-recovery smartctl

Edit: Both `/dev/sdd` and `/dev/sde` are missing superblocks. I assume this cannot be fixed. I am wiping the drives and starting over.

I just finished copying 8 TB of data to a new RAID5 array. I then turned off TPM in my BIOS, and the array was no longer readable. I would like to fix this rather than start over. I tried to reassemble it and got this error:

    $ sudo mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdd /dev/sde -f
    mdadm: No super block found on /dev/sdd (Expected magic a92b4efc, got 00000000)
    mdadm: no RAID superblock on /dev/sdd
    mdadm: /dev/sdd has no superblock - assembly aborted

Here's what examining `/dev/sdd` resulted in:

    $ sudo mdadm -E /dev/sdd
    /dev/sdd:
       MBR Magic : aa55
    Partition :   4294967295 sectors at            1 (type ee)

Here's some more diagnostics:

    sudo mdadm --examine /dev/sd*
    /dev/sda:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x1
         Array UUID : 7844a579:00996056:06c4e1dd:0e70ebcb
               Name : scott-LinuxMint:0  (local to host scott-LinuxMint)
      Creation Time : Thu Jan  2 12:50:26 2025
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7813772976 sectors (3.64 TiB 4.00 TB)
         Array Size : 11720659392 KiB (10.92 TiB 12.00 TB)
      Used Dev Size : 7813772928 sectors (3.64 TiB 4.00 TB)
        Data Offset : 264192 sectors
       Super Offset : 8 sectors
       Unused Space : before=264112 sectors, after=48 sectors
              State : clean
        Device UUID : 0febcd7e:7581f3c8:7b5962c5:cbddee7c
    
    Internal Bitmap : 8 sectors from superblock
        Update Time : Fri Jan  3 22:05:37 2025
      Bad Block Log : 512 entries available at offset 24 sectors
           Checksum : 852d7efe - correct
             Events : 6116
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 0
       Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdb:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x1
         Array UUID : 7844a579:00996056:06c4e1dd:0e70ebcb
               Name : scott-LinuxMint:0  (local to host scott-LinuxMint)
      Creation Time : Thu Jan  2 12:50:26 2025
         Raid Level : raid5
       Raid Devices : 4
    
     Avail Dev Size : 7813772976 sectors (3.64 TiB 4.00 TB)
         Array Size : 11720659392 KiB (10.92 TiB 12.00 TB)
      Used Dev Size : 7813772928 sectors (3.64 TiB 4.00 TB)
        Data Offset : 264192 sectors
       Super Offset : 8 sectors
       Unused Space : before=264112 sectors, after=48 sectors
              State : clean
        Device UUID : d2280c55:cf16ae93:aaa5e4a0:71e30dbb
    
    Internal Bitmap : 8 sectors from superblock
        Update Time : Fri Jan  3 22:05:37 2025
      Bad Block Log : 512 entries available at offset 24 sectors
           Checksum : 3fc7a3f1 - correct
             Events : 6116
    
             Layout : left-symmetric
         Chunk Size : 64K
    
       Device Role : Active device 1
       Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdc:
       MBR Magic : aa55
    Partition :   4294967295 sectors at            1 (type ee)
    /dev/sdc1:
       MBR Magic : aa55
    Partition :   1836016416 sectors at   1936269394 (type 4f)
    Partition :    544437093 sectors at   1917848077 (type 73)
    Partition :    544175136 sectors at   1818575915 (type 2b)
    Partition :        54974 sectors at   2844524554 (type 61)
    /dev/sdd:
       MBR Magic : aa55
    Partition :   4294967295 sectors at            1 (type ee)
    mdadm: No md superblock detected on /dev/sdd1.
    /dev/sde:
       MBR Magic : aa55
    Partition :   4294967295 sectors at            1 (type ee)
    mdadm: No md superblock detected on /dev/sde1.
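
The `--examine` output above can be summarised per device to see which members still carry an md superblock and whether their event counts agree. This is just a minimal Python sketch, assuming the text layout shown above; `summarise_examine` is an illustrative name and not part of mdadm.

```python
import re

def summarise_examine(text):
    """Map each device in 'mdadm --examine' output to its md magic and event count."""
    devices = {}
    current = None
    for line in text.splitlines():
        # Device sections start with a line such as "/dev/sda:"
        header = re.match(r"\s*(/dev/\S+):\s*$", line)
        if header:
            current = header.group(1)
            devices[current] = {"magic": None, "events": None}
            continue
        if current is None:
            continue
        # Matches "Magic : a92b4efc" and "Events : 6116", but not "MBR Magic : aa55"
        field = re.match(r"\s*(Magic|Events) : (\S+)\s*$", line)
        if field:
            key = field.group(1).lower()
            value = field.group(2)
            devices[current][key] = int(value) if key == "events" else value
    return devices

# Abbreviated sample taken from the output above
sample = """\
/dev/sda:
          Magic : a92b4efc
         Events : 6116
/dev/sdd:
   MBR Magic : aa55
Partition :   4294967295 sectors at            1 (type ee)
"""
for device, info in summarise_examine(sample).items():
    print(device, info)
```

Devices whose `magic` stays `None` are the ones on which mdadm found no md superblock, which here matches the assembly error for `/dev/sdd`.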

And the drive seems healthy.

    $ sudo smartctl -d ata -a /dev/sdd
    smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-51-generic] (local build)
    Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Seagate Skyhawk
    Device Model:     ST4000VX007-2DT166
    Serial Number:    ZDH61N4Z
    LU WWN Device Id: 5 000c50 0b4cf0507
    Firmware Version: CV11
    User Capacity:    4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5980 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database 7.3/5528
    ATA Version is:   ACS-3 T13/2161-D revision 5
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Fri Jan  3 23:01:40 2025 EST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x82)	Offline data collection activity
    					was completed without error.
    					Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		(  591) seconds.
    Offline data collection
    capabilities: 			 (0x7b) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   1) minutes.
    Extended self-test routine
    recommended polling time: 	 ( 633) minutes.
    Conveyance self-test routine
    recommended polling time: 	 (   2) minutes.
    SCT capabilities: 	       (0x50bd)	SCT Status supported.
    					SCT Error Recovery Control supported.
    					SCT Feature Control supported.
    					SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   075   064   044    Pre-fail  Always       -       30305794
      3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       276
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   095   060   045    Pre-fail  Always       -       3166340513
      9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       27536h+49m+43.964s
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       104
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       7864440
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   081   047   040    Old_age   Always       -       19 (Min/Max 19/19)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       117
    193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       2608
    194 Temperature_Celsius     0x0022   019   053   000    Old_age   Always       -       19 (0 6 0 0 0)
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       27376h+00m+21.311s
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       247975821685
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       124682775664
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    The above only provides legacy SMART information - try 'smartctl -x' for more

Let me know if you can help. I am very new to this.

Edit: added this `fdisk` output. There is also another, unrelated drive, `/dev/sdc`.

    $ sudo fdisk -l /dev/sd?       
    The primary GPT table is corrupt, but the backup appears OK, so that will be used.
    Disk /dev/sda: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk model: ST4000VX007-2DT1
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: gpt
    Disk identifier: 0384604C-4E8B-4E0A-8423-2139A918120C
    
    Device     Start        End    Sectors  Size Type
    /dev/sda1   2048 7814035455 7814033408  3.6T Linux filesystem
    The primary GPT table is corrupt, but the backup appears OK, so that will be used.
    
    
    Disk /dev/sdb: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk model: ST4000VX007-2DT1
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: gpt
    Disk identifier: C079AF04-F6C8-4FB3-9E12-FEFCC65D008F
    
    Device     Start        End    Sectors  Size Type
    /dev/sdb1   2048 7814035455 7814033408  3.6T Linux filesystem
    
    
    Disk /dev/sdc: 7.28 TiB, 8001563222016 bytes, 15628053168 sectors
    Disk model: HGST HDN728080AL
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: gpt
    Disk identifier: 5237C016-4DE9-408A-A37B-F1F59F33776E
    
    Device     Start         End     Sectors  Size Type
    /dev/sdc1   2048 15627233279 15627231232  7.3T Microsoft basic data
    
    
    Disk /dev/sdd: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk model: ST4000VX007-2DT1
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: gpt
    Disk identifier: 56B6E76B-3B41-486B-8857-AD2BEA8D589A
    
    Device     Start        End    Sectors  Size Type
    /dev/sdd1   2048 7814035455 7814033408  3.6T Linux filesystem
    
    
    Disk /dev/sde: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
    Disk model: ST4000VX007-2DT1
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    Disklabel type: gpt
    Disk identifier: B306782C-5C23-4C41-A6B9-79AF1FCC6F0E
    
    Device     Start        End    Sectors  Size Type
    /dev/sde1   2048 7814035455 7814033408  3.6T Linux filesystem


                                

Scott Mayo (21 rep)

Jan 4, 2025, 04:32 AM • Last activity: Jan 10, 2025, 11:07 PM

0 votes

2 answers

149 views

How to recreate a file system on an external hard drive from scratch?

filesystems hard-disk data-recovery external-hdd smartctl

I have a Western Digital Technologies, Inc. Elements 25A2 (this is how it identifies itself in the `lsusb` output).

Unfortunately, I cannot format it. In Gnome Disk Utility, the operation refuses to run and returns: “Error wiping device: Failed to probe the device ‘/dev/sdd’ (udisks-error-quark, 0)”.

I tried various commands, but they all failed:
 - `sudo fsck /dev/sdd`
 - `sudo e2fsck -b 8193 /dev/sdd`
 - `sudo e2fsck -b 32768 /dev/sdd`

I want to recreate the file system from scratch, since the device can be treated as completely empty (note that I do not have an image copy of it).

    (base) avy@machine:~$ sudo parted -l
    Erreur: /dev/sdd : étiquette de disque inconnue #unknown disk's label
    Modèle : WD Elements 25A2 (scsi)                                          
    Disque /dev/sdd : 2000GB
    Taille des secteurs (logiques/physiques) : 512B/512B #block's size (logical/physical)
    Table de partitions : unknown #partition table
    Drapeaux de disque : #disk's flags

As far as I know, tools like ddrescue work from an existing image of a file system (e.g. `sudo ddrescue  .img`), so I can't use them.

The hard drive itself seems "healthy" according to SMART:

 - `sudo smartctl --health /dev/sdd` → SMART overall-health self-assessment test result: PASSED
 - `sudo smartctl --log=error /dev/sdd` → SMART Error Log Version: 1, No Errors Logged

So I still have some hope of not having to throw it away.

Here are the kernel messages relating to the drive, from `sudo dmesg --follow`:

    [  444.527131] usb 2-4: Product: Elements 25A2
    [  444.527134] usb 2-4: Manufacturer: Western Digital
    [  444.527138] usb 2-4: SerialNumber: 575855314533383859304150
    [  444.528739] usb-storage 2-4:1.0: USB Mass Storage device detected
    [  444.529073] scsi host6: usb-storage 2-4:1.0
    [  445.546672] scsi 6:0:0:0: Direct-Access     WD       Elements 25A2    1021 PQ: 0 ANSI: 6
    [  445.546937] sd 6:0:0:0: Attached scsi generic sg3 type 0
    [  445.547812] sd 6:0:0:0: [sdd] Spinning up disk...
    [  446.570416] ........ready
    [  453.738963] sd 6:0:0:0: [sdd] 3906963456 512-byte logical blocks: (2.00 TB/1.82 TiB)
    [  453.739246] sd 6:0:0:0: [sdd] Write Protect is off
    [  453.739251] sd 6:0:0:0: [sdd] Mode Sense: 47 00 10 08
    [  453.739479] sd 6:0:0:0: [sdd] No Caching mode page found
    [  453.739487] sd 6:0:0:0: [sdd] Assuming drive cache: write through
    [  456.492061] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  456.492069] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  456.492074] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  456.492080] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  456.492087] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  456.492096] Buffer I/O error on dev sdd, logical block 0, async page read
    [  459.561974] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  459.561982] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  459.561988] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  459.561994] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  459.562001] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  459.562010] Buffer I/O error on dev sdd, logical block 0, async page read
    [  462.747978] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  462.747986] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  462.747991] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  462.747997] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  462.748004] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  462.748012] Buffer I/O error on dev sdd, logical block 0, async page read
    [  462.748032] ldm_validate_partition_table(): Disk read failed.
    [  465.948181] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  465.948188] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  465.948192] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  465.948197] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  465.948202] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  465.948210] Buffer I/O error on dev sdd, logical block 0, async page read
    [  469.216369] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  469.216371] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  469.216372] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  469.216374] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  469.216375] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  469.216378] Buffer I/O error on dev sdd, logical block 0, async page read
    [  472.382131] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  472.382139] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  472.382145] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  472.382151] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  472.382157] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  472.382166] Buffer I/O error on dev sdd, logical block 0, async page read
    [  475.548568] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  475.548570] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  475.548571] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  475.548573] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  475.548574] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  475.548577] Buffer I/O error on dev sdd, logical block 0, async page read
    [  475.548611] Dev sdd: unable to read RDB block 0
    [  478.628200] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  478.628208] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  478.628213] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  478.628220] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  478.628227] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  478.628236] Buffer I/O error on dev sdd, logical block 0, async page read
    [  481.707768] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  481.707776] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  481.707782] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  481.707788] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  481.707795] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  481.707804] Buffer I/O error on dev sdd, logical block 0, async page read
    [  484.864341] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  484.864349] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  484.864355] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  484.864361] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 18 00 00 08 00
    [  484.864368] blk_update_request: critical medium error, dev sdd, sector 24 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  484.864377] Buffer I/O error on dev sdd, logical block 3, async page read
    [  487.982899] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  487.982907] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  487.982912] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  487.982919] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  487.982925] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  487.982935] Buffer I/O error on dev sdd, logical block 0, async page read
    [  491.116325] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  491.116333] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  491.116339] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  491.116345] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  491.116351] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  491.116361] Buffer I/O error on dev sdd, logical block 0, async page read
    [  491.116432]  sdd: unable to read partition table
    [  491.504986] sd 6:0:0:0: [sdd] Attached SCSI disk
    [  494.284826] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  494.284834] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  494.284840] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  494.284846] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  494.284853] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
    [  497.472400] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [  497.472408] sd 6:0:0:0: [sdd] tag#0 Sense Key : Medium Error [current] 
    [  497.472413] sd 6:0:0:0: [sdd] tag#0 Add. Sense: Unrecovered read error
    [  497.472419] sd 6:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
    [  497.472426] blk_update_request: critical medium error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [  497.472435] Buffer I/O error on dev sdd, logical block 0, async page read
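The kernel messages above repeat the same failure many times, so it can help to collapse them into a count per device and sector. The following is a minimal sketch, assuming the log text is piped in on standard input (for example from `dmesg`); the function name `summarize_medium_errors` is made up for illustration.

```shell
#!/bin/sh
# Sketch: count "critical medium error" kernel messages per device and sector.
# summarize_medium_errors is a hypothetical helper name, not part of any tool.
# Usage (after defining the function):  dmesg | summarize_medium_errors
summarize_medium_errors() {
    # Keep only the device name and sector number from matching lines,
    # then count identical "device sector" pairs.
    sed -n 's/.*critical medium error, dev \([^,]*\), sector \([0-9][0-9]*\).*/\1 \2/p' |
        sort | uniq -c
}
```

On the portion of the log shown here, nearly every failure is at sector 0 with a single one at sector 24, which fits the `unable to read partition table` message: the reads that fail are at the very start of the disk.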


----------
Disclaimer: I won't accept answers such as "Buy a new hard drive" or "Use the warranty" (all examples given are true stories). There's nothing of value on this hard drive; I want to improve my IT skills and bring added value to Stack Exchange platforms.
                                

AvyWam (113 rep)

Nov 29, 2024, 05:34 PM • Last activity: Jan 9, 2025, 10:26 PM

1 votes

3 answers

1472 views

can we automatically wait the required time for smartmontools/smartctl?

smartctl smartmontools

Can we do something like this in a script (preferably zsh):
    
    smartctl -t long /dev/sda
    smartctl -t long /dev/sdb
    smartctl -t long /dev/sdc
    
    [Wait however long smartctl needs]
    
    smartctl -H /dev/sda
    smartctl -H /dev/sdb
    smartctl -H /dev/sdc

As is obvious I'm just trying to automate this.
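One way to approach this is to start the tests, poll each drive until it no longer reports a running self-test, and only then query the health status. The sketch below assumes ATA drives, where `smartctl -c` prints "Self-test routine in progress" while a test runs (SCSI/SAS and NVMe drives word this differently, so the pattern would need adjusting). The function names and the `SMARTCTL` and `POLL_SECONDS` variables are illustrative, not part of smartmontools. It is written in plain POSIX sh with every expansion quoted, so it should also run under zsh.

```shell
#!/bin/sh
# Sketch: start a long self-test on each drive, wait until none is still
# running, then print the health status of each drive.
SMARTCTL=${SMARTCTL:-smartctl}      # overridable, e.g. for a dry run
POLL_SECONDS=${POLL_SECONDS:-60}    # how often to re-check (assumed value)

# Succeeds while `smartctl -c` reports a self-test in progress (ATA wording).
selftest_running() {
    "$SMARTCTL" -c "$1" | grep -q 'Self-test routine in progress'
}

# Usage: run_long_tests /dev/sda /dev/sdb /dev/sdc
run_long_tests() {
    # Kick off all tests first so they run in parallel on the drives.
    for dev in "$@"; do
        "$SMARTCTL" -t long "$dev"
    done
    # Wait for each drive in turn; total wait is set by the slowest drive.
    for dev in "$@"; do
        while selftest_running "$dev"; do
            sleep "$POLL_SECONDS"
        done
    done
    for dev in "$@"; do
        "$SMARTCTL" -H "$dev"
    done
}

# Only act when devices are given on the command line (needs root).
if [ "$#" -gt 0 ]; then
    run_long_tests "$@"
fi
```

Polling the drive rather than sleeping for a fixed time avoids guessing the duration, at the cost of depending on the exact status wording that `smartctl` prints for the drive type.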

                                

Ray Andrews (2615 rep)

Aug 8, 2017, 12:37 AM • Last activity: Nov 9, 2024, 11:05 AM

1 votes

1 answers

406 views

smartctl & device type mismatch

smartctl scsi

I will keep it short: I am trying to better understand the different storage interface standards, but the output of `smartctl` is confusing me a little. Is this an actual problem in my system (like I saw in another post where some firmware was outdated), or am I misunderstanding the output of `smartctl`? Observe:

    > sudo smartctl --scan
    /dev/sda -d scsi # /dev/sda, SCSI device
    /dev/nvme0 -d nvme # /dev/nvme0, NVMe device

I have an HDD and an NVMe drive, but the HDD isn't SCSI as far as I know, unless this is the case described in "[Why do my SATA devices show up under /proc/scsi/scsi?](https://unix.stackexchange.com/questions/3901/why-do-my-sata-devices-show-up-under-proc-scsi-scsi)". But if it is, why can I use both `-d ata` and `-d scsi` to get information on it:

    > sudo smartctl -d ata --info /dev/sda
    smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.10.5] (local build)
    Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital Scorpio Black (AF)
    Device Model:     WDC WD5000BPKT-75PK4T0
    Serial Number:    WD-WX11EC114329
    LU WWN Device Id: 5 0014ee 6ad29b3f3
    Firmware Version: 01.01A01
    User Capacity:    500,107,862,016 bytes [500 GB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    7200 rpm
    Device is:        In smartctl database 7.3/5387
    ATA Version is:   ATA8-ACS (minor revision not indicated)
    SATA Version is:  SATA 2.6, 3.0 Gb/s
    Local Time is:    Thu Aug 29 14:09:19 2024 WEST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    > sudo smartctl -d scsi --info /dev/sda
    smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.10.5] (local build)
    Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

    User Capacity:        500,107,862,016 bytes [500 GB]
    Logical block size:   512 bytes
    Physical block size:  4096 bytes
    LU is fully provisioned
    Rotation Rate:        7200 rpm
    Logical Unit id:      0x50014ee6ad29b3f3
    Serial number:        WD-WX11EC114329
    Device type:          disk
    Local Time is:        Thu Aug 29 14:09:35 2024 WEST
    SMART support is:     Unavailable - device lacks SMART capability.

Judging by the two outputs, `ata` is clearly the "correct" type, yet `sudo smartctl -d ata --scan` returns nothing (unlike `sudo smartctl -d scsi --scan`). Why does it seem that I can use both `ata` and `scsi` to access information, and why is the drive detected as `scsi` by `--scan`?
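The comparison described above can be repeated for any device with a small loop. This is a minimal sketch, assuming only the `-d TYPE` and `--info` options already used in the question; the function name `compare_device_types` and the `SMARTCTL` variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: show what `smartctl --info` says about SMART support when the
# same device is queried with different -d device types.
SMARTCTL=${SMARTCTL:-smartctl}   # overridable, e.g. for a dry run

# Usage: compare_device_types DEVICE TYPE...
compare_device_types() {
    dev=$1
    shift
    for dtype in "$@"; do
        # Keep only the "SMART support is:" lines and prefix them with the type.
        "$SMARTCTL" -d "$dtype" --info "$dev" |
            grep '^SMART support is:' |
            while IFS= read -r line; do
                printf '%s: %s\n' "$dtype" "$line"
            done
    done
}

# Example (needs root): sh this_script.sh /dev/sda ata scsi
if [ "$#" -ge 2 ]; then
    compare_device_types "$@"
fi
```

With the outputs shown in the question, this would print the `Available` and `Enabled` lines for `ata` and the `Unavailable` line for `scsi`, which makes the difference between the two views easy to see side by side.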

Mathias Sven (273 rep)

Aug 29, 2024, 01:19 PM • Last activity: Aug 29, 2024, 02:00 PM

3 votes

0 answers

85 views

HDD hiccups randomly, no SMART errors

hard-disk dmesg smartctl

I use an SSD as the boot drive and an HDD as the `/home` drive. For about the last two weeks, the HDD has randomly "hiccuped", and the system takes one or two seconds to recover. It does not reboot or crash. There was a slow and hot case fan; I thought that might be the issue and disconnected it, but the problem persists. I've also changed the SATA cable and port, but to no avail. I bought the HDD in March. There is also a beep, but I'm not sure whether it comes from the motherboard buzzer or the HDD. The outputs of [hdsentinel](https://www.hdsentinel.com/) and `smartctl` follow:

HDD Device  1: /dev/sdb
HDD Model ID : ST2000DM008-2UB102
HDD Serial No: WK30LBZ6
HDD Revision : 0001
HDD Size     : 1907729 MB
Interface    : S-ATA Gen3, 6 Gbps
Temperature  : 41 °C
Highest Temp.: 49 °C
Health       : 100 %
Performance  : 100 %
Power on time: 48 days, 17 hours
Est. lifetime: more than 1000 days
  The hard disk status is PERFECT. Problematic or weak sectors were not found and there are no spin up or data transfer errors. 
    No actions needed.

The results of the long test from `smartctl`:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.10.1-arch1-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5 (SMR)
Device Model:     ST2000DM008-2UB102
Serial Number:    WK30LBZ6
LU WWN Device Id: 5 000c50 0f1a9797e
Firmware Version: 0001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
TRIM Command:     Available
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Sat Jul 27 04:21:20 2024 +03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 200) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x30a5)	SCT Status supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   064   006    Pre-fail  Always       -       91635948
  3 Spin_Up_Time            0x0003   099   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       692
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   045    Pre-fail  Always       -       142332177
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1180h+17m+12.848s
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       692
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       0 0 3
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   058   051   040    Old_age   Always       -       42 (Min/Max 42/43)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       406
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1444
194 Temperature_Celsius     0x0022   042   049   000    Old_age   Always       -       42 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   080   064   000    Old_age   Always       -       91635948
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1144h+28m+33.866s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       6730140016
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1535526694

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1180         -
# 2  Short offline       Completed without error       00%      1169         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

There are some errors in `dmesg`, but I don't know what to make of them.

[Fri Jul 26 14:45:06 2024] ata3.00: exception Emask 0x10 SAct 0x200 SErr 0x40d0202 action 0xe frozen
[Fri Jul 26 14:45:06 2024] ata3.00: irq_stat 0x00000040, connection status changed
[Fri Jul 26 14:45:06 2024] ata3: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch }
[Fri Jul 26 14:45:06 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:45:06 2024] ata3.00: cmd 60/08:48:98:11:04/00:00:0a:00:00/40 tag 9 ncq dma 4096 in
                                    res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:45:06 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:45:06 2024] ata3: hard resetting link
[Fri Jul 26 14:45:11 2024] ata3: link is slow to respond, please be patient (ready=0)
[Fri Jul 26 14:45:13 2024] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[Fri Jul 26 14:45:13 2024] ata3.00: configured for UDMA/100
[Fri Jul 26 14:45:13 2024] ata3: EH complete
[Fri Jul 26 14:45:30 2024] ata3.00: exception Emask 0x10 SAct 0x11c000 SErr 0x40d0202 action 0xe frozen
[Fri Jul 26 14:45:30 2024] ata3.00: irq_stat 0x00000040, connection status changed
[Fri Jul 26 14:45:30 2024] ata3: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch }
[Fri Jul 26 14:45:30 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:45:30 2024] ata3.00: cmd 60/18:70:08:2b:c0/00:00:46:00:00/40 tag 14 ncq dma 12288 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:45:30 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:45:30 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:45:30 2024] ata3.00: cmd 60/08:78:20:2b:c0/00:00:46:00:00/40 tag 15 ncq dma 4096 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:45:30 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:45:30 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:45:30 2024] ata3.00: cmd 60/e0:80:28:2b:c0/00:00:46:00:00/40 tag 16 ncq dma 114688 in
                                    res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:45:30 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:45:30 2024] ata3.00: failed command: WRITE FPDMA QUEUED
[Fri Jul 26 14:45:30 2024] ata3.00: cmd 61/28:a0:a8:6b:54/00:00:0c:00:00/40 tag 20 ncq dma 20480 out
                                    res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:45:30 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:45:30 2024] ata3: hard resetting link
[Fri Jul 26 14:45:33 2024] retire_capture_urb: 1 callbacks suppressed
[Fri Jul 26 14:45:36 2024] ata3: link is slow to respond, please be patient (ready=0)
[Fri Jul 26 14:45:38 2024] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[Fri Jul 26 14:45:38 2024] ata3.00: configured for UDMA/100
[Fri Jul 26 14:45:38 2024] sd 2:0:0:0: [sdb] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=7s
[Fri Jul 26 14:45:38 2024] sd 2:0:0:0: [sdb] tag#14 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:45:38 2024] sd 2:0:0:0: [sdb] tag#14 Add. Sense: Unaligned write command
[Fri Jul 26 14:45:38 2024] sd 2:0:0:0: [sdb] tag#14 CDB: Read(10) 28 00 46 c0 2b 08 00 00 18 00
[Fri Jul 26 14:45:38 2024] I/O error, dev sdb, sector 1186999048 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 3
[Fri Jul 26 14:45:38 2024] sd 2:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=7s
[Fri Jul 26 14:45:38 2024] sd 2:0:0:0: [sdb] tag#16 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:45:38 2024] sd 2:0:0:0: [sdb] tag#16 Add. Sense: Unaligned write command
[Fri Jul 26 14:45:38 2024] sd 2:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 46 c0 2b 28 00 00 e0 00
[Fri Jul 26 14:45:38 2024] I/O error, dev sdb, sector 1186999080 op 0x0:(READ) flags 0x80700 phys_seg 28 prio class 3
[Fri Jul 26 14:45:38 2024] ata3: EH complete
[Fri Jul 26 14:45:42 2024] ata3.00: exception Emask 0x10 SAct 0x80 SErr 0x40d0202 action 0xe frozen
[Fri Jul 26 14:45:42 2024] ata3.00: irq_stat 0x00000040, connection status changed
[Fri Jul 26 14:45:42 2024] ata3: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch }
[Fri Jul 26 14:45:42 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:45:42 2024] ata3.00: cmd 60/08:38:d0:47:21/00:00:0b:00:00/40 tag 7 ncq dma 4096 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:45:42 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:45:42 2024] ata3: hard resetting link
[Fri Jul 26 14:45:47 2024] ata3: link is slow to respond, please be patient (ready=0)
[Fri Jul 26 14:45:49 2024] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[Fri Jul 26 14:45:49 2024] ata3.00: configured for UDMA/100
[Fri Jul 26 14:45:49 2024] ata3: EH complete
[Fri Jul 26 14:46:03 2024] retire_capture_urb: 10 callbacks suppressed
[Fri Jul 26 14:46:17 2024] ata3.00: limiting speed to UDMA/33:PIO4
[Fri Jul 26 14:46:17 2024] ata3.00: exception Emask 0x10 SAct 0x40000200 SErr 0x40d0202 action 0xe frozen
[Fri Jul 26 14:46:17 2024] ata3.00: irq_stat 0x00000040, connection status changed
[Fri Jul 26 14:46:17 2024] ata3: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch }
[Fri Jul 26 14:46:17 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:17 2024] ata3.00: cmd 60/08:48:20:a4:c7/00:00:27:00:00/40 tag 9 ncq dma 4096 in
                                    res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:17 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:17 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:17 2024] ata3.00: cmd 60/08:f0:18:aa:44/00:00:3e:00:00/40 tag 30 ncq dma 4096 in
                                    res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:17 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:17 2024] ata3: hard resetting link
[Fri Jul 26 14:46:23 2024] ata3: link is slow to respond, please be patient (ready=0)
[Fri Jul 26 14:46:24 2024] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[Fri Jul 26 14:46:24 2024] ata3.00: configured for UDMA/33
[Fri Jul 26 14:46:24 2024] sd 2:0:0:0: [sdb] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=7s
[Fri Jul 26 14:46:24 2024] sd 2:0:0:0: [sdb] tag#9 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:24 2024] sd 2:0:0:0: [sdb] tag#9 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:24 2024] sd 2:0:0:0: [sdb] tag#9 CDB: Read(10) 28 00 27 c7 a4 20 00 00 08 00
[Fri Jul 26 14:46:24 2024] I/O error, dev sdb, sector 667395104 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Jul 26 14:46:24 2024] sd 2:0:0:0: [sdb] tag#30 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=7s
[Fri Jul 26 14:46:24 2024] sd 2:0:0:0: [sdb] tag#30 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:24 2024] sd 2:0:0:0: [sdb] tag#30 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:24 2024] sd 2:0:0:0: [sdb] tag#30 CDB: Read(10) 28 00 3e 44 aa 18 00 00 08 00
[Fri Jul 26 14:46:24 2024] I/O error, dev sdb, sector 1044687384 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Jul 26 14:46:24 2024] ata3: EH complete
[Fri Jul 26 14:46:30 2024] retire_capture_urb: 3 callbacks suppressed
[Fri Jul 26 14:46:35 2024] retire_capture_urb: 12 callbacks suppressed
[Fri Jul 26 14:46:50 2024] ata3.00: exception Emask 0x10 SAct 0x39bf0 SErr 0x40d0202 action 0xe frozen
[Fri Jul 26 14:46:50 2024] ata3.00: irq_stat 0x00000040, connection status changed
[Fri Jul 26 14:46:50 2024] ata3: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/20:20:00:c6:50/00:00:40:00:00/40 tag 4 ncq dma 16384 in
                                    res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/20:28:00:4c:50/00:00:40:00:00/40 tag 5 ncq dma 16384 in
                                    res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/20:30:78:cb:44/00:00:0f:00:00/40 tag 6 ncq dma 16384 in
                                    res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/20:38:00:0b:50/00:00:40:00:00/40 tag 7 ncq dma 16384 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/20:40:00:4a:50/00:00:40:00:00/40 tag 8 ncq dma 16384 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/20:48:00:15:2c/00:00:0e:00:00/40 tag 9 ncq dma 16384 in
                                    res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/20:58:00:08:50/00:00:40:00:00/40 tag 11 ncq dma 16384 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/20:60:00:34:51/00:00:0c:00:00/40 tag 12 ncq dma 16384 in
                                    res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/40:78:20:a6:51/00:00:40:00:00/40 tag 15 ncq dma 32768 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 60/80:80:60:a6:51/00:00:40:00:00/40 tag 16 ncq dma 65536 in
                                    res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3.00: failed command: WRITE FPDMA QUEUED
[Fri Jul 26 14:46:50 2024] ata3.00: cmd 61/08:88:40:55:45/00:00:41:00:00/40 tag 17 ncq dma 4096 out
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:46:50 2024] ata3.00: status: { DRDY }
[Fri Jul 26 14:46:50 2024] ata3: hard resetting link
[Fri Jul 26 14:46:56 2024] ata3: link is slow to respond, please be patient (ready=0)
[Fri Jul 26 14:46:57 2024] retire_capture_urb: 3 callbacks suppressed
[Fri Jul 26 14:46:59 2024] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[Fri Jul 26 14:46:59 2024] ata3.00: configured for UDMA/33
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#4 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#4 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#4 CDB: Read(10) 28 00 40 50 c6 00 00 00 20 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 1079035392 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#5 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#5 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#5 CDB: Read(10) 28 00 40 50 4c 00 00 00 20 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 1079004160 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#6 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#6 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#6 CDB: Read(10) 28 00 0f 44 cb 78 00 00 20 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 256166776 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#7 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#7 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#7 CDB: Read(10) 28 00 40 50 0b 00 00 00 20 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 1078987520 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#8 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#8 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#8 CDB: Read(10) 28 00 40 50 4a 00 00 00 20 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 1079003648 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#9 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#9 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#9 CDB: Read(10) 28 00 0e 2c 15 00 00 00 20 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 237769984 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#11 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#11 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#11 CDB: Read(10) 28 00 40 50 08 00 00 00 20 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 1078986752 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=9s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#12 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#12 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#12 CDB: Read(10) 28 00 0c 51 34 00 00 00 20 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 206648320 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#15 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#15 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#15 CDB: Read(10) 28 00 40 51 a6 20 00 00 40 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 1079092768 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 0
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#16 Sense Key : Illegal Request [current] 
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#16 Add. Sense: Unaligned write command
[Fri Jul 26 14:46:59 2024] sd 2:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 40 51 a6 60 00 00 80 00
[Fri Jul 26 14:46:59 2024] I/O error, dev sdb, sector 1079092832 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 0
[Fri Jul 26 14:46:59 2024] ata3: EH complete
[Fri Jul 26 14:47:15 2024] ata3.00: exception Emask 0x10 SAct 0x20000 SErr 0x40d0202 action 0xe frozen
[Fri Jul 26 14:47:15 2024] ata3.00: irq_stat 0x00000040, connection status changed
[Fri Jul 26 14:47:15 2024] ata3: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch }
[Fri Jul 26 14:47:15 2024] ata3.00: failed command: READ FPDMA QUEUED
[Fri Jul 26 14:47:15 2024] ata3.00: cmd 60/08:88:00:f9:80/00:00:04:00:00/40 tag 17 ncq dma 4096 in
                                    res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Fri Jul 26 14:47:15 2024] ata3.00: status: { DRDY }

What should I search for? Are there any other logs where these types of errors are recorded? I have a second PC, but no other PCs that I can plug this HDD into. I do have external HDD enclosures, though.
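One thing that can be cross-checked in the log itself is whether the failing sector numbers agree with the logged commands. A minimal sketch, assuming the standard READ(10) layout (opcode `0x28`, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8); the helper name `decode_read10` is made up for illustration:

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, count


# CDB copied from the tag#9 line in the log above.
lba, count = decode_read10("28 00 0e 2c 15 00 00 00 20 00")
print(lba, count)
```

For the tag#9 entry this gives LBA 237769984, which matches the `sector 237769984` reported on the following `I/O error` line. That agreement only holds if the drive uses 512-byte logical blocks, which is an assumption here.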

Emre Talha (185 rep)

Jul 26, 2024, 12:11 PM • Last activity: Jul 29, 2024, 07:57 PM

4 votes

2 answers

1270 views

smartctl lies that NVME has lifespan of ~2800TBW? What is the real lifespan of my NVME?

smartctl nvme smartmontools

smartctl -x on my Samsung SSD 860 EVO M.2 2TB shows:

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4            1132  ---  Lifetime Power-On Resets
0x01  0x010  4            6584  ---  Power-on Hours
0x01  0x018  6     59675855461  ---  Logical Sectors Written
0x01  0x020  6      1711777462  ---  Number of Write Commands
0x01  0x028  6     51882440157  ---  Logical Sectors Read
0x01  0x030  6      1869976194  ---  Number of Read Commands
0x01  0x038  6          293000  ---  Date and Time TimeStamp
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4              97  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              40  ---  Current Temperature
0x05  0x020  1              64  ---  Highest Temperature
0x05  0x028  1              18  ---  Lowest Temperature
0x05  0x058  1              70  ---  Specified Maximum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4           20530  ---  Number of Hardware Resets
0x06  0x010  4               0  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               1  N--  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

A paltry 28TB written sounds a little low for the past year I've had this NVME, but it's believable. However, the Percentage Used Endurance Indicator is only at 1%. That would suggest there's still around 100x that, or 2800TBW, left in this device, which is more than twice the rated 1200TBW, so it can't be a rounding error. Is smartctl lying? (Not that it would lie; I mean, is my NVME lying to smartctl, is smartctl misinterpreting my NVME, etc.?) How do I find out the real TBW life remaining in my NVME for sure?
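For reference, the unit arithmetic behind the "28TB" figure can be sketched as follows. The 512-byte logical sector size is an assumption (it is not shown in the excerpt above), and the 1200 TBW rating is simply the figure quoted in the question:

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
RATED_TBW = 1200     # rating quoted in the question, in decimal terabytes


def sectors_to_bytes(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a 'Logical Sectors Written' count to bytes."""
    return sectors * sector_bytes


written = sectors_to_bytes(59_675_855_461)  # value from GP Log 0x04, offset 0x018
tb = written / 10**12    # decimal terabytes, the unit TBW ratings normally use
tib = written / 2**40    # binary tebibytes
pct_of_rating = 100 * tb / RATED_TBW

print(f"{tb:.2f} TB, {tib:.2f} TiB, {pct_of_rating:.2f}% of the rated TBW")
```

Under those assumptions the counter works out to roughly 30.6 TB (about 27.8 TiB), or about 2.5% of a 1200 TBW rating. This only covers host writes; how the drive derives its own endurance indicator is not visible in this output.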

Jack G (269 rep)

Jul 18, 2024, 01:33 AM • Last activity: Jul 18, 2024, 01:42 PM

1 votes

1 answers

828 views

ATA error count increased / failing ssd?

ssd smartctl

So, being annoyed with the constraints of a "fixed" disk layout on my desktop, the other day I decided to migrate my `/` and `/home` to an LVM-based configuration. As part of this process I did an rsync data migration to the new logical volumes (essentially using `rsync -avxHAWX --numeric-ids --progress ...`) from a live CD launched via grub.

While doing this, I eventually encountered errors from rsync similar to "failed verification update discarded". As this was mostly related to non-essential files such as browser cache, and I had had no issue with the disk previously, I did not think much of it and unfortunately did not keep the exact error messages.

Having completed the migration successfully (apart from the previously mentioned issue), the occurrence of the errors from rsync began to bother me. I started thinking that perhaps the disk is faulty, and I would like to determine whether that is the case before I use it to store data I actually want to keep. Looking at the syslog, I noticed the following:

Jun 15 11:36:33 master smartd: Device: /dev/sdb [SAT], ATA error count increased from 7 to 1351

which I think looks a bit odd, as well as being a significant increase, but I do not really know how to interpret it. A simple fsck initially reported no issue, but a subsequent run eventually gave a "There are xx inodes containing multiply-claimed blocks" message, which so far seems to have been fixable. During this time entries like the following

Jun 15 15:19:33 master kernel: [13390.589630] blk_update_request: I/O error, dev sdb, sector 158304840 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jun 15 15:19:33 master kernel: [13390.589636] Buffer I/O error on dev sdb2, logical block 6987849, async page read
Jun 15 15:19:34 master kernel: [13390.773634] sd 1:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jun 15 15:19:34 master kernel: [13390.773638] sd 1:0:0:0: [sdb] tag#27 Sense Key : Medium Error [current] 
Jun 15 15:19:34 master kernel: [13390.773641] sd 1:0:0:0: [sdb] tag#27 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 15 15:19:34 master kernel: [13390.773644] sd 1:0:0:0: [sdb] tag#27 CDB: Read(10) 28 00 09 6f 8a 88 00 00 08 00
Jun 15 15:19:34 master kernel: [13390.773646] blk_update_request: I/O error, dev sdb, sector 158304904 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0

appeared in syslog, though they seem to have stopped after the filesystem was fixed. The ATA error count appears to continue to increase, however:

Jun 15 15:36:33 master smartd: Device: /dev/sdb [SAT], ATA error count increased from 4549 to 4623
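To put a number on "continues to increase", the two smartd messages quoted above can be compared. A rough sketch: the regular expression only targets this particular message format, the year is taken from the smartctl "Local Time" line further down, and the helper names are hypothetical:

```python
import re
from datetime import datetime

# Matches the smartd syslog lines quoted above; other message formats are ignored.
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) .*"
    r"ATA error count increased from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_smartd(line, year=2024):
    """Return (timestamp, old_count, new_count), or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # %b expects English month abbreviations (C/English locale).
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return ts, int(m["old"]), int(m["new"])


def errors_per_hour(first_line, last_line):
    """Average growth of the error count between two smartd messages."""
    t0, _, n0 = parse_smartd(first_line)
    t1, _, n1 = parse_smartd(last_line)
    hours = (t1 - t0).total_seconds() / 3600
    return (n1 - n0) / hours


FIRST = "Jun 15 11:36:33 master smartd: Device: /dev/sdb [SAT], ATA error count increased from 7 to 1351"
LAST = "Jun 15 15:36:33 master smartd: Device: /dev/sdb [SAT], ATA error count increased from 4549 to 4623"
print(errors_per_hour(FIRST, LAST))
```

Between those two messages the count grew from 1351 to 4623 over four hours, so the average is on the order of 800 errors per hour. This is only an average over that window and says nothing about the cause.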

It might be worth mentioning that as part of the migration I added a few new disks, moved some SATA cables around and put an old SATA cable into use. I currently have no new cables to replace the used ones but plan on buying some in the coming days. The disk was bought new about two and a half years ago, so it is out of warranty by local standards, and a factory RMA is probably not worth it considering the shipping cost from Europe.

At this point I do not really trust the state of the disk, but I am curious about what is going on with it, and any thoughts on the cause and/or solution are welcome. I have provided info from smartctl and hdparm below but, as with the syslog entries, I do not really know how to interpret it for certain. Please let me know if any additional information is needed.

Regards.

# Info

## System

$ uname -a
Linux master 5.15.0-107-generic #117~20.04.1-Ubuntu SMP Tue Apr 30 10:35:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.6 LTS
Release:	20.04
Codename:	focal

## Disk

### hdparm

$ sudo hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
	Model Number:       Samsung SSD 870 EVO 500GB               
	Serial Number:      XXXXXXXXXXXXX     
	Firmware Revision:  XXXXXXXXXXXXX
	Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
	Used: unknown (minor revision code 0x005e) 
	Supported: 11 8 7 6 5 
	Likely used: 11
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:    16514064
	LBA    user addressable sectors:   268435455
	LBA48  user addressable sectors:   976773168
	Logical  Sector size:                   512 bytes
	Physical Sector size:                   512 bytes
	Logical Sector-0 offset:                  0 bytes
	device size with M = 1024*1024:      476940 MBytes
	device size with M = 1000*1000:      500107 MBytes (500 GB)
	cache/buffer size  = unknown
	Form Factor: 2.5 inch
	Nominal Media Rotation Rate: Solid State Device
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 1	Current = 1
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	    	SET_MAX security extension
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	WRITE_{DMA|MULTIPLE}_FUA_EXT
	   *	64-bit World wide name
	    	Write-Read-Verify feature set
	   *	WRITE_UNCORRECTABLE_EXT command
	   *	{READ,WRITE}_DMA_EXT_GPL commands
	   *	Segmented DOWNLOAD_MICROCODE
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Gen3 signaling speed (6.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Phy event counters
	   *	READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
	   *	DMA Setup Auto-Activate optimization
	    	Device-initiated interface power management
	   *	Asynchronous notification (eg. media change)
	   *	Software settings preservation
	    	Device Sleep (DEVSLP)
	    	unknown 78
	   *	SMART Command Transport (SCT) feature set
	   *	SCT Write Same (AC2)
	   *	SCT Error Recovery Control (AC3)
	   *	SCT Features Control (AC4)
	   *	SCT Data Tables (AC5)
	   *	reserved 69
	   *	DOWNLOAD MICROCODE DMA command
	   *	SET MAX SETPASSWORD/UNLOCK DMA commands
	   *	WRITE BUFFER DMA command
	   *	READ BUFFER DMA command
	   *	Data Set Management TRIM supported (limit 8 blocks)
	   *	Deterministic read ZEROs after TRIM
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
		supported: enhanced erase
	4min for SECURITY ERASE UNIT. 8min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5002538f31447d6e
	NAA		: 5
	IEEE OUI	: 002538
	Unique ID	: f31447d6e
Device Sleep:
	DEVSLP Exit Timeout (DETO): 50 ms (drive)
	Minimum DEVSLP Assertion Time (MDAT): 30 ms (drive)
Checksum: correct

### smartctl -i

$ sudo smartctl -i /dev/sdb
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.15.0-107-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 870 EVO 500GB
Serial Number:    XXXXXXXXXXXXX
LU WWN Device Id: XXXXXXXXXXXXX
Firmware Version: XXXXXXXXXXXXX
User Capacity:    500.107.862.016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 15 15:57:00 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

### smartctl -H

$ sudo smartctl -H /dev/sdb
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.15.0-107-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

### smartctl -x

$ sudo smartctl -x /dev/sdb
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.15.0-107-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 870 EVO 500GB
Serial Number:    XXXXXXXXXXXXX
LU WWN Device Id: XXXXXXXXXXXXX
Firmware Version: XXXXXXXXXXXXX
User Capacity:    500.107.862.016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 15 15:59:45 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  85) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   092   092   010    -    38
  9 Power_On_Hours          -O--CK   098   098   000    -    9906
 12 Power_Cycle_Count       -O--CK   098   098   000    -    1110
177 Wear_Leveling_Count     PO--C-   099   099   000    -    21
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   092   092   010    -    38
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0
183 Runtime_Bad_Block       PO--C-   092   092   010    -    38
187 Reported_Uncorrect      -O--CK   099   099   000    -    4623
190 Airflow_Temperature_Cel -O--CK   077   059   000    -    23
195 Hardware_ECC_Recovered  -O-RC-   199   199   000    -    4623
199 UDMA_CRC_Error_Count    -OSRCK   100   100   000    -    0
235 Unknown_Attribute       -O--C-   099   099   000    -    17
241 Total_LBAs_Written      -O--CK   099   099   000    -    14662157986
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1           SL  VS      16  Device vendor specific log
0xa5           SL  VS      16  Device vendor specific log
0xce           SL  VS      16  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 4623 (device log contains only the most recent 4 errors)
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4623  occurred at disk power-on lifetime: 9906 hours (412 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 f0 00 00 20 01 a7 80 40 00  Error: UNC at LBA = 0x2001a780 = 536979328

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 f0 00 00 20 01 a7 80 40 1e     05:33:19.963  READ FPDMA QUEUED
  47 00 00 00 01 00 00 00 00 06 30 40 1d     05:33:19.963  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 00 30 40 1d     05:33:19.963  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 00 00 40 1d     05:33:19.963  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 08 30 40 1d     05:33:19.963  READ LOG DMA EXT

Error 4622  occurred at disk power-on lifetime: 9906 hours (412 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 e8 00 00 20 01 a7 80 40 00  Error: UNC at LBA = 0x2001a780 = 536979328

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 e8 00 00 20 01 a7 80 40 1d     05:33:19.791  READ FPDMA QUEUED
  47 00 00 00 01 00 00 00 00 06 30 40 0e     05:33:19.791  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 00 30 40 0e     05:33:19.791  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 00 00 40 0e     05:33:19.791  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 08 30 40 0e     05:33:19.791  READ LOG DMA EXT

Error 4621  occurred at disk power-on lifetime: 9906 hours (412 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 70 00 00 09 6f 96 48 40 00  Error: UNC at LBA = 0x096f9648 = 158307912

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 70 00 00 09 6f 96 48 40 0e     05:33:19.527  READ FPDMA QUEUED
  47 00 00 00 01 00 00 00 00 06 30 40 0b     05:33:19.527  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 00 30 40 0b     05:33:19.527  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 00 00 40 0b     05:33:19.527  READ LOG DMA EXT
  47 00 00 00 01 00 00 00 00 08 30 40 0b     05:33:19.527  READ LOG DMA EXT

Error 4620  occurred at disk power-on lifetime: 9906 hours (412 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 58 00 00 09 6f 84 c8 40 00  Error: WP at LBA = 0x096f84c8 = 158303432

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 58 00 00 09 6f 84 c8 40 0b     05:33:19.351  WRITE FPDMA QUEUED
  61 00 08 00 50 00 00 09 6f 84 48 40 0a     05:33:19.351  WRITE FPDMA QUEUED
  61 00 08 00 48 00 00 09 6f 7f 08 40 09     05:33:19.351  WRITE FPDMA QUEUED
  61 00 08 00 40 00 00 09 6f 7e c8 40 08     05:33:19.351  WRITE FPDMA QUEUED
  61 00 08 00 38 00 00 09 6f 7e 88 40 07     05:33:19.351  WRITE FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      9906         -
# 2  Extended offline    Completed: read failure       90%      6147         119371776
# 3  Extended offline    Completed: read failure       90%      6132         119371776
# 4  Extended offline    Completed: read failure       90%      6132         119371776
# 5  Extended offline    Completed: read failure       90%      6125         119371776
# 6  Short offline       Completed without error       00%      6125         -
# 7  Short offline       Completed without error       00%      6125         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  256        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        Active (0)
Current Temperature:                    23 Celsius
Power Cycle Min/Max Temperature:     20/36 Celsius
Lifetime    Min/Max Temperature:     13/41 Celsius
Specified Max Operating Temperature:    70 Celsius
Under/Over Temperature Limit Count:   0/0
SMART Status:                        0xc24f (PASSED)

SCT Temperature History Version:     2
Temperature Sampling Period:         10 minutes
Temperature Logging Interval:        10 minutes
Min/Max recommended Temperature:      0/70 Celsius
Min/Max Temperature Limit:            0/70 Celsius
Temperature History Size (Index):    128 (59)

Index    Estimated Time   Temperature Celsius
  60    2024-06-14 18:40    22  ***
  61    2024-06-14 18:50    23  ****
  62    2024-06-14 19:00    24  *****
  63    2024-06-14 19:10    23  ****
 ...    ..(  3 skipped).    ..  ****
  67    2024-06-14 19:50    23  ****
  68    2024-06-14 20:00    24  *****
  69    2024-06-14 20:10    23  ****
  70    2024-06-14 20:20    23  ****
  71    2024-06-14 20:30    24  *****
  72    2024-06-14 20:40    23  ****
  73    2024-06-14 20:50    23  ****
  74    2024-06-14 21:00    24  *****
  75    2024-06-14 21:10    23  ****
  76    2024-06-14 21:20    24  *****
  77    2024-06-14 21:30    24  *****
  78    2024-06-14 21:40    23  ****
  79    2024-06-14 21:50    24  *****
  80    2024-06-14 22:00    23  ****
  81    2024-06-14 22:10    24  *****
  82    2024-06-14 22:20    25  ******
  83    2024-06-14 22:30    24  *****
  84    2024-06-14 22:40    30  ***********
  85    2024-06-14 22:50    25  ******
  86    2024-06-14 23:00    26  *******
  87    2024-06-14 23:10    24  *****
  88    2024-06-14 23:20    24  *****
  89    2024-06-14 23:30    24  *****
  90    2024-06-14 23:40    25  ******
  91    2024-06-14 23:50    24  *****
  92    2024-06-15 00:00    24  *****
  93    2024-06-15 00:10    24  *****
  94    2024-06-15 00:20    25  ******
  95    2024-06-15 00:30    24  *****
 ...    ..( 17 skipped).    ..  *****
 113    2024-06-15 03:30    24  *****
 114    2024-06-15 03:40    23  ****
 115    2024-06-15 03:50    24  *****
 116    2024-06-15 04:00    24  *****
 117    2024-06-15 04:10    23  ****
 118    2024-06-15 04:20    24  *****
 ...    ..(  5 skipped).    ..  *****
 124    2024-06-15 05:20    24  *****
 125    2024-06-15 05:30    26  *******
 126    2024-06-15 05:40    24  *****
 127    2024-06-15 05:50    23  ****
   0    2024-06-15 06:00    25  ******
   1    2024-06-15 06:10    24  *****
   2    2024-06-15 06:20    24  *****
   3    2024-06-15 06:30    25  ******
   4    2024-06-15 06:40    25  ******
   5    2024-06-15 06:50    23  ****
 ...    ..(  2 skipped).    ..  ****
   8    2024-06-15 07:20    23  ****
   9    2024-06-15 07:30    25  ******
  10    2024-06-15 07:40    25  ******
  11    2024-06-15 07:50    23  ****
  12    2024-06-15 08:00    23  ****
  13    2024-06-15 08:10    25  ******
  14    2024-06-15 08:20    33  **************
  15    2024-06-15 08:30    34  ***************
  16    2024-06-15 08:40    24  *****
  17    2024-06-15 08:50    23  ****
  18    2024-06-15 09:00    25  ******
  19    2024-06-15 09:10    23  ****
  20    2024-06-15 09:20    24  *****
  21    2024-06-15 09:30     ?  -
  22    2024-06-15 09:40     ?  -
  23    2024-06-15 09:50    21  **
  24    2024-06-15 10:00    22  ***
  25    2024-06-15 10:10    22  ***
  26    2024-06-15 10:20    23  ****
  27    2024-06-15 10:30    22  ***
 ...    ..(  4 skipped).    ..  ***
  32    2024-06-15 11:20    22  ***
  33    2024-06-15 11:30    23  ****
  34    2024-06-15 11:40    22  ***
 ...    ..(  5 skipped).    ..  ***
  40    2024-06-15 12:40    22  ***
  41    2024-06-15 12:50    28  *********
  42    2024-06-15 13:00    36  *****************
  43    2024-06-15 13:10    32  *************
  44    2024-06-15 13:20    32  *************
  45    2024-06-15 13:30    23  ****
  46    2024-06-15 13:40    32  *************
  47    2024-06-15 13:50    32  *************
  48    2024-06-15 14:00    33  **************
  49    2024-06-15 14:10    32  *************
  50    2024-06-15 14:20    33  **************
  51    2024-06-15 14:30    33  **************
  52    2024-06-15 14:40    23  ****
  53    2024-06-15 14:50    23  ****
  54    2024-06-15 15:00    24  *****
  55    2024-06-15 15:10    23  ****
 ...    ..(  3 skipped).    ..  ****
  59    2024-06-15 15:50    23  ****

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4            1110  ---  Lifetime Power-On Resets
0x01  0x010  4            9906  ---  Power-on Hours
0x01  0x018  6     14662157986  ---  Logical Sectors Written
0x01  0x020  6       162802561  ---  Number of Write Commands
0x01  0x028  6     10406868695  ---  Logical Sectors Read
0x01  0x030  6       263456207  ---  Number of Read Commands
0x01  0x038  6         3403000  ---  Date and Time TimeStamp
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4            4623  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               1  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              23  ---  Current Temperature
0x05  0x020  1              41  ---  Highest Temperature
0x05  0x028  1              13  ---  Lowest Temperature
0x05  0x058  1              70  ---  Specified Maximum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4            2521  ---  Number of Hardware Resets
0x06  0x010  4               0  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  N--  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           14  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           14  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC

ndx (23 rep)

Jun 15, 2024, 02:39 PM • Last activity: Jun 17, 2024, 03:08 PM

1 votes

2 answers

752 views

Disable automatic S.M.A.R.T. tests

debian smartctl smartmontools

                                  I have a small (single-board) Zimaboard server that is running Debian 12 (bookworm) 24/7 in my bedroom. This server has a single HDD hooked-up, which remains unmounted and in sleep mode except for a daily back-up cycle (after which it is unmounted and goes back to sleep).

Unfortunately, every Monday at 00:45 the server decides to wake up the HDD (and me...) to run what sounds like a short S.M.A.R.T. test. I then have to grab my phone and issue a sleep command to stop the HDD from making its typical noises afterwards (humming, occasional clicks, ...). As you can imagine, this is incredibly annoying, so I want to fix it.

I first looked for crontab schedules (executed as user or root), but I didn't see anything relevant. `journalctl --since ... --until ...` didn't report anything useful either. The only uncommented line in `/etc/smartd.conf` says: `DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner`. I don't see anything there that could point to a weekly maintenance schedule.

Is there any way to find out what triggered the S.M.A.R.T. test (or similar)? Where do I look for log entries? And how do I prevent it from executing in the middle of the night?
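
One check on the smartd side can be sketched as follows. smartd schedules self-tests only when a `smartd.conf` line carries a `-s REGEXP` directive, and it turns on the drive's automatic offline testing only with `-o on`. The helper below lists the active lines and flags those two directives. `smartd_schedules` is a hypothetical name, and the match is a rough text search that assumes no trailing comments on directive lines.

```shell
# List active (uncommented, non-empty) smartd.conf lines and flag any that
# schedule self-tests (-s REGEXP) or enable automatic offline testing (-o on).
# Hypothetical helper; a plain text match, not a full smartd.conf parser.
smartd_schedules() {
    conf=${1:-/etc/smartd.conf}
    grep -Ev '^[[:space:]]*(#|$)' "$conf" | while IFS= read -r line; do
        case " $line " in
            *" -s "*|*" -o on "*) echo "schedules tests: $line" ;;
            *)                    echo "no schedule:     $line" ;;
        esac
    done
}

# Follow-up checks to run by hand (/dev/sdX is a placeholder):
#   sudo smartctl -l selftest /dev/sdX   # did a self-test actually get logged?
#   systemctl list-timers --all          # anything firing on Mondays at 00:45?
```

For the `DEVICESCAN` line quoted above, this prints a `no schedule:` row, which is consistent with there being no `-s` directive in it. If the drive's self-test log shows no new entry after a Monday wake-up either, the trigger is more likely something other than a smartd self-test.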

MPA (113 rep)

May 4, 2024, 12:40 PM • Last activity: May 6, 2024, 07:51 PM

4 votes

0 answers

4577 views

What's the difference between SMART "long" and "offline" tests?

hard-disk smartctl

What's the difference between the two below? What exactly is tested by each test?

    smartctl -t offline /dev/sda
    smartctl -t long /dev/sda

According to the [smartctl](https://manpages.ubuntu.com/manpages/noble/en/man8/smartctl.8.html) documentation:

> *offline* - [ATA] runs SMART Immediate Offline Test. This immediately starts the test described above. This command can be given during normal system operation. The effects of this test are visible only in that it updates the SMART Attribute values, and if errors are found they will appear in the SMART error log, visible with the '-l error' option.
>
> *offline* - [SCSI] runs the default self test in foreground. No entry is placed in the self test log.
>
> *long* - [ATA] runs SMART Extended Self Test (tens of minutes to several hours). This is a longer and more thorough version of the Short Self Test described above. Note that this command can be given during normal system operation (unless run in captive mode - see the '-C' option below).
>
> *long* - [SCSI] runs the "Background long" self-test.

However, this description is somewhat vague and I still don't fully understand the actual difference. Also, which of the tests performs a complete surface scan and sector reallocation if needed?

vvv444 (141 rep)

Apr 30, 2024, 12:14 PM • Last activity: May 2, 2024, 04:26 PM

0 votes

1 answers

1528 views

Is this drive dead?: Samsung SSD 970 EVO Plus 1TB

nvme smartctl

Having bought a used PC and now installing smartd on it, I'm getting smartd "Critical Warning (0x04): Reliability" emails about it (full [pastebin](https://pastebin.com/2rc5cvwg)). The `Percentage Used: 112%` is concerning. Is that enough for smartd to declare "Critical Warning (0x04): Reliability"?

This message was generated by the smartd daemon running on:

   host name:  kosh
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, Critical Warning (0x04): Reliability

Device info:
Samsung SSD 970 EVO Plus 1TB, S/N:S4EWNM0R328374F, FW:2B2QEXM7, 1.00 TB



=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- NVM subsystem reliability has been degraded

SMART/Health Information (NVMe Log 0x02)


Percentage Used:                    112%


Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0       4357     0  0x0010  0x4004      -            0     0     -  Invalid Field in Command

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

It looks to me like the "Invalid Field in Command" errors are red herrings since I'm running smartmontools version 7.4 where https://www.smartmontools.org/ticket/1222 has been fixed, so that should not cause tests to fail. I then ran:

$ sudo smartctl -t short /dev/nvme0n1

and now `sudo smartctl --all /dev/nvme0n1` ends with:

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Short             Completed: failed segments             3535            -     1   2   -    -
 1   Short             Completed: failed segments             3535            -     1   2   -    -

But I don't know how to get more information about the "failed segments". Is this enough for me to conclude that the disk is bad and needs replacement, or is there still hope for it?
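
For reference, the `0x04` in the smartd message is a bit field taken from the NVMe SMART/Health log, and each set bit is a separate condition. The sketch below decodes it. `decode_critical_warning` is a hypothetical helper name, and the bit descriptions are paraphrased from the NVMe specification rather than copied from smartctl.

```shell
# Decode the NVMe "Critical Warning" byte into its individual flags.
# Hypothetical helper; bit meanings paraphrased from the NVMe specification.
decode_critical_warning() {
    value=$(( $1 ))   # accepts decimal or hex such as 0x04
    i=0
    for meaning in \
        "available spare below threshold" \
        "temperature outside threshold" \
        "NVM subsystem reliability degraded" \
        "media in read-only mode" \
        "volatile memory backup device failed" \
        "persistent memory region read-only or unreliable"
    do
        # Test bit i of the warning byte.
        if [ $(( (value >> i) & 1 )) -eq 1 ]; then
            echo "bit $i: $meaning"
        fi
        i=$(( i + 1 ))
    done
}

decode_critical_warning 0x04   # prints: bit 2: NVM subsystem reliability degraded
```

Only bit 2 is set here, which matches the "NVM subsystem reliability has been degraded" line in the smartctl output above. `Percentage Used` is a separate field in the same log.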

Peter V. Mørch (665 rep)

Apr 25, 2024, 11:50 AM • Last activity: Apr 25, 2024, 01:38 PM
