Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes

0 answers

36 views

Filesystem becomes read-only at random

Debian crashed on my laptop (Acer Aspire 3, about 4 years old, HDD replaced with an ADATA SU650 240GB SSD) and started printing console errors reading "failed to rotate /var/log/journal: read-only filesystem".

It rebooted fine, but a while later refused to load websites and eventually crashed again. Right now, it's working fine.

After a quick Google search I installed smartctl to figure out the problem. Although it prints an overall "PASSED", some attributes are marked "Pre-fail", and I'm not sure how to interpret the rest of the values.

Here's the output:

    smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-37-amd64] (local build)
    Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Silicon Motion based SSDs
    Device Model:     ADATA SU650
    Serial Number:    2N20292G46UJ
    LU WWN Device Id: 0 000000 000000000
    Firmware Version: XD0R6305
    User Capacity:    240,057,409,536 bytes [240 GB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    Solid State Device
    Form Factor:      2.5 inches
    TRIM Command:     Available, deterministic
    Device is:        In smartctl database 7.3/5319
    ATA Version is:   ACS-3, ATA8-ACS T13/1699-D revision 6
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Sun Jun 29 21:36:52 2025 -03
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)	Offline data collection activity
    					was never started.
    					Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		(    1) seconds.
    Offline data collection
    capabilities: 			 (0x59) SMART execute Offline immediate.
    					No Auto Offline data collection support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					No Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0002)	Does not save SMART data before
    					entering power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   1) minutes.
    Extended self-test routine
    recommended polling time: 	 (   2) minutes.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       929
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1439
    160 Uncorrectable_Error_Cnt 0x0032   100   100   050    Old_age   Always       -       0
    161 Valid_Spare_Block_Cnt   0x0032   100   100   050    Old_age   Always       -       100
    163 Initial_Bad_Block_Count 0x0032   100   100   000    Old_age   Always       -       48
    164 Total_Erase_Count       0x0032   100   100   000    Old_age   Always       -       87382
    165 Max_Erase_Count         0x0032   100   100   000    Old_age   Always       -       156
    166 Min_Erase_Count         0x0032   100   100   000    Old_age   Always       -       44
    167 Average_Erase_Count     0x0032   100   100   000    Old_age   Always       -       109
    148 Total_SLC_Erase_Ct      0x0032   100   100   000    Old_age   Always       -       262148
    149 Max_SLC_Erase_Ct        0x0032   100   100   000    Old_age   Always       -       468
    150 Min_SLC_Erase_Ct        0x0032   100   100   000    Old_age   Always       -       132
    151 Average_SLC_Erase_Ct    0x0032   100   100   000    Old_age   Always       -       329
    159 DRAM_1_Bit_Error_Count  0x0032   100   100   000    Old_age   Always       -       0
    168 Max_Erase_Count_of_Spec 0x0032   100   100   000    Old_age   Always       -       468
    169 Remaining_Lifetime_Perc 0x0032   100   100   000    Old_age   Always       -       98
    177 Wear_Leveling_Count     0x0032   100   100   000    Old_age   Always       -       1823
    181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
    182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       77
    194 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       26
    195 Hardware_ECC_Recovered  0x0032   100   100   000    Old_age   Always       -       403177
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
    232 Available_Reservd_Space 0x0032   100   100   000    Old_age   Always       -       100
    241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       139845
    242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       143114
    245 TLC_Writes_32MiB        0x0032   100   100   000    Old_age   Always       -       296002
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 0
    Note: revision number not 1 implies that no selective self-test has ever been run
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.

I'd greatly appreciate some advice on what these values mean and what can be done about them. As far as I understand, "Old_age" means the device is worn and "Pre-fail" means it's about to give out, but I don't know whether this reflects normal wear or a lack of maintenance, or whether it can be recovered from.
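One point that may help when reading the table above: per the smartctl documentation, the TYPE column is a fixed category for each attribute (`Pre-fail` or `Old_age`), not its current state. An attribute counts as failed when its normalized VALUE is less than or equal to THRESH, and smartctl then flags it in the WHEN_FAILED column. Below is a minimal Python sketch, assuming the ten-column layout shown above, that lists only the attributes at or below their threshold. The names `parse_attributes` and `failing` are made up for illustration, and the sample rows are copied from the output above.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    kind: str          # TYPE column: "Pre-fail" or "Old_age"
    when_failed: str   # "-" when the attribute has never failed
    raw: str


def parse_attributes(text):
    """Parse rows of a `smartctl -A` style attribute table (assumed layout)."""
    attrs = []
    for line in text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(Attr(int(fields[0]), fields[1], int(fields[3]),
                          int(fields[4]), int(fields[5]), fields[6],
                          fields[8], fields[9].strip()))
    return attrs


def failing(attrs):
    """Attributes flagged in WHEN_FAILED or with VALUE <= THRESH."""
    return [a for a in attrs if a.when_failed != "-" or a.value <= a.thresh]


sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
169 Remaining_Lifetime_Perc 0x0032   100   100   000    Old_age   Always       -       98
"""

attrs = parse_attributes(sample)
print(failing(attrs))  # [] for these rows: no attribute is at or below its threshold
```

For the rows shown here the list is empty, which is consistent with the overall "PASSED" result; it does not by itself rule out other causes of the read-only remount.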

Thanks in advance!
                                

geistofsttraft (1 rep)

Jun 30, 2025, 12:45 AM • Last activity: Jun 30, 2025, 12:46 AM

4 votes

1 answers

3211 views

NVMe errors diagnostics

logs disk nvme smart

I would like to understand why I am getting the emails below about the S.M.A.R.T. status of my new NVMe drive.

**DMESG**

$ dmesg --ctime | grep -i nvm

[Mon Aug  8 10:48:31 2022] nvme nvme0: pci function 0000:3d:00.0
[Mon Aug  8 10:48:31 2022] nvme nvme0: missing or invalid SUBNQN field.
[Mon Aug  8 10:48:31 2022] nvme nvme0: Shutdown timeout set to 8 seconds
[Mon Aug  8 10:48:31 2022] nvme nvme0: 8/0/0 default/read/poll queues
[Mon Aug  8 10:48:31 2022]  nvme0n1: p1 p2
[Mon Aug  8 10:48:37 2022] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Mon Aug  8 10:48:37 2022] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.

**NVME ERRORS**

$ sudo nvme error-log /dev/nvme0

...
 Entry   
.................
error_count     : 0
sqid            : 0
cmdid           : 0
status_field    : 0(SUCCESS: The command completed successfully)
phase_tag       : 0
parm_err_loc    : 0
lba             : 0
nsid            : 0
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
...

Could anyone shed some light on why I keep getting new emails like this?

**MAIL**

# mail

Message 44:
From root@dell-laptop-CENSORED  Sun Aug  7 08:13:07 2022
X-Original-To: root
To: root@dell-laptop-CENSORED
Subject: SMART error (ErrorCount) detected on host: dell-inspiron-15
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Date: Sun,  7 Aug 2022 08:12:59 +0200 (CEST)
From: root 

This message was generated by the smartd daemon running on:

   host name:  dell-inspiron-15
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 485 to 486

Device info:
Samsung SSD 970 EVO Plus 2TB, S/N:, FW:2B2QEXM7, 2.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Apr 22 09:53:56 2022 CEST
Another message will be sent in 24 hours if the problem persists.

**SMART**

# smartctl -a /dev/nvme0n1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-43-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            544,784,187,392 [544 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5221904ad7
Local Time is:                      Mon Aug  8 11:13:10 2022 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        0       0
 1 +     5.90W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    5,565,230 [2.84 TB]
Data Units Written:                 2,658,490 [1.36 TB]
Host Read Commands:                 29,877,415
Host Write Commands:                18,211,598
Controller Busy Time:               112
Power Cycles:                       240
Power On Hours:                     215
Unsafe Shutdowns:                   5
Media and Data Integrity Errors:    0
Error Information Log Entries:      502
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               44 Celsius
Temperature Sensor 2:               39 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        502     0  0x1005  0x4004      -            0     0     -

**SYSLOG**

# cat /var/log/syslog | grep -i smart | grep -i nvm

Aug  7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug  7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  7 16:08:27 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  7 16:08:28 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 486 to 487
Aug  7 16:08:28 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 07:17:38 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 487 to 488
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 08:21:16 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 11:14:00 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 488 to 494
Aug  8 11:14:01 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 12:48:40 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 494 to 502
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
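To see how the counter grows over time, the "increased from X to Y" pairs can be pulled out of the syslog. This is a minimal Python sketch; the function name `error_log_increases` is made up for illustration, and the sample lines are copied from the log above.

```python
import re

# Matches the smartd message format seen in the syslog excerpt above.
PATTERN = re.compile(
    r"Device: ([^,\s]+), number of Error Log entries increased from (\d+) to (\d+)"
)


def error_log_increases(lines):
    """Return (device, old_count, new_count) for every matching log line."""
    steps = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            steps.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return steps


sample = [
    "Aug  7 16:08:28 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 486 to 487",
    "Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 487 to 488",
    "Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 488 to 494",
    "Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 494 to 502",
]

steps = error_log_increases(sample)
total = sum(new - old for _, old, new in steps)
print(total)  # 16
```

In the log above, each increase is recorded within a second of smartd opening the device, so it may be worth comparing these timestamps with boots or resumes; that is only a guess from the timestamps, not a diagnosis.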

Vlastimil Burián (30505 rep)

Aug 8, 2022, 09:22 AM • Last activity: Oct 29, 2024, 10:53 AM

8 votes

2 answers

7379 views

SMART health-test and status

hard-disk smartctl smart

I have an external USB drive which gives me the following output when I run the command

    $ smartctl /dev/sdb -H

on it: 

    SMART Status not supported: Incomplete response, ATA output registers missing
    SMART overall-health self-assessment test result: PASSED 
    Warning: This result is based on an Attribute check. 

Could you explain whether this is something to worry about or just a wrong setting? More generally, what does the health status mean, in simple terms?

Maybe as a relevant aside: The short and long tests finish without issues.

user3058865 (183 rep)

Aug 6, 2017, 02:34 PM • Last activity: Apr 2, 2024, 07:41 PM

24 votes

5 answers

25365 views

Linux - Repairing bad blocks on a RAID1 array with GPT

software-raid badblocks smart

The tl;dr: how would I go about fixing a bad block on 1 disk in a RAID1 array?

But please read this whole thing for what I've tried already and possible errors in my methods. I've tried to be as detailed as possible, and I'm really hoping for some feedback.

This is my situation: I have two 2TB disks (same model) set up in a RAID1 array managed by mdadm. About 6 months ago I noticed the first bad block when SMART reported it. Today I noticed more, and am now trying to fix it.

This HOWTO page seems to be the one article everyone links to for fixing bad blocks that SMART reports. It's a great page, full of info; however, it is fairly outdated and doesn't address my particular setup. Here is how my config is different:

* Instead of one disk, I'm using two disks in a RAID1 array. One disk is reporting errors while the other is fine. The HOWTO is written with only one disk in mind, which brings up various questions such as 'do I use this command on the disk device or the RAID device?'
* I'm using GPT, which fdisk does not support. I've been using gdisk instead, and I'm hoping that it is giving me the same info that I need.

So, let's get down to it. This is what I have done; however, it doesn't seem to be working. Please feel free to double-check my calculations and method for errors. The disk reporting errors is /dev/sda:

    # smartctl -l selftest /dev/sda
    smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.4.4-2-ARCH] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net 
    
    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed: read failure       90%     12169         3212761936

With this, we gather that the error resides on LBA 3212761936. Following the HOWTO, I use gdisk to find the start sector to be used later in determining the block number (as I cannot use fdisk since it does not support GPT):

    # gdisk -l /dev/sda
    GPT fdisk (gdisk) version 0.8.5
    
    Partition table scan:
      MBR: protective
      BSD: not present
      APM: not present
      GPT: present
    
    Found valid GPT with protective MBR; using GPT.
    Disk /dev/sda: 3907029168 sectors, 1.8 TiB
    Logical sector size: 512 bytes
    Disk identifier (GUID): CFB87C67-1993-4517-8301-76E16BBEA901
    Partition table holds up to 128 entries
    First usable sector is 34, last usable sector is 3907029134
    Partitions will be aligned on 2048-sector boundaries
    Total free space is 2014 sectors (1007.0 KiB)
    
    Number  Start (sector)    End (sector)  Size       Code  Name
       1            2048      3907029134   1.8 TiB     FD00  Linux RAID

Using tune2fs I find the block size to be 4096. Using this info and the calculation from the HOWTO, I conclude that the block in question is ((3212761936 - 2048) * 512) / 4096 = 401594986.
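The arithmetic above can be written as a small Python sketch. It assumes the filesystem starts exactly at the first sector of the partition; any extra offset introduced by a layer in between (such as md metadata) is not accounted for. The function name `fs_block` is made up for illustration.

```python
def fs_block(lba, part_start, sector_size=512, block_size=4096):
    """Convert a disk LBA to a filesystem block number.

    Assumes the filesystem begins at the partition's first sector.
    """
    return (lba - part_start) * sector_size // block_size


# LBA from the SMART self-test log, partition start from gdisk.
print(fs_block(3212761936, 2048))  # 401594986
```

With 512-byte sectors and 4096-byte blocks, eight consecutive LBAs map to the same filesystem block, so the integer division discards the position within the block.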

The HOWTO then directs me to debugfs to see if the block is in use. I use the RAID device, as debugfs needs an EXT filesystem; this was one of the commands that confused me, as I did not know at first whether I should use /dev/sda or /dev/md0:

    # debugfs
    debugfs 1.42.4 (12-June-2012)
    debugfs:  open /dev/md0
    debugfs:  testb 401594986
    Block 401594986 not in use

So block 401594986 is empty space, I should be able to write over it without problems. Before writing to it, though, I try to make sure that it, indeed, cannot be read:

    # dd if=/dev/sda1 of=/dev/null bs=4096 count=1 seek=401594986
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB) copied, 0.000198887 s, 20.6 MB/s

If the block could not be read, I wouldn't expect this to work. However, it does. I repeat using /dev/sda, /dev/sda1, /dev/sdb, /dev/sdb1, /dev/md0, and ±5 to the block number to search around the bad block. It all works. I shrug my shoulders and go ahead and commit the write and sync (I use /dev/md0 because I figured modifying one disk and not the other might cause issues; this way both disks overwrite the bad block):

    # dd if=/dev/zero of=/dev/md0 bs=4096 count=1 seek=401594986
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB) copied, 0.000142366 s, 28.8 MB/s
    # sync 

I would expect that writing to the bad block would have the disks reassign the block to a good one, however running another SMART test shows differently:

    # 1  Short offline       Completed: read failure       90%     12170         3212761936

Back to square 1. So basically, how would I fix a bad block on 1 disk in a RAID1 array? I'm sure I've not done something correctly...

Thanks for your time and patience.


----------


EDIT 1:
-------

I've tried to run a long SMART test, and the same LBA is returned as bad (the only difference is that it reports 30% remaining rather than 90%):

    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure       30%     12180         3212761936
    # 2  Short offline       Completed: read failure       90%     12170         3212761936

I've also run badblocks, with the following output. The output is strange and seems to be misformatted; I tried testing the reported numbers as blocks, but debugfs gives an error:

    # badblocks -sv /dev/sda
    Checking blocks 0 to 1953514583
    Checking for bad blocks (read-only test): 1606380968ne, 3:57:08 elapsed. (0/0/0 errors)
    1606380969ne, 3:57:39 elapsed. (1/0/0 errors)
    1606380970ne, 3:58:11 elapsed. (2/0/0 errors)
    1606380971ne, 3:58:43 elapsed. (3/0/0 errors)
    done
    Pass completed, 4 bad blocks found. (4/0/0 errors)
    # debugfs
    debugfs 1.42.4 (12-June-2012)
    debugfs:  open /dev/md0
    debugfs:  testb 1606380968
    Illegal block number passed to ext2fs_test_block_bitmap #1606380968 for block bitmap for /dev/md0
    Block 1606380968 not in use

Not sure where to go from here. badblocks definitely found something, but I'm not sure what to do with the information presented...
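A possible reading of these numbers: badblocks was run on the whole disk (/dev/sda) and, unless `-b` is given, counts in 1024-byte blocks, so its block numbers are neither LBAs nor filesystem blocks on /dev/md0. The Python sketch below converts them to 512-byte LBAs under that assumption; the function name `badblocks_to_lbas` is made up for illustration.

```python
def badblocks_to_lbas(block, bb_block_size=1024, sector_size=512):
    """Return the disk LBAs covered by one badblocks block number.

    Assumes badblocks ran on the whole disk with its default 1024-byte
    block size and that the disk uses 512-byte logical sectors.
    """
    per_block = bb_block_size // sector_size
    first = block * per_block
    return list(range(first, first + per_block))


reported = [1606380968, 1606380969, 1606380970, 1606380971]
lbas = [lba for block in reported for lba in badblocks_to_lbas(block)]
print(lbas[0], lbas[-1])  # 3212761936 3212761943
```

Under these assumptions the four reported blocks cover LBAs 3212761936 through 3212761943, which starts at the LBA from the SMART self-test log and spans exactly one 4096-byte region.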


----------

EDIT 2
------

More commands and info.

I feel like an idiot for forgetting to include this originally. These are the SMART values for /dev/sda. I have 1 Current_Pending_Sector and 0 Offline_Uncorrectable.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       166
      2 Throughput_Performance  0x0026   055   055   000    Old_age   Always       -       18345
      3 Spin_Up_Time            0x0023   084   068   025    Pre-fail  Always       -       5078
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       75
      5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
      8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12224
     10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       75
    181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       1646911
    191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       12
    192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
    194 Temperature_Celsius     0x0002   064   059   000    Old_age   Always       -       36 (Min/Max 22/41)
    195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
    196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
    198 Offline_Uncorrectable   0x0030   252   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       30
    223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
    225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       77

    # mdadm -D /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Thu May  5 06:30:21 2011
         Raid Level : raid1
         Array Size : 1953512383 (1863.01 GiB 2000.40 GB)
      Used Dev Size : 1953512383 (1863.01 GiB 2000.40 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent
    
        Update Time : Tue Jul  3 22:15:51 2012
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0
    
               Name : server:0  (local to host server)
               UUID : e7ebaefd:e05c9d6e:3b558391:9b131afb
             Events : 67889
    
        Number   Major   Minor   RaidDevice State
           2       8        1        0      active sync   /dev/sda1
           1       8       17        1      active sync   /dev/sdb1

As per one of the answers: it would seem I did mix up seek and skip for dd. I was using seek because that's what is used in the HOWTO. Using this command causes dd to hang:

    # dd if=/dev/sda1 of=/dev/null bs=4096 count=1 skip=401594986

Using blocks around that one (..84, ..85, ..87, ..88) seems to work just fine, and using /dev/sdb1 with block 401594986 reads just fine as well (as expected as that disk passed SMART testing). Now, the question that I have is: When writing over this area to reassign the blocks, do I use /dev/sda1 or /dev/md0? I don't want to cause any issues with the RAID array by writing directly to one disk and not having the other disk update.
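On the seek/skip mix-up mentioned above: in dd, `skip=N` skips N blocks at the start of the input, while `seek=N` skips N blocks at the start of the output, so a read test needs `skip`. The Python sketch below shows the read that `skip` performs, demonstrated on a temporary file rather than a real device. The function name `read_block` and the demo file are made up for illustration.

```python
import os
import tempfile


def read_block(path, block, bs=4096):
    """Rough equivalent of: dd if=path bs=bs count=1 skip=block

    Note: Python's file.seek() here positions the *input*, which is
    what dd calls skip; dd's own seek= positions the output instead.
    """
    with open(path, "rb") as f:
        f.seek(block * bs)
        return f.read(bs)


# Demo file: four 4096-byte blocks, each filled with its own index.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(4):
        tmp.write(bytes([i]) * 4096)
    demo_path = tmp.name

data = read_block(demo_path, 2)
print(len(data), data[0])  # 4096 2
os.unlink(demo_path)
```

With `seek=` on a read test, dd still reads the first input block and only moves the write position in /dev/null, which would explain why the earlier read attempts succeeded.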

EDIT 3
------
Writing to the block directly produced filesystem errors. I've chosen an answer that solved the problem quickly:


    # 1  Short offline       Completed without error       00%     14211         -
    # 2  Extended offline    Completed: read failure       30%     12244         3212761936

Thanks to everyone who helped. =)





                                

blitzmann (405 rep)

Jul 3, 2012, 10:24 PM • Last activity: Feb 8, 2024, 02:00 PM

2 votes

0 answers

490 views

Are SMART offline data collection and offline attributes obsolete?

smartctl smart smartmontools

**TLDR;** I tried to understand the difference between SMART `Offline` and `Always` attributes, and thus to understand what SMART offline data collection is, and if I should enable it on my HDDs.

`smartmontools`' official wiki states that:

> Note that the SMART automatic offline test command is listed as Obsolete in every version of the ATA and ATA/ATAPI Specifications. (...) However it is implemented and used by many vendors.

After some extensive reading on the web, and also some tests, the conclusion I reached is:

- Nowadays SMART offline data collection is obsolete
- All data is updated in real time (e.g. `Offline` and `Always` attributes behave the same way)
- There is no need to enable "Auto Offline Data Collection" (`# smartctl --offlineauto=on /dev/sda`), nor to ever launch it manually (`# smartctl -t offline /dev/sda`).
- As for the reason all this offline stuff is still in `smartmontools`, it's probably to keep it compatible with some very old HDDs that indeed implemented real offline attributes.

Am I right? Or do I miss something?

----------

**MORE DETAILS**

I did some tests on a HDD, which has 3 offline attributes (and has Auto Offline Data Collection disabled):

# smartctl -a /dev/sda
(...)
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
(...)
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       235 (114 97 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13381561756
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       20472945077
(...)

I then wrote some data on that drive, and noted that all 3 attributes were updated in real time. Thus, they are in fact online (or Always) attributes, and not Offline ones. I did the same test on a few other HDDs, the behavior was identical.
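The before/after comparison described above can be sketched in Python as follows. It assumes the ten-column attribute layout of `smartctl -A`; the function names are made up for illustration, the "before" rows are copied from the output above, and the "after" raw value is a made-up number used only to exercise the code.

```python
def offline_raw_values(text):
    """Map attribute name -> raw value for rows whose UPDATED column is 'Offline'."""
    values = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit() and fields[7] == "Offline":
            values[fields[1]] = fields[9].strip()
    return values


def changed_offline_attributes(before, after):
    """Names of 'Offline' attributes whose raw value differs between snapshots."""
    old, new = offline_raw_values(before), offline_raw_values(after)
    return sorted(name for name in old if name in new and old[name] != new[name])


before = """\
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13381561756
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       20472945077
"""
# Hypothetical second snapshot: the written counter is an invented value.
after = """\
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13381569999
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       20472945077
"""

print(changed_offline_attributes(before, after))  # ['Total_LBAs_Written']
```

If attributes marked `Offline` show up in this list without an offline collection having been run in between, that matches the behavior described above.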

ChennyStar (1969 rep)

Nov 6, 2023, 04:55 PM • Last activity: Nov 6, 2023, 05:01 PM

2 votes

1 answers

3395 views

How can I view the SMART logs for an NVMe disk in Linux when smartctl is showing there are errors?

linux debian smartctl nvme smart

My daily driver (Debian Bookworm RC3 + KDE Plasma) is configured to send me emails containing error notifications. Today, I received the following email:

This message was generated by the smartd daemon running on:

   host name:  desk
   DNS domain: local.lan

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 1754 to 1758

Device info:
KBG30ZMV256G TOSHIBA, S/N:X8OPD1PGP12P, FW:ADHA0101

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Wed May 17 16:09:04 2023 EDT
Another message will be sent in 24 hours if the problem persists.

This is what sudo journalctl -t smart shows:

May 20 15:19:47 desk smartd: smartd 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-9-amd64] (local build)
May 20 15:19:47 desk smartd: Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
May 20 15:19:47 desk smartd: Opened configuration file /etc/smartd.conf
May 20 15:19:47 desk smartd: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
May 20 15:19:47 desk smartd: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
May 20 15:19:47 desk smartd: Device: /dev/sda, type changed from 'scsi' to 'sat'
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], opened
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], CT4000MX500SSD1, S/N:2304E6A3D318, WWN:5-00a075-1e6a3d318, FW:M3CR045, 4.00 TB
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], not found in smartd database 7.3/5319.
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.CT4000MX500SSD1-2304E6A3D318.ata.state
May 20 15:19:47 desk smartd: Device: /dev/nvme0, opened
May 20 15:19:47 desk smartd: Device: /dev/nvme0, KBG30ZMV256G TOSHIBA, S/N:X8OPD1PGP12P, FW:ADHA0101
May 20 15:19:47 desk smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
May 20 15:19:47 desk smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.KBG30ZMV256G_TOSHIBA-X8OPD1PGP12P.nvme.state
May 20 15:19:47 desk smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
May 20 15:19:48 desk smartd: Device: /dev/nvme0, number of Error Log entries increased from 1754 to 1758
May 20 15:19:48 desk smartd: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
May 20 15:19:48 desk smartd: Warning via /usr/share/smartmontools/smartd-runner to root: successful
May 20 15:19:48 desk smartd: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.CT4000MX500SSD1-2304E6A3D318.ata.state
May 20 15:19:48 desk smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.KBG30ZMV256G_TOSHIBA-X8OPD1PGP12P.nvme.state
May 20 15:49:48 desk smartd: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 74
May 20 22:49:48 desk smartd: Device: /dev/nvme0, number of Error Log entries increased from 1758 to 1760
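
To see how fast the counter grows, the "increased from X to Y" messages can be pulled out of the journal text. The following is a minimal Python sketch that assumes the message wording shown in the log above; `error_log_increases` is a hypothetical helper name, and the sample lines are copied from that log.

```python
import re

# Matches smartd messages of the form shown in the journal output above.
PATTERN = re.compile(
    r"Device: (?P<device>[^,]+), number of Error Log entries increased "
    r"from (?P<old>\d+) to (?P<new>\d+)"
)


def error_log_increases(lines):
    """Return (device, old_count, new_count) for every matching log line."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append((match["device"], int(match["old"]), int(match["new"])))
    return found


sample = [
    "May 20 15:19:48 desk smartd: Device: /dev/nvme0, number of Error Log entries increased from 1754 to 1758",
    "May 20 22:49:48 desk smartd: Device: /dev/nvme0, number of Error Log entries increased from 1758 to 1760",
]

for device, old, new in error_log_increases(sample):
    print(f"{device}: +{new - old} entries ({old} -> {new})")
```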

When I run sudo smartctl -i -a /dev/nvme0, it shows me the error count, but I can't figure out how to see the log messages associated with the increased count:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-9-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KBG30ZMV256G TOSHIBA
Serial Number:                      X8OPD1PGP12P
Firmware Version:                   ADHA0101
PCI Vendor/Subsystem ID:            0x1179
IEEE OUI Identifier:                0x00080d
Controller ID:                      0
NVMe Version:                       1.2.1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            00080d 04004ad9aa
Local Time is:                      Sat May 20 23:09:32 2023 EDT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0017):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.30W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.30W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    4  4  4  4     8000   32000
 4 -   0.0050W       -        -    4  4  4  4     8000   40000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -    4096       0         0
 1 +     512       0         3

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    30%
Data Units Read:                    23,188,612 [11.8 TB]
Data Units Written:                 39,727,036 [20.3 TB]
Host Read Commands:                 222,771,983
Host Write Commands:                498,052,687
Controller Busy Time:               7,440
Power Cycles:                       291
Power On Hours:                     20,378
Unsafe Shutdowns:                   615
Media and Data Integrity Errors:    0
Error Information Log Entries:      1,760
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               32 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       1760     0  0x501a  0xc005  0x028            -     1     -
  1       1759     0  0xb012  0xc005  0x028            -     1     -
  2       1758     0  0x5010  0xc005  0x028            -     0     -

How can I figure out what the errors are?
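
One way to get more out of the table is to decode the Status and PELoc columns by hand. The sketch below is a hedged Python illustration, not smartctl's own logic. It assumes that the Status column is the raw 16-bit Status Field of the NVMe error log entry (phase tag in bit 0, status code in bits 8:1, status code type in bits 11:9, More in bit 14, Do Not Retry in bit 15) and that PELoc is the Parameter Error Location (byte offset in bits 7:0, bit offset in bits 10:8). The function names and the three-entry lookup table are made up for this example.

```python
# A few generic command status codes (status code type 0) from the NVMe spec.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}


def decode_status(raw: int) -> dict:
    """Split a raw 16-bit NVMe Status Field into its sub-fields (assumed layout)."""
    return {
        "phase_tag": raw & 0x1,
        "status_code": (raw >> 1) & 0xFF,
        "status_code_type": (raw >> 9) & 0x7,
        "more": bool((raw >> 14) & 0x1),
        "do_not_retry": bool((raw >> 15) & 0x1),
    }


def decode_pe_location(raw: int) -> dict:
    """Split a Parameter Error Location into byte, bit and command dword index."""
    byte = raw & 0xFF
    return {"byte": byte, "bit": (raw >> 8) & 0x7, "command_dword": byte // 4}


status = decode_status(0xC005)   # Status column from the table above
location = decode_pe_location(0x028)  # PELoc column from the table above
label = "unknown"
if status["status_code_type"] == 0:
    label = GENERIC_STATUS.get(status["status_code"], "unknown")
print(status, label)
print(location)
```

Under those assumptions, `0xc005` would decode to a generic "Invalid Field in Command" status with Do Not Retry set, and `0x028` would point at byte 40 of the submitted command, i.e. command dword 10.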

IMTheNachoMan (433 rep)

May 21, 2023, 03:12 AM • Last activity: Sep 28, 2023, 08:38 AM

1 votes

1 answers

232 views

SMART error (CurrentPendingSector) and (OfflineUncorrectableSector)

hard-disk smartctl smart smartmontools

I have been receiving the following error messages every day for several months now, and I do not know how to stop receiving these messages.

CurrentPendingSector

This message was generated by the smartd daemon running on:

   host name:  myhost
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 6 Currently unreadable (pending) sectors

Device info:
KingFast, S/N:03112222C0002, FW:U0803A0, 256 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Feb  3 19:41:29 2023 PST
Another message will be sent in 24 hours if the problem persists.

OfflineUncorrectableSector

This message was generated by the smartd daemon running on:

   host name:  myhost
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 3 Offline uncorrectable sectors

Device info:
KingFast, S/N:03112222C0002, FW:U0803A0, 256 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Feb  3 19:41:29 2023 PST
Another message will be sent in 24 hours if the problem persists.

smartctl -a /dev/sda

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KingFast
Serial Number:    03112222C0002
Firmware Version: U0803A0
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jul  8 15:44:59 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       6
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       3335
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       440
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       3
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       86
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       26
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       79004
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       481
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       6
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       114
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5050
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       98
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       6
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       88
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       35
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       86
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       168900
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       815543
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       191939

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3329         -
# 2  Short offline       Completed without error       00%      3325         -
# 3  Short offline       Completed without error       00%      3321         -
# 4  Short offline       Completed without error       00%      3313         -
# 5  Short offline       Completed without error       00%      3309         -
# 6  Short offline       Completed without error       00%      3306         -
# 7  Extended offline    Completed without error       00%      3250         -
# 8  Extended offline    Completed without error       00%      3232         -
# 9  Extended offline    Completed without error       00%      3229         -
#10  Extended offline    Completed without error       00%       976         -
#11  Extended offline    Completed without error       00%       968         -

Selective Self-tests/Logging not supported

I have tried to ignore the 197 and 198 errors in /etc/smartd.conf with

/dev/sda -d removable -n standby -H -l error -l selftest -f -t -I 197 -I 198 -s (S/../.././(01|09|17)|L/../../3/11) -m root -M exec /usr/share/smartmontools/smartd-runner

to no avail. I also do not see any LBA_of_first_error in the self-test section.

To me, it appears that `SMART overall-health self-assessment test result: PASSED ` is healthy and the self-tests return no errors. My current understanding is that the disk appears to be healthy but is still sending these messages erroneously. Is there something that I'm missing?

The /dev/sda drive is a KingFast 256 GB SSD, and I'm not sure if this would be relevant as I could not find anything online for this particular drive or manufacturer.

How would I be able to stop receiving these messages but still have SMART monitoring for other genuine issues on the drive, and how would I fix the issue if this error message really does indicate some problem with the drive?

Thanks!

Edit: After running smartctl -t long /dev/sda, I have

smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KingFast
Serial Number:    03112222C0002
Firmware Version: U0803A0
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jul  9 10:05:33 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x03)	Offline data collection activity
					is in progress.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 241)	Self-test routine in progress...
					10% of test remaining.
Total time to complete Offline 
data collection: 		(  600) seconds.
Offline data collection
capabilities: 			 (0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       6
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       3341
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       441
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       3
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       86
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       26
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       79553
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       482
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       6
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       115
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5050
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       98
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       6
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       88
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       46
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       86
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       170468
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       815560
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       193199

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3337         -
# 2  Short offline       Completed without error       00%      3329         -
# 3  Short offline       Completed without error       00%      3325         -
# 4  Short offline       Completed without error       00%      3321         -
# 5  Short offline       Completed without error       00%      3313         -
# 6  Short offline       Completed without error       00%      3309         -
# 7  Short offline       Completed without error       00%      3306         -
# 8  Extended offline    Completed without error       00%      3250         -
# 9  Extended offline    Completed without error       00%      3232         -
#10  Extended offline    Completed without error       00%      3229         -
#11  Extended offline    Completed without error       00%       976         -
#12  Extended offline    Completed without error       00%       968         -

Selective Self-tests/Logging not supported

The #12 Extended offline test completed without error, so I'm not really sure what I'm supposed to do from here.

**Edit #2:** I have also run the following, which I believe indicates that there are no errors with the drive:

badblocks -sv /dev/sda
Checking blocks 0 to 250059095
Checking for bad blocks (read-only test): done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

dd if=/dev/sda of=/dev/null bs=64K conv=noerror
3907173+1 records in
3907173+1 records out
256060514304 bytes (256 GB, 238 GiB) copied, 485.648 s, 527 MB/s
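
As a sanity check that both commands really covered the whole device, the figures they printed can be compared with the User Capacity reported by smartctl. This is a small Python sketch using only numbers quoted above; the 1024-byte badblocks block size is an assumption (it is the tool's default, and no -b option was given).

```python
DEVICE_BYTES = 256_060_514_304      # "User Capacity" from smartctl

# badblocks: "Checking blocks 0 to 250059095"
BADBLOCKS_BLOCK_SIZE = 1024         # assumed default, no -b option was used
badblocks_last_block = 250_059_095
badblocks_bytes = (badblocks_last_block + 1) * BADBLOCKS_BLOCK_SIZE

# dd: "3907173+1 records in" with bs=64K, 485.648 s elapsed
DD_BLOCK_SIZE = 64 * 1024
dd_full_records = 3_907_173
dd_total_bytes = 256_060_514_304
dd_seconds = 485.648
dd_partial_bytes = dd_total_bytes - dd_full_records * DD_BLOCK_SIZE

print("badblocks covered whole device:", badblocks_bytes == DEVICE_BYTES)
print("dd covered whole device:", dd_total_bytes == DEVICE_BYTES)
print("size of the final partial dd record:", dd_partial_bytes, "bytes")
print("dd throughput: %.0f MB/s" % (dd_total_bytes / dd_seconds / 1e6))
```

Both reads span the full capacity, so any sector that was still unreadable at the time should have surfaced as an I/O error in one of them.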

jameszp (93 rep)

Jul 9, 2023, 03:04 AM • Last activity: Jul 18, 2023, 06:01 PM

7 votes

5 answers

8769 views

S.M.A.R.T. shows high Load_Cycle_Count | Why, and how to prevent the number from increasing?

hard-disk smartctl smart

I just realized that **some of my HDDs have a huge Load_Cycle_Count** when reading out their S.M.A.R.T. data. Some are close to failing, and I am asking myself why this is the case and whether there is anything I can do to prevent them from dying.

alex@ga-P55A-UD5:~$ sudo smartctl -a /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD10EARS-00Y5B1
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   090   090   000    Old_age   Always       -       10281
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       28456
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       611460

alex@ga-P55A-UD5:~$ sudo smartctl -a /dev/sdc
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD6400AADS-00M2B0
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   093   093   000    Old_age   Always       -       7615
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       31743
193 Load_Cycle_Count        0x0032   053   053   000    Old_age   Always       -       442121

alex@silent-ssd:~$ sudo smartctl -a /dev/sdd
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD20EARX-00PASB0
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2477
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11176
193 Load_Cycle_Count        0x0032   181   181   000    Old_age   Always       -       59646
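
To compare the three drives on an equal footing, the raw Load_Cycle_Count can be divided by Power_On_Hours. The following is a minimal Python sketch using only the raw values quoted above; the cycles-per-hour figure is an illustrative metric computed here, not something smartctl reports, and the dictionary keys are simply the model strings from the output.

```python
# (Load_Cycle_Count raw value, Power_On_Hours raw value) per drive.
drives = {
    "WDC WD10EARS-00Y5B1": (611_460, 28_456),
    "WDC WD6400AADS-00M2B0": (442_121, 31_743),
    "WDC WD20EARX-00PASB0": (59_646, 11_176),
}


def cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average number of head load/unload cycles per powered-on hour."""
    return load_cycles / power_on_hours


rates = {model: cycles_per_hour(*values) for model, values in drives.items()}

for model, rate in rates.items():
    print(f"{model}: {rate:.1f} load cycles per power-on hour")
```

A rate of several cycles per hour over the whole life of a drive means the heads are being parked again after only a few minutes of idle time on average.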

AlexOnLinux (725 rep)

Mar 4, 2019, 10:42 AM • Last activity: Jun 19, 2023, 11:27 AM

1 votes

1 answers

188 views

Does my disk support SMART?

raid smartctl smart

I'm confused about this smartctl output. It says SMART status is not supported, but then it says it PASSED.

# sudo smartctl -H -d megaraid,24 /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.59.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

# echo $?
4

According to the man page, status code 4 means prefail Attribute is less than the danger threshold.

EXIT STATUS
...
...
    Bit 4: We found prefail Attributes <= threshold.

So I'm confused, is SMART data available on this disk or not?
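
One detail worth checking when reading the exit status: it is a bitmask, so the shell value 4 equals 2^2 and means that bit 2 is set, whereas bit 4 would show up as the value 16. A minimal Python sketch of the decoding follows, with the bit descriptions paraphrased from the EXIT STATUS section of the smartctl man page; `decode_exit_status` is a name made up for this example.

```python
# Bit number -> meaning, paraphrased from the smartctl man page (EXIT STATUS).
EXIT_BITS = {
    0: "Command line did not parse",
    1: "Device open failed, no IDENTIFY DEVICE data, or device in low-power mode",
    2: "A SMART or other ATA command failed, or a SMART data checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "Prefail Attributes <= threshold found",
    5: "DISK OK, but some Attributes were <= threshold in the past",
    6: "Device error log contains records of errors",
    7: "Device self-test log contains records of errors",
}


def decode_exit_status(status: int) -> list[tuple[int, str]]:
    """Return (bit, description) for every bit set in a smartctl exit status."""
    return [(bit, text) for bit, text in EXIT_BITS.items() if status & (1 << bit)]


for bit, text in decode_exit_status(4):
    print(f"bit {bit}: {text}")
```

Read this way, an exit status of 4 is consistent with the "SMART Status not supported" line in the output: one command failed, and the PASSED verdict was then derived from the attribute check instead.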

Timothy Pulliam (3953 rep)

May 9, 2023, 04:36 PM • Last activity: May 9, 2023, 10:05 PM

0 votes

0 answers

269 views

Which hard drive failed SMART ? Synology NAS

hard-disk nas synology smart

I had 2 x 4TB Red Pro drives in a Synology NAS. One of them was reported as failing (a couple of years ago) by Synology. I believe it was a SMART failure. So I pulled out both drives, bought a new one, and was able to copy all data from the drive that was failing (not failed yet). Both 4 TB drives seem to be working fine when mounted on a Linux machine. I can copy data to and from the one reported as "failing" by Synology. The thing is, I'm not 100% sure which drive was reported as "failing", as it was a couple of years ago.

Is there a command/test I can run to figure out which drive is failing, so I can exclude it (or discard it) from my primary NAS?

I tried smartctl -a /dev/ for my drives; the self-assessment result says "Passed" and I see no errors there.

Update: After running the test recommended by @meuh - "smartctl -t short " I get the following error:

    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed: read failure       90%     21687         8359136
                                

Curious101 (311 rep)

Apr 16, 2023, 08:31 PM • Last activity: Apr 19, 2023, 04:13 AM

0 votes

2 answers

797 views

SSD's SMART Status not supported behind DELL PERC H730 Mini controller

centos temperature sata smart

I would like to output the temperature of each of the drives (NVMe, SATA, SAS) in my Dell R630, but I cannot get the temperature of my SATA **Samsung SSD 870 EVO 250GB** (/dev/sdc), which sits behind a DELL PERC H730 Mini controller:

hddtemp command shows:

    /dev/sda: SAMSUNG AREA7680S5xnNTRI: 37°C
    /dev/sdb: SAMSUNG AREA7680S5xnNTRI: 36°C
    /dev/sdc: DELL PERC H730 Mini: S.M.A.R.T. not available

When I tried to use smartctl, it shows:

    Smartctl open device: /dev/sdc failed: DELL or MegaRaid controller, please try adding '-d megaraid,N'

I then used smartctl -a -d megaraid,0 /dev/sdc

It does show my device name correctly:

    === START OF INFORMATION SECTION ===
    Device Model:     Samsung SSD 870 EVO 250GB
and

    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

but SMART status shows:

    === START OF READ SMART DATA SECTION ===
    SMART Status not supported: ATA return descriptor not supported by controller firmware


May I know how to find out the temperature of the SSD behind DELL PERC H730 Mini controller?
                                

JCTL (3 rep)

Mar 30, 2023, 07:26 AM • Last activity: Mar 30, 2023, 12:28 PM

1 votes

2 answers

3301 views

Cannot get smartctl working

smartctl smart

On my Debian wheezy server I use a **software RAID 1** with two hard disks, /dev/sda3 and /dev/sdb3, combined into /dev/md2:

    mdadm --detail /dev/md2
    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3

The raid seems to be fine, but on one of the disks SMART is not running: 

    smartctl --all /dev/sda

says:

    SMART support is: Available - device has SMART capability.
    SMART support is: Disabled

While /dev/sdb gives a lot of SMART information.

I tried to start it with

    smartctl -s on /dev/sda -T verypermissive not working

But it doesn't start:

    Error SMART Enable failed: scsi error aborted command
    Smartctl: SMART Enable Failed.

How can I get it running? Or does it mean the disk has a problem?

rubo77 (30435 rep)

Feb 8, 2015, 10:59 PM • Last activity: Nov 1, 2022, 02:41 PM

2 votes

4 answers

866 views

smartmontools: Should I replace my SSHD?

hard-disk smart

Today, when I was watching a video in Firefox, the following window suddenly popped up: [![enter image description here][1]][1] [1]: https://i.sstatic.net/X1UA6.jpg Or the output from GSmartControl: smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.0-22-amd64] (local build) Copyright (C) 2002-19, Bruc...

Today, when I was watching a video in Firefox, the following window suddenly popped up:

Or the output from GSmartControl:

    smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.0-22-amd64] (local build)
    Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Seagate Laptop SSHD
    Device Model:     ST500LM000-1EJ162-SSHD
    Serial Number:    W3715AR9
    LU WWN Device Id: 5 000c50 06e236b9f
    Firmware Version: HPD3
    User Capacity:    500,107,862,016 bytes [500 GB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5400 rpm
    Form Factor:      2.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Sun Oct 23 14:41:09 2022 CEST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    AAM feature is:   Unavailable
    APM level is:     254 (maximum performance)
    Rd look-ahead is: Enabled
    Write cache is:   Enabled
    DSN feature is:   Unavailable
    ATA Security is:  Disabled, frozen [SEC2]
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x82)	Offline data collection activity
    					was completed without error.
    					Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0)	The previous self-test routine completed
    					without error or no self-test has ever 
    					been run.
    Total time to complete Offline 
    data collection: 		(  634) seconds.
    Offline data collection
    capabilities: 			 (0x5b) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					No Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine 
    recommended polling time: 	 (   2) minutes.
    Extended self-test routine
    recommended polling time: 	 (  99) minutes.
    SCT capabilities: 	       (0x1081)	SCT Status supported.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     POSR-K   118   099   006    -    195697992
      3 Spin_Up_Time            PO---K   099   099   000    -    0
      4 Start_Stop_Count        -O--CK   093   093   020    -    7676
      5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
      7 Seek_Error_Rate         POSR-K   082   060   030    -    4473742513
      9 Power_On_Hours          -O--CK   087   087   000    -    11853
     10 Spin_Retry_Count        PO--CK   100   100   097    -    0
     12 Power_Cycle_Count       -O--CK   093   093   020    -    7668
    180 Unknown_HDD_Attribute   -O-R-K   100   100   000    -    64025461
    183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
    184 End-to-End_Error        PO--CK   100   100   097    -    0
    187 Reported_Uncorrect      -O--CK   100   100   000    -    0
    188 Command_Timeout         -O--CK   100   099   000    -    2
    189 High_Fly_Writes         -O-RCK   063   063   000    -    37
    190 Airflow_Temperature_Cel -O---K   069   055   045    -    31 (Min/Max 28/32)
    191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
    192 Power-Off_Retract_Count -O--CK   100   100   000    -    228
    193 Load_Cycle_Count        -O--CK   097   097   000    -    7777
    194 Temperature_Celsius     -O---K   031   045   000    -    31 (0 14 0 0 0)
    196 Reallocated_Event_Count -O--CK   100   100   000    -    0
    197 Current_Pending_Sector  -O--CK   100   100   000    -    16
    198 Offline_Uncorrectable   ----CK   100   100   000    -    16
    199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
    254 Free_Fall_Sensor        -O--CK   100   100   000    -    0
                                ||||||_ K auto-keep
                                |||||__ C event count
                                ||||___ R error rate
                                |||____ S speed/performance
                                ||_____ O updated online
                                |______ P prefailure warning
    
    General Purpose Log Directory Version 1
    SMART           Log Directory Version 1 [multi-sector log support]
    Address    Access  R/W   Size  Description
    0x00       GPL,SL  R/O      1  Log Directory
    0x01           SL  R/O      1  Summary SMART error log
    0x02           SL  R/O      5  Comprehensive SMART error log
    0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
    0x06           SL  R/O      1  SMART self-test log
    0x07       GPL     R/O      1  Extended self-test log
    0x09           SL  R/W      1  Selective self-test log
    0x10       GPL     R/O      1  NCQ Command Error log
    0x11       GPL     R/O      1  SATA Phy Event Counters log
    0x21       GPL     R/O      1  Write stream error log
    0x22       GPL     R/O      1  Read stream error log
    0x24       GPL     R/O   1223  Current Device Internal Status Data log
    0x25       GPL     R/O   1223  Saved Device Internal Status Data log
    0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
    0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
    0xa1       GPL,SL  VS      20  Device vendor specific log
    0xa2       GPL     VS    3900  Device vendor specific log
    0xa8       GPL,SL  VS     129  Device vendor specific log
    0xa9       GPL,SL  VS       1  Device vendor specific log
    0xab       GPL     VS       1  Device vendor specific log
    0xae       GPL     VS       1  Device vendor specific log
    0xb0       GPL     VS    4580  Device vendor specific log
    0xb6       GPL     VS    1918  Device vendor specific log
    0xbe-0xbf  GPL     VS   65535  Device vendor specific log
    0xc1       GPL,SL  VS      10  Device vendor specific log
    0xc2       GPL,SL  VS      50  Device vendor specific log
    0xc4       GPL,SL  VS       5  Device vendor specific log
    0xe0       GPL,SL  R/W      1  SCT Command/Status
    0xe1       GPL,SL  R/W      1  SCT Data Transfer
    
    SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
    Device Error Count: 1
    	CR     = Command Register
    	FEATR  = Features Register
    	COUNT  = Count (was: Sector Count) Register
    	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    	LH     = LBA High (was: Cylinder High) Register    ]   LBA
    	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    	LL     = LBA Low (was: Sector Number) Register     ]
    	DV     = Device (was: Device/Head) Register
    	DC     = Device Control Register
    	ER     = Error register
    	ST     = Status register
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.
    
    Error 1  occurred at disk power-on lifetime: 8134 hours (338 days + 22 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER -- ST COUNT  LBA_48  LH LM LL DV DC
      -- -- -- == -- == == == -- -- -- -- --
      40 -- 51 00 00 00 00 00 a0 3a 40 00 00  Error: UNC at LBA = 0x00a03a40 = 10500672
    
      Commands leading to the command that caused the error were:
      CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
      -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
      25 00 00 00 2a 00 00 00 a0 3a 40 e0 00     01:31:49.827  READ DMA EXT
      25 00 00 00 35 00 00 00 a0 42 0b e0 00     01:31:49.348  READ DMA EXT
      25 00 00 00 0b 00 00 00 a0 42 00 e0 00     01:31:49.345  READ DMA EXT
      25 00 00 00 15 00 00 03 93 ac 6b e0 00     01:31:49.342  READ DMA EXT
      25 00 00 00 2b 00 00 03 93 ac 40 e0 00     01:31:49.339  READ DMA EXT
    
    SMART Extended Self-test Log Version: 1 (1 sectors)
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error       00%     11852         -
    # 2  Short offline       Completed without error       00%     11847         -
    # 3  Short offline       Completed without error       00%     11844         -
    # 4  Short offline       Completed without error       00%     11835         -
    # 5  Short offline       Completed without error       00%     11830         -
    # 6  Short offline       Completed without error       00%     11823         -
    # 7  Short offline       Completed without error       00%     11818         -
    # 8  Short offline       Completed without error       00%     11814         -
    # 9  Short offline       Completed without error       00%     11806         -
    #10  Short offline       Completed without error       00%     11801         -
    #11  Short offline       Completed without error       00%     11792         -
    #12  Short offline       Completed without error       00%     11790         -
    #13  Short offline       Completed without error       00%     11780         -
    #14  Short offline       Completed without error       00%     11772         -
    #15  Short offline       Completed without error       00%     11765         -
    #16  Short offline       Completed without error       00%     11756         -
    #17  Short offline       Completed without error       00%     11751         -
    #18  Short offline       Completed without error       00%     11747         -
    #19  Short offline       Completed without error       00%     11740         -
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    SCT Status Version:                  3
    SCT Version (vendor specific):       522 (0x020a)
    Device State:                        Active (0)
    Current Temperature:                    31 Celsius
    Power Cycle Min/Max Temperature:     25/32 Celsius
    Lifetime    Min/Max Temperature:     16/44 Celsius
    Under/Over Temperature Limit Count:   0/2
    
    SCT Data Table command not supported
    
    SCT Error Recovery Control command not supported
    
    Device Statistics (GP/SMART Log 0x04) not supported
    
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
    0x0001  2            0  Command failed due to ICRC error
    0x0003  2            0  R_ERR response for device-to-host data FIS
    0x0004  2            0  R_ERR response for host-to-device data FIS
    0x0006  2            0  R_ERR response for device-to-host non-data FIS
    0x0007  2            0  R_ERR response for host-to-device non-data FIS

Also today, Linux failed to boot at first. I restarted the machine and it then booted without problems. This happened before the warning popped up, so I have no idea whether the boot issue is related to the smartmontools warning.

**Confusing:**
In the report there is a line "Error 1  occurred at disk power-on lifetime: 8134 hours (338 days + 22 hours)". 
But there is no date. I expected a date for when this error occurred, so that I could tell whether the error happened today.
As I did not find a date anywhere in the output, I looked for the current lifetime of my SSHD, because the error is said to have occurred at 8134 h. I expected to find the number of hours my SSHD has run up to now somewhere in the output, but I did not find that either.
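For reference, the report lists attribute 9 (`Power_On_Hours`) with a raw value of 11853, which, if read as the drive's current lifetime, can be compared with the 8134 hours at which Error 1 was logged. A minimal sketch of that arithmetic (the function name is hypothetical, and the result counts powered-on hours, not calendar time):

```shell
# Power-on hours elapsed between a logged error and now.
# $1: current Power_On_Hours raw value, $2: power-on lifetime at the error.
hours_since_error() {
    echo $(( $1 - $2 ))
}

# Values taken from the report above: attribute 9 reads 11853,
# and Error 1 occurred at 8134 hours.
hours_since_error 11853 8134
```

Under that reading, the logged error is several thousand powered-on hours old and not from today.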

Which host's syslog is meant?
Maybe this one:
/var/log/syslog ?

If yes: Here it is: 
https://workupload.com/file/NVD2gpdrvHp 

But my main question is: Is there a high risk that my SSHD will die soon?

It is said that the hard disk health status has changed. But where can I find the current health status now?

Thank you. 
                                

Wogehu (123 rep)

Oct 23, 2022, 01:51 PM • Last activity: Oct 25, 2022, 07:09 AM

2 votes

2 answers

999 views

smartd output to screen, not email

notifications smart smartmontools

I'm trying to get smartd working; it is set up to send messages by email via 'mail', which I've never used. But I recall that a few years ago I had smartd sending its warnings directly to the screen via a popup text box. I'm trying to figure out how to do that again. The info for the...

                                  I'm trying to get smartd working; it is set up to send messages by email via 'mail', which I've never used.  But I recall that a few years ago I had smartd sending its warnings directly to the screen via a popup text box.  I'm trying to figure out how to do that again.  The info for the 'screen' command baffles me.  tmux likewise.  Or I suppose it could be a notification.  When I have a few weeks to study it, I'll get 'mail' working but for now I'd prefer a popup message anyway.

==================================================

In 'smartd.conf':

    DEVICESCAN -a -m  -M exec notify -M test

... Ok, added full path, much better:

    DEVICESCAN -a -m  -M exec /bin/notify -M test

... 'notify' runs fine from CLI, is an executable script:

    /bin/notify-send "$(systemctl status smartd)"

... but although:

     systemctl restart smartd; systemctl status smartd

... reports no errors, I get no 'test' result.

BTW, so far no results at all using those variables you mentioned.

... 

$ smartd ... shows two notifications, one for each of my two disks!  So why does 'systemctl restart smartd' show nothing?
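One possible explanation, offered as an assumption rather than a diagnosis: a service started by systemd does not inherit the desktop session's `DISPLAY` or D-Bus address, so `notify-send` has nowhere to show the popup. The sketch below is a hypothetical `-M exec` wrapper that builds its text from the `SMARTD_FAILTYPE`, `SMARTD_DEVICE` and `SMARTD_MESSAGE` variables documented in smartd.conf(5) and then runs `notify-send` as a desktop user. The user name `ray`, display `:0`, the `/run/user/<uid>/bus` path and the `NOTIFY_USER` override are all assumptions to adapt.

```shell
#!/bin/sh
# Hypothetical wrapper for smartd's "-M exec". smartd exports SMARTD_*
# variables before calling the script (see smartd.conf(5)).

# Build the notification body from smartd's environment.
build_message() {
    printf '%s on %s: %s' \
        "${SMARTD_FAILTYPE:-unknown}" \
        "${SMARTD_DEVICE:-unknown device}" \
        "${SMARTD_MESSAGE:-no message}"
}

# Show the popup in a desktop session. User name, display and bus path
# are assumptions; adjust them to the logged-in user.
send_notification() {
    target_user="${NOTIFY_USER:-ray}"
    target_uid="$(id -u "$target_user")" || return 1
    sudo -u "$target_user" \
        env DISPLAY=:0 \
            DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${target_uid}/bus" \
        notify-send "smartd" "$(build_message)"
}

# Only send something when smartd actually invoked the script.
if [ -n "${SMARTD_MESSAGE:-}" ]; then
    send_notification
fi
```

Running `smartd` by hand from a terminal inside the desktop session would already have those variables set, which may be why the notifications appear in that case.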

Ray Andrews (2615 rep)

Oct 17, 2022, 01:56 PM • Last activity: Oct 17, 2022, 10:08 PM

1 votes

2 answers

1479 views

Why smartd need database?

smartctl smart smartmontools

Does smartd need the database, or does smartctl need the database? I saw that the smartmontools GitHub repository keeps updating the database: https://github.com/smartmontools/smartmontools/labels/drivedb In my understanding, smartd will scan all disks, so why does it need a database? What is the function/purpose of a database i...

                                  Does smartd need the database?

or

Does smartctl need the database?

I saw that the smartmontools GitHub repository keeps updating the database:

https://github.com/smartmontools/smartmontools/labels/drivedb 

In my understanding, smartd will scan all disks, so why does it need a database? What is the function/purpose of a database in smartd/smartctl?

Mark K (955 rep)

Aug 12, 2022, 03:32 AM • Last activity: Aug 12, 2022, 11:52 AM

0 votes

1 answers

99 views

Prometheus DiskTooManyReallocatedSectors

linux hard-disk node.js smart prometheus-exporter

I have Prometheus Alert Manager running on several Linux machines. *(https://prometheus.io/docs/alerting/latest/alertmanager/)* One of them is reporting *2 reallocated sectors*. I took the alert rule from here: *https://awesome-prometheus-alerts.grep.to/rules.html* **1. What is my course of action?*...

                                  I have Prometheus Alert Manager running on several Linux machines. *(https://prometheus.io/docs/alerting/latest/alertmanager/)* 

One of them is reporting *2 reallocated sectors*. I took the alert rule from here:
*https://awesome-prometheus-alerts.grep.to/rules.html* 

**1. What is my course of action?** Replace with an SSD?

**2. What is the priority ...weeks, months?**
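To cross-check the alert on the affected machine, the raw value of attribute 5 can be read directly from `smartctl -A`. This is only a sketch: it assumes smartmontools is installed, that the disk is `/dev/sda`, and that the attribute table uses the common ten-column layout with `RAW_VALUE` last. The function name is hypothetical.

```shell
# Print the raw value (10th column) of SMART attribute 5 from
# `smartctl -A` output read on stdin.
reallocated_sectors() {
    awk '$1 == 5 { print $10 }'
}

# Usage (device name is an assumption):
#   sudo smartctl -A /dev/sda | reallocated_sectors
```

Running it every few days and comparing the numbers shows whether the count is stable or growing, which helps when deciding how urgent a replacement is.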

DavidDunham (117 rep)

Jul 2, 2022, 03:08 AM • Last activity: Jul 3, 2022, 07:51 AM

1 votes

3 answers

998 views

Deciphering smartctl results

usb hard-disk smartctl sata smart

I'm trying to use an internal 2.5" HDD to serve as my external storage media and for backups. This HDD had been previously on a Windows machine for a long time before some of my files became corrupted, and so I simply changed it for an SSD. CrystalDiskInfo reports the drive's health as 'good', howev...

smartctl 7.3 2022-02-28 r5338 [x86_64-w64-mingw32-w10-21H2] (sf-7.3-1)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Mobile HDD
Device Model:     ST1000LM035-1RK172
Firmware Version: SDM1
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x71) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 169) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   072   051   006    Pre-fail  Always       -       15037148
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   065   065   020    Old_age   Always       -       36007
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   045    Pre-fail  Always       -       22014115380
  9 Power_On_Hours          0x0032   100   086   000    Old_age   Always       -       55 (236 20 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   092   020    Old_age   Always       -       56
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       3990
188 Command_Timeout         0x0032   100   081   000    Old_age   Always       -       30067195949
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   049   040    Old_age   Always       -       32 (Min/Max 29/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       197
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0032   064   064   000    Old_age   Always       -       73637
194 Temperature_Celsius     0x0022   032   051   000    Old_age   Always       -       32 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   051   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   051   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       12591 (154 139 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       98336960706
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       114926523300
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 3475 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3475 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 88 7d f4 09  Error: UNC at LBA = 0x09f47d88 = 167017864

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 88 7d f4 49 00      00:38:49.208  READ FPDMA QUEUED
  60 00 08 80 7d f4 49 00      00:38:49.195  READ FPDMA QUEUED
  60 00 08 b0 a8 21 49 00      00:38:49.182  READ FPDMA QUEUED
  60 00 08 a8 a8 21 49 00      00:38:49.181  READ FPDMA QUEUED
  60 00 08 a0 a8 21 49 00      00:38:49.181  READ FPDMA QUEUED

Error 3474 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 80 7d f4 09  Error: UNC at LBA = 0x09f47d80 = 167017856

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 80 7d f4 49 00      00:38:46.657  READ FPDMA QUEUED
  60 00 08 78 7d f4 49 00      00:38:46.625  READ FPDMA QUEUED
  ef 10 03 00 00 00 a0 00      00:38:46.615  SET FEATURES [Enable SATA feature]
  ef 10 02 00 00 00 a0 00      00:38:46.605  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:38:46.578  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 3473 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 78 7d f4 09  Error: UNC at LBA = 0x09f47d78 = 167017848

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 78 7d f4 49 00      00:38:44.127  READ FPDMA QUEUED
  60 00 08 70 7d f4 49 00      00:38:44.095  READ FPDMA QUEUED
  ef 10 03 00 00 00 a0 00      00:38:44.085  SET FEATURES [Enable SATA feature]
  ef 10 02 00 00 00 a0 00      00:38:44.076  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:38:44.049  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 3472 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 70 7d f4 09  Error: UNC at LBA = 0x09f47d70 = 167017840

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 70 7d f4 49 00      00:38:41.598  READ FPDMA QUEUED
  60 00 08 68 7d f4 49 00      00:38:41.566  READ FPDMA QUEUED
  ef 10 03 00 00 00 a0 00      00:38:41.556  SET FEATURES [Enable SATA feature]
  ef 10 02 00 00 00 a0 00      00:38:41.547  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:38:41.520  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 3471 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 68 7d f4 09  Error: UNC at LBA = 0x09f47d68 = 167017832

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 68 7d f4 49 00      00:38:39.013  READ FPDMA QUEUED
  60 00 08 60 7d f4 49 00      00:38:38.987  READ FPDMA QUEUED
  ef 10 03 00 00 00 a0 00      00:38:38.977  SET FEATURES [Enable SATA feature]
  ef 10 02 00 00 00 a0 00      00:38:38.968  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:38:38.941  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%        22         282912
# 2  Extended offline    Completed: read failure       90%        20         282912
# 3  Extended offline    Completed: read failure       90%        18         282912
# 4  Conveyance offline  Completed: read failure       90%        18         282912
# 5  Short offline       Completed: read failure       80%        18         282912
# 6  Extended offline    Completed: read failure       90%        17         282912
# 7  Extended offline    Completed: read failure       90%         2         64400520
# 8  Extended offline    Completed: read failure       90%         2         64400520
# 9  Extended offline    Completed: read failure       90%         2         64400520
#10  Extended offline    Completed without error       00%      9245         -
#11  Short offline       Completed without error       00%      4741         -
#12  Short offline       Completed without error       00%      4667         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

This is a bit confusing to me. The drive is attached to my computer via a USB adaptor which has no brand name on it, but smartctl reveals it to be "USB JMicron". Is this "USB JMicron" adaptor causing the "UltraDMA CRC Errors" or other problems? As I'm not sure what caused the data corruption back then, I am wondering if this drive is actually safe and reliable based on the info from smartctl. Any help with deciphering the diagnostic info would be much appreciated.
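As a reading aid for the self-test log above, the sketch below lists the distinct `LBA_of_first_error` values among the entries that ended in a read failure. It assumes the log layout shown in this report (entries starting with `#` and the LBA in the last column); the function name is hypothetical.

```shell
# Read a smartctl self-test log on stdin and print each distinct
# LBA_of_first_error reported by a "read failure" entry.
failing_lbas() {
    awk '/^# *[0-9]+ / && /read failure/ { print $NF }' | sort -n -u
}

# Usage (device name is an assumption):
#   smartctl -l selftest /dev/sdX | failing_lbas
```

For the log in this report that yields two LBAs, 282912 and 64400520, so the failed tests stop at sectors on the disk itself.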

Sepp A (73 rep)

Jul 1, 2022, 04:24 PM • Last activity: Jul 1, 2022, 06:36 PM

4 votes

1 answers

3336 views

Linux on Marvell 88SE9230. How to get stats?

linux smartctl smart

I use a Marvell 88SE9230 controller on my home Linux server. HP does have a utility to set up RAID and get some stats. But I'm wondering how to get any status from a Linux system. Quick googling shows only Linux drivers for accessing the array itself on previous versions of the kernel, but I want to know the SMART s...

                                  I use a Marvell 88SE9230 controller on my home Linux server. HP does have a utility to set up RAID and get some stats. But I'm wondering how to get any status from a Linux system. Quick googling shows only Linux drivers for accessing the array itself on previous versions of the kernel, but I want to know the SMART status of the drives.
 
Smartctl doesn't work:

    root@iris:~# smartctl -a -d marvell -T verypermissive /dev/sda
    smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-96-generic] (local build)
    Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
    
    Read Device Identity failed: Unknown error
    
    === START OF INFORMATION SECTION ===
    Device Model:     [No Information Found]
    Serial Number:    [No Information Found]
    Firmware Version: [No Information Found]
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   [No Information Found]
    Local Time is:    Thu Jan 27 19:11:54 2022 MSK
    SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
    SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
                      Checking to be sure by trying SMART RETURN STATUS command.
    SMART support is: Unknown - Try option -s with argument 'on' to enable it.
    Read SMART Data failed: Success
    
    === START OF READ SMART DATA SECTION ===
    SMART Status command failed: Success
    SMART overall-health self-assessment test result: UNKNOWN!
    SMART Status, Attributes and Thresholds cannot be read.
    
    Read SMART Error Log failed: Success
    
    Read SMART Self-test Log failed: Success
    
    Selective Self-tests/Logging not supported

How can I get at least some stats from the controller? 
                                

Hills of Eternity (88 rep)

Jan 27, 2022, 04:14 PM • Last activity: Apr 3, 2022, 05:47 PM

0 votes

1 answers

2520 views

How to disable smartd

osx smartctl smart

So I have a `Mac` on which I installed `smartmontools` to see my `smart data`. And I thought that `smartd` would be helpful for running short tests on my `Mac SSD`. But I found out via `Google` that `smartd` only runs tests at `03:00am`, and there is no way my Mac will be powered on at that time. I understand th...

                                  So I have a Mac on which I installed smartmontools to see my SMART data. 

And I thought that smartd would be helpful for running short tests on my Mac SSD.

But I found out via Google that smartd only runs tests at 03:00 am, and there is no way my Mac will be powered on at that time. 

I understand that smartd is meant for servers which run 24/7, so smartd is of no use to me. 

So I would like to disable it and write my own simple bash script which runs short tests on my Mac SSD. 

So is there any way I can disable smartd or remove it without affecting smartctl?
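A minimal sketch of the kind of script meant here, assuming `smartctl` is on the `PATH`, that the internal SSD is `disk0` (to be checked with `diskutil list`), and that the drive and the installed smartmontools version support self-tests. The function name and the `SMARTCTL` override are hypothetical; the override only exists to allow a dry run.

```shell
#!/bin/sh
# Start a short self-test on one device.
# SMARTCTL is a hypothetical override, handy for a dry run (SMARTCTL=echo).
run_short_test() {
    device="${1:-disk0}"    # assumption: the internal SSD is disk0
    "${SMARTCTL:-smartctl}" -t short "$device"
}

# Usage:
#   run_short_test disk0
#   smartctl -l selftest disk0    # read the result once the test has finished
```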

Jddhhdhdi283838 (1 rep)

Dec 25, 2017, 12:56 PM • Last activity: Feb 13, 2022, 09:02 PM

3 votes

0 answers

1190 views

Linux-image generates samsung nvme errors

kernel ssd nvme smart

After upgrading from kernel **5.10** to **5.14** and now to **5.15.3**, the error count in SMART increases after every boot on a Samsung 970 Evo NVMe disk (and most likely on other Samsung NVMe disks), like this one: `Error Information Log Entries: 41` <- increased after every boot `nvme error-log /dev/nvme0...

                                  After upgrading from kernel **5.10** to **5.14** and now to **5.15.3**, the error count in SMART increases after every boot on a Samsung 970 Evo NVMe disk (and most likely on other Samsung NVMe disks), like this one:

Error Information Log Entries: 41   <- increased after every boot

nvme error-log /dev/nvme0:

    status_field    : 0x4004(INVALID_FIELD: A reserved coded value or an unsupported value in a defined field)

Does someone know if this is a bug? Or maybe the kernel tries to talk to 
the NVMe disk in a way the disk does not support? Or maybe Samsung disks 
need to be excluded from this kind of communication?

Is there a kernel boot parameter that solves this behavior?
It is worth mentioning that this issue is not present in the 5.10 kernel. 

---
There is a bug report filed on 27 Sep 2021:  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995161  but it seems nothing has changed since then.

EdiD (342 rep)

Nov 26, 2021, 11:42 AM • Last activity: Dec 1, 2021, 12:30 PM
