
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
0 answers
36 views
Filesystem becomes read-only at random
Debian crashed on Laptop (Acer Aspire 3, about 4 years old, HDD replaced with ADATA SU650 240GB SSD) and started throwing console errors reading "failed to rotate /var/log/journal: read-only filesystem". It rebooted fine, but a while later refused to load websites and eventually crashed again. Right now, it's working fine.

After a quick Google search I installed smartctl to figure out the problem, and though it prints an overall "PASSED", it does have some attributes output "Pre-failed" and I'm not exactly sure how to interpret the rest of the values. Here's the output:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-37-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Silicon Motion based SSDs
Device Model:     ADATA SU650
Serial Number:    2N20292G46UJ
LU WWN Device Id: 0 000000 000000000
Firmware Version: XD0R6305
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-3, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jun 29 21:36:52 2025 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    1) seconds.
Offline data collection
capabilities:                    (0x59) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       929
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1439
160 Uncorrectable_Error_Cnt 0x0032   100   100   050    Old_age   Always       -       0
161 Valid_Spare_Block_Cnt   0x0032   100   100   050    Old_age   Always       -       100
163 Initial_Bad_Block_Count 0x0032   100   100   000    Old_age   Always       -       48
164 Total_Erase_Count       0x0032   100   100   000    Old_age   Always       -       87382
165 Max_Erase_Count         0x0032   100   100   000    Old_age   Always       -       156
166 Min_Erase_Count         0x0032   100   100   000    Old_age   Always       -       44
167 Average_Erase_Count     0x0032   100   100   000    Old_age   Always       -       109
148 Total_SLC_Erase_Ct      0x0032   100   100   000    Old_age   Always       -       262148
149 Max_SLC_Erase_Ct        0x0032   100   100   000    Old_age   Always       -       468
150 Min_SLC_Erase_Ct        0x0032   100   100   000    Old_age   Always       -       132
151 Average_SLC_Erase_Ct    0x0032   100   100   000    Old_age   Always       -       329
159 DRAM_1_Bit_Error_Count  0x0032   100   100   000    Old_age   Always       -       0
168 Max_Erase_Count_of_Spec 0x0032   100   100   000    Old_age   Always       -       468
169 Remaining_Lifetime_Perc 0x0032   100   100   000    Old_age   Always       -       98
177 Wear_Leveling_Count     0x0032   100   100   000    Old_age   Always       -       1823
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       77
194 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       26
195 Hardware_ECC_Recovered  0x0032   100   100   000    Old_age   Always       -       403177
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   000    Old_age   Always       -       100
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       139845
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       143114
245 TLC_Writes_32MiB        0x0032   100   100   000    Old_age   Always       -       296002

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I'd greatly appreciate some advice on what these values mean and what can be done about them. I know that "Old_age" means the device is worn and "Pre-fail" means it's about to give, but I don't really know if this reflects normal wear, lack of maintenance, or is recoverable from. Thanks in advance!
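A small sketch may help with reading the attribute table. In smartctl's output the TYPE column (Pre-fail or Old_age) names the category an attribute belongs to; it is not a status. An attribute is reported as failing when its normalized VALUE has dropped to or below THRESH, and WHEN_FAILED would then show something other than "-". The function name `check_smart_attrs` is hypothetical, the sample rows are copied from the output above, and treating a threshold of 0 as "cannot fail" is a simplifying assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A`-style attribute rows on stdin and
# reports, per attribute, whether the normalized VALUE is at or below THRESH.
check_smart_attrs() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0
            thresh = $6 + 0
            # Assumption of this sketch: a threshold of 0 is never reached.
            status = (thresh > 0 && value <= thresh) ? "FAILING" : "ok"
            printf "%-3s %-26s %-9s value=%d thresh=%d %s\n", $1, $2, $7, value, thresh, status
        }
    '
}

# Sample rows copied from the smartctl output in the question.
check_smart_attrs <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
169 Remaining_Lifetime_Perc 0x0032   100   100   000    Old_age   Always       -       98
EOF
```

On a live system the same function could be fed from `smartctl -A /dev/sdX` (device name to be adjusted). For the rows shown, every VALUE is well above its THRESH, which matches the overall "PASSED" result.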
geistofsttraft (1 rep)
Jun 30, 2025, 12:45 AM • Last activity: Jun 30, 2025, 12:46 AM
4 votes
1 answers
3211 views
NVMe errors diagnostics
I would like to understand why I get the below mails about S.M.A.R.T. of my new NVMe drive.

**DMESG**
$ dmesg --ctime | grep -i nvm

[Mon Aug  8 10:48:31 2022] nvme nvme0: pci function 0000:3d:00.0
[Mon Aug  8 10:48:31 2022] nvme nvme0: missing or invalid SUBNQN field.
[Mon Aug  8 10:48:31 2022] nvme nvme0: Shutdown timeout set to 8 seconds
[Mon Aug  8 10:48:31 2022] nvme nvme0: 8/0/0 default/read/poll queues
[Mon Aug  8 10:48:31 2022]  nvme0n1: p1 p2
[Mon Aug  8 10:48:37 2022] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Mon Aug  8 10:48:37 2022] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
**NVME ERRORS**
$ sudo nvme error-log /dev/nvme0

...
 Entry   
.................
error_count     : 0
sqid            : 0
cmdid           : 0
status_field    : 0(SUCCESS: The command completed successfully)
phase_tag       : 0
parm_err_loc    : 0
lba             : 0
nsid            : 0
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
...
Could anyone shed some light on why I am getting new mails like this:

**MAIL**
# mail

Message 44:
From root@dell-laptop-CENSORED  Sun Aug  7 08:13:07 2022
X-Original-To: root
To: root@dell-laptop-CENSORED
Subject: SMART error (ErrorCount) detected on host: dell-inspiron-15
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Date: Sun,  7 Aug 2022 08:12:59 +0200 (CEST)
From: root 


This message was generated by the smartd daemon running on:

   host name:  dell-inspiron-15
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 485 to 486

Device info:
Samsung SSD 970 EVO Plus 2TB, S/N:, FW:2B2QEXM7, 2.00 TB
                                    
For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Apr 22 09:53:56 2022 CEST
Another message will be sent in 24 hours if the problem persists.
**SMART**
# smartctl -a /dev/nvme0n1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-43-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            544,784,187,392 [544 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5221904ad7
Local Time is:                      Mon Aug  8 11:13:10 2022 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        0       0
 1 +     5.90W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    5,565,230 [2.84 TB]
Data Units Written:                 2,658,490 [1.36 TB]
Host Read Commands:                 29,877,415
Host Write Commands:                18,211,598
Controller Busy Time:               112
Power Cycles:                       240
Power On Hours:                     215
Unsafe Shutdowns:                   5
Media and Data Integrity Errors:    0
Error Information Log Entries:      502
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               44 Celsius
Temperature Sensor 2:               39 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        502     0  0x1005  0x4004      -            0     0     -
**SYSLOG**
# cat /var/log/syslog | grep -i smart | grep -i nvm

Aug  7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug  7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  7 16:08:27 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  7 16:08:28 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 486 to 487
Aug  7 16:08:28 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 07:17:38 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 487 to 488
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 08:21:16 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 11:14:00 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 488 to 494
Aug  8 11:14:01 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 12:48:40 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 494 to 502
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
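To see how fast the counter is growing, the "increased from X to Y" lines can be summed up. The following is only a sketch: the function name `sum_error_log_growth` is hypothetical, and the sample input is the four matching smartd lines copied from the syslog excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: sums how much the NVMe error-log counter grew, based on
# smartd lines of the form "... increased from X to Y" read from stdin.
sum_error_log_growth() {
    sed -n 's/.*Error Log entries increased from \([0-9][0-9]*\) to \([0-9][0-9]*\).*/\1 \2/p' |
        awk '{ total += $2 - $1 } END { print total + 0 }'
}

# Sample lines copied from the syslog excerpt in the question.
sum_error_log_growth <<'EOF'
Aug  7 16:08:28 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 486 to 487
Aug  8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 487 to 488
Aug  8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 488 to 494
Aug  8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 494 to 502
EOF
# prints 16
```

In the excerpt, each increase is logged within a second of smartd opening the device, while "Media and Data Integrity Errors" in the SMART section stays at 0.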
Vlastimil Burián (30505 rep)
Aug 8, 2022, 09:22 AM • Last activity: Oct 29, 2024, 10:53 AM
8 votes
2 answers
7379 views
SMART health-test and status
I have an external USB-drive which is giving me the following output on running the command $ smartctl /dev/sdb -H on it:

SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

Could you elaborate if this is something to worry about or if it is just a wrong setting? Generally, what is the meaning of the health status in simplified form?

Maybe as a relevant aside: The short and long tests finish without issues.
user3058865 (183 rep)
Aug 6, 2017, 02:34 PM • Last activity: Apr 2, 2024, 07:41 PM
24 votes
5 answers
25365 views
Linux - Repairing bad blocks on a RAID1 array with GPT
The tl;dr: how would I go about fixing a bad block on 1 disk in a RAID1 array?

But please read this whole thing for what I've tried already and possible errors in my methods. I've tried to be as detailed as possible, and I'm really hoping for some feedback.

This is my situation: I have two 2TB disks (same model) set up in a RAID1 array managed by mdadm. About 6 months ago I noticed the first bad block when SMART reported it. Today I noticed more, and am now trying to fix it.

This HOWTO page seems to be the one article everyone links to to fix bad blocks that SMART is reporting. It's a great page, full of info, however it is fairly outdated and doesn't address my particular setup. Here is how my config is different:

* Instead of one disk, I'm using two disks in a RAID1 array. One disk is reporting errors while the other is fine. The HOWTO is written with only one disk in mind, which brings up various questions such as 'do I use this command on the disk device or the RAID device'?
* I'm using GPT, which fdisk does not support. I've been using gdisk instead, and I'm hoping that it is giving me the same info that I need.

So, let's get down to it. This is what I have done, however it doesn't seem to be working. Please feel free to double check my calculations and method for errors. The disk reporting errors is /dev/sda:

# smartctl -l selftest /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.4.4-2-ARCH] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     12169         3212761936

With this, we gather that the error resides on LBA 3212761936.
Following the HOWTO, I use gdisk to find the start sector to be used later in determining the block number (as I cannot use fdisk since it does not support GPT):

# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.5

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): CFB87C67-1993-4517-8301-76E16BBEA901
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      3907029134   1.8 TiB     FD00  Linux RAID

Using tunefs I find the blocksize to be 4096. Using this info and the calculation from the HOWTO, I conclude that the block in question is ((3212761936 - 2048) * 512) / 4096 = 401594986.

The HOWTO then directs me to debugfs to see if the block is in use (I use the RAID device as it needs an EXT filesystem, this was one of the commands that confused me as I did not, at first, know if I should use /dev/sda or /dev/md0):

# debugfs
debugfs 1.42.4 (12-June-2012)
debugfs:  open /dev/md0
debugfs:  testb 401594986
Block 401594986 not in use

So block 401594986 is empty space, I should be able to write over it without problems. Before writing to it, though, I try to make sure that it, indeed, cannot be read:

# dd if=/dev/sda1 of=/dev/null bs=4096 count=1 seek=401594986
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000198887 s, 20.6 MB/s

If the block could not be read, I wouldn't expect this to work. However, it does. I repeat using /dev/sda, /dev/sda1, /dev/sdb, /dev/sdb1, /dev/md0, and +-5 to the block number to search around the bad block. It all works.
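The block-number arithmetic from the HOWTO can be sketched in shell as follows. The function name `lba_to_fs_block` is hypothetical. The sketch assumes the filesystem starts exactly at the first sector of the partition; a layer in between, such as md metadata, could shift the result, and this sketch does not account for that.

```shell
#!/bin/sh
# Hypothetical helper: converts a whole-disk LBA to a filesystem block number.
#   $1 = failing LBA (from the SMART self-test log)
#   $2 = partition start sector (from gdisk)
#   $3 = sector size in bytes
#   $4 = filesystem block size in bytes
lba_to_fs_block() {
    echo $(( ($1 - $2) * $3 / $4 ))
}

# Numbers taken from the question.
lba_to_fs_block 3212761936 2048 512 4096   # prints 401594986
```

The result matches the value computed by hand above. Integer division rounds down, so any LBA inside the same 4096-byte block maps to the same block number.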
I shrug my shoulders and go ahead and commit the write and sync (I use /dev/md0 because I figured modifying one disk and not the other might cause issues, this way both disks overwrite the bad block):

# dd if=/dev/zero of=/dev/md0 bs=4096 count=1 seek=401594986
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000142366 s, 28.8 MB/s
# sync

I would expect that writing to the bad block would have the disks reassign the block to a good one, however running another SMART test shows differently:

# 1  Short offline       Completed: read failure       90%     12170         3212761936

Back to square 1. So basically, how would I fix a bad block on 1 disk in a RAID1 array? I'm sure I've not done something correctly...

Thanks for your time and patience.

----------

EDIT 1:
-------

I've tried to run a long SMART test, with the same LBA returning as bad (the only difference is it reports 30% remaining rather than 90%):

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       30%     12180         3212761936
# 2  Short offline       Completed: read failure       90%     12170         3212761936

I've also used badblocks with the following output. The output is strange and seems to be misformatted, but I tried to test the numbers it output as blocks, and debugfs gives an error:

# badblocks -sv /dev/sda
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): 1606380968ne, 3:57:08 elapsed. (0/0/0 errors)
1606380969ne, 3:57:39 elapsed. (1/0/0 errors)
1606380970ne, 3:58:11 elapsed. (2/0/0 errors)
1606380971ne, 3:58:43 elapsed. (3/0/0 errors)
done
Pass completed, 4 bad blocks found. (4/0/0 errors)

# debugfs
debugfs 1.42.4 (12-June-2012)
debugfs:  open /dev/md0
debugfs:  testb 1606380968
Illegal block number passed to ext2fs_test_block_bitmap #1606380968 for block bitmap for /dev/md0
Block 1606380968 not in use

Not sure where to go from here.
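One way to relate the badblocks numbers to the SMART LBA is a unit conversion. badblocks counts in its own block size, which defaults to 1024 bytes when -b is not given; the "Checking blocks 0 to 1953514583" line is consistent with that for a disk of 3907029168 sectors of 512 bytes. The function name `badblocks_to_lba` below is hypothetical, and the sketch is only the arithmetic, not a repair procedure.

```shell
#!/bin/sh
# Hypothetical helper: converts a badblocks block number to a sector LBA.
#   $1 = block number reported by badblocks
#   $2 = badblocks block size in bytes (1024 is its default without -b)
#   $3 = sector size in bytes
badblocks_to_lba() {
    echo $(( $1 * $2 / $3 ))
}

# First bad block reported in the question.
badblocks_to_lba 1606380968 1024 512   # prints 3212761936
```

The converted value equals the LBA_of_first_error from the self-test log. The four consecutive 1024-byte blocks that badblocks lists add up to 4096 bytes. Since these numbers are in 1024-byte units counted from the start of the whole disk, they are not filesystem block numbers for /dev/md0, which would explain why debugfs rejects them.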
badblocks definitely found something, but I'm not sure what to do with the information presented...

----------

EDIT 2
------

More commands and info. I feel like an idiot forgetting to include this originally. This is SMART values for /dev/sda. I have 1 Current_Pending_Sector, and 0 Offline_Uncorrectable.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       166
  2 Throughput_Performance  0x0026   055   055   000    Old_age   Always       -       18345
  3 Spin_Up_Time            0x0023   084   068   025    Pre-fail  Always       -       5078
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       75
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12224
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       75
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       1646911
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       12
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   059   000    Old_age   Always       -       36 (Min/Max 22/41)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   252   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       30
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       77

# mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu May 5 06:30:21 2011
     Raid Level : raid1
     Array Size : 1953512383 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953512383 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Tue Jul 3 22:15:51 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : server:0 (local to host server)
           UUID : e7ebaefd:e05c9d6e:3b558391:9b131afb
         Events : 67889

    Number   Major   Minor   RaidDevice State
       2       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

As per one of the answers: it would seem I did switch seek and skip for dd. I was using seek as that's what is used with the HOWTO. Using this command causes dd to hang:

# dd if=/dev/sda1 of=/dev/null bs=4096 count=1 skip=401594986

Using blocks around that one (..84, ..85, ..87, ..88) seems to work just fine, and using /dev/sdb1 with block 401594986 reads just fine as well (as expected as that disk passed SMART testing).

Now, the question that I have is: When writing over this area to reassign the blocks, do I use /dev/sda1 or /dev/md0? I don't want to cause any issues with the RAID array by writing directly to one disk and not having the other disk update.

EDIT 3
------

Writing to the block directly produced filesystem errors. I've chosen an answer that solved the problem quickly:

# 1  Short offline       Completed without error       00%     14211         -
# 2  Extended offline    Completed: read failure       30%     12244         3212761936

Thanks to everyone who helped. =)
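The skip/seek mix-up mentioned in EDIT 2 can be reproduced safely on scratch files. In dd, skip= advances the position in the input and seek= advances the position in the output, so a read test with of=/dev/null and seek= still reads block 0 of the input. The sketch below uses temporary files from mktemp and never touches a real disk; the variable names are illustrative.

```shell
#!/bin/sh
# Scratch files from mktemp; nothing here touches a real disk.
src=$(mktemp)
dst=$(mktemp)

# Source image: three 4096-byte blocks filled with A, B and C respectively.
for c in A B C; do
    dd if=/dev/zero bs=4096 count=1 2>/dev/null | tr '\0' "$c"
done > "$src"

# skip= advances the INPUT: this reads block 1 of the source (the B block).
read_with_skip=$(dd if="$src" bs=4096 count=1 skip=1 2>/dev/null | head -c 1)

# seek= advances the OUTPUT: block 0 of the source (the A block) is still what
# gets read, but it is written at block 1 of the destination file.
dd if="$src" of="$dst" bs=4096 count=1 seek=1 2>/dev/null
written_with_seek=$(tail -c 1 "$dst")

echo "skip=1 read:  $read_with_skip"      # B
echo "seek=1 wrote: $written_with_seek"   # A

rm -f "$src" "$dst"
```

So skip= is the option to use when testing whether a given block can be read, and seek= is the one to use when choosing where a write lands.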
blitzmann (405 rep)
Jul 3, 2012, 10:24 PM • Last activity: Feb 8, 2024, 02:00 PM
2 votes
0 answers
490 views
Are SMART offline data collection and offline attributes obsolete?
**TLDR;**

I tried to understand the difference between SMART Offline and Always attributes, and thus to understand what SMART offline data collection is, and if I should enable it on my HDDs.

smartmontools' official wiki states that:

> Note that the SMART automatic offline test command is listed as Obsolete in every version of the ATA and ATA/ATAPI Specifications. (...) However it is implemented and used by many vendors.

After some extensive reading on the web, and also some tests, the conclusion I reached is:

- Nowadays SMART offline data collection is obsolete
- All data is updated in real time (e.g. Offline and Always attributes behave the same way)
- There is no need to enable "Auto Offline Data Collection" (# smartctl --offlineauto=on /dev/sda), nor to ever launch it manually (# smartctl -t offline /dev/sda).
- As for the reason all this offline stuff is still in smartmontools, it's probably to keep it compatible with some very old HDDs that indeed implemented real offline attributes.

Am I right? Or am I missing something?

----------

**MORE DETAILS**

I did some tests on a HDD, which has 3 offline attributes (and has Auto Offline Data Collection disabled):
# smartctl -a /dev/sda
(...)
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
(...)
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       235 (114 97 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13381561756
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       20472945077
(...)
I then wrote some data on that drive, and noted that all 3 attributes were updated in real time. Thus, they are in fact online (or Always) attributes, and not Offline ones. I did the same test on a few other HDDs, the behavior was identical.
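A check like the one described could be scripted roughly as follows. This is only a sketch: the function name `smart_raw_value` is hypothetical, it assumes the raw value sits in the 10th column of `smartctl -A`-style rows, and the sample row is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: prints the RAW_VALUE (10th column) of the attribute
# with the given ID from `smartctl -A`-style rows on stdin.
#   $1 = attribute ID
smart_raw_value() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Sample row copied from the question.
printf '%s\n' \
    '241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13381561756' |
    smart_raw_value 241
# prints 13381561756

# Possible use on a real system (needs root; /dev/sda as in the question):
#   before=$(smartctl -A /dev/sda | smart_raw_value 241)
#   ... write some data to the drive ...
#   after=$(smartctl -A /dev/sda | smart_raw_value 241)
#   [ "$after" -gt "$before" ] && echo "attribute 241 changed without an offline test"
```

If the value changes between the two snapshots without an offline collection having been run in between, the attribute behaves like an Always attribute on that drive, whatever its UPDATED column says.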
ChennyStar (1969 rep)
Nov 6, 2023, 04:55 PM • Last activity: Nov 6, 2023, 05:01 PM
2 votes
1 answers
3395 views
How can I view the smart logs for an NVMe disk in Linux when smartctl is showing there are errors?
My daily driver (Debian Bookworm RC3 + KDE Plasma) is configured to send me emails containing error notifications. Today, I received the following email:
This message was generated by the smartd daemon running on:

   host name:  desk
   DNS domain: local.lan

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 1754 to 1758

Device info:
KBG30ZMV256G TOSHIBA, S/N:X8OPD1PGP12P, FW:ADHA0101

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Wed May 17 16:09:04 2023 EDT
Another message will be sent in 24 hours if the problem persists.
This is what sudo journalctl -t smart shows:
May 20 15:19:47 desk smartd: smartd 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-9-amd64] (local build)
May 20 15:19:47 desk smartd: Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
May 20 15:19:47 desk smartd: Opened configuration file /etc/smartd.conf
May 20 15:19:47 desk smartd: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
May 20 15:19:47 desk smartd: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
May 20 15:19:47 desk smartd: Device: /dev/sda, type changed from 'scsi' to 'sat'
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], opened
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], CT4000MX500SSD1, S/N:2304E6A3D318, WWN:5-00a075-1e6a3d318, FW:M3CR045, 4.00 TB
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], not found in smartd database 7.3/5319.
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.CT4000MX500SSD1-2304E6A3D318.ata.state
May 20 15:19:47 desk smartd: Device: /dev/nvme0, opened
May 20 15:19:47 desk smartd: Device: /dev/nvme0, KBG30ZMV256G TOSHIBA, S/N:X8OPD1PGP12P, FW:ADHA0101
May 20 15:19:47 desk smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
May 20 15:19:47 desk smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.KBG30ZMV256G_TOSHIBA-X8OPD1PGP12P.nvme.state
May 20 15:19:47 desk smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
May 20 15:19:48 desk smartd: Device: /dev/nvme0, number of Error Log entries increased from 1754 to 1758
May 20 15:19:48 desk smartd: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
May 20 15:19:48 desk smartd: Warning via /usr/share/smartmontools/smartd-runner to root: successful
May 20 15:19:48 desk smartd: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.CT4000MX500SSD1-2304E6A3D318.ata.state
May 20 15:19:48 desk smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.KBG30ZMV256G_TOSHIBA-X8OPD1PGP12P.nvme.state
May 20 15:49:48 desk smartd: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 74
May 20 22:49:48 desk smartd: Device: /dev/nvme0, number of Error Log entries increased from 1758 to 1760
When I run sudo smartctl -i -a /dev/nvme0, it shows me the error count, but I can't figure out how to see the log message associated with the increased count:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-9-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KBG30ZMV256G TOSHIBA
Serial Number:                      X8OPD1PGP12P
Firmware Version:                   ADHA0101
PCI Vendor/Subsystem ID:            0x1179
IEEE OUI Identifier:                0x00080d
Controller ID:                      0
NVMe Version:                       1.2.1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            00080d 04004ad9aa
Local Time is:                      Sat May 20 23:09:32 2023 EDT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0017):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.30W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.30W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    4  4  4  4     8000   32000
 4 -   0.0050W       -        -    4  4  4  4     8000   40000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -    4096       0         0
 1 +     512       0         3

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    30%
Data Units Read:                    23,188,612 [11.8 TB]
Data Units Written:                 39,727,036 [20.3 TB]
Host Read Commands:                 222,771,983
Host Write Commands:                498,052,687
Controller Busy Time:               7,440
Power Cycles:                       291
Power On Hours:                     20,378
Unsafe Shutdowns:                   615
Media and Data Integrity Errors:    0
Error Information Log Entries:      1,760
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               32 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       1760     0  0x501a  0xc005  0x028            -     1     -
  1       1759     0  0xb012  0xc005  0x028            -     1     -
  2       1758     0  0x5010  0xc005  0x028            -     0     -
How can I figure out what the errors are?
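The Status and PELoc columns can be unpacked by hand. Below is a minimal Python sketch, assuming the layout the NVMe base specification gives for the error log's Status Field (bit 0 is the phase tag, bits 8:1 the status code, bits 11:9 the status code type, bit 14 "more", bit 15 "do not retry") and for the Parameter Error Location (low byte is the byte offset into the command, bits 10:8 the bit offset). The helper names and the short lookup table are illustrative and not part of smartctl.

```python
# Hypothetical helpers (not part of smartmontools) that unpack the raw
# "Status" and "PELoc" columns smartctl prints for NVMe Log 0x01.

# Only a few generic status codes (status code type 0) are listed here.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}


def decode_status(raw):
    """Split a 16-bit Status Field value into its sub-fields."""
    sct = (raw >> 9) & 0x7   # status code type
    sc = (raw >> 1) & 0xFF   # status code
    name = GENERIC_STATUS.get(sc, "unknown") if sct == 0 else "non-generic"
    return {
        "phase_tag": raw & 0x1,
        "status_code": sc,
        "status_code_type": sct,
        "more": bool(raw & 0x4000),
        "do_not_retry": bool(raw & 0x8000),
        "name": name,
    }


def decode_peloc(raw):
    """Return (byte offset, bit offset) of the offending command field."""
    return raw & 0xFF, (raw >> 8) & 0x7


if __name__ == "__main__":
    print(decode_status(0xC005))
    print(decode_peloc(0x028))
```

Under those assumptions, 0xc005 decodes to generic status code 0x02 ("Invalid Field in Command") with the do-not-retry bit set, and PELoc 0x028 points at byte 40 of the submitted command, which is where command dword 10 starts.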
IMTheNachoMan (433 rep)
May 21, 2023, 03:12 AM • Last activity: Sep 28, 2023, 08:38 AM
1 votes
1 answers
232 views
SMART error (CurrentPendingSector) and (OfflineUncorrectableSector)
I have been receiving the following error messages every day for several months now, and I do not know how to stop receiving these messages.
CurrentPendingSector
This message was generated by the smartd daemon running on:

   host name:  myhost
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 6 Currently unreadable (pending) sectors

Device info:
KingFast, S/N:03112222C0002, FW:U0803A0, 256 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Feb  3 19:41:29 2023 PST
Another message will be sent in 24 hours if the problem persists.
OfflineUncorrectableSector
This message was generated by the smartd daemon running on:

   host name:  myhost
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 3 Offline uncorrectable sectors

Device info:
KingFast, S/N:03112222C0002, FW:U0803A0, 256 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Feb  3 19:41:29 2023 PST
Another message will be sent in 24 hours if the problem persists.
smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KingFast
Serial Number:    03112222C0002
Firmware Version: U0803A0
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jul  8 15:44:59 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       6
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       3335
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       440
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       3
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       86
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       26
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       79004
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       481
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       6
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       114
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5050
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       98
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       6
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       88
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       35
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       86
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       168900
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       815543
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       191939

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3329         -
# 2  Short offline       Completed without error       00%      3325         -
# 3  Short offline       Completed without error       00%      3321         -
# 4  Short offline       Completed without error       00%      3313         -
# 5  Short offline       Completed without error       00%      3309         -
# 6  Short offline       Completed without error       00%      3306         -
# 7  Extended offline    Completed without error       00%      3250         -
# 8  Extended offline    Completed without error       00%      3232         -
# 9  Extended offline    Completed without error       00%      3229         -
#10  Extended offline    Completed without error       00%       976         -
#11  Extended offline    Completed without error       00%       968         -

Selective Self-tests/Logging not supported
I have tried to ignore the 197 and 198 errors in /etc/smartd.conf with
/dev/sda -d removable -n standby -H -l error -l selftest -f -t -I 197 -I 198 -s (S/../.././(01|09|17)|L/../../3/11) -m root -M exec /usr/share/smartmontools/smartd-runner
to no avail. I also do not see any LBA_of_first_error in the self-test section. To me, `SMART overall-health self-assessment test result: PASSED` looks healthy, and the self-tests return no errors. My current understanding is that the disk is healthy but is still sending these messages erroneously. Is there something that I'm missing?
The /dev/sda drive is a KingFast 256 GB SSD; I'm not sure whether this is relevant, as I could not find anything online about this particular drive or manufacturer.
How can I stop receiving these messages while keeping SMART monitoring for other genuine issues on the drive, and how would I fix the issue if this error message really does indicate a problem with the drive? Thanks!
Edit: After running smartctl -t long /dev/sda, I have
smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KingFast
Serial Number:    03112222C0002
Firmware Version: U0803A0
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Jul  9 10:05:33 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x03)	Offline data collection activity
					is in progress.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 241)	Self-test routine in progress...
					10% of test remaining.
Total time to complete Offline 
data collection: 		(  600) seconds.
Offline data collection
capabilities: 			 (0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       6
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       3341
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       441
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       3
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       86
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       26
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       79553
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       482
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       6
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       115
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5050
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       98
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       6
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       88
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       46
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       86
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       170468
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       815560
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       193199

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3337         -
# 2  Short offline       Completed without error       00%      3329         -
# 3  Short offline       Completed without error       00%      3325         -
# 4  Short offline       Completed without error       00%      3321         -
# 5  Short offline       Completed without error       00%      3313         -
# 6  Short offline       Completed without error       00%      3309         -
# 7  Short offline       Completed without error       00%      3306         -
# 8  Extended offline    Completed without error       00%      3250         -
# 9  Extended offline    Completed without error       00%      3232         -
#10  Extended offline    Completed without error       00%      3229         -
#11  Extended offline    Completed without error       00%       976         -
#12  Extended offline    Completed without error       00%       968         -

Selective Self-tests/Logging not supported
The #12 Extended offline test Completed without error, so I'm not really sure what I'm supposed to do from here.
**Edit #2:** I have also run the following, which I believe indicates that there are no errors with the drive:
badblocks -sv /dev/sda
Checking blocks 0 to 250059095
Checking for bad blocks (read-only test): done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)
dd if=/dev/sda of=/dev/null bs=64K conv=noerror
3907173+1 records in
3907173+1 records out
256060514304 bytes (256 GB, 238 GiB) copied, 485.648 s, 527 MB/s
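To check whether the counters behind these mails are actually moving, the two attribute tables above can be compared. A minimal Python sketch, assuming the ten-column `smartctl -A` row layout shown above; the function names and the abbreviated sample rows (copied from the two runs) are illustrative only.

```python
def parse_attributes(text):
    """Map attribute ID -> (name, raw value) from `smartctl -A` style rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def changed_raw_values(before, after):
    """Return {id: (old_raw, new_raw)} for attributes whose raw value differs."""
    old, new = parse_attributes(before), parse_attributes(after)
    return {
        attr_id: (old[attr_id][1], new[attr_id][1])
        for attr_id in old
        if attr_id in new and old[attr_id][1] != new[attr_id][1]
    }


# Rows taken from the first and second smartctl runs above.
BEFORE = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       6
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       3
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       168900
"""
AFTER = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       6
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       3
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       170468
"""

if __name__ == "__main__":
    print(changed_raw_values(BEFORE, AFTER))
```

For these rows only attribute 241 differs; 5, 197 and 198 are identical in both runs, which would be consistent with smartd repeating a warning about a standing non-zero count rather than reporting new failures.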
jameszp (93 rep)
Jul 9, 2023, 03:04 AM • Last activity: Jul 18, 2023, 06:01 PM
7 votes
5 answers
8769 views
S.M.A.R.T shows high Load_Cycle_Count | Why and how to prevent the number from increasing?
I just realized that **some of my HDDs have a huge Load_Cycle_Count** when reading out their S.M.A.R.T. data. Some are close to failing, and I am asking myself why this is the case and whether there is anything I can do to prevent them from dying.
alex@ga-P55A-UD5:~$ sudo smartctl -a /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD10EARS-00Y5B1
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   090   090   000    Old_age   Always       -       10281
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       28456
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       611460
alex@ga-P55A-UD5:~$ sudo smartctl -a /dev/sdc
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD6400AADS-00M2B0
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   093   093   000    Old_age   Always       -       7615
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       31743
193 Load_Cycle_Count        0x0032   053   053   000    Old_age   Always       -       442121
alex@silent-ssd:~$ sudo smartctl -a /dev/sdd
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD20EARX-00PASB0
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2477
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11176
193 Load_Cycle_Count        0x0032   181   181   000    Old_age   Always       -       59646
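
To put the three raw values on a common scale, the load cycles can be divided by the power-on hours. A small Python sketch using only the RAW_VALUE columns quoted above; the helper name and the dictionary labels are made up for illustration.

```python
def load_cycles_per_hour(load_cycle_count, power_on_hours):
    """Average number of head load/unload cycles per power-on hour."""
    return load_cycle_count / power_on_hours


# (Load_Cycle_Count, Power_On_Hours) raw values from the outputs above.
DRIVES = {
    "sdb WD10EARS": (611460, 28456),
    "sdc WD6400AADS": (442121, 31743),
    "sdd WD20EARX": (59646, 11176),
}

if __name__ == "__main__":
    for name, (cycles, hours) in DRIVES.items():
        rate = load_cycles_per_hour(cycles, hours)
        print(f"{name}: {rate:.2f} load cycles per power-on hour")
```

This works out to roughly 21, 14 and 5 cycles per power-on hour for sdb, sdc and sdd respectively, so the heads on the two older drives park far more often than on the newer one.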

AlexOnLinux (725 rep)
Mar 4, 2019, 10:42 AM • Last activity: Jun 19, 2023, 11:27 AM
1 votes
1 answers
188 views
Does my disk support SMART?
I'm confused about this smartctl output. It says SMART status is not supported, but then it says it PASSED.
# sudo smartctl -H -d megaraid,24 /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.59.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

# echo $?
4
According to the man page, status code 4 means prefail Attribute is less than the danger threshold.
EXIT STATUS
...
...
    Bit 4: We found prefail Attributes <= threshold.
So I'm confused, is SMART data available on this disk or not?
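Note that the exit status is a bitmask, so a value of 4 has bit 2 set, not bit 4 (bit 4 would be a value of 16). A minimal Python sketch for decoding it; the bit descriptions are paraphrased from the EXIT STATUS section of smartctl(8), and the function name is made up for illustration.

```python
# Bit meanings paraphrased from the EXIT STATUS section of smartctl(8).
EXIT_BITS = {
    0: "Command line did not parse",
    1: "Device open failed or device did not return an IDENTIFY DEVICE structure",
    2: "A SMART or other ATA command failed, or a SMART data structure had a checksum error",
    3: "SMART status check returned 'DISK FAILING'",
    4: "Prefail Attributes <= threshold were found",
    5: "SMART status is 'DISK OK' but some Attributes were <= threshold in the past",
    6: "The device error log contains records of errors",
    7: "The device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the list of (bit, description) pairs set in a smartctl exit status."""
    return [(bit, text) for bit, text in EXIT_BITS.items() if status & (1 << bit)]


if __name__ == "__main__":
    for bit, text in decode_exit_status(4):
        print(f"bit {bit}: {text}")
```

Decoded this way, an exit status of 4 reports a failed SMART/ATA command, which lines up with the "SMART Status not supported" message in the output rather than with a prefail Attribute.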
Timothy Pulliam (3953 rep)
May 9, 2023, 04:36 PM • Last activity: May 9, 2023, 10:05 PM
0 votes
0 answers
269 views
Which hard drive failed SMART ? Synology NAS
I had 2 x 4TB Red Pro drives in a Synology NAS. A couple of years ago, Synology reported one of them as failing; I believe it was a SMART failure. I pulled out both drives, bought a new one, and was able to copy all the data from the drive that was failing (it had not failed yet). Both 4 TB drives seem to work fine when mounted on a Linux machine, and I can copy data to and from the one Synology reported as "failing". The thing is, I'm not 100% sure which drive was reported as "failing", since it was a couple of years ago. Is there a command or test I can run to figure out which drive is failing, so that I can exclude it from my primary NAS (or discard it)? I tried smartctl -a /dev/ for my drives; the self-assessment result says "Passed" and I see no errors there.
Update: After running the test recommended by @meuh ("smartctl -t short "), I get the following error:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     21687         8359136
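Running the short test on each drive and checking the self-test log for an entry like this one is one way to tell the two drives apart. A rough Python sketch for picking such a log row apart, assuming the column layout shown above and a two-word test description (such as "Short offline" or "Extended offline"); `parse_selftest_line` is a made-up helper, not a smartmontools function.

```python
import re

# Assumes rows like: "# 1  Short offline  Completed: read failure  90%  21687  8359136"
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>\w+ \w+)\s+(?P<status>.*?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_line(line):
    """Parse one self-test log row; return None if the line is not a row."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    lba = match.group("lba")
    return {
        "num": int(match.group("num")),
        "test": match.group("test"),
        "status": match.group("status"),
        "lifetime_hours": int(match.group("hours")),
        # "-" in the last column means no failing LBA was recorded.
        "first_error_lba": None if lba == "-" else int(lba),
        "failed": match.group("status") != "Completed without error",
    }


if __name__ == "__main__":
    row = "# 1  Short offline       Completed: read failure       90%     21687         8359136"
    print(parse_selftest_line(row))
```

A drive whose log contains a row with a read failure and a numeric first-error LBA is the likely candidate; a drive whose rows all say "Completed without error" is not flagged by this check.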
Curious101 (311 rep)
Apr 16, 2023, 08:31 PM • Last activity: Apr 19, 2023, 04:13 AM
0 votes
2 answers
797 views
SSD's SMART Status not supported behind DELL PERC H730 Mini controller
I would like to output the temperature of each of my drives (NVMe, SATA, SAS) in my Dell R630, but I can't get the temperature of my SATA **Samsung SSD 870 EVO 250GB** (/dev/sdc), which is behind a DELL PERC H730 Mini controller. The hddtemp command shows:
/dev/sda: SAMSUNG AREA7680S5xnNTRI: 37°C
/dev/sdb: SAMSUNG AREA7680S5xnNTRI: 36°C
/dev/sdc: DELL PERC H730 Mini: S.M.A.R.T. not available
When I tried smartctl, it showed:
Smartctl open device: /dev/sdc failed: DELL or MegaRaid controller, please try adding '-d megaraid,N'
I then ran smartctl -a -d megaraid,0 /dev/sdc. It does show my device name correctly:
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 870 EVO 250GB
and
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
but the SMART status shows:
=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
How can I find out the temperature of the SSD behind the DELL PERC H730 Mini controller?
JCTL (3 rep)
Mar 30, 2023, 07:26 AM • Last activity: Mar 30, 2023, 12:28 PM
1 votes
2 answers
3301 views
Cannot get smartctl working
On my Debian wheezy server I use a **software RAID 1** with two hard disks, /dev/sda3 and /dev/sdb3, combined into /dev/md2:
mdadm --detail /dev/md2
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
The RAID seems to be fine, but on one of the disks SMART is not running. smartctl --all /dev/sda says:
SMART support is: Available - device has SMART capability.
SMART support is: Disabled
while /dev/sdb gives a lot of SMART information. I tried to start it with
smartctl -s on /dev/sda -T verypermissive
but that is not working; it doesn't start:
Error SMART Enable failed: scsi error aborted command
Smartctl: SMART Enable Failed.
How can I get it running? Or does it mean the disk has a problem?
rubo77 (30435 rep)
Feb 8, 2015, 10:59 PM • Last activity: Nov 1, 2022, 02:41 PM
2 votes
4 answers
866 views
smartmontools: Should I replace my SSHD?
Today, while I was watching a video in Firefox, the following window suddenly popped up (screenshot: https://i.sstatic.net/X1UA6.jpg). Here is the output from GSmartControl:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.0-22-amd64] (local build) Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Laptop SSHD Device Model: ST500LM000-1EJ162-SSHD Serial Number: W3715AR9 LU WWN Device Id: 5 000c50 06e236b9f Firmware Version: HPD3 User Capacity: 500,107,862,016 bytes [500 GB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Form Factor: 2.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Sun Oct 23 14:41:09 2022 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled AAM feature is: Unavailable APM level is: 254 (maximum performance) Rd look-ahead is: Enabled Write cache is: Enabled DSN feature is: Unavailable ATA Security is: Disabled, frozen [SEC2] === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 634) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. 
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 99) minutes. SCT capabilities: (0x1081) SCT Status supported. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate POSR-K 118 099 006 - 195697992 3 Spin_Up_Time PO---K 099 099 000 - 0 4 Start_Stop_Count -O--CK 093 093 020 - 7676 5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0 7 Seek_Error_Rate POSR-K 082 060 030 - 4473742513 9 Power_On_Hours -O--CK 087 087 000 - 11853 10 Spin_Retry_Count PO--CK 100 100 097 - 0 12 Power_Cycle_Count -O--CK 093 093 020 - 7668 180 Unknown_HDD_Attribute -O-R-K 100 100 000 - 64025461 183 Runtime_Bad_Block -O--CK 100 100 000 - 0 184 End-to-End_Error PO--CK 100 100 097 - 0 187 Reported_Uncorrect -O--CK 100 100 000 - 0 188 Command_Timeout -O--CK 100 099 000 - 2 189 High_Fly_Writes -O-RCK 063 063 000 - 37 190 Airflow_Temperature_Cel -O---K 069 055 045 - 31 (Min/Max 28/32) 191 G-Sense_Error_Rate -O--CK 100 100 000 - 0 192 Power-Off_Retract_Count -O--CK 100 100 000 - 228 193 Load_Cycle_Count -O--CK 097 097 000 - 7777 194 Temperature_Celsius -O---K 031 045 000 - 31 (0 14 0 0 0) 196 Reallocated_Event_Count -O--CK 100 100 000 - 0 197 Current_Pending_Sector -O--CK 100 100 000 - 16 198 Offline_Uncorrectable ----CK 100 100 000 - 16 199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0 254 Free_Fall_Sensor -O--CK 100 100 000 - 0 ||||||_ K auto-keep |||||__ C event count ||||___ R error rate |||____ S speed/performance ||_____ O updated online |______ P prefailure warning General Purpose Log Directory Version 1 SMART Log Directory Version 1 [multi-sector log support] Address Access R/W Size Description 0x00 GPL,SL R/O 1 Log Directory 0x01 SL R/O 1 Summary SMART error log 0x02 SL R/O 5 Comprehensive SMART error log 0x03 GPL 
R/O 5 Ext. Comprehensive SMART error log 0x06 SL R/O 1 SMART self-test log 0x07 GPL R/O 1 Extended self-test log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 NCQ Command Error log 0x11 GPL R/O 1 SATA Phy Event Counters log 0x21 GPL R/O 1 Write stream error log 0x22 GPL R/O 1 Read stream error log 0x24 GPL R/O 1223 Current Device Internal Status Data log 0x25 GPL R/O 1223 Saved Device Internal Status Data log 0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xa1 GPL,SL VS 20 Device vendor specific log 0xa2 GPL VS 3900 Device vendor specific log 0xa8 GPL,SL VS 129 Device vendor specific log 0xa9 GPL,SL VS 1 Device vendor specific log 0xab GPL VS 1 Device vendor specific log 0xae GPL VS 1 Device vendor specific log 0xb0 GPL VS 4580 Device vendor specific log 0xb6 GPL VS 1918 Device vendor specific log 0xbe-0xbf GPL VS 65535 Device vendor specific log 0xc1 GPL,SL VS 10 Device vendor specific log 0xc2 GPL,SL VS 50 Device vendor specific log 0xc4 GPL,SL VS 5 Device vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log Version: 1 (5 sectors) Device Error Count: 1 CR = Command Register FEATR = Features Register COUNT = Count (was: Sector Count) Register LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8 LH = LBA High (was: Cylinder High) Register ] LBA LM = LBA Mid (was: Cylinder Low) Register ] Register LL = LBA Low (was: Sector Number) Register ] DV = Device (was: Device/Head) Register DC = Device Control Register ER = Error register ST = Status register Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 1 occurred at disk power-on lifetime: 8134 hours (338 days + 22 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 00 00 00 00 00 a0 3a 40 00 00 Error: UNC at LBA = 0x00a03a40 = 10500672 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 25 00 00 00 2a 00 00 00 a0 3a 40 e0 00 01:31:49.827 READ DMA EXT 25 00 00 00 35 00 00 00 a0 42 0b e0 00 01:31:49.348 READ DMA EXT 25 00 00 00 0b 00 00 00 a0 42 00 e0 00 01:31:49.345 READ DMA EXT 25 00 00 00 15 00 00 03 93 ac 6b e0 00 01:31:49.342 READ DMA EXT 25 00 00 00 2b 00 00 03 93 ac 40 e0 00 01:31:49.339 READ DMA EXT SMART Extended Self-test Log Version: 1 (1 sectors) Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 11852 - # 2 Short offline Completed without error 00% 11847 - # 3 Short offline Completed without error 00% 11844 - # 4 Short offline Completed without error 00% 11835 - # 5 Short offline Completed without error 00% 11830 - # 6 Short offline Completed without error 00% 11823 - # 7 Short offline Completed without error 00% 11818 - # 8 Short offline Completed without error 00% 11814 - # 9 Short offline Completed without error 00% 11806 - #10 Short offline Completed without error 00% 11801 - #11 Short offline Completed without error 00% 11792 - #12 Short offline Completed without error 00% 11790 - #13 Short offline Completed without error 00% 11780 - #14 Short offline Completed without error 00% 11772 - #15 Short offline Completed without error 00% 11765 - #16 Short offline Completed without error 00% 11756 - #17 Short offline Completed without error 00% 11751 - #18 Short offline Completed without error 00% 11747 - #19 Short offline Completed without error 00% 11740 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 
0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. SCT Status Version: 3 SCT Version (vendor specific): 522 (0x020a) Device State: Active (0) Current Temperature: 31 Celsius Power Cycle Min/Max Temperature: 25/32 Celsius Lifetime Min/Max Temperature: 16/44 Celsius Under/Over Temperature Limit Count: 0/2 SCT Data Table command not supported SCT Error Recovery Control command not supported Device Statistics (GP/SMART Log 0x04) not supported SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x000a 2 3 Device-to-host register FISes sent due to a COMRESET 0x0001 2 0 Command failed due to ICRC error 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS

Also today, Linux failed to boot. I restarted and it then booted without a problem. This happened before the error warning popped up, so I have no idea whether the boot issue has anything to do with the smartmontools error. **Confusing:** In the report there is a line "Error 1 occurred at disk power-on lifetime: 8134 hours (338 days + 22 hours)", but there is no date. I expected a date on which the error occurred, so that I could compare it with today's date and definitely assign the error to today. As I did not find a date anywhere in the output of the txt file, I looked for the current lifetime of my SSHD, because the error is said to have occurred at 8134 h. I expected to find somewhere the number of hours my SSHD has run up to now, but I did not find that either. Which host's syslog is meant? Maybe this one: /var/log/syslog? If yes, here it is: https://workupload.com/file/NVD2gpdrvHp But my main question is: Is there a high risk that my SSHD will die soon? It is said that the hard disk health status has changed. But where can I now find the current health status? Thank you.
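The error log records drive power-on hours rather than a calendar date, so a date can only be bounded, not read off. The sketch below shows the arithmetic, assuming the raw value of attribute 9 (Power_On_Hours, 11853 in the table above) really counts hours on this drive and taking the 8134 hours from the "Error 1 occurred at disk power-on lifetime" line. The report timestamp and the function names are illustrative assumptions.

```python
from datetime import datetime, timedelta


def hours_since_error(current_power_on_hours: int, error_power_on_hours: int) -> int:
    """Powered-on hours that have elapsed since the logged error."""
    if error_power_on_hours > current_power_on_hours:
        raise ValueError("error hour is later than the current power-on hours")
    return current_power_on_hours - error_power_on_hours


def latest_possible_error_time(now: datetime, current_power_on_hours: int,
                               error_power_on_hours: int) -> datetime:
    """Latest wall-clock time at which the error can have occurred.

    The drive only counts hours while it is powered, so the real date is
    this time or earlier.
    """
    elapsed = hours_since_error(current_power_on_hours, error_power_on_hours)
    return now - timedelta(hours=elapsed)


# Values taken from the report above.
CURRENT_HOURS = 11853   # attribute 9, Power_On_Hours, raw value
ERROR_HOURS = 8134      # "Error 1 occurred at disk power-on lifetime"
# Illustrative timestamp for when the report was taken (an assumption).
REPORT_TIME = datetime(2025, 6, 29, 21, 36, 52)

if __name__ == "__main__":
    print(hours_since_error(CURRENT_HOURS, ERROR_HOURS), "powered-on hours ago")
    print("no later than",
          latest_possible_error_time(REPORT_TIME, CURRENT_HOURS, ERROR_HOURS))
```

With these numbers the error lies 3719 powered-on hours in the past, so it cannot have happened on the day the report was taken.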
Wogehu (123 rep)
Oct 23, 2022, 01:51 PM • Last activity: Oct 25, 2022, 07:09 AM
2 votes
2 answers
999 views
smartd output to screen, not email
I'm trying to get smartd working. It expects to send its messages by email via 'mail', which I've never used. But I recall that a few years ago I had smartd sending its warnings directly to the screen via a popup text box, and I'm trying to figure out how to do that again. The info for the 'screen' command baffles me; tmux likewise. Or I suppose it could be a notification. When I have a few weeks to study it, I'll get 'mail' working, but for now I'd prefer a popup message anyway.

==================================================

In 'smartd.conf': `DEVICESCAN -a -m -M exec notify -M test`

... OK, added the full path, much better: `DEVICESCAN -a -m -M exec /bin/notify -M test`

... 'notify' runs fine from the CLI; it is an executable script: `/bin/notify-send "$(systemctl status smartd)"`

... but although `systemctl restart smartd; systemctl status smartd` reports no errors, I get no 'test' result. BTW, so far no results at all using those variables you mentioned.

... `$ smartd` shows two notifications, one for each of my two disks! So why does 'systemctl restart smartd' show nothing?
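One possible explanation for the difference is that a service started by systemd has no connection to the desktop session, while a shell in that session does. A minimal sketch of a `-M exec` handler along those lines follows, assuming smartd exports its documented `SMARTD_DEVICE`, `SMARTD_FAILTYPE` and `SMARTD_MESSAGE` variables to the script. The session bus path and the UID 1000 are assumptions about a typical systemd desktop, and depending on the setup the command may also have to be run as the desktop user.

```python
import os
import subprocess


def build_notification(env) -> list:
    """Build a notify-send command line from smartd's SMARTD_* variables."""
    device = env.get("SMARTD_DEVICE", "unknown device")
    failtype = env.get("SMARTD_FAILTYPE", "unknown")
    message = env.get("SMARTD_MESSAGE", "smartd reported a problem")
    summary = f"smartd: {failtype} on {device}"
    return ["notify-send", "--urgency=critical", summary, message]


def session_env(uid: int, base_env) -> dict:
    """Copy base_env and point it at the desktop user's session bus.

    The bus path is an assumption that holds on typical systemd desktops.
    """
    env = dict(base_env)
    env["DBUS_SESSION_BUS_ADDRESS"] = f"unix:path=/run/user/{uid}/bus"
    return env


# Only act when smartd has actually invoked the script.
if __name__ == "__main__" and "SMARTD_MESSAGE" in os.environ:
    DESKTOP_UID = 1000  # hypothetical: UID of the logged-in desktop user
    subprocess.run(build_notification(os.environ),
                   env=session_env(DESKTOP_UID, os.environ), check=False)
```

Keeping the command construction in a separate function makes it easy to check from the CLI what smartd would display before wiring the script into 'smartd.conf'.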
Ray Andrews (2615 rep)
Oct 17, 2022, 01:56 PM • Last activity: Oct 17, 2022, 10:08 PM
1 vote
2 answers
1479 views
Why does smartd need a database?
Does smartd need the database, or does smartctl need it? I saw that the smartmontools GitHub project keeps updating the database: https://github.com/smartmontools/smartmontools/labels/drivedb In my understanding, smartd scans all disks, so why does it need a database? What is the function/purpose of the database in smartd/smartctl?
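Scanning finds the disks, but many SMART attribute IDs are vendor-specific, so the tools need per-model knowledge to label and interpret them. The toy sketch below illustrates that idea only: the model pattern and the attribute name in `TOY_DRIVEDB` are invented for illustration and are not copied from the real drivedb, whose format differs.

```python
import re

# Generic fallback names, used when nothing better is known about the model.
GENERIC_NAMES = {5: "Reallocated_Sector_Ct", 9: "Power_On_Hours"}

# Toy stand-in for a drive database: model regex -> per-model attribute names.
# These entries are hypothetical, not taken from smartmontools.
TOY_DRIVEDB = [
    (re.compile(r"EXAMPLE SSD .*"), {180: "Unused_Reserved_Block_Ct"}),
]


def attribute_name(model: str, attr_id: int) -> str:
    """Return the best available name for attr_id on the given drive model."""
    for pattern, names in TOY_DRIVEDB:
        if pattern.fullmatch(model) and attr_id in names:
            return names[attr_id]
    return GENERIC_NAMES.get(attr_id, "Unknown_Attribute")


if __name__ == "__main__":
    print(attribute_name("EXAMPLE SSD 240GB", 180))  # model-specific name
    print(attribute_name("SOME OTHER DRIVE", 180))   # falls back to unknown
```

A drive that is missing from the database is still readable. Its vendor-specific attributes just show up under generic or "unknown" names.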
Mark K (955 rep)
Aug 12, 2022, 03:32 AM • Last activity: Aug 12, 2022, 11:52 AM
0 votes
1 answer
99 views
Prometheus DiskTooManyReallocatedSectors
I have Prometheus Alert Manager running on several Linux machines. *(https://prometheus.io/docs/alerting/latest/alertmanager/)* One of them is reporting *2 reallocated sectors*. I got the alert setup from here: *https://awesome-prometheus-alerts.grep.to/rules.html* **1. What is my course of action?** Replace with an SSD? **2. What is the priority ...weeks, months?**
DavidDunham (117 rep)
Jul 2, 2022, 03:08 AM • Last activity: Jul 3, 2022, 07:51 AM
1 vote
3 answers
998 views
Deciphering smartctl results
I'm trying to use an internal 2.5" HDD as external storage and for backups. This HDD had previously been in a Windows machine for a long time before some of my files became corrupted, so I simply swapped it for an SSD. CrystalDiskInfo reports the drive's health as 'good'; however, HDDScan shows a warning for "UltraDMA CRC Errors". These are the results of running `smartctl -a` on it:
smartctl 7.3 2022-02-28 r5338 [x86_64-w64-mingw32-w10-21H2] (sf-7.3-1)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Mobile HDD
Device Model:     ST1000LM035-1RK172
Firmware Version: SDM1
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x71) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 169) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   072   051   006    Pre-fail  Always       -       15037148
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   065   065   020    Old_age   Always       -       36007
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   045    Pre-fail  Always       -       22014115380
  9 Power_On_Hours          0x0032   100   086   000    Old_age   Always       -       55 (236 20 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   092   020    Old_age   Always       -       56
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       3990
188 Command_Timeout         0x0032   100   081   000    Old_age   Always       -       30067195949
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   049   040    Old_age   Always       -       32 (Min/Max 29/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       197
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0032   064   064   000    Old_age   Always       -       73637
194 Temperature_Celsius     0x0022   032   051   000    Old_age   Always       -       32 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   051   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   051   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       12591 (154 139 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       98336960706
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       114926523300
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 3475 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3475 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 88 7d f4 09  Error: UNC at LBA = 0x09f47d88 = 167017864

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 88 7d f4 49 00      00:38:49.208  READ FPDMA QUEUED
  60 00 08 80 7d f4 49 00      00:38:49.195  READ FPDMA QUEUED
  60 00 08 b0 a8 21 49 00      00:38:49.182  READ FPDMA QUEUED
  60 00 08 a8 a8 21 49 00      00:38:49.181  READ FPDMA QUEUED
  60 00 08 a0 a8 21 49 00      00:38:49.181  READ FPDMA QUEUED

Error 3474 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 80 7d f4 09  Error: UNC at LBA = 0x09f47d80 = 167017856

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 80 7d f4 49 00      00:38:46.657  READ FPDMA QUEUED
  60 00 08 78 7d f4 49 00      00:38:46.625  READ FPDMA QUEUED
  ef 10 03 00 00 00 a0 00      00:38:46.615  SET FEATURES [Enable SATA feature]
  ef 10 02 00 00 00 a0 00      00:38:46.605  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:38:46.578  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 3473 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 78 7d f4 09  Error: UNC at LBA = 0x09f47d78 = 167017848

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 78 7d f4 49 00      00:38:44.127  READ FPDMA QUEUED
  60 00 08 70 7d f4 49 00      00:38:44.095  READ FPDMA QUEUED
  ef 10 03 00 00 00 a0 00      00:38:44.085  SET FEATURES [Enable SATA feature]
  ef 10 02 00 00 00 a0 00      00:38:44.076  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:38:44.049  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 3472 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 70 7d f4 09  Error: UNC at LBA = 0x09f47d70 = 167017840

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 70 7d f4 49 00      00:38:41.598  READ FPDMA QUEUED
  60 00 08 68 7d f4 49 00      00:38:41.566  READ FPDMA QUEUED
  ef 10 03 00 00 00 a0 00      00:38:41.556  SET FEATURES [Enable SATA feature]
  ef 10 02 00 00 00 a0 00      00:38:41.547  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:38:41.520  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 3471 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 68 7d f4 09  Error: UNC at LBA = 0x09f47d68 = 167017832

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 68 7d f4 49 00      00:38:39.013  READ FPDMA QUEUED
  60 00 08 60 7d f4 49 00      00:38:38.987  READ FPDMA QUEUED
  ef 10 03 00 00 00 a0 00      00:38:38.977  SET FEATURES [Enable SATA feature]
  ef 10 02 00 00 00 a0 00      00:38:38.968  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:38:38.941  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%        22         282912
# 2  Extended offline    Completed: read failure       90%        20         282912
# 3  Extended offline    Completed: read failure       90%        18         282912
# 4  Conveyance offline  Completed: read failure       90%        18         282912
# 5  Short offline       Completed: read failure       80%        18         282912
# 6  Extended offline    Completed: read failure       90%        17         282912
# 7  Extended offline    Completed: read failure       90%         2         64400520
# 8  Extended offline    Completed: read failure       90%         2         64400520
# 9  Extended offline    Completed: read failure       90%         2         64400520
#10  Extended offline    Completed without error       00%      9245         -
#11  Short offline       Completed without error       00%      4741         -
#12  Short offline       Completed without error       00%      4667         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Which is a bit confusing to me. The drive is attached to my computer via a USB adaptor with no brand name on it, but smartctl reveals it to be "USB JMicron". Is this "USB JMicron" adaptor causing the "UltraDMA CRC Errors" or other problems? As I'm not sure what caused the earlier data corruption, I am wondering whether this drive is actually safe and reliable based on the info from smartctl. Any help with deciphering the diagnostic info would be much appreciated.
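As a starting point for reading the attribute table above, the raw values of a few error counters can be pulled out mechanically. The sketch below parses lines in the ten-column layout shown in the report and lists watched attributes whose raw value is nonzero. The set of watched IDs is an assumption made for illustration, not a rule from smartmontools, and the sample lines are copied from the table above.

```python
from typing import List, NamedTuple, Optional


class Attribute(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: int


def parse_attribute_line(line: str) -> Optional[Attribute]:
    """Parse one row of the 'smartctl -A' table; return None for other lines."""
    fields = line.split(None, 9)
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    raw_first = fields[9].split()[0]  # "55 (236 20 0)" -> "55"
    if not raw_first.isdigit():
        return None
    return Attribute(int(fields[0]), fields[1], int(fields[3]),
                     int(fields[4]), int(fields[5]), int(raw_first))


# Attribute IDs to watch: an illustrative choice, adjust as needed.
WATCHED_IDS = {5, 187, 197, 198, 199}


def nonzero_watched(lines) -> List[Attribute]:
    """Return watched attributes whose raw value is greater than zero."""
    parsed = (parse_attribute_line(line) for line in lines)
    return [a for a in parsed if a and a.attr_id in WATCHED_IDS and a.raw > 0]


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   086   000    Old_age   Always       -       55 (236 20 0)
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       3990
197 Current_Pending_Sector  0x0012   100   051   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    for attr in nonzero_watched(SAMPLE.splitlines()):
        print(attr.attr_id, attr.name, attr.raw)
```

On the sample rows this lists Reported_Uncorrect and UDMA_CRC_Error_Count, the same two counters that the error log and the HDDScan warning point at.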
Sepp A (73 rep)
Jul 1, 2022, 04:24 PM • Last activity: Jul 1, 2022, 06:36 PM
4 votes
1 answer
3336 views
Linux on Marvell 88SE9230. How to get stats?
I use a Marvell 88SE9230 controller on my home Linux server. HP does have a utility to set up RAID and get some stats, but I'm wondering how to get any status from a Linux system. Quick googling shows only Linux drivers for accessing the array itself on previous kernel versions, but I want to know the SMART status of the drives. Smartctl doesn't work:
root@iris:~# smartctl -a -d marvell -T verypermissive /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-96-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Read Device Identity failed: Unknown error

=== START OF INFORMATION SECTION ===
Device Model:     [No Information Found]
Serial Number:    [No Information Found]
Firmware Version: [No Information Found]
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   [No Information Found]
Local Time is:    Thu Jan 27 19:11:54 2022 MSK
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
                  Checking to be sure by trying SMART RETURN STATUS command.
SMART support is: Unknown - Try option -s with argument 'on' to enable it.
Read SMART Data failed: Success

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: Success
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

Read SMART Error Log failed: Success

Read SMART Self-test Log failed: Success

Selective Self-tests/Logging not supported

How can I get at least some stats from the controller?
Hills of Eternity (88 rep)
Jan 27, 2022, 04:14 PM • Last activity: Apr 3, 2022, 05:47 PM
0 votes
1 answer
2520 views
How to disable smartd
So I have a Mac on which I installed smartmontools to see my SMART data. I thought that smartd would be helpful for running short tests on my Mac's SSD, but I found out via Google that smartd only runs tests at 03:00 am, and there is no way my Mac will be powered on at that time. I understand that smartd is meant for servers that run 24/7, so it is of no use to me. I would like to disable it and write my own simple bash script that runs short tests on my Mac's SSD. Is there any way I can disable or remove smartd without affecting smartctl?
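The do-it-yourself part only needs two smartctl invocations: `-t short` to start a short self-test and `-l selftest` to read the self-test log afterwards. A minimal sketch follows, written in Python here although the same two commands would work from a bash script. The device path `/dev/disk0` is an assumption and has to be replaced with the actual disk.

```python
import shutil
import subprocess

SMARTCTL = "smartctl"
DEVICE = "/dev/disk0"  # assumption: replace with the actual disk device


def short_test_command(device: str) -> list:
    """Command that asks the drive to start a short self-test."""
    return [SMARTCTL, "-t", "short", device]


def selftest_log_command(device: str) -> list:
    """Command that prints the drive's self-test log."""
    return [SMARTCTL, "-l", "selftest", device]


def run(command: list) -> subprocess.CompletedProcess:
    """Run a command and capture its output without raising on failure."""
    return subprocess.run(command, capture_output=True, text=True, check=False)


# Only try to run anything if smartctl is actually installed.
if __name__ == "__main__" and shutil.which(SMARTCTL):
    result = run(short_test_command(DEVICE))
    print(result.stdout or result.stderr)
    # The test runs inside the drive; read the log a few minutes later with:
    print(" ".join(selftest_log_command(DEVICE)))
```

Because the self-test runs inside the drive, the script returns immediately and the result has to be read from the log once the test has finished.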
Jddhhdhdi283838 (1 rep)
Dec 25, 2017, 12:56 PM • Last activity: Feb 13, 2022, 09:02 PM
3 votes
0 answers
1190 views
Linux-image generates Samsung NVMe errors
After upgrading from kernel **5.10** to **5.14** and now to **5.15.3**, the SMART error count on a Samsung 970 Evo NVMe disk (and most likely other Samsung NVMe drives) increases after every boot, like this:

`Error Information Log Entries: 41` <- increases after every boot

`nvme error-log /dev/nvme0`: `status_field : 0x4004(INVALID_FIELD: A reserved coded value or an unsupported value in a defined field)`

Does anyone know if this is a bug? Or maybe the kernel tries to talk to the NVMe disk in a way the disk does not support? Or maybe Samsung disks need to be excluded from this kind of communication? Is there a kernel boot parameter that solves this behavior? It is worth mentioning that this issue is not present in the 5.10 kernel.

---

A bug report was filed on 27 Sep 2021: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995161 but it seems nothing has changed since then.
EdiD (342 rep)
Nov 26, 2021, 11:42 AM • Last activity: Dec 1, 2021, 12:30 PM