Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
0
votes
0
answers
36
views
Filesystem becomes read-only at random
Debian crashed on Laptop (Acer Aspire 3, about 4 years old, HDD replaced with ADATA SU650 240GB SSD) and started throwing console errors reading "failed to rotate /var/log/journal: read-only filesystem".
It rebooted fine, but a while later refused to load websites and eventually crashed again. Right now, it's working fine.
After a quick Google search I installed smartctl to figure out the problem. Although it prints an overall "PASSED", some attributes are listed as "Pre-fail", and I'm not exactly sure how to interpret the rest of the values.
Here's the output:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-37-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Silicon Motion based SSDs
Device Model: ADATA SU650
Serial Number: 2N20292G46UJ
LU WWN Device Id: 0 000000 000000000
Firmware Version: XD0R6305
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic
Device is: In smartctl database 7.3/5319
ATA Version is: ACS-3, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jun 29 21:36:52 2025 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 1) seconds.
Offline data collection
capabilities: (0x59) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 050 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 929
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 1439
160 Uncorrectable_Error_Cnt 0x0032 100 100 050 Old_age Always - 0
161 Valid_Spare_Block_Cnt 0x0032 100 100 050 Old_age Always - 100
163 Initial_Bad_Block_Count 0x0032 100 100 000 Old_age Always - 48
164 Total_Erase_Count 0x0032 100 100 000 Old_age Always - 87382
165 Max_Erase_Count 0x0032 100 100 000 Old_age Always - 156
166 Min_Erase_Count 0x0032 100 100 000 Old_age Always - 44
167 Average_Erase_Count 0x0032 100 100 000 Old_age Always - 109
148 Total_SLC_Erase_Ct 0x0032 100 100 000 Old_age Always - 262148
149 Max_SLC_Erase_Ct 0x0032 100 100 000 Old_age Always - 468
150 Min_SLC_Erase_Ct 0x0032 100 100 000 Old_age Always - 132
151 Average_SLC_Erase_Ct 0x0032 100 100 000 Old_age Always - 329
159 DRAM_1_Bit_Error_Count 0x0032 100 100 000 Old_age Always - 0
168 Max_Erase_Count_of_Spec 0x0032 100 100 000 Old_age Always - 468
169 Remaining_Lifetime_Perc 0x0032 100 100 000 Old_age Always - 98
177 Wear_Leveling_Count 0x0032 100 100 000 Old_age Always - 1823
181 Program_Fail_Cnt_Total 0x0032 100 100 000 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 77
194 Temperature_Celsius 0x0032 100 100 000 Old_age Always - 26
195 Hardware_ECC_Recovered 0x0032 100 100 000 Old_age Always - 403177
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 000 Old_age Always - 100
241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 139845
242 Host_Reads_32MiB 0x0032 100 100 000 Old_age Always - 143114
245 TLC_Writes_32MiB 0x0032 100 100 000 Old_age Always - 296002
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I'd greatly appreciate some advice on what these values mean and what can be done about them. I know that "Old_age" means the device is worn and "Pre-fail" means it's about to give, but I don't really know whether this reflects normal wear or lack of maintenance, or whether it is recoverable.
Thanks in advance!
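For reading a table like the one above: in `smartctl` output the TYPE column (`Pre-fail` / `Old_age`) classifies the attribute itself; it is not a verdict on the drive. What matters is whether the normalized VALUE has dropped to or below THRESH, which is also what the WHEN_FAILED column reports. A minimal sketch of that check, assuming the ten-column layout shown above (the function name `smart_attrs_at_threshold` is made up for illustration):

```shell
#!/bin/sh
# Reads `smartctl -A` style text on stdin and prints every attribute whose
# normalized VALUE is at or below a non-zero THRESH.
# Assumed columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
smart_attrs_at_threshold() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && NF >= 10 {
        value = $4 + 0      # "009" becomes 9
        thresh = $6 + 0     # a threshold of 000 means "no threshold", skip it
        if (thresh > 0 && value <= thresh)
            print $1, $2, $7, "value=" value, "thresh=" thresh
    }'
}
```

Something like `sudo smartctl -A /dev/sda | smart_attrs_at_threshold` would feed it live data. Run against the table posted above it prints nothing, since every VALUE there is 100 and no threshold is higher than 050.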
geistofsttraft
(1 rep)
Jun 30, 2025, 12:45 AM
• Last activity: Jun 30, 2025, 12:46 AM
4
votes
1
answers
3211
views
NVMe errors diagnostics
I would like to understand why I get the below mails about S.M.A.R.T. of my new NVMe drive.
**DMESG**
$ dmesg --ctime | grep -i nvm
[Mon Aug 8 10:48:31 2022] nvme nvme0: pci function 0000:3d:00.0
[Mon Aug 8 10:48:31 2022] nvme nvme0: missing or invalid SUBNQN field.
[Mon Aug 8 10:48:31 2022] nvme nvme0: Shutdown timeout set to 8 seconds
[Mon Aug 8 10:48:31 2022] nvme nvme0: 8/0/0 default/read/poll queues
[Mon Aug 8 10:48:31 2022] nvme0n1: p1 p2
[Mon Aug 8 10:48:37 2022] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Mon Aug 8 10:48:37 2022] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
**NVME ERRORS**
$ sudo nvme error-log /dev/nvme0
...
Entry
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
...
Could anyone shed some light on why I am getting new mails like this:
**MAIL**
# mail
Message 44:
From root@dell-laptop-CENSORED Sun Aug 7 08:13:07 2022
X-Original-To: root
To: root@dell-laptop-CENSORED
Subject: SMART error (ErrorCount) detected on host: dell-inspiron-15
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Date: Sun, 7 Aug 2022 08:12:59 +0200 (CEST)
From: root
This message was generated by the smartd daemon running on:
host name: dell-inspiron-15
DNS domain: [Empty]
The following warning/error was logged by the smartd daemon:
Device: /dev/nvme0, number of Error Log entries increased from 485 to 486
Device info:
Samsung SSD 970 EVO Plus 2TB, S/N:, FW:2B2QEXM7, 2.00 TB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Apr 22 09:53:56 2022 CEST
Another message will be sent in 24 hours if the problem persists.
**SMART**
# smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-43-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO Plus 2TB
Serial Number:
Firmware Version: 2B2QEXM7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization: 544,784,187,392 [544 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5221904ad7
Local Time is: Mon Aug 8 11:13:10 2022 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 7.50W - - 0 0 0 0 0 0
1 + 5.90W - - 1 1 1 1 0 0
2 + 3.60W - - 2 2 2 2 0 0
3 - 0.0700W - - 3 3 3 3 210 1200
4 - 0.0050W - - 4 4 4 4 2000 8000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 44 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 5,565,230 [2.84 TB]
Data Units Written: 2,658,490 [1.36 TB]
Host Read Commands: 29,877,415
Host Write Commands: 18,211,598
Controller Busy Time: 112
Power Cycles: 240
Power On Hours: 215
Unsafe Shutdowns: 5
Media and Data Integrity Errors: 0
Error Information Log Entries: 502
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 44 Celsius
Temperature Sensor 2: 39 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 502 0 0x1005 0x4004 - 0 0 -
**SYSLOG**
# cat /var/log/syslog | grep -i smart | grep -i nvm
Aug 7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug 7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug 7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 7 16:08:27 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 7 16:08:27 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 7 16:08:28 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 486 to 487
Aug 7 16:08:28 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug 8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug 8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 07:17:38 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 487 to 488
Aug 8 07:17:38 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 08:21:16 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug 8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug 8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 11:14:00 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 8 11:14:00 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 488 to 494
Aug 8 11:14:01 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, opened
Aug 8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug 8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug 8 12:48:40 dell-inspiron-15 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, number of Error Log entries increased from 494 to 502
Aug 8 12:48:40 dell-inspiron-15 smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
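In the syslog excerpt above, every increase is logged right after smartd opens the device. To see how fast the counter grows, the increments can be pulled out of the log. This is only a sketch: it assumes the exact smartd wording shown above, and the function name `error_log_growth` is hypothetical.

```shell
#!/bin/sh
# Reads syslog lines on stdin and prints one "old -> new (+delta)" line for
# every smartd "Error Log entries increased" message.
error_log_growth() {
    sed -n 's/.*number of Error Log entries increased from \([0-9][0-9]*\) to \([0-9][0-9]*\).*/\1 \2/p' |
        awk '{ printf "%d -> %d (+%d)\n", $1, $2, $2 - $1 }'
}
```

Usage would be along the lines of `grep -i smartd /var/log/syslog | error_log_growth`.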
Vlastimil Burián
(30505 rep)
Aug 8, 2022, 09:22 AM
• Last activity: Oct 29, 2024, 10:53 AM
8
votes
2
answers
7379
views
SMART health-test and status
I have an external USB-drive which is giving me the following output on running the command
$ smartctl /dev/sdb -H
on it:
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
Could you elaborate if this is something to worry about or if it is just a wrong setting? Generally, what is the meaning of the health status in simplified form?
Maybe as a relevant aside: The short and long tests finish without issues.
user3058865
(183 rep)
Aug 6, 2017, 02:34 PM
• Last activity: Apr 2, 2024, 07:41 PM
24
votes
5
answers
25365
views
Linux - Repairing bad blocks on a RAID1 array with GPT
The tl;dr: how would I go about fixing a bad block on 1 disk in a RAID1 array?
But please read this whole thing for what I've tried already and possible errors in my methods. I've tried to be as detailed as possible, and I'm really hoping for some feedback.
This is my situation: I have two 2TB disks (same model) set up in a RAID1 array managed by `mdadm`. About 6 months ago I noticed the first bad block when SMART reported it. Today I noticed more, and am now trying to fix it.
This HOWTO page seems to be the one article everyone links to to fix bad blocks that SMART is reporting. It's a great page, full of info, however it is fairly outdated and doesn't address my particular setup. Here is how my config is different:
* Instead of one disk, I'm using two disks in a RAID1 array. One disk is reporting errors while the other is fine. The HOWTO is written with only one disk in mind, which brings up various questions such as 'do I use this command on the disk device or the RAID device'?
* I'm using GPT, which fdisk does not support. I've been using gdisk instead, and I'm hoping that it is giving me the same info that I need.
So, let's get down to it. This is what I have done, however it doesn't seem to be working. Please feel free to double check my calculations and method for errors. The disk reporting errors is /dev/sda:
# smartctl -l selftest /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.4.4-2-ARCH] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 12169 3212761936
With this, we gather that the error resides on LBA 3212761936. Following the HOWTO, I use gdisk to find the start sector to be used later in determining the block number (as I cannot use fdisk since it does not support GPT):
# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.5
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): CFB87C67-1993-4517-8301-76E16BBEA901
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
Number Start (sector) End (sector) Size Code Name
1 2048 3907029134 1.8 TiB FD00 Linux RAID
Using `tunefs` I find the blocksize to be `4096`. Using this info and the calculation from the HOWTO, I conclude that the block in question is `((3212761936 - 2048) * 512) / 4096 = 401594986`.
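The arithmetic above can be reproduced in the shell. This is a minimal sketch with a made-up helper name (`lba_to_fs_block`). It assumes the filesystem begins at the very first sector of the partition, which may not hold for a member of an md array with 1.2 metadata, where the data area starts at an offset inside the partition.

```shell
#!/bin/sh
# Convert a disk LBA to a filesystem block number.
# Arguments: LBA, partition start sector, sector size, filesystem block size.
# Assumption: the filesystem starts exactly at the partition start.
lba_to_fs_block() {
    lba=$1
    part_start=$2
    sector_size=$3
    fs_block_size=$4
    echo $(( (lba - part_start) * sector_size / fs_block_size ))
}

# Values from the smartctl and gdisk output above.
lba_to_fs_block 3212761936 2048 512 4096   # prints 401594986
```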
The HOWTO then directs me to `debugfs` to see if the block is in use (I use the RAID device as it needs an EXT filesystem, this was one of the commands that confused me as I did not, at first, know if I should use /dev/sda or /dev/md0):
# debugfs
debugfs 1.42.4 (12-June-2012)
debugfs: open /dev/md0
debugfs: testb 401594986
Block 401594986 not in use
So block 401594986 is empty space, I should be able to write over it without problems. Before writing to it, though, I try to make sure that it, indeed, cannot be read:
# dd if=/dev/sda1 of=/dev/null bs=4096 count=1 seek=401594986
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000198887 s, 20.6 MB/s
If the block could not be read, I wouldn't expect this to work. However, it does. I repeat using `/dev/sda`, `/dev/sda1`, `/dev/sdb`, `/dev/sdb1`, `/dev/md0`, and +-5 to the block number to search around the bad block. It all works. I shrug my shoulders and go ahead and commit the write and sync (I use /dev/md0 because I figured modifying one disk and not the other might cause issues, this way both disks overwrite the bad block):
# dd if=/dev/zero of=/dev/md0 bs=4096 count=1 seek=401594986
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000142366 s, 28.8 MB/s
# sync
I would expect that writing to the bad block would have the disks reassign the block to a good one, however running another SMART test shows differently:
# 1 Short offline Completed: read failure 90% 12170 3212761936
Back to square 1. So basically, how would I fix a bad block on 1 disk in a RAID1 array? I'm sure I've not done something correctly...
Thanks for your time and patience.
----------
EDIT 1:
-------
I've tried to run a long SMART test, with the same LBA returning as bad (the only difference is it reports 30% remaining rather than 90%):
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 30% 12180 3212761936
# 2 Short offline Completed: read failure 90% 12170 3212761936
I've also used badblocks, with the following output. The output is strange and seems to be misformatted, but I tried to test the numbers it printed as blocks, and debugfs gives an error:
# badblocks -sv /dev/sda
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test): 1606380968ne, 3:57:08 elapsed. (0/0/0 errors)
1606380969ne, 3:57:39 elapsed. (1/0/0 errors)
1606380970ne, 3:58:11 elapsed. (2/0/0 errors)
1606380971ne, 3:58:43 elapsed. (3/0/0 errors)
done
Pass completed, 4 bad blocks found. (4/0/0 errors)
# debugfs
debugfs 1.42.4 (12-June-2012)
debugfs: open /dev/md0
debugfs: testb 1606380968
Illegal block number passed to ext2fs_test_block_bitmap #1606380968 for block bitmap for /dev/md0
Block 1606380968 not in use
Not sure where to go from here. `badblocks` definitely found something, but I'm not sure what to do with the information presented...
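One way to read those numbers: `badblocks` was run on the whole disk (`/dev/sda`) with its default block size of 1024 bytes, so its block numbers count 1 KiB units from the start of the disk, while `debugfs` expects 4096-byte filesystem blocks. A rough sketch of the conversion, with hypothetical helper names and the same assumption as the HOWTO calculation (filesystem starting at the partition's first sector):

```shell
#!/bin/sh
# badblocks block number (whole disk) -> disk LBA.
# Arguments: block number, badblocks block size, sector size.
badblock_to_lba() {
    echo $(( $1 * $2 / $3 ))
}

# Disk LBA -> filesystem block.
# Arguments: LBA, partition start sector, sector size, filesystem block size.
lba_to_fs_block() {
    echo $(( ($1 - $2) * $3 / $4 ))
}

lba=$(badblock_to_lba 1606380968 1024 512)
echo "$lba"                              # prints 3212761936
lba_to_fs_block "$lba" 2048 512 4096     # prints 401594986
```

So the first block reported by `badblocks` is the same sector SMART reports, and the four consecutive 1 KiB blocks add up to a single 4096-byte filesystem block.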
----------
EDIT 2
------
More commands and info.
I feel like an idiot forgetting to include this originally. This is SMART values for `/dev/sda`. I have 1 Current_Pending_Sector, and 0 Offline_Uncorrectable.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 166
2 Throughput_Performance 0x0026 055 055 000 Old_age Always - 18345
3 Spin_Up_Time 0x0023 084 068 025 Pre-fail Always - 5078
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 75
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 12224
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 75
181 Program_Fail_Cnt_Total 0x0022 100 100 000 Old_age Always - 1646911
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 12
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 059 000 Old_age Always - 36 (Min/Max 22/41)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 252 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 30
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 77
# mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu May 5 06:30:21 2011
Raid Level : raid1
Array Size : 1953512383 (1863.01 GiB 2000.40 GB)
Used Dev Size : 1953512383 (1863.01 GiB 2000.40 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Tue Jul 3 22:15:51 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : server:0 (local to host server)
UUID : e7ebaefd:e05c9d6e:3b558391:9b131afb
Events : 67889
Number Major Minor RaidDevice State
2 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
As per one of the answers: it would seem I did switch `seek` and `skip` for `dd`. I was using seek as that's what is used with the HOWTO. Using this command causes `dd` to hang:
# dd if=/dev/sda1 of=/dev/null bs=4096 count=1 skip=401594986
Using blocks around that one (..84, ..85, ..87, ..88) seems to work just fine, and using /dev/sdb1 with block `401594986` reads just fine as well (as expected as that disk passed SMART testing). Now, the question that I have is: When writing over this area to reassign the blocks, do I use `/dev/sda1` or `/dev/md0`? I don't want to cause any issues with the RAID array by writing directly to one disk and not having the other disk update.
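The `seek`/`skip` difference is easy to check on a scratch file instead of a real disk: `skip=` moves the read position in the input, while `seek=` moves the write position in the output and leaves the input at block 0. A small self-contained sketch (the temp files and the `read_block` helper are made up for the demo):

```shell
#!/bin/sh
src=$(mktemp)
dst=$(mktemp)
trap 'rm -f "$src" "$dst"' EXIT

# Build a 4-block test file: block 0 is all "A", block 1 all "B", and so on.
for ch in A B C D; do
    dd if=/dev/zero bs=4096 count=1 2>/dev/null | tr '\0' "$ch"
done > "$src"

# Print the first byte of block $2 of file $1, using skip= on the input.
read_block() {
    dd if="$1" bs=4096 skip="$2" count=1 2>/dev/null | cut -c1
}

read_block "$src" 2    # prints C: skip=2 really reads the third block

# With seek=2 the input is still read from block 0; only the output is offset.
dd if="$src" of="$dst" bs=4096 seek=2 count=1 2>/dev/null
read_block "$dst" 2    # prints A: block 0 of the input landed in block 2 of the output
```

That matches what happened earlier: the read test with `seek=` read block 0 of the device and never touched the suspect block.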
EDIT 3
------
Writing to the block directly produced filesystem errors. I've chosen an answer that solved the problem quickly:
# 1 Short offline Completed without error 00% 14211 -
# 2 Extended offline Completed: read failure 30% 12244 3212761936
Thanks to everyone who helped. =)
blitzmann
(405 rep)
Jul 3, 2012, 10:24 PM
• Last activity: Feb 8, 2024, 02:00 PM
2
votes
0
answers
490
views
Are SMART offline data collection and offline attributes obsolete?
**TLDR;**
I tried to understand the difference between SMART `Offline` and `Always` attributes, and thus to understand what SMART offline data collection is, and if I should enable it on my HDDs.
`smartmontools`' official wiki states that:
> Note that the SMART automatic offline test command is listed as Obsolete in every version of the ATA and ATA/ATAPI Specifications. (...) However it is implemented and used by many vendors.
After some extensive reading on the web, and also some tests, the conclusion I reached is:
- Nowadays SMART offline data collection is obsolete
- All data is updated in real time (e.g. `Offline` and `Always` attributes behave the same way)
- There is no need to enable "Auto Offline Data Collection" (`# smartctl --offlineauto=on /dev/sda`), nor to ever launch it manually (`# smartctl -t offline /dev/sda`).
- As for the reason all this offline stuff is still in `smartmontools`, it's probably to keep it compatible with some very old HDDs that indeed implemented real offline attributes.
Am I right? Or am I missing something?
----------
**MORE DETAILS**
I did some tests on a HDD, which has 3 offline attributes (and has `Auto Offline Data Collection` disabled):
# smartctl -a /dev/sda
(...)
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
(...)
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 235 (114 97 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 13381561756
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 20472945077
(...)
I then wrote some data on that drive, and noted that all 3 attributes were updated in real time. Thus, they are in fact online (or `Always`) attributes, and not `Offline` ones.
I did the same test on a few other HDDs, the behavior was identical.
ChennyStar
(1969 rep)
Nov 6, 2023, 04:55 PM
• Last activity: Nov 6, 2023, 05:01 PM
2
votes
1
answers
3395
views
How can I view the smart logs for an NVMe disk in Linux when smartctl is showing there are errors?
My daily driver (Debian Bookworm RC3 + KDE Plasma) is configured to send me emails containing error notifications.
Today, I received the following email:
This message was generated by the smartd daemon running on:
host name: desk
DNS domain: local.lan
The following warning/error was logged by the smartd daemon:
Device: /dev/nvme0, number of Error Log entries increased from 1754 to 1758
Device info:
KBG30ZMV256G TOSHIBA, S/N:X8OPD1PGP12P, FW:ADHA0101
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Wed May 17 16:09:04 2023 EDT
Another message will be sent in 24 hours if the problem persists.
This is what `sudo journalctl -t smart` shows:
May 20 15:19:47 desk smartd: smartd 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-9-amd64] (local build)
May 20 15:19:47 desk smartd: Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
May 20 15:19:47 desk smartd: Opened configuration file /etc/smartd.conf
May 20 15:19:47 desk smartd: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
May 20 15:19:47 desk smartd: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
May 20 15:19:47 desk smartd: Device: /dev/sda, type changed from 'scsi' to 'sat'
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], opened
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], CT4000MX500SSD1, S/N:2304E6A3D318, WWN:5-00a075-1e6a3d318, FW:M3CR045, 4.00 TB
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], not found in smartd database 7.3/5319.
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
May 20 15:19:47 desk smartd: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.CT4000MX500SSD1-2304E6A3D318.ata.state
May 20 15:19:47 desk smartd: Device: /dev/nvme0, opened
May 20 15:19:47 desk smartd: Device: /dev/nvme0, KBG30ZMV256G TOSHIBA, S/N:X8OPD1PGP12P, FW:ADHA0101
May 20 15:19:47 desk smartd: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
May 20 15:19:47 desk smartd: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.KBG30ZMV256G_TOSHIBA-X8OPD1PGP12P.nvme.state
May 20 15:19:47 desk smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
May 20 15:19:48 desk smartd: Device: /dev/nvme0, number of Error Log entries increased from 1754 to 1758
May 20 15:19:48 desk smartd: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
May 20 15:19:48 desk smartd: Warning via /usr/share/smartmontools/smartd-runner to root: successful
May 20 15:19:48 desk smartd: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.CT4000MX500SSD1-2304E6A3D318.ata.state
May 20 15:19:48 desk smartd: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.KBG30ZMV256G_TOSHIBA-X8OPD1PGP12P.nvme.state
May 20 15:49:48 desk smartd: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 74
May 20 22:49:48 desk smartd: Device: /dev/nvme0, number of Error Log entries increased from 1758 to 1760
When I run `sudo smartctl -i -a /dev/nvme0`, it shows me the error count, but I can't figure out how to see the log message associated with the increased count:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-9-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KBG30ZMV256G TOSHIBA
Serial Number: X8OPD1PGP12P
Firmware Version: ADHA0101
PCI Vendor/Subsystem ID: 0x1179
IEEE OUI Identifier: 0x00080d
Controller ID: 0
NVMe Version: 1.2.1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 00080d 04004ad9aa
Local Time is: Sat May 20 23:09:32 2023 EDT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0017): Comp Wr_Unc DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x02): Cmd_Eff_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 3.30W - - 0 0 0 0 0 0
1 + 2.70W - - 1 1 1 1 0 0
2 + 2.30W - - 2 2 2 2 0 0
3 - 0.0500W - - 4 4 4 4 8000 32000
4 - 0.0050W - - 4 4 4 4 8000 40000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 - 4096 0 0
1 + 512 0 3
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 32 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 30%
Data Units Read: 23,188,612 [11.8 TB]
Data Units Written: 39,727,036 [20.3 TB]
Host Read Commands: 222,771,983
Host Write Commands: 498,052,687
Controller Busy Time: 7,440
Power Cycles: 291
Power On Hours: 20,378
Unsafe Shutdowns: 615
Media and Data Integrity Errors: 0
Error Information Log Entries: 1,760
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 32 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 1760 0 0x501a 0xc005 0x028 - 1 -
1 1759 0 0xb012 0xc005 0x028 - 1 -
2 1758 0 0x5010 0xc005 0x028 - 0 -
How can I figure out what the errors are?
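One way to read the Status and PELoc columns by hand is sketched below. It assumes smartctl prints the raw 16-bit Status Field of each NVMe Error Information log entry, with bit 0 as the phase tag and the status proper in bits 15:1, as laid out in the NVMe base specification. The function names and the small lookup table (a few generic status codes only) are made up for illustration and are not part of smartmontools.

```python
# Hypothetical helper: decode the "Status" and "PELoc" columns of the
# error-log rows shown above. Assumes the raw 16-bit Status Field layout
# from the NVMe base specification.

GENERIC_STATUS = {  # Status Code Type 0 (generic), partial table
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x04: "Data Transfer Error",
}


def decode_status(raw: int) -> dict:
    """Split a raw status field into its sub-fields."""
    sc = (raw >> 1) & 0xFF    # bits 8:1, Status Code
    sct = (raw >> 9) & 0x7    # bits 11:9, Status Code Type
    if sct == 0:
        text = GENERIC_STATUS.get(sc, "unknown generic status")
    else:
        text = "non-generic status, see the specification"
    return {
        "phase": raw & 0x1,                       # bit 0, Phase Tag
        "sc": sc,
        "sct": sct,
        "more": bool((raw >> 14) & 0x1),          # bit 14, More
        "do_not_retry": bool((raw >> 15) & 0x1),  # bit 15, Do Not Retry
        "text": text,
    }


def decode_peloc(raw: int) -> tuple:
    """Parameter Error Location: (byte offset, bit offset) in the command."""
    return raw & 0xFF, (raw >> 8) & 0x7


status = decode_status(0xC005)
byte, bit = decode_peloc(0x028)
print(status["text"], "| command byte", byte, "bit", bit)
```

Under these assumptions, 0xc005 carries generic status code 0x02, and 0x028 points at byte 40 of the submitted command, which is where Command Dword 10 begins. Which command the host actually sent is not recorded in this table.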
IMTheNachoMan
(433 rep)
May 21, 2023, 03:12 AM
• Last activity: Sep 28, 2023, 08:38 AM
1
votes
1
answers
232
views
SMART error (CurrentPendingSector) and (OfflineUncorrectableSector)
I have been receiving the following error messages every day for several months now, and I do not know how to stop receiving these messages.
CurrentPendingSector
This message was generated by the smartd daemon running on:
host name: myhost
DNS domain: [Empty]
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 6 Currently unreadable (pending) sectors
Device info:
KingFast, S/N:03112222C0002, FW:U0803A0, 256 GB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Feb 3 19:41:29 2023 PST
Another message will be sent in 24 hours if the problem persists.
OfflineUncorrectableSector
This message was generated by the smartd daemon running on:
host name: myhost
DNS domain: [Empty]
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 3 Offline uncorrectable sectors
Device info:
KingFast, S/N:03112222C0002, FW:U0803A0, 256 GB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Feb 3 19:41:29 2023 PST
Another message will be sent in 24 hours if the problem persists.
smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: KingFast
Serial Number: 03112222C0002
Firmware Version: U0803A0
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Jul 8 15:44:59 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 6
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 3335
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 440
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 3
161 Unknown_Attribute 0x0033 100 100 050 Pre-fail Always - 86
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 26
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 79004
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 481
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 6
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 114
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 5050
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 98
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 0
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 6
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 88
194 Temperature_Celsius 0x0022 100 100 050 Old_age Always - 35
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 3
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 6
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 3
199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 86
241 Total_LBAs_Written 0x0030 100 100 050 Old_age Offline - 168900
242 Total_LBAs_Read 0x0030 100 100 050 Old_age Offline - 815543
245 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 191939
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3329 -
# 2 Short offline Completed without error 00% 3325 -
# 3 Short offline Completed without error 00% 3321 -
# 4 Short offline Completed without error 00% 3313 -
# 5 Short offline Completed without error 00% 3309 -
# 6 Short offline Completed without error 00% 3306 -
# 7 Extended offline Completed without error 00% 3250 -
# 8 Extended offline Completed without error 00% 3232 -
# 9 Extended offline Completed without error 00% 3229 -
#10 Extended offline Completed without error 00% 976 -
#11 Extended offline Completed without error 00% 968 -
Selective Self-tests/Logging not supported
I have tried to ignore the 197 and 198 errors in /etc/smartd.conf with
/dev/sda -d removable -n standby -H -l error -l selftest -f -t -I 197 -I 198 -s (S/../.././(01|09|17)|L/../../3/11) -m root -M exec /usr/share/smartmontools/smartd-runner
to no avail.
I also do not see any LBA_of_first_error in the self-test section.
To me, the line `SMART overall-health self-assessment test result: PASSED` indicates a healthy drive, and the self-tests return no errors. My current understanding is that the disk appears to be healthy but is still sending these messages erroneously.
Is there something that I'm missing?
The /dev/sda drive is a KingFast 256 GB SSD, and I'm not sure if this would be relevant as I could not find anything online for this particular drive or manufacturer.
How would I be able to stop receiving these messages but still have SMART monitoring for other genuine issues on the drive, and how would I fix the issue if this error message really does indicate some problem with the drive?
Thanks!
Edit:
After running smartctl -t long /dev/sda, I have
smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: KingFast
Serial Number: 03112222C0002
Firmware Version: U0803A0
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jul 9 10:05:33 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x03) Offline data collection activity
is in progress.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 241) Self-test routine in progress...
10% of test remaining.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 6
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 3341
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 441
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 3
161 Unknown_Attribute 0x0033 100 100 050 Pre-fail Always - 86
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 26
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 79553
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 482
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 6
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 115
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 5050
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 98
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 0
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 6
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 88
194 Temperature_Celsius 0x0022 100 100 050 Old_age Always - 46
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 3
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 6
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 3
199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 86
241 Total_LBAs_Written 0x0030 100 100 050 Old_age Offline - 170468
242 Total_LBAs_Read 0x0030 100 100 050 Old_age Offline - 815560
245 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 193199
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3337 -
# 2 Short offline Completed without error 00% 3329 -
# 3 Short offline Completed without error 00% 3325 -
# 4 Short offline Completed without error 00% 3321 -
# 5 Short offline Completed without error 00% 3313 -
# 6 Short offline Completed without error 00% 3309 -
# 7 Short offline Completed without error 00% 3306 -
# 8 Extended offline Completed without error 00% 3250 -
# 9 Extended offline Completed without error 00% 3232 -
#10 Extended offline Completed without error 00% 3229 -
#11 Extended offline Completed without error 00% 976 -
#12 Extended offline Completed without error 00% 968 -
Selective Self-tests/Logging not supported
The #12 Extended offline test Completed without error, so I'm not really sure what I'm supposed to do from here.
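To check whether the flagged counters are moving at all, the two attribute tables above can be compared. This is a rough sketch using only a handful of rows copied from the two outputs; the `parse_raw` and `changed` helper names are invented for illustration.

```python
# Compare RAW_VALUE between two `smartctl -a` attribute tables.
# Rows are copied from the two outputs above (before and after the long test).

BEFORE = """\
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 6
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 3335
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 3
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 6
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 3
"""

AFTER = """\
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 6
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 3341
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 3
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 6
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 3
"""


def parse_raw(table: str) -> dict:
    """Map attribute ID -> (name, raw value) for each table row."""
    rows = {}
    for line in table.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            rows[int(fields[0])] = (fields[1], int(fields[9]))
    return rows


def changed(before: dict, after: dict) -> dict:
    """Return {id: (name, old, new)} for attributes whose raw value moved."""
    return {
        attr_id: (name, before[attr_id][1], new)
        for attr_id, (name, new) in after.items()
        if attr_id in before and before[attr_id][1] != new
    }


delta = changed(parse_raw(BEFORE), parse_raw(AFTER))
for attr_id, (name, old, new) in sorted(delta.items()):
    print(f"{attr_id:>3} {name}: {old} -> {new}")
```

In this sample only Power_On_Hours moves; attributes 197 and 198 stay at 6 and 3 across the extended test.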
**Edit #2:**
I have also run the following, which I believe indicates that there are no errors with the drive:
badblocks -sv /dev/sda
Checking blocks 0 to 250059095
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)
dd if=/dev/sda of=/dev/null bs=64K conv=noerror
3907173+1 records in
3907173+1 records out
256060514304 bytes (256 GB, 238 GiB) copied, 485.648 s, 527 MB/s
jameszp
(93 rep)
Jul 9, 2023, 03:04 AM
• Last activity: Jul 18, 2023, 06:01 PM
7
votes
5
answers
8769
views
S.M.A.R.T shows high Load_Cycle_Count | Why and how to prevent the number from increasing?
I just realized that **some of my HDDs have a huge Load_Cycle_Count** when reading out their S.M.A.R.T. data. Some are close to failing, and I am wondering why this is the case and whether there is anything I can do to prevent them from dying.
alex@ga-P55A-UD5:~$ sudo smartctl -a /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF)
Device Model: WDC WD10EARS-00Y5B1
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
4 Start_Stop_Count 0x0032 090 090 000 Old_age Always - 10281
9 Power_On_Hours 0x0032 062 062 000 Old_age Always - 28456
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 611460
alex@ga-P55A-UD5:~$ sudo smartctl -a /dev/sdc
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green
Device Model: WDC WD6400AADS-00M2B0
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
4 Start_Stop_Count 0x0032 093 093 000 Old_age Always - 7615
9 Power_On_Hours 0x0032 057 057 000 Old_age Always - 31743
193 Load_Cycle_Count 0x0032 053 053 000 Old_age Always - 442121
alex@silent-ssd:~$ sudo smartctl -a /dev/sdd
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-142-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Green
Device Model: WDC WD20EARX-00PASB0
[...]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2477
9 Power_On_Hours 0x0032 085 085 000 Old_age Always - 11176
193 Load_Cycle_Count 0x0032 181 181 000 Old_age Always - 59646
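The raw counts are easier to compare once they are divided by the time each drive has been powered on. A small sketch using the Power_On_Hours and Load_Cycle_Count raw values quoted above; it assumes both raw values are plain counters (hours and cycles), and the `cycles_per_hour` helper is illustrative only.

```python
# Load cycles per power-on hour, from the raw values quoted above.

DRIVES = {
    # device: (Power_On_Hours raw, Load_Cycle_Count raw)
    "/dev/sdb": (28456, 611460),
    "/dev/sdc": (31743, 442121),
    "/dev/sdd": (11176, 59646),
}


def cycles_per_hour(power_on_hours: int, load_cycles: int) -> float:
    """Average head load/unload cycles per powered-on hour."""
    return load_cycles / power_on_hours


rates = {dev: cycles_per_hour(*raw) for dev, raw in DRIVES.items()}
for dev, rate in rates.items():
    print(f"{dev}: {rate:.1f} load cycles per power-on hour")
```

On these numbers the three drives average roughly 21, 14 and 5 load cycles per powered-on hour, so /dev/sdb parks its heads about four times as often as /dev/sdd.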
AlexOnLinux
(725 rep)
Mar 4, 2019, 10:42 AM
• Last activity: Jun 19, 2023, 11:27 AM
1
votes
1
answers
188
views
Does my disk support SMART?
I'm confused about this smartctl output. It says SMART status is not supported, but then it says it PASSED.
# sudo smartctl -H -d megaraid,24 /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.59.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
# echo $?
4
According to the man page, status code 4 means prefail Attribute is less than the danger threshold.
EXIT STATUS
...
...
Bit 4: We found prefail Attributes <= threshold.
So I'm confused, is SMART data available on this disk or not?
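The smartctl exit status is a bitmask, so the number printed by `echo $?` has to be split into bits before it is looked up in the EXIT STATUS list. A value of 4 is 2**2, that is bit 2, whereas bit 4 would show up as 16. The sketch below does the split; the descriptions are shortened paraphrases of the man page entries, and the `set_bits` helper name is made up.

```python
# Decode a smartctl exit status into the bits described in the
# EXIT STATUS section of the smartctl man page (descriptions abridged).

EXIT_BITS = {
    0: "Command line did not parse",
    1: "Device open failed, or device did not return an IDENTIFY DEVICE structure",
    2: "Some SMART or other ATA command to the disk failed, or checksum error",
    3: 'SMART status check returned "DISK FAILING"',
    4: "Prefail Attributes found <= threshold",
    5: "Some Attributes have been <= threshold at some time in the past",
    6: "The device error log contains records of errors",
    7: "The device self-test log contains records of errors",
}


def set_bits(status: int) -> list:
    """Return the bit numbers that are set in an exit status."""
    return [bit for bit in EXIT_BITS if status & (1 << bit)]


for bit in set_bits(4):
    print(f"bit {bit}: {EXIT_BITS[bit]}")
```

Read this way, the status of 4 above says that a command to the disk failed, which fits the "SMART Status not supported" line in the output, rather than reporting an attribute below its threshold.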
Timothy Pulliam
(3953 rep)
May 9, 2023, 04:36 PM
• Last activity: May 9, 2023, 10:05 PM
0
votes
0
answers
269
views
Which hard drive failed SMART? Synology NAS
I had 2 x 4TB Red Pro drives in a Synology NAS. One of them was reported as failing (a couple of years ago) by Synology. I believe it was a SMART failure. So I pulled out both drives, bought a new one, and was able to copy all data from the drive that was failing (not failed yet). Both 4 TB drives seem to be working fine when mounted on a Linux machine. I can copy data to and from the one reported as "failing" by Synology. The thing is, I'm not 100% sure which drive was reported as "failing", as it was a couple of years ago.
Is there a command/test I can run to figure out which drive is failing so I can exclude it (or discard it) from my primary NAS?
I tried smartctl -a /dev/ for my drives and the self-assessment result says "Passed" and I see no errors there.
Update: After running the test recommended by @meuh - "smartctl -t short " I get the following error:
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 21687 8359136
Curious101
(311 rep)
Apr 16, 2023, 08:31 PM
• Last activity: Apr 19, 2023, 04:13 AM
0
votes
2
answers
797
views
SSD's SMART Status not supported behind DELL PERC H730 Mini controller
I would like to output the temperature for each of my drives (NVMe, SATA, SAS) in my Dell R630, but I couldn't get the temperature of my SATA **Samsung SSD 870 EVO 250GB** (/dev/sdc), which is behind a DELL PERC H730 Mini controller.
The hddtemp command shows:
/dev/sda: SAMSUNG AREA7680S5xnNTRI: 37°C
/dev/sdb: SAMSUNG AREA7680S5xnNTRI: 36°C
/dev/sdc: DELL PERC H730 Mini: S.M.A.R.T. not available
When I tried to use smartctl, it shows:
Smartctl open device: /dev/sdc failed: DELL or MegaRaid controller, please try adding '-d megaraid,N'
I then used smartctl -a -d megaraid,0 /dev/sdc
It does show my device name correctly:
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 870 EVO 250GB
and
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
but SMART status shows:
=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
May I know how to find out the temperature of the SSD behind DELL PERC H730 Mini controller?
JCTL
(3 rep)
Mar 30, 2023, 07:26 AM
• Last activity: Mar 30, 2023, 12:28 PM
1
votes
2
answers
3301
views
Cannot get smartctl working
On my Debian wheezy server I use a **software RAID 1** with two hard disks, /dev/sda3 and /dev/sdb3, combined into /dev/md2:
mdadm --detail /dev/md2
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
The raid seems to be fine, but on one of the disks SMART is not running:
smartctl --all /dev/sda says:
SMART support is: Available - device has SMART capability.
SMART support is: Disabled
While /dev/sdb gives a lot of SMART information.
I tried to start it with
smartctl -s on /dev/sda -T verypermissive not working
But it doesn't start:
Error SMART Enable failed: scsi error aborted command
Smartctl: SMART Enable Failed.
How can I get it running? Or does it mean the disk has a problem?
rubo77
(30435 rep)
Feb 8, 2015, 10:59 PM
• Last activity: Nov 1, 2022, 02:41 PM
2
votes
4
answers
866
views
smartmontools: Should I replace my SSHD?
Today, when I was watching a video in Firefox, the following window suddenly popped up:
[![enter image description here][1]][1]
[1]: https://i.sstatic.net/X1UA6.jpg
Or the output from GSmartControl:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.0-22-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Laptop SSHD
Device Model: ST500LM000-1EJ162-SSHD
Serial Number: W3715AR9
LU WWN Device Id: 5 000c50 06e236b9f
Firmware Version: HPD3
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Oct 23 14:41:09 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 634) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 99) minutes.
SCT capabilities: (0x1081) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 118 099 006 - 195697992
3 Spin_Up_Time PO---K 099 099 000 - 0
4 Start_Stop_Count -O--CK 093 093 020 - 7676
5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
7 Seek_Error_Rate POSR-K 082 060 030 - 4473742513
9 Power_On_Hours -O--CK 087 087 000 - 11853
10 Spin_Retry_Count PO--CK 100 100 097 - 0
12 Power_Cycle_Count -O--CK 093 093 020 - 7668
180 Unknown_HDD_Attribute -O-R-K 100 100 000 - 64025461
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error PO--CK 100 100 097 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 099 000 - 2
189 High_Fly_Writes -O-RCK 063 063 000 - 37
190 Airflow_Temperature_Cel -O---K 069 055 045 - 31 (Min/Max 28/32)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 228
193 Load_Cycle_Count -O--CK 097 097 000 - 7777
194 Temperature_Celsius -O---K 031 045 000 - 31 (0 14 0 0 0)
196 Reallocated_Event_Count -O--CK 100 100 000 - 0
197 Current_Pending_Sector -O--CK 100 100 000 - 16
198 Offline_Uncorrectable ----CK 100 100 000 - 16
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
254 Free_Fall_Sensor -O--CK 100 100 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 5 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x24 GPL R/O 1223 Current Device Internal Status Data log
0x25 GPL R/O 1223 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa1 GPL,SL VS 20 Device vendor specific log
0xa2 GPL VS 3900 Device vendor specific log
0xa8 GPL,SL VS 129 Device vendor specific log
0xa9 GPL,SL VS 1 Device vendor specific log
0xab GPL VS 1 Device vendor specific log
0xae GPL VS 1 Device vendor specific log
0xb0 GPL VS 4580 Device vendor specific log
0xb6 GPL VS 1918 Device vendor specific log
0xbe-0xbf GPL VS 65535 Device vendor specific log
0xc1 GPL,SL VS 10 Device vendor specific log
0xc2 GPL,SL VS 50 Device vendor specific log
0xc4 GPL,SL VS 5 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 1
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1 occurred at disk power-on lifetime: 8134 hours (338 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 00 a0 3a 40 00 00 Error: UNC at LBA = 0x00a03a40 = 10500672
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 2a 00 00 00 a0 3a 40 e0 00 01:31:49.827 READ DMA EXT
25 00 00 00 35 00 00 00 a0 42 0b e0 00 01:31:49.348 READ DMA EXT
25 00 00 00 0b 00 00 00 a0 42 00 e0 00 01:31:49.345 READ DMA EXT
25 00 00 00 15 00 00 03 93 ac 6b e0 00 01:31:49.342 READ DMA EXT
25 00 00 00 2b 00 00 03 93 ac 40 e0 00 01:31:49.339 READ DMA EXT
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 11852 -
# 2 Short offline Completed without error 00% 11847 -
# 3 Short offline Completed without error 00% 11844 -
# 4 Short offline Completed without error 00% 11835 -
# 5 Short offline Completed without error 00% 11830 -
# 6 Short offline Completed without error 00% 11823 -
# 7 Short offline Completed without error 00% 11818 -
# 8 Short offline Completed without error 00% 11814 -
# 9 Short offline Completed without error 00% 11806 -
#10 Short offline Completed without error 00% 11801 -
#11 Short offline Completed without error 00% 11792 -
#12 Short offline Completed without error 00% 11790 -
#13 Short offline Completed without error 00% 11780 -
#14 Short offline Completed without error 00% 11772 -
#15 Short offline Completed without error 00% 11765 -
#16 Short offline Completed without error 00% 11756 -
#17 Short offline Completed without error 00% 11751 -
#18 Short offline Completed without error 00% 11747 -
#19 Short offline Completed without error 00% 11740 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
Device State: Active (0)
Current Temperature: 31 Celsius
Power Cycle Min/Max Temperature: 25/32 Celsius
Lifetime Min/Max Temperature: 16/44 Celsius
Under/Over Temperature Limit Count: 0/2
SCT Data Table command not supported
SCT Error Recovery Control command not supported
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 3 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
Also today, Linux failed to boot on the first attempt. I restarted and it booted without problems. This happened before the warning popped up, and I have no idea whether the boot issue is related to the smartmontools error.
**Confusing:**
In the report there is a line "Error 1 occurred at disk power-on lifetime: 8134 hours (338 days + 22 hours)".
But there is no date. I expected to find the date on which this error occurred, so that I could compare it with today's date and definitely assign the error to today.
As I did not find a date anywhere in the output, I looked for the current lifetime of my SSHD, because the error is said to have occurred at 8134 h. I expected to find somewhere the number of hours my SSHD has run up to now, but I did not find this either.
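The running total appears in the attribute table above as attribute 9, Power_On_Hours, with a raw value of 11853. Assuming that raw value is in hours (the unit the error log itself uses), the distance between the logged error and the present can be worked out as below. The `split_hours` helper is only illustrative, and the result is powered-on time, not calendar time, so it cannot be turned into an exact date.

```python
# How long ago, in powered-on time, was the logged error?
# Values are taken from the smartctl output above.

ERROR_AT_HOURS = 8134        # "Error 1 occurred at disk power-on lifetime"
POWER_ON_HOURS_NOW = 11853   # raw value of attribute 9 Power_On_Hours


def split_hours(hours: int) -> tuple:
    """Convert an hour count into (days, remaining hours)."""
    return divmod(hours, 24)


elapsed = POWER_ON_HOURS_NOW - ERROR_AT_HOURS
days, hours = split_hours(elapsed)
print(f"{elapsed} power-on hours since the error ({days} days + {hours} hours)")
```

The same conversion reproduces the log's own figure: 8134 hours is 338 days + 22 hours. The error therefore predates today by 3719 hours of run time.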
Which host's syslog is meant?
Maybe this one:
/var/log/syslog ?
If yes: Here it is:
https://workupload.com/file/NVD2gpdrvHp
But my main question is: Is there a high risk that my sshd will die soon?
It is said that the hard disk health status has changed. But where can I find the current health status now?
Thank you.

Wogehu
(123 rep)
Oct 23, 2022, 01:51 PM
• Last activity: Oct 25, 2022, 07:09 AM
2
votes
2
answers
999
views
smartd output to screen, not email
I'm trying to get smartd working; it is set up to send messages by email via 'mail', which I've never used. But I recall that a few years ago I had smartd sending its warnings directly to the screen via a popup text box. I'm trying to figure out how to do that again. The info for the 'screen' command baffles me, and tmux likewise. Or I suppose it could be a notification. When I have a few weeks to study it, I'll get 'mail' working, but for now I'd prefer a popup message anyway.
==================================================
In 'smartd.conf':
DEVICESCAN -a -m -M exec notify -M test
... Ok, added full path, much better:
DEVICESCAN -a -m -M exec /bin/notify -M test
... 'notify' runs fine from CLI, is an executable script:
/bin/notify-send "$(systemctl status smartd)"
... but although:
systemctl restart smartd; systemctl status smartd
... reports no errors, I get no 'test' result.
BTW, so far no results at all using those variables you mentioned.
...
$ smartd ... shows two notifications, one for each of my two disks! So why does 'systemctl restart smartd' show nothing?
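For reference, smartd passes details of each warning to a `-M exec` program through environment variables such as `SMARTD_DEVICE`, `SMARTD_FAILTYPE` and `SMARTD_MESSAGE`. Below is a minimal Python sketch of a notification script built on those variables. It assumes `notify-send` is on the PATH; the function name and the message layout are hypothetical and not taken from the question.

```python
import os
import subprocess


def build_notification(env):
    """Build a notify-send command line from smartd's environment variables."""
    device = env.get("SMARTD_DEVICE", "unknown device")
    failtype = env.get("SMARTD_FAILTYPE", "unknown")
    message = env.get("SMARTD_MESSAGE", "(no message)")
    summary = f"smartd: {failtype} on {device}"
    return ["notify-send", "--urgency=critical", summary, message]


# Only send something when the script was really started by smartd,
# which is assumed here to be the case whenever SMARTD_MESSAGE is set.
if __name__ == "__main__" and "SMARTD_MESSAGE" in os.environ:
    subprocess.run(build_notification(os.environ), check=False)
```

One possible reason a popup appears from the CLI but not from the service is that a daemon started by systemd usually has no desktop session, so variables like `DISPLAY` or `DBUS_SESSION_BUS_ADDRESS` may be missing. This is an assumption and has not been verified for this setup.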
Ray Andrews
(2615 rep)
Oct 17, 2022, 01:56 PM
• Last activity: Oct 17, 2022, 10:08 PM
1
votes
2
answers
1479
views
Why smartd need database?
Does smartd need the database, or does smartctl need it?
I see that the smartmontools GitHub repository keeps updating the database:
https://github.com/smartmontools/smartmontools/labels/drivedb
As I understand it, smartd scans all disks, so why does it need a database? What is the purpose of the database in smartd/smartctl?
Mark K
(955 rep)
Aug 12, 2022, 03:32 AM
• Last activity: Aug 12, 2022, 11:52 AM
0
votes
1
answers
99
views
Prometheus DiskTooManyReallocatedSectors
I have Prometheus Alert Manager running on several Linux machines. *(https://prometheus.io/docs/alerting/latest/alertmanager/)*
One of them is reporting *2 reallocated sectors*. I got the alert setup from here:
*https://awesome-prometheus-alerts.grep.to/rules.html*
**1. What is my course of action?** Replace with an SSD?
**2. What is the priority ...weeks, months?**

DavidDunham
(117 rep)
Jul 2, 2022, 03:08 AM
• Last activity: Jul 3, 2022, 07:51 AM
1
votes
3
answers
998
views
Deciphering smartctl results
I'm trying to use an internal 2.5" HDD as external storage media and for backups. This HDD had previously been in a Windows machine for a long time before some of my files became corrupted, so I simply swapped it for an SSD. CrystalDiskInfo reports the drive's health as 'good'; however, HDDScan shows a warning for "UltraDMA CRC Errors".
These are the results of running `smartctl -a` on it:
smartctl 7.3 2022-02-28 r5338 [x86_64-w64-mingw32-w10-21H2] (sf-7.3-1)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Mobile HDD
Device Model: ST1000LM035-1RK172
Firmware Version: SDM1
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 2.5 inches
Device is: In smartctl database 7.3/5319
ATA Version is: ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 169) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 072 051 006 Pre-fail Always - 15037148
3 Spin_Up_Time 0x0003 099 099 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 065 065 020 Old_age Always - 36007
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 080 060 045 Pre-fail Always - 22014115380
9 Power_On_Hours 0x0032 100 086 000 Old_age Always - 55 (236 20 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 092 020 Old_age Always - 56
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 3990
188 Command_Timeout 0x0032 100 081 000 Old_age Always - 30067195949
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 068 049 040 Old_age Always - 32 (Min/Max 29/32)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 197
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 69
193 Load_Cycle_Count 0x0032 064 064 000 Old_age Always - 73637
194 Temperature_Celsius 0x0022 032 051 000 Old_age Always - 32 (0 18 0 0 0)
197 Current_Pending_Sector 0x0012 100 051 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 051 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 3
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 12591 (154 139 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 98336960706
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 114926523300
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 3475 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 3475 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 88 7d f4 09 Error: UNC at LBA = 0x09f47d88 = 167017864
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 88 7d f4 49 00 00:38:49.208 READ FPDMA QUEUED
60 00 08 80 7d f4 49 00 00:38:49.195 READ FPDMA QUEUED
60 00 08 b0 a8 21 49 00 00:38:49.182 READ FPDMA QUEUED
60 00 08 a8 a8 21 49 00 00:38:49.181 READ FPDMA QUEUED
60 00 08 a0 a8 21 49 00 00:38:49.181 READ FPDMA QUEUED
Error 3474 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 80 7d f4 09 Error: UNC at LBA = 0x09f47d80 = 167017856
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 80 7d f4 49 00 00:38:46.657 READ FPDMA QUEUED
60 00 08 78 7d f4 49 00 00:38:46.625 READ FPDMA QUEUED
ef 10 03 00 00 00 a0 00 00:38:46.615 SET FEATURES [Enable SATA feature]
ef 10 02 00 00 00 a0 00 00:38:46.605 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 00:38:46.578 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 3473 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 78 7d f4 09 Error: UNC at LBA = 0x09f47d78 = 167017848
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 78 7d f4 49 00 00:38:44.127 READ FPDMA QUEUED
60 00 08 70 7d f4 49 00 00:38:44.095 READ FPDMA QUEUED
ef 10 03 00 00 00 a0 00 00:38:44.085 SET FEATURES [Enable SATA feature]
ef 10 02 00 00 00 a0 00 00:38:44.076 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 00:38:44.049 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 3472 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 70 7d f4 09 Error: UNC at LBA = 0x09f47d70 = 167017840
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 70 7d f4 49 00 00:38:41.598 READ FPDMA QUEUED
60 00 08 68 7d f4 49 00 00:38:41.566 READ FPDMA QUEUED
ef 10 03 00 00 00 a0 00 00:38:41.556 SET FEATURES [Enable SATA feature]
ef 10 02 00 00 00 a0 00 00:38:41.547 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 00:38:41.520 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 3471 occurred at disk power-on lifetime: 29 hours (1 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 68 7d f4 09 Error: UNC at LBA = 0x09f47d68 = 167017832
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 68 7d f4 49 00 00:38:39.013 READ FPDMA QUEUED
60 00 08 60 7d f4 49 00 00:38:38.987 READ FPDMA QUEUED
ef 10 03 00 00 00 a0 00 00:38:38.977 SET FEATURES [Enable SATA feature]
ef 10 02 00 00 00 a0 00 00:38:38.968 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 00:38:38.941 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 22 282912
# 2 Extended offline Completed: read failure 90% 20 282912
# 3 Extended offline Completed: read failure 90% 18 282912
# 4 Conveyance offline Completed: read failure 90% 18 282912
# 5 Short offline Completed: read failure 80% 18 282912
# 6 Extended offline Completed: read failure 90% 17 282912
# 7 Extended offline Completed: read failure 90% 2 64400520
# 8 Extended offline Completed: read failure 90% 2 64400520
# 9 Extended offline Completed: read failure 90% 2 64400520
#10 Extended offline Completed without error 00% 9245 -
#11 Short offline Completed without error 00% 4741 -
#12 Short offline Completed without error 00% 4667 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
This is a bit confusing to me. The drive is attached to my computer via a USB adaptor which has no brand name on it, but `smartctl` reveals it to be "USB JMicron". Is this "USB JMicron" adaptor causing the "UltraDMA CRC Errors" or other problems?
As I'm not sure what caused the data corruption back then, I am wondering whether this drive is actually safe and reliable based on the info from `smartctl`. Any help with deciphering the diagnostic info would be much appreciated.
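One way to start reading the attribute table is to compare each normalized VALUE with its THRESH; according to the smartctl documentation, an attribute counts as failed when its normalized value is less than or equal to the threshold. The Python sketch below parses a few rows copied from the output above and lists attributes in that state. The function names are hypothetical, and skipping rows with a threshold of 0 is an assumption made to keep the check meaningful.

```python
SAMPLE = """\
  1 Raw_Read_Error_Rate 0x000f 072 051 006 Pre-fail Always - 15037148
  5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 3990
197 Current_Pending_Sector 0x0012 100 051 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 3
"""


def parse_attributes(text):
    """Parse 'smartctl -A' style rows into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip headers and unrelated lines
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "raw": parts[9].strip(),
        })
    return rows


def at_or_below_threshold(rows):
    """Return names of attributes whose normalized value has reached the threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


rows = parse_attributes(SAMPLE)
print(at_or_below_threshold(rows))
```

For the rows shown, no attribute has reached its threshold, which is consistent with the overall "PASSED" result even though the error log and self-test log above record read failures. The raw values (for example 3990 for Reported_Uncorrect) and the self-test log therefore deserve separate attention.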
Sepp A
(73 rep)
Jul 1, 2022, 04:24 PM
• Last activity: Jul 1, 2022, 06:36 PM
4
votes
1
answers
3336
views
Linux on Marvell 88SE9230. How to get stats?
I use a Marvell 88SE9230 controller on my home Linux server. HP does have a utility to set up RAID and get some stats, but I'm wondering how to get any status from a Linux system. Quick googling shows only Linux drivers for accessing the array itself on previous kernel versions, but I want to know the SMART status of the drives.
Smartctl doesn't work:
root@iris:~# smartctl -a -d marvell -T verypermissive /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-96-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
Read Device Identity failed: Unknown error
=== START OF INFORMATION SECTION ===
Device Model: [No Information Found]
Serial Number: [No Information Found]
Firmware Version: [No Information Found]
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: [No Information Found]
Local Time is: Thu Jan 27 19:11:54 2022 MSK
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
Checking to be sure by trying SMART RETURN STATUS command.
SMART support is: Unknown - Try option -s with argument 'on' to enable it.
Read SMART Data failed: Success
=== START OF READ SMART DATA SECTION ===
SMART Status command failed: Success
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.
Read SMART Error Log failed: Success
Read SMART Self-test Log failed: Success
Selective Self-tests/Logging not supported
How can I get at least some stats from the controller?
Hills of Eternity
(88 rep)
Jan 27, 2022, 04:14 PM
• Last activity: Apr 3, 2022, 05:47 PM
0
votes
1
answers
2520
views
How to disable smartd
So I have a `Mac` on which I installed `smartmontools` to see my `smart data`.
I thought that `smartd` would be helpful for running short tests on my `Mac SSD`, but I found out via `Google` that `smartd` only runs tests at `03:00am`, and there is no way my Mac will be powered on at that time.
I understand that `smartd` is meant for `servers` which run `24/7`, so `smartd` is of no use to me.
So I would like to disable it and write my own simple `bash script` which runs short tests on my `Mac SSD`.
So is there any way I can disable smartd or remove it without affecting smartctl?
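A hand-rolled short test comes down to a single `smartctl -t short` call against the disk. It is sketched here in Python, though the same call would work from a bash script. The device name `/dev/disk0` and the function names are hypothetical; the real device name on a given Mac may differ, and the command usually needs root privileges.

```python
import subprocess


def short_test_command(device):
    """Return the smartctl invocation that starts a short self-test."""
    return ["smartctl", "-t", "short", device]


def start_short_test(device):
    """Start the test and return the completed process (output is captured)."""
    return subprocess.run(
        short_test_command(device),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode instead of raising
    )


# Hypothetical device name, shown without executing anything.
print(" ".join(short_test_command("/dev/disk0")))
```

The test itself runs inside the drive after the command returns; its result can be read later with `smartctl -l selftest` on the same device.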
Jddhhdhdi283838
(1 rep)
Dec 25, 2017, 12:56 PM
• Last activity: Feb 13, 2022, 09:02 PM
3
votes
0
answers
1190
views
Linux-image generates samsung nvme errors
After upgrading from kernel **5.10** to **5.14** and now to **5.15.3**, the error count in SMART increases after every boot on a Samsung 970 Evo NVMe disk (and most likely on other Samsung NVMe disks), like this:
`Error Information Log Entries: 41` <- increased after every boot
`nvme error-log /dev/nvme0`:
`status_field : 0x4004(INVALID_FIELD: A reserved coded value or an unsupported value in a defined field)`
Does anyone know if this is a bug? Or is the kernel perhaps trying to talk to the NVMe disk in a way the disk does not support? Or do Samsung disks need to be excluded from this kind of communication?
Is there any kernel boot parameter that solves this behavior?
It is worth mentioning that this issue is not present in the 5.10 kernel.
---
There is a bug report filed on 27 Sep 2021: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995161 but it seems that nothing has changed since then.
EdiD
(342 rep)
Nov 26, 2021, 11:42 AM
• Last activity: Dec 1, 2021, 12:30 PM