Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
6
votes
1
answers
10569
views
How to check/fix nvme health?
I'm running debian stable with a 2 x nvme Raid 1.
Here is the hardware/hoster it's running on
https://www.hetzner.com/dedicated-rootserver/ex62-nvme?country=us
Almost every second day mdadm monitoring reports a fail event and leaves the array degraded.
It only disables 1 partition as you can see here:
This is an automatically generated mail message from mdadm
running on xxx
A Fail event had been detected on md device /dev/md/2.
It could be related to component device /dev/nvme1n1p3.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme1n1p3(F) nvme0n1p3
465895744 blocks super 1.2 [2/1] [U_]
bitmap: 4/4 pages [16KB], 65536KB chunk
md0 : active (auto-read-only) raid1 nvme1n1p1 nvme0n1p1
33521664 blocks super 1.2 [2/2] [UU]
md1 : active raid1 nvme0n1p2 nvme1n1p2
523712 blocks super 1.2 [2/2] [UU]
unused devices:
This happens on both disks. One time it's nvme0n1p3 and next time it's nvme1n1p3.
I then just re-add the failed partition with
mdadm --re-add /dev/md2 /dev/nvme0n1p3
or
mdadm --re-add /dev/md2 /dev/nvme1n1p3
and after the resync it works for a day or two.
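For reference, a degraded array can be spotted from a script by looking for an underscore in the member-status brackets of /proc/mdstat (the `[U_]` in the mail above). This is only a minimal sketch, assuming the mdstat layout shown in that mail; the function name `degraded_arrays` is made up for illustration.

```shell
#!/bin/sh
# Sketch: list md arrays whose status brackets contain "_" (a missing member).
# Reads /proc/mdstat-style text on stdin so it can be tried on saved output.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                      # remember the array name
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    '
}

# Usage: degraded_arrays < /proc/mdstat
```

Fed the mdstat excerpt from the mail, this should print only md2.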
In dmesg I found this:
[94879.144892] nvme nvme1: I/O 311 QID 1 timeout, reset controller
[94879.252851] nvme nvme1: completing aborted command with status: 0007
[94879.252970] blk_update_request: I/O error, dev nvme1n1, sector 452352001
[94879.253091] nvme nvme1: completing aborted command with status: fffffffc
[94879.253223] blk_update_request: I/O error, dev nvme1n1, sector 68159504
[94879.253418] md: super_written gets error=-5
I tried to check the health of the devices with these commands, but they don't give me stats like "Reallocated_Sector_Ct" or "Reported_Uncorrect".
smartctl -x /dev/nvme1
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-8-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KXG50ZNV512G TOSHIBA
Serial Number: 28SS10F6TYST
Firmware Version: AAGA4102
PCI Vendor/Subsystem ID: 0x1179
IEEE OUI Identifier: 0x00080d
Total NVM Capacity: 512,110,190,592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 0
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Mon May 13 10:34:11 2019 CEST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL *Other*
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat *Other*
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 78 Celsius
Critical Comp. Temp. Threshold: 82 Celsius
Namespace 1 Features (0x02): NA_Fields
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.00W - - 0 0 0 0 0 0
1 + 2.40W - - 1 1 1 1 0 0
2 + 1.90W - - 2 2 2 2 0 0
3 - 0.0500W - - 3 3 3 3 1500 1500
4 - 0.0050W - - 4 4 4 4 6000 14000
5 - 0.0030W - - 5 5 5 5 50000 80000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 47 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 57%
Data Units Read: 31,858,921 [16.3 TB]
Data Units Written: 293,589,002 [150 TB]
Host Read Commands: 4,130,502,428
Host Write Commands: 889,121,505
Controller Busy Time: 13,552
Power Cycles: 7
Power On Hours: 6,720
Unsafe Shutdowns: 0
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 47 Celsius
Error Information (NVMe Log 0x01, max 128 entries)
No Errors Logged
nvme smart-log /dev/nvme1
Smart Log for NVME device:nvme1 namespace-id:ffffffff
critical_warning : 0
temperature : 47 C
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 57%
data_units_read : 31,858,921
data_units_written : 293,589,023
host_read_commands : 4,130,502,429
host_write_commands : 889,122,059
controller_busy_time : 13,552
power_cycles : 7
power_on_hours : 6,720
unsafe_shutdowns : 0
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 47 C
Temperature Sensor 2 : 0 C
Temperature Sensor 3 : 0 C
Temperature Sensor 4 : 0 C
Temperature Sensor 5 : 0 C
Temperature Sensor 6 : 0 C
Temperature Sensor 7 : 0 C
Temperature Sensor 8 : 0 C
nvme smart-log-add /dev/nvme1
NVMe Status:INVALID_LOG_PAGE(4109)
smartctl -A /dev/nvme1
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-8-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 46 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 57%
Data Units Read: 31,858,924 [16.3 TB]
Data Units Written: 293,591,327 [150 TB]
Host Read Commands: 4,130,502,490
Host Write Commands: 889,172,096
Controller Busy Time: 13,552
Power Cycles: 7
Power On Hours: 6,721
Unsafe Shutdowns: 0
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 46 Celsius
I only noticed the issue after apache failed to start and I repaired the filesystem with fsck.ext4 -f. Before that, I hadn't set up root mail correctly.
So it looks to me like a hardware error, and I should get rid of both NVMe drives.
Is there anything I can try to fix these issues and save the nvmes? Or at least to get all the smart values like "Reported_Uncorrect" or "Offline_Uncorrectable".
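As far as I understand, NVMe drives do not expose ATA-style attributes such as Reallocated_Sector_Ct. The SMART/Health log shown above is what smartctl reports for them, and "Media and Data Integrity Errors" is probably the nearest counterpart to the uncorrectable-error counters. A rough sketch for pulling the more telling fields out of that output, assuming the smartctl text layout above; `nvme_health_summary` is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# Sketch: extract a few fields from the NVMe SMART/Health section printed by smartctl.
# Reads smartctl output on stdin; field labels are taken from the output above.
nvme_health_summary() {
    awk -F': *' '
        /^Critical Warning:/                { print "critical_warning=" $2 }
        /^Available Spare:/                 { print "available_spare=" $2 }
        /^Percentage Used:/                 { print "percentage_used=" $2 }
        /^Media and Data Integrity Errors:/ { print "media_errors=" $2 }
        /^Error Information Log Entries:/   { print "error_log_entries=" $2 }
    '
}

# Usage (as root): smartctl -A /dev/nvme1 | nvme_health_summary
```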
treffner
(61 rep)
May 13, 2019, 08:55 AM
• Last activity: May 17, 2025, 06:00 AM
2
votes
0
answers
270
views
How to configure smartd, s-nail and selinux to get sending mails to work?
I am trying to configure smartd to send mails via s-nail on Fedora 41.
I created a .mailrc file (in which I have set the mta variable to send directly via smtps; there is no sendmail installed) in root's home directory and can successfully send mails via:
echo "Test" | mail -s Test
I also managed to send mails in a bash script started by a custom systemd service.
But smartd isn't able to send mails. The following error is shown in the log:
Executing test of /usr/libexec/smartmontools/smartdnotify to ...
Test of /usr/libexec/smartmontools/smartdnotify to produced unexpected output (163 bytes) to STDOUT/STDERR:
s-nail: Cannot start /usr/sbin/sendmail: executable not found (adjust *mta* variable)
s-nail: Cannot save to $DEAD: Permission denied
s-nail: ... message not sent
SELinux is blocking access to the .mailrc file (so s-nail falls back to its default, /usr/sbin/sendmail):
type=AVC msg=audit(1744370186.375:606): avc: denied { read } for pid=42644 comm="mail" name=".mailrc" dev="nvme0n1p3" ino=140324 scontext=system_u:system_r:smartdwarn_t:s0 tcontext=unconfined_u:object_r:mail_home_t:s0 tclass=file permissive=0
I tried the suggested
ausearch -c 'mail' --raw | audit2allow -M my-mail
semodule -X 300 -i my-mail.pp
systemctl restart smartd.service
a couple of times until no new selinux errors appeared. Now I get the following error:
Test of /usr/libexec/smartmontools/smartdnotify to produced unexpected output (130 bytes) to STDOUT/STDERR:
s-nail: could not initiate TLS connection: error:00000000:lib(0)::reason(0)
/root/dead.letter 23/578
s-nail: ... message not sent
s-nail can now access the .mailrc file and connect to the server, but there is no successful communication with the server (error 0?). The content of the mail is written to the dead.letter file instead.
What could be the reason for this? Is it an improper SELinux config?
Am I missing an SELinux option? Do I have to switch to a different MTA client?
AckderIII
(21 rep)
Apr 10, 2025, 09:17 PM
• Last activity: Apr 11, 2025, 11:55 AM
1
votes
3
answers
1472
views
can we automatically wait the required time for smartmontools/smartctl?
Can we do something like this in a script (preferably zsh):
smartctl -t long /dev/sda
smartctl -t long /dev/sdb
smartctl -t long /dev/sdc
[Wait however long smartctl needs]
smartctl -H /dev/sda
smartctl -H /dev/sdb
smartctl -H /dev/sdc
As is obvious I'm just trying to automate this.
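One possible approach is to poll the drive until it stops reporting a running self-test. The sketch below assumes ATA drives, whose `smartctl -c` output contains the text "Self-test routine in progress" while a test runs; the function names and the `POLL_SECONDS` variable are invented for this example. It is written for plain sh and, as far as I can tell, should also run under zsh.

```shell
#!/bin/sh
# Succeeds (exit 0) when smartctl capability output on stdin reports a running self-test.
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# Start a long test on every device given, then wait for each to finish
# before printing its health status. POLL_SECONDS is an arbitrary default.
wait_for_selftests() {
    poll_seconds=${POLL_SECONDS:-300}
    for dev in "$@"; do
        smartctl -t long "$dev" >/dev/null
    done
    for dev in "$@"; do
        while smartctl -c "$dev" | selftest_in_progress; do
            sleep "$poll_seconds"
        done
        smartctl -H "$dev"
    done
}

# Usage (as root): wait_for_selftests /dev/sda /dev/sdb /dev/sdc
```

Polling avoids guessing the duration up front, at the cost of waking the drive with a query every `POLL_SECONDS` seconds.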
Ray Andrews
(2615 rep)
Aug 8, 2017, 12:37 AM
• Last activity: Nov 9, 2024, 11:05 AM
4
votes
2
answers
1270
views
smartctl lies that NVME has lifespan of ~2800TBW? What is the real lifespan of my NVME?
`smartctl -x` on my Samsung SSD 860 EVO M.2 2TB shows:
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 1132 --- Lifetime Power-On Resets
0x01 0x010 4 6584 --- Power-on Hours
0x01 0x018 6 59675855461 --- Logical Sectors Written
0x01 0x020 6 1711777462 --- Number of Write Commands
0x01 0x028 6 51882440157 --- Logical Sectors Read
0x01 0x030 6 1869976194 --- Number of Read Commands
0x01 0x038 6 293000 --- Date and Time TimeStamp
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 97 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 40 --- Current Temperature
0x05 0x020 1 64 --- Highest Temperature
0x05 0x028 1 18 --- Lowest Temperature
0x05 0x058 1 70 --- Specified Maximum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 20530 --- Number of Hardware Resets
0x06 0x010 4 0 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
0x07 0x008 1 1 N-- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
A paltry 28TB written sounds a little low for the past year I've had this NVME, but it's believable. However, the Percentage Used Endurance Indicator is only at 1%. That would suggest there's still around 100x that, or 2800TBW, left in this device, which is more than twice the rated 1200TBW, so it can't be a rounding error.
Is smartctl lying? (Not that it would lie; I mean, is my NVME lying to smartctl, is smartctl misinterpreting my NVME, etc and etc?) How do I find out the real TBW life remaining in my NVME for sure?
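The written volume can be cross-checked from the Logical Sectors Written counter. This is a small sketch, assuming 512-byte logical sectors (the sector size is not shown in the output above, so that is an assumption); `sectors_to_tb` is an illustrative name.

```shell
#!/bin/sh
# Convert a logical-sector count to decimal terabytes and binary tebibytes.
# $1 = sector count, $2 = bytes per logical sector (assumed 512 if omitted).
sectors_to_tb() {
    awk -v s="$1" -v b="${2:-512}" 'BEGIN {
        bytes = s * b
        printf "%.2f TB, %.2f TiB\n", bytes / 1e12, bytes / 1099511627776
    }'
}

sectors_to_tb 59675855461
```

Under that assumption the counter works out to roughly 30.55 TB, or 27.79 TiB, so the 28 figure corresponds to binary units. Measured against the 1200TBW rating quoted above, that is about 2.5% of the rated host writes.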
Jack G
(269 rep)
Jul 18, 2024, 01:33 AM
• Last activity: Jul 18, 2024, 01:42 PM
1
votes
2
answers
752
views
Disable automatic S.M.A.R.T. tests
I have a small (single-board) Zimaboard server that is running Debian 12 (bookworm) 24/7 in my bedroom. This server has a single HDD hooked-up, which remains unmounted and in sleep mode except for a daily back-up cycle (after which it is unmounted and goes back to sleep).
Unfortunately, every Monday at 00:45 AM the server decides to wake up the HDD (and myself...) to execute what sounds like a short S.M.A.R.T test. I then have to grab my phone and issue a sleep command to stop the HDD from making its typical HDD noises afterwards (humming, occasional clicks, ...). As you can imagine, this is incredibly annoying, so I want to fix it.
I first looked for `crontab` schedules (executed as user or root), but I didn't see anything relevant. `journalctl --since ... --until ...` didn't report anything useful either. The only uncommented line in `/etc/smartd.conf` says: `DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner`. I don't see anything there that could point to a weekly maintenance schedule.
Is there any way to find out what triggered the S.M.A.R.T. test (or similar)? Where do I look for log entries? And how do I prevent it from executing in the middle of the night?
MPA
(113 rep)
May 4, 2024, 12:40 PM
• Last activity: May 6, 2024, 07:51 PM
2
votes
0
answers
490
views
Are SMART offline data collection and offline attributes obsolete?
**TLDR;**
I tried to understand the difference between SMART `Offline` and `Always` attributes, and thus to understand what SMART offline data collection is, and if I should enable it on my HDDs.
`smartmontools`' official wiki states that:
> Note that the SMART automatic offline test command is listed as Obsolete in every version of the ATA and ATA/ATAPI Specifications. (...) However it is implemented and used by many vendors.
After some extensive reading on the web, and also some tests, the conclusion I reached is:
- Nowadays SMART offline data collection is obsolete
- All data is updated in real time (i.e. `Offline` and `Always` attributes behave the same way)
- There is no need to enable "Auto Offline Data Collection" (`# smartctl --offlineauto=on /dev/sda`), nor to ever launch it manually (`# smartctl -t offline /dev/sda`).
- As for the reason all this offline stuff is still in `smartmontools`, it's probably to keep it compatible with some very old HDDs that indeed implemented real offline attributes.
Am I right? Or am I missing something?
----------
**MORE DETAILS**
I did some tests on an HDD, which has 3 offline attributes (and has `Auto Offline Data Collection` disabled):
# smartctl -a /dev/sda
(...)
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
(...)
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 235 (114 97 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 13381561756
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 20472945077
(...)
I then wrote some data on that drive, and noted that all 3 attributes were updated in real time. Thus, they are in fact online (or `Always`) attributes, and not `Offline` ones.
I did the same test on a few other HDDs, the behavior was identical.
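As far as I can tell, the TYPE and UPDATED columns that smartctl prints are derived from the two lowest bits of the FLAG value: bit 0 set goes with Pre-fail and bit 1 set goes with Always. A minimal sketch under that assumption, with `decode_attr_flag` as a made-up name:

```shell
#!/bin/sh
# Decode the two low bits of a SMART attribute FLAG (hex such as 0x0032 is accepted).
# Assumption: bit 0 = pre-failure attribute, bit 1 = updated online ("Always").
decode_attr_flag() {
    flag=$(( $1 ))
    if [ $(( flag & 1 )) -ne 0 ]; then kind=Pre-fail; else kind=Old_age; fi
    if [ $(( flag & 2 )) -ne 0 ]; then updated=Always; else updated=Offline; fi
    echo "$kind $updated"
}

decode_attr_flag 0x0000
decode_attr_flag 0x0032
```

If that reading is right, `Offline` in the table only reflects what the drive declares in the flag (0x0000 for the three attributes above), not necessarily when the firmware actually refreshes the value.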
ChennyStar
(1969 rep)
Nov 6, 2023, 04:55 PM
• Last activity: Nov 6, 2023, 05:01 PM
0
votes
0
answers
871
views
`smartctl` and `smartd` commands not working
I have been receiving hard disk warnings from the smartd daemon for a while now (every 24 hours), saying that my error logs have increased. I have been trying to examine this by checking my log files, but I haven't found anything. The error logs are for my hard disk `/dev/nvme0`. I've been trying to find out what this problem is by scanning my hard disk with the `smartctl` or `smartd` command, but I keep getting back `command not found`. I have the packages installed and I can even open the manual pages, `man smartctl` and `man smartd`. How can I resolve this issue?
The warning I receive is the following:

tobibox
(1 rep)
Oct 23, 2023, 09:30 AM
1
votes
1
answers
232
views
SMART error (CurrentPendingSector) and (OfflineUncorrectableSector)
I have been receiving the following error messages every day for several months now, and I do not know how to stop receiving these messages.
CurrentPendingSector
This message was generated by the smartd daemon running on:
host name: myhost
DNS domain: [Empty]
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 6 Currently unreadable (pending) sectors
Device info:
KingFast, S/N:03112222C0002, FW:U0803A0, 256 GB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Feb 3 19:41:29 2023 PST
Another message will be sent in 24 hours if the problem persists.
OfflineUncorrectableSector
This message was generated by the smartd daemon running on:
host name: myhost
DNS domain: [Empty]
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 3 Offline uncorrectable sectors
Device info:
KingFast, S/N:03112222C0002, FW:U0803A0, 256 GB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Feb 3 19:41:29 2023 PST
Another message will be sent in 24 hours if the problem persists.
smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: KingFast
Serial Number: 03112222C0002
Firmware Version: U0803A0
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Jul 8 15:44:59 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 6
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 3335
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 440
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 3
161 Unknown_Attribute 0x0033 100 100 050 Pre-fail Always - 86
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 26
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 79004
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 481
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 6
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 114
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 5050
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 98
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 0
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 6
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 88
194 Temperature_Celsius 0x0022 100 100 050 Old_age Always - 35
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 3
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 6
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 3
199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 86
241 Total_LBAs_Written 0x0030 100 100 050 Old_age Offline - 168900
242 Total_LBAs_Read 0x0030 100 100 050 Old_age Offline - 815543
245 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 191939
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3329 -
# 2 Short offline Completed without error 00% 3325 -
# 3 Short offline Completed without error 00% 3321 -
# 4 Short offline Completed without error 00% 3313 -
# 5 Short offline Completed without error 00% 3309 -
# 6 Short offline Completed without error 00% 3306 -
# 7 Extended offline Completed without error 00% 3250 -
# 8 Extended offline Completed without error 00% 3232 -
# 9 Extended offline Completed without error 00% 3229 -
#10 Extended offline Completed without error 00% 976 -
#11 Extended offline Completed without error 00% 968 -
Selective Self-tests/Logging not supported
I have tried to ignore the `197` and `198` errors in `/etc/smartd.conf` with
/dev/sda -d removable -n standby -H -l error -l selftest -f -t -I 197 -I 198 -s (S/../.././(01|09|17)|L/../../3/11) -m root -M exec /usr/share/smartmontools/smartd-runner
to no avail.
I also do not see any `LBA_of_first_error` in the self-test section.
To me, `SMART overall-health self-assessment test result: PASSED` looks healthy, and the self-tests return no errors. My current understanding is that the disk appears to be healthy but is still sending these messages erroneously.
Is there something that I'm missing?
The `/dev/sda` drive is a KingFast 256 GB SSD, and I'm not sure if this is relevant, as I could not find anything online for this particular drive or manufacturer.
How would I be able to stop receiving these messages but still have SMART monitoring for other genuine issues on the drive, and how would I fix the issue if this error message really does indicate some problem with the drive?
Thanks!
Edit:
After running `smartctl -t long /dev/sda`, I have
smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: KingFast
Serial Number: 03112222C0002
Firmware Version: U0803A0
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jul 9 10:05:33 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x03) Offline data collection activity
is in progress.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 241) Self-test routine in progress...
10% of test remaining.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 6
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 3341
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 441
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 3
161 Unknown_Attribute 0x0033 100 100 050 Pre-fail Always - 86
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 26
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 79553
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 482
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 6
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 115
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 5050
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 98
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 0
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 6
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 88
194 Temperature_Celsius 0x0022 100 100 050 Old_age Always - 46
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 3
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 6
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 3
199 UDMA_CRC_Error_Count 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 86
241 Total_LBAs_Written 0x0030 100 100 050 Old_age Offline - 170468
242 Total_LBAs_Read 0x0030 100 100 050 Old_age Offline - 815560
245 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 193199
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3337 -
# 2 Short offline Completed without error 00% 3329 -
# 3 Short offline Completed without error 00% 3325 -
# 4 Short offline Completed without error 00% 3321 -
# 5 Short offline Completed without error 00% 3313 -
# 6 Short offline Completed without error 00% 3309 -
# 7 Short offline Completed without error 00% 3306 -
# 8 Extended offline Completed without error 00% 3250 -
# 9 Extended offline Completed without error 00% 3232 -
#10 Extended offline Completed without error 00% 3229 -
#11 Extended offline Completed without error 00% 976 -
#12 Extended offline Completed without error 00% 968 -
Selective Self-tests/Logging not supported
The #12 Extended offline test `Completed without error`, so I'm not really sure what I'm supposed to do from here.
**Edit #2:**
I have also run the following which I believe indicate that there are no errors with the drive:
badblocks -sv /dev/sda
Checking blocks 0 to 250059095
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)
dd if=/dev/sda of=/dev/null bs=64K conv=noerror
3907173+1 records in
3907173+1 records out
256060514304 bytes (256 GB, 238 GiB) copied, 485.648 s, 527 MB/s
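The counts in the smartd mails (6 pending, 3 offline uncorrectable) match the RAW_VALUE column of attributes 197 and 198 in the table above. To see whether they ever change, the raw values could be logged over time. A minimal sketch, assuming the ten-column attribute table layout printed by smartctl above; `smart_attr_raw` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the RAW_VALUE (10th column) of one attribute from `smartctl -A` output on stdin.
# $1 = attribute ID, e.g. 197. Prints nothing if the attribute is not listed.
smart_attr_raw() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10 }'
}

# Usage (as root):
#   smartctl -A /dev/sda | smart_attr_raw 197
#   smartctl -A /dev/sda | smart_attr_raw 198
```

If the two values stay at 6 and 3 across weeks of logging, that would at least suggest the counters are stale rather than growing.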
jameszp
(93 rep)
Jul 9, 2023, 03:04 AM
• Last activity: Jul 18, 2023, 06:01 PM
2
votes
2
answers
999
views
smartd output to screen, not email
I'm trying to get smartd working; it seems set on sending messages by email via 'mail', which I've never used. But I recall that a few years ago I had smartd sending its warnings directly to the screen via a popup text box. I'm trying to figure out how to do that again. The info for the 'screen' command baffles me. tmux likewise. Or I suppose it could be a notification. When I have a few weeks to study it, I'll get 'mail' working, but for now I'd prefer a popup message anyway.
==================================================
In 'smartd.conf':
DEVICESCAN -a -m -M exec notify -M test
... Ok, added full path, much better:
DEVICESCAN -a -m -M exec /bin/notify -M test
... 'notify' runs fine from CLI, is an executable script:
/bin/notify-send "$(systemctl status smartd)"
... but although:
systemctl restart smartd; systemctl status smartd
... reports no errors, I get no 'test' result.
BTW, so far no results at all using those variables you mentioned.
...
$ smartd ... shows two notifications, one for each of my two disks! So why does 'systemctl restart smartd' show nothing?
Ray Andrews
(2615 rep)
Oct 17, 2022, 01:56 PM
• Last activity: Oct 17, 2022, 10:08 PM
1
votes
2
answers
1479
views
Why does smartd need a database?
Does smartd need the database, or does smartctl need it?
I saw that the smartmontools GitHub project keeps updating the database:
https://github.com/smartmontools/smartmontools/labels/drivedb
In my understanding, smartd scans all disks, so why does it need a database? What is the purpose of the database in smartd/smartctl?
Mark K
(955 rep)
Aug 12, 2022, 03:32 AM
• Last activity: Aug 12, 2022, 11:52 AM
0
votes
0
answers
581
views
Why smartd can't find my nvme device
greentea smartd[1147]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
greentea smartd[1147]: Problem creating device name scan list
greentea smartd[1147]: In the system's table of devices NO devices found to scan
greentea smartd[1147]: Unable to monitor any SMART enabled devices. Try debug (-d) option. Exiting...
My `smartd.conf` is unchanged and contains this line:
DEVICESCAN
I changed `DEVICESCAN` to `/dev/nvme0n1` but still got the same error.
I tested on Debian 9.
Mark K
(955 rep)
Jul 13, 2022, 07:37 AM
• Last activity: Jul 13, 2022, 09:53 AM
1
votes
1
answers
1322
views
Smartd how to wake disks only for scheduled scans
I am using smartd to monitor my disks in Ubuntu and have it configured to run a short test daily at 2am and a long test at 3am every Saturday morning:
/dev/sda -a -n standby -o on -S on -s (S/../.././02|L/../../6/03) -m name@mail.com
I understand that smartd periodically polls the disks (every 30 minutes?), causing them to wake up if in standby, hence I have added the `-n standby` flag in the above config. However, this also stops the scheduled tests from running if the disk is in standby.
Is there a way to force the scheduled tests to start at the given times and wake the disks if needed, but stop the periodic polling from waking the disks?
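For reference, the `-s` expression in the config above is a regular expression that smartd compares against a string of the form `T/MM/DD/d/HH` (test type, month, day of month, day of week with 1=Monday to 7=Sunday, hour). The sketch below only illustrates how that expression selects times; it does not change the standby behaviour. The function names are made up for this example, and whole-string matching via `grep -Ex` is an assumption about how closely this mimics smartd.

```shell
# Return 0 if a smartd '-s' schedule regex would select the given test type
# (S, L, C or O) at the given "MM/DD/d/HH" stamp (d: 1=Monday ... 7=Sunday).
# Assumption: whole-string extended-regex matching, as grep -Ex performs.
schedule_matches() {
    regex=$1
    type=$2
    stamp=$3
    printf '%s/%s\n' "$type" "$stamp" | grep -Eqx -- "($regex)"
}

# Stamp for the current hour in the MM/DD/d/HH layout.
current_stamp() {
    date '+%m/%d/%u/%H'
}

# Example: is a long test due right now under the schedule from the question?
#   schedule_matches 'S/../.././02|L/../../6/03' L "$(current_stamp)" && echo due
```

With the schedule from the question, a short test matches any day at hour 02 and a long test matches only when the day-of-week field is 6 (Saturday) at hour 03.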
Dibly
(11 rep)
Jun 8, 2022, 09:24 AM
• Last activity: Jun 8, 2022, 09:41 AM
1
votes
0
answers
287
views
SmartMonTools: tests get cancelled without any traces
I am testing hard disks with SmartMonTools under Ubuntu 20.04.
Tests for some hard disks are not working - they disappear without leaving any warnings or errors.
Hard drive status before the test (note the time **Sun Mar 13 16:25:12 2022 UTC**):
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-104-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: MB4000GDUPB
Serial Number: 26F5K1J3F17A
LU WWN Device Id: 5 000039 6db900727
Firmware Version: HPG3
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Mar 13 16:25:12 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 532) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 050 Pre-fail Always - 0
2 Throughput_Performance 0x0007 100 100 050 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 100 100 002 Pre-fail Always - 11957
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 100 100 050 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 41134
10 Spin_Retry_Count 0x0013 105 100 030 Pre-fail Always - 0
180 Unknown_HDD_Attribute 0x003b 100 100 001 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 34 (Min/Max 8/58)
196 Reallocated_Event_Count 0x0033 100 100 010 Pre-fail Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 41109 -
# 2 Short offline Completed without error 00% 41038 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Begin the **long** test:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-104-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 532 minutes for test to complete.
Test will complete after Mon Mar 14 01:17:34 2022 UTC
Use smartctl -X to abort test.
Check the test status - test is in progress (time **Sun Mar 13 16:26:05 2022 UTC**):
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-104-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: MB4000GDUPB
Serial Number: 26F5K1J3F17A
LU WWN Device Id: 5 000039 6db900727
Firmware Version: HPG3
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Mar 13 16:26:05 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 249) Self-test routine in progress...
90% of test remaining.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 532) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 050 Pre-fail Always - 0
2 Throughput_Performance 0x0007 100 100 050 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 100 100 002 Pre-fail Always - 11957
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 100 100 050 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 41134
10 Spin_Retry_Count 0x0013 105 100 030 Pre-fail Always - 0
180 Unknown_HDD_Attribute 0x003b 100 100 001 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 36 (Min/Max 8/58)
196 Reallocated_Event_Count 0x0033 100 100 010 Pre-fail Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Self-test routine in progress 90% 41134 -
# 2 Extended offline Completed without error 00% 41109 -
# 3 Short offline Completed without error 00% 41038 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Check the test progress again - **no tests running** (time **Sun Mar 13 16:26:46 2022 UTC**):
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-104-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: MB4000GDUPB
Serial Number: 26F5K1J3F17A
LU WWN Device Id: 5 000039 6db900727
Firmware Version: HPG3
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Mar 13 16:26:46 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 532) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 050 Pre-fail Always - 0
2 Throughput_Performance 0x0007 100 100 050 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 100 100 002 Pre-fail Always - 858
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 100 100 050 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 41134
10 Spin_Retry_Count 0x0013 105 100 030 Pre-fail Always - 0
180 Unknown_HDD_Attribute 0x003b 100 100 001 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 32 (Min/Max 8/58)
196 Reallocated_Event_Count 0x0033 100 100 010 Pre-fail Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 41109 -
# 2 Short offline Completed without error 00% 41038 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
In short:
- Sun Mar 13 16:25:12 2022 UTC: no tests running.
- Sun Mar 13 16:26:05 2022 UTC: long test running.
- Sun Mar 13 16:26:46 2022 UTC: no tests running, no test results recorded.
How can I find out **why those tests get cancelled**? Are there any logs available?
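One way to narrow down when the test disappears is to record the "Self-test execution status" value at short intervals. The sketch below is only an observation aid under stated assumptions: the function names, device path and log path are placeholders, and it does not explain why the drive drops the test. The status byte holds the state in its upper four bits and the remaining tenths in its lower four bits, which agrees with the outputs above (249 is state 15 with 90% remaining, 0 is state 0).

```shell
# Print the numeric self-test execution status found in 'smartctl -c'
# (or 'smartctl -a') output supplied on stdin.
selftest_status_code() {
    sed -n 's/^ *Self-test execution status:[^(]*( *\([0-9][0-9]*\)).*/\1/p'
}

# Split the status byte: upper nibble = state, lower nibble = tenths remaining.
decode_selftest_status() {
    code=$1
    echo "state=$((code / 16)) remaining=$(( (code % 16) * 10 ))%"
}

# Append "UTC-timestamp decoded-status" lines to a log until interrupted.
# Hypothetical helper: device, log path and the 10 s interval are placeholders.
watch_selftest() {
    dev=$1
    log=$2
    while :; do
        code=$(smartctl -c "$dev" | selftest_status_code)
        printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
            "$(decode_selftest_status "${code:-0}")" >> "$log"
        sleep 10
    done
}
```

Comparing the timestamps in such a log with the kernel log for the same period may show whether the test vanishes together with some other event on the drive or its link.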
Andriy
(111 rep)
Mar 13, 2022, 05:02 PM
0
votes
1
answers
3897
views
Is my hard drive failing? / Need help with smartctl -a output
I have an old Seagate 4TB internal drive from a crapped out pc that I was planning to repurpose as a spare drive for gaming.
Figured I'd run some smartctl scans on it first just to be safe, so I did `smartctl -t short /dev/sdb` and got back results. They looked ok to me because I didn't see anything listed in the 'WHEN_FAILED' column (and originally I had been mostly concerned with the temperature-related errors). But then I saw [an article from 2018](https://harddrivegeek.com/current-pending-sector-count/) mentioning that 'Current_Pending_Sector' is pretty serious... And mine is not zero... And I did have some errors besides... Since I can't really make sense of whether or not to be concerned about them, I figured I'd try SE.
My best guess so far is that I shouldn't put anything critical on it but that it might be fine to use for games if I symlink the save folders so they exist somewhere else (on a drive with better smart results) and don't mind re-downloading the installed game in the event of drive failure. Also not sure if the 'READ DMA EXT' errors are indications of imminent failure or if that could be a cable or other one-time event (I can only see errors 35-39 and they all occurred at "16936 hours"... not sure if there's a way to see all of the errors or if literally only the last 5 are stored like it says). OTOH, I didn't have any issues mounting it or copying data off it (it was a relative's and they didn't want it anymore; just some pics/videos off it).
If there's at least decent odds that the drive might have some life left, I don't mind chancing it for less important stuff. But if it is highly likely to fail in the near future, I'd prefer not to waste any time with it for anything but acquiring a new magnet :-) Any advice / recommendations?
Anyway, I reran with `smartctl -t long /dev/sdb`, waited till the next day and ran `smartctl -a /dev/sdb`. Here are the results for that:
I_AM_ROOT@fedora35:~
# smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.7-200.fc35.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number:
LU WWN Device Id:
Firmware Version: CC54
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Dec 17 11:44:49 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 118) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 168) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 528) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 119 099 006 Pre-fail Always - 233492808
3 Spin_Up_Time 0x0003 092 091 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 020 Old_age Always - 1890
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 044 039 030 Pre-fail Always - 678608011490
9 Power_On_Hours 0x0032 065 065 000 Old_age Always - 30836
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 099 099 020 Old_age Always - 1206
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 061 061 000 Old_age Always - 39
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 071 058 045 Old_age Always - 29 (Min/Max 27/32)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 304
193 Load_Cycle_Count 0x0032 084 084 000 Old_age Always - 32204
194 Temperature_Celsius 0x0022 029 042 000 Old_age Always - 29 (0 12 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 16
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 16
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 23293h+16m+41.533s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 19236444339
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 27220280383
SMART Error Log Version: 1
ATA Error Count: 39 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 39 occurred at disk power-on lifetime: 16936 hours (705 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 ff ff ff ef 00 04:59:22.764 READ DMA EXT
25 00 40 ff ff ff ef 00 04:59:22.762 READ DMA EXT
25 00 00 ff ff ff ef 00 04:59:22.736 READ DMA EXT
25 00 08 ff ff ff ef 00 04:59:22.735 READ DMA EXT
ef 10 02 00 00 00 a0 00 04:59:22.735 SET FEATURES [Enable SATA feature]
Error 38 occurred at disk power-on lifetime: 16936 hours (705 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 ff ff ff ef 00 04:59:18.709 READ DMA EXT
25 00 00 ff ff ff ef 00 04:59:18.696 READ DMA EXT
25 00 00 ff ff ff ef 00 04:59:18.693 READ DMA EXT
25 00 00 ff ff ff ef 00 04:59:18.631 READ DMA EXT
25 00 08 ff ff ff ef 00 04:59:18.631 READ DMA EXT
Error 37 occurred at disk power-on lifetime: 16936 hours (705 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 ff ff ff ef 00 04:57:53.914 READ DMA EXT
25 00 08 ff ff ff ef 00 04:57:53.914 READ DMA EXT
25 00 00 ff ff ff ef 00 04:57:53.882 READ DMA EXT
ef 10 02 00 00 00 a0 00 04:57:53.881 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 04:57:53.881 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 36 occurred at disk power-on lifetime: 16936 hours (705 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 ff ff ff ef 00 04:57:49.903 READ DMA EXT
25 00 08 ff ff ff ef 00 04:57:49.903 READ DMA EXT
25 00 08 ff ff ff ef 00 04:57:49.903 READ DMA EXT
25 00 08 ff ff ff ef 00 04:57:49.903 READ DMA EXT
25 00 08 ff ff ff ef 00 04:57:49.903 READ DMA EXT
Error 35 occurred at disk power-on lifetime: 16936 hours (705 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff ef 00 04:57:45.210 READ DMA EXT
25 00 00 ff ff ff ef 00 04:57:45.181 READ DMA EXT
25 00 00 ff ff ff ef 00 04:57:45.179 READ DMA EXT
25 00 00 ff ff ff ef 00 04:57:45.178 READ DMA EXT
25 00 58 ff ff ff ef 00 04:57:45.149 READ DMA EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 60% 30817 3723785408
# 2 Short offline Completed without error 00% 30812 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
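To pull the handful of counters in question out of a long report like the one above, a small filter can help. This is a sketch only: the function name is made up, and the default ID list (5, 187, 197, 198) is a commonly watched shortlist rather than anything official. It expects the ten-column attribute rows that `smartctl -A` or `smartctl -a` prints and shows ID, name and raw value wherever the raw value is non-zero.

```shell
# Print "ID NAME RAW" for selected attributes whose raw value is non-zero.
# stdin: smartctl attribute rows; $1: optional space-separated list of IDs.
nonzero_attributes() {
    awk -v ids="${1:-5 187 197 198}" '
        BEGIN {
            n = split(ids, a, " ")
            for (i = 1; i <= n; i++) want[a[i]] = 1
        }
        # Field 1 is the attribute ID, field 10 the raw value; rows with
        # fewer columns yield an empty field 10 and are skipped.
        ($1 in want) && $10 + 0 != 0 { print $1, $2, $10 }
    '
}

# Example (device path is a placeholder):
#   smartctl -A /dev/sdb | nonzero_attributes
```

Fed the attribute table above, it would list 187 Reported_Uncorrect (39), 197 Current_Pending_Sector (16) and 198 Offline_Uncorrectable (16), and skip 5 Reallocated_Sector_Ct because its raw value is 0.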
zpangwin
(1061 rep)
Dec 17, 2021, 05:15 PM
• Last activity: Dec 17, 2021, 07:10 PM
2
votes
1
answers
5551
views
Best practices to enable SMART disk notifications on a Linux workstation?
I enabled SMART notifications on my laptop running Debian. Basically I just want to get a notification popup when something goes wrong on a disk. I don't want to get an email; I think a notification is more appropriate on the workstation where I spend my days (while emails are of course better for servers).
It works, I even tested it (but what exactly did I test ?), but I still have doubts if I did it the right way, and if what I did is really useful.
Basically, what I did :
1. I installed `smartmontools` and `smart-notifier`:
# apt-get install smartmontools smart-notifier
2. I then configured the `smartd` daemon to monitor `/dev/sda` and send its messages to the notifier. This is done in `/etc/smartd.conf`, in which I have only 1 line:
/dev/sda -a -m myUsername -M exec /usr/share/smartmontools/smartd-runner -M test
3. The `-M test` option in the previous command displays a test notification popup as soon as I restart the `smartd` daemon (you have to log out and log back in for it to work). And it works: restarting the `smartd` daemon displays the test notification popup.
4. And finally I removed the `-M test` option and restarted `smartd` again.
So, can I be at ease now? Will this setup send me a popup as soon as something goes wrong with `/dev/sda`? I have a lot of unanswered questions:
1. With the `-M test` option, the test notification popup is only displayed when I restart `smartd`. Nothing is displayed when I restart my laptop and log in (probably because `smartd` is already running at that point). Can I be confident that a notification will pop up if `smartd` detects something wrong on my disks?
2. What event exactly will trigger that popup? In other words, what is "something wrong"? `man smartd` states that:
> smartd will attempt to enable SMART monitoring on ATA devices (equivalent to smartctl -s on) and polls these and SCSI devices every 30 minutes (configurable), logging SMART errors and changes of SMART Attributes via the SYSLOG interface.
And indeed, checking `/var/log/syslog` I can see a log entry after 30 minutes (last line):
Jul 30 13:17:06 precision7520 smartd: smartd 6.6 2016-05-31 r4324 [x86_64-linux-4.19.0-0.bpo.5-amd64] (local build)
Jul 30 13:17:06 precision7520 smartd: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Jul 30 13:17:06 precision7520 smartd: Opened configuration file /etc/smartd.conf
Jul 30 13:17:06 precision7520 smartd: Configuration file /etc/smartd.conf parsed.
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda, type changed from 'scsi' to 'sat'
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda [SAT], opened
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda [SAT], Samsung SSD 850 EVO 2TB, S/N:S2RMNB0J801642K, WWN:5-002538-c407b1fd2, FW:EMT02B6Q, 2.00 TB
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda [SAT], not found in smartd database.
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.Samsung_SSD_850_EVO_2TB-S2RMNB0J801642K.ata.state
Jul 30 13:17:06 precision7520 smartd: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Jul 30 13:17:06 precision7520 smartd: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.Samsung_SSD_850_EVO_2TB-S2RMNB0J801642K.ata.state
Jul 30 13:47:06 precision7520 smartd: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 67 to 68
But no pop up. Maybe because the log entry was just a minor information (a 1 degree temperature increase) ? But then, what kind of event exactly is supposed to trigger the notification ?
3. Finally, there are a lot of examples in `/etc/smartd.conf`, with even more in `man smartd.conf`, some performing (`-s`) short (`-s S`) or extended (`-s L`) self-tests at given intervals. Are those self-tests necessary? Isn't SMART supposed to integrate its own self-test procedures (the SM of SMART stands for Self-Monitoring)? How useful are results without performing self-tests?
For information, my `smartctl -a /dev/sda` results:
$ sudo smartctl -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.19.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 850 EVO 2TB
Serial Number: S2RMNB0J801642K
LU WWN Device Id: 5 002538 c407b1fd2
Firmware Version: EMT02B6Q
User Capacity: 2 000 398 934 016 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 30 14:15:22 2021 WAT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
(...)
No self-test seems to have ever been performed:
(...)
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 265) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
(...)
Is this data of any use, even without self-tests?
(...)
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       27805
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       1055
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       21
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   099   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   067   043   000    Old_age   Always       -       33
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       71
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       26330052507
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14903         -
# 2  Short offline       Completed without error       00%     14709         -
# 3  Short offline       Aborted by host               70%      2733         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
That is a lot of questions, but they all boil down to one: what are the best practices for enabling SMART disk notifications on a Linux workstation? I was rather surprised that googling this question didn't turn up any useful information.
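To illustrate how the attribute table above can be read even without self-tests, here is a minimal sketch. It relies on the smartctl convention that an attribute counts as failed once its normalized VALUE is less than or equal to a non-zero THRESH. The helper name failing_attributes is hypothetical, and the field positions assume the column layout shown above.

```shell
# Sketch only: list attributes whose normalized VALUE has reached THRESH.
# Reads `smartctl -A` style rows on stdin; failing_attributes is a made-up name.
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $3 ~ /^0x/ {
        value  = $4 + 0     # normalized VALUE column
        thresh = $6 + 0     # THRESH column; 0 means "no threshold defined"
        if (thresh > 0 && value <= thresh) print $1, $2
    }'
}

# Assumed usage on a real system (needs root):
#   sudo smartctl -A /dev/sda | failing_attributes
```

With the table from this question the function prints nothing, since every VALUE is well above its THRESH.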
ChennyStar
(1969 rep)
Jul 30, 2021, 01:29 PM
• Last activity: Jul 31, 2021, 02:39 AM
0
votes
1
answers
386
views
NAS server smartd.conf assistance
I want to perform the following using smartd:
Run short smartctl test once a week.
Run long smartctl test once a month.
Get the results for each run on mail.
I tried to read the man pages for both smartd and smartd.conf (https://linux.die.net/man/5/smartd.conf), but I just can't seem to understand them. Maybe I'm just dumb, but I can't understand anything from their examples...
E.g:
#
# Three disks connected to a MegaRAID controller
# Start short self-tests daily between 1-2, 2-3, and
# 3-4 am.
/dev/sda -d megaraid,0 -a -s S/../.././01
/dev/sda -d megaraid,1 -a -s S/../.././02
/dev/sda -d megaraid,2 -a -s S/../.././03
That doesn't make any sense to me, and I can't understand how to apply that to my use case.
Help would be appreciated. Thanks.
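A minimal sketch of how the -s argument in that example can be read, assuming the behaviour described in smartd.conf(5): smartd compares the regular expression against strings of the form T/MM/DD/d/HH, where T is the test type (S = short, L = long), d is the day of the week (1 = Monday ... 7 = Sunday) and HH is the hour. The helper schedule_matches is hypothetical and only imitates that comparison with grep. The regular expression shown is one possible way to express "short test weekly, long test monthly"; the chosen day and hours are assumptions.

```shell
# Sketch only: imitate smartd's matching of a -s regex against T/MM/DD/d/HH.
# schedule_matches is a made-up helper, not part of smartmontools.
schedule_matches() {
    # $1 = -s regex, $2 = candidate time string; -x requires a whole-line match
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Short test every Sunday in the 02 hour, long test on the 1st of each month in the 03 hour.
regex='(S/../../7/02|L/../01/./03)'

schedule_matches "$regex" 'S/03/14/7/02' && echo 'short test: 14 March (a Sunday), hour 02'
schedule_matches "$regex" 'L/04/01/4/03' && echo 'long test: 1 April, hour 03'
schedule_matches "$regex" 'S/03/15/1/02' || echo 'no test: 15 March (a Monday), hour 02'

# A corresponding smartd.conf line might look like this (assumed device and recipient):
#   /dev/sda -a -s (S/../../7/02|L/../01/./03) -m root
```

As far as I know, -m sends mail when smartd detects a problem rather than after every test run, so mailing the result of each run would probably need something beyond this one line.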
displainame
(3 rep)
Mar 11, 2021, 11:33 PM
• Last activity: Mar 12, 2021, 12:05 AM
1
votes
1
answers
743
views
Can I determine the real lifetime with “Error occurred at power lifetime: 19132h” and "Power_On_Hours 0h" in smartctl?
I just bought a "new" and very cheap HDD online.
I connected it to my PC with a USB 3.0 HDD enclosure.
Running smartctl gives the following output:
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 0
Error 4 occurred at disk power-on lifetime: 19132 hours
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Vendor (0xb0) Completed without error 00% 47354 -
# 2 Vendor (0x71) Completed without error 00% 47354 -
The complete output: https://hastebin.com/zafejecopu.yaml
What do those errors mean? Can I determine the real lifetime value from the smartctl output?
Thanks a lot.
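As a rough sketch of the comparison being asked about: the error log and the self-test log each record the power-on hour at which an entry was written, so the largest hour found there gives a lower bound on the real usage even when attribute 9 reads 0. The helper max_logged_hours is hypothetical, and the parsing assumes the line formats quoted above.

```shell
# Sketch only: find the largest power-on hour recorded in the smartctl logs.
# max_logged_hours is a made-up helper; it reads `smartctl -a` output on stdin.
max_logged_hours() {
    awk '
        # Self-test log rows: "# 1 ... 00% 47354 -" (hours are the next-to-last field)
        /^# *[0-9]+ / { h = $(NF-1) + 0; if (h > max) max = h }
        # Error log headers: "Error N occurred at disk power-on lifetime: H hours"
        /occurred at disk power-on lifetime:/ {
            for (i = 1; i < NF; i++)
                if ($i == "lifetime:") { h = $(i+1) + 0; if (h > max) max = h }
        }
        END { print max + 0 }
    '
}

# The lines quoted in this question, used as sample input.
sample='  9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 0
Error 4 occurred at disk power-on lifetime: 19132 hours
# 1 Vendor (0xb0) Completed without error 00% 47354 -
# 2 Vendor (0x71) Completed without error 00% 47354 -'

hours=$(printf '%s\n' "$sample" | max_logged_hours)
echo "largest logged lifetime: $hours hours (about $(( hours / 24 )) days)"
```

A Power_On_Hours value of 0 next to log entries at tens of thousands of hours would suggest the counter was reset, although the logs alone cannot prove how the drive was actually used.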
sgon00
(457 rep)
Nov 16, 2020, 07:05 AM
• Last activity: Feb 24, 2021, 10:59 AM
1
votes
1
answers
642
views
Can I recover a 500GB Seagate Momentus with bad sectors?
I've received a Seagate 2.5" 5400 rpm 500 GB HDD that was throwing a boot configuration error after some Windows updates. I've tried the following on it and nothing seems to work:
**First step**: I tried Windows repair to re-install the bootloader, but the installer wouldn't interact with this particular disk. (The disk was inserted in its original machine.)
**Second Step**: I used
to recover data from it, which I did successfully.(disk connected to my laptop through a Sata to USB adapter)
**Third Step**: I tried formatting the HDD with
but it threw a read error when trying to create the partition table and exited. I was able to delete the old NTFS/FAT32 partitions but not to create new ones.
**Fourth Step**: I started the Windows installer and tried formatting from there, but it threw an error saying it cannot format the disk. (Again with the HDD in its original machine.)
After this things get weird. Sometimes my laptop would recognize the HDD, other times not.
**Fifth Step**: I checked the disk with
and it did show read errors in some sectors. I tried to write zeroes to those sectors, which seemed to work, but I'm not sure it did. I tried partitioning the disk, but now
would throw: cannot open /dev/sdb: Input/output error.
**Sixth Step**: I tried partitioning with
which would open the disk, and it threw some errors, to which I answered ignore. After about 8 ignores for the partition table and some more for the actual partition (which I set to take up the entire disk, from 1 to 500G), the following happens:
now sometimes shows disk
with partition
sometimes it only shows the disk, sometimes not at all.
now sometimes shows data/executes a test, but more often it throws Device Identity failed: scsi error unsupported scsi opcode.
-I /dev/sdb
throws /dev/sdb: HDIO_DRIVE_CMD(identify) failed: Invalid argument
/
throw Bad magic number in super-block while trying to open /dev/sdb1
If I disconnect the disk and reconnect it, I have to recreate the partition table and re-partition it with
.
log when connecting the drive:
[ 4708.480583] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4708.480592] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4708.480598] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4708.480603] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[ 4708.480610] blk_update_request: critical medium error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 4708.480617] buffer_io_error: 6 callbacks suppressed
[ 4708.480620] Buffer I/O error on dev sdb, logical block 0, async page read
[ 4708.843190] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4708.843199] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4708.843204] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4708.843210] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 01 00 00 07 00
[ 4708.843216] blk_update_request: critical medium error, dev sdb, sector 1 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
[ 4708.843223] Buffer I/O error on dev sdb, logical block 1, async page read
[ 4708.843229] Buffer I/O error on dev sdb, logical block 2, async page read
[ 4708.843232] Buffer I/O error on dev sdb, logical block 3, async page read
[ 4708.843235] Buffer I/O error on dev sdb, logical block 4, async page read
[ 4708.843238] Buffer I/O error on dev sdb, logical block 5, async page read
[ 4708.843240] Buffer I/O error on dev sdb, logical block 6, async page read
[ 4708.843244] Buffer I/O error on dev sdb, logical block 7, async page read
[ 4708.976204] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4708.976212] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4708.976217] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4708.976223] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[ 4708.976228] blk_update_request: critical medium error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 4708.976235] Buffer I/O error on dev sdb, logical block 0, async page read
[ 4709.153850] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4709.153860] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4709.153865] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4709.153871] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 01 00 00 07 00
[ 4709.153877] blk_update_request: critical medium error, dev sdb, sector 1 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
[ 4709.153885] Buffer I/O error on dev sdb, logical block 1, async page read
[ 4709.320307] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4709.320316] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4709.320321] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4709.320327] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[ 4709.320333] blk_update_request: critical medium error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 4709.486795] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4709.486803] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4709.486809] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4709.486814] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 01 00 00 07 00
[ 4709.486820] blk_update_request: critical medium error, dev sdb, sector 1 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
[ 4709.488688] audit: type=1106 audit(1606925626.528:133): pid=2818 uid=0 auid=1000 ses=1 msg='op=PAM:session_close grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/2 res=success'
[ 4709.489637] audit: type=1104 audit(1606925626.528:134): pid=2818 uid=0 auid=1000 ses=1 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_faillock acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/2 res=success'
[ 4709.653391] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4709.653395] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4709.653398] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4709.653400] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[ 4709.653403] blk_update_request: critical medium error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 4709.831007] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4709.831011] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4709.831013] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4709.831016] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 01 00 00 07 00
[ 4709.831018] blk_update_request: critical medium error, dev sdb, sector 1 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
[ 4709.997153] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4709.997162] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4709.997167] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4709.997173] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
[ 4709.997179] blk_update_request: critical medium error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 4710.174596] sd 3:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 4710.174599] sd 3:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[ 4710.174602] sd 3:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[ 4710.174604] sd 3:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 00 00 01 00 00 07 00
[ 4710.174607] blk_update_request: critical medium error, dev sdb, sector 1 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
[ 4710.174653] ldm_validate_partition_table(): Disk read failed.
[ 4711.206717] sdb: unable to read partition table
I'm thinking I could write zeroes over the entire HDD, but I'm not sure it would help. Is there any way to recover this HDD?
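To summarise which sectors the kernel log above complains about, here is a small sketch that extracts the distinct sector numbers from the "critical medium error" lines. The helper name failing_sectors is hypothetical, and the pattern assumes the exact message format shown in this log.

```shell
# Sketch only: list the distinct sectors reported as unreadable in a kernel log.
# failing_sectors is a made-up helper; it reads dmesg-style text on stdin.
failing_sectors() {
    sed -n 's/.*critical medium error, dev [^,]*, sector \([0-9][0-9]*\).*/\1/p' | sort -n -u
}

# Assumed usage on a real system:
#   sudo dmesg | failing_sectors
```

For the log in this question that comes down to sectors 0 and 1, which is consistent with the partition table being unreadable.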
fluffehStack
(125 rep)
Dec 2, 2020, 04:35 PM
• Last activity: Dec 2, 2020, 06:07 PM
0
votes
1
answers
201
views
Bind smartd to user session
I want to see smartd notifications in my DE (Gnome3). So I've configured smartd to execute a custom script that uses notify-send to notify all logged-in users:
**smartd.conf**:
/dev/sda -m root -M test -M exec /etc/smartmontools/smartd_warning.d/notify -a -n standby,10,q
**smartd_warning.d/notify**:
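#!/usr/bin/env sh
# Split the output of `w -hs` on newlines only, one logged-in session per line.
IFS=$'\n'
for LINE in $(w -hs)
do
    # Column 1 of `w -hs` is the user name; column 8 is taken as the display here.
    USER=$(echo "$LINE" | awk '{print $1}')
    USER_ID=$(id -u "$USER")
    DISP_ID=$(echo "$LINE" | awk '{print $8}')
    # Send the smartd warning to that user's session bus.
    sudo -u "$USER" DISPLAY="$DISP_ID" DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$USER_ID/bus" notify-send "S.M.A.R.T Error ($SMARTD_FAILTYPE)" "$SMARTD_MESSAGE" --icon=dialog-warning
done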
It works correctly only if I restart smartd after I have logged in. Obviously it can't work at boot, because smartd starts before any user has logged in.
[Unit]
Description=Self Monitoring and Reporting Technology (SMART) Daemon
Documentation=man:smartd(8) man:smartd.conf(5)
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/smartmontools
ExecStart=/usr/sbin/smartd -n $smartd_opts
ExecReload=/bin/kill -HUP $MAINPID
StandardOutput=syslog
[Install]
WantedBy=multi-user.target
How can I bind the smartd service to the user session so that I see those notifications?
Evan
(101 rep)
Jun 13, 2020, 05:51 PM
• Last activity: Jun 13, 2020, 10:48 PM
1
votes
0
answers
992
views
How to check for file system consistency after power outage
What can I do to check whether a file system (the data in files and the hardware) is intact or corrupt after a computer is shut down abruptly by a power outage?
My home desktop computer was shut down by a sudden power outage. The computer automatically rebooted itself after the power came back, and I then shut it down manually in the regular way. The computer runs Ubuntu 18.10, Linux 4.18.0. It has an SSD and an HDD, where the SSD holds the boot, root, and all the essential partitions, and the HDD holds one partition for data files. I think all the file systems are ext4. I want to determine whether any file was corrupted, and whether the SSD or HDD suffered physical damage.
I think I can use smartmontools to check for physical damage.
I have no clue how to check the data integrity. It looks like fsck can do some checks on the file system, but it seems I need to unmount the partition to inspect it. How can I run fsck to inspect the SSD? Can I use a USB boot stick which has Ubuntu on it?
I would appreciate any pointers.
----
**Edit**
The 'duplicate question' link nominally answered my question, and I am closing this question.
Among the several methods suggested in the link and the comments to this question, what I actually did is the following.
I booted the computer, and ran
$ sudo tune2fs -c 1 /dev/sda4
where /dev/sda4 is the SSD. Then, I rebooted. The system started up, showed something about the disk check for a few seconds, and presented the normal log-in screen.
I also did
$ sudo tune2fs -c 1 /dev/sdb1
for the HDD. Upon reboot, the start up screen showed a progress for about a minute, and then the normal log-in screen came up.
I'm not really sure if there was no error, if a problem was fixed silently, or if there was an error, but I assume the lack of explicit warning indicates that there was no error.
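One way to reduce that uncertainty, sketched under the assumption that the file systems are ext4 as stated: `tune2fs -l` prints the superblock fields, including a "Filesystem state" line that normally reads "clean" after a successful check. The helper fs_state is hypothetical and only extracts that one field.

```shell
# Sketch only: pull the "Filesystem state" value out of `tune2fs -l` output.
# fs_state is a made-up helper; it reads the superblock listing on stdin.
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# Assumed usage on the devices from this question (needs root):
#   sudo tune2fs -l /dev/sda4 | fs_state
#   sudo tune2fs -l /dev/sdb1 | fs_state
```

Note that `-c 1` sets the maximum mount count to 1, so the check will probably run again on every boot until the value is changed back, for example with `tune2fs -c -1`, which disables the mount-count check.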
Thank you for the comments and the link.
norio
(225 rep)
Sep 28, 2019, 02:53 AM
• Last activity: Sep 29, 2019, 01:03 AM