
ATA errors in dmesg

For the past couple of weeks, I have sporadically found bursts of errors like the following in dmesg:
[174745.892138] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[174745.892149] ata7.00: failed command: FLUSH CACHE EXT
[174745.892152] ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 21
                         res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[174745.892163] ata7.00: status: { DRDY }
[174745.892169] ata7: hard resetting link
[174746.201093] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[174746.218236] ata7.00: configured for UDMA/133
[174746.218245] ata7.00: retrying FLUSH 0xea Emask 0x4
[174746.218356] ata7: EH complete
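To check whether such errors are confined to a single port, the kernel log can be tallied per device. The following is a minimal sketch, assuming the log lines have the format shown above; the function name `count_ata_exceptions` is hypothetical and not part of any existing tool.

```shell
# Count libata "exception" lines per device from kernel log text on stdin.
# Assumes lines like: "[174745.892138] ata7.00: exception Emask 0x0 ..."
# Prints one "device count" pair per line, sorted by device name.
count_ata_exceptions() {
    awk '
        match($0, /ata[0-9]+\.[0-9]+: exception/) {
            # Keep only the device part, e.g. "ata7.00"
            dev = substr($0, RSTART, RLENGTH)
            sub(/: exception$/, "", dev)
            count[dev]++
        }
        END { for (dev in count) print dev, count[dev] }
    ' | sort
}
```

It could be fed with `dmesg | count_ata_exceptions`. If only one device ever shows up in the output, the problem is at least limited to that port, its cable, or its disk.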
It is always ata7, and always this same exception and Emask. I have already looked at the SMART data of all my disks, and they appear healthy: I cannot see any unusual values for the reallocated sector count or similar attributes. I tried this answer, https://superuser.com/a/617193, to find the actual sdX disk name, and I get the following:
# grep '[0-9]' /sys/class/scsi_host/host{0..9}/unique_id
/sys/class/scsi_host/host0/unique_id:1
/sys/class/scsi_host/host1/unique_id:2
/sys/class/scsi_host/host2/unique_id:3
/sys/class/scsi_host/host3/unique_id:4
/sys/class/scsi_host/host4/unique_id:5
/sys/class/scsi_host/host5/unique_id:6
/sys/class/scsi_host/host6/unique_id:7
/sys/class/scsi_host/host7/unique_id:8
/sys/class/scsi_host/host8/unique_id:9
/sys/class/scsi_host/host9/unique_id:10
If I interpret this correctly, ata7 belongs to SCSI host 6 (the host whose unique_id is 7)? I then tried to find which disk is attached to host 6:
# ls -l /sys/block/sd*
lrwxrwxrwx 1 root root 0 Nov 24 19:28 /sys/block/sda -> ../devices/pci0000:00/0000:00:11.4/ata1/host0/target0:0:0/0:0:0:0/block/sda
lrwxrwxrwx 1 root root 0 Nov 24 19:28 /sys/block/sdb -> ../devices/pci0000:00/0000:00:11.4/ata2/host1/target1:0:0/1:0:0:0/block/sdb
lrwxrwxrwx 1 root root 0 Nov 24 19:28 /sys/block/sdc -> ../devices/pci0000:00/0000:00:11.4/ata3/host2/target2:0:0/2:0:0:0/block/sdc
lrwxrwxrwx 1 root root 0 Nov 24 19:28 /sys/block/sdd -> ../devices/pci0000:00/0000:00:11.4/ata4/host3/target3:0:0/3:0:0:0/block/sdd
lrwxrwxrwx 1 root root 0 Nov 24 19:28 /sys/block/sde -> ../devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sde
lrwxrwxrwx 1 root root 0 Nov 24 19:28 /sys/block/sdf -> ../devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0/block/sdf
lrwxrwxrwx 1 root root 0 Nov 24 19:28 /sys/block/sdg -> ../devices/pci0000:00/0000:00:1f.2/ata7/host6/target6:0:0/6:0:0:0/block/sdg
To me, it looks like host 6 has disk sdg attached. However, when I check the SMART log of sdg, I cannot see anything dramatically different from the other disks, so I am wondering: are my assumptions above correct at all? And how can I find out what causes these strange errors in dmesg? They appear at random: sometimes there are a few per hour, sometimes none for days. Is my disk dying and I just need to replace it, or is the rest of my hardware the problem? How can I isolate this? This is a production system, so unfortunately I cannot simply pull drives at random, swap hardware, or power off the machine.
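The port-to-disk lookup described above can also be done in one step, because the symlink targets under /sys/block already contain the ataN component. The following is a minimal sketch, assuming the sysfs layout matches the listing above; the function name `ata_to_disk` and its optional second argument are hypothetical and exist only to make the lookup repeatable.

```shell
# Print the block device(s) whose sysfs path runs through the given ATA port.
# $1: port name as it appears in dmesg without the ".00" suffix, e.g. "ata7"
# $2: directory holding the block-device symlinks (default: /sys/block)
ata_to_disk() {
    port=$1
    blockdir=${2:-/sys/block}
    for link in "$blockdir"/sd*; do
        # Skip anything that is not a symlink (including an unmatched glob)
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        case "$target" in
            # Require slashes on both sides so "ata7" does not match "ata70"
            */"$port"/*) basename "$link" ;;
        esac
    done
}
```

With the listing above, `ata_to_disk ata7` would print `sdg`, which agrees with the manual lookup via unique_id and host 6.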
Asked by T. Pluess (626 rep)
Nov 27, 2024, 08:37 AM