
Mapping around ECC errors in Linux does not seem to work?

3 votes
0 answers
302 views
I get the following ECC error on a Linux box several times a day:
May 24 18:21:04 staton-nas kernel: mce: [Hardware Error]: Machine check events logged
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 11: 8c000040000800c2
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: TSC 1c35588953416 
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: ADDR 117d228000 
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: MISC 122100200020008c 
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1590358864 SOCKET 0 APIC 0
May 24 18:21:04 staton-nas kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x117d228 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c2 socket:0 ha:0 channel_mask:1 rank:4)
The ADDR is always the same, so I'm trying to map around it with a 'memmap=5M$0x117CFA8001' kernel argument. The argument seems to be applied, because I see the following in syslog:
May 24 16:03:09 staton-nas kernel: user: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
May 24 16:03:09 staton-nas kernel: user: [mem 0x0000000100000000-0x000000117cfa8000] usable
May 24 16:03:09 staton-nas kernel: user: [mem 0x000000117cfa8001-0x000000117d4a8000] reserved
May 24 16:03:09 staton-nas kernel: user: [mem 0x000000117d4a8001-0x000000407fffffff] usable
However, I still get the ECC errors. Am I missing something? Is the "ADDR 117d228000" in the EDAC syslog errors not the actual address I need to map around? Do I need to convert it to a physical address somehow? I'm too cheap to replace a whole DIMM for a single bad bit.
The more research I do, the more convinced I become that the "memory scrubbing error" message means the error is being detected by memory scrubbing that the hardware performs on its own. If so, I can safely ignore it now that I have mapped around it, because the OS will never actually use the memory area I reserved. Can anyone confirm that?
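To sanity-check the arithmetic, here is a minimal Python sketch that tests whether the reported ADDR falls inside the range I reserved, and builds a page-aligned variant of the argument. It assumes 4 KiB pages and that the ADDR in the EDAC line is a physical address, which is exactly what I am unsure about. The helper names are mine, not from any kernel tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical for x86-64

def reserved_range(start, size):
    """Inclusive [first, last] byte addresses covered by memmap=size$start."""
    return start, start + size - 1

def covers(start, size, addr):
    """True if addr lies inside the reserved range."""
    first, last = reserved_range(start, size)
    return first <= addr <= last

def page_aligned_memmap(addr, size):
    """Build a memmap argument roughly centered on addr, aligned to pages."""
    # Round the start down and the size up to whole pages.
    start = (addr - size // 2) // PAGE_SIZE * PAGE_SIZE
    size = -(-size // PAGE_SIZE) * PAGE_SIZE
    return f"memmap={size:#x}${start:#x}"

ADDR = 0x117D228000    # from the "ADDR" line in the EDAC log
START = 0x117CFA8001   # start address from my current kernel argument
SIZE = 5 * 1024 * 1024 # the 5M in my current kernel argument

first, last = reserved_range(START, SIZE)
print(f"reserved: {first:#x}-{last:#x}")
print("ADDR covered:", covers(START, SIZE, ADDR))
print("aligned argument:", page_aligned_memmap(ADDR, SIZE))
```

By this check the reported ADDR sits in the middle of the reserved range, so the range itself looks right if ADDR really is a physical address. The only difference in the aligned variant is that the start address no longer ends in 1. Depending on the bootloader, the `$` may need escaping in its configuration file.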
Asked by statop (31 rep)
May 25, 2020, 05:07 PM