Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes

0 answers

81 views

Properly setting BAD RAM in GRUB

Lately I've started to experience instability with my system. At first I suspected high temperatures, bad drivers, and so on. In the end I decided to try Memtest86+; here is the result after 1 hour, with the BadRAM format selected.

First of all, I can't replace the RAM because it is soldered onto the motherboard. I just want a usable system until I'm able to get a new device.

    badram= 0x0000000232848000,0xfffffffffffff040,
            0x0000000232849040,0xfffffffffffff040,
            0x000000023284a000,0xfffffffffffff040,
            0x000000023284b060,0xfffffffffffff060,
            0x000000023284c040,0xffffffffffffd440,
            0x000000023284c420,0xffffffffffffe520,
            0x000000023284d110,0xfffffffffffff550,
            0x000000023284e020,0xffffffffffffe528,
            0x000000023284e020,0xffffffffffffe020,
            0x000000023284f000,0xfffffffffffff040

My problem:

I want to set the **GRUB_BADRAM** variable in the /etc/default/grub file.

What I've tried: 

 1. GRUB_BADRAM="0x0000000232848000,0xfffffffffffff040,0x0000000232849040,0xfffffffffffff040,0x000000023284a000,0xfffffffffffff040,0x000000023284b060,0xfffffffffffff060,0x000000023284c040,0xffffffffffffd440,0x000000023284c420,0xffffffffffffe520,0x000000023284d110,0xfffffffffffff550,0x000000023284e020,0xffffffffffffe528,0x000000023284e020,0xffffffffffffe020,0x000000023284f000,0xfffffffffffff040"
In short, the memtest result as a one-liner. The result: the system would not boot at all, and I needed a live USB to comment out the line and run update-grub.

 2. Since the detected bad RAM ranges seem rather small, I wanted to cover the whole range with a single address-mask pair: 0x0000000232848000 and the mask ... ? To be honest, I don't really understand how the mask works.
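
For readers wondering how one address-mask pair could cover all ten patterns, here is a minimal sketch. It assumes the usual BadRAM convention that an address X matches a pair when `(X & mask) == (addr & mask)`, so zero bits in the mask are "don't care" bits. The helper names `bounds` and `covering_pair` are made up for this illustration, and nothing here shows whether GRUB will actually boot with the result.

```python
MASK64 = (1 << 64) - 1

# The ten (address, mask) pairs reported by Memtest86+ in the question.
PAIRS = [
    (0x0000000232848000, 0xFFFFFFFFFFFFF040),
    (0x0000000232849040, 0xFFFFFFFFFFFFF040),
    (0x000000023284A000, 0xFFFFFFFFFFFFF040),
    (0x000000023284B060, 0xFFFFFFFFFFFFF060),
    (0x000000023284C040, 0xFFFFFFFFFFFFD440),
    (0x000000023284C420, 0xFFFFFFFFFFFFE520),
    (0x000000023284D110, 0xFFFFFFFFFFFFF550),
    (0x000000023284E020, 0xFFFFFFFFFFFFE528),
    (0x000000023284E020, 0xFFFFFFFFFFFFE020),
    (0x000000023284F000, 0xFFFFFFFFFFFFF040),
]


def bounds(addr, mask):
    """Lowest and highest address that an (addr, mask) pair can match."""
    low = addr & mask                 # every don't-care bit set to 0
    high = low | (~mask & MASK64)     # every don't-care bit set to 1
    return low, high


def covering_pair(pairs):
    """Smallest aligned power-of-two block that contains every pattern."""
    lo = min(bounds(a, m)[0] for a, m in pairs)
    hi = max(bounds(a, m)[1] for a, m in pairs)
    k = (lo ^ hi).bit_length()        # number of low bits in which lo and hi differ
    mask = MASK64 & ~((1 << k) - 1)   # keep only the bits shared by lo and hi
    return lo & mask, mask


addr, mask = covering_pair(PAIRS)
print(f"{addr:#018x},{mask:#018x}")
# prints 0x0000000232848000,0xffffffffffff8000
```

Under that assumption, the single pair marks the 32 KiB block from 0x232848000 to 0x23284ffff, which is a superset of everything the ten original pairs match.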

Thanks!

qUneT (1 rep)

Nov 22, 2024, 10:44 PM • Last activity: Nov 22, 2024, 10:47 PM

3 votes

2 answers

1064 views

GRUB hangs itself with 64bit Memtest86+ BadRAM pattern?

linux-kernel grub2 uefi kernel-parameters badram

When I add the "**badram**" pattern that **64bit Memtest86+** v6.10/v6.20 gave me, **GRUB 2** hangs completely on boot.

Q:

* Why is the badram pattern address different from the "Error Address" displayed (0x0ac... vs 0x62c...)? What is the reason for this apparent offset?
* Why does GRUB hang on passing a 64bit badram pattern?

----

This is my GRUB...

    # grub-mkimage --version
    grub-mkimage (GRUB) 2.06-3~deb11u5

sleeping on the job...

Beyond the "**Welcome to GRUB!**" message, nothing. No reboot, no reaction to key inputs, no rescue shell. System completely "*bricked*" - I had to build a rescue USB UEFI boot stick to recover from this. (Btw, no secure-boot hardware, no signed grub install, so no excuse.)

Anyway. I don't have too much knowledge about system memory and really can't tell much from Memtest's hex numbers. But I don't think I can just trim the leading zeros and pass these like 32-bit numbers to GRUB, or can I? ...Some person on reddit seems to have done just that, but, like me, couldn't afterwards verify whether those numbers actually worked as expected and masked out the correct memory regions.

Why might GRUB crap itself on this? Is this a bad mem region to mask? Is the region too small, should it be a certain size (like 4K page size) or alignment? Is GRUB badram just broken, perhaps? Or is the hardware? (I don't think so, but you never know with these ACPI tables, right?)

In any case, I dug up quite a few instances of other people reporting the same problem with GRUB + 64bit addresses (clearly my GRUB is not the only lazy worker out there):

> Upon issuing this command (either via grub.cfg or interactively on the command line) my system hangs and becomes unresponsive.
>
> badram 0x000000008c4e0800,0xffffffffffffcfe0

(They got no response from GRUB devs)

> GRUB_BADRAM="0x00000000b3a9feec,0xfffffffffffffffc"
>
> And after that change, I don't even get to Grub boot screen. When it's supposed to show up, computer just hangs and shows the black screen.

(They didn't manage to fix it, either)

> I did all that, but the Computer that is perfectly fine and has no errors refused to boot after that GRUB_BADRAM= line addition. it never boots and gives no menu at al.

(The GRUB badram argument failed on two different computers for them ...)

...

I can't tell if there might be any relation between these patterns that makes them bad, or if GRUB badram just plain doesn't work with 64bit addresses, since I couldn't find any *positive* "works for me" reports. (Those all boiled down to people using Linux memmap= format or Linux memtest= kernel parameters, instead.)

Finally, I found one more person who seems to have had success with badram... using 32bit address notations (on a 64bit machine)? So I'm going to try that next.
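
To make the "4K page size" question concrete, the sketch below looks at the two patterns quoted above and shows how each could be widened to whole 4 KiB pages. It assumes the usual BadRAM convention that zero bits in the mask are "don't care" bits. `granularity` and `widen_to_page` are hypothetical helper names, and nothing in the thread establishes that sub-page granularity is what makes GRUB hang, so this is only a way to produce a page-aligned pattern to experiment with.

```python
MASK64 = (1 << 64) - 1
PAGE = 4096  # assumed page size in bytes


def granularity(mask):
    """Lowest set bit of the mask: the smallest unit the pattern distinguishes."""
    return mask & -mask


def widen_to_page(addr, mask):
    """Clear the low 12 bits of address and mask so whole pages are matched.

    The widened pattern matches every address the original matched, plus
    the rest of each affected page.
    """
    page_mask = MASK64 & ~(PAGE - 1)
    return addr & page_mask, mask & page_mask


# The two patterns quoted from other people's reports.
QUOTED = [
    (0x000000008C4E0800, 0xFFFFFFFFFFFFCFE0),
    (0x00000000B3A9FEEC, 0xFFFFFFFFFFFFFFFC),
]

for addr, mask in QUOTED:
    wide_addr, wide_mask = widen_to_page(addr, mask)
    print(f"granularity {granularity(mask)} bytes -> "
          f"{wide_addr:#018x},{wide_mask:#018x}")
```

Both quoted patterns distinguish units far smaller than a page (32 bytes and 4 bytes), which is at least consistent with the suspicion raised in the question.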

nyov (215 rep)

May 17, 2023, 01:18 PM • Last activity: Jul 29, 2024, 07:32 AM

2 votes

0 answers

193 views

Grub BADRAM doesn't blacklist special memory range

debian grub memory badram

I have a laptop with damaged integrated RAM, so I ran memtest86+ to find the affected regions and mark them as unusable. The issue came when modifying the GRUB configuration. I set the GRUB_BADRAM option as

    GRUB_BADRAM="0x086580000,0xffff80000,0x0bb1c0000,0xffff00000,0x0bfa40000,0xffffc0000"
Checking /proc/iomem, I could see that no blacklist was being applied, as it showed the line

    5f000000-dfffffff : Reserved
I decided to compare the iomem file with no blacklist against the one with the GRUB_BADRAM option set, and the only difference was the address where the kernel code was stored, which, as far as I am aware, is chosen at random on every boot, so there was no effective difference at all. I was also still experiencing the same memory-related troubles as before, where some programs would crash with a SIGSEGV, even my desktop environment or GCC.

Thinking I had done something wrong, I shifted the memory addresses up by 4 GB (starting at address 0x186580000 and so on), and to my surprise it did block them, as I could see

    186580000-1865fffff : Unusable memory
    186600000-1bb0fffff : System RAM
    1bb100000-1bb1fffff : Unusable memory
    1bb200000-1bfa3ffff : System RAM
    1bfa40000-1bfa7ffff : Unusable memory
    1bfa80000-41f33ffff : System RAM

So this time the change was being applied. 
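
The "Unusable memory" lines above can be reproduced from the patterns with a few lines of arithmetic. This is a rough sketch with two assumptions: the masks were left unchanged when the addresses were shifted up by 4 GB (the question only gives the first shifted address), and each mask is a run of ones above its lowest set bit, so the lowest set bit is the block size. `block_around` is a hypothetical helper name.

```python
def block_around(addr, mask):
    """Aligned block containing addr, sized by the lowest set bit of mask."""
    size = mask & -mask            # e.g. 0xffff80000 -> 0x80000 (512 KiB)
    base = addr & ~(size - 1)      # round addr down to a multiple of size
    return base, base + size - 1


SHIFT = 0x100000000  # 4 GiB, the offset described in the question

# (address, mask) pairs from the GRUB_BADRAM line in the question.
ORIGINAL = [
    (0x086580000, 0xFFFF80000),
    (0x0BB1C0000, 0xFFFF00000),
    (0x0BFA40000, 0xFFFFC0000),
]

RANGES = []
for addr, mask in ORIGINAL:
    lo, hi = block_around(addr + SHIFT, mask)
    RANGES.append(f"{lo:x}-{hi:x}")
    print(f"{lo:x}-{hi:x} : Unusable memory")
```

The three computed ranges, 186580000-1865fffff, 1bb100000-1bb1fffff and 1bfa40000-1bfa7ffff, match the /proc/iomem output quoted above, so at least for the shifted addresses the patterns were interpreted this way.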

All of this makes me wonder: was the first blacklist applied at all? Was iomem just not showing the blacklisted range due to some hierarchy? Or is there some range that cannot be affected by GRUB_BADRAM?

Gerard Jensen Olmos (21 rep)

Jun 8, 2024, 07:00 PM

0 votes

1 answer

1227 views

What does the "segfault at X" kernel log message mean if X is very large?

kernel memory segmentation-fault badram

I've got a device with bad RAM. Running memtest overnight shows all faulting addresses to be in the 0x7d0000000 - 0x7f0000000 range. I plan to replace the RAM, but until then, I've disabled a 2GB chunk around it with memmap=:

    # cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinuz-6.5.0-25-generic root=UUID=5277c53f-b2cd-4301-8fdf-0b2119430870 ro memmap=2G$0x0000000790000000 quiet splash vt.handoff=7

Those cmdline options do seem to be acknowledged by the kernel:

    [    0.000000] user-defined physical RAM map:
    [    0.000000] user: [mem 0x0000000000000000-0x000000000009efff] usable
    [    0.000000] user: [mem 0x000000000009f000-0x00000000000fffff] reserved
    [    0.000000] user: [mem 0x0000000000100000-0x0000000019e6a017] usable
    [    0.000000] user: [mem 0x0000000019e6a018-0x0000000019e7ae57] usable
    [    0.000000] user: [mem 0x0000000019e7ae58-0x000000002cb82fff] usable
    [    0.000000] user: [mem 0x000000002cb83000-0x000000002ed2ffff] reserved
    [    0.000000] user: [mem 0x000000002ed30000-0x000000002edacfff] ACPI data
    [    0.000000] user: [mem 0x000000002edad000-0x000000002f29bfff] ACPI NVS
    [    0.000000] user: [mem 0x000000002f29c000-0x000000002fd0efff] reserved
    [    0.000000] user: [mem 0x000000002fd0f000-0x000000002fd0ffff] usable
    [    0.000000] user: [mem 0x000000002fd10000-0x000000003cffffff] reserved
    [    0.000000] user: [mem 0x00000000e0000000-0x00000000efffffff] reserved
    [    0.000000] user: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
    [    0.000000] user: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
    [    0.000000] user: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
    [    0.000000] user: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
    [    0.000000] user: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
    [    0.000000] user: [mem 0x0000000100000000-0x000000078fffffff] usable
    [    0.000000] user: [mem 0x0000000790000000-0x000000080fffffff] reserved
    [    0.000000] user: [mem 0x0000000810000000-0x00000008beffffff] usable
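
The reserved line in this map can be cross-checked against the `memmap=` argument with a short calculation. The parser below is a deliberately simplified sketch: it only handles the `size$start` form with a decimal size and an optional K/M/G suffix, and `memmap_range` is a hypothetical helper name, not a kernel or GRUB interface.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def memmap_range(arg):
    """Return (first, last) physical address reserved by 'memmap=SIZE$START'."""
    m = re.fullmatch(r"memmap=(\d+)([KMG]?)\$(0x[0-9a-fA-F]+)", arg)
    if m is None:
        raise ValueError(f"unsupported memmap argument: {arg!r}")
    size = int(m.group(1)) * UNITS[m.group(2)]
    start = int(m.group(3), 16)
    return start, start + size - 1


lo, hi = memmap_range("memmap=2G$0x0000000790000000")
print(f"[mem {lo:#018x}-{hi:#018x}] reserved")
# prints [mem 0x0000000790000000-0x000000080fffffff] reserved

# The faulty range reported by memtest in the question.
bad_lo, bad_hi = 0x7D0000000, 0x7F0000000
print(lo <= bad_lo and bad_hi <= hi)
# prints True
```

So the 2 GB reservation does line up with the `reserved` entry in the log and fully contains the physical range that memtest flagged.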

However, I still get segfaults, ostensibly in the reserved address range:

    Mar 09 20:47:40 srv0 kernel: udisksd: segfault at 7fe974786218 ip 00007fe974786218 sp 00007ffcd10d1848 error 7 in libbd_swap.so.3.0.0[7fe974785000+2000] likely on CPU 7 (core 3, socket 0)

According to this page, I should interpret that as udisksd trying to write to the reserved address 0x7fe974786218 (error 7). At first glance, the 0x7f address seems to match up with what memtest found to be bad RAM, but it is off by orders of magnitude, since it corresponds to about 140 TB. My machine has 32 GB. What, if not a memory address, does the "segfault at X" value represent?
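
The numbers in the log line can be checked with a little arithmetic. The sketch below reads the bracketed `[7fe974785000+2000]` as a start address plus a length for the `libbd_swap.so.3.0.0` mapping; that reading is an assumption about the log format, not something stated in the question.

```python
fault = 0x7FE974786218          # "segfault at" value from the log line
map_base = 0x7FE974785000       # start of the bracketed range
map_len = 0x2000                # length of the bracketed range

# Size of the number if it were a byte offset into physical RAM.
print(f"{fault / 1e12:.1f} TB")
# prints 140.6 TB

# Does the value fall inside the range printed for the library?
in_mapping = map_base <= fault < map_base + map_len
print(in_mapping)
# prints True

# Compare with the physical range reserved via memmap= in the question.
reserved_lo, reserved_hi = 0x790000000, 0x80FFFFFFF
print(reserved_lo <= fault <= reserved_hi)
# prints False
```

Under that reading, the value sits inside the range the kernel printed for the library and far outside the reserved physical range, so the two "0x7f" prefixes agreeing looks like a coincidence of leading digits.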

thariqfahry (113 rep)

Mar 26, 2024, 09:53 PM • Last activity: Mar 27, 2024, 10:52 AM

1 votes

1 answer

184 views

BadRAM Range: cannot set up the right range

grub2 ram memtest badram

I think my MacBook with soldered RAM has a RAM issue. With memtest86+, I figured out which BadRAM pattern I have, but I cannot interpret the result correctly. How should I read the range to set up the right exclusion in GRUB? Here are my memtest results:

    BadRAM Patterns
    ---------------
    badram=0x0000000058cb4000,0xfffffffffffffc00,
           0x0000000058cb4400,0xfffffffffffffc00,
           0x0000000058cb4800,0xfffffffffffffc00,
           0x0000000058cb4c00,0xfffffffffffffc00,
           0x0000000058cb5000,0xfffffffffffff800,
           0x0000000058cb5800,0xfffffffffffff800,
           0x0000000058cb6000,0xfffffffffffff800,
           0x0000000058cb6800,0xfffffffffffff800,
           0x0000000058cb7000,0xfffffffffffff800,
           0x0000000058cb7800,0xfffffffffffff800

Would memmap=64K$0x58cb0000 be correct?
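
Whether the proposed `memmap=64K$0x58cb0000` spans the reported patterns can be checked with a short script. It assumes the usual BadRAM convention that zero bits in the mask are "don't care" bits, so each pair covers the addresses from `addr & mask` up to that value with all don't-care bits set. `bounds` is a hypothetical helper name, and the check only concerns the address arithmetic, not how GRUB or the kernel handle the parameter.

```python
MASK64 = (1 << 64) - 1

# (address, mask) pairs from the Memtest86+ output above.
PATTERNS = [
    (0x0000000058CB4000, 0xFFFFFFFFFFFFFC00),
    (0x0000000058CB4400, 0xFFFFFFFFFFFFFC00),
    (0x0000000058CB4800, 0xFFFFFFFFFFFFFC00),
    (0x0000000058CB4C00, 0xFFFFFFFFFFFFFC00),
    (0x0000000058CB5000, 0xFFFFFFFFFFFFF800),
    (0x0000000058CB5800, 0xFFFFFFFFFFFFF800),
    (0x0000000058CB6000, 0xFFFFFFFFFFFFF800),
    (0x0000000058CB6800, 0xFFFFFFFFFFFFF800),
    (0x0000000058CB7000, 0xFFFFFFFFFFFFF800),
    (0x0000000058CB7800, 0xFFFFFFFFFFFFF800),
]


def bounds(addr, mask):
    """Lowest and highest address that an (addr, mask) pair can match."""
    low = addr & mask
    return low, low | (~mask & MASK64)


lo = min(bounds(a, m)[0] for a, m in PATTERNS)
hi = max(bounds(a, m)[1] for a, m in PATTERNS)
print(f"{lo:#x}-{hi:#x}")
# prints 0x58cb4000-0x58cb7fff

# Range reserved by the proposed memmap=64K$0x58cb0000.
start, size = 0x58CB0000, 64 * 1024
covered = start <= lo and hi <= start + size - 1
print(covered)
# prints True
```

The ten patterns together describe one contiguous 16 KiB block, and the proposed 64 KiB reservation contains it with room to spare on both sides.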

devreklim (13 rep)

Mar 18, 2024, 07:43 AM • Last activity: Mar 18, 2024, 05:00 PM

3 votes

1 answer

1936 views

User friendly way to apply BadRAM patterns

linux grub memory memtest badram

My Linux machine is having issues with faulty RAM. I ran PCMemTest-64, and I determined the following patterns:

[Image: https://i.sstatic.net/94GSG.jpg]

Now, I have stock Ubuntu, which doesn't seem to have the BadRAM patch, and I'm a bit nervous about compiling Linux from scratch. So I'm wondering if there is an easy way to disable these faulty RAM addresses using existing tools in GRUB and Linux, for example using the memmap kernel parameter. I'm happy to lose a bit of RAM beyond the faulty addresses (ideally on the order of kilobytes rather than gigabytes) to take this shortcut.

Some versions:
* Linux 5.19.0-32
* Grub 2.06
* Ubuntu 22.04

What should I do?

Migwell (477 rep)

Mar 24, 2023, 10:38 AM • Last activity: Mar 24, 2023, 11:36 AM

Showing page 1 of 6 total questions