Bug 212917
Summary: | Unable to handle kernel NULL pointer dereference at virtual address | ||
---|---|---|---|
Product: | Memory Management | Reporter: | rudi |
Component: | Other | Assignee: | Andrew Morton (akpm) |
Status: | NEW --- | ||
Severity: | normal | CC: | geraldogabriel, rudi, stefan.bruens |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.12 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Simple patch to downclock affected Rock Pi N10 units |
Description
rudi
2021-05-01 11:12:36 UTC
The OS itself is a patched Ubuntu 20.04.2.

I see the same, but also all other kinds of memory errors, like userspace segfaults, "stack smashing" faults, etc. When I run with the kernel command line parameter maxcpus=4, i.e. disabling the two A72 cores, the system runs fine even under stress. Offlining the two A72 cores via sysfs has the same effect; enabling the cores again makes the system unreliable. This is completely repeatable.

I believe that this has been addressed in 5.13. I read the patch somewhere, about adjusting 1 to 2 or 2 to 1… I just can't find the reference at the moment.

Closing the bug.

I believe this has contributed to the fix
- https://lore.kernel.org/lkml/20210511211335.2935163-1-pgwipeout@gmail.com/
Along with the DTS
- https://lore.kernel.org/lkml/20210527105943.GA441@7698f5da3a10/
- https://lore.kernel.org/lkml/20210527122911.GA1640@7698f5da3a10/

I also own a Rock Pi N10 and, like Stefan Brüns, I have been disabling the A72 cores since forever. Lately, however, I have still been having crashes under load. This is still very much under test, but I downclocked the SBC through a change to rk3399pro-rock-pi-n10.dts:

-#include "rk3399-opp.dtsi"
+#include "rk3399-t-opp.dtsi"

This enforces a 1512 MHz clock on the big cores and a 1008 MHz clock on the little cores. With this DTB patch I can run all six cores apparently smoothly. There is a performance penalty, but it seems it won't crash randomly anymore. Still in test, happy to continue reporting.

Created attachment 306469
Simple patch to downclock affected Rock Pi N10 units
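
For anyone testing the downclock patch, it may help to confirm that the booted DTB really applies the lower ceilings quoted above (1512 MHz big, 1008 MHz little). The following is only a minimal sketch: it reads the standard cpufreq file cpuinfo_max_freq (reported in kHz) and assumes that cpu0-3 are the A53 cores and cpu4-5 the A72 cores, as on RK3399. The function names and the sysfs_root parameter are hypothetical helpers, not part of the attached patch.

```python
from pathlib import Path

# Clock ceilings quoted in the thread, in kHz as cpufreq reports them.
EXPECTED_MAX_KHZ = {"little": 1_008_000, "big": 1_512_000}

# Assumption: cpu0-3 are the A53 (little) cores, cpu4-5 the A72 (big) cores.
CLUSTER_CPUS = {"little": range(0, 4), "big": range(4, 6)}


def read_max_khz(cpu, sysfs_root="/sys"):
    """Return cpuinfo_max_freq for one CPU in kHz, or None if it cannot be read."""
    path = (Path(sysfs_root) / "devices/system/cpu" / f"cpu{cpu}"
            / "cpufreq/cpuinfo_max_freq")
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # file missing or unreadable, e.g. no cpufreq for this CPU


def check_downclock(sysfs_root="/sys"):
    """Map each cluster to True/False, or None if no CPU in it was readable."""
    result = {}
    for cluster, cpus in CLUSTER_CPUS.items():
        seen = [f for f in (read_max_khz(c, sysfs_root) for c in cpus)
                if f is not None]
        limit = EXPECTED_MAX_KHZ[cluster]
        result[cluster] = all(f <= limit for f in seen) if seen else None
    return result


if __name__ == "__main__":
    print(check_downclock())
```

A False entry would suggest that the cluster still uses the stock OPP table, i.e. the patched DTB was probably not the one that booted.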
(In reply to rudi from comment #4)
> I believe this has contributed to the fix
> - https://lore.kernel.org/lkml/20210511211335.2935163-1-pgwipeout@gmail.com/
> Along with the DTS
> - https://lore.kernel.org/lkml/20210527105943.GA441@7698f5da3a10/
> - https://lore.kernel.org/lkml/20210527122911.GA1640@7698f5da3a10/

Rudi, by the way, I'm running the Rock Pi N10 with an active cooling fan. I was aware that it would crash once it hit the 80 °C cpufreq threshold, so I run with the "performance" CPU governor and active cooling. And I still see the crashes on Linux 6.10-rc4. The only thing that helped was downclocking the CPU to RK3399-T frequencies: no more crashes so far.

By the way, I'm on Gentoo on this box, so it's easy to crash it with "emerge -av --emptytree @world", which triggers a recompilation of all packages. Even with distcc helping, it used to crash. That was with the two big A72 cores turned off; with them on, it would happen exactly as Stefan Brüns described: stack smashing detected, segfaults, etc. Still, it would crash with only the little cores under high load, sometimes in a few hours, sometimes after more than a day.

Now, with the CPU downclocked, the SBC seems to be running fine under high load with all six cores, big and little, online.
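
For comparison runs with the big cores off and on, the sysfs hotplug files can be toggled from a script. This is a rough sketch, not a recommended procedure: it writes to the standard /sys/devices/system/cpu/cpuN/online files, and it assumes that cpu4 and cpu5 are the two A72 cores (consistent with maxcpus=4 disabling them). The helper names and the sysfs_root parameter are made up for illustration, and writing to the real files requires root.

```python
from pathlib import Path

# Assumption: cpu4 and cpu5 are the two A72 (big) cores on this SoC.
BIG_CORES = (4, 5)


def online_path(cpu, sysfs_root="/sys"):
    """Path of the hotplug control file for one CPU."""
    return Path(sysfs_root) / "devices/system/cpu" / f"cpu{cpu}" / "online"


def set_cores_online(cpus, online, sysfs_root="/sys"):
    """Write 1 (online) or 0 (offline) to each CPU's control file."""
    for cpu in cpus:
        online_path(cpu, sysfs_root).write_text("1" if online else "0")


def cores_online(cpus, sysfs_root="/sys"):
    """Return {cpu: True/False} as currently reported by sysfs."""
    return {
        cpu: online_path(cpu, sysfs_root).read_text().strip() == "1"
        for cpu in cpus
    }


if __name__ == "__main__":
    set_cores_online(BIG_CORES, online=False)  # take the A72 cores offline
    print(cores_online(BIG_CORES))
```

Calling set_cores_online(BIG_CORES, online=True) afterwards brings the cores back without a reboot, which makes it easier to repeat the same stress run in both configurations.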