Bug 212917 - Unable to handle kernel NULL pointer dereference at virtual address
Summary: Unable to handle kernel NULL pointer dereference at virtual address
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-01 11:12 UTC by rudi
Modified: 2021-06-27 23:58 UTC
CC List: 2 users

See Also:
Kernel Version: 5.12
Subsystem:
Regression: No
Bisected commit-id:


Description rudi 2021-05-01 11:12:36 UTC
On a Radxa ROCK Pi N10 running current U-Boot 2021.04 and kernel 5.12 (mainline config and dtbs), recompiling the kernel with "make -j6 Image dtbs" failed with this kernel error:

[ 7422.215547] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 7422.216335] Mem abort info:
[ 7422.216583]   ESR = 0x86000006
[ 7422.216855]   EC = 0x21: IABT (current EL), IL = 32 bits
[ 7422.217322]   SET = 0, FnV = 0
[ 7422.217593]   EA = 0, S1PTW = 0
[ 7422.217870] user pgtable: 4k pages, 39-bit VAs, pgdp=000000000f3f3000
[ 7422.218438] [0000000000000000] pgd=0000000006bac003, p4d=0000000006bac003, pud=0000000006bac003, pmd=0000000000000000
[ 7422.219375] Internal error: Oops: 86000006 [#1] SMP
[ 7422.219808] Modules linked in: autofs4
[ 7422.220146] CPU: 4 PID: 9677 Comm: cc1 Not tainted 5.12.0 #1
[ 7422.220645] Hardware name: Radxa ROCK Pi N10 (DT)
[ 7422.221061] pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
[ 7422.221590] pc : 0x0
[ 7422.221789] lr : __mod_lruvec_state+0x30/0x5c
[ 7422.222182] sp : ffffffc0165ab970
[ 7422.222475] x29: ffffffc0165ab970 x28: 0000000000001000 
[ 7422.222946] x27: ffffff800e561200 x26: ffffff80139638c0 
[ 7422.223416] x25: fffffffe00e20380 x24: 0000000000045a8e 
[ 7422.223886] x23: 0000000000000008 x22: 00000000ffffffff 
[ 7422.224358] x21: 0000000000000014 x20: ffffffffffffffff 
[ 7422.224828] x19: ffffff800c865c00 x18: 0000000000000000 
[ 7422.225298] x17: 0000000000000000 x16: 0000000000000000 
[ 7422.225768] x15: 0000000000000000 x14: 0000000000001000 
[ 7422.226238] x13: 0000000000000040 x12: 0000000000000000 
[ 7422.226708] x11: 000000003880d000 x10: ffffff8013ac6000 
[ 7422.227178] x9 : ffffffc0101a3678 x8 : 0000000000000300 
[ 7422.227648] x7 : 0000000000004000 x6 : 00000000000000d3 
[ 7422.228118] x5 : 000000003880c000 x4 : 0000000000000000 
[ 7422.228587] x3 : 0000000000002015 x2 : ffffffc0113543f0 
[ 7422.229057] x1 : 0000000000000024 x0 : ffffffc0e6463000 
[ 7422.229528] Call trace:
[ 7422.229748]  0x0
[ 7422.229915]  __mod_lruvec_page_state+0x5c/0x60
[ 7422.230312]  mod_lruvec_page_state+0x34/0x4c
[ 7422.230694]  clear_page_dirty_for_io+0x80/0xc4
[ 7422.231089]  mpage_submit_page+0x38/0x90
[ 7422.231438]  mpage_map_and_submit_buffers+0x21c/0x268
[ 7422.231884]  ext4_writepages+0x65c/0x908
[ 7422.232234]  do_writepages+0x40/0x80
[ 7422.232551]  __filemap_fdatawrite_range+0x64/0x94
[ 7422.232969]  filemap_flush+0x20/0x28
[ 7422.233287]  ext4_alloc_da_blocks+0x68/0x7c
[ 7422.233658]  ext4_release_file+0x30/0xd0
[ 7422.234006]  __fput+0xe8/0x1f8
[ 7422.234281]  ____fput+0x14/0x1c
[ 7422.234563]  task_work_run+0x88/0xac
[ 7422.234883]  do_notify_resume+0x214/0x2c4
[ 7422.235240]  work_pending+0xc/0x1c0
[ 7422.235557] Code: bad PC value
[ 7422.235831] ---[ end trace a189587e728b7e44 ]---
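The "Mem abort info" fields in the oops can be cross-checked against the raw ESR value. The following is a minimal sketch, assuming the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0); the function name and the small fault-status table are illustrative and not taken from the kernel source.

```python
# Decode the fields the kernel prints under "Mem abort info" from a raw ESR.
# Bit positions follow the AArch64 ESR_ELx layout for instruction aborts.

# Illustrative subset of fault status codes (translation faults only).
TRANSLATION_FAULTS = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR value into the fields shown in the oops."""
    iss = esr & 0x1FFFFFF          # ISS: bits 24:0
    return {
        "EC": (esr >> 26) & 0x3F,  # exception class, bits 31:26
        "IL": (esr >> 25) & 0x1,   # 1 = 32-bit instruction
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,
        "EA": (iss >> 9) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "IFSC": iss & 0x3F,        # instruction fault status code
    }


fields = decode_esr(0x86000006)
print(f"EC=0x{fields['EC']:02x} IL={fields['IL']} IFSC=0x{fields['IFSC']:02x}")
print(TRANSLATION_FAULTS.get(fields["IFSC"], "other fault"))
```

For the value in this report the sketch yields EC = 0x21 (instruction abort at the current EL), IL = 1 and a level 2 translation fault. That is consistent with pc being 0x0 and the pmd entry being zero in the page table dump: the CPU tried to fetch an instruction from address 0.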
Comment 1 rudi 2021-05-01 11:14:06 UTC
The OS itself is a patched Ubuntu 20.04.2
Comment 2 Stefan Brüns 2021-05-26 16:14:08 UTC
I see the same, along with all kinds of other memory errors, such as userspace segfaults and "stack smashing" faults.

When I boot with the kernel command line parameter maxcpus=4, i.e. with the two A72 cores disabled, the system runs fine even under stress.

Offlining the two A72 cores via sysfs has the same effect; enabling the cores again makes the system unreliable. This is completely repeatable.
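
For anyone who wants to reproduce the sysfs test, here is a minimal sketch of toggling cores through the CPU hotplug interface. It assumes the two A72 cores are cpu4 and cpu5, which matches maxcpus=4 but is not stated explicitly in this report; the helper name is hypothetical, and writing to these files requires root.

```python
from pathlib import Path

# Standard sysfs location of the per-CPU hotplug controls.
SYSFS_CPU = Path("/sys/devices/system/cpu")

# Assumption: the two A72 cores are numbered 4 and 5 on this board.
A72_CPUS = (4, 5)


def set_cpus_online(cpus, online, base=SYSFS_CPU):
    """Write 1 (online) or 0 (offline) to each cpuN/online file."""
    value = "1" if online else "0"
    for cpu in cpus:
        (Path(base) / f"cpu{cpu}" / "online").write_text(value)


# Example (run as root on the affected board):
#   set_cpus_online(A72_CPUS, online=False)  # take the A72 cores offline
#   set_cpus_online(A72_CPUS, online=True)   # bring them back
```

The base parameter exists only so the helper can be pointed at a scratch directory for a dry run; on a real system the default path applies.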
Comment 3 rudi 2021-06-27 15:47:08 UTC
I believe this has been addressed in 5.13. I read the patch somewhere; it was about adjusting 1 to 2 or 2 to 1… I just can't find the reference at the moment. Closing the bug.
