Bug 215803

Summary: ppc64le(P9): BUG: Kernel NULL pointer dereference on read at 0x00000060 NIP: do_remove_conflicting_framebuffers+0x184/0x1d0
Product: Platform Specific/Hardware
Reporter: Zorro Lang (zlang)
Component: PPC-64
Assignee: platform_ppc-64
Status: CLOSED CODE_FIX
Severity: normal
CC: linux, michael
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.18-rc1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel .config file

Description Zorro Lang 2022-04-05 04:37:02 UTC
When I tested the latest Linux kernel, it panicked [1] as soon as I tried to boot it on ppc64le. I hit this several times on different ppc64le machines, each time with the same call trace. Because I have only hit this panic on ppc64le, I am reporting it against ppc64 to get more review.

The Linux kernel HEAD (nearly 5.18-rc1) is:

commit be2d3ecedd9911fbfd7e55cc9ceac5f8b79ae4cf
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sat Apr 2 12:57:17 2022 -0700

    Merge tag 'perf-tools-for-v5.18-2022-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux


[1]
[   18.785170] RPC: Registered named UNIX socket transport module. 
[   18.785214] RPC: Registered udp transport module. 
[   18.785235] RPC: Registered tcp transport module. 
[   18.785256] RPC: Registered tcp NFSv4.1 backchannel transport module. 
[  OK  ] Mounted RPC Pipe File System.
[  OK  ] Reached target rpc_pipefs.target.
[   18.830598] fb0: switching to ast from OFfb vga 
[   18.830646] BUG: Kernel NULL pointer dereference on read at 0x00000060 
[   18.830669] Faulting instruction address: 0xc0000000009fd974 
[   18.830692] Oops: Kernel access of bad area, sig: 7 [#1] 
[   18.830712] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV 
[   18.830734] Modules linked in: ast(+) i2c_algo_bit sunrpc drm_vram_helper drm_ttm_helper ttm drm_kms_helper fb_sys_fops syscopyarea sysfillrect ofpart sysimgblt ses enclosure powernv_flash ipmi_powernv at24 ipmi_devintf ext4 opal_prd mtd scsi_transport_sas ibmpowernv regmap_i2c ipmi_msghandler mbcache jbd2 drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg i40e vmx_crypto aacraid 
[   18.830875] CPU: 0 PID: 963 Comm: kworker/0:2 Not tainted 5.17.0+ #1 
[   18.830906] Workqueue: events work_for_cpu_fn 
[   18.830930] NIP:  c0000000009fd974 LR: c0000000009fd96c CTR: 0000000000000000 
[   18.830961] REGS: c0000001156db740 TRAP: 0300   Not tainted  (5.17.0+) 
[   18.830981] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48028222  XER: 00000000 
[   18.831022] CFAR: c00000000022a9ec DAR: 0000000000000060 DSISR: 00080000 IRQMASK: 0  
[   18.831022] GPR00: c0000000009fd96c c0000001156db9e0 c000000002d06200 0000000000000023  
[   18.831022] GPR04: 0000000000000000 c0000001156db730 c0000001156db728 0000000000000000  
[   18.831022] GPR08: 0000000000000027 c000000002be6200 c000000115751000 0000000000000001  
[   18.831022] GPR12: 0000001ff1900000 c000000005120000 c000000000194608 c00020001119b000  
[   18.831022] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
[   18.831022] GPR20: 0000000000000000 0000000000000000 c00000000144c560 c00000000144c588  
[   18.831022] GPR24: 0000000000000001 00000000000a0000 c0080000166c1200 c00000010f0dbf60  
[   18.831022] GPR28: c000000002d65038 c00020001ecc0380 0000000000000000 c000000002d64f40  
[   18.831234] NIP [c0000000009fd974] do_remove_conflicting_framebuffers+0x184/0x1d0 
[   18.831267] LR [c0000000009fd96c] do_remove_conflicting_framebuffers+0x17c/0x1d0 
[   18.831299] Call Trace: 
[   18.831314] [c0000001156db9e0] [c0000000009fd96c] do_remove_conflicting_framebuffers+0x17c/0x1d0 (unreliable) 
[   18.831351] [c0000001156dbab0] [c0000000009fdf34] remove_conflicting_framebuffers+0x64/0x160 
[   18.831385] [c0000001156dbb00] [c008000014ed05a8] drm_aperture_remove_conflicting_framebuffers+0x80/0xf0 [drm] 
[   18.831439] [c0000001156dbb50] [c0080000166b0238] ast_pci_probe+0x60/0x130 [ast] 
[   18.831474] [c0000001156dbb90] [c0000000009b39c8] local_pci_probe+0x68/0x110 
[   18.831508] [c0000001156dbc10] [c00000000017f038] work_for_cpu_fn+0x38/0x60 
[   18.831540] [c0000001156dbc40] [c000000000185608] process_one_work+0x348/0x850 
[   18.831574] [c0000001156dbd30] [c000000000185d70] worker_thread+0x260/0x500 
[   18.831605] [c0000001156dbdc0] [c000000000194748] kthread+0x148/0x150 
[   18.831627] [c0000001156dbe10] [c00000000000cbf4] ret_from_kernel_thread+0x5c/0x64 
[   18.831661] Instruction dump: 
[   18.831679] 7d710120 7d708120 4e800020 e8df0000 7fc407b4 7f45d378 7ec3b378 f8810068  
[   18.831716] 38c601f0 4b82d03d 60000000 3d22ffee <e9550060> 3929ee90 e8810068 7c2a4800  
[   18.831755] ---[ end trace 0000000000000000 ]--- 
[   18.958634]  
[   18.958701] kworker/0:2 (963) used greatest stack depth: 7056 bytes left 
[  OK  ] Started Security Auditing Service.
         Starting Record System Boot/Shutdown in UTMP
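
A note on reading the oops above: the bracketed word in the instruction dump, <e9550060>, marks the faulting instruction. The following is a minimal sketch of decoding it by hand, assuming the standard Power ISA DS-form layout (primary opcode 58, with the low two bits selecting ld/ldu/lwa). The helper name decode_ds_form is made up for illustration and is not part of any kernel tool.

```python
def decode_ds_form(word: int) -> dict:
    """Decode a 32-bit PowerPC DS-form load (primary opcode 58).

    Illustrative helper only; it handles ld/ldu/lwa and nothing else.
    """
    opcode = (word >> 26) & 0x3F   # top 6 bits: primary opcode
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base register
    xo = word & 0x3                # low 2 bits: extended opcode
    disp = word & 0xFFFC           # DS field, already scaled by 4
    if disp & 0x8000:              # sign-extend the 16-bit displacement
        disp -= 0x10000
    if opcode == 58:
        mnemonic = {0: "ld", 1: "ldu", 2: "lwa"}.get(xo, "?")
    else:
        mnemonic = "?"
    return {"mnemonic": mnemonic, "rt": rt, "ra": ra, "disp": disp}


# The faulting word from the "Instruction dump" line of the oops.
insn = decode_ds_form(0xE9550060)
print(f"{insn['mnemonic']} r{insn['rt']}, {insn['disp']:#x}(r{insn['ra']})")
```

This prints "ld r10, 0x60(r21)". GPR21 is zero in the register dump, so the load address is 0x60, which matches the DAR value and the "read at 0x00000060" in the BUG line: the code loaded a field at offset 0x60 through a NULL pointer.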
Comment 1 Zorro Lang 2022-04-05 04:37:40 UTC
Created attachment 300697
kernel .config file
Comment 2 Daniel Kolesa 2022-04-10 22:45:37 UTC
This now also hits 5.15.33. I noticed it when virtio-gpu failed to come up.

Commit: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/video/fbdev/core?h=linux-5.15.y&id=c894ac44786cfed383a6c6b20c1bfb12eb96018a

More detailed backtrace: https://gist.github.com/q66/6ffc1bd18cf241e6ad894dc4409a2f72

This is also on a ppc64le system. However, I think this bug may not be ppc64-specific...
Comment 3 Daniel Kolesa 2022-04-10 22:46:48 UTC
It does not panic in my case, though; I merely get stuck with the offb framebuffer console instead of it switching modes to the right thing.
Comment 4 Daniel Kolesa 2022-04-10 22:47:38 UTC
Also, just to be clear, reverting the commit I linked above does fix the problem for me. Here is a patch you can quickly test: https://gist.github.com/q66/da01b4baecfdc24cd8fa3253d4e7f05a
Comment 5 Michael Ellerman 2022-04-12 07:07:45 UTC
This was reported to the patch author here:

  https://lore.kernel.org/all/YkHXO6LGHAN0p1pq@debian/

And there is a fix here:

  https://patchwork.freedesktop.org/patch/480648/
Comment 6 Michael Ellerman 2022-05-23 23:53:04 UTC
The fix was merged into v5.18-rc2 as:

https://git.kernel.org/torvalds/c/0f525289ff0ddeb380813bd81e0f9bdaaa1c9078