Bug 215443

Summary: [radeon] BUG: Unable to handle kernel data access on read at 0xc007ffffffff9130, Oops: Kernel access of bad area, sig: 11 [#1]
Product: Platform Specific/Hardware
Reporter: Erhard F. (erhard_f)
Component: PPC-64
Assignee: platform_ppc-64
Status: RESOLVED OBSOLETE
Severity: normal
CC: alexdeucher, dri-devel
Priority: P1
Hardware: PPC-64
OS: Linux
Kernel Version: 5.16-rc7
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel dmesg (kernel 5.16-rc7, Talos II)
             kernel .config (kernel 5.16-rc7, Talos II)

Description Erhard F. 2022-01-01 14:35:55 UTC
Created attachment 300196
kernel dmesg (kernel 5.16-rc7, Talos II)

[...]
BUG: Unable to handle kernel data access on read at 0xc007ffffffff9130
Faulting instruction address: 0xc0080000076a1bb4
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=192 DEBUG_PAGEALLOC NUMA PowerNV
Modules linked in: rfkill evdev ecb radeon(+) snd_hda_codec_hdmi xts drm_ttm_helper ttm snd_hda_intel snd_intel_dspcfg ctr i2c_algo_bit snd_hda_codec snd_hwdep xhci_pci cbc ofpart snd_hda_core powernv_flash aes_generic xhci_hcd libaes mtd ibmpowernv snd_pcm vmx_crypto gf128mul opal_prd hwmon snd_timer drm_kms_helper usbcore at24 regmap_i2c sysimgblt usb_common syscopyarea snd sysfillrect fb_sys_fops soundcore lz4 lz4_compress lz4_decompress zram zsmalloc powernv_cpufreq drm fuse drm_panel_orientation_quirks backlight configfs
CPU: 0 PID: 281 Comm: kworker/0:3 Not tainted 5.16.0-rc7-TalosII+ #1
Workqueue: events .work_for_cpu_fn
NIP:  c0080000076a1bb4 LR: c008000008994dd8 CTR: c0080000076a1b80
REGS: c000000011ede950 TRAP: 0300   Not tainted  (5.16.0-rc7-TalosII+)
MSR:  9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 84048242  XER: 00000006
CFAR: c0080000089bffe8 DAR: c007ffffffff9130 DSISR: 40000000 IRQMASK: 0 
GPR00: c008000008994dd8 c000000011edebf0 c008000000000000 0000000000000001 
GPR04: c0080000076dfe28 0000000000000038 0000000000000002 0000000000024414 
GPR08: 0000000000000001 c008000000000000 c00000002f3e9878 c0080000089bffd0 
GPR12: c0080000076a1b80 c0000000033e1000 c000000000132940 0000000000000000 
GPR16: 0000000000000000 c00000002f406918 c00800001d9968e0 0000000000000000 
GPR20: 0000000000000043 0000000000000000 0000000000000001 c00000002f406800 
GPR24: 0000000000000001 c00000000efac000 c000000021a6f2c0 c00000000efac848 
GPR28: c000000021a6f980 0000000000000000 0000000000000001 c0000000167a9050 
NIP [c0080000076a1bb4] .drm_mode_object_get+0x34/0xc0 [drm]
LR [c008000008994dd8] .drm_crtc_helper_set_config+0x378/0xc00 [drm_kms_helper]
Call Trace:
[c000000011edebf0] [0000000000000898] 0x898 (unreliable)
[c000000011edec70] [c008000008994dd8] .drm_crtc_helper_set_config+0x378/0xc00 [drm_kms_helper]
[c000000011ededb0] [c00800001d7d29c8] .radeon_crtc_set_config+0x68/0x220 [radeon]
[c000000011edee60] [c008000007680ee0] .__drm_mode_set_config_internal+0xc0/0x190 [drm]
[c000000011edef00] [c0080000076ba1f8] .drm_client_modeset_commit_locked+0x178/0x270 [drm]
[c000000011edefa0] [c0080000076ba328] .drm_client_modeset_commit+0x38/0x80 [drm]
[c000000011edf020] [c0080000089bb344] .__drm_fb_helper_restore_fbdev_mode_unlocked+0x114/0x1c0 [drm_kms_helper]
[c000000011edf0c0] [c0080000089bb484] .drm_fb_helper_set_par+0x44/0x90 [drm_kms_helper]
[c000000011edf140] [c0000000009ac4a4] .fbcon_init+0x594/0x800
[c000000011edf230] [c0000000009eecb8] .visual_init+0x108/0x1c0
[c000000011edf2d0] [c0000000009f25f4] .do_bind_con_driver.isra.0+0x2c4/0x550
[c000000011edf3a0] [c0000000009f2a50] .do_take_over_console+0x1d0/0x300
[c000000011edf480] [c0000000009a8ac4] .do_fbcon_takeover+0xb4/0x1b0
[c000000011edf530] [c00000000099fd9c] .register_framebuffer+0x2ac/0x480
[c000000011edf630] [c0080000089babd4] .__drm_fb_helper_initial_config_and_unlock+0x444/0x830 [drm_kms_helper]
[c000000011edf740] [c00800001d7e0278] .radeon_fbdev_init+0xf8/0x180 [radeon]
[c000000011edf7d0] [c00800001d7d6560] .radeon_modeset_init+0x8b0/0xe20 [radeon]
[c000000011edf8b0] [c00800001d79cce4] .radeon_driver_load_kms+0xc4/0x230 [radeon]
[c000000011edf950] [c00800000767c308] .drm_dev_register+0x128/0x2d0 [drm]
[c000000011edf9f0] [c00800001d798644] .radeon_pci_probe+0x124/0x1a0 [radeon]
[c000000011edfa80] [c0000000009732e0] .local_pci_probe+0x60/0x100
[c000000011edfb10] [c00000000011e260] .work_for_cpu_fn+0x30/0x50
[c000000011edfb90] [c0000000001240b0] .process_one_work+0x2f0/0x830
[c000000011edfc70] [c000000000124840] .worker_thread+0x250/0x4f0
[c000000011edfd50] [c000000000132b00] .kthread+0x1c0/0x1d0
[c000000011edfe10] [c00000000000cdf0] .ret_from_kernel_thread+0x58/0x60
Instruction dump:
2c290000 4d820020 7c0802a6 fbe1fff8 3be30010 f8010010 f821ff81 80c30010 
3d220000 80a30000 78c60020 38600001 <e8899130> 48008429 60000000 39200001 
---[ end trace 1bbd44c839d96aca ]---

BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 39s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=7/256 refcnt=9
    in-flight: 1802201963:.work_for_cpu_fn BAR(409)
    pending: .dbs_work_handler, .vmstat_shepherd, .once_deferred, .once_deferred, .once_deferred, .kfree_rcu_monitor
workqueue events_power_efficient: flags=0x80
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: .fb_flashcursor
workqueue mm_percpu_wq: flags=0x8
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: .vmstat_update
workqueue pm: flags=0x4
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: .pm_runtime_work
[...]
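For reference, the faulting instruction in the "Instruction dump" (`<e8899130>`) can be decoded by hand. The following is a minimal Python sketch. It assumes the standard Power ISA DS-form layout for `ld` (primary opcode 58, XO 0) and takes r9 from the register dump (the second value in the GPR08 row). It shows that the failing load is `ld r4, -28368(r9)` and that r9 plus the sign-extended displacement reproduces the reported DAR.

```python
# Decode the faulting PowerPC instruction from the oops "Instruction dump"
# and recompute the effective address of the failing load.
# Assumption: standard Power ISA DS-form encoding (big-endian bit numbering,
# bit 0 = most significant bit).

INSN = 0xE8899130          # faulting instruction, shown as <e8899130>
R9 = 0xC008000000000000    # r9: second value in the GPR08 row of the dump
MASK64 = (1 << 64) - 1     # effective addresses wrap at 64 bits


def decode_ds_form(insn):
    """Split a 32-bit DS-form instruction (e.g. ld) into its fields."""
    opcode = insn >> 26            # primary opcode, bits 0-5
    rt = (insn >> 21) & 0x1F       # target register, bits 6-10
    ra = (insn >> 16) & 0x1F       # base register, bits 11-15
    xo = insn & 0x3                # extended opcode, bits 30-31 (0 = ld)
    disp = insn & 0xFFFC           # DS field with the two low bits cleared
    if disp & 0x8000:              # sign-extend the 16-bit displacement
        disp -= 0x10000
    return opcode, rt, ra, xo, disp


opcode, rt, ra, xo, disp = decode_ds_form(INSN)
ea = (R9 + disp) & MASK64          # effective address = (RA) + displacement

print(f"opcode={opcode} xo={xo} -> ld r{rt}, {disp}(r{ra})")
print(f"EA = {ea:#018x}")
```

The computed address, 0xc007ffffffff9130, matches the DAR. This only ties the reported address to the instruction. It does not explain why r9 held c008000000000000, which is also the value of r2 in the dump.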
Comment 1 Erhard F. 2022-01-01 14:37:04 UTC
Created attachment 300197
kernel .config (kernel 5.16-rc7, Talos II)
Comment 2 Alex Deucher 2022-01-03 15:10:37 UTC
does appending amdgpu.runpm=0 on the kernel command line help?
Comment 3 Erhard F. 2022-01-03 15:26:56 UTC
(In reply to Alex Deucher from comment #2)
> does appending amdgpu.runpm=0 on the kernel command line help?
I doubt it, as amdgpu is not even built (# CONFIG_DRM_AMDGPU is not set; see the attached .config).

The card in question is a Radeon HD 6670 using the radeon DRM module. Sorry, I forgot to mention that!
Comment 4 Erhard F. 2022-08-26 14:36:28 UTC
I have not seen this for quite a few stable kernel releases on the same hardware. I think it's safe to close this.