Origin report: https://gitlab.freedesktop.org/drm/amd/-/issues/2615 Compile with CONFIG_DRM_RADEON=m on 32bit, kernel panics on radeon.ko load. On slow machine, there is 70 second window between login: prompt on raw VGA tty1 and radeon.ko KMS console load. When logging in on tty1 during this window, kernel panics. (Fast machine loads KMS console before login: prompt, so the problem is masked) [ 466.298267] [drm] radeon kernel modesetting enabled. [ 466.371716] radeon 0000:01:00.0: vgaarb: deactivate vga console [ 466.476920] Console: switching to colour dummy device 80x25 [ 466.661163] [drm] initializing kernel modesetting (RV280 0x1002:0x5960 0x148C:0x2094 0x01). [ 466.672175] [drm] Forcing AGP to PCI mode [ 466.848372] [drm] Generation 2 PCI interface, using max accessible memory [ 466.859389] radeon 0000:01:00.0: VRAM: 128M 0x00000000C0000000 - 0x00000000C7FFFFFF (128M used) [ 466.869191] radeon 0000:01:00.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF [ 467.022337] [drm] Detected VRAM RAM=128M, BAR=128M [ 467.041498] [drm] RAM width 64bits DDR [ 467.153587] workqueue: Failed to create a rescuer kthread for wq "ttm": -EINTR [ 467.154126] [drm:radeon_ttm_init [radeon]] *ERROR* failed initializing buffer object driver(-12). [ 467.176714] radeon 0000:01:00.0: Fatal error during GPU init [ 467.361714] [drm] radeon: finishing device. [ 467.431560] [drm] radeon: cp finalized [ 467.495402] radeon: probe of 0000:01:00.0 failed with error -12 In this instance, tty1 console freezes.
While bisecting, I got a clearer panic: [ 469.825305] BUG: kernel NULL pointer dereference, address: 00000004 [ 469.830502] #PF: supervisor read access in kernel mode [ 469.830502] #PF: error_code(0x0000) - not-present page [ 469.830502] *pde = 00000000 [ 469.830502] Oops: 0000 [#1] PREEMPT SMP [ 469.830502] CPU: 0 PID: 365 Comm: systemd-udevd Not tainted 5.14.0-221.el9.v1.i586 #1 [ 469.830502] Hardware name: System Manufacturer System Name/ALADDIN5, BIOS 0626 07/15/95 [ 469.830502] EIP: _raw_spin_lock_irqsave+0x1f/0x40 [ 469.830502] Code: cc cc cc cc cc cc cc 3e cc cc cc 3e 55 89 c1 89 55 89 c1 89 5b fa 64 ff 5b fa 64 ff c2 31 d2 be c2 31 d2 be 89 d0 3e 0f 89 d0 <3e> 0f 89 d8 5b 5e 89 d8 5b 5e 26 00 90 89 26 00 90 89 b7 15 75 ff [ 469.830502] EAX: 00000000 EBX: 00000246 ECX: 00000004 EDX: 00000000 [ 469.830502] ESI: 00000001 EDI: c3e71c40 EBP: c3e71c34 ESP: c3e71c2c [ 469.830502] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010046 [ 469.830502] CR0: 80050033 CR2: 00000004 CR3: 057fa000 CR4: 00000010 [ 469.830502] Call Trace: [ 469.830502] kthread_flush_worker+0x52/0xa0 [ 469.830502] ? kthread_should_park+0x40/0x40 [ 469.830502] drm_crtc_vblank_off+0x1d7/0x230 [drm] [ 469.830502] radeon_crtc_dpms+0x197/0x1b0 [radeon] [ 469.830502] radeon_crtc_disable+0x16/0xa0 [radeon] [ 469.830502] __drm_helper_disable_unused_functions+0x74/0xc0 [drm_kms_helper] [ 469.830502] drm_helper_disable_unused_functions+0x3c/0x50 [drm_kms_helper] [ 469.830502] radeon_fbdev_init+0xb0/0x130 [radeon] [ 469.830502] radeon_modeset_init+0x25d/0x320 [radeon] [ 469.830502] radeon_driver_load_kms+0xc4/0x240 [radeon] [ 469.830502] drm_dev_register+0xb4/0x1a0 [drm] [ 469.830502] radeon_pci_probe+0xc0/0x100 [radeon] [ 469.830502] pci_device_probe+0xbc/0x150 [ 469.830502] really_probe+0xb7/0x350 [ 469.830502] __driver_probe_device+0x109/0x1e0 [ 469.830502] driver_probe_device+0x1f/0x90 [ 469.830502] __driver_attach+0x8a/0x1b0 [ 469.830502] ? __device_attach_driver+0x100/0x100 [ 469.830502] bus_for_each_dev+0x58/0x90 [ 469.830502] driver_attach+0x19/0x20 [ 469.830502] ? __device_attach_driver+0x100/0x100 [ 469.830502] bus_add_driver+0x12f/0x1d0 [ 469.830502] driver_register+0x79/0xd0 [ 469.830502] ? 0xf7dde000 [ 469.830502] __pci_register_driver+0x52/0x60 [ 469.830502] radeon_module_init+0x5c/0x1000 [radeon] [ 469.830502] do_one_initcall+0x3e/0x1c0 [ 469.830502] ? __vunmap+0x20b/0x2a0 [ 469.830502] ? __vunmap+0x20b/0x2a0 [ 469.830502] ? kmem_cache_alloc_trace+0x38/0x440 [ 469.830502] do_init_module+0x52/0x260 [ 469.830502] load_module+0x930/0x9b0 [ 469.830502] __ia32_sys_init_module+0x15d/0x180 [ 469.830502] do_int80_syscall_32+0x2e/0x80 [ 469.830502] entry_INT80_32+0xf0/0xf0 [ 469.830502] EIP: 0xb79e7e4e [ 469.830502] Code: 0f 83 d6 06 00 00 c3 66 90 66 90 90 57 56 53 8b 7c 24 20 8b 74 24 1c 8b 54 24 18 8b 4c 24 14 8b 5c 24 10 b8 80 00 00 00 cd 80 <5b> 5e 5f 3d 01 f0 ff ff 0f 83 a4 06 00 00 c3 66 90 90 53 8b 54 24 [ 469.830502] EAX: ffffffda EBX: b5526010 ECX: 0020d79c EDX: b7c26274 [ 469.830502] ESI: b7c20295 EDI: b7c2ddd8 EBP: 018af7c0 ESP: bfd2f810 [ 469.830502] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000292 [ 469.830502] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib radeon(+) nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 gpu_sched drm_buddy i2c_algo_bit drm_display_helper cec drm_ttm_helper ppdev ttm rfkill ip_set nf_tables libcrc32c nfnetlink drm_kms_helper pcspkr syscopyarea e100 sysfillrect parport_pc sysimgblt mii fb_sys_fops parport qrtr drm fuse ext4 mbcache jbd2 sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ata_generic pata_ali libata serio_raw [ 469.830502] CR2: 0000000000000004 [ 469.830502] ---[ end trace 30555bd5ee4bee23 ]--- [ 469.830502] EIP: _raw_spin_lock_irqsave+0x1f/0x40 [ 469.830502] Code: cc cc cc cc cc cc cc 3e cc cc cc 3e 55 89 c1 89 55 89 c1 89 5b fa 64 ff 5b fa 64 ff c2 31 d2 be c2 31 d2 be 89 d0 3e 0f 89 d0 <3e> 0f 89 d8 5b 5e 89 d8 5b 5e 26 00 90 89 26 00 90 89 b7 15 75 ff [ 469.830502] EAX: 00000000 EBX: 00000246 ECX: 00000004 EDX: 00000000 [ 469.830502] ESI: 00000001 EDI: c3e71c40 EBP: c3e71c34 ESP: c3e71c2c [ 469.830502] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010046 [ 469.830502] CR0: 80050033 CR2: 00000004 CR3: 057fa000 CR4: 00000010 [ 469.830502] Kernel panic - not syncing: Fatal exception [ 469.830502] Kernel Offset: disabled [ 469.830502] ---[ end Kernel panic - not syncing: Fatal exception ]---
analysis(short): vblank->worker was NULL in drivers/gpu/drm/drm_vblank.c:drm_crtc_vblank_off(), but no one afterwards checked it for NULL. kthread_queue_work() do dereferecne "worker" as raw_spin_lock_irqsave(&worker->lock, flags); which results as dereferencing &NULL->lock == 0x00000004. I'm not sure whether vblank->worker == NULL is a correct behavior or not.
analysis(long): drivers/gpu/drm/drm_vblank.c void drm_crtc_vblank_off(struct drm_crtc *crtc) { struct drm_device *dev = crtc->dev; unsigned int pipe = drm_crtc_index(crtc); struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; struct drm_pending_vblank_event *e, *t; ktime_t now; u64 seq; ... /* Wait for any vblank work that's still executing to finish */ drm_vblank_flush_worker(vblank); <<< } drivers/gpu/drm/drm_internal.h static inline void drm_vblank_flush_worker(struct drm_vblank_crtc *vblank) { kthread_flush_worker(vblank->worker); <<< NULL for unknown reason } kernel/kthread.c void kthread_flush_worker(struct kthread_worker *worker) { struct kthread_flush_work fwork = { KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn), COMPLETION_INITIALIZER_ONSTACK(fwork.done), }; kthread_queue_work(worker, &fwork.work); <<<< passes NULL wait_for_completion(&fwork.done); } kernel/kthread.c bool kthread_queue_work(struct kthread_worker *worker, struct kthread_work *work) { bool ret = false; unsigned long flags; raw_spin_lock_irqsave(&worker->lock, flags); <<<< dereference NULL->lock==0x00000004 if (!queuing_blocked(worker, work)) { kthread_insert_work(worker, work, &worker->work_list); ret = true; } raw_spin_unlock_irqrestore(&worker->lock, flags); return ret; } include/linux/kthread.h struct kthread_worker { unsigned int flags; raw_spinlock_t lock; <<< offset 0x00000004 struct list_head work_list; struct list_head delayed_work_list; struct task_struct *task; struct kthread_work *current_work; };
Created attachment 304625 [details] Activate #ifdef FTRACE_MCOUNT_MAX_OFFSET only if defined as nonzero
(attach patch-FTRACE_MCOUNT_MAX_OFFSET-defined.patch) I succeeded to bisect down the regressing commit found in kernel-5.18.0-rc2: b39181f7c690 (refs/bisect/bad) ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function This at a glance does not relate to drm/kms code. The attached patch effectively reverts the commit for 32bit. This fixed the problem on kernel-5.18.0, but not enough for kernel-6.4.3 .
Created attachment 304626 [details] check and return if vblank->worker was null in drm_vblank_flush_worker Attached patch sort of fixes the problem; it does not panic and KMS console works, but printk is triggered 4 times on radeon.ko load and when VGA connector is plugged in. I am sort of at loss now; I need advice from people which knows better.
Can you also file report on freedesktop gitlab [1]? [1]: https://gitlab.freedesktop.org/drm/amd/-/issues
(In reply to Bagas Sanjaya from comment #7) > Can you also file report on freedesktop gitlab [1]? > > [1]: https://gitlab.freedesktop.org/drm/amd/-/issues Oops, I don't see your gitlab issue link on the TS. Sorry for inconvenience.
regressions@leemhuis.info sed in <55a3bbb1-5b3c-f454-b529-8ee9944cc67c@leemhuis. info> >> That being said: That commit is not in 5.18, as Steve noticed: >> >> >> #regzbot introduced: b39181f7c6907d >> https://bugzilla.kernel.org/show_bug.c gi?id=217669 >> >> #regzbot title: FTRACE_MCOUNT_MAX_OFFSET causes kernel NULL pointer >> derefe rence and virtual console (tty1) freeze >> > That commit was added in 5.19. >> > >> > So I'm confused about why it's mentioned. Was it backported? >> >> Taketo Kabe, could you please help to clean this confusion up? Did you >> mean 5.19 in https://bugzilla.kernel.org/show_bug.cgi?id=217669#c5 ? And >> BTW: did you really use a vanilla kernel for your bisection? Reporter Me: I bisected using freedesktop.org kernel tree, which git commit ID is in sync with kernel.org but version number in ./Makefile could be slighty behind. Patch in https://bugzilla.kernel.org/show_bug.cgi?id=217669#c4 fixed the problem in freedesktop.org kernel 5.18.0-rc2 . This may explain that in kernel.org tree, the said commit is in kernel-5.19.