A regression in kernels 2.6.37 up to 2.6.38.6 hits a BUG() in the radeon driver with the following hardware (as reported by lspci -kv):

02:00.0 VGA compatible controller: ATI Technologies Inc RV515 [Radeon X1300] (prog-if 00 [VGA controller])
        Subsystem: ATI Technologies Inc All-in-Wonder 2006 PCI-E Edition
        Flags: bus master, fast devsel, latency 0, IRQ 40
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at fd9f0000 (64-bit, non-prefetchable) [size=64K]
        I/O ports at ce00 [size=256]
        Expansion ROM at fd9c0000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Express Endpoint, MSI 00
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Kernel driver in use: radeon

The BUG() is:

kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffffffff81c0e711
IP: [<ffffffff81c0e711>] platform_device_register_resndata+0x0/0x8c
PGD 1ae5067 PUD 1ae9063 PMD 3c473063 PTE 8000000001c0e163
Oops: 0011 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
CPU 0
Pid: 1627, comm: X Tainted: G W 2.6.38.6 #1 HP Pavilion 061 ER900AA-ABA a1430n/NAGAMI
RIP: 0010:[<ffffffff81c0e711>] [<ffffffff81c0e711>] platform_device_register_resndata+0x0/0x8c
RSP: 0018:ffff88003a2b3d30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88003cb81000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81a41cbb RDI: 0000000000000000
RBP: ffff88003a2b3d88 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88003a340000 R11: 00000000effff000 R12: 0000000000000000
R13: ffff88003cb80800 R14: ffff88003abeaa80 R15: 0000000000000001
FS: 00007f40b4c72880(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff81c0e711 CR3: 000000003a2a8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process X (pid: 1627, threadinfo ffff88003a2b2000, task ffff88003d3ef1a0)
Stack:
 ffffffff8133aa29 0000000000000000 0000000000000001 ffff88003a2b3de8
 ffffffff811681fc ffff88003a2b3df8 ffff88003abeaa80 ffff88003cb80800
 0000000000000040 0000000040786440 0000000000000078 ffff88003a2b3ea8
Call Trace:
 [<ffffffff8133aa29>] ? radeon_cp_init+0xd79/0x1180
 [<ffffffff811681fc>] ? ext4_file_write+0x6c/0x2a0
 [<ffffffff81317439>] drm_ioctl+0x389/0x4d0
 [<ffffffff81339cb0>] ? radeon_cp_init+0x0/0x1180
 [<ffffffff8110ddfb>] do_vfs_ioctl+0x9b/0x510
 [<ffffffff8110e2bf>] sys_ioctl+0x4f/0x80
 [<ffffffff81002e2b>] system_call_fastpath+0x16/0x1b
Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
RIP [<ffffffff81c0e711>] platform_device_register_resndata+0x0/0x8c
 RSP <ffff88003a2b3d30>
CR2: ffffffff81c0e711
---[ end trace f569524e57c84b79 ]---

More detail is available in the following Gentoo bug: http://bugs.gentoo.org/show_bug.cgi?id=359281

We're looking into doing the git-bisect now.
Bisecting shows that reverting 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 fixed the problem. The patch is in comment 28 of the Gentoo bug report.
OK, this is a bit strange. I can see that Greg's 737a3bb9416ce2a7c7a4170852473a4fcc9 would cause radeon_cp_init() to crash when it is called via ioctl. It calls platform_device_register_resndata(), but platform_device_register_resndata() has already been discarded from kernel memory. But that will only happen if CONFIG_MODULES=n, and in the config attached to the Gentoo report, CONFIG_MODULES=y. So platform_device_register_resndata() won't have been discarded! Still, we should revert 737a3bb9416ce2a7c7a4170852473a4fcc9, as its assumptions are wrong in this case.
Actually, radeon_cp_init() crashed only with the later kernel configurations, which had CONFIG_MODULES=n, and that is because of commit 737a3b. The previous configuration had CONFIG_MODULES=y, but that was with 2.6.36 kernels, and I think the bug/dmesg output was different. It was probably a different, 2.6.36-related bug that was fixed in later kernel releases.
OK, thanks. Mystery solved.
So is the bug still relevant? If so, what is the backtrace now?
The only problem now seems to be with commit 737a3b, which should be reverted, as it breaks radeon with a non-modular kernel (because of __init_or_module).
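For anyone reading along, here is a rough userspace model of why __init_or_module matters here. It is only a sketch and not kernel code: the function names are made up for illustration, and it assumes the usual include/linux/init.h behaviour, where __init_or_module expands to nothing when CONFIG_MODULES=y and to __init (so the function lands in .init.text, which is freed after boot) when CONFIG_MODULES=n.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical userspace model, not kernel code. All names below are
 * invented for illustration.
 */

/*
 * Section a function tagged __init_or_module is assumed to land in:
 * regular text when modules are enabled, init text when they are not.
 */
const char *init_or_module_section(bool config_modules)
{
    return config_modules ? ".text" : ".init.text";
}

/*
 * .init.text is released once boot has finished, so code placed there
 * must not be called later, for example from an ioctl handler.
 */
bool callable_after_boot(const char *section)
{
    return strcmp(section, ".init.text") != 0;
}

/*
 * Models the radeon_cp_init() path: the ioctl runs long after boot and
 * calls a function that carries the __init_or_module annotation.
 */
bool resndata_callable_from_ioctl(bool config_modules)
{
    return callable_after_boot(init_or_module_section(config_modules));
}
```

In this model resndata_callable_from_ioctl(false) is false, which corresponds to the NX-protected page oops above, and resndata_callable_from_ioctl(true) is true, which matches the observation that the CONFIG_MODULES=y configuration did not hit this particular crash.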
A patch referencing this bug report has been merged in Linux v3.0-rc6:

commit bb2b43fefab723f4a0760146e7bed59d41a50e53
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Mon May 23 14:44:19 2011 -0700

    drivers/base/platform.c: don't mark platform_device_register_resndata() as __init_or_module