Bug 35192

Summary: BUG() in radeon driver with ATI Technologies Inc RV515 [Radeon X1300]
Product: Drivers
Reporter: Anthony Basile (blueness)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, alan, florian, glisse, greg, kernel, psomas
Priority: P1
Hardware: All
OS: Linux
URL: http://bugs.gentoo.org/show_bug.cgi?id=359281
Kernel Version: 2.6.37 and 2.6.38
Subsystem:
Regression: Yes
Bisected commit-id:

Description Anthony Basile 2011-05-16 00:02:45 UTC
A regression in kernels 2.6.37 up to 2.6.38.6 hits a BUG() in the radeon driver with the following hardware (as reported by lspci -kv):

02:00.0 VGA compatible controller: ATI Technologies Inc RV515 [Radeon X1300] (prog-if 00 [VGA controller])
	Subsystem: ATI Technologies Inc All-in-Wonder 2006 PCI-E Edition
	Flags: bus master, fast devsel, latency 0, IRQ 40
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at fd9f0000 (64-bit, non-prefetchable) [size=64K]
	I/O ports at ce00 [size=256]
	Expansion ROM at fd9c0000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Express Endpoint, MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Kernel driver in use: radeon


The BUG() is

kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffffffff81c0e711
IP: [<ffffffff81c0e711>] platform_device_register_resndata+0x0/0x8c
PGD 1ae5067 PUD 1ae9063 PMD 3c473063 PTE 8000000001c0e163
Oops: 0011 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
CPU 0 
Pid: 1627, comm: X Tainted: G        W   2.6.38.6 #1 HP Pavilion 061 ER900AA-ABA a1430n/NAGAMI
RIP: 0010:[<ffffffff81c0e711>]  [<ffffffff81c0e711>] platform_device_register_resndata+0x0/0x8c
RSP: 0018:ffff88003a2b3d30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88003cb81000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81a41cbb RDI: 0000000000000000
RBP: ffff88003a2b3d88 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88003a340000 R11: 00000000effff000 R12: 0000000000000000
R13: ffff88003cb80800 R14: ffff88003abeaa80 R15: 0000000000000001
FS:  00007f40b4c72880(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff81c0e711 CR3: 000000003a2a8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process X (pid: 1627, threadinfo ffff88003a2b2000, task ffff88003d3ef1a0)
Stack:
 ffffffff8133aa29 0000000000000000 0000000000000001 ffff88003a2b3de8
 ffffffff811681fc ffff88003a2b3df8 ffff88003abeaa80 ffff88003cb80800
 0000000000000040 0000000040786440 0000000000000078 ffff88003a2b3ea8
Call Trace:
 [<ffffffff8133aa29>] ? radeon_cp_init+0xd79/0x1180
 [<ffffffff811681fc>] ? ext4_file_write+0x6c/0x2a0
 [<ffffffff81317439>] drm_ioctl+0x389/0x4d0
 [<ffffffff81339cb0>] ? radeon_cp_init+0x0/0x1180
 [<ffffffff8110ddfb>] do_vfs_ioctl+0x9b/0x510
 [<ffffffff8110e2bf>] sys_ioctl+0x4f/0x80
 [<ffffffff81002e2b>] system_call_fastpath+0x16/0x1b
Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
RIP  [<ffffffff81c0e711>] platform_device_register_resndata+0x0/0x8c
 RSP <ffff88003a2b3d30>
CR2: ffffffff81c0e711
---[ end trace f569524e57c84b79 ]---


More detail is available in the following Gentoo bug:

   http://bugs.gentoo.org/show_bug.cgi?id=359281


We're looking into doing the git-bisect now.
Comment 1 Anthony Basile 2011-05-16 23:38:38 UTC
Bisecting shows that reverting 737a3bb9416ce2a7c7a4170852473a4fcc9c67e8 fixed the problem.  The patch is in comment 28 of the Gentoo bug report.
Comment 2 Andrew Morton 2011-05-18 21:52:58 UTC
OK, this is a bit strange.

I can see that Greg's 737a3bb9416ce2a7c7a4170852473a4fcc9 would cause radeon_cp_init() to crash when it is called via ioctl.  It calls platform_device_register_resndata(), but by then platform_device_register_resndata() has been discarded from kernel memory along with the rest of the init code.

But that will only happen if CONFIG_MODULES=n, and in the config attached to the Gentoo report, CONFIG_MODULES=y.  So platform_device_register_resndata() won't have been unloaded!

Still, we should revert 737a3bb9416ce2a7c7a4170852473a4fcc9, as its assumptions are wrong in this case.
Comment 3 Stratos Psomadakis 2011-05-18 22:55:59 UTC
Actually, radeon_cp_init() crashed only with the later kernel configurations, which had CONFIG_MODULES=n. And that's because of commit 737a3b.

The previous configuration had CONFIG_MODULES=y, but that was with 2.6.36 kernels, and I think the bug/dmesg output was different. It was probably a different, 2.6.36-related bug, which was fixed in later kernel releases.
Comment 4 Andrew Morton 2011-05-18 22:58:47 UTC
OK, thanks.  Mystery solved.
Comment 5 Jérôme Glisse 2011-05-19 14:05:02 UTC
So is the bug still relevant? If so, what is the backtrace now?
Comment 6 Stratos Psomadakis 2011-05-19 18:15:57 UTC
The only problem now seems to be with commit 737a3b, which should be reverted, as it breaks radeon with a non-modular kernel (because of __init_or_module).
Comment 7 Florian Mickler 2011-07-05 10:31:32 UTC
A patch referencing this bug report has been merged in Linux v3.0-rc6:

commit bb2b43fefab723f4a0760146e7bed59d41a50e53
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Mon May 23 14:44:19 2011 -0700

    drivers/base/platform.c: don't mark platform_device_register_resndata() as __init_or_module
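
As background for the commit title above, here is a minimal userspace sketch of how an __init_or_module-style annotation behaves. It assumes the usual kernel pattern in which the macro expands to nothing when CONFIG_MODULES is set and to __init otherwise. MODEL_INIT, MODEL_INIT_OR_MODULE, STR and discarded_after_boot() are hypothetical names used only for illustration; they are not kernel identifiers, and no real section placement happens here.

```c
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's __init marker. In the kernel,
 * __init places code in a section that is freed once boot completes;
 * here it is only a visible token so the expansion can be inspected.
 */
#define MODEL_INIT in_init_text

/* Assumed shape of the real macro: empty with modules, __init without. */
#ifdef CONFIG_MODULES
#define MODEL_INIT_OR_MODULE            /* code stays resident */
#else
#define MODEL_INIT_OR_MODULE MODEL_INIT /* code is discarded after boot */
#endif

/* Two-level stringification so the macro is expanded before quoting. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Returns 1 if a function carrying the annotation would be discarded. */
static int discarded_after_boot(void)
{
    return strcmp(STR(MODEL_INIT_OR_MODULE), "in_init_text") == 0;
}
```

Compiled without -DCONFIG_MODULES, discarded_after_boot() returns 1. That corresponds to the non-modular configuration from comment 3, where radeon_cp_init() was called via ioctl after boot and jumped into a function that was no longer there. With -DCONFIG_MODULES it returns 0, which is why the modular config did not show this crash.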