Bug 2913

Summary: kernel oops on modules unload
Product: Drivers
Component: Video(DRI - non Intel)
Reporter: Martin Mokrejs (mmokrejs)
Assignee: drivers_video-dri
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: bunk, devzero, protasnb
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.7
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: dmesg, "ps -ef" and lsof output.

Description Martin Mokrejs 2004-06-18 12:57:42 UTC
Distribution: Gentoo
Hardware Environment: ASUS P4C800E-Deluxe, P4 box, HT enabled
Software Environment: 
Problem Description:

I unloaded the radeon module and then agpgart. This is what I later found in
the dmesg output:

Linux agpgart interface v0.100 (c) Dave Jones
[drm] Initialized radeon 1.11.0 20020828 on minor 0: ATI Technologies Inc RV280 [Radeon 9200]
[drm:radeon_cp_init] *ERROR* radeon_cp_init called without lock held
[drm:radeon_unlock] *ERROR* Process 21119 using kernel context 0
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be trying access hardware directly.
[drm] Module unloaded
Unable to handle kernel paging request at virtual address 00100100
 printing eip:
c0137241
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: agpgart uhci_hcd ohci_hcd ehci_hcd e1000
CPU:    1
EIP:    0060:[<c0137241>]    Not tainted
EFLAGS: 00010087   (2.6.7) 
EIP is at module_text_address+0x4c/0x64
eax: 001000fc   ebx: dd98c000   ecx: 001000fc   edx: 00100100
esi: dd98c000   edi: dd98dc0c   ebp: dd98db40   esp: dd98db3c
ds: 007b   es: 007b   ss: 0068
Process hotplug (pid: 21458, threadinfo=dd98c000 task=cc882a70)
Stack: d146df1c dd98db48 c0130bb1 dd98db64 c0142e7d 000000dc c01821f5 80050c00 
       f7fff680 00000000 dd98db9c c0145c51 00000520 40013960 dd98db9c 00000000 
       1146d000 c01821f5 d146d000 c18c2054 00000282 dd98dc9c 400007d0 d146dfb8 
Call Trace:
 [<c0105c9c>] show_stack+0x7a/0x90
 [<c0105e1f>] show_registers+0x152/0x1a1
 [<c0105fb2>] die+0xb5/0x129
 [<c011945a>] do_page_fault+0x1d2/0x4e3
 [<c0105961>] error_code+0x2d/0x38
 [<c0130bb1>] kernel_text_address+0x2e/0x39
 [<c0142e7d>] store_stackinfo+0x6c/0x8c
 [<c0145c51>] kfree+0x1eb/0x39b
 [<c01821f5>] load_elf_interp+0x11f/0x22c
 [<c0182912>] load_elf_binary+0x4e8/0xd2e
 [<c0165ab7>] search_binary_handler+0x194/0x2eb
 [<c0181bfa>] load_script+0x20a/0x240
 [<c0165ab7>] search_binary_handler+0x194/0x2eb
 [<c0165d61>] do_execve+0x153/0x1b9
 [<c0103b4b>] sys_execve+0x32/0x63
 [<c0104ed7>] syscall_call+0x7/0xb

Code: 8b 40 04 0f 18 00 90 81 fa f8 15 48 c0 75 c3 31 c0 5b 5d c3 
 


Steps to reproduce:
Comment 1 Dave Airlie 2004-06-18 20:49:24 UTC
The user isn't loading an AGP chipset driver, which causes the radeon DRM to
fail. We should probably fix the DRM to fail more gracefully, but it has never
been that high on the list, as it is a system configuration issue...

Comment 2 Martin Mokrejs 2004-06-21 07:33:06 UTC
Well, yes, I use agp=try_unsupported on the LILO kernel command line to get my
chipset detected. The Intel 875 chipset isn't that new, but it needs this option
for AGP to be detected. So it is my fault, and really a configuration issue.
However, please fix this problem.
BTW: The tricky thing was to realize that agp_try_unsupported=1 has to be used
in /etc/modules.autoload.d/kernel-2.x (note that the syntax differs from the
kernel command line). I think this should be explained in the help text in the
kernel sources (see the help option in "make menuconfig"). :(
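
For comparison, the two spellings side by side. This is only a sketch: the
option names are the ones quoted above, while the LILO append= line and the
choice of agpgart as the module that takes the option are assumptions and may
differ on other kernels or setups.

```
# /etc/lilo.conf: kernel command line syntax
# (the append= line is an assumed example)
append="agp=try_unsupported"

# /etc/modules.autoload.d/kernel-2.x: module option syntax
# (attaching the option to agpgart is an assumption)
agpgart agp_try_unsupported=1
```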

Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 941M
agpgart: Trying generic Intel routines for device id: 2578
agpgart: AGP aperture is 128M @ 0xe8000000
[drm] AGP 0.99 Aperture @ 0xe8000000 128MB
[drm] Initialized radeon 1.7.0 20020828 on minor 0
Comment 3 Dave Jones 2007-05-17 15:13:34 UTC
Did radeon ever get fixed up?
Comment 4 Natalie Protasevich 2007-09-04 00:03:31 UTC
Any updates on this problem?
Thanks.
Comment 5 Martin Mokrejs 2007-09-17 06:20:56 UTC
I cannot test it at the moment because of bug #9032.
Comment 6 Natalie Protasevich 2008-03-03 20:51:34 UTC
Martin,
can you update the status, please? The bug mentioned in comment #5 seems to be untested as well. Can you try with the newest kernel?
Comment 7 Natalie Protasevich 2008-03-06 00:16:23 UTC
Created attachment 15160 [details]
dmesg, "ps -ef" and lsof output.

Attaching on Martin's behalf
Comment 8 Natalie Protasevich 2008-03-06 00:19:11 UTC
Update from Martin:
Hi,
I am sorry for the delay. I tested with 2.6.24.2 and cannot reproduce the crash,
but I still cannot easily remove the module. Please see the attached dmesg,
"ps -ef" and lsof output.

Bugzilla currently gives me "access denied", which looks like a misconfiguration.
I will try to attach the file later, once it works again.
Martin

I had to force the removal of intel_agp; after that I succeeded in removing
agpgart without forcing, and there was no kernel crash.
Comment 9 Martin Mokrejs 2008-04-28 05:07:49 UTC
This does not happen to me now, maybe because I use a different part of the driver on the same hardware with recent 2.6 kernels? I now use 2.6.24.2, with no extra kernel command line flags and no extra module loading options. In syslog I see:

Linux agpgart interface v0.102
agpgart: Detected an Intel 845G Chipset.
agpgart: AGP aperture is 256M @ 0xe0000000
[drm] Setting GART location based on new memory map
[drm] writeback test succeeded in 2 usecs
Comment 10 Roland Kletzing 2008-04-30 11:51:14 UTC
It seems the dmesg output is missing.

What's the problem you currently have?
You cannot remove the agpgart module?

Can you describe what you tried to do and what fails?

Mind the difference between "rmmod" and "modprobe -r"!
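
To illustrate that difference, here is a minimal sketch using intel_agp, the
module mentioned in comment 8. The run() helper is hypothetical and only there
for illustration: by default it prints each command instead of executing it,
because unloading modules needs root. Set DRY_RUN=0 to really run the commands.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of any tool): prints the command
# unless DRY_RUN=0 is set, in which case it executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# rmmod unloads exactly the named module and refuses if it is in use.
run rmmod intel_agp

# modprobe -r unloads the named module and then also tries to unload
# the modules it depends on (such as agpgart) if they have become unused.
run modprobe -r intel_agp

# lsmod lists the loaded modules with their use counts and users.
run lsmod
```

If "rmmod" reports that the module is in use, the "Used by" column of lsmod
usually shows which module still holds it.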

Maybe it's worth closing this ticket and creating a new one, since this is a different issue now and there is a lot of information in here that isn't relevant anymore.
Comment 11 Natalie Protasevich 2008-05-02 16:44:50 UTC
It seems like the driver finally worked for Martin.
Closing the bug.