Most recent kernel where this bug did not occur: don't know
Distribution: OpenSuSE 10.3 + vanilla kernel 2.6.24rc6
Hardware Environment: fsc lifebook e-series (happens both on real hw and inside vmware)
Software Environment: ???
Problem Description:

modprobe capidrv;rmmod capidrv

rmmod never returns

[ 7358.883208] capidrv: Rev 1.1.2.2 : unloaded
[ 7358.895698] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000009
[ 7358.896132] printing eip: c020ceea *pde = 00000000
[ 7358.896512] Oops: 0000 [#1] SMP
[ 7358.896844] Modules linked in: capidrv kernelcapi isdn slhc edd iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 af_packet microcode firmware_class fuse loop dm_mod ide_cd cdrom pata_acpi 8250_pnp ata_piix ahci ata_generic libata parport_pc parport floppy 8250 rtc_cmos serial_core rtc_core rtc_lib pcnet32 mii pcspkr hci_usb piix bluetooth generic i2c_piix4 ide_core i2c_core container shpchp pci_hotplug ac thermal power_supply button processor intel_agp agpgart sg mousedev evdev ext3 jbd mbcache sd_mod mptspi mptscsih mptbase ehci_hcd uhci_hcd scsi_transport_spi scsi_mod usbcore
[ 7358.903430]
[ 7358.903462] Pid: 4466, comm: kstopmachine Not tainted (2.6.24-rc6 #1)
[ 7358.905436] EIP: 0060:[<c020ceea>] EFLAGS: 00010092 CPU: 0
[ 7358.906712] EIP is at list_del+0xa/0x61
[ 7358.906769] EAX: e0b39704 EBX: 00000009 ECX: 00000000 EDX: de4e4e10
[ 7358.906819] ESI: df6b8ef0 EDI: 00000000 EBP: de4a2fb4 ESP: de4a2fa4
[ 7358.906863] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 7358.907431] Process kstopmachine (pid: 4466, ti=de4a2000 task=de4e4e10 task.ti=de4a2000)
[ 7358.907463] Stack: c03f80ac 00000046 00000000 00000008 de4a2fbc c0153506 de4a2fd0 c015fb58
[ 7358.907492] df6b8ef0 c015fa9c 00000000 de4a2fe0 c013fe7e c013fe43 00000000 00000000
[ 7358.909431] c0108d77 df6b8e6c 00000000 00000000 00000000 00000000 00000000
[ 7358.909649] Call Trace:
[ 7358.911432] [<c01091cd>] show_trace_log_lvl+0x1a/0x2f
[ 7358.911492] [<c010927d>] show_stack_log_lvl+0x9b/0xa3
[ 7358.911539] [<c010932c>] show_registers+0xa7/0x179
[ 7358.911575] [<c0109538>] die+0x13a/0x225
[ 7358.911602] [<c02ed731>] do_page_fault+0x554/0x632
[ 7358.915172] [<c02ebd72>] error_code+0x72/0x78
[ 7358.915172] [<c0153506>] __unlink_module+0xb/0xf
[ 7358.915172] [<c015fb58>] do_stop+0xbc/0x110
[ 7358.915172] [<c013fe7e>] kthread+0x3b/0x61
[ 7358.915172] [<c0108d77>] kernel_thread_helper+0x7/0x10
[ 7358.915172] =======================
[ 7358.915432] Code: 00 00 8b 53 10 8d 4b 0c 8d 46 0c e8 72 00 00 00 89 f8 e8 87 fe ff ff 83 c4 10 5b 5e 5f 5d c3 90 90 55 89 e5 53 83 ec 0c 8b 58 04 <8b> 0b 39 c1 74 18 89 4c 24 08 89 44 24 04 c7 04 24 a7 43 39 c0
[ 7358.917433] EIP: [<c020ceea>] list_del+0xa/0x61 SS:ESP 0068:de4a2fa4
[ 7358.919433] ---[ end trace a35f9be43025b578 ]---

Steps to reproduce:
power up a vanilla kernel built with >allmodconfig<, or use the suse distro kernel from http://ftp.suse.com/pub/projects/kernel/kotd/HEAD/i386/
then do modprobe capidrv;rmmod capidrv
(you don't need any isdn hardware for that)
Same problem here on Fedora 8, i386, vanilla kernel v2.6.24-rc3, but I cannot even save the oops message because Linux hangs immediately after the Oops. Also, I cannot reproduce this oops within VMware.
Hi Roland, I ran a git bisect (hmm, it took a whole day) between a bad kernel, 2.6.24-rc1, and a good kernel, 2.6.23. The bisect points at commit `b1b2e7cf4a9742f61d76fcb419b1fd13159876a5'. Here is a patch to revert it; please apply it and check whether that fixes the problem. Thanks!
Created attachment 14346: revert b1b2e7cf4a9742f61d76fcb419b1fd13159876a5
No, this cannot be the reason for the Oops and it is completely unrelated to module unload.
(In reply to comment #4)
> No, this cannot be the reason for the Oops and it is completely unrelated to
> module unload.

Yep, checking the return value of alloc_skb() should never introduce new BUGs.
I can reproduce the same (or a similar) problem on 2.6.23.14.

modprobe capidrv;rmmod capidrv produces:

capidrv: Rev 1.1.2.2 : unloaded
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000007
printing eip: c012f3e1 *pde = 2d0b3067 *pte = 00000000
Oops: 0002 [#1]
Modules linked in: capidrv kernelcapi isdn deflate zlib_deflate twofish twofish_common camellia serpent blowfish des cbc ecb blkcipher aes xcbc sha256 crypto_null xfrm_user xfrm4_tunnel tunnel4 ipcomp esp4 ah4 af_key forcedeth e1000 iptable_mangle ipt_iprange xt_CONNMARK ipt_REDIRECT ipt_MASQUERADE iptable_nat xt_conntrack xt_connmark ipt_LOG xt_limit xt_TCPMSS ipt_REJECT ipt_recent nf_conntrack_ipv4 xt_state ipt_ACCOUNT xt_tcpudp xt_condition xt_policy xt_multiport iptable_filter ip_tables x_tables nf_nat_tftp nf_nat_irc nf_nat_pptp nf_nat_proto_gre nf_nat_ftp nf_nat nf_conntrack_tftp nf_conntrack_socks nf_conntrack_irc nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_ftp nf_conntrack nfnetlink ext3 jbd mbcache reiserfs sg ahci libata amd74xx generic ehci_hcd ohci_hcd
CPU: 0
EIP: 0060:[__unlink_module+0xa/0x21] Not tainted VLI
EFLAGS: 00010046 (2.6.23-1.i2n #1)
EIP is at __unlink_module+0xa/0x21
eax: f8b80b00 ebx: f8b80b04 ecx: 00000003 edx: 00000012
esi: 00000000 edi: bfc90e90 ebp: ed332000 esp: ed333f50
ds: 007b es: 007b fs: 0000 gs: 0000 ss: 0068
Process rmmod (pid: 2896, ti=ed332000 task=efac5550 task.ti=ed332000)
Stack: f8b80b00 c0130279 f8b80b00 c0131bc6 69706163 00767264 00000000 ed07a320
       c0147e46 ee61c580 ed07a320 c014876e ffffffff b7f25000 b7f24000 b7f25000
       ed07a5cc ed07a5c0 0061c580 f8b80b80 00000880 ed333fa8 00000000 bfc90e90
Call Trace:
 [free_module+0x9/0x9e] free_module+0x9/0x9e
 [sys_delete_module+0x165/0x17b] sys_delete_module+0x165/0x17b
 [remove_vma+0x31/0x36] remove_vma+0x31/0x36
 [do_munmap+0x193/0x1ac] do_munmap+0x193/0x1ac
 [syscall_call+0x7/0x0b] syscall_call+0x7/0xb
=======================
Code: 94 00 00 00 00 0f 95 c0 0f b6 c0 c3 83 b8 98 00 00 00 00 0f 95 c0 0f b6 c0 c3 8b 80 80 01 00 00 c3 53 8b 48 04 8d 58 04 8b 53 04 <89> 51 04 89 0a c7 43 04 00 02 20 00 c7 40 04 00 01 10 00 31 c0
EIP: [__unlink_module+0xa/0x21] __unlink_module+0xa/0x21 SS:ESP 0068:ed333f50

I tried to nail it down, but it's not easy: as soon as I add some printks to capidrv.c, the bug no longer causes immediate trouble (but I don't think it's gone). I think this is why removing the alloc_skb check helped Jike.

On my system the local variable "mod" of sys_delete_module is modified during the call to mod->exit(), which causes problems later on. I don't think the corruption targets mod specifically, though; it looks like random stack or memory corruption.

In my case the problem disappears if I remove the remove_proc_entry() call from capidrv's proc_exit, but I'm not sure whether that is related either.
I can no longer reproduce this with http://ftp.suse.com/pub/projects/kernel/kotd/HEAD/i386/kernel-default-2.6.24_rc8-20080117182846.i586.rpm
Created attachment 14564: proposed fix for capidrv

I found the bug. It is a long-standing bug that overwrites the stack. It only crashes the kernel if the memory happens to be laid out in a particular way, which is why you see it with some kernel versions and not with others.
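For readers without access to the attachment: the following is a minimal user-space sketch in C of the general class of bug described above, not the attached patch. Everything in it is an assumption made for illustration: the helper name extract_rev, the 10-byte buffer, and the keyword string "$Revision: 1.1.2.2 $" (modelled on the "capidrv: Rev 1.1.2.2 : unloaded" log line). With such a string, an unbounded copy of the text after the ':' would need 11 bytes (10 characters plus the terminating NUL), so it would write one byte past a 10-byte stack buffer. Whether that stray byte lands on something important depends on how the compiler lays out the stack frame. The sketch shows a bounded copy that cannot write past the buffer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from capidrv.c): copy the revision number
 * that follows the first ':' of an RCS-style keyword string into dst.
 * The copy stops at '$', trims surrounding blanks, and never writes more
 * than dst_len bytes, including the terminating NUL.
 */
static void extract_rev(char *dst, size_t dst_len, const char *revision)
{
	const char *start = strchr(revision, ':');
	size_t len;

	if (dst_len == 0)
		return;

	if (start == NULL) {
		/* No keyword found: fall back to a placeholder. */
		snprintf(dst, dst_len, "%s", "???");
		return;
	}

	start++;			/* skip the ':' itself */
	while (*start == ' ')		/* skip leading blanks */
		start++;

	len = strcspn(start, "$");	/* text up to '$' or end of string */
	while (len > 0 && start[len - 1] == ' ')
		len--;			/* drop trailing blanks */

	if (len >= dst_len)
		len = dst_len - 1;	/* truncate instead of overflowing */

	memcpy(dst, start, len);
	dst[len] = '\0';
}
```

Called with a 10-byte buffer and the string above, this leaves "1.1.2.2" in the buffer; with a smaller buffer the result is truncated rather than written past the end.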
Yes, this makes a lot of sense: it was fixed in the init part, but it was overlooked that the exit code has a similar issue. I acked the patch. Thanks. I will close this bug when the code is in the tree.
I found that the fix went upstream (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=eb36f4fc019835cecf0788907f6cab774508087b), so I'm closing this one, OK?
Gerd's fix is not included in 2.6.24.x, and therefore the kernel still oopses. I'm wondering what happened, as it was already part of 2.6.23.15 in February. If you are looking for the patch in 2.6.23.15, it's filed under Karsten Keil as author...
2.6.24 was tagged on 24.1.2008; the patch went in just one day later. See http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=eb36f4fc019835cecf0788907f6cab774508087b
> I'm wondering what happened as it was already part of 2.6.23.15 in February?

2.6.23.x is a stable series, and so is 2.6.24.x. They are separate branches, each maintained for a longer time. What gets merged into the development kernel after those releases doesn't automatically get into them, afaik.
Yes, the stable branch for 2.6.24 was not open at that time, so the patch got lost. Greg now has it in his queue for the next 2.6.24 stable patch. Thanks for spotting this.