The Distribution is Gentoo, using a vanilla kernel 2.6.30-rc6. While booting and executing the ip6tables rule loading script an Oops occurs in about half of the cases. The module-init-tools version is 3.5. The problem did never occur in Kernel 2.6.29. dmesg extract: [ 374.119788] BUG: unable to handle kernel paging request at 000cba0b [ 374.119970] IP: [<c015981c>] m_show+0x9c/0x1a0 [ 374.120004] *pde = 00000000 [ 374.120004] Oops: 0000 [#1] PREEMPT [ 374.120004] last sysfs file: /sys/block/dm-0/size [ 374.120004] Modules linked in: snd_opl3_synth snd_seq_midi_emul snd_seq_midi snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss fuse tun asb100 hwmon_vid iptable_mangle nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_ftp usbhid cryptoloop 8139cp tuner_simple tuner_types tuner arc4 cx8800 cx88xx rt61pci rt2x00pci rt2x00lib ir_common led_class i2c_algo_bit snd_cmipci input_polldev usblp tveeprom snd_pcm v4l2_common mac80211 videodev snd_page_alloc v4l1_compat videobuf_dma_sg snd_opl3_lib videobuf_core btcx_risc eeprom_93cx6 skge snd_timer snd_hwdep snd_mpu401 forcedeth snd_mpu401_uart i2c_nforce2 snd_rawmidi snd_seq_device nvidia_agp agpgart snd soundcore floppy [ 374.120004] [ 374.120004] Pid: 5953, comm: modprobe Not tainted (2.6.30-rc6 #3) A7N8X-E [ 374.120004] EIP: 0060:[<c015981c>] EFLAGS: 00010296 CPU: 0 [ 374.120004] EIP is at m_show+0x9c/0x1a0 [ 374.120004] EAX: 00000000 EBX: 000cba0b ECX: 00000000 EDX: 00000000 [ 374.120004] ESI: f81b0c48 EDI: c7f88ec0 EBP: f81b0b40 ESP: cac88ecc [ 374.120004] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 [ 374.120004] Process modprobe (pid: 5953, ti=cac88000 task=c7f967c0 task.ti=cac88000) [ 374.120004] Stack: [ 374.120004] c7f88ec0 c0640c5a f8344e4c 00009100 f81b0b44 00000060 00010246 35a83966 [ 374.120004] 00000000 c7f88ec0 f81b0b44 000002af c01bf296 cac88f90 000003ff b8016001 [ 374.120004] cac5c280 c7f88ee8 00000001 00000027 00000000 00000026 00000000 35a83966 [ 374.120004] Call Trace: [ 374.120004] [<c01bf296>] ? seq_read+0x236/0x3a0 [ 374.120004] [<c01bf060>] ? seq_read+0x0/0x3a0 [ 374.120004] [<c01e5384>] ? proc_reg_read+0x84/0xf0 [ 374.120004] [<c01a0c07>] ? vfs_read+0xc7/0x170 [ 374.120004] [<c01e5300>] ? proc_reg_read+0x0/0xf0 [ 374.120004] [<c01a0de0>] ? sys_read+0x60/0xc0 [ 374.120004] [<c010353b>] ? sysenter_do_call+0x12/0x26 [ 374.120004] Code: 31 c0 81 c6 04 01 00 00 39 f3 74 2e 8d 76 00 8b 43 08 89 3c 24 c7 44 24 04 5a 0c 64 c0 83 c0 0c 89 44 24 08 e8 66 56 06 00 8b 1b <8b> 03 0f 18 00 90 39 de 75 da b8 01 00 00 00 8b b5 bc 00 00 00 [ 374.120004] EIP: [<c015981c>] m_show+0x9c/0x1a0 SS:ESP 0068:cac88ecc [ 374.120004] CR2: 00000000000cba0b [ 374.130919] ---[ end trace 560ced3d4b382f16 ]--- $ cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 6 model : 10 model name : AMD Athlon(tm) XP 2500+ stepping : 0 cpu MHz : 1837.534 cache size : 512 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow bogomips : 3675.06 clflush size : 32 power management: ts
That decodes to 8: 39 f3 cmp %esi,%ebx a: 74 2e je 0x3a c: 8d 76 00 lea 0x0(%esi),%esi f: 8b 43 08 mov 0x8(%ebx),%eax 12: 89 3c 24 mov %edi,(%esp) 15: c7 44 24 04 5a 0c 64 movl $0xc0640c5a,0x4(%esp) 1c: c0 1d: 83 c0 0c add $0xc,%eax 20: 89 44 24 08 mov %eax,0x8(%esp) 24: e8 66 56 06 00 call 0x6568f 29: 8b 1b mov (%ebx),%ebx 2b:* 8b 03 mov (%ebx),%eax <-- trapping instruction 2d: 0f 18 00 prefetchnta (%eax) 30: 90 nop 31: 39 de cmp %ebx,%esi 33: 75 da jne 0xf 35: b8 01 00 00 00 mov $0x1,%eax 3a: 8b b5 bc 00 00 00 mov 0xbc(%ebp),%esi Which would _seem_ to be part of this loop: /* Always include a trailing , so userspace can differentiate between this and the old multi-field proc format. */ list_for_each_entry(use, &mod->modules_which_use_me, list) { printed_something = 1; seq_printf(m, "%s,", use->module_which_uses->name); } in print_unload_info(). Does the problem go away if you disable CONFIG_MODULE_UNLOAD? Also, is the oops always in the same spot? This looks like something has seriously corrupted some module list information, which in turn sounds like a memory corruption issue (and then it's obviously very interesting if it's always just about ip6tables!)
> Does the problem go away if you disable CONFIG_MODULE_UNLOAD? Yes, it seems to go away. I found no problems in 20 reboots with CONFIG_MODULE_UNLOAD unset, while with CONFIG_MODULE_UNLOAD=y the problem occured again at the fifth boot. Maybe 50% is a bit exaggerated. > Also, is the oops always in the same spot? Yes, as far as I observed it. The modules, that are normally loaded but missing above are: af_packet, bridge, drm, ip6table_filter, ip6_tables, iptable_filter, iptable_nat, ip_tables, ipt_LOG, ipt_REJECT, llc, nf_conntrack, nf_conntrack_ipv4, nf_defrag_ipv4, nf_nat, radeon, stp, ts_bm, x_tables, xt_mac, xt_multiport, xt_state, xt_string, xt_tcpudp I attached another oops trace (I don't know, if it helps). If I can provide any other useful information, let me know. Thank you! [ 43.368473] BUG: unable to handle kernel paging request at 000cba0b [ 43.368654] IP: [<c0152d0c>] m_show+0x9c/0x1a0 [ 43.368787] *pde = 00000000 [ 43.368911] Oops: 0000 [#1] [ 43.369036] last sysfs file: /sys/block/dm-0/size [ 43.369120] Modules linked in: snd_opl3_synth snd_seq_midi_emul snd_seq_midi snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss fuse tun asb100 hwmon_vid iptable_mangle ip_tables x_tables nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_ftp nf_conntrack cryptoloop 8139cp tuner_simple tuner_types tuner arc4 cx8800 rt61pci rt2x00pci cx88xx rt2x00lib ir_common led_class input_polldev i2c_algo_bit snd_cmipci tveeprom mac80211 v4l2_common snd_pcm usbhid videodev v4l1_compat snd_page_alloc videobuf_dma_sg snd_opl3_lib videobuf_core eeprom_93cx6 skge snd_timer btcx_risc snd_hwdep forcedeth snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device i2c_nforce2 snd soundcore floppy nvidia_agp agpgart usblp [ 43.372320] [ 43.372320] Pid: 5993, comm: modprobe Not tainted (2.6.30-rc7 #1) A7N8X-E [ 43.372320] EIP: 0060:[<c0152d0c>] EFLAGS: 00010282 CPU: 0 [ 43.372320] EIP is at m_show+0x9c/0x1a0 [ 43.372320] EAX: 00000000 EBX: 000cba0b ECX: 00000000 EDX: 00000000 [ 43.372320] ESI: f819b988 EDI: f55517c0 EBP: f819b880 ESP: f5621ed4 [ 43.372320] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 [ 43.372320] Process modprobe (pid: 5993, ti=f5621000 task=f5630350 task.ti=f5621000) [ 43.372320] Stack: [ 43.372320] f55517c0 c05eccc5 f83a1b4c 000049a0 f819b884 6b2d2eea c055d966 6b2d2eea [ 43.372320] 00000000 f55517c0 f819b884 00000040 c01af4d6 f5621f90 000003dc b7f52024 [ 43.372320] f5628880 f55517e8 00000024 0000002e 00000000 0000002d 00000000 6b2d2eea [ 43.372320] Call Trace: [ 43.372320] [<c055d966>] ? mutex_lock+0x16/0x40 [ 43.372320] [<c01af4d6>] ? seq_read+0x236/0x3a0 [ 43.372320] [<c01af2a0>] ? seq_read+0x0/0x3a0 [ 43.372320] [<c01d2cac>] ? proc_reg_read+0x5c/0x90 [ 43.372320] [<c0194527>] ? vfs_read+0xc7/0x170 [ 43.372320] [<c01d2c50>] ? proc_reg_read+0x0/0x90 [ 43.372320] [<c0194700>] ? sys_read+0x60/0xc0 [ 43.372320] [<c010340f>] ? sysenter_do_call+0x12/0x26 [ 43.372320] Code: 31 c0 81 c6 04 01 00 00 39 f3 74 2e 8d 76 00 8b 43 08 89 3c 24 c7 44 24 04 c5 cc 5e c0 83 c0 0c 89 44 24 08 e8 b6 c3 05 00 8b 1b <8b> 03 0f 18 00 90 39 de 75 da b8 01 00 00 00 8b b5 bc 00 00 00 [ 43.372320] EIP: [<c0152d0c>] m_show+0x9c/0x1a0 SS:ESP 0068:f5621ed4 [ 43.372320] CR2: 00000000000cba0b [ 43.378955] ---[ end trace 6d4e1d01893fe12a ]---
On Tue, 26 May 2009, bugzilla-daemon@bugzilla.kernel.org wrote: > > Yes, it seems to go away. I found no problems in 20 reboots with > CONFIG_MODULE_UNLOAD unset, while with CONFIG_MODULE_UNLOAD=y the problem > occured again at the fifth boot. Maybe 50% is a bit exaggerated. Ok. The more I look, the less likely this looks like anything but a module loading bug. The fact that it is so consistent for you means that it's almost certainly not some random memory corruption by somebody else that corrupts the module lists. And if it isn't somebody else, then it's the module loading (or unloading) itself that corrupts them. And it's entirely possible that it's not unloading per se that matters: CONFIG_MODULE_UNLOAD=y doesn't just mean that we allow unloading, it also ends up causing a lot of other code to be accessed, particularly in the path you oops in. 'm_show()' is an almost trivial function if unloading isn't supported, while with CONFIG_MODULE_UNLOAD=y it does a lot more following of module pointers (which is why I asked you to test it). My personal gut feel (which I admittedly have absolutely nothing to back up with), is that it's either: - commit e91defa26c527ceeaff6266c55cdc7e17c9081a2: "module: don't use stop_machine on module load", which changes loading a lot. - fbf59bc9d74d1fb30b8e0630743aff2806eafcea: "percpu: implement new dynamic percpu allocator", since it could easily be the percpu allocations too (and in particular the unloading), and that may well explain why it happens with certain modules in particular. - or possibly it's just a timing difference (perhaps even brought on by totally unrelated changes, or the async stuff), and it goes back to earlier than 2.6.29, and was just exposed recently. You probably have something touching /proc/modules at just the same time something else happens, and it's quite possible that it's very timing-dependent and that would explain why you see it but others don't. Anyway, it really looks like a bug in handling "modules_which_use_me", and none of that code itself has really changed since 2.6.29 afaik. Which is why my primary suspect would be that locking change/synchronization thing (if this was some unrelated timing difference that wasn't about locking, I'd have expected it to have happened before - at least for somebody else). Hmm. Rusty - what protects that 'modules_which_use_me' list? It looks like it's always protected module_mutex, but we have some things like 'use_module()' that are exported - why? If somebody were to use that without holding the lock, they'd be buggy. Patrick, do you have any external modules built? I'm not seeing any in your list, but I'm double-checking.. Linus
> Patrick, do you have any external modules built? I had the Virtualbox modules "vboxdrv" and "vboxnetflt" built for 2.6.30-rc6, but none for 2.6.30-rc7.
The module_mutex is also exported: it's for the out-of-tree ksplice guys to play with module usage. That's not the cuplrit here. But I don't expect modules to be unloaded during boot. Let's make sure that is what is happening. Could you apply this by hand or see attachment, then report what's in the logs before the oops? Thanks! Rusty. diff --git a/kernel/module.c b/kernel/module.c --- a/kernel/module.c +++ b/kernel/module.c @@ -784,6 +784,8 @@ SYSCALL_DEFINE2(delete_module, const cha return -EFAULT; name[MODULE_NAME_LEN-1] = '\0'; + printk("delete_module: %s\n", name); + /* Create stop_machine threads since free_module relies on * a non-failing stop_machine call. */ ret = stop_machine_create(); @@ -2295,6 +2297,7 @@ static noinline struct module *load_modu return mod; unlink: + printk("%s: unlink\n", mod->name); /* Unlink carefully: kallsyms could be walking list. */ list_del_rcu(&mod->list); synchronize_sched(); @@ -2368,6 +2371,7 @@ SYSCALL_DEFINE3(init_module, void __user blocking_notifier_call_chain(&module_notify_list, MODULE_STATE_GOING, mod); mutex_lock(&module_mutex); + printk("Freeing failed init module %s\n", mod->name); free_module(mod); mutex_unlock(&module_mutex); wake_up(&module_wq);
Created attachment 21576 [details] Diagnostic printing patch.
On Wed, 27 May 2009, bugzilla-daemon@bugzilla.kernel.org wrote: > > But I don't expect modules to be unloaded during boot. Let's make sure that > is > what is happening. Could you apply this by hand or see attachment, then > report > what's in the logs before the oops? Rusty, look more closely. The reason CONFIG_MODULE_UNLOAD is relevant is not because modules necessarily get unloaded, it's because it enables code that traverses the "modules_which_use_me" list in "print_unload_info()". If MODULE_UNLOAD is not enabled, that code doesn't exist at all, and it seems to be that list that is not protected. It could easily be "not protected" over a concurrent module _load_, rather than an unload. So it's not really all that interesting whether an unload event happens or not. That's not necessarily (or even likely) what triggers the problem. Linus
> (In reply to comment #7) > Rusty, look more closely. > > The reason CONFIG_MODULE_UNLOAD is relevant is not because modules > necessarily get unloaded, it's because it enables code that traverses the > "modules_which_use_me" list in "print_unload_info()". > > If MODULE_UNLOAD is not enabled, that code doesn't exist at all, and it > seems to be that list that is not protected. > > It could easily be "not protected" over a concurrent module _load_, rather > than an unload. Except it is protected! Even if it weren't, I can't see a racy add likely to cause this kind of oops. Since I have no idea, I want to look harder at the corner cases; where we remove from the list because of module unload or load failure (the latter might explain why Patrick sees this and noone else does). Cheers, Rusty.
Note that in both cases we have > [ 374.120004] last sysfs file: /sys/block/dm-0/size so there may be a problem in the sysfs read-side handler for that sysfs file. (This is why I added that "last sysfs file" thing originally - the backtrace doesn't tell us which file is being read from)
> Could you apply this by hand or see attachment, then report what's in the logs before the oops? Done, but none of the three messages appears, not at a normal boot and neither, when the oops occurs.
Created attachment 21593 [details] complete dmesg output
OK, my guess is that someone is doing use-after-free or overrunning the end of their alloc and hitting our little "use" alloc. One more patch should eliminate module.c as the culprit. Hopefully won't perturb things enough to remove the bug. (BTW, this will deliberately crash if a module load fails or is removed, but your boot should be fine). Thanks, Rusty.
Created attachment 21598 [details] YA diagnostic patch; if it triggers, implies not module.c error.
Handled-By : Rusty Russell <rusty@rustcorp.com.au>
After up to 50 retries I was not able to reproduce the oops with the patch applied.
On Mon, 1 Jun 2009, bugzilla-daemon@bugzilla.kernel.org wrote: > > After up to 50 retries I was not able to reproduce the oops with the patch > applied. Grr. Either it's a subtle timing bug, and the patch just ends up hiding it, or it's a compiler bug, and the patch ends up hiding it due to different code generation. Bt your dmesg says "gcc version 4.3.2 (Gentoo 4.3.2-r3 p1.6, pie-10.1.5)", which doesn't sound bad. Neither bleeding-edge nor known-bad compiler, afaik. I have no ideas. Maybe you can delete parts of Rusty's patch until you can make it trigger again, and then we can see exactly _which_ change makes the oops no longer happen. Linus
Two additional ideas: 1) Add a BUG() at the bottom of dump_use_list(); if only the magic is corrupted it won't crash with that patch. 2) Remove the wmb() in use_module(). Does it trigger now? But basically, I'm as confused as Linus :( Rusty.
Created attachment 21793 [details] Actual used modification to module.c, see below
Hello again, after some days with very few time I experimented again a bit with 2.6.30-rc7 and modifications of module.c. In difference to the "YA diagnostic patch" there is - a BUG() at the bottom of dump_use_list() - no modification of already_uses() - a pushed up test for a bad magic in print_unload_info() (see Comment #18 above) The BUG() call in dump_use_list() triggered and dmesg contained the following. The bad value in "magic" is the same as it was each time in the "next" pointer, whose place it took (?). In normal operation there are no more than the mentioned three modules using "videobuf_core". [ 128.050410] print_unload_info: videobuf_core: bad magic in use list f8161988: next=f6a532e4 prev=f6976234 [ 128.050416] f6a532e4: magic cba0b next = f6976fe4 prev = f8161988 user=f834cb40 (cx8800) [ 128.050420] f6976fe4: magic badc0ff next = f6976234 prev = f6a532e4 user=f830ce40 (cx88xx) [ 128.050424] f6976234: magic badc0ff next = f8161988 prev = f6976fe4 user=f818a260 (videobuf_dma_sg) [ 128.050440] ------------[ cut here ]------------ [ 128.050531] kernel BUG at kernel/module.c:638! [ 128.050616] invalid opcode: 0000 [#1] [ 128.050742] last sysfs file: /sys/block/sr1/capability [ 128.050827] Modules linked in: snd_opl3_synth snd_seq_midi_emul snd_seq_midi snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq sn d_pcm_oss snd_mixer_oss fuse tun asb100 hwmon_vid iptable_mangle ip_tables x_tables nf_conntrack_netbios_ns nf_conntrack_irc nf_conntra ck_ftp nf_conntrack cryptoloop 8139cp tuner_simple tuner_types tuner arc4 cx8800 rt61pci rt2x00pci rt2x00lib cx88xx led_class ir_common input_polldev mac80211 i2c_algo_bit tveeprom snd_cmipci v4l2_common videodev usbhid snd_pcm v4l1_compat videobuf_dma_sg snd_page_alloc snd_opl3_lib videobuf_core eeprom_93cx6 btcx_risc skge snd_timer snd_hwdep snd_mpu401 forcedeth snd_mpu401_uart i2c_nforce2 snd_rawmid i snd_seq_device nvidia_agp agpgart snd soundcore floppy usblp [ 128.054065] [ 128.054147] Pid: 5964, comm: modprobe Not tainted (2.6.30-rc7bugtest #24) A7N8X-E [ 128.054266] EIP: 0060:[<c0151d5b>] EFLAGS: 00010246 CPU: 0 [ 128.054266] EIP is at m_show+0x1fb/0x250 [ 128.054266] EAX: f8161988 EBX: f8161984 ECX: f6a532e4 EDX: f8161988 [ 128.054266] ESI: f54b43c0 EDI: f8161988 EBP: f8161880 ESP: f5696ec8 [ 128.054266] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 [ 128.054266] Process modprobe (pid: 5964, ti=f5696000 task=f55ab1b0 task.ti=f5696000) [ 128.054266] Stack: [ 128.054266] c060a6f0 f818a26c 0badc0ff f8161988 f6976fe4 f818a260 f8161884 f816188c [ 128.054266] 36933e7f c0557006 36933e7f 00000000 f54b43c0 f8161884 00000040 c01adf56 [ 128.054266] f5696f90 000003dc b7fee024 f7353480 f54b43e8 00000024 0000002e 00000000 [ 128.054266] Call Trace: [ 128.054266] [<c0557006>] ? mutex_lock+0x16/0x40 [ 128.054266] [<c01adf56>] ? seq_read+0x236/0x3a0 [ 128.054266] [<c01add20>] ? seq_read+0x0/0x3a0 [ 128.054266] [<c01d17bc>] ? proc_reg_read+0x5c/0x90 [ 128.054266] [<c0193707>] ? vfs_read+0xc7/0x170 [ 128.054266] [<c01d1760>] ? proc_reg_read+0x0/0x90 [ 128.054266] [<c01938e0>] ? sys_read+0x60/0xc0 [ 128.054266] [<c010310f>] ? sysenter_do_call+0x12/0x26 [ 128.054266] Code: c0 89 44 24 14 89 54 24 08 c7 04 24 2c 61 5e c0 e8 db 3f 40 00 8b 85 08 01 00 00 8d 58 fc 8b 4b 04 0f 18 01 90 39 c7 89 c2 75 04 <0f> 0b eb fe 8b 43 0c 89 44 24 14 8b 43 08 89 4c 24 0c 89 44 24 [ 128.054266] EIP: [<c0151d5b>] m_show+0x1fb/0x250 SS:ESP 0068:f5696ec8 [ 128.060624] ---[ end trace fce10aafebf4c8a9 ]---
On Sun, 7 Jun 2009, bugzilla-daemon@bugzilla.kernel.org wrote: > > The BUG() call in dump_use_list() triggered and dmesg contained the > following. > The bad value in "magic" is the same as it was each time in the "next" > pointer, > whose place it took (?). Ok, that's definitely a smoking gun. Do you have CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG enabled? Oh, and please either do CONFIG_SLUB_DEBUG_ON=y, or boot with the "slubdebug" kernel command line (you need to have CONFIG_SLUB_DEBUG enabled for this to do anything). It does sound like something else is corrupting that first word. But that cba0b value doesn't trigger any responses in my brain. Linus
Hello, after many tests with rc7 and the released 2.6.30, the error did not occur with these settings. I'll keep trying and report any new finding... Patrick
Hello, the Oops occured at least with a 2.6.30.4. It is very strange, as the occurence varied extremely in frequency. However, I got a new computer. It is not likely, that I will be able to reproduce it again in the next time. I am sorry. Maybe the bug should be closed? Patrick
> BUG: unable to handle kernel paging request at 000??a0b > EIP is at m_show() this appears to be pretty common signature according to kerneloops.org
> [last unloaded: scsi_wait_scan]
There are at least two corruption modes: 1. BUG: unable to handle kernel paging request at 000XYa0b 2. BUG: unable to handle kernel paging request at 000XYa1f where X is c or d.
List of modules which were loaded from all traces: snd snd_page_alloc snd_pcm snd_timer soundcore
Modules which were loaded with 000??a0b _and_ 000??a1f corruptions: arc4 crc_itu_t ecb input_polldev rt2x00lib snd snd_page_alloc snd_pcm snd_timer soundcore
*** Bug 13975 has been marked as a duplicate of this bug. ***