Most recent kernel where this bug did not occur:
Distribution: suse 10.1
Hardware Environment: abit KN8 sli
Software Environment:
Problem Description: system crashes
Steps to reproduce: use the system

kernel BUG at include/linux/mm.h:296!
invalid opcode: 0000 [#1]
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat ipt_LOG xt_limit xt_pkttype af_packet edd snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod snd_intel8x0 ehci_hcd snd_ac97_codec snd_ac97_bus i2c_nforce2 snd_pcm snd_timer snd ohci_hcd soundcore usbcore i2c_core snd_page_alloc ide_cd cdrom ohci1394 ieee1394 forcedeth ext3 jbd processor sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core
CPU: 0
EIP: 0060:[<c013c778>] Not tainted VLI
EFLAGS: 00013246 (2.6.17.11-default #1)
EIP is at __free_pages+0xc/0x34
eax: 00000000 ebx: 00000000 ecx: c16af540 edx: 00000000
esi: f6d27bdc edi: f57aa000 ebp: f6d27b58 esp: f6d27b58
ds: 007b es: 007b ss: 0068
Process X (pid: 3552, threadinfo=f6d26000 task=dfafea70)
Stack: f6d27b60 c013c7cd f6d27b78 c01631e3 00000014 00000001 f65d3734 00000023
       f6d27e2c c0163a9f f6d27e4c dfbc226c 00000001 f6d27fa0 f6d27f4c 00000000
       00000023 f6d27eb4 f6d27ed4 f6d27ef4 f6d27e54 f6d27e74 f6d27e94 00000007
Call Trace:
 <c0103ba3> show_stack_log_lvl+0x85/0x8f
 <c0103d04> show_registers+0x11f/0x18b
 <c0103ec1> die+0x151/0x261
 <c0287c52> do_trap+0x7c/0x96
 <c01045a9> do_invalid_op+0x89/0x93
 <c0103693> error_code+0x4f/0x54
 <c013c7cd> free_pages+0x2d/0x2f
 <c01631e3> poll_freewait+0x50/0x5c
 <c0163a9f> do_select+0x4a3/0x4c6
 <c0163d8a> core_sys_select+0x2c8/0x2ed
 <c01642fd> sys_select+0x93/0x166
 <c0102afb> sysenter_past_esp+0x54/0x75
Code: 18 89 4a 04 ff 75 e8 89 55 ec ba 01 00 00 00 e8 60 fb ff ff 53 9d 5e 8d 65 f4 5b 5e 5f 5d c3 55 89 c1 8b 40 04 89 e5 85 c0 75 08 <0f> 0b 28 01 41 6c 2a c0 ff 49 04 0f 94 c0 84 c0 74 14 85 d2 75
EIP: [<c013c778>] __free_pages+0xc/0x34 SS:ESP 0068:f6d27b58

<1>BUG: unable to handle kernel paging request at virtual address 0020020c
printing eip: c014f450
*pde = 76564067
Oops: 0000 [#2]
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat ipt_LOG xt_limit xt_pkttype af_packet edd snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod snd_intel8x0 ehci_hcd snd_ac97_codec snd_ac97_bus i2c_nforce2 snd_pcm snd_timer snd ohci_hcd soundcore usbcore i2c_core snd_page_alloc ide_cd cdrom ohci1394 ieee1394 forcedeth ext3 jbd processor sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core
CPU: 0
EIP: 0060:[<c014f450>] Not tainted VLI
EFLAGS: 00013246 (2.6.17.11-default #1)
EIP is at cache_alloc_debugcheck_after+0xe6/0x127
eax: 80000000 ebx: f78e0d34 ecx: 00200200 edx: 00000000
esi: dffff980 edi: f78e0d34 ebp: f6d278fc esp: f6d278e8
ds: 007b es: 007b ss: 0068
Process X (pid: 3552, threadinfo=f6d26000 task=dfafea70)
Stack: c0114957 00000020 dffff980 00000020 00003246 f6d27918 c015024e f88dbbe5
       f88dbbe5 f7a4fd88 00000680 f7a4fd88 f6d2793c c0235195 00000000 00000020
       00000724 dfff6500 c2194000 dfa262c0 0000003d f6d27958 f88dbbe5 dfa26000
Call Trace:
 <c0103ba3> show_stack_log_lvl+0x85/0x8f
 <c0103d04> show_registers+0x11f/0x18b
 <c0103ec1> die+0x151/0x261
 <c0288a5d> do_page_fault+0x41b/0x57e
 <c0103693> error_code+0x4f/0x54
 <c015024e> __kmalloc_track_caller+0xaa/0xb7
 <c0235195> __alloc_skb+0x52/0xfa
 <f88dbbe5> nv_alloc_rx+0x48/0x147 [forcedeth]
 <f88dc134> nv_nic_irq+0x84/0x191 [forcedeth]
 <c013770b> handle_IRQ_event+0x27/0x52
 <c01377b0> __do_IRQ+0x7a/0xce
 <c0104b09> do_IRQ+0x49/0x5c
 <c01035a6> common_interrupt+0x1a/0x20
 <c0103f1a> die+0x1aa/0x261
 <c0287c52> do_trap+0x7c/0x96
 <c01045a9> do_invalid_op+0x89/0x93
 <c0103693> error_code+0x4f/0x54
 <c013c7cd> free_pages+0x2d/0x2f
 <c01631e3> poll_freewait+0x50/0x5c
 <c0163a9f> do_select+0x4a3/0x4c6
 <c0163d8a> core_sys_select+0x2c8/0x2ed
 <c01642fd> sys_select+0x93/0x166
 <c0102afb> sysenter_past_esp+0x54/0x75
Code: ff ff c7 00 a5 c2 0f 17 8d 97 00 00 00 40 c1 ea 0c c1 e2 05 03 15 50 65 3e c0 8b 02 f6 c4 40 74 03 8b 52 0c 8b 4a 1c 89 fb 31 d2 <2b> 59 0c 89 d8 f7 76 10 c7 44 81 1c fd ff ff ff 8b 5e 3c 03 be
EIP: [<c014f450>] cache_alloc_debugcheck_after+0xe6/0x127 SS:ESP 0068:f6d278e8
<0>Kernel panic - not syncing: Fatal exception in interrupt
root:~>/usr/src/linux-2.6.17.11/scripts/ver_linux If some fields are empty or look unusual you may have an old version. Compare to the current minimal requirements in Documentation/Changes. Linux c-69-137-115-121 2.6.17.11-default #1 Mon Aug 28 01:21:52 CDT 2006 i686 athlon i386 GNU/Linux Gnu C 4.1.0 Gnu make 3.80 binutils 2.16.91.0.5 util-linux 2.12r mount 2.12r module-init-tools 3.2.2 e2fsprogs 1.38 jfsutils 1.1.10 reiserfsprogs 3.6.19 xfsprogs 2.7.11 quota-tools 3.13. PPP 2.4.3 isdn4k-utils 3.9 nfs-utils 1.0.7 Linux C Library > libc.2.4 Dynamic linker (ldd) 2.4 Linux C++ Library 6.0.8 Procps 3.2.6 Net-tools 1.60 Kbd 1.12 Sh-utils 5.93 udev 085 Modules Loaded nls_iso8859_1 nls_cp437 vfat fat ipt_LOG xt_limit xt_pkttype af_packet edd snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd ehci_hcd soundcore snd_page_alloc ohci_hcd usbcore i2c_nforce2 i2c_core ide_cd cdrom ohci1394 ieee1394 forcedeth ext3 jbd processor sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core root:~>
root:~>cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 43 model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping : 1 cpu MHz : 2009.490 cache size : 512 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm cmp_legacy ts fid vid ttp bogomips : 4022.72
root:~>cat /proc/ioports 0000-001f : dma1 0020-0021 : pic1 0040-0043 : timer0 0050-0053 : timer1 0060-006f : keyboard 0070-0077 : rtc 0080-008f : dma page reg 00a0-00a1 : pic2 00c0-00df : dma2 00f0-00ff : fpu 0170-0177 : ide1 01f0-01f7 : ide0 0376-0376 : ide1 03c0-03df : vga+ 03f6-03f6 : ide0 0970-0977 : 0000:00:07.0 0970-0977 : sata_nv 09f0-09f7 : 0000:00:07.0 09f0-09f7 : sata_nv 0b70-0b73 : 0000:00:07.0 0b70-0b73 : sata_nv 0bf0-0bf3 : 0000:00:07.0 0bf0-0bf3 : sata_nv 1c00-1c3f : 0000:00:01.1 1c00-1c07 : nForce2_smbus 1c40-1c7f : 0000:00:01.1 1c40-1c47 : nForce2_smbus 7000-7fff : PCI Bus #05 8000-8fff : PCI Bus #04 9000-9fff : PCI Bus #03 a000-afff : PCI Bus #02 b000-bfff : PCI Bus #01 b400-b407 : 0000:01:0a.0 b800-b807 : 0000:01:0a.0 bc00-bc1f : 0000:01:0a.0 bc00-bc07 : serial bc08-bc0f : serial c800-c807 : 0000:00:0a.0 c800-c807 : forcedeth cc00-cc0f : 0000:00:07.0 cc00-cc0f : sata_nv e000-e00f : 0000:00:06.0 e000-e007 : ide0 e008-e00f : ide1 ec00-ecff : 0000:00:04.0 ec00-ecff : NVidia CK804 f000-f0ff : 0000:00:04.0 f000-f0ff : NVidia CK804 fc00-fc1f : 0000:00:01.1 root:~>cat /proc/iomem 00000000-0009efff : System RAM 00000000-00000000 : Crash kernel 0009f000-0009ffff : reserved 000a0000-000bffff : Video RAM area 000c0000-000cedff : Video ROM 000f0000-000fffff : System ROM 00100000-7fedffff : System RAM 00100000-00289fbf : Kernel code 00289fc0-0037cd6f : Kernel data 7fee0000-7fee2fff : ACPI Non-volatile Storage 7fee3000-7feeffff : ACPI Tables 7fef0000-7fefffff : reserved d0000000-dfffffff : PCI Bus #05 d0000000-dfffffff : 0000:05:00.0 e0000000-efffffff : reserved fa000000-fcffffff : PCI Bus #05 fa000000-faffffff : 0000:05:00.0 fb000000-fbffffff : 0000:05:00.0 fc000000-fc01ffff : 0000:05:00.0 fd800000-fd8fffff : PCI Bus #04 fd900000-fd9fffff : PCI Bus #04 fda00000-fdafffff : PCI Bus #03 fdb00000-fdbfffff : PCI Bus #03 fdc00000-fdcfffff : PCI Bus #02 fdd00000-fddfffff : PCI Bus #02 fde00000-fdefffff : PCI Bus #01 fdef8000-fdefbfff : 0000:01:06.0 fdeff000-fdeff7ff 
: 0000:01:06.0 fdeff000-fdeff7ff : ohci1394 fdf00000-fdffffff : PCI Bus #01 fe02a000-fe02afff : 0000:00:0a.0 fe02a000-fe02afff : forcedeth fe02b000-fe02bfff : 0000:00:07.0 fe02b000-fe02bfff : sata_nv fe02d000-fe02dfff : 0000:00:04.0 fe02d000-fe02dfff : NVidia CK804 fe02f000-fe02ffff : 0000:00:02.0 fe02f000-fe02ffff : ohci_hcd feb00000-feb000ff : 0000:00:02.1 feb00000-feb000ff : ehci_hcd fec00000-ffffffff : reserved root:~>
Created attachment 8887 [details] System map
Created attachment 8888 [details] lspci -vvv
root:~>cat /proc/scsi/scsi Attached devices: Host: scsi0 Channel: 00 Id: 00 Lun: 00 Vendor: ATA Model: ST3160812AS Rev: 3.AA Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 00 Vendor: ATA Model: ST3160812AS Rev: 3.AA Type: Direct-Access ANSI SCSI revision: 05 root:~>
Begin forwarded message:

Date: Mon, 28 Aug 2006 12:59:12 -0700 (PDT)
From: Linus Torvalds <torvalds@osdl.org>
To: Andrew Morton <akpm@osdl.org>
Subject: Re: Fw: [Bug 7066] New: __free_pages+0xc/0x34

On Mon, 28 Aug 2006, Andrew Morton wrote:
>
> This is another in the ongoing once-a-month dribble of weird crashes coming
> out of the poll() code. This time it looks like a free of a freed page.
>
> There's enough noise coming out of that area to make me think we have some
> subtle bug in there. Or maybe it's just that random driver
> lifetime-management bugs will sometimes manifest in a way which makes it
> look like the poll() code is to blame, dunno.

I really don't think the poll/select code is to blame - all the memory management there is really really simple, and it's all totally thread-private, so there aren't even any race conditions or locking issues. The "memory management" is a poll_wqueues entry allocated on the thread kernel stack, and a simple linked list of pages.

I suspect things show up in the poll() code not because the poll code itself is buggy, but because when you run X, poll() is one of the most common system calls, and of the common system calls it's the only one that does any real amount of page allocation (ie the other ones are gettimeofday() and things like sendmsg()/recvmsg(), and the former one obviously doesn't do any allocations at all, and the latter ones hide all their allocations through the slab layer).

In other words, if something corrupts the free-page list, poll() just ends up being the most likely thing to ever notice.

The part that may give more of a clue is the second oops, which gets a page fault on this part:

    #ifdef CONFIG_DEBUG_SLAB_LEAK
        {
            struct slab *slabp;
            unsigned objnr;

            slabp = page_get_slab(virt_to_page(objp));
            objnr = (unsigned)(objp - slabp->s_mem) / cachep->buffer_size;
            slab_bufctl(slabp)[objnr] = BUFCTL_ACTIVE;
        }
    #endif

where if I read the oops right, it's the "objp - slabp->s_mem" that crashes because "slabp" is 0x00200200, ie the list poisoning thing.

But slabp was gotten through "page_get_slab()", which in turn is page->lru.prev. And notice how we did NOT trigger the

    BUG_ON(!PageSlab(page));

in page_get_slab().

That looks strange and interesting. Somebody seems to have done a list_del(&page->lru.prev) on a page that is still marked as being a slab page.

And yeah, I bet it's related to the first oops somehow, but the thing is, that code-sequence should have _nothing_ to do with slab pages. And besides, if that sequence does a list_del(), I'd have expected it to do so _after_ it does the put_page_testzero() anyway (but I didn't go check).

        Linus
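To make the "list poisoning" reading of the second oops concrete, here is a small userspace sketch, not kernel code. It mimics what list_del() leaves behind in a list entry and shows how a poisoned `prev` pointer, treated as a `struct slab *`, leads to the faulting address 0020020c reported above. The poison constants are the ones the kernel's list code used on 32-bit builds of that era. The `slab32` layout (a list head, then `colouroff`, then `s_mem`) is an assumption about the 2.6.17 slab allocator, modelled with 32-bit fields, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Poison values written by list_del() on 32-bit kernels of that era. */
#define LIST_POISON1 ((struct list_head *)0x00100100)
#define LIST_POISON2 ((struct list_head *)0x00200200)

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink the entry, then poison its pointers so stale users fault. */
void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

/* ASSUMED 32-bit layout of the start of struct slab (illustration only). */
struct slab32 {
    uint32_t list_next;
    uint32_t list_prev;
    uint32_t colouroff;
    uint32_t s_mem;
};

/* Address the CPU touches when it loads slabp->s_mem. */
uint32_t s_mem_load_address(uint32_t slabp)
{
    return slabp + (uint32_t)offsetof(struct slab32, s_mem);
}

/* Hypothetical helper: what page->lru.prev holds after a list_del(). */
uint32_t poisoned_prev_after_del(void)
{
    struct list_head head, page_lru;

    list_init(&head);
    list_add(&page_lru, &head);
    list_del(&page_lru);
    return (uint32_t)(uintptr_t)page_lru.prev;
}
```

Under these assumptions, `s_mem_load_address(poisoned_prev_after_del())` gives 0x0020020c, which matches both the faulting address and the `ecx: 00200200` register value in the second oops.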
Don, can you confirm whether you can reproduce this bug at will or not? Has it happened only once, or can you basically depend on it happening under certain loads? Linus
I had been having problems with earlier kernel versions, which were tainted. At that time I decided to try the latest kernel version, linux-2.6.17.1, which was also tainted. I got more crashes on tainted linux-2.6.17.1 than I did on tainted linux-2.6.16.21-0.13. I have not been using untainted linux-2.6.17.11 very long; I just compiled it this morning. It seems to be more stable than tainted linux-2.6.17.1 so far. I had problems with the clock not keeping the correct time in tainted smp versions of suse linux-2.6.16.21-0.13, so I switched to a default non-smp config. With this untainted kernel, the one I submitted my bug report on, my screensaver does not change. Today I did a BIOS upgrade for my motherboard, and that might fix some problems. I am just waiting for the next crash on this kernel. I have a serial cable to another linux box and it will hopefully capture it. If it is like tainted linux-2.6.17.1, I shouldn't have to wait very long.
Created attachment 8891 [details] my .config file
It has been over 12 hours and this hasn't crashed. It seems to be more stable. I suspect that those smp changes made in earlier versions may have caused that. I am expecting lightning storms and need to shut down my computer. I will see if I can crash this system using my kmid suse bug like the other kernels. That may be an audio driver issue though. If I compile a smp version and it crashes, would that apply to this bug report or should I submit a separate one?
Yes I can crash it with kmid. I am not sure if it is related to this bug though. Here is what I captured:

BUG: spinlock recursion on CPU#0, kmid/10249
 lock: c0342198, .magic: dead4ead, .owner: kmid/10249, .owner_cpu: 0
BUG: spinlock lockup on CPU#0, kmid/10249, c0342198
I am back from vacation. I have been running on this kernel for over 3 days now and it hasn't crashed. I will try to load it more or try a smp version on this dual core processor.
Created attachment 8979 [details]
2 new crashes in a log

I was just reading an email in thunderbird when it crashed. I was interacting with the mail window.
On Sat, 9 Sep 2006, bugme-daemon@bugzilla.kernel.org wrote:
>
> I was just reading an email in thunderbird when it crashed. I was interacting
> with the mail window.

Looks like the "sock->ops->sendmsg()" function pointer was corrupted (with a value of something like 0xa951937b, which doesn't look like any obvious string or other poisoning value, so there's no hint there).

The "sock->ops" values should normally be pointers to a compile-time static structure, so I suspect the "sock" structure itself got corrupted somehow. Which points to more slab corruption. Which makes this very hard to debug, since the corruption could have happened at any point.

Don, do you have slab debugging turned on in your tree? If you don't, turning it on might help us. It's "CONFIG_DEBUG_SLAB".

        Linus
I have built an smp version. I chose these options among others:

CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set
CONFIG_OBSOLETE_INTERMODULE=m

If you would rather I use a non-smp version, let me know. I have had more crashes with smp kernels, though. Others with the same motherboard have said that they do not have the problems that I have had. Could the problem be my add-in cards or their drivers, which others may not have?
I have built and am using a smp version. It hasn't crashed yet. However my clock on the kde panel is running fast.
I had attached my kernel config file and I see that for the crash reports the slab debug option was set.
Created attachment 9028 [details]
strace of mount command

Not sure if this is related, but I cannot mount my other file system as an ext3 type. When I issue the command, it either locks up the terminal and does not respond, or it segfaults. I have attached an strace of a run where it just locks up. If I issue the same command with "ext2", it works.
This crashed using the smp kernel. I was running the following command: "cp -R --preserve -f -x -t /mnt /". I had a same-size partition mounted on /mnt and was backing up my current partition. I will attach the boot-up log and the information below. I will then attach my System.map for the smp kernel.

CPU: 1
EIP: 0060:[<c02974ba>] Not tainted VLI
EFLAGS: 00010086 (2.6.17.11-smp #1)
EIP is at schedule+0x88a/0xa9e
eax: c241ce00 ebx: 00000001 ecx: 39393909 edx: c241ce00
esi: f68de9f0 edi: 00000000 ebp: d9777da0 esp: d9777d38
ds: 007b es: 007b ss: 0068
Process pdflush (pid: 3966, threadinfo=d9776000 task=dfe13560)
Stack: e5592810 dfb16420 9775360d 00000190 d9777d5c 0000000a dfe13668 dfe13560
       df41da90 c241d760 97942cf2 00000190 001ef6e5 00000001 00000000 00000002
       f68c0bc4 0000000c 00000001 00000000 dfab0d74 00000000 d9777d98 c241d760
Call Trace:
 <c0104d2a> show_stack_log_lvl+0x85/0x8f
 <c0104ea7> show_registers+0x13b/0x1af
 <c0105090> die+0x175/0x285
 <c029a248> do_page_fault+0x428/0x58e
 <c010485b> error_code+0x4f/0x54
 <c0297884> io_schedule+0x26/0x30
 <c0161d97> sync_buffer+0x33/0x37
 <c0297fa0> __wait_on_bit+0x36/0x5d
 <c0298030> out_of_line_wait_on_bit+0x69/0x71
 <c0161cef> __wait_on_buffer+0x1f/0x25
 <c0162df5> __bread+0x92/0xa8
 <c019b931> ext2_get_inode+0x94/0xef
 <c019b9c5> ext2_update_inode+0x39/0x2b5
 <c019bc49> ext2_write_inode+0x8/0xa
 <c017df89> __writeback_single_inode+0x1ba/0x310
 <c017e367> sync_sb_inodes+0x1a2/0x268
 <c017e8a6> writeback_inodes+0x8e/0xe0
 <c0149111> background_writeout+0x5e/0x87
 <c0149763> pdflush+0xf2/0x189
 <c0130520> kthread+0xa3/0xd0
 <c0102005> kernel_thread_helper+0x5/0xb
Code: 8b 58 10 b8 80 28 3d c0 74 7f f0 0f b3 9e 78 01 00 00 89 c2 03 14 9d 80 ef 39 c0 c7 42 04 01 00 00 00 03 04 9d 80 ef 39 c0 89 08 <f0> 0f ab 99 78 01 00 00 8b 41 24 05 00 00 00 40 0f 22 d8 8b 81
EIP: [<c02974ba>] schedule+0x88a/0xa9e SS:ESP 0068:d9777d38
<0>BUG: spinlock recursion on CPU#1, pdflush/3966
BUG: spinlock lockup on CPU#0, swapper/0, c241d760
BUG: spinlock lockup on
CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0 BUG: spinlock lockup on CPU#0, swapper/0, c0346ec0 BUG: spinlock lockup on CPU#1, pdflush/3966, c0346ec0
Created attachment 9033 [details] bootup and log of the crash
Created attachment 9034 [details] system map for the smp kernel compile
Created attachment 9035 [details] my smp .config file
On Sun, 17 Sep 2006, bugme-daemon@bugzilla.kernel.org wrote:
>
> EIP: 0060:[<c02974ba>] Not tainted VLI
> EIP is at schedule+0x88a/0xa9e
> eax: c241ce00 ebx: 00000001 ecx: 39393909 edx: c241ce00
> esi: f68de9f0 edi: 00000000 ebp: d9777da0 esp: d9777d38
> Code: 8b 58 10 b8 80 28 3d c0 74 7f f0 0f b3 9e 78 01 00 00 89 c2 03 14 9d 80 ef
> 39 c0 c7 42 04 01 00 00 00 03 04 9d 80 ef 39 c0 89 08 <f0> 0f ab 99 78 01 00 00
> 8b 41 24 05 00 00 00 40 0f 22 d8 8b 81

That would be at the instruction sequence

    lock bts %ebx,%ds:0x178(%ecx)
    mov 0x24(%ecx),%eax
    add $0x40000000,%eax
    mov %eax,%cr3

which is the TLB invalidate, where "ecx" is 39393909, which looks like the string "\t999".

It _should_ be this code (from switch_mm() in asm-i386/mmu_context.h):

    cpu_set(cpu, next->cpu_vm_mask);

    /* Re-load page tables */
    load_cr3(next->pgd);

so it's "next->cpu_vm_mask" that is corrupt.

        Linus
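The "looks like a string" check on a corrupted register can be reproduced with a few lines of userspace C. The sketch below is only an illustration, and the function names are made up. It prints the four bytes of a 32-bit value in little-endian (x86 memory) order and escapes anything that is not printable ASCII. Applied to the ecx value 39393909 from this oops it yields the tab-plus-"999" text mentioned above. Applied to the 0xa951937b value from the earlier sock->ops crash it yields mostly non-printable bytes, which fits the remark that that value did not look like any obvious string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Render the bytes of a 32-bit register value in little-endian order,
 * i.e. the order they would have had in memory on x86. Printable ASCII
 * is copied through, tab becomes "\t", everything else becomes "\xNN".
 * Returns a pointer to a static buffer (not reentrant; fine for a demo).
 */
const char *reg_as_text(uint32_t reg)
{
    static char out[4 * 4 + 1];   /* worst case: four "\xNN" escapes */
    size_t pos = 0;

    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((reg >> (8 * i)) & 0xff);

        if (b == '\t')
            pos += (size_t)snprintf(out + pos, sizeof(out) - pos, "\\t");
        else if (b >= 0x20 && b < 0x7f)
            out[pos++] = (char)b;
        else
            pos += (size_t)snprintf(out + pos, sizeof(out) - pos,
                                    "\\x%02x", b);
    }
    out[pos] = '\0';
    return out;
}

/* Convenience check: does the register decode to the given text? */
int reg_looks_like(uint32_t reg, const char *text)
{
    return strcmp(reg_as_text(reg), text) == 0;
}
```

A value that decodes cleanly to text suggests that some string data was written over the structure, which is a different kind of hint from a list-poison or slab-poison pattern.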
I crashed this smp kernel again. I just ran the command dd if=/dev/sda2 of=/dev/sdb2. Here is what happened. BUG: spinlock lockup on CPU#0, klogd/2530, c241d760 I was planning on trying the latest kernel but needed to run this command first. My main partition is almost full.
I ran this same dd command with 2.6.18 and it did not crash.
2.6.18 still has a bug with kmid. this may be a different bug and may be kmids fault ======================================================= [ INFO: possible circular locking dependency detected ] ------------------------------------------------------- kmid/4918 is trying to acquire lock: (&timer->lock){++..}, at: [<f89cee75>] snd_timer_interrupt+0x21/0x241 [snd_timer] but task is already holding lock: (rtc_task_lock){+...}, at: [<c02275f6>] rtc_interrupt+0x96/0xe3 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtc_task_lock){+...}: [<c0136cde>] lock_acquire+0x60/0x80 [<c02b07cf>] _spin_lock_irqsave+0x22/0x32 [<c0227175>] rtc_control+0x2e/0x66 [<f8b1c087>] rtctimer_start+0x45/0x5a [snd_rtctimer] [<f89ce0e7>] snd_timer_start1+0x67/0x78 [snd_timer] [<f89cec7f>] snd_timer_start+0x54/0x82 [snd_timer] [<f8ad0d2c>] snd_seq_timer_start+0x33/0x4a [snd_seq] [<f8acfcdb>] snd_seq_control_queue+0xb6/0x13e [snd_seq] [<f8ad10d2>] event_input_timer+0xe/0x10 [snd_seq] [<f8acce4e>] snd_seq_deliver_single_event+0xdd/0x1cc [snd_seq] [<f8acd0b3>] snd_seq_deliver_event+0x176/0x184 [snd_seq] [<f8acd4e9>] snd_seq_dispatch_event+0x10f/0x127 [snd_seq] [<f8acf6ef>] snd_seq_check_queue+0x9d/0xd6 [snd_seq] [<f8acf9b0>] snd_seq_enqueue_event+0xce/0xdf [snd_seq] [<f8acd173>] snd_seq_client_enqueue_event+0xb2/0xdc [snd_seq] [<f8ace80e>] snd_seq_write+0x129/0x16c [snd_seq] [<c01694d3>] vfs_write+0xab/0x157 [<c0169b16>] sys_write+0x3b/0x60 [<c0103db9>] sysenter_past_esp+0x56/0x8d -> #0 (&timer->lock){++..}: [<c0136cde>] lock_acquire+0x60/0x80 [<c02b07cf>] _spin_lock_irqsave+0x22/0x32 [<f89cee75>] snd_timer_interrupt+0x21/0x241 [snd_timer] [<f8b1c0f0>] rtctimer_interrupt+0xd/0x11 [snd_rtctimer] [<c0227605>] rtc_interrupt+0xa5/0xe3 [<c014b428>] handle_IRQ_event+0x20/0x4d [<c014b4e9>] __do_IRQ+0x94/0xef [<c010627d>] do_IRQ+0x71/0x84 [<c01048d9>] common_interrupt+0x25/0x2c other info that might help us debug this: 1 lock held by kmid/4918: #0: 
(rtc_task_lock){+...}, at: [<c02275f6>] rtc_interrupt+0x96/0xe3 stack backtrace: [<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c013604c>] print_circular_bug_tail+0x59/0x64 [<c0136882>] __lock_acquire+0x82b/0x9b6 [<c0136cde>] lock_acquire+0x60/0x80 [<c02b07cf>] _spin_lock_irqsave+0x22/0x32 [<f89cee75>] snd_timer_interrupt+0x21/0x241 [snd_timer] [<f8b1c0f0>] rtctimer_interrupt+0xd/0x11 [snd_rtctimer] [<c0227605>] rtc_interrupt+0xa5/0xe3 [<c014b428>] handle_IRQ_event+0x20/0x4d [<c014b4e9>] __do_IRQ+0x94/0xef [<c010627d>] do_IRQ+0x71/0x84 [<c01048d9>] common_interrupt+0x25/0x2c DWARF2 unwinder stuck at common_interrupt+0x25/0x2c Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c013604c>] print_circular_bug_tail+0x59/0x64 [<c0136882>] __lock_acquire+0x82b/0x9b6 [<c0136cde>] lock_acquire+0x60/0x80 [<c02b07cf>] _spin_lock_irqsave+0x22/0x32 [<f89cee75>] snd_timer_interrupt+0x21/0x241 [snd_timer] [<f8b1c0f0>] rtctimer_interrupt+0xd/0x11 [snd_rtctimer] [<c0227605>] rtc_interrupt+0xa5/0xe3 [<c014b428>] handle_IRQ_event+0x20/0x4d [<c014b4e9>] __do_IRQ+0x94/0xef [<c010627d>] do_IRQ+0x71/0x84 [<c01048d9>] common_interrupt+0x25/0x2c BUG: spinlock lockup on CPU#0, kmid/4947, c0321338 [<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c01d82d4>] _raw_spin_lock+0xc2/0xe8 [<c02b07d6>] _spin_lock_irqsave+0x29/0x32 [<c0227175>] rtc_control+0x2e/0x66 [<f8b1c03e>] rtctimer_stop+0x3e/0x42 [snd_rtctimer] [<f89ceb71>] _snd_timer_stop+0xd0/0x137 [snd_timer] [<f89cebe7>] snd_timer_pause+0xf/0x11 [snd_timer] [<f8ad0a19>] snd_seq_timer_stop+0x20/0x26 [snd_seq] [<f8ad0d0f>] snd_seq_timer_start+0x16/0x4a [snd_seq] [<f8acfcdb>] snd_seq_control_queue+0xb6/0x13e [snd_seq] [<f8ad10d2>] event_input_timer+0xe/0x10 [snd_seq] [<f8acce4e>] snd_seq_deliver_single_event+0xdd/0x1cc [snd_seq] [<f8acd0b3>] 
snd_seq_deliver_event+0x176/0x184 [snd_seq] [<f8acd119>] snd_seq_client_enqueue_event+0x58/0xdc [snd_seq] [<f8ace80e>] snd_seq_write+0x129/0x16c [snd_seq] [<c01694d3>] vfs_write+0xab/0x157 [<c0169b16>] sys_write+0x3b/0x60 [<c0103db9>] sysenter_past_esp+0x56/0x8d DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c01d82d4>] _raw_spin_lock+0xc2/0xe8 [<c02b07d6>] _spin_lock_irqsave+0x29/0x32 [<c0227175>] rtc_control+0x2e/0x66 [<f8b1c03e>] rtctimer_stop+0x3e/0x42 [snd_rtctimer] [<f89ceb71>] _snd_timer_stop+0xd0/0x137 [snd_timer] [<f89cebe7>] snd_timer_pause+0xf/0x11 [snd_timer] [<f8ad0a19>] snd_seq_timer_stop+0x20/0x26 [snd_seq] [<f8ad0d0f>] snd_seq_timer_start+0x16/0x4a [snd_seq] [<f8acfcdb>] snd_seq_control_queue+0xb6/0x13e [snd_seq] [<f8ad10d2>] event_input_timer+0xe/0x10 [snd_seq] [<f8acce4e>] snd_seq_deliver_single_event+0xdd/0x1cc [snd_seq] [<f8acd0b3>] snd_seq_deliver_event+0x176/0x184 [snd_seq] [<f8acd119>] snd_seq_client_enqueue_event+0x58/0xdc [snd_seq] [<f8ace80e>] snd_seq_write+0x129/0x16c [snd_seq] [<c01694d3>] vfs_write+0xab/0x157 [<c0169b16>] sys_write+0x3b/0x60 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: spinlock lockup on CPU#1, kmid/4918, dfbb263c [<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c01d82d4>] _raw_spin_lock+0xc2/0xe8 [<c02b07d6>] _spin_lock_irqsave+0x29/0x32 [<f89cee75>] snd_timer_interrupt+0x21/0x241 [snd_timer] [<f8b1c0f0>] rtctimer_interrupt+0xd/0x11 [snd_rtctimer] [<c0227605>] rtc_interrupt+0xa5/0xe3 [<c014b428>] handle_IRQ_event+0x20/0x4d [<c014b4e9>] __do_IRQ+0x94/0xef [<c010627d>] do_IRQ+0x71/0x84 [<c01048d9>] common_interrupt+0x25/0x2c DWARF2 unwinder stuck at common_interrupt+0x25/0x2c Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c01d82d4>] _raw_spin_lock+0xc2/0xe8 [<c02b07d6>] _spin_lock_irqsave+0x29/0x32 
[<f89cee75>] snd_timer_interrupt+0x21/0x241 [snd_timer] [<f8b1c0f0>] rtctimer_interrupt+0xd/0x11 [snd_rtctimer] [<c0227605>] rtc_interrupt+0xa5/0xe3 [<c014b428>] handle_IRQ_event+0x20/0x4d [<c014b4e9>] __do_IRQ+0x94/0xef [<c010627d>] do_IRQ+0x71/0x84 [<c01048d9>] common_interrupt+0x25/0x2c
A simple workaround for the rtctimer lockdep report is to disable CONFIG_SND_RTCTIMER, or to pass seq_default_timer_device=0 to the snd-seq module. This will use the system timer instead of the RTC. But, of course, this AB<->BA lock inversion must be fixed. I'll check it now...
A simple solution is to remove the superfluous spinlock in rtc.c. Since snd-rtctimer is the only user of this rtc in-kernel-control stuff and it has its own lock on the caller side (this triggered the lockdep BUG), we can simply remove rtc_task_lock in rtc.c. I'll attach the patch. Don, could you check whether kmid works with it?
Created attachment 9126 [details] Remove superfluous lock from rtc.c
[ Peter Zijlstra added to Cc just because he's touched that area too lately. Peter, you probably don't care, but it's an independent problem to the one that has been getting tracked through http://bugzilla.kernel.org/show_bug.cgi?id=7066 and I thought I'd at least mention it to you ]

On Fri, 29 Sep 2006, Takashi Iwai wrote:
>
> A simple solution is to remove the superfluous spinlock in rtc.c.
> Since snd-rtctimer is the only user of this rtc in-kernel-control stuff and it
> has own lock from the caller side (this triggered lockdep BUG), we can simply
> remove rtc_task_lock in rtc.c.
>
> I'll attach the patch. Don, could you check whether kmid works with it?

I worry about this a bit - it sounds to me that the lock _should_ be in the rtc layer, and maybe it's the other lock that should be removed? But you're the one who added that RTC lock in the first place, so ... (Well, it's attributed to Jaroslav, but in the log-message he says it came from you - this was before sign-offs, and before git).

Who else really has worked on this thing? It seems a bit broken that there's this interface in the RTC code that is apparently only used by the sound subsystem, and seems to be badly designed anyway (and where you can apparently even disable the use of it). This all makes me go "Hmm, that can't be very good.."

        Linus
We could also temporarily release timer->lock in sound/core/rtctimer.c. But a double lock of rtc_task_lock might still happen through a code path like rtc_interrupt -> sound -> rtc_control().

Your concern is right. Completely removing rtc_task_lock leaves one small unsafe case: nothing would protect the rtc_callback instance while it is being used in rtc_interrupt(). So a safer option is to remove the lock only in rtc_control(). There it is just a sanity check and plays no real role. This should suffice to avoid the AB/BA lock inversion. The new patch is attached again.

And yes, the interface design is my fault, indeed. It was written to be generic, but so far sound is the only user. (RTC was the only usable fine-grained timer source on PCs at that time :) We can get rid of it once we use a finer-grained timer subsystem...
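The AB/BA inversion discussed here can be illustrated with a toy lock-order recorder in userspace C. This is only a sketch of the idea behind the lockdep report, not lockdep itself. It remembers which lock was held when another was acquired and flags an acquisition whose reverse order has already been seen. It only catches direct two-lock inversions, not longer cycles. The two paths are taken from the report above: the write path holds timer->lock and takes rtc_task_lock in rtc_control(), and the interrupt path holds rtc_task_lock in rtc_interrupt() and takes timer->lock in snd_timer_interrupt(). All names in the code are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum lock_id { TIMER_LOCK, RTC_TASK_LOCK, NR_LOCKS };

/* before[a][b] is true once lock b has been acquired while a was held. */
struct order_graph {
    bool before[NR_LOCKS][NR_LOCKS];
};

/*
 * Record "wanted is acquired while held is held".
 * Returns false if the opposite order was recorded earlier (AB/BA).
 */
bool record_acquire(struct order_graph *g, enum lock_id held,
                    enum lock_id wanted)
{
    if (g->before[wanted][held])
        return false;
    g->before[held][wanted] = true;
    return true;
}

/*
 * Replay the two code paths from the report. The flag models whether
 * rtc_control() still takes rtc_task_lock (true: before the proposed
 * fix, false: after it).
 */
bool rtc_sound_paths_are_consistent(bool rtc_control_takes_lock)
{
    struct order_graph g;
    bool ok = true;

    memset(&g, 0, sizeof(g));

    /* Write path: timer->lock held, then rtc_control() locks. */
    if (rtc_control_takes_lock)
        ok = record_acquire(&g, TIMER_LOCK, RTC_TASK_LOCK) && ok;

    /* Interrupt path: rtc_task_lock held, then timer->lock taken. */
    ok = record_acquire(&g, RTC_TASK_LOCK, TIMER_LOCK) && ok;

    return ok;
}
```

In this model, `rtc_sound_paths_are_consistent(true)` reports the inversion and `rtc_sound_paths_are_consistent(false)` does not, which is the effect the second patch aims for by dropping the lock in rtc_control() only. The model says nothing about whether the remaining protection of the callback in rtc_interrupt() is sufficient.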
Created attachment 9127 [details] Fix ABBA deadlock in rtc and sound timer
Created attachment 9129 [details]
crash dump

This generates debug information with the patch but does not lock up the computer. I tried and could not lock it up by moving the left volume bar up and down. I don't hear any sound even after playing with a mixer and an alsa mixer. I can play mp3s with mpg321.
Created attachment 9130 [details] system map for the patched kernel
Since I tried his patch, my KDE clock has been keeping the correct time. Before, it would not keep the correct time even with ntp. I can also set my keyboard to repeat keys now. Before, I had to turn that off.
Created attachment 9147 [details]
another crash- syslog

Here is data from another crash. I found out my console log on the other computer was not turned on, so this is from syslog on the crashed machine.
Oct 9 19:19:56 c-69-137-114-21 kernel: ============================================= Oct 9 19:19:56 c-69-137-114-21 kernel: [ INFO: possible recursive locking detected ] Oct 9 19:19:56 c-69-137-114-21 kernel: --------------------------------------------- Oct 9 19:19:56 c-69-137-114-21 kernel: java_vm/11309 is trying to acquire lock: Oct 9 19:19:56 c-69-137-114-21 kernel: (slock-AF_INET6){-+..}, at: [<c0257c3f>] sk_clone+0xb6/0x27e Oct 9 19:19:56 c-69-137-114-21 kernel: Oct 9 19:19:56 c-69-137-114-21 kernel: but task is already holding lock: Oct 9 19:19:56 c-69-137-114-21 kernel: (slock-AF_INET6){-+..}, at: [<f8f00d7d>] tcp_v6_rcv+0x318/0x717 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: Oct 9 19:19:56 c-69-137-114-21 kernel: other info that might help us debug this: Oct 9 19:19:56 c-69-137-114-21 kernel: 1 lock held by java_vm/11309: Oct 9 19:19:56 c-69-137-114-21 kernel: #0: (slock-AF_INET6){-+..}, at: [<f8f00d7d>] tcp_v6_rcv+0x318/0x717 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: ct 9 19:19:56 c-69-137-114-21 kernel: Oct 9 19:19:56 c-69-137-114-21 kernel: stack backtrace: Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0104eda>] show_trace_log_lvl+0x58/0x16a Oct 9 19:19:56 c-69-137-114-21 kernel: [<c01054d7>] show_trace+0xd/0x10 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c01055f6>] dump_stack+0x19/0x1b Oct 9 19:19:56 c-69-137-114-21 kernel: [<c01367ea>] __lock_acquire+0x793/0x9b6 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0136cde>] lock_acquire+0x60/0x80 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c02b0473>] _spin_lock+0x19/0x28 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0257c3f>] sk_clone+0xb6/0x27e Oct 9 19:19:56 c-69-137-114-21 kernel: [<c027ef35>] inet_csk_clone+0xd/0x5e Oct 9 19:19:56 c-69-137-114-21 kernel: [<c028f7fd>] tcp_create_openreq_child+0x1b/0x382 Oct 9 19:19:56 c-69-137-114-21 kernel: [<f8eff9fa>] tcp_v6_syn_recv_sock+0x248/0x575 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<c028fd34>] tcp_check_req+0x1d0/0x2e4 Oct 9 19:19:56 c-69-137-114-21 kernel: 
[<f8efed23>] tcp_v6_do_rcv+0x142/0x356 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<f8f0112a>] tcp_v6_rcv+0x6c5/0x717 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<f8ee7a03>] ip6_input+0x1c3/0x296 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<f8ee7f73>] ipv6_rcv+0x1d2/0x21f [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<c025dbd8>] netif_receive_skb+0x2c6/0x34a Oct 9 19:19:56 c-69-137-114-21 kernel: [<c025f572>] process_backlog+0x99/0x114 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c025f76f>] net_rx_action+0x9d/0x162 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c01245e0>] __do_softirq+0x78/0xf2 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0124698>] do_softirq+0x3e/0x56 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c012494a>] local_bh_enable_ip+0xa3/0xc9 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c02b0411>] _spin_unlock_bh+0x25/0x28 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c02569fb>] release_sock+0xb0/0xb8 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0297dd8>] inet_stream_connect+0x129/0x21c Oct 9 19:19:56 c-69-137-114-21 kernel: [<c025593a>] sys_connect+0x67/0x84 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0255fac>] sys_socketcall+0x8c/0x186 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0103db9>] sysenter_past_esp+0x56/0x8d Oct 9 19:19:56 c-69-137-114-21 kernel: DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d Oct 9 19:19:56 c-69-137-114-21 kernel: Leftover inexact backtrace: Oct 9 19:19:56 c-69-137-114-21 kernel: [<c01054d7>] show_trace+0xd/0x10 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c01055f6>] dump_stack+0x19/0x1b Oct 9 19:19:56 c-69-137-114-21 kernel: [<c01367ea>] __lock_acquire+0x793/0x9b6 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0136cde>] lock_acquire+0x60/0x80 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c02b0473>] _spin_lock+0x19/0x28 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0257c3f>] sk_clone+0xb6/0x27e Oct 9 19:19:56 c-69-137-114-21 kernel: [<c027ef35>] inet_csk_clone+0xd/0x5e Oct 9 19:19:56 c-69-137-114-21 kernel: [<c028f7fd>] tcp_create_openreq_child+0x1b/0x382 Oct 9 
19:19:56 c-69-137-114-21 kernel: [<f8eff9fa>] tcp_v6_syn_recv_sock+0x248/0x575 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<c028fd34>] tcp_check_req+0x1d0/0x2e4 Oct 9 19:19:56 c-69-137-114-21 kernel: [<f8efed23>] tcp_v6_do_rcv+0x142/0x356 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<f8f0112a>] tcp_v6_rcv+0x6c5/0x717 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<f8ee7a03>] ip6_input+0x1c3/0x296 [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<f8ee7f73>] ipv6_rcv+0x1d2/0x21f [ipv6] Oct 9 19:19:56 c-69-137-114-21 kernel: [<c025dbd8>] netif_receive_skb+0x2c6/0x34a Oct 9 19:19:56 c-69-137-114-21 kernel: [<c025f572>] process_backlog+0x99/0x114 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c025f76f>] net_rx_action+0x9d/0x162 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c01245e0>] __do_softirq+0x78/0xf2 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0124698>] do_softirq+0x3e/0x56 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c012494a>] local_bh_enable_ip+0xa3/0xc9 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c02b0411>] _spin_unlock_bh+0x25/0x28 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c02569fb>] release_sock+0xb0/0xb8 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0297dd8>] inet_stream_connect+0x129/0x21c Oct 9 19:19:56 c-69-137-114-21 kernel: [<c025593a>] sys_connect+0x67/0x84 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0255fac>] sys_socketcall+0x8c/0x186 Oct 9 19:19:56 c-69-137-114-21 kernel: [<c0103db9>] sysenter_past_esp+0x56/0x8d Oct 9 19:19:56 c-69-137-114-21 kernel: kobject vcs10: registering. parent: vc, set: class_obj Oct 9 19:19:56 c-69-137-114-21 kernel: kobject_uevent Oct 9 19:19:56 c-69-137-114-21 kernel: fill_kobj_path: path = '/class/vc/vcs10' Oct 9 19:19:56 c-69-137-114-21 kernel: kobject vcsa10: registering. 
parent: vc, set: class_obj Oct 9 19:19:56 c-69-137-114-21 kernel: kobject_uevent Oct 9 19:19:56 c-69-137-114-21 kernel: fill_kobj_path: path = '/class/vc/vcsa10' Oct 9 19:31:00 c-69-137-114-21 syslog-ng[2601]: STATS: dropped 0 Oct 9 20:31:00 c-69-137-114-21 syslog-ng[2601]: STATS: dropped 0
I am not sure when this error was introduced; it may or may not date from an older kernel.

fsck 1.38 (30-Jun-2005)
[/bin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda1
/dev/sda1 has gone 60 days without being checked, check forced.
/dev/sda1: Duplicate or bad block in use!
/dev/sda1: Multiply-claimed block(s) in inode 9601216: 16777225
/dev/sda1: (There are 1 inodes containing multiply-claimed blocks.)
I have been having problems compiling a src rpm. My system has been up for several days. The build fails due to a segmentation fault. When I change to that directory and rerun the command manually, it compiles OK without errors. I just ran memtest overnight and it passed with 0 errors. Then I noticed these messages in the log:
============================================= [ INFO: possible recursive locking detected ] --------------------------------------------- java_vm/6897 is trying to acquire lock: (slock-AF_INET6){-+..}, at: [<c0257c3f>] sk_clone+0xb6/0x27e but task is already holding lock: (slock-AF_INET6){-+..}, at: [<f8f1ad7d>] tcp_v6_rcv+0x318/0x717 [ipv6] other info that might help us debug this: 1 lock held by java_vm/6897: #0: (slock-AF_INET6){-+..}, at: [<f8f1ad7d>] tcp_v6_rcv+0x318/0x717 [ipv6] stack backtrace: [<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c01367ea>] __lock_acquire+0x793/0x9b6 [<c0136cde>] lock_acquire+0x60/0x80 [<c02b0473>] _spin_lock+0x19/0x28 [<c0257c3f>] sk_clone+0xb6/0x27e [<c027ef35>] inet_csk_clone+0xd/0x5e [<c028f7fd>] tcp_create_openreq_child+0x1b/0x382 [<f8f199fa>] tcp_v6_syn_recv_sock+0x248/0x575 [ipv6] [<c028fd34>] tcp_check_req+0x1d0/0x2e4 [<f8f18d23>] tcp_v6_do_rcv+0x142/0x356 [ipv6] [<f8f1b12a>] tcp_v6_rcv+0x6c5/0x717 [ipv6] [<f8f01a03>] ip6_input+0x1c3/0x296 [ipv6] [<f8f01f73>] ipv6_rcv+0x1d2/0x21f [ipv6] [<c025dbd8>] netif_receive_skb+0x2c6/0x34a [<c025f572>] process_backlog+0x99/0x114 [<c025f76f>] net_rx_action+0x9d/0x162 [<c01245e0>] __do_softirq+0x78/0xf2 [<c0124698>] do_softirq+0x3e/0x56 [<c012494a>] local_bh_enable_ip+0xa3/0xc9 [<c02b0411>] _spin_unlock_bh+0x25/0x28 [<c02569fb>] release_sock+0xb0/0xb8 [<c0297dd8>] inet_stream_connect+0x129/0x21c [<c025593a>] sys_connect+0x67/0x84 [<c0255fac>] sys_socketcall+0x8c/0x186 [<c0103db9>] sysenter_past_esp+0x56/0x8d DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d Leftover inexact backtrace: [<c01054d7>] 
show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c01367ea>] __lock_acquire+0x793/0x9b6 [<c0136cde>] lock_acquire+0x60/0x80 [<c02b0473>] _spin_lock+0x19/0x28 [<c0257c3f>] sk_clone+0xb6/0x27e [<c027ef35>] inet_csk_clone+0xd/0x5e [<c028f7fd>] tcp_create_openreq_child+0x1b/0x382 [<f8f199fa>] tcp_v6_syn_recv_sock+0x248/0x575 [ipv6] [<c028fd34>] tcp_check_req+0x1d0/0x2e4 [<f8f18d23>] tcp_v6_do_rcv+0x142/0x356 [ipv6] [<f8f1b12a>] tcp_v6_rcv+0x6c5/0x717 [ipv6] [<f8f01a03>] ip6_input+0x1c3/0x296 [ipv6] [<f8f01f73>] ipv6_rcv+0x1d2/0x21f [ipv6] [<c025dbd8>] netif_receive_skb+0x2c6/0x34a [<c025f572>] process_backlog+0x99/0x114 [<c025f76f>] net_rx_action+0x9d/0x162 [<c01245e0>] __do_softirq+0x78/0xf2 [<c0124698>] do_softirq+0x3e/0x56 [<c012494a>] local_bh_enable_ip+0xa3/0xc9 [<c02b0411>] _spin_unlock_bh+0x25/0x28 [<c02569fb>] release_sock+0xb0/0xb8 [<c0297dd8>] inet_stream_connect+0x129/0x21c [<c025593a>] sys_connect+0x67/0x84 [<c0255fac>] sys_socketcall+0x8c/0x186 [<c0103db9>] sysenter_past_esp+0x56/0x8d SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=00:50:8d:db:39:3d:00:01:5c:22:76:02:08:00 SRC=60.11.125.41 DST=69.137.114.21 LEN=928 TOS=0x00 PREC=0x20 TTL=43 ID=0 DF PROTO=UDP SPT=57695 DPT=1026 LEN=908 SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=00:50:8d:db:39:3d:00:01:5c:22:76:02:08:00 SRC=60.11.125.41 DST=69.137.114.21 LEN=928 TOS=0x00 PREC=0x20 TTL=43 ID=0 DF PROTO=UDP SPT=57695 DPT=1027 LEN=908 kobject_uevent fill_kobj_path: path = '/class/vc/vcs7' kobject vcs7: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcsa7' kobject vcsa7: cleaning up INIT: Switching to runINIT: kobject_uevent Sending processefill_kobj_path: path = '/class/vc/vcs3' s the TERM signal kobject vcs3: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcsa3' kobject vcsa3: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcs2' kobject vcs2: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcsa2' kobject vcsa2: cleaning up kobject_uevent fill_kobj_path: path = 
'/class/vc/vcs5' kobject vcs5: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcsa5' kobject vcsa5: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcs6' kobject vcs6: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcsa6' kobject vcsa6: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcs4' kobject vcs4: cleaning up kobject_uevent fill_kobj_path: path = '/class/vc/vcsa4' kobject vcsa4: cleaning up
This latest crash was caused by just untarring a large source file:
Oops: 0002 [#1] SMP Modules linked in: ipt_LOG xt_limit xt_pkttype af_packet cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table edd snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device asus_acpi button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc i2c_nforce2 ehci_hcd ohci_hcd i2c_core usbcore ohci1394 ieee1394 ide_cd cdrom forcedeth ext3 jbd fan thermal processor sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core CPU: 1 EIP: 0060:[<c017e0e0>] Not tainted VLI EFLAGS: 00010206 (2.6.18-smp #2) EIP is at prune_one_dentry+0x23/0x79 eax: 00207400 ebx: d7f4f07c ecx: d7f4f0a4 edx: c2aa2fb4 esi: d7f4f07c edi: d7f4f084 ebp: dfad7ee8 esp: dfad7ee0 ds: 007b es: 007b ss: 0068 Process kswapd0 (pid: 213, ti=dfad6000 task=dff0cae0 task.ti=dfad6000) Stack: dfa769d0 d7f4f07c dfad7f04 c017e2ff 00000000 00000053 00027678 dfff2404 000000a8 dfad7f0c c017e34f dfad7f3c c0154b5a 01d8da00 00000000 01d8da00 0006d934 000000d0 00000180 00000000 00000060 c030ea80 c0312200 dfad7fc8 Call Trace: [<c017e2ff>] prune_dcache+0xfc/0x133 [<c017e34f>] shrink_dcache_memory+0x19/0x31 [<c0154b5a>] shrink_slab+0xd0/0x137 [<c0154f33>] kswapd+0x2d2/0x3a8 [<c01310d7>] kthread+0xc3/0xf0 [<c0102005>] kernel_thread_helper+0x5/0xb DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb Leftover inexact backtrace: [<c0105076>] show_stack_log_lvl+0x8a/0x95 [<c01051ad>] show_registers+0x12c/0x199 [<c010539b>] die+0x181/0x284 [<c02b190b>] do_page_fault+0x3e5/0x4ad [<c0104a71>] error_code+0x39/0x40 [<c017e2ff>] prune_dcache+0xfc/0x133 [<c017e34f>] shrink_dcache_memory+0x19/0x31 [<c0154b5a>] shrink_slab+0xd0/0x137 [<c0154f33>] kswapd+0x2d2/0x3a8 [<c01310d7>] kthread+0xc3/0xf0 
[<c0102005>] kernel_thread_helper+0x5/0xb Code: fe ff ff 89 d8 5b 5d c3 55 89 e5 56 53 89 c3 8b 40 04 a8 10 75 1f 83 c8 10 89 43 04 8d 4b 28 8b 43 28 8b 51 04 85 c0 89 02 74 03 <89> 50 04 c7 41 04 00 02 20 00 8d 4b 48 8b 53 48 8b 41 04 89 42 EIP: [<c017e0e0>] prune_one_dentry+0x23/0x79 SS:ESP 0068:dfad7ee0 <4>SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=00:50:8d:db:39:3d:00:01:5c:22:76:02:08:00 SRC=221.208.208.94 DST=69.137.114.21 LEN=485 TOS=0x00 PREC=0x20 TTL=43 ID=0 DF PROTO=UDP SPT=53530 DPT=1027 LEN=465 BUG: soft lockup detected on CPU#0! [<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c017dcd3>] d_instantiate+0x24/0x8a [<f88bb7fc>] ext3_add_nondir+0x2e/0x42 [ext3] [<f88bbd2d>] ext3_create+0xa8/0xdc [ext3] [<c0175603>] vfs_create+0xce/0x13e [<c01780cc>] open_namei+0x16b/0x630 [<c0167b0c>] do_filp_open+0x1f/0x35 [<c0167b62>] do_sys_open+0x40/0xb5 [<c0167c03>] sys_open+0x16/0x18 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#1! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c01c9409>] _atomic_dec_and_lock+0x2d/0x4c [<c017df95>] dput+0x34/0x139 [<c0174957>] path_release+0xd/0x23 [<c0171765>] vfs_stat_fd+0x36/0x40 [<c01717f1>] vfs_stat+0x11/0x13 [<c0171807>] sys_stat64+0x14/0x28 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#0! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c01ccee0>] __delay+0x9/0xb [<c01d828c>] _raw_spin_lock+0x7a/0xe8 [<c02b047a>] _spin_lock+0x20/0x28 [<c017dcd3>] d_instantiate+0x24/0x8a [<f88bb7fc>] ext3_add_nondir+0x2e/0x42 [ext3] [<f88bbd2d>] ext3_create+0xa8/0xdc [ext3] [<c0175603>] vfs_create+0xce/0x13e [<c01780cc>] open_namei+0x16b/0x630 [<c0167b0c>] do_filp_open+0x1f/0x35 [<c0167b62>] do_sys_open+0x40/0xb5 [<c0167c03>] sys_open+0x16/0x18 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#1! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c01c9409>] _atomic_dec_and_lock+0x2d/0x4c [<c017df95>] dput+0x34/0x139 [<c0174957>] path_release+0xd/0x23 [<c0171765>] vfs_stat_fd+0x36/0x40 [<c01717f1>] vfs_stat+0x11/0x13 [<c0171807>] sys_stat64+0x14/0x28 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#0! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c017dcd3>] d_instantiate+0x24/0x8a [<f88bb7fc>] ext3_add_nondir+0x2e/0x42 [ext3] [<f88bbd2d>] ext3_create+0xa8/0xdc [ext3] [<c0175603>] vfs_create+0xce/0x13e [<c01780cc>] open_namei+0x16b/0x630 [<c0167b0c>] do_filp_open+0x1f/0x35 [<c0167b62>] do_sys_open+0x40/0xb5 [<c0167c03>] sys_open+0x16/0x18 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#1! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c01c9409>] _atomic_dec_and_lock+0x2d/0x4c [<c017df95>] dput+0x34/0x139 [<c0174957>] path_release+0xd/0x23 [<c0171765>] vfs_stat_fd+0x36/0x40 [<c01717f1>] vfs_stat+0x11/0x13 [<c0171807>] sys_stat64+0x14/0x28 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#0! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c017dcd3>] d_instantiate+0x24/0x8a [<f88bb7fc>] ext3_add_nondir+0x2e/0x42 [ext3] [<f88bbd2d>] ext3_create+0xa8/0xdc [ext3] [<c0175603>] vfs_create+0xce/0x13e [<c01780cc>] open_namei+0x16b/0x630 [<c0167b0c>] do_filp_open+0x1f/0x35 [<c0167b62>] do_sys_open+0x40/0xb5 [<c0167c03>] sys_open+0x16/0x18 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#1! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c01c9409>] _atomic_dec_and_lock+0x2d/0x4c [<c017df95>] dput+0x34/0x139 [<c0174957>] path_release+0xd/0x23 [<c0171765>] vfs_stat_fd+0x36/0x40 [<c01717f1>] vfs_stat+0x11/0x13 [<c0171807>] sys_stat64+0x14/0x28 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#0! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c01ccee0>] __delay+0x9/0xb [<c01d828c>] _raw_spin_lock+0x7a/0xe8 [<c02b047a>] _spin_lock+0x20/0x28 [<c017dcd3>] d_instantiate+0x24/0x8a [<f88bb7fc>] ext3_add_nondir+0x2e/0x42 [ext3] [<f88bbd2d>] ext3_create+0xa8/0xdc [ext3] [<c0175603>] vfs_create+0xce/0x13e [<c01780cc>] open_namei+0x16b/0x630 [<c0167b0c>] do_filp_open+0x1f/0x35 [<c0167b62>] do_sys_open+0x40/0xb5 [<c0167c03>] sys_open+0x16/0x18 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#1! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c01c9409>] _atomic_dec_and_lock+0x2d/0x4c [<c017df95>] dput+0x34/0x139 [<c0174957>] path_release+0xd/0x23 [<c0171765>] vfs_stat_fd+0x36/0x40 [<c01717f1>] vfs_stat+0x11/0x13 [<c0171807>] sys_stat64+0x14/0x28 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#0! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c017dcd3>] d_instantiate+0x24/0x8a [<f88bb7fc>] ext3_add_nondir+0x2e/0x42 [ext3] [<f88bbd2d>] ext3_create+0xa8/0xdc [ext3] [<c0175603>] vfs_create+0xce/0x13e [<c01780cc>] open_namei+0x16b/0x630 [<c0167b0c>] do_filp_open+0x1f/0x35 [<c0167b62>] do_sys_open+0x40/0xb5 [<c0167c03>] sys_open+0x16/0x18 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#1! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c01c9409>] _atomic_dec_and_lock+0x2d/0x4c [<c017df95>] dput+0x34/0x139 [<c0174957>] path_release+0xd/0x23 [<c0171765>] vfs_stat_fd+0x36/0x40 [<c01717f1>] vfs_stat+0x11/0x13 [<c0171807>] sys_stat64+0x14/0x28 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#0! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c01d828c>] _raw_spin_lock+0x7a/0xe8 [<c02b047a>] _spin_lock+0x20/0x28 [<c017dcd3>] d_instantiate+0x24/0x8a [<f88bb7fc>] ext3_add_nondir+0x2e/0x42 [ext3] [<f88bbd2d>] ext3_create+0xa8/0xdc [ext3] [<c0175603>] vfs_create+0xce/0x13e [<c01780cc>] open_namei+0x16b/0x630 [<c0167b0c>] do_filp_open+0x1f/0x35 [<c0167b62>] do_sys_open+0x40/0xb5 [<c0167c03>] sys_open+0x16/0x18 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#1! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c01c9409>] _atomic_dec_and_lock+0x2d/0x4c [<c017df95>] dput+0x34/0x139 [<c0174957>] path_release+0xd/0x23 [<c0171765>] vfs_stat_fd+0x36/0x40 [<c01717f1>] vfs_stat+0x11/0x13 [<c0171807>] sys_stat64+0x14/0x28 [<c0103db9>] sysenter_past_esp+0x56/0x8d BUG: soft lockup detected on CPU#0! 
[<c0104eda>] show_trace_log_lvl+0x58/0x16a [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30 Leftover inexact backtrace: [<c01054d7>] show_trace+0xd/0x10 [<c01055f6>] dump_stack+0x19/0x1b [<c014b1b7>] softlockup_tick+0xa5/0xb9 [<c0128467>] run_local_timers+0x12/0x14 [<c01287e6>] update_process_times+0x3c/0x61 [<c0113a78>] smp_apic_timer_interrupt+0x6d/0x75 [<c010499a>] apic_timer_interrupt+0x2a/0x30 [<c02b047a>] _spin_lock+0x20/0x28 [<c017dcd3>] d_instantiate+0x24/0x8a [<f88bb7fc>] ext3_add_nondir+0x2e/0x42 [ext3] [<f88bbd2d>] ext3_create+0xa8/0xdc [ext3] [<c0175603>] vfs_create+0xce/0x13e [<c01780cc>] open_namei+0x16b/0x630 [<c0167b0c>] do_filp_open+0x1f/0x35 [<c0167b62>] do_sys_open+0x40/0xb5 [<c0167c03>] sys_open+0x16/0x18 [<c0103db9>] sysenter_past_esp+0x56/0x8d
On Sat, 28 Oct 2006, bugme-daemon@bugzilla.kernel.org wrote:
>
> This latest crash was caused by just untaring a large source file:
>
> EIP: 0060:[<c017e0e0>] Not tainted VLI
> EFLAGS: 00010206 (2.6.18-smp #2)
> EIP is at prune_one_dentry+0x23/0x79
> eax: 00207400 ebx: d7f4f07c ecx: d7f4f0a4 edx: c2aa2fb4
> esi: d7f4f07c edi: d7f4f084 ebp: dfad7ee8 esp: dfad7ee0
> ds: 007b es: 007b ss: 0068
> Process kswapd0 (pid: 213, ti=dfad6000 task=dff0cae0 task.ti=dfad6000)
> Stack: dfa769d0 d7f4f07c dfad7f04 c017e2ff 00000000 00000053 00027678 dfff2404
>        000000a8 dfad7f0c c017e34f dfad7f3c c0154b5a 01d8da00 00000000 01d8da00
>        0006d934 000000d0 00000180 00000000 00000060 c030ea80 c0312200 dfad7fc8
> Code: ... c0 89 02 74 03 <89> 50 04 c7 41 04 ...

That's this sequence:

            test   %eax,%eax
            mov    %eax,(%edx)
            je     over
    ***     mov    %edx,0x4(%eax)     ***
    over:
            movl   $0x200200,0x4(%ecx)
            lea    0x48(%ebx),%ecx

.. where the starred instruction oopses because %eax is 00207400.

That sequence is "hlist_del_rcu()" from __d_drop() at the very top of the function.

The value for %eax is somewhat interesting. It's 0x00200200 + 0x7200, and while I don't see how the 0x7200 got there, it's still close enough to 0x00200200 that it's intriguing. And 0x00200200 is obviously the magic number to poison the list entries.

Anyway, the value of %eax at that point comes from "dentry->d_hash.next", so the dentry hash-list has clearly gotten corrupted. It _looks_ like the same dentry has gotten unhashed twice, but we actually have code to protect against that in __d_drop:

    if (!(dentry->d_flags & DCACHE_UNHASHED)) {
            dentry->d_flags |= DCACHE_UNHASHED;
            hlist_del_rcu(&dentry->d_hash);
    }

but it's still intriguing. (This all should run under "dcache_lock" _and_ the dentry->d_lock, so there's no race on the DCACHE_UNHASHED thing either).

Very strange.

        Linus
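For anyone following the disassembly above, here is a minimal userspace sketch of the unlink step it describes, with each C statement annotated with the instruction it appears to correspond to. This is an illustration under stated assumptions, not the kernel source: `struct hnode`, `model_hlist_del_rcu()`, `poison_offset()` and `LIST_POISON2_VALUE` are hypothetical names made up for this sketch, and the register mapping in the comments is inferred from the register dump in the oops. The 0x4 offset of `pprev` only holds on 32-bit, as on the reporter's machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Poison constant seen as the $0x200200 immediate in the disassembly.
 * The name is an assumption made for this sketch. */
#define LIST_POISON2_VALUE 0x00200200u

/* Hypothetical stand-in for a hash-list node: a forward pointer plus a
 * pointer to whatever slot points at this node. */
struct hnode {
    struct hnode *next;
    struct hnode **pprev;
};

/* Userspace model of the unlink sequence from the oops. */
static void model_hlist_del_rcu(struct hnode *n)
{
    assert(n != NULL && n->pprev != NULL);

    struct hnode *next = n->next;      /* %eax: 00207400 in the oops */
    struct hnode **pprev = n->pprev;   /* %edx */

    *pprev = next;                     /* mov %eax,(%edx) */
    if (next != NULL)                  /* test %eax,%eax ; je over */
        next->pprev = pprev;           /* mov %edx,0x4(%eax): faults if next is garbage */

    /* movl $0x200200,0x4(%ecx): poison the removed node's back pointer */
    n->pprev = (struct hnode **)(uintptr_t)LIST_POISON2_VALUE;
}

/* How far a suspicious pointer value lies from the poison constant. */
static long poison_offset(uint32_t value)
{
    return (long)value - (long)LIST_POISON2_VALUE;
}
```

Calling `poison_offset(0x00207400u)` reproduces the 0x7200 difference mentioned in the analysis. The sketch also shows why a non-NULL but invalid `next` is fatal: the store through `next` happens unconditionally once the NULL test passes.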
drh:/boot>md5sum vmlinuz-2.6.18-smp
583308f23f63ecf72dc80d37c825857e vmlinuz-2.6.18-smp
drh:/boot>md5sum initrd-2.6.18-smp
d1cda47c9e9d23c128c09f0bc48f3f20 initrd-2.6.18-smp
drh:/boot/grub>for name in `ls`; do md5sum $name; done
f3317bc6b36d7f44282ad8e020b3e2f9 default
089d43a54adbc29b7d2c5308e9cd8439 e2fs_stage1_5
a99945c26d8455ee9787bfc937491699 fat_stage1_5
12f49f1bdedaaa6f24eab8aa06078bbc ffs_stage1_5
453f03aa3c0363e877abadd7f52b4fc3 iso9660_stage1_5
ffed452493ddc3466ab40a3ad7de4f91 jfs_stage1_5
4481893f2403b21d93ae213982d3bddb minix_stage1_5
45c5163d47e8dcdea74ae5a8271d622b reiserfs_stage1_5
addcae72a6ed5918e733d6f5a31862d5 stage1
9f66f479e64fe8eed2528be43295af00 stage2
bbb6e9daaa91fe2f178118454d56fc4c ufs2_stage1_5
20feb267048c8e8fd426f79ccfa42ca5 vstafs_stage1_5
08c1dc8ce7aeb110aff9b85ee27ea2f9 xfs_stage1_5
I am running linux-2.6.18.3 now. I have recently done some heavy processing and it did not crash. I noticed this message in the log, which may be a clue to what has been causing the problem:

Nov 28 06:16:40 c-69-137-114-21 kernel: [165129.452000] eth0: too many iterations (6) in nv_nic_irq.

I have seen this message elsewhere in the log as well.
This bug does not seem to appear in linux-2.6.18.3; I have not had a crash with this kernel version yet. Because of the lack of crashes, I turned off my other computer, which was capturing the serial console log. I did notice an abnormality today, but it may not be a kernel problem: when I ran grub-install on /dev/hda, I got an error message for /dev/fd0 even though there were no references to /dev/fd0. /dev/fd0 was in the map, though. So you may close this if you want. If I find additional problems, I will open another bug report. Thanks!