Distribution: Debian Testing
Hardware Environment: Dual Pentium2 266, AIC-7880 SCSI, 3Com PCI 3c905C Tornado
Software Environment: gcc 3.3.1, SMP kernel, preempt on
Problem Description: When doing a ping flood of another machine on the same network (using multiple instances of 'ping -f hostname &'), after about 10 seconds I get this oops, and the machine hangs:

Unable to handle kernel paging request at virtual address c4f66068
 printing eip:
c02b6bd0
*pde = 00014067
*pte = 04f66000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c02b6bd0>] Not tainted
EFLAGS: 00010246
EIP is at raw_rcv_skb+0x190/0x260
eax: 00000040 ebx: c5321060 ecx: c4dae024 edx: 00000014
esi: c5160000 edi: c5321004 ebp: c4f66004 esp: c5161b80
ds: 007b es: 007b ss: 0068
Process ping (pid: 294, threadinfo=c5160000 task=c52cf000)
Stack: c4f66000 c1169890 00001000 c532106c 00000216 00000000 c4f66000 0000005a
       c4f66004 c5321004 c4dae024 c510e004 c02b6d3d c5321004 c4f66004 c510e038
       00000030 00000001 c5321004 c02b681d c5321004 c4f66004 6164050a 1964050a
Call Trace:
 [<c02b6d3d>] raw_rcv+0x9d/0x110
 [<c02b681d>] raw_v4_input+0xad/0x160
 [<c029206b>] ip_local_deliver+0x9b/0x220
 [<c0292586>] ip_rcv+0x396/0x49c
 [<c0118ee8>] kernel_map_pages+0x28/0x5c
 [<c0280211>] netif_receive_skb+0x181/0x210
 [<c0280329>] process_backlog+0x89/0x120
 [<c0280455>] net_rx_action+0x95/0x130
 [<c0124595>] do_softirq+0xd5/0xe0
 [<c010c235>] do_IRQ+0x185/0x230
 [<c010a218>] common_interrupt+0x18/0x20
 [<c011007b>] alloc_ldt+0x7b/0x1f0
 [<c0143334>] kfree+0x204/0x340
 [<c027b253>] kfree_skbmem+0x13/0x30
 [<c027b253>] kfree_skbmem+0x13/0x30
 [<c027b2db>] __kfree_skb+0x6b/0xf0
 [<c02b79a3>] raw_recvmsg+0x113/0x180
 [<c02c343a>] inet_recvmsg+0x5a/0x80
 [<c027719c>] sock_recvmsg+0x9c/0xc0
 [<c0118ee8>] kernel_map_pages+0x28/0x5c
 [<c013ddc9>] __alloc_pages+0x309/0x370
 [<c0276eac>] sockfd_lookup+0x1c/0x80
 [<c02787a2>] sys_recvfrom+0xb2/0x120
 [<c016cbd4>] poll_freewait+0x44/0x50
 [<c016cfa1>] do_select+0x1f1/0x340
 [<c02790d6>] sys_socketcall+0x1e6/0x2a0
 [<c01098ab>] syscall_call+0x7/0xb
Code: 8b 45 64 89 3c 24 89 44 24 04 ff 97 50 01 00 00 eb b2 e8 59
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

Steps to reproduce: Ping flood a machine on the local LAN with multiple 'ping -f hostname &'. Very reproducible.
Note, if I boot with 'nosmp' bootparm, I can't reproduce this.
Does this happen without preempt?
Recompiled for SMP without preempt, very similar oops:

Unable to handle kernel paging request at virtual address c5100068
 printing eip:
c02ae8b6
*pde = 00015067
*pte = 05100163
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c02ae8b6>] Not tainted
EFLAGS: 00010246
EIP is at raw_rcv_skb+0x156/0x220
eax: 00000040 ebx: c52ed060 ecx: c4f6e024 edx: 00000014
esi: c52ed004 edi: c5100004 ebp: c52ed06c esp: c5661d24
ds: 007b es: 007b ss: 0068
Process ping (pid: 283, threadinfo=c5660000 task=c585f000)
Stack: c5100000 c1169890 00001000 00000206 00000000 c1169890 c5100000 0000005a
       c5100004 c52ed004 c4f6e024 c54a2004 c02aea1d c52ed004 c5100004 c54a2038
       00000030 00000001 c52ed004 c02ae569 c52ed004 c5100004 6164050a 1964050a
Call Trace:
 [<c02aea1d>] raw_rcv+0x9d/0x110
 [<c02ae569>] raw_v4_input+0xa9/0x130
 [<c028ad0b>] ip_local_deliver+0x8b/0x200
 [<c028b1fc>] ip_rcv+0x37c/0x47a
 [<c0118588>] kernel_map_pages+0x28/0x5c
 [<c027a205>] netif_receive_skb+0x165/0x1f0
 [<c027a319>] process_backlog+0x89/0x120
 [<c027a441>] net_rx_action+0x91/0x110
 [<c0123165>] do_softirq+0xd5/0xe0
 [<c010bfd5>] do_IRQ+0x165/0x220
 [<c010a06c>] common_interrupt+0x18/0x20
 [<c027007b>] pirq_sis_get+0x9b/0xc0
 [<c01557d2>] fput+0x2/0x20
 [<c0169a05>] poll_freewait+0x35/0x50
 [<c0169df1>] do_select+0x201/0x350
 [<c0169a20>] __pollwait+0x0/0xd0
 [<c016a24b>] sys_select+0x2db/0x4e0
 [<c027007b>] pirq_sis_get+0x9b/0xc0
 [<c01096ff>] syscall_call+0x7/0xb
Code: 8b 47 64 89 34 24 89 44 24 04 ff 96 50 01 00 00 eb b2 0f 0b
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
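For anyone reading the two oopses above: the faulting address in each looks like it is simply a base register plus 0x64. The following is a rough Python sketch of that arithmetic, not a disassembler. It assumes the "Code:" line starts at the faulting EIP (as these kernels appear to print it) and handles only the `8b /r` load with an 8-bit displacement; the function name is made up for illustration.

```python
# Register numbering used by the x86 ModRM byte (32-bit mode).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def faulting_load_address(code_bytes, regs):
    """Decode `mov disp8(%base), %reg` (opcode 8b, mod=01) and return
    (base register name, effective address).

    Hypothetical helper: assumes code_bytes begins at the faulting EIP.
    """
    opcode, modrm, disp = code_bytes[0], code_bytes[1], code_bytes[2]
    if opcode != 0x8B:
        raise ValueError("not an 8b /r load")
    mod, rm = modrm >> 6, modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the disp8, no-SIB form is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    base = REGS[rm]
    return base, (regs[base] + disp) & 0xFFFFFFFF


# First report (preempt on): Code: 8b 45 64 ..., ebp: c4f66004
first = faulting_load_address(bytes.fromhex("8b4564"), {"ebp": 0xC4F66004})
# Second report (preempt off): Code: 8b 47 64 ..., edi: c5100004
second = faulting_load_address(bytes.fromhex("8b4764"), {"edi": 0xC5100004})

print(first[0], hex(first[1]))    # ebp 0xc4f66068
print(second[0], hex(second[1]))  # edi 0xc5100068
```

Both results match the "virtual address" reported in the corresponding oops, so the fault would be a read at offset 0x64 from a pointer that sits 4 bytes into a page. Which structure that pointer refers to is not something this arithmetic can tell.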
Hmm, I can't reproduce this locally. Also, are you saying that the machine doing the ping -f oopses? How many concurrent ping -f? How fast a processor does the recipient/host have?
The machine doing the pinging (2.6.0-test1) is oopsing; it is SMP with two Pentium2 266 MHz processors. Just one instance of ping doesn't seem to do it; I have to run about 5 or so (I do a 'ping -f hostname &' and then hit the up arrow and enter a few times). The machine being pinged is a P4 2.4 GHz running WinXP, and it doesn't mind the ping flood.
Of interest, if I start my ping flood against localhost, the system doesn't crash. If I do it against any host on the network, it does.
Thanks, with that information I was able to reproduce it. I'll have a look from here.
Could this be related at all to the changes that caused http://bugme.osdl.org/show_bug.cgi?id=863? From that bug, it appears that the raw networking stuff was recently changed, and had a few bugs in the new version.
Can you reproduce this with 2.6.0-test2? I'm unable to.
No change here with 2.6.0-test2.

Linux version 2.6.0-test2 (root@dual266) (gcc version 3.3.1 20030626 (Debian prerelease)) #38 SMP Tue Jul 29 14:36:46 EDT 2003

Unable to handle kernel paging request at virtual address c56a0068
 printing eip:
c02b6690
*pde = 00016067
*pte = 056a0163
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c02b6690>] Not tainted
EFLAGS: 00010246
EIP is at raw_rcv_skb+0x190/0x260
eax: 00000040 ebx: c55ce060 ecx: c52e4024 edx: 00000014
esi: c78c0000 edi: c55ce004 ebp: c56a0004 esp: c78c1e4c
ds: 007b es: 007b ss: 0068
Process sshd (pid: 279, threadinfo=c78c0000 task=c6b43000)
Stack: c56a0000 c1169890 00001000 c55ce06c 00000216 00000000 c56a0000 0000005a
       c56a0004 c55ce004 c52e4024 c5270004 c02b67fd c55ce004 c56a0004 c5270038
       00000030 00000001 c55ce004 c02b62dd c55ce004 c56a0004 6164050a 1764050a
Call Trace:
 [<c02b67fd>] raw_rcv+0x9d/0x110
 [<c02b62dd>] raw_v4_input+0xad/0x160
 [<c0291b2b>] ip_local_deliver+0x9b/0x220
 [<c0292046>] ip_rcv+0x396/0x49c
 [<c0118ee8>] kernel_map_pages+0x28/0x5c
 [<c027fc81>] netif_receive_skb+0x181/0x210
 [<c027fd99>] process_backlog+0x89/0x120
 [<c027fec5>] net_rx_action+0x95/0x130
 [<c0124355>] do_softirq+0xd5/0xe0
 [<c010c235>] do_IRQ+0x185/0x230
 [<c010a218>] common_interrupt+0x18/0x20
Code: 8b 45 64 89 3c 24 89 44 24 04 ff 97 50 01 00 00 eb b2 e8 99
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
Just in case GCC 3.3.1 was being buggy, I did a 'make mrproper' and recompiled with a known-good GCC, and still get the same oops.

Linux version 2.6.0-test2 (root@dual266) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 SMP Wed Jul 30 10:47:42 EDT 2003
Interesting comment. Oddly, I now have trouble reproducing this, but that coincides with my switch to 2.96 for local kernel builds.
OK, my 3-way system triggers it much more easily with the same kernel.
Could you please also post your .config?
Config from 2.6.0-test2:

bwindle@dual266:/home/kernel/linux$ cat .config | grep -v "is not set" | grep -v "^#"
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_EXPERIMENTAL=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_KALLSYMS=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_X86_PC=y
CONFIG_MPENTIUMII=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_PREEMPT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_MICROCODE=y
CONFIG_NOHIGHMEM=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_HOTPLUG=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_PNP=y
CONFIG_PNP_NAMES=y
CONFIG_BLK_DEV_FD=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_REPORT_LUNS=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=253
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
CONFIG_AIC7XXX_DEBUG_ENABLE=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC7XXX_REG_PRETTY_PRINT=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IPV6_SCTP__=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_NET_PCI=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_SERIAL=y
CONFIG_INPUT_MISC=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_JBD=y
CONFIG_FS_MBCACHE=y
CONFIG_AUTOFS4_FS=y
CONFIG_ISO9660_FS=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_NFS_FS=y
CONFIG_NFSD=y
CONFIG_LOCKD=y
CONFIG_EXPORTFS=y
CONFIG_SUNRPC=y
CONFIG_SMB_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_SMB_NLS=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_IOVIRT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_X86_EXTRA_IRQS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
2.6.0-test2-bk6, slightly different... now I get an oops, and then a BUG:

Unable to handle kernel paging request at virtual address c5234068
 printing eip:
c02856ca
*pde = 00015067
*pte = 05234163
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c02856ca>] Not tainted
EFLAGS: 00010246
EIP is at raw_rcv_skb+0x1ba/0x20c
eax: 00000040 ebx: c7753060 ecx: 00000000 edx: 00000014
esi: c775306c edi: c7753004 ebp: c5234004 esp: c4955c40
ds: 007b es: 007b ss: 0068
Process sshd (pid: 312, threadinfo=c4954000 task=c6a97000)
Stack: c5234004 c7753004 00000001 c4e5e004 00000206 00000000 c02857d5 c7753004
       c5234004 c7753004 c57c3024 c4e5e004 c02853ac c7753004 c5234004 c4e5e004
       c7808004 00000001 c0263ba1 c4e5e004 c57c3024 00000001 c7afb004 c57c3024
Call Trace:
 [<c02857d5>] raw_rcv+0xb9/0xc4
 [<c02853ac>] raw_v4_input+0xb0/0x108
 [<c0263ba1>] ip_local_deliver+0xa5/0x1e0
 [<c0264071>] ip_rcv+0x395/0x432
 [<c013bb72>] kmem_cache_free+0x2aa/0x2b4
 [<c02540e0>] netif_receive_skb+0x16c/0x1b8
 [<c02541b1>] process_backlog+0x85/0x114
 [<c02542c5>] net_rx_action+0x85/0x140
 [<c0120cfb>] do_softirq+0x6b/0xd8
 [<c010b4e6>] do_IRQ+0x1de/0x1f8
 [<c0109ae4>] common_interrupt+0x18/0x20
 [<c011007b>] old_mmap+0x6b/0x140
 [<c013999d>] poison_obj+0x59/0x6c
 [<c013b77a>] __kmalloc+0x11e/0x1c4
 [<c024f94c>] alloc_skb+0x3c/0xd8
 [<c024f94c>] alloc_skb+0x3c/0xd8
 [<c026db4a>] tcp_sendmsg+0x2de/0x14b8
 [<c029059d>] inet_sendmsg+0x41/0x48
 [<c024c39d>] sock_aio_write+0xc5/0xd0
 [<c014d53d>] do_sync_write+0x81/0xb0
 [<c0118428>] default_wake_function+0x0/0x20
 [<c014d61d>] vfs_write+0xb1/0xd0
 [<c014d6b9>] sys_write+0x31/0x4c
 [<c0109177>] syscall_call+0x7/0xb
Code: 8b 45 64 50 57 8b 87 50 01 00 00 ff d0 83 c4 08 83 7c 24 14
 <0>Kernel panic: Fatal exception in interrupt
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:75!
invalid operand: 0000 [#2]
CPU: 0
EIP: 0060:[<c011d7d9>] Not tainted
EFLAGS: 00010002
EIP is at printk+0x1d1/0x250
eax: 00000001 ebx: c03aba2e ecx: c03172e8 edx: c4954000
esi: 00000202 edi: 0000002e ebp: c4955b3c esp: c4955b0c
ds: 007b es: 007b ss: 0068
Process sshd (pid: 312, threadinfo=c4954000 task=c6a97000)
Stack: c02bae39 00000000 c4955c0c c011cd81 c02c3635 c03ab5c0 c03ab5c0 00000400
       c02bae55 c4955b48 00000001 c4954000 c5234004 c010a1e6 c02bae39 05234163
       c6a97000 00000234 c0115f3e c02c2a5e c4955c0c 00000000 c7753060 00000000
Call Trace:
 [<c011cd81>] panic+0x31/0xe4
 [<c010a1e6>] die+0xf6/0x138
 [<c0115f3e>] do_page_fault+0x2de/0x420
 [<c0115c60>] do_page_fault+0x0/0x420
 [<c020d2cf>] boomerang_interrupt+0x14b/0x460
 [<c010afe1>] handle_IRQ_event+0x31/0x58
 [<c010afe1>] handle_IRQ_event+0x31/0x58
 [<c010b480>] do_IRQ+0x178/0x1f8
 [<c0109be1>] error_code+0x2d/0x38
 [<c02856ca>] raw_rcv_skb+0x1ba/0x20c
 [<c02857d5>] raw_rcv+0xb9/0xc4
 [<c02853ac>] raw_v4_input+0xb0/0x108
 [<c0263ba1>] ip_local_deliver+0xa5/0x1e0
 [<c0264071>] ip_rcv+0x395/0x432
 [<c013bb72>] kmem_cache_free+0x2aa/0x2b4
 [<c02540e0>] netif_receive_skb+0x16c/0x1b8
 [<c02541b1>] process_backlog+0x85/0x114
 [<c02542c5>] net_rx_action+0x85/0x140
 [<c0120cfb>] do_softirq+0x6b/0xd8
 [<c010b4e6>] do_IRQ+0x1de/0x1f8
 [<c0109ae4>] common_interrupt+0x18/0x20
 [<c011007b>] old_mmap+0x6b/0x140
 [<c013999d>] poison_obj+0x59/0x6c
 [<c013b77a>] __kmalloc+0x11e/0x1c4
 [<c024f94c>] alloc_skb+0x3c/0xd8
 [<c024f94c>] alloc_skb+0x3c/0xd8
 [<c026db4a>] tcp_sendmsg+0x2de/0x14b8
 [<c029059d>] inet_sendmsg+0x41/0x48
 [<c024c39d>] sock_aio_write+0xc5/0xd0
 [<c014d53d>] do_sync_write+0x81/0xb0
 [<c0118428>] default_wake_function+0x0/0x20
 [<c014d61d>] vfs_write+0xb1/0xd0
 [<c014d6b9>] sys_write+0x31/0x4c
 [<c0109177>] syscall_call+0x7/0xb
Code: 0f 0b 4b 00 f8 36 2c c0 c6 05 00 73 31 c0 01 56 9d ff 4a 14
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
OK, getting closer. I can *not* get this to happen in 2.5.73, but it does oops in 2.5.75 (2.5.74 wouldn't boot, something about bad gzip magic). I will start trying bk snapshots now.
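Narrowing this down between a known-good and a known-bad kernel is a plain binary search. A minimal sketch of the idea follows, assuming an ordered list of builds where the first is good and the last is bad. The snapshot labels and the `is_bad` callback are placeholders, not real snapshot names, and the sketch does not handle builds that cannot be tested at all (such as the 2.5.74 that would not boot).

```python
def first_bad(versions, is_bad):
    """Return the earliest version for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad,
    and that once the bug appears it stays in every later version.
    """
    lo, hi = 0, len(versions) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Hypothetical labels for illustration only; "snap-N" are not real names.
builds = ["2.5.73", "snap-1", "snap-2", "snap-3", "snap-4", "snap-5", "2.5.75"]
# Pretend the oops first shows up in snap-4 (made-up test outcome).
oopses = {"snap-4", "snap-5", "2.5.75"}

print(first_bad(builds, lambda v: v in oopses))  # snap-4
```

In practice `is_bad` would be "boot that build, run the ping flood, see whether it oopses", so each step costs a compile and a reboot; halving the range keeps the number of builds to roughly log2 of the snapshot count.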
This problem only happens if CONFIG_DEBUG_PAGEALLOC=y
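CONFIG_DEBUG_PAGEALLOC unmaps pages from the kernel mapping when they are freed, so a stale pointer into a freed page faults instead of silently reading old data. As a rough cross-check, the `*pte` values printed in the oopses in this thread can be decoded with the standard x86 page-table flag bits. This is only a sketch: it assumes the default i386 PAGE_OFFSET of 0xC0000000 and non-PAE 4 KB pages, and the helper names are made up.

```python
PAGE_OFFSET = 0xC0000000  # assumption: default i386 kernel/user split

# Low flag bits of a 32-bit (non-PAE) x86 page-table entry.
PTE_FLAGS = {
    0x001: "present",
    0x002: "writable",
    0x004: "user",
    0x008: "write-through",
    0x010: "cache-disabled",
    0x020: "accessed",
    0x040: "dirty",
    0x100: "global",
}


def decode_pte(pte):
    """Split a PTE into (physical frame address, list of set flag names)."""
    frame = pte & 0xFFFFF000
    flags = [name for bit, name in PTE_FLAGS.items() if pte & bit]
    return frame, flags


def lowmem_frame(vaddr):
    """Physical frame a directly mapped kernel address would point at."""
    return (vaddr - PAGE_OFFSET) & 0xFFFFF000


# (faulting virtual address, *pte) pairs copied from the oopses in this thread.
reports = [
    (0xC4F66068, 0x04F66000),
    (0xC5100068, 0x05100163),
    (0xC56A0068, 0x056A0163),
    (0xC5234068, 0x05234163),
]

for vaddr, pte in reports:
    frame, flags = decode_pte(pte)
    print(hex(vaddr), hex(frame), frame == lowmem_frame(vaddr), flags)
```

In all four reports the frame in the PTE equals the faulting address minus PAGE_OFFSET, so these are ordinary directly mapped kernel pages. The first report's PTE has the present bit clear, which fits a page that was unmapped on free. The other three show the present bit set at the time the oops was printed; this thread does not establish why, so treat that as an open observation rather than a conclusion.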