Bug 937 - Oops in raw_rcv_skb if CONFIG_DEBUG_PAGEALLOC=y
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: i386 Linux
Importance: P2 normal
Assignee: Arnaldo Carvalho de Melo
Reported: 2003-07-15 14:42 UTC by Burton Windle
Modified: 2003-12-18 07:06 UTC
Kernel Version: 2.6.0-test1

Description Burton Windle 2003-07-15 14:42:00 UTC
Distribution: Debian Testing
Hardware Environment: Dual Pentium2 266, AIC-7880 SCSI, 3Com PCI 3c905C Tornado
Software Environment: gcc 3.3.1, SMP kernel, preempt on
Problem Description:

When ping-flooding another machine on the same network (using multiple 
instances of 'ping -f hostname &'), I get this oops after about 10 seconds, and 
the machine hangs:

Unable to handle kernel paging request at virtual address c4f66068
 printing eip:
c02b6bd0
*pde = 00014067
*pte = 04f66000
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c02b6bd0>]    Not tainted
EFLAGS: 00010246
EIP is at raw_rcv_skb+0x190/0x260
eax: 00000040   ebx: c5321060   ecx: c4dae024   edx: 00000014
esi: c5160000   edi: c5321004   ebp: c4f66004   esp: c5161b80
ds: 007b   es: 007b   ss: 0068
Process ping (pid: 294, threadinfo=c5160000 task=c52cf000)
Stack: c4f66000 c1169890 00001000 c532106c 00000216 00000000 c4f66000 0000005a
       c4f66004 c5321004 c4dae024 c510e004 c02b6d3d c5321004 c4f66004 c510e038
       00000030 00000001 c5321004 c02b681d c5321004 c4f66004 6164050a 1964050a
Call Trace:
 [<c02b6d3d>] raw_rcv+0x9d/0x110
 [<c02b681d>] raw_v4_input+0xad/0x160
 [<c029206b>] ip_local_deliver+0x9b/0x220
 [<c0292586>] ip_rcv+0x396/0x49c
 [<c0118ee8>] kernel_map_pages+0x28/0x5c
 [<c0280211>] netif_receive_skb+0x181/0x210
 [<c0280329>] process_backlog+0x89/0x120
 [<c0280455>] net_rx_action+0x95/0x130
 [<c0124595>] do_softirq+0xd5/0xe0
 [<c010c235>] do_IRQ+0x185/0x230
 [<c010a218>] common_interrupt+0x18/0x20
 [<c011007b>] alloc_ldt+0x7b/0x1f0
 [<c0143334>] kfree+0x204/0x340
 [<c027b253>] kfree_skbmem+0x13/0x30
 [<c027b253>] kfree_skbmem+0x13/0x30
 [<c027b2db>] __kfree_skb+0x6b/0xf0
 [<c02b79a3>] raw_recvmsg+0x113/0x180
 [<c02c343a>] inet_recvmsg+0x5a/0x80
 [<c027719c>] sock_recvmsg+0x9c/0xc0
 [<c0118ee8>] kernel_map_pages+0x28/0x5c
 [<c013ddc9>] __alloc_pages+0x309/0x370
 [<c0276eac>] sockfd_lookup+0x1c/0x80
 [<c02787a2>] sys_recvfrom+0xb2/0x120
 [<c016cbd4>] poll_freewait+0x44/0x50
 [<c016cfa1>] do_select+0x1f1/0x340
 [<c02790d6>] sys_socketcall+0x1e6/0x2a0
 [<c01098ab>] syscall_call+0x7/0xb
 
Code: 8b 45 64 89 3c 24 89 44 24 04 ff 97 50 01 00 00 eb b2 e8 59
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

Steps to reproduce:
Ping-flood a machine on the local LAN with multiple 'ping -f hostname &' 
instances. Very reproducible.
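
The reproduction steps above amount to starting several flood pings in parallel. A minimal sketch of that in Python, assuming a Unix `ping` that accepts `-f` and that the script runs with the privileges flood mode normally needs. The helper names `flood_commands` and `start_flood` and the default of 5 instances are illustrative assumptions, not part of the original report:

```python
import subprocess


def flood_commands(host, instances=5):
    """Build one 'ping -f <host>' argument list per requested instance."""
    if instances < 1:
        raise ValueError("need at least one ping instance")
    return [["ping", "-f", host] for _ in range(instances)]


def start_flood(host, instances=5):
    """Start the flood pings in the background, like 'ping -f host &'.

    Flood mode normally requires root. The caller is responsible for
    terminating the returned processes.
    """
    return [subprocess.Popen(argv) for argv in flood_commands(host, instances)]
```

Calling `start_flood("hostname")` would launch the pings; `flood_commands` alone only builds the command lines and has no side effects.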
Comment 1 Burton Windle 2003-07-15 14:59:52 UTC
Note: if I boot with the 'nosmp' boot parameter, I can't reproduce this.
Comment 2 Zwane Mwaikambo 2003-07-15 22:27:25 UTC
Does this happen without preempt?
Comment 3 Burton Windle 2003-07-16 06:32:58 UTC
Recompiled for SMP without preempt, very similar oops:

Unable to handle kernel paging request at virtual address c5100068
 printing eip:
c02ae8b6
*pde = 00015067
*pte = 05100163
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c02ae8b6>]    Not tainted
EFLAGS: 00010246
EIP is at raw_rcv_skb+0x156/0x220
eax: 00000040   ebx: c52ed060   ecx: c4f6e024   edx: 00000014
esi: c52ed004   edi: c5100004   ebp: c52ed06c   esp: c5661d24
ds: 007b   es: 007b   ss: 0068
Process ping (pid: 283, threadinfo=c5660000 task=c585f000)
Stack: c5100000 c1169890 00001000 00000206 00000000 c1169890 c5100000 0000005a
       c5100004 c52ed004 c4f6e024 c54a2004 c02aea1d c52ed004 c5100004 c54a2038
       00000030 00000001 c52ed004 c02ae569 c52ed004 c5100004 6164050a 1964050a
Call Trace:
 [<c02aea1d>] raw_rcv+0x9d/0x110
 [<c02ae569>] raw_v4_input+0xa9/0x130
 [<c028ad0b>] ip_local_deliver+0x8b/0x200
 [<c028b1fc>] ip_rcv+0x37c/0x47a
 [<c0118588>] kernel_map_pages+0x28/0x5c
 [<c027a205>] netif_receive_skb+0x165/0x1f0
 [<c027a319>] process_backlog+0x89/0x120
 [<c027a441>] net_rx_action+0x91/0x110
 [<c0123165>] do_softirq+0xd5/0xe0
 [<c010bfd5>] do_IRQ+0x165/0x220
 [<c010a06c>] common_interrupt+0x18/0x20
 [<c027007b>] pirq_sis_get+0x9b/0xc0
 [<c01557d2>] fput+0x2/0x20
 [<c0169a05>] poll_freewait+0x35/0x50
 [<c0169df1>] do_select+0x201/0x350
 [<c0169a20>] __pollwait+0x0/0xd0
 [<c016a24b>] sys_select+0x2db/0x4e0
 [<c027007b>] pirq_sis_get+0x9b/0xc0
 [<c01096ff>] syscall_call+0x7/0xb
 
Code: 8b 47 64 89 34 24 89 44 24 04 ff 96 50 01 00 00 eb b2 0f 0b
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
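
A note on reading the two oopses so far: in both, the first three "Code:" bytes (`8b 45 64` and `8b 47 64`) decode as a 32-bit load from a register plus 0x64, and the base register plus that offset equals the faulting virtual address. The sketch below checks this arithmetic in Python. It assumes the "Code:" bytes start at the faulting instruction, and it only handles the `8b /r` form with an 8-bit displacement; the function names are illustrative:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code):
    """Decode 'mov r32, [base + disp8]' (opcode 8b, ModRM mod=01, no SIB).

    Returns (destination register, base register, signed displacement).
    """
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not an 8b /r instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [reg + disp8] form is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


def fault_address(code_hex, regs):
    """Compute the address the decoded load would read from."""
    code = bytes.fromhex(code_hex.replace(" ", ""))
    _dest, base, disp = decode_mov_disp8(code)
    return (regs[base] + disp) & 0xFFFFFFFF


# Register values copied from the two oopses above.
first = fault_address("8b 45 64", {"ebp": 0xC4F66004})
second = fault_address("8b 47 64", {"edi": 0xC5100004})
print(hex(first), hex(second))
```

Both results match the addresses in the "Unable to handle kernel paging request" lines (c4f66068 and c5100068), which suggests the fault is a read at offset 0x64 from a pointer into a page that was not mapped at the time.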
 
Comment 4 Zwane Mwaikambo 2003-07-16 07:12:59 UTC
Hmm, I can't reproduce this locally. Also, are you saying that the machine doing
the ping -f is the one that oopses? How many concurrent ping -f instances? How 
fast a processor does the recipient host have?
Comment 5 Burton Windle 2003-07-16 07:26:35 UTC
The machine doing the pinging (2.6.0-test1) is the one that oopses; it is SMP 
with two Pentium II 266 MHz processors. Just one instance of ping doesn't seem 
to do it; I have to run about 5 or so (I do a 'ping -f hostname &' and then hit 
the up arrow and Enter a few times). The machine being pinged is a P4 2.4 GHz 
running WinXP, and it doesn't mind the ping flood.
Comment 6 Burton Windle 2003-07-16 07:30:48 UTC
Of interest, if I start my ping flood against localhost, the system doesn't 
crash.  If I do it against any host on the network, it does.
Comment 7 Zwane Mwaikambo 2003-07-16 07:44:59 UTC
Thanks, with that information I was able to reproduce it. I'll have a look from here.
Comment 8 Burton Windle 2003-07-17 09:55:49 UTC
Could this be related at all to the changes that caused 
http://bugme.osdl.org/show_bug.cgi?id=863? From that bug, it appears that the 
raw networking code was recently changed and that the new version had a few 
bugs.
Comment 9 Zwane Mwaikambo 2003-07-29 19:10:21 UTC
Can you reproduce this with 2.6.0-test2? I'm unable to.
Comment 10 Burton Windle 2003-07-30 07:03:13 UTC
No change here with 2.6.0-test2. 

Linux version 2.6.0-test2 (root@dual266) (gcc version 3.3.1 20030626 (Debian prerelease)) #38 SMP Tue Jul 29 14:36:46 EDT 2003

Unable to handle kernel paging request at virtual address c56a0068
 printing eip:
c02b6690
*pde = 00016067
*pte = 056a0163
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c02b6690>]    Not tainted
EFLAGS: 00010246
EIP is at raw_rcv_skb+0x190/0x260
eax: 00000040   ebx: c55ce060   ecx: c52e4024   edx: 00000014
esi: c78c0000   edi: c55ce004   ebp: c56a0004   esp: c78c1e4c
ds: 007b   es: 007b   ss: 0068
Process sshd (pid: 279, threadinfo=c78c0000 task=c6b43000)
Stack: c56a0000 c1169890 00001000 c55ce06c 00000216 00000000 c56a0000 0000005a
       c56a0004 c55ce004 c52e4024 c5270004 c02b67fd c55ce004 c56a0004 c5270038
       00000030 00000001 c55ce004 c02b62dd c55ce004 c56a0004 6164050a 1764050a
Call Trace:
 [<c02b67fd>] raw_rcv+0x9d/0x110
 [<c02b62dd>] raw_v4_input+0xad/0x160
 [<c0291b2b>] ip_local_deliver+0x9b/0x220
 [<c0292046>] ip_rcv+0x396/0x49c
 [<c0118ee8>] kernel_map_pages+0x28/0x5c
 [<c027fc81>] netif_receive_skb+0x181/0x210
 [<c027fd99>] process_backlog+0x89/0x120
 [<c027fec5>] net_rx_action+0x95/0x130
 [<c0124355>] do_softirq+0xd5/0xe0
 [<c010c235>] do_IRQ+0x185/0x230
 [<c010a218>] common_interrupt+0x18/0x20
 
Code: 8b 45 64 89 3c 24 89 44 24 04 ff 97 50 01 00 00 eb b2 e8 99
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
 
Comment 11 Burton Windle 2003-07-30 07:53:29 UTC
Just in case GCC 3.3.1 was being buggy, I did a 'make mrproper' and recompiled 
with a known-good GCC, and I still get the same oops.

Linux version 2.6.0-test2 (root@dual266) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 SMP Wed Jul 30 10:47:42 EDT 2003
Comment 12 Zwane Mwaikambo 2003-07-30 08:02:18 UTC
Interesting comment. Oddly, I have trouble reproducing this now, but that
coincides with my switch to gcc 2.96 for local kernel builds.
Comment 13 Zwane Mwaikambo 2003-07-31 22:40:50 UTC
OK, my 3-way system triggers it much more easily with the same kernel.
Comment 14 Zwane Mwaikambo 2003-07-31 22:45:47 UTC
Could you please also post your .config?
Comment 15 Burton Windle 2003-08-01 06:25:40 UTC
Config from 2.6.0-test2:

bwindle@dual266:/home/kernel/linux$ cat .config | grep -v "is not set" | grep -v "^#"
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_EXPERIMENTAL=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_KALLSYMS=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_X86_PC=y
CONFIG_MPENTIUMII=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_PREEMPT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_MICROCODE=y
CONFIG_NOHIGHMEM=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_HOTPLUG=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_PNP=y
CONFIG_PNP_NAMES=y
CONFIG_BLK_DEV_FD=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_REPORT_LUNS=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=253
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
CONFIG_AIC7XXX_DEBUG_ENABLE=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC7XXX_REG_PRETTY_PRINT=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IPV6_SCTP__=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_NET_PCI=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_SERIAL=y
CONFIG_INPUT_MISC=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_JBD=y
CONFIG_FS_MBCACHE=y
CONFIG_AUTOFS4_FS=y
CONFIG_ISO9660_FS=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_NFS_FS=y
CONFIG_NFSD=y
CONFIG_LOCKD=y
CONFIG_EXPORTFS=y
CONFIG_SUNRPC=y
CONFIG_SMB_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_SMB_NLS=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_IOVIRT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_X86_EXTRA_IRQS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
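
For anyone comparing configs, the grep pipeline used above can be mirrored in a few lines of Python, which also makes it easy to look up a single option. This is a minimal sketch; the function names `enabled_options` and `option_value` are illustrative and not from any kernel tool:

```python
def enabled_options(config_text):
    """Keep the lines that survive: grep -v "is not set" | grep -v "^#"."""
    kept = []
    for line in config_text.splitlines():
        if "is not set" in line or line.startswith("#"):
            continue
        kept.append(line)
    return kept


def option_value(config_text, name):
    """Return the value of an enabled option, or None if it is not enabled."""
    for line in enabled_options(config_text):
        key, sep, value = line.partition("=")
        if sep and key == name:
            return value
    return None


# Tiny made-up sample in .config syntax, for illustration only.
sample = """#
# Automatically generated file
#
CONFIG_SMP=y
# CONFIG_HIGHMEM4G is not set
CONFIG_NR_CPUS=2
"""
print(enabled_options(sample))
print(option_value(sample, "CONFIG_NR_CPUS"))
```

Reading the real file with `open(".config").read()` and passing the text to `option_value` would show whether a given debug option is set in a particular build.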
Comment 16 Burton Windle 2003-08-06 07:30:43 UTC
With 2.6.0-test2-bk6 the result is slightly different: now I get an oops, and then a BUG.

Unable to handle kernel paging request at virtual address c5234068
 printing eip:
c02856ca
*pde = 00015067
*pte = 05234163
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c02856ca>]    Not tainted
EFLAGS: 00010246
EIP is at raw_rcv_skb+0x1ba/0x20c
eax: 00000040   ebx: c7753060   ecx: 00000000   edx: 00000014
esi: c775306c   edi: c7753004   ebp: c5234004   esp: c4955c40
ds: 007b   es: 007b   ss: 0068
Process sshd (pid: 312, threadinfo=c4954000 task=c6a97000)
Stack: c5234004 c7753004 00000001 c4e5e004 00000206 00000000 c02857d5 c7753004
       c5234004 c7753004 c57c3024 c4e5e004 c02853ac c7753004 c5234004 c4e5e004
       c7808004 00000001 c0263ba1 c4e5e004 c57c3024 00000001 c7afb004 c57c3024
Call Trace:
 [<c02857d5>] raw_rcv+0xb9/0xc4
 [<c02853ac>] raw_v4_input+0xb0/0x108
 [<c0263ba1>] ip_local_deliver+0xa5/0x1e0
 [<c0264071>] ip_rcv+0x395/0x432
 [<c013bb72>] kmem_cache_free+0x2aa/0x2b4
 [<c02540e0>] netif_receive_skb+0x16c/0x1b8
 [<c02541b1>] process_backlog+0x85/0x114
 [<c02542c5>] net_rx_action+0x85/0x140
 [<c0120cfb>] do_softirq+0x6b/0xd8
 [<c010b4e6>] do_IRQ+0x1de/0x1f8
 [<c0109ae4>] common_interrupt+0x18/0x20
 [<c011007b>] old_mmap+0x6b/0x140
 [<c013999d>] poison_obj+0x59/0x6c
 [<c013b77a>] __kmalloc+0x11e/0x1c4
 [<c024f94c>] alloc_skb+0x3c/0xd8
 [<c024f94c>] alloc_skb+0x3c/0xd8
 [<c026db4a>] tcp_sendmsg+0x2de/0x14b8
 [<c029059d>] inet_sendmsg+0x41/0x48
 [<c024c39d>] sock_aio_write+0xc5/0xd0
 [<c014d53d>] do_sync_write+0x81/0xb0
 [<c0118428>] default_wake_function+0x0/0x20
 [<c014d61d>] vfs_write+0xb1/0xd0
 [<c014d6b9>] sys_write+0x31/0x4c
 [<c0109177>] syscall_call+0x7/0xb
 
Code: 8b 45 64 50 57 8b 87 50 01 00 00 ff d0 83 c4 08 83 7c 24 14
 <0>Kernel panic: Fatal exception in interrupt
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:75!
invalid operand: 0000 [#2]
CPU:    0
EIP:    0060:[<c011d7d9>]    Not tainted
EFLAGS: 00010002
EIP is at printk+0x1d1/0x250
eax: 00000001   ebx: c03aba2e   ecx: c03172e8   edx: c4954000
esi: 00000202   edi: 0000002e   ebp: c4955b3c   esp: c4955b0c
ds: 007b   es: 007b   ss: 0068
Process sshd (pid: 312, threadinfo=c4954000 task=c6a97000)
Stack: c02bae39 00000000 c4955c0c c011cd81 c02c3635 c03ab5c0 c03ab5c0 00000400
       c02bae55 c4955b48 00000001 c4954000 c5234004 c010a1e6 c02bae39 05234163
       c6a97000 00000234 c0115f3e c02c2a5e c4955c0c 00000000 c7753060 00000000
Call Trace:
 [<c011cd81>] panic+0x31/0xe4
 [<c010a1e6>] die+0xf6/0x138
 [<c0115f3e>] do_page_fault+0x2de/0x420
 [<c0115c60>] do_page_fault+0x0/0x420
 [<c020d2cf>] boomerang_interrupt+0x14b/0x460
 [<c010afe1>] handle_IRQ_event+0x31/0x58
 [<c010afe1>] handle_IRQ_event+0x31/0x58
 [<c010b480>] do_IRQ+0x178/0x1f8
 [<c0109be1>] error_code+0x2d/0x38
 [<c02856ca>] raw_rcv_skb+0x1ba/0x20c
 [<c02857d5>] raw_rcv+0xb9/0xc4
 [<c02853ac>] raw_v4_input+0xb0/0x108
 [<c0263ba1>] ip_local_deliver+0xa5/0x1e0
 [<c0264071>] ip_rcv+0x395/0x432
 [<c013bb72>] kmem_cache_free+0x2aa/0x2b4
 [<c02540e0>] netif_receive_skb+0x16c/0x1b8
 [<c02541b1>] process_backlog+0x85/0x114
 [<c02542c5>] net_rx_action+0x85/0x140
 [<c0120cfb>] do_softirq+0x6b/0xd8
 [<c010b4e6>] do_IRQ+0x1de/0x1f8
 [<c0109ae4>] common_interrupt+0x18/0x20
 [<c011007b>] old_mmap+0x6b/0x140
 [<c013999d>] poison_obj+0x59/0x6c
 [<c013b77a>] __kmalloc+0x11e/0x1c4
 [<c024f94c>] alloc_skb+0x3c/0xd8
 [<c024f94c>] alloc_skb+0x3c/0xd8
 [<c026db4a>] tcp_sendmsg+0x2de/0x14b8
 [<c029059d>] inet_sendmsg+0x41/0x48
 [<c024c39d>] sock_aio_write+0xc5/0xd0
 [<c014d53d>] do_sync_write+0x81/0xb0
 [<c0118428>] default_wake_function+0x0/0x20
 [<c014d61d>] vfs_write+0xb1/0xd0
 [<c014d6b9>] sys_write+0x31/0x4c
 [<c0109177>] syscall_call+0x7/0xb
 
Code: 0f 0b 4b 00 f8 36 2c c0 c6 05 00 73 31 c0 01 56 9d ff 4a 14
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
 
Comment 17 Burton Windle 2003-08-06 09:25:38 UTC
OK, getting closer. I can *not* get this to happen in 2.5.73, but it does oops 
in 2.5.75 (2.5.74 wouldn't boot, something about bad gzip magic). I will start 
trying bk snapshots now.
Comment 18 Burton Windle 2003-08-28 07:11:44 UTC
This problem only happens if CONFIG_DEBUG_PAGEALLOC=y 
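
For context, CONFIG_DEBUG_PAGEALLOC unmaps freed pages from the kernel address space, so a read of already-freed memory faults instead of quietly returning stale data. That fits the oops appearing only with the option enabled, although it does not by itself show where the stale access comes from. The `*pte` line in each oops can be read with the standard i386 page-table flag bits. A small sketch (the name `pte_flags` is illustrative):

```python
# Low flag bits of a 32-bit x86 page-table entry.
PTE_FLAGS = [
    (0x001, "present"),
    (0x002, "writable"),
    (0x004, "user"),
    (0x008, "write-through"),
    (0x010, "cache-disable"),
    (0x020, "accessed"),
    (0x040, "dirty"),
    (0x100, "global"),
]


def pte_flags(pte):
    """Return the names of the flag bits set in a page-table entry."""
    return [name for bit, name in PTE_FLAGS if pte & bit]


# Values copied from the "*pte =" lines of the first and second oops.
print(pte_flags(0x04F66000))
print(pte_flags(0x05100163))
```

The first oops (`*pte = 04f66000`) has the present bit clear, as expected for an unmapped page. The later oopses print entries ending in 163, with the present bit set; one possible reading is that the page had been mapped again by the time the entry was printed, but that is a guess.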
