Bug 5785 - allocation failure and dead skge interface when changing mtu to 9000
Summary: allocation failure and dead skge interface when changing mtu to 9000
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: i386 Linux
Importance: P2 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-26 09:34 UTC by Mikko Tiihonen
Modified: 2006-02-03 15:27 UTC
CC List: 0 users

See Also:
Kernel Version: 2.6.14.4 (2.6.14-gentoo-r5)
Subsystem:
Regression: ---
Bisected commit-id:


Description Mikko Tiihonen 2005-12-26 09:34:13 UTC
Most recent kernel where this bug did not occur:
Distribution: gentoo
Hardware Environment:
 Athlon64 KT800
 Yukon-Lite rev 7 (Marvell 88E8001 rev 13)
 Gigabit network (other side is e1000)
Software Environment:
 2.6.14.2 kernel
 ifconfig 1.42
 ethtool 3
Problem Description:

When I tried to increase the MTU of a network interface that had carried almost
no traffic, a memory allocation failure occurred (the machine had been up for
less than a day). After the allocation failure, every command on the interface
failed. Eventually, traffic on other interfaces failed as well.
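
The syslog below reports order:2, i.e. a request for 2^2 = 4 physically
contiguous pages (16 KiB). A rough Python model shows why a 9000-byte MTU lands
there while the default 1500-byte MTU does not: the receive buffer is rounded up
to a power-of-two kmalloc cache, and one 16 KiB object needs an order-2 block.
The 400-byte per-buffer overhead is a hypothetical stand-in for the Ethernet
header, alignment padding and struct skb_shared_info, and the
one-object-per-block rule only holds for the large caches.

```python
PAGE_SIZE = 4096  # x86_64 base page size in bytes

# Hypothetical per-buffer overhead (Ethernet header, alignment padding,
# struct skb_shared_info); the real skge/alloc_skb arithmetic may differ.
OVERHEAD = 400


def kmalloc_size_class(nbytes, smallest=32):
    """Round a request up to the next power-of-two general cache size."""
    size = smallest
    while size < nbytes:
        size <<= 1
    return size


def page_order(nbytes):
    """Buddy order of a block big enough for ONE object of that size class.

    Simplification: small caches pack several objects per page, but for the
    16 KiB cache one object fills an order-2 (4-page) block.
    """
    pages = -(-kmalloc_size_class(nbytes) // PAGE_SIZE)  # ceiling division
    return (pages - 1).bit_length()  # 1 page -> 0, 2 -> 1, 3 or 4 -> 2


for mtu in (1500, 9000):
    need = mtu + OVERHEAD
    print(mtu, need, kmalloc_size_class(need), page_order(need))
# 1500 1900 2048 0
# 9000 9400 16384 2
```

Under these assumptions the jumbo-frame buffer needs four contiguous pages per
receive slot, which is much harder to satisfy on a busy machine than single pages.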

command: ifconfig lan mtu 9000
syslog:
skge lan: Link is up at 1000 Mbps, full duplex, flow control none
skge lan: disabling interface
skge lan: enabling interface
ifconfig: page allocation failure. order:2, mode:0x20

Call Trace:<ffffffff80152a4e>{__alloc_pages+942}
<ffffffff80155af1>{cache_alloc_refill+577}
<ffffffff801554c2>{__kmalloc+98} <ffffffff803213d8>{__alloc_skb+104}
<ffffffff88010158>{:skge:skge_up+280} <ffffffff880111a3>{:skge:skge_change_mtu+83}
<ffffffff80327528>{dev_set_mtu+72} <ffffffff80327c9b>{dev_ioctl+731}
<ffffffff8035f3ca>{inet_ioctl+138} <ffffffff8031d3ec>{sock_ioctl+556}
<ffffffff8017e4d1>{do_ioctl+33} <ffffffff8017e79b>{vfs_ioctl+651}
<ffffffff8017e80d>{sys_ioctl+77} <ffffffff8010d896>{system_call+126}

Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1 used:4
cpu 0 cold: low 0, high 2, batch 1 used:1
Normal per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:159
cpu 0 cold: low 0, high 62, batch 31 used:5
HighMem per-cpu: empty
Free pages:        5860kB (0kB HighMem)
Active:167160 inactive:75107 dirty:27 writeback:0 unstable:0 free:1465 slab:8553
mapped:162671 pagetables:1901
DMA free:4052kB min:60kB low:72kB high:88kB active:7328kB inactive:0kB
present:15996kB pages_scanned:748 all_unreclaimable? no
lowmem_reserve[]: 0 1007 1007
Normal free:1808kB min:4028kB low:5032kB high:6040kB active:661312kB
inactive:300428kB present:1031360kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB
0*4096kB = 4052kB
Normal: 260*4kB 46*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 1808kB
HighMem: empty
Swap cache: add 35, delete 35, find 0/0, race 0+0
Free swap  = 2048044kB
Total swap = 2048184kB
Free swap:       2048044kB
261936 pages of RAM
5346 reserved pages
52621 pages shared
0 pages swap cached

NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
... (repeats every 5 seconds until the machine is rebooted)
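
The Mem-info dump helps explain why the order-2 request failed although 5860kB
was free in total. The Normal zone had 1808kB free against a min watermark of
4028kB, and most of that was single 4kB pages. The DMA zone had 4052kB free, but
its lowmem_reserve entry of 1007 pages holds that memory back from Normal-zone
requests. The sketch below is a simplified Python model of the buddy
allocator's watermark test, written after the 2.6-era zone_watermark_ok(). The
reduction rules for __GFP_HIGH and for non-sleeping callers, and the flag values
used to decode mode:0x20 as GFP_ATOMIC, are assumptions rather than a copy of
the kernel source, and only the final min-watermark pass is modeled.

```python
# Flag values assumed from 2.6.14's include/linux/gfp.h (hypothetical here).
GFP_WAIT = 0x10  # caller may sleep and wait for reclaim
GFP_HIGH = 0x20  # caller may dip into the emergency reserve


def parse_buddy_line(line):
    """Turn '260*4kB 46*8kB ... = 1808kB' into free-block counts by order."""
    counts = []
    for order, tok in enumerate(line.split("=")[0].split()):
        n, size = tok.split("*")
        if size != f"{4 << order}kB":
            raise ValueError(f"unexpected column {tok!r}")
        counts.append(int(n))
    return counts


def watermark_ok(free_area, order, mark, lowmem_reserve=0,
                 gfp_high=False, try_harder=False):
    """Simplified model of a per-zone watermark test (all values in pages)."""
    free = sum(n << o for o, n in enumerate(free_area)) - (1 << order) + 1
    min_pages = mark
    if gfp_high:
        min_pages -= min_pages // 2
    if try_harder:
        min_pages -= min_pages // 4
    if free <= min_pages + lowmem_reserve:
        return False
    for o in range(order):
        free -= free_area[o] << o  # order-o blocks cannot serve this request
        min_pages >>= 1            # fewer pages must remain at higher orders
        if free <= min_pages:
            return False
    return True


normal = parse_buddy_line(
    "260*4kB 46*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB "
    "0*1024kB 0*2048kB 0*4096kB = 1808kB")
dma = parse_buddy_line(
    "1*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB "
    "1*1024kB 1*2048kB 0*4096kB = 4052kB")

mode = 0x20
high = bool(mode & GFP_HIGH)
harder = not (mode & GFP_WAIT)  # non-sleeping callers may try harder

# Normal zone: min:4028kB = 1007 pages, no reserve against itself.
print(watermark_ok(normal, 0, 1007, 0, high, harder))  # True
print(watermark_ok(normal, 2, 1007, 0, high, harder))  # False
# DMA zone: min:60kB = 15 pages, lowmem_reserve[] entry of 1007 pages.
print(watermark_ok(dma, 2, 15, 1007, high, harder))    # False
```

In this model a single-page buffer still passes, while the order-2 request is
refused by both zones, which is consistent with the failure in the log.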

command: ethtool -A lan rx on
syslog:
skge lan: disabling interface
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/page_alloc.c:1019
invalid operand: 0000 [1] 
CPU 0 
Modules linked in: skge mga ipt_TOS iptable_mangle ipt_MASQUERADE iptable_nat
ip_nat ipt_TCPMSS ipt_LOG ipt_limit ipt_state iptable_filter ip_tables
cpufreq_ondemand cpufreq_powersave snd_pcm_oss snd_mixer_oss nfsd exportfs
usb_storage uhci_hcd ehci_hcd 3c59x mii
Pid: 5683, comm: ethtool Not tainted 2.6.14-gentoo-r5 #1
RIP: 0010:[<ffffffff80152b4e>] <ffffffff80152b4e>{__free_pages+14}
RSP: 0018:ffff81000c7efa00  EFLAGS: 00010256
RAX: 0000000000000000 RBX: ffff81003f482c40 RCX: 0000000000000000
RDX: 000000000018d668 RSI: 0000000000000003 RDI: ffff810001c6b340
RBP: ffff81003f482c40 R08: 0000000000005000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffff81003e1a83a0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001023
FS:  00002aaaaade8ae0(0000) GS:ffffffff8053c800(0000) knlGS:0000000056299dc0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaac71570 CR3: 000000001b37b000 CR4: 00000000000006e0
Process ethtool (pid: 5683, threadinfo ffff81000c7ee000, task ffff810009aec7b0)
Stack: ffffffff88010f23 ffff81003e1a8000 ffff81003e1a8000 0000000000514b10 
ffffffff88016260 00007fffffc2fbf0 ffffffff88010f75 ffff81000c7efae8 
ffffffff803292a3 0000000000000001 
Call Trace:<ffffffff88010f23>{:skge:skge_down+883}
<ffffffff88010f75>{:skge:skge_set_pauseparam+69}
<ffffffff803292a3>{dev_ethtool+2131} <ffffffff80152793>{__alloc_pages+243}
<ffffffff80154ade>{__do_page_cache_readahead+126}
<ffffffff8015dc37>{__handle_mm_fault+391}
<ffffffff80229114>{prio_tree_insert+484} <ffffffff8023845a>{extract_buf+266}
<ffffffff8023845a>{extract_buf+266} <ffffffff8014e43e>{find_get_page+14}
<ffffffff8014f27a>{filemap_nopage+394} <ffffffff8015da0a>{do_no_page+1050}
<ffffffff80327bf2>{dev_ioctl+562} <ffffffff8035f3ca>{inet_ioctl+138}
<ffffffff8031d3ec>{sock_ioctl+556} <ffffffff8017e4d1>{do_ioctl+33}
<ffffffff8017e79b>{vfs_ioctl+651} <ffffffff8017e80d>{sys_ioctl+77}
<ffffffff8010d896>{system_call+126} 

Code: 0f 0b 68 2b 3c 3b 80 c2 fb 03 83 47 08 ff 0f 98 c0 84 c0 74 
RIP <ffffffff80152b4e>{__free_pages+14} RSP <ffff81000c7efa00>
<6>NETDEV WATCHDOG: lan: transmit timed out

After that, the system starts to fall apart:
general protection fault: 0000 [2] 
CPU 0 
Modules linked in: skge mga ipt_TOS iptable_mangle ipt_MASQUERADE iptable_nat
ip_nat ipt_TCPMSS ipt_LOG ipt_limit ipt_state iptable_filter ip_tables
cpufreq_ondemand cpufreq_powersave snd_pcm_oss snd_mixer_oss nfsd exportfs
usb_storage uhci_hcd ehci_hcd 3c59x mii
Pid: 5709, comm: file Not tainted 2.6.14-gentoo-r5 #1
RIP: 0010:[<ffffffff80152002>] <ffffffff80152002>{__rmqueue+82}
RSP: 0000:ffff8100223d1c98  EFLAGS: 00010083
RAX: 38ffff8100017660 RBX: ffffffff80418da8 RCX: ffff810001840039
RDX: ffffffff80418e38 RSI: 0000000000000001 RDI: ffffffff80418dc0
RBP: ffffffff80418df0 R08: ffffffff80418e38 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff80418de0 R14: ffffffff80418da8 R15: ffff810001840011
FS:  00002aaaaaef86d0(0000) GS:ffffffff8053c800(0000) knlGS:0000000056299dc0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaab223000 CR3: 000000002f77f000 CR4: 00000000000006e0
Process file (pid: 5709, threadinfo ffff8100223d0000, task ffff810005e38280)
Stack: 0000000025de4a28 ffffffff80418da8 ffffffff80418df0 0000000000000000 
ffffffff80418de0 000000000000000f 0000000000000000 ffffffff8015241c 
0000000f00000000 000000000000001f 
Call Trace:<ffffffff8015241c>{buffered_rmqueue+124}
<ffffffff80152793>{__alloc_pages+243}
<ffffffff8015d6e0>{do_no_page+240} <ffffffff8015dc37>{__handle_mm_fault+391}
<ffffffff8038c6b6>{do_page_fault+998} <ffffffff8010e159>{error_exit+0}

The following program was communicating on a totally different interface:

Code: 48 89 50 08 48 89 02 48 c7 41 08 00 02 20 00 48 c7 01 00 01 
RIP <ffffffff80152002>{__rmqueue+82} RSP <ffff8100223d1c98>
<6>NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
general protection fault: 0000 [4] 
CPU 0 
Modules linked in: skge mga ipt_TOS iptable_mangle ipt_MASQUERADE iptable_nat
ip_nat ipt_TCPMSS ipt_LOG ipt_limit ipt_state iptable_filter ip_tables
cpufreq_ondemand cpufreq_powersave snd_pcm_oss snd_mixer_oss nfsd exportfs
usb_storage uhci_hcd ehci_hcd 3c59x mii
Pid: 3443, comm: vncviewer Not tainted 2.6.14-gentoo-r5 #1
RIP: 0010:[<ffffffff80152002>] <ffffffff80152002>{__rmqueue+82}
RSP: 0018:ffff81000e225ab8  EFLAGS: 00010083
RAX: 38ffff8100017660 RBX: ffffffff80418da8 RCX: ffff810001840039
RDX: ffffffff80418e38 RSI: 0000000000000001 RDI: ffffffff80418dc0
RBP: 0000000000000000 R08: ffffffff80418e38 R09: 0000000000000002
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000001 R14: ffffffff80418da8 R15: ffff810001840011
FS:  00002aaaabb26f60(0000) GS:ffffffff8053c800(0000) knlGS:0000000056299dc0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000633860 CR3: 0000000019ee4000 CR4: 00000000000006e0
Process vncviewer (pid: 3443, threadinfo ffff81000e224000, task ffff8100109d58d0)
Stack: 0000000100000046 ffffffff80418da8 0000000000000000 0000000000000000 
0000000000000001 ffffffff804191f8 0000000000000001 ffffffff801524a9 
ffff810015a9b1d8 ffff810015a9b1d8 
Call Trace:<ffffffff801524a9>{buffered_rmqueue+265}
<ffffffff8033df39>{ip_output+537}
<ffffffff80152793>{__alloc_pages+243} <ffffffff80155af1>{cache_alloc_refill+577}
<ffffffff801554c2>{__kmalloc+98} <ffffffff803213d8>{__alloc_skb+104}
<ffffffff8031f67a>{sock_alloc_send_skb+106} <ffffffff80352bd5>{tcp_v4_do_rcv+37}
<ffffffff80344d82>{tcp_recvmsg+1858} <ffffffff80371bcb>{unix_stream_sendmsg+395}
<ffffffff8031cfb8>{sock_aio_write+280} <ffffffff8016c733>{do_sync_write+211}
<ffffffff8017ee50>{__pollwait+0} <ffffffff80142180>{autoremove_wake_function+0}
<ffffffff80372787>{unix_ioctl+183} <ffffffff8031d3ec>{sock_ioctl+556}
<ffffffff8016c851>{vfs_write+225} <ffffffff8016c9d3>{sys_write+83}
<ffffffff8010d896>{system_call+126} 

Code: 48 89 50 08 48 89 02 48 c7 41 08 00 02 20 00 48 c7 01 00 01 
RIP <ffffffff80152002>{__rmqueue+82} RSP <ffff81000e225ab8>
<6>NETDEV WATCHDOG: lan: transmit timed out
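
One plausible reading of the ethtool trace is that skge_down(), called from
skge_set_pauseparam(), tried to free receive buffers that the failed skge_up()
had never set up, which tripped the BUG in __free_pages(). Independently of how
the driver was actually fixed, a common way to keep a failed MTU change from
leaving an interface dead is to allocate the complete new receive ring before
giving up the old one. The Python sketch below models only that ordering; the
dict-based device, make_allocator(), AllocError and the 400-byte overhead are
hypothetical and are not skge code.

```python
class AllocError(Exception):
    """Stands in for a failed kernel allocation (hypothetical)."""


def make_allocator(budget):
    """Hypothetical allocator that succeeds `budget` times, then fails."""
    left = budget

    def alloc(size):
        nonlocal left
        if left == 0:
            raise AllocError(f"cannot allocate {size} bytes")
        left -= 1
        return bytearray(size)

    return alloc


def fill_ring(nslots, size, alloc):
    """Return a fully populated ring, or raise and leave nothing behind."""
    return [alloc(size) for _ in range(nslots)]


def change_mtu(dev, new_mtu, alloc, overhead=400):
    """Build the new ring first; swap it in only if every slot succeeded."""
    try:
        new_ring = fill_ring(len(dev["ring"]), new_mtu + overhead, alloc)
    except AllocError:
        return False  # old MTU and old, complete ring stay in service
    dev["ring"], dev["mtu"] = new_ring, new_mtu  # old buffers become garbage
    return True


dev = {"mtu": 1500, "ring": [bytearray(1900) for _ in range(4)]}
print(change_mtu(dev, 9000, make_allocator(2)), dev["mtu"])  # False 1500
print(change_mtu(dev, 9000, make_allocator(4)), dev["mtu"])  # True 9000
```

The point of the design is the ordering: the old ring is dropped only after the
new one is complete, so an allocation failure leaves the interface running at
its previous MTU instead of half-configured.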



Steps to reproduce:
I can try to reproduce if required.
Comment 1 Stephen Hemminger 2006-01-20 09:44:17 UTC
Already fixed in 2.6.16 and 2.6.15.1
Comment 2 Stephen Hemminger 2006-02-03 15:27:17 UTC
Fixed and verified in 2.6.16-rc1
