Most recent kernel where this bug did not occur: unknown
Distribution: gentoo
Hardware Environment: Athlon64, KT800, Yukon-Lite rev 7 (Marvell 88E8001 rev 13), gigabit network (other side is e1000)
Software Environment: 2.6.14.2 kernel, ethtool 3

Problem Description:
While trying to speed up traffic between my two home machines I started tuning the network card options with ethtool. Everything went nicely and I got rid of RX overruns until I ran the following commands:

%> ethtool -g lan
Ring parameters for lan:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             1024
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             256

%> ethtool -G lan rx 4096
%> netperf -cC -f M -t TCP_STREAM -H dual -- -C
   (did not seem to do anything, so ^C)
%> ethtool -G lan rx 2048
Killed
%> ethtool -G lan rx 2048
   (has not yet returned)

Before this the network card was configured for gigabit Ethernet with forced rx and tx flow control. MTU was 9000. There was no traffic when the above commands were run.

The system log contained the following oops:

skge lan: Link is up at 1000 Mbps, full duplex, flow control tx and rx
skge lan: disabling interface
skge lan: enabling interface
NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
skge lan: disabling interface
Unable to handle kernel NULL pointer dereference at 0000000000000008
RIP: <ffffffff8805b990>{:skge:skge_rx_clean+16}
PGD 9586067 PUD 1986e067 PMD 0
Oops: 0000 [1]
CPU 0
Modules linked in: ipt_LOG ipt_TOS ipt_TCPMSS ipt_tos ip_nat_ftp ipt_tcpmss ip_nat_irc ip_conntrack_irc ipt_multiport ipt_state ipt_limit ipt_conntrack ip_conntrack_ftp iptable_mangle nfsd exportfs skge iptable_nat ipt_MASQUERADE ip_nat iptable_filter ip_tables 3c59x mii mga cpufreq_ondemand cpufreq_powersave snd_pcm_oss snd_mixer_oss usb_storage uhci_hcd ehci_hcd
Pid: 514, comm: ethtool Not tainted 2.6.14-gentoo-r2 #2
RIP: 0010:[<ffffffff8805b990>] <ffffffff8805b990>{:skge:skge_rx_clean+16}
RSP: 0018:ffff810004fdf9e8  EFLAGS: 00010286
RAX: ffff810004fdffd8 RBX: 0000000000000000 RCX: 000000000d1224f8
RDX: 00000000000377d6 RSI: 0000000000002880 RDI: ffff810032868ba0
RBP: ffff810032868be0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffff81003e7f2dc0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001023
FS:  00002aaaaade8ae0(0000) GS:ffffffff80545800(0000) knlGS:000000005627ddb0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000000349a000 CR4: 00000000000006e0
Process ethtool (pid: 514, threadinfo ffff810004fde000, task ffff810033553790)
Stack: ffff81003e7f2dc0 ffff81003e7f2dc0 ffff810032868ba0 ffffffff8805dee1
       ffff810032868800 ffff810032868800 0000000000514b20 ffffffff88063260
       00007fffffa1c660 ffffffff8805dfd9
Call Trace:
 <ffffffff8805dee1>{:skge:skge_down+817}
 <ffffffff8805dfd9>{:skge:skge_set_ring_param+73}
 <ffffffff8032f684>{dev_ethtool+1956}
 <ffffffff8015e880>{do_no_page+1040}
 <ffffffff80153463>{__alloc_pages+275}
 <ffffffff8015e63a>{do_no_page+458}
 <ffffffff8015eab7>{__handle_mm_fault+391}
 <ffffffff8022c834>{prio_tree_insert+484}
 <ffffffff8023bd7a>{extract_buf+266}
 <ffffffff8023bd7a>{extract_buf+266}
 <ffffffff8014eeae>{find_get_page+14}
 <ffffffff8014fe5a>{filemap_nopage+394}
 <ffffffff8015e88a>{do_no_page+1050}
 <ffffffff8032e555>{netdev_run_todo+53}
 <ffffffff8032e052>{dev_ioctl+578}
 <ffffffff8036589a>{inet_ioctl+138}
 <ffffffff8032378c>{sock_ioctl+604}
 <ffffffff8017ffe1>{do_ioctl+33}
 <ffffffff801802ab>{vfs_ioctl+651}
 <ffffffff8018031d>{sys_ioctl+77}
 <ffffffff8010d8a6>{system_call+126}
Code: 48 8b 43 08 c7 00 00 00 00 00 48 83 7b 10 00 74 4d 49 8b 4c
RIP <ffffffff8805b990>{:skge:skge_rx_clean+16} RSP <ffff810004fdf9e8>
CR2: 0000000000000008
<6>NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out

Here is what I got with SysRq+T for the last ethtool instance, which is still hanging:

ethtool       D ffffffff8044c160     0   515   6826          (NOTLB)
 ffff81002c50dd28 0000000000000086 879c317800000286 ffff81002c50de58
 0000000000000004 ffff81003f285710 ffffffff80410740 ffff81003f285928
 ffff81002c50dec8 ffffffff8023bd7a
Call Trace:
 <ffffffff8023bd7a>{extract_buf+266}
 <ffffffff8039244f>{__down+143}
 <ffffffff8012ba10>{default_wake_function+0}
 <ffffffff8039224a>{__down_failed+53}
 <ffffffff80323530>{sock_ioctl+0}
 <ffffffff80335c62>{.text.lock.rtnetlink+5}
 <ffffffff8032e04a>{dev_ioctl+570}
 <ffffffff8036589a>{inet_ioctl+138}
 <ffffffff8032378c>{sock_ioctl+604}
 <ffffffff8017ffe1>{do_ioctl+33}
 <ffffffff801802ab>{vfs_ioctl+651}
 <ffffffff8018031d>{sys_ioctl+77}
 <ffffffff8010d8a6>{system_call+126}

Steps to reproduce:
I can try to reproduce if required.
Stephen, this is your driver.
The problem is caused by the ring parameter change being unable to allocate enough memory. The driver doesn't return an error, leaves the device up, and generally goes downhill from there...
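For reference, here is a minimal sketch of the error-handling pattern that implies for an ethtool .set_ringparam handler. This is not the actual skge patch; the names my_priv, my_up() and my_down() are stand-ins for the driver's real open/close helpers, and the ring limits are assumed values for the sketch:

/*
 * Illustrative sketch only: validate the requested ring sizes, tear the
 * device down, try to bring it back up with the new sizes, and if that
 * re-allocation fails, report the error and close the interface instead
 * of leaving it "up" with freed/NULL ring state.
 */
#include <linux/netdevice.h>
#include <linux/ethtool.h>

#define MY_MAX_RX_RING 4096	/* assumed hardware limits for the sketch */
#define MY_MAX_TX_RING 1024

struct my_priv {
	struct net_device *netdev;
	unsigned int rx_ring_size;
	unsigned int tx_ring_size;
};

/* my_up() is assumed to allocate the descriptor rings and may fail. */
extern int my_up(struct my_priv *priv);
extern void my_down(struct my_priv *priv);

static int my_set_ringparam(struct net_device *dev,
			    struct ethtool_ringparam *ring)
{
	struct my_priv *priv = netdev_priv(dev);
	int err = 0;

	if (ring->rx_pending == 0 || ring->rx_pending > MY_MAX_RX_RING ||
	    ring->tx_pending == 0 || ring->tx_pending > MY_MAX_TX_RING)
		return -EINVAL;

	if (netif_running(dev))
		my_down(priv);

	priv->rx_ring_size = ring->rx_pending;
	priv->tx_ring_size = ring->tx_pending;

	if (netif_running(dev)) {
		err = my_up(priv);
		if (err)
			/* Don't continue with a dead device; let the
			 * networking core mark the interface down
			 * (set_ringparam runs with the RTNL held). */
			dev_close(dev);
	}

	return err;
}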
Patch submitted to Jeff as part of 0.13 (hopefully for 2.6.15).
If it's small enough, could you also submit it for 2.6.14.5?