Bug 5715 - oops in skge when changing rx buffer size with ethtool
Summary: oops in skge when changing rx buffer size with ethtool
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: i386 Linux
Importance: P2 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-09 07:41 UTC by Mikko Tiihonen
Modified: 2005-12-14 15:51 UTC

See Also:
Kernel Version: 2.6.14.2 (2.6.14-gentoo-r2)
Subsystem:
Regression: ---
Bisected commit-id:



Description Mikko Tiihonen 2005-12-09 07:41:33 UTC
Most recent kernel where this bug did not occur: unknown
Distribution: gentoo
Hardware Environment:
 Athlon64 KT800
 Yukon-Lite rev 7 (Marvell 88E8001 rev 13)
 Gigabit network (other side is e1000)
Software Environment:
 2.6.14.2 kernel
 ethtool 3
Problem Description:

While trying to speed up traffic between my two home machines, I started tuning
the network card options with ethtool. Everything went nicely and I got rid of
RX overruns until I ran the following commands:

%> ethtool -g lan
Ring parameters for lan:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             1024
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             256

%> ethtool -G lan rx 4096

%> netperf -cC -f M -t TCP_STREAM -H dual -- -C
- did not seem to do anything so ^C

%> ethtool -G lan rx 2048
Killed

%> ethtool -G lan rx 2048
- has not yet returned

Before this, the network card was configured for 1000 Mbps (gigabit) Ethernet
with forced rx and tx flow control. MTU was 9000. There was no traffic when the
above commands were run.

The system log contained the following oops:

skge lan: Link is up at 1000 Mbps, full duplex, flow control tx and rx
skge lan: disabling interface
skge lan: enabling interface
NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
skge lan: disabling interface
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
<ffffffff8805b990>{:skge:skge_rx_clean+16}
PGD 9586067 PUD 1986e067 PMD 0
Oops: 0000 [1]
CPU 0
Modules linked in: ipt_LOG ipt_TOS ipt_TCPMSS ipt_tos ip_nat_ftp ipt_tcpmss 
ip_nat_irc ip_conntrack_irc ipt_multiport ipt_state ipt_limit ipt_conntrack 
ip_conntrack_ftp iptable_mangle nfsd exportfs skge iptable_nat ipt_MASQUERADE 
ip_nat iptable_filter ip_tables 3c59x mii mga cpufreq_ondemand 
cpufreq_powersave snd_pcm_oss snd_mixer_oss usb_storage uhci_hcd ehci_hcd
Pid: 514, comm: ethtool Not tainted 2.6.14-gentoo-r2 #2
RIP: 0010:[<ffffffff8805b990>] <ffffffff8805b990>{:skge:skge_rx_clean+16}
RSP: 0018:ffff810004fdf9e8  EFLAGS: 00010286
RAX: ffff810004fdffd8 RBX: 0000000000000000 RCX: 000000000d1224f8
RDX: 00000000000377d6 RSI: 0000000000002880 RDI: ffff810032868ba0
RBP: ffff810032868be0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffff81003e7f2dc0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001023
FS:  00002aaaaade8ae0(0000) GS:ffffffff80545800(0000) knlGS:000000005627ddb0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000000349a000 CR4: 00000000000006e0
Process ethtool (pid: 514, threadinfo ffff810004fde000, task ffff810033553790)
Stack: ffff81003e7f2dc0 ffff81003e7f2dc0 ffff810032868ba0 ffffffff8805dee1
        ffff810032868800 ffff810032868800 0000000000514b20 ffffffff88063260
        00007fffffa1c660 ffffffff8805dfd9
Call Trace:<ffffffff8805dee1>{:skge:skge_down+817}
<ffffffff8805dfd9>{:skge:skge_set_ring_param+73}
        <ffffffff8032f684>{dev_ethtool+1956} <ffffffff8015e880>{do_no_page+1040}
        <ffffffff80153463>{__alloc_pages+275} <ffffffff8015e63a>{do_no_page+458}
        <ffffffff8015eab7>{__handle_mm_fault+391}
<ffffffff8022c834>{prio_tree_insert+484}
        <ffffffff8023bd7a>{extract_buf+266} <ffffffff8023bd7a>{extract_buf+266}
        <ffffffff8014eeae>{find_get_page+14} <ffffffff8014fe5a>{filemap_nopage+394}
        <ffffffff8015e88a>{do_no_page+1050} <ffffffff8032e555>{netdev_run_todo+53}
        <ffffffff8032e052>{dev_ioctl+578} <ffffffff8036589a>{inet_ioctl+138}
        <ffffffff8032378c>{sock_ioctl+604} <ffffffff8017ffe1>{do_ioctl+33}
        <ffffffff801802ab>{vfs_ioctl+651} <ffffffff8018031d>{sys_ioctl+77}
        <ffffffff8010d8a6>{system_call+126}

Code: 48 8b 43 08 c7 00 00 00 00 00 48 83 7b 10 00 74 4d 49 8b 4c
RIP <ffffffff8805b990>{:skge:skge_rx_clean+16} RSP <ffff810004fdf9e8>
CR2: 0000000000000008
  <6>NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
NETDEV WATCHDOG: lan: transmit timed out
Here is what SysRq+T shows for the last ethtool instance, which is now hanging:

ethtool       D ffffffff8044c160     0   515   6826                     (NOTLB)
ffff81002c50dd28 0000000000000086 879c317800000286 ffff81002c50de58
0000000000000004 ffff81003f285710 ffffffff80410740 ffff81003f285928
ffff81002c50dec8 ffffffff8023bd7a
Call Trace:<ffffffff8023bd7a>{extract_buf+266} <ffffffff8039244f>{__down+143}
<ffffffff8012ba10>{default_wake_function+0} 
<ffffffff8039224a>{__down_failed+53}
<ffffffff80323530>{sock_ioctl+0} <ffffffff80335c62>{.text.lock.rtnetlink+5}
<ffffffff8032e04a>{dev_ioctl+570} <ffffffff8036589a>{inet_ioctl+138}
<ffffffff8032378c>{sock_ioctl+604} <ffffffff8017ffe1>{do_ioctl+33}
<ffffffff801802ab>{vfs_ioctl+651} <ffffffff8018031d>{sys_ioctl+77}
<ffffffff8010d8a6>{system_call+126}

Steps to reproduce:
I can try to reproduce if required.
Comment 1 Adrian Bunk 2005-12-11 12:49:37 UTC
Stephen, this is your driver.
Comment 2 Stephen Hemminger 2005-12-13 16:08:26 UTC
The problem is caused by the ring parameter change being unable to allocate
enough memory. The driver doesn't return an error, leaves the device up, and
things generally go downhill from there...
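
To illustrate the failure mode described in this comment, here is a minimal
user-space sketch of a ring resize that propagates an allocation failure and
falls back to the previous size. It is not the actual skge patch. Every name in
it (fake_dev, fake_up, fake_down, fake_set_ring_param, alloc_limit) is
hypothetical, and the memory limit is simulated rather than real.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real skge structures. */
struct ring_elem { void *buf; };

struct fake_dev {
    struct ring_elem *rx_ring; /* NULL while the device is down */
    size_t rx_count;           /* configured number of RX descriptors */
    size_t alloc_limit;        /* simulated memory limit, in descriptors */
    int running;               /* nonzero while the device is up */
};

/* Bring the device up; report -ENOMEM instead of pretending it worked. */
int fake_up(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (dev->rx_count == 0 || dev->rx_count > dev->alloc_limit)
        return -ENOMEM; /* simulated allocation failure */
    dev->rx_ring = calloc(dev->rx_count, sizeof(*dev->rx_ring));
    if (dev->rx_ring == NULL)
        return -ENOMEM;
    dev->running = 1;
    return 0;
}

/* Tear the device down; safe to call when the ring is already gone. */
void fake_down(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (dev->rx_ring == NULL)
        return; /* nothing to clean, so no NULL dereference */
    free(dev->rx_ring);
    dev->rx_ring = NULL;
    dev->running = 0;
}

/* Resize the RX ring and return the error to the caller on failure. */
int fake_set_ring_param(struct fake_dev *dev, size_t new_count)
{
    size_t old_count = dev->rx_count;
    int was_running = dev->running;
    int err;

    if (new_count == 0)
        return -EINVAL;
    if (was_running)
        fake_down(dev);
    dev->rx_count = new_count;
    if (!was_running)
        return 0; /* new size takes effect at the next fake_up() */

    err = fake_up(dev);
    if (err) {
        /* Fall back to the old size; if that also fails, the device
         * stays down with running == 0 and rx_ring == NULL. */
        dev->rx_count = old_count;
        (void)fake_up(dev);
    }
    return err;
}
```

With alloc_limit set to 1024 and the device up at 512 descriptors, a request
for 4096 returns -ENOMEM and leaves the device running at 512. Calling
fake_down() twice in a row is harmless, which is the property the oops above
shows was missing.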
Comment 3 Stephen Hemminger 2005-12-14 15:45:54 UTC
Patch submitted to Jeff as part of 0.13 (hopefully for 2.6.15).
Comment 4 Adrian Bunk 2005-12-14 15:51:19 UTC
If it's small enough, could you also submit it for 2.6.14.5?
