Bug 14684 - e1000e jumbo frames failure
Status: CLOSED UNREPRODUCIBLE
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assigned To: drivers_network@kernel-bugs.osdl.org
Depends on:
Blocks: 13615
Reported: 2009-11-24 23:08 UTC by Nathan Grennan
Modified: 2010-12-18 00:48 UTC

Tree: Mainline
Regression: Yes


Description Nathan Grennan 2009-11-24 23:08:00 UTC
I am running kernel-2.6.31.5-127.fc12.x86_64 on two computers, each with one e1000e NIC, connected via a crossover cable. With the MTU set to 9216 (jumbo frames), the link works for a while and then stops working. If I change the MTU to 1500 on both ends, it starts working again. If I start with an MTU of 1500 on both ends, the link always works.

Before upgrading both computers to F12 I was using kernel-2.6.30.9-96.fc11.x86_64 with no problems.
Comment 1 Andrew Morton 2009-11-24 23:34:47 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 24 Nov 2009 23:08:00 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14684
> 
>            Summary: e1000e jumbo frames failure
>            Product: Drivers
>            Version: 2.5
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: kernel-bugzilla@cygnusx-1.org
>         Regression: No
> 
> 
> I am running kernel-2.6.31.5-127.fc12.x86_64 on two computers with one e1000e
> in each computer. They are connected via a crossover cable. I set the MTU to
> 9216, aka jumbo frames. It works for a while, and then the link stops working.
> If I change the MTU to 1500 on both ends, it starts working again. If I start
> with a MTU of 1500 on both ends, the link always works.
> 
>   Before upgrading both computers to F12 I was using
> kernel-2.6.30.9-96.fc11.x86_64 with no problems.
> 

Thanks, I'll mark this as a regression.
Comment 2 Bruce Allan 2009-11-25 00:30:53 UTC

You didn't say which device supported by e1000e you have. Please provide the output of lspci and any pertinent messages from your system log. The output of 'ethtool -S ethX' (where ethX is your interface name), taken both before increasing the MTU and again after increasing it once the link has stopped working, might also help.

Thanks,
Bruce.
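
The before/after comparison requested above can be done mechanically once both dumps are saved as text. The following is only a minimal sketch in Python: the helper names `parse_ethtool_stats` and `diff_stats` are hypothetical, and the two sample dumps are made-up values for illustration, not data from this report.

```python
def parse_ethtool_stats(text):
    """Parse 'ethtool -S' style output into a dict of counter name -> int."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # The "NIC statistics:" header has no numeric value and is skipped.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def diff_stats(before, after):
    """Return only the counters that changed, as name -> (after - before)."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Made-up sample dumps, for illustration only.
before_text = """NIC statistics:
     rx_packets: 100
     rx_missed_errors: 0
     tx_timeout_count: 0
"""
after_text = """NIC statistics:
     rx_packets: 250
     rx_missed_errors: 12
     tx_timeout_count: 0
"""

changed = diff_stats(parse_ethtool_stats(before_text),
                     parse_ethtool_stats(after_text))
for name, delta in sorted(changed.items()):
    print(f"{name}: {delta:+d}")
```

Counters that did not move between the two captures are dropped, so whatever is left is what changed while the link was failing.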
Comment 3 Nathan Grennan 2009-11-25 01:03:04 UTC
  I haven't rebooted since I got the errors with jumbo frames, so the ethtool data may still contain clues about the problem.

  Both computers are desktops, and both cards are PCIe x1.

  I use the link almost exclusively for iSCSI. The only exceptions are tests such as ICMP pings when the link fails.

Computer 1:
04:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
        Subsystem: Intel Corporation PRO/1000 PT Desktop Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 30
        Region 0: Memory at fe9e0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fe9c0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at bc00 [size=32]
        Expansion ROM at fe9a0000 [disabled] [size=128K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0200c  Data: 41d9
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 <4us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-14-ef-d7
        Kernel driver in use: e1000e
        Kernel modules: e1000e

"ethtool -S eth2" output at mtu 1500:

NIC statistics:
     rx_packets: 10620133
     tx_packets: 15304833
     rx_bytes: 10406062051
     tx_bytes: 15657625639
     rx_broadcast: 5
     tx_broadcast: 1164
     rx_multicast: 280
     tx_multicast: 588
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 280
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 2088958
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 1188
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 2643618
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 3176949638
     rx_flow_control_xoff: 79245597
     tx_flow_control_xon: 357771
     tx_flow_control_xoff: 11857745
     rx_long_byte_count: 10406062051
     rx_csum_offload_good: 10572127
     rx_csum_offload_errors: 0
     rx_header_split: 2508581
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     rx_dma_failed: 0
     tx_dma_failed: 0

Computer 2:
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Intel Corporation Gigabit CT Desktop Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at fd2c0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fd200000 (32-bit, non-prefetchable) [size=512K]
        Region 2: I/O ports at bf00 [size=32]
        Region 3: Memory at fd2fc000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at fd100000 [disabled] [size=256K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-2c-d4-bc
        Kernel driver in use: e1000e
        Kernel modules: e1000e


"ethtool -S eth1" output at mtu 1500:

NIC statistics:
     rx_packets: 15287659
     tx_packets: 10601666
     rx_bytes: 15620091082
     tx_bytes: 10378919127
     rx_broadcast: 27
     tx_broadcast: 5
     rx_multicast: 510
     tx_multicast: 287
     rx_errors: 5
     tx_errors: 0
     tx_dropped: 0
     multicast: 510
     collisions: 0
     rx_length_errors: 5
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 6641
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 2
     tx_restart_queue: 0
     rx_long_length_errors: 5
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 766110
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 357714
     rx_flow_control_xoff: 11855647
     tx_flow_control_xon: 3175077458
     tx_flow_control_xoff: 79245597
     rx_long_byte_count: 15620091082
     rx_csum_offload_good: 15239778
     rx_csum_offload_errors: 0
     rx_header_split: 2562045
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     rx_dma_failed: 0
     tx_dma_failed: 0
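
To pick the interesting counters out of a dump like the ones above, a small filter can help. This is a rough sketch in Python, not part of the original report: the keyword list and the helper name `nonzero_suspects` are assumptions chosen for illustration, and the sample is a subset of the Computer 2 (eth1) counters shown above.

```python
# Substrings that suggest a counter tracks a problem.
# This keyword list is an assumption, not something defined by ethtool.
SUSPECT_KEYWORDS = ("error", "failed", "timeout", "dropped")


def nonzero_suspects(text):
    """Return problem-looking counters with a nonzero value."""
    suspects = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if not (sep and value.isdigit()):
            continue  # header line or non-numeric value
        if int(value) and any(key in name for key in SUSPECT_KEYWORDS):
            suspects[name] = int(value)
    return suspects


# Subset of the eth1 counters listed above.
sample = """NIC statistics:
     rx_packets: 15287659
     rx_errors: 5
     tx_errors: 0
     rx_length_errors: 5
     rx_missed_errors: 6641
     tx_timeout_count: 2
     rx_long_length_errors: 5
     rx_csum_offload_errors: 0
     alloc_rx_buff_failed: 0
"""

for name, value in nonzero_suspects(sample).items():
    print(f"{name}: {value}")
```

A nonzero counter is only a hint. These values accumulate until the driver is reloaded, and the machines had not been rebooted since the failure, so they cover both the jumbo-frame and the MTU 1500 periods.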
Comment 4 Florian Mickler 2010-10-07 21:21:42 UTC
Has this been fixed in the meantime?

There was http://patchwork.ozlabs.org/patch/34339/ which made it into mainline as:

commit a825e00c98a2ee37eb2a0ad93b352e79d2bc1593
Author: Alexander Duyck <alexander.h.duyck@intel.com>
Date:   Fri Oct 2 12:30:42 2009 +0000

    e1000e: swap max hw supported frame size between 82574 and 82583


which could affect one of your machines (see bug #14261).
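
For context on why a maximum hardware frame size matters here: an MTU counts only the payload, while the on-wire Ethernet frame also carries a 14-byte header and a 4-byte FCS. The following is a minimal sketch of that arithmetic in Python. The two limit constants are illustrative assumptions, not values taken from this report or verified against the driver for any specific chip, and VLAN tags are ignored.

```python
ETH_HLEN = 14     # Ethernet header: destination MAC, source MAC, EtherType
ETH_FCS_LEN = 4   # trailing frame check sequence (CRC)


def max_frame_for_mtu(mtu):
    """Size of the largest untagged Ethernet frame for a given MTU."""
    return mtu + ETH_HLEN + ETH_FCS_LEN


def mtu_fits(mtu, max_hw_frame_size):
    """True if frames for this MTU fit within the hardware frame limit."""
    return max_frame_for_mtu(mtu) <= max_hw_frame_size


# Hypothetical limits, for illustration only.
ASSUMED_JUMBO_LIMIT = 9234     # a jumbo-capable part
ASSUMED_STANDARD_LIMIT = 1518  # a part limited to standard frames

print(max_frame_for_mtu(9216))
print(mtu_fits(9216, ASSUMED_JUMBO_LIMIT))
print(mtu_fits(9216, ASSUMED_STANDARD_LIMIT))
print(mtu_fits(1500, ASSUMED_STANDARD_LIMIT))
```

Under these assumptions, the reported MTU of 9216 corresponds to 9234-byte frames, which would be rejected by a device whose limit had been set to the standard-frame value by mistake.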
Comment 5 Florian Mickler 2010-12-18 00:41:09 UTC
I'm closing this as unreproducible. If that is incorrect, please shout.
