Bug 10990 - e1000/e1000e driver doesn't work with gigabit connection
Summary: e1000/e1000e driver doesn't work with gigabit connection
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: Jesse Brandeburg
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-26 11:49 UTC by mjc
Modified: 2008-09-12 11:58 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.25.6-55.fc9.i686
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
proposed fix for iAMT interaction (3.39 KB, patch)
2008-07-16 15:19 UTC, Jesse Brandeburg

Description mjc 2008-06-26 11:49:14 UTC
Linux lyra 2.6.25.6-55.fc9.i686 #1 SMP Tue Jun 10 16:27:49 EDT 2008 i686 i686
i386 GNU/Linux

I have multiple HP DC7700s with integrated Intel Ethernet adapters. They work
when the network cable is plugged into a 10/100 port on a switch. They do not
work when plugged into a 10/100/1000 port. I also have DC7700s with integrated
Broadcom chips (tg3 driver), and they work fine.

From dmesg:
e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.
...
eth0: (PCI Express:2.5GB/s:Width x1) 00:0f:fe:4a:68:37
eth0: Intel(R) PRO/1000 Network Connection
eth0: MAC: 4, PHY: 6, PBA No: 1002ff-0ff

When plugged into a 10/100/1000 port, forcing:
ethtool -s eth0 autoneg off speed 1000 duplex full

results in erratic (some pings work) or dead operation (no pings work).

Using:
ethtool -s eth0 autoneg off speed 100 duplex full

seems to work OK.

Returning it to autoneg results in erratic/dead operation:

[root@lyra ~]# ethtool -s eth0 autoneg on
[root@lyra ~]# ethtool eth0
Settings for eth0:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbag
	Wake-on: g
	Current message level: 0x00000001 (1)
	Link detected: yes
[root@lyra ~]# ping server2
PING server2.domain.avtechpulse.com (192.168.0.3) 56(84) bytes of data.
64 bytes from server2.domain.avtechpulse.com (192.168.0.3): icmp_seq=1 ttl=64 time=0.079 ms
64 bytes from server2.domain.avtechpulse.com (192.168.0.3): icmp_seq=2 ttl=64 time=0.158 ms
64 bytes from server2.domain.avtechpulse.com (192.168.0.3): icmp_seq=7 ttl=64 time=0.161 ms
64 bytes from server2.domain.avtechpulse.com (192.168.0.3): icmp_seq=11 ttl=64 time=0.113 ms
64 bytes from server2.domain.avtechpulse.com (192.168.0.3): icmp_seq=12 ttl=64 time=0.131 ms
64 bytes from server2.domain.avtechpulse.com (192.168.0.3): icmp_seq=17 ttl=64 time=0.127 ms
^C
--- server2.domain.avtechpulse.com ping statistics ---
18 packets transmitted, 6 received, 66% packet loss, time 17553ms
rtt min/avg/max/mdev = 0.079/0.128/0.161/0.028 ms
[root@lyra ~]# 

This was confirmed with multiple 10/100/1000 switches from different manufacturers.


[root@lyra ~]# more /etc/modprobe.conf
alias eth0 e1000e  #same behaviour with "e1000"
alias scsi_hostadapter libata
alias scsi_hostadapter1 ata_piix
alias snd-card-0 snd-hda-intel
options snd-card-0 index=0
options snd-hda-intel index=0


Need any additional data?

- Mike
Comment 1 Anonymous Emailer 2008-06-26 12:01:17 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 26 Jun 2008 11:49:14 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10990
> 
>            Summary: e1000/e1000e driver doesn't work with gigabit connection
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.25.6-55.fc9.i686
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: jgarzik@pobox.com
>         ReportedBy: mjc@avtechpulse.com
> 
> [...]
Comment 2 Auke Kok 2008-06-26 12:07:27 UTC
>> When plugged into a 10/100/1000 port, forcing:
>> ethtool -s eth0 autoneg off speed 1000 duplex full

Gigabit (1000BASE-T) requires autonegotiation, and if you force speed and duplex
you would have to force them on both ends anyway. The configuration above is
therefore not valid.

I suggest changing the autonegotiation mask, which leaves autoneg enabled:

	ethtool -s eth0 advertise 0x20

This is supported and should accomplish what you want.
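The `advertise` argument is a bitmask of link modes, and 0x20 on its own selects only 1000baseT/Full. A minimal shell sketch for building such a mask from mode names follows, assuming the bit values from the ethtool(8) advertise table (0x001 for 10baseT/Half up through 0x020 for 1000baseT/Full). The helper name `advertise_mask` is hypothetical and not part of ethtool.

```shell
#!/bin/sh
# Sketch: build an ethtool "advertise" mask from link-mode names.
# Bit values are assumed from the ethtool(8) advertise table;
# advertise_mask is a hypothetical helper, not an ethtool feature.
advertise_mask() {
    mask=0
    for mode in "$@"; do
        case "$mode" in
            10baseT/Half)   bit=0x001 ;;
            10baseT/Full)   bit=0x002 ;;
            100baseT/Half)  bit=0x004 ;;
            100baseT/Full)  bit=0x008 ;;
            1000baseT/Half) bit=0x010 ;;
            1000baseT/Full) bit=0x020 ;;
            *) echo "unknown link mode: $mode" >&2; return 1 ;;
        esac
        # OR this mode's bit into the running mask
        mask=$(( $mask | $bit ))
    done
    # print the mask in the hex form ethtool accepts
    printf '0x%x\n' "$mask"
}

advertise_mask 1000baseT/Full                # 0x20, the value suggested above
advertise_mask 100baseT/Full 1000baseT/Full  # 0x28
```

The result could then be passed as `ethtool -s eth0 advertise "$(advertise_mask 1000baseT/Full)"`. Because autonegotiation stays on, the link partner still takes part in master/slave resolution, which forcing speed with `autoneg off` skips.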
Comment 3 mjc 2008-06-26 12:20:58 UTC
> I suggest changing the autonegotiation mask, which leaves autoneg enabled:
> 
>         ethtool -s eth0 advertise 0x20
> 
> this is supported and should accomplish what you want.

That doesn't fix anything for me - I still get unreliable operation:

[root@lyra ~]# ethtool -s eth0 advertise 0x20
...wait...
[root@lyra ~]# ethtool eth0
Settings for eth0:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  1000baseT/Full 
	Advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbag
	Wake-on: g
	Current message level: 0x00000001 (1)
	Link detected: yes
[root@lyra ~]# ping 192.168.0.3
PING 192.168.0.3 (192.168.0.3) 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=3 ttl=64 time=0.163 ms
64 bytes from 192.168.0.3: icmp_seq=7 ttl=64 time=0.106 ms
64 bytes from 192.168.0.3: icmp_seq=8 ttl=64 time=0.127 ms
64 bytes from 192.168.0.3: icmp_seq=13 ttl=64 time=0.144 ms
64 bytes from 192.168.0.3: icmp_seq=17 ttl=64 time=0.118 ms
64 bytes from 192.168.0.3: icmp_seq=18 ttl=64 time=0.168 ms
^C
--- 192.168.0.3 ping statistics ---
19 packets transmitted, 6 received, 68% packet loss, time 18829ms
rtt min/avg/max/mdev = 0.106/0.137/0.168/0.026 ms
[root@lyra ~]# 

Note the packet loss.
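The two gigabit ping runs lost 12 of 18 and 13 of 19 echo requests. As a quick sanity check on the percentages ping printed, here is a small shell sketch. It assumes ping truncates rather than rounds, which is what the reported 66% for 12/18 (about 66.7%) implies. The function name `packet_loss` is hypothetical.

```shell
#!/bin/sh
# Sketch: recompute ping's "packet loss" figure from the transmitted and
# received counts. Shell integer division truncates, matching the 66% and
# 68% shown in the ping summaries (12/18 = 66.7%, 13/19 = 68.4%).
packet_loss() {
    transmitted=$1
    received=$2
    lost=$(( transmitted - received ))
    echo "$(( lost * 100 / transmitted ))%"
}

packet_loss 18 6   # first run (all modes advertised): 66%
packet_loss 19 6   # second run (advertise 0x20): 68%
```

Roughly two out of three packets are lost in both runs, so restricting the advertised modes to 1000baseT/Full made no measurable difference.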

These systems worked OK before upgrading to Fedora 9.


- Mike
Comment 4 mjc 2008-06-27 02:28:41 UTC
Some extra data points / summary:

- 1 Gb is erratic/dead on DC7700 Intel-integrated network adapter with Fedora 9. 100 Mb is OK. Tested on multiple machines and switches.

- It was fine before the upgrade to F9

- DC7700s with embedded Broadcom chipsets are OK

- In one DC7700 I added a PCI Intel PRO/1000 Gigabit adapter in a free PCI slot (not pci-e), and it worked fine. So it is something specific about the embedded Intel network adapter.


- Mike
Comment 5 mjc 2008-06-27 02:29:50 UTC
Also, the problem occurs with both the original BIOS (1.05) and the latest applicable BIOS (1.14?).
Comment 6 Jesse Brandeburg 2008-07-16 14:33:41 UTC
Mike, we've heard several reports about this kind of problem; I think it is related to the driver not communicating to the firmware that it is loaded.

If you're not using iAMT, you can fix this by disabling the iAMT management in the BIOS.

It looks like you have to hit CTRL-P as soon as the monitor light turns on. This will get you into the Management BIOS options, which should allow you to disable the management. If you want to see the CTRL-P prompt at boot, you can turn it on in the BIOS options under Advance Setup by setting MEBx Setup Prompt = Enabled.

See http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c01082181/c01082181.pdf

and the BIOS user guide at
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c01302182/c01302182.pdf

I'll be trying to get our internal lab to reproduce this so we can make sure the right patch gets to the in-kernel driver sooner rather than later.
Comment 7 Jesse Brandeburg 2008-07-16 15:19:51 UTC
Created attachment 16850
proposed fix for iAMT interaction

This patch is against Linus' tree, v2.6.26 tag or later. Let me know if you need a patch against another kernel.

This patch is untested.
Comment 8 mjc 2008-07-23 05:33:01 UTC
Jesse,

I disabled iAMT on one test computer, but it didn't seem to help anything.

I'm not set up to test kernel patches. (Is it possible to patch a Fedora 9 system easily?)

- Mike
Comment 9 mjc 2008-09-11 08:59:26 UTC
My previous comment was incorrect. Disabling iAMT (using ctrl+p) seems to fix the issue.

- Mike
Comment 10 Jesse Brandeburg 2008-09-12 11:18:30 UTC
Sorry, Mike. I've pushed this patch upstream, and it is in 2.6.27-rc2 and newer. I mentioned this in the Red Hat Bugzilla too.

I don't know whether there is a Fedora development kernel that you might be able to test.

If you're running Fedora and have our SourceForge driver, you can try that, since the fix is already in it. Build it with

	rpmbuild -tb e1000e-0.4.1.7.tar.gz

and then install the resulting RPM.
Comment 11 mjc 2008-09-12 11:58:19 UTC
Thanks for the fix! The current rawhide kernels don't boot for me for other reasons, so I'll have to delaying the e1000e fix.

- Mike
Comment 12 mjc 2008-09-12 11:58:57 UTC
That should read "delay testing the fix".
