Bug 42711 - [BISECTED] e1000 driver fails to transmit or receive traffic
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-01 17:48 UTC by Bruce Guenter
Modified: 2013-03-08 23:24 UTC
CC List: 5 users

See Also:
Kernel Version: 3.2.2
Subsystem:
Regression: Yes
Bisected commit-id:


Description Bruce Guenter 2012-02-01 17:48:40 UTC
The e1000 driver is failing for me in 3.2.  It worked in 3.1.

On boot, dmesg reports that the driver is initialized, and:
Feb  1 14:06:34 lorien kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Feb  1 14:06:34 lorien kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Feb  1 14:06:34 lorien kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Feb  1 14:06:34 lorien kernel: e1000: eth1 changing MTU from 1500 to 9000

However, pings go nowhere, and tcpdump on the affected system reports no traffic.  tcpdump on a remote system also sees no traffic coming from the affected system.
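
As a cross-check that does not depend on tcpdump, the kernel's per-interface packet counters under /sys/class/net/<iface>/statistics can be sampled twice to see whether they move.  The following is only a minimal sketch: the function names, the sysfs_root parameter and the 5-second interval are assumptions made for illustration and are not part of the original report.

```python
import time
from pathlib import Path


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Return (rx_packets, tx_packets) for iface, read from sysfs."""
    stats = Path(sysfs_root) / iface / "statistics"
    rx = int((stats / "rx_packets").read_text().strip())
    tx = int((stats / "tx_packets").read_text().strip())
    return rx, tx


def counters_advanced(before, after):
    """Compare two samples; return (rx_moved, tx_moved) as booleans."""
    return after[0] > before[0], after[1] > before[1]


def watch(iface, seconds=5, sysfs_root="/sys/class/net"):
    """Sample the counters twice, `seconds` apart, and report movement."""
    first = read_counters(iface, sysfs_root)
    time.sleep(seconds)  # run a ping in another terminal during this window
    second = read_counters(iface, sysfs_root)
    return counters_advanced(first, second)
```

Calling watch("eth1") while a ping is running and getting (False, False) back would mean the kernel is counting no packets in either direction on that interface.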

I have bisected the problem down to the following commit:

commit a4010afef585b7142eb605e3a6e4210c0e1b2957
Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date:   Wed Oct 5 07:24:41 2011 +0000

    e1000: convert hardware management from timers to threads

I tried reverting this on top of v3.2.2, but it doesn't go in cleanly due to other changes.
Comment 1 Bruce Guenter 2012-02-01 18:09:07 UTC
lspci shows:

03:06.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
	Subsystem: Intel Corporation PRO/1000 MT Desktop Adapter
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 21
	Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
	Memory at febc0000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at e800 [size=64]
	Expansion ROM at feba0000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
	Capabilities: [e4] PCI-X non-bridge device
	Capabilities: [f0] MSI: Enable- Count=1/1 Maskable- 64bit+
	Kernel driver in use: e1000
	Kernel modules: e1000

I'm running an x86_64 kernel, if it matters.
Comment 2 Bruce Guenter 2012-04-19 20:06:54 UTC
This issue is still a problem in v3.3.1.  Reverting to the version of the driver that is in 3.1 makes it work.
Comment 3 Roger 2012-04-25 06:21:41 UTC
I'm seeing something similar with the e100 driver on kernel versions >= 3.2.

As a temporary workaround, I'm using kernel versions <= 3.1.
Comment 4 Roger 2012-04-25 06:25:36 UTC
FYI: even though syslog and lspci look good, try using ifconfig to see if an eth0 device shows up.  I'm only seeing a "lo" device after loading e100 under kernel-3.2.  (Might need to enable more kernel-level debugging options.)
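
Note that plain ifconfig (without -a) lists only interfaces that are up, whereas /sys/class/net contains an entry for every registered network device.  A minimal sketch of listing those entries together with their operstate follows; the function name and the sysfs_root parameter are assumptions made for illustration.

```python
from pathlib import Path


def list_interfaces(sysfs_root="/sys/class/net"):
    """Map each registered network device name to its operstate string."""
    result = {}
    for entry in sorted(Path(sysfs_root).iterdir()):
        state_file = entry / "operstate"
        if not state_file.is_file():
            continue  # skip entries that are not network devices
        result[entry.name] = state_file.read_text().strip()
    return result
```

If the driver loaded but eth0 is missing from the returned mapping, the device was never registered; if it is present with a state such as "down", it exists but has not been brought up.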
Comment 5 Bruce Guenter 2012-04-25 16:40:49 UTC
The device shows up no problem for me.  It configures, takes its static IP, and a route is added.  However, no traffic goes in or comes out of the interface.
Comment 6 Tushar 2012-08-30 17:30:40 UTC
Are you sure you are running with the patch I applied a while ago?
i.e.
From: Tushar Dave <tushar.n.dave@intel.com>
commit 8ce6909f77ba1b7bcdea65cc2388fd1742b6d669 upstream.
Killing reset task while adapter is resetting causes deadlock.
Only kill reset task if adapter is not resetting.
Ref bug #43132 on bugzilla.kernel.org
Comment 7 Bruce Guenter 2013-03-08 20:40:58 UTC
This is no longer an issue with recent kernels.
Comment 8 Roger 2013-03-08 23:24:58 UTC
I should mention that I just ran into an issue with this Intel e1000 PCI card, where the driver would reset itself shortly after boot and the network card would then become unusable.  Reloading the module didn't help either.

After several reboots and cold starts, the driver/card would again reset shortly after booting the O/S and remain unusable.

I then booted with SystemRescueCD and found that kernel and e1000 driver to be remarkably stable.  So I rebooted back into my default Gentoo distro and, funnily enough, the driver is now stable!

I think I'm seeing some weird low-level activity here.  Possibly I just need to vacuum the dust from all the sockets, but I felt I should at least mention this scenario anyway.  I think the full error was "eeprom initialization fail" & reset -- but this is going from memory, and it could have just been a "reset" error logged.  Oddly, I can no longer find any of the saved error messages in the /var/log/messages history!  But I vividly recall that I had at least a "reset" message in /var/log/messages.
