Bug 9836
| Summary: | e1000 network driver doesn't recognize (and load on) its hardware | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Artem S. Tashkinov (aros) |
| Component: | Network | Assignee: | Jesse Brandeburg (jbrandeb) |
| Status: | REJECTED WILL_NOT_FIX | | |
| Severity: | blocking | CC: | jbrandeb, jeffrey.t.kirsher |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.24 | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |
| Attachments: | Miscellaneous hardware information including dmesg output. | | |
Description
Artem S. Tashkinov
2008-01-28 05:32:00 UTC
Created attachment 14620
Miscellaneous hardware information including dmesg output.
The kernel .config file is also attached.
Reply-To: akpm@linux-foundation.org

On Mon, 28 Jan 2008 05:32:00 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9836
>
> Summary: e1000 network driver doesn't recognize (and load on) its hardware
> Product: Drivers
> Version: 2.5
> KernelVersion: 2.6.24
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: blocking
> Priority: P1
> Component: Network
> AssignedTo: jgarzik@pobox.com
> ReportedBy: t.artem@mailcity.com
>
>
> Latest working kernel version: not known
> Earliest failing kernel version: every
> Distribution: Fedora Core 7
> Hardware Environment: lspci output and dmesg information are attached
> Software Environment: doesn't matter
> Problem Description: recently e1000 network drivers stopped working when right
> after switching on or rebooting our Intel server. While trying to 'modprobe
> e1000; ifconfig eth0 IP_address' I've got a failure and e1000 says hardware is
> not detected. After about a *hundred* attempts to modprobe and rmmod it, the
> e1000 network driver finally loads and allows to set up networking.
>
> Steps to reproduce: reboot or power on your server. Try to up the network which
> is attached to your Intel PCI-X Pro 1000 NIC.
>

Odd.

potentially a dead slot, card, overheating or other non-driver related issue. Check fan cooling and thermals, check if card properly sits in slot etc. try card in other slots, run memtest? re-test with older kernel version that worked correctly?

Andrew Morton wrote:
> On Mon, 28 Jan 2008 05:32:00 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=9836
>> Problem Description: recently e1000 network drivers stopped working
>> when right after switching on or rebooting our Intel server. While
>> trying to 'modprobe e1000; ifconfig eth0 IP_address' I've got a
>> failure and e1000 says hardware is not detected. After about a
>> *hundred* attempts to modprobe and rmmod it, the e1000 network
>> driver finally loads and allows to set up networking.

I'm not sure how you figured out rmmod/insmoding the driver gets you working, but I don't see any failure messages in the dmesg log you sent, I just see:

Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:07:04.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:07:04.0: e1000_probe: (PCI:33MHz:32-bit) 00:04:23:cc:82:5e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt for device 0000:07:04.0 disabled
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:07:04.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:07:04.0: e1000_probe: (PCI:33MHz:32-bit) 00:04:23:cc:82:5e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt for device 0000:07:04.0 disabled
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:07:04.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:07:04.0: e1000_probe: (PCI:33MHz:32-bit) 00:04:23:cc:82:5e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX

I believe the PCI interrupt for device 7:4.0 disabled is you unloading the driver again each time, and I don't see any messages at all that indicate the driver knows anything is wrong.

can you run 'ethtool -t eth0 offline' after rebooting, loading the module, ifconfig eth0 up, and waiting 4 seconds?

what does ethtool eth0 say after loading the driver (before you ifconfig up it?)

are you sure you're not having some udev problem? Can you send the console output or attach it to the bug?

>> Steps to reproduce: reboot or power on your server. Try to up the
>> network which is attached to your Intel PCI-X Pro 1000 NIC.

This is a PCI only adapter, FYI.

>>
>
> Odd.

agreed, something doesn't add up here, yet.

The NIC is onboard so I cannot move it into another slot :-) And I think it's a PCI-X version.
The bug first appeared with kernel 2.6.22.5 (the server had been running this kernel for four months in a row without reboots, then a reboot broke everything). Yesterday I upgraded to 2.6.24 in the hope of resolving the bug. Unfortunately that didn't help.
> I'm not sure how you figured out rmmod/insmoding the driver gets you
> working, but I don't see any failure messages in the dmesg log you sent,
Right after modprobe e1000
# ifconfig eth0 IP_address
I get some SIGIOXXXXX error; after rmmod'ing and modprobe'ing the module several times I get it working.
As for overheating - that's worth checking out. As soon as I have time to test I'll post the results of `ethtool`.
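
For reference, the diagnostic sequence requested earlier in this thread (load the module, check `ethtool` before bringing the interface up, bring it up, wait 4 seconds, run the offline self-test) might be scripted roughly as below. This is only a sketch: the `run` helper, the `diagnose` function, and the `IFACE` and `DRY_RUN` variables are illustrative names that do not appear in the thread, and the script defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# Sketch of the diagnostic sequence asked for in this thread.
# Assumptions (not from the thread): IFACE and DRY_RUN variables, the
# run/diagnose helpers. With DRY_RUN=1 (the default) commands are only printed;
# set DRY_RUN=0 and run as root to actually execute them.
IFACE="${IFACE:-eth0}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

diagnose() {
    run modprobe e1000
    run ethtool "$IFACE"              # state after loading, before ifconfig up
    run ifconfig "$IFACE" up
    run sleep 4                       # give the link time to come up
    run ethtool -t "$IFACE" offline   # offline self-test, interrupts traffic
}

diagnose
```

The offline self-test takes the interface out of service while it runs, so it is best done from the console rather than over the network being tested.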
does reverting back to 2.6.22.5 make it work again?

No, the situation is the same with both kernels (22 and 24). I suppose this problem is related to faulty hardware. As soon as I have physical access to that server again I'll post additional information here.

any update?

I've got these errors while trying to set up the interface (Fedora 7 distro):

SIOCSIFADDR: No such device
eth0: unknown interface: No such device
SIOCSIFNETMASK: No such device
SIOCGIFADDR: No such device
SIOCSIFBROADCAST: No such device

To make the NIC work I wrote this script

IP=192.168.0.1
while :; do
    ifconfig eth0 $IP netmask 255.255.255.0
    UP=`ifconfig eth0 2> /dev/null | awk -F : '/inet addr/{print $2}' | awk '{print $1}'`
    test "$UP" == "$IP" && echo UP && /sbin/ifup eth0 && break
    sleep 1
    rmmod e1000
done

I still believe your hardware has a bug, I think we should just close this. Also, your script should wait at least 4 seconds for link to come up.

But why? I know no other NIC that requires a 4 second delay (after modprobe'ing) to set up a network interface.

I'm closing the bug since I no longer look after that server and no one else in the world has such a problem.
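
As an illustration of the "wait at least 4 seconds for link" advice given in this thread, a script could poll the interface's carrier state instead of sleeping for a fixed time. The sketch below is an assumption-laden example, not something posted by anyone in the thread: `wait_for_link` and the `SYSFS_NET` override are hypothetical names, and it relies on the standard sysfs file `/sys/class/net/<iface>/carrier`, which reads `1` once link is detected. It only addresses link negotiation delay; it would not help with the "No such device" errors reported above.

```shell
#!/bin/sh
# wait_for_link IFACE [TIMEOUT_SECONDS]
# Polls the interface's carrier file about once per second and returns 0 as
# soon as it reads "1", or 1 once TIMEOUT_SECONDS (default 4) have elapsed.
# SYSFS_NET is a hypothetical override so the function can be tried without a NIC.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

wait_for_link() {
    iface="$1"
    timeout="${2:-4}"
    elapsed=0
    while :; do
        # Reading carrier fails while the interface is down or absent;
        # that case is treated the same as "no link yet".
        if [ "$(cat "$SYSFS_NET/$iface/carrier" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

A possible use, run as root after the driver is loaded: `ifconfig eth0 up && wait_for_link eth0 4 && echo "link is up"`.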