Bug 9836 - e1000 network driver doesn't recognize (and load on) its hardware
Summary: e1000 network driver doesn't recognize (and load on) its hardware
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 blocking
Assignee: Jesse Brandeburg
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-28 05:32 UTC by Artem S. Tashkinov
Modified: 2008-10-06 14:10 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.24
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Miscellaneous hardware information including dmesg output. (24.78 KB, application/octet-stream)
2008-01-28 05:33 UTC, Artem S. Tashkinov

Description Artem S. Tashkinov 2008-01-28 05:32:00 UTC
Latest working kernel version: not known
Earliest failing kernel version: every
Distribution: Fedora Core 7
Hardware Environment: lspci output and dmesg information are attached
Software Environment: doesn't matter
Problem Description: the e1000 network driver recently stopped working right after our Intel server is powered on or rebooted. Running 'modprobe e1000; ifconfig eth0 IP_address' fails, and e1000 says the hardware is not detected. After about a *hundred* attempts to modprobe and rmmod it, the e1000 network driver finally loads and networking can be set up.

Steps to reproduce: reboot or power on your server, then try to bring up the network interface attached to your Intel PCI-X Pro 1000 NIC.
Comment 1 Artem S. Tashkinov 2008-01-28 05:33:13 UTC
Created attachment 14620
Miscellaneous hardware information including dmesg output.

The kernel .config file is also attached.
Comment 2 Anonymous Emailer 2008-01-28 13:44:09 UTC
Reply-To: akpm@linux-foundation.org

On Mon, 28 Jan 2008 05:32:00 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9836
> 
>            Summary: e1000 network driver doesn't recognize (and load on) its
>                     hardware
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.24
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: blocking
>           Priority: P1
>          Component: Network
>         AssignedTo: jgarzik@pobox.com
>         ReportedBy: t.artem@mailcity.com
> 
> 
> Latest working kernel version: not known
> Earliest failing kernel version: every
> Distribution: Fedora Core 7
> Hardware Environment: lspci output and dmesg information are attached
> Software Environment: doesn't matter
> Problem Description: recently e1000 network drivers stopped working when right
> after switching on or rebooting our Intel server. While trying to 'modprobe
> e1000; ifconfig eth0 IP_address' I've got a failure and e1000 says hardware is
> not detected. After about a *hundred* attempts to modprobe and rmmod it, the
> e1000 network driver finally loads and allows to set up networking.
> 
> Steps to reproduce: reboot or power on your server. Try to up the network which
> is attached to your Intel PCI-X Pro 1000 NIC.
> 

Odd.
Comment 3 Auke Kok 2008-01-28 14:24:04 UTC
potentially a dead slot, card, overheating or other non-driver related issue. Check fan cooling and thermals, check if card properly sits in slot etc.

try card in other slots, run memtest?

re-test with older kernel version that worked correctly?
Comment 4 Jesse Brandeburg 2008-01-28 14:24:49 UTC
Andrew Morton wrote:
> On Mon, 28 Jan 2008 05:32:00 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=9836
>> Problem Description: recently e1000 network drivers stopped working
>> when right after switching on or rebooting our Intel server. While
>> trying to 'modprobe e1000; ifconfig eth0 IP_address' I've got a
>> failure and e1000 says hardware is not detected. After about a
>> *hundred* attempts to modprobe and rmmod it, the e1000 network
>> driver finally loads and allows to set up networking. 

I'm not sure how you figured out rmmod/insmoding the driver gets you
working, but I don't see any failure messages in the dmesg log you sent,
I just see:

Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:07:04.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:07:04.0: e1000_probe: (PCI:33MHz:32-bit) 00:04:23:cc:82:5e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt for device 0000:07:04.0 disabled
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:07:04.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:07:04.0: e1000_probe: (PCI:33MHz:32-bit) 00:04:23:cc:82:5e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt for device 0000:07:04.0 disabled
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:07:04.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:07:04.0: e1000_probe: (PCI:33MHz:32-bit) 00:04:23:cc:82:5e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX

I believe each "PCI interrupt for device 0000:07:04.0 disabled" line is you
unloading the driver again, and I don't see any messages at all that
indicate the driver knows anything is wrong.

can you run 'ethtool -t eth0 offline' after rebooting, loading the
module, ifconfig eth0 up, and waiting 4 seconds?

What does 'ethtool eth0' say after loading the driver (before you ifconfig
it up)?  Are you sure you're not having some udev problem?  Can you send
the console output or attach it to the bug?
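
The sequence asked for above could be scripted roughly as follows. This is only a sketch: IFACE, LINK_WAIT, DRY_RUN and the run/diagnose helpers are illustrative names, not anything provided by the driver or by ethtool.

```sh
#!/bin/sh
# Sketch of the requested diagnostic sequence. All variable and function
# names here are illustrative assumptions.
IFACE=${IFACE:-eth0}
LINK_WAIT=${LINK_WAIT:-4}    # seconds to wait for link, as requested above

# Print each command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

diagnose() {
    run modprobe e1000                 # load the driver after a fresh boot
    run ethtool "$IFACE"               # state before the interface is up
    run ifconfig "$IFACE" up
    run sleep "$LINK_WAIT"
    run ethtool -t "$IFACE" offline    # offline self-test, interrupts traffic
}
```

Calling `DRY_RUN=1 diagnose` only prints the commands; calling `diagnose` as root right after a reboot runs them, and the console output can then be attached to the bug.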

>> Steps to reproduce: reboot or power on your server. Try to up the
>> network which is attached to your Intel PCI-X Pro 1000 NIC.

This is a PCI only adapter, FYI.

>> 
> 
> Odd.

agreed, something doesn't add up here, yet.
Comment 5 Artem S. Tashkinov 2008-01-28 23:46:43 UTC
The NIC is onboard so I cannot move it into another slot :-) And I think it's a PCI-X version.

The bug first appeared with kernel 2.6.22.5 (the server had been running this kernel for four months without a reboot, then a reboot broke everything). Yesterday I upgraded to 2.6.24 in the hope of resolving the bug. Unfortunately that didn't help.

> I'm not sure how you figured out rmmod/insmoding the driver gets you
> working, but I don't see any failure messages in the dmesg log you sent,

Right after modprobe e1000

# ifconfig eth0 IP_address

I get some SIGIOXXXXX error; after rmmod'ing and modprobing the module several times, I get it working.

As for overheating - that's worth checking out. As soon as I have time to test I'll post the results of `ethtool`.
Comment 6 Auke Kok 2008-01-30 09:41:23 UTC
does reverting back to 2.6.22.5 make it work again?
Comment 7 Artem S. Tashkinov 2008-01-31 01:12:07 UTC
No, the situation is the same with both kernels (22 and 24). I suppose this problem is related to faulty hardware. As soon as I have physical access to that server again, I'll post additional information here.
Comment 8 Jesse Brandeburg 2008-03-02 20:26:07 UTC
any update?
Comment 9 Artem S. Tashkinov 2008-03-02 22:18:59 UTC
I've got these errors while trying to set up the interface (Fedora 7 distro):

SIOCSIFADDR: No such device
eth0: unknown interface: No such device
SIOCSIFNETMASK: No such device
SIOCGIFADDR: No such device
SIOCSIFBROADCAST: No such device
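
"No such device" here means that no interface named eth0 existed when ifconfig ran. A rough way to see which interfaces do exist and which driver each is bound to is sketched below; list_ifaces is a hypothetical helper name, and the sketch assumes a sysfs layout with /sys/class/net/<name>/device/driver.

```sh
#!/bin/sh
# list_ifaces [SYSFS_NET_DIR]: print "<interface> <driver>" for each entry.
# Hypothetical helper; the directory argument exists only to ease testing.
list_ifaces() {
    dir=${1:-/sys/class/net}
    for path in "$dir"/*; do
        [ -e "$path" ] || continue     # directory empty or missing
        name=${path##*/}
        if [ -e "$path/device/driver" ]; then
            # The driver entry is a symlink; its target's basename is the driver.
            driver=$(basename "$(readlink -f "$path/device/driver")")
        else
            driver=-                   # no bound driver, e.g. lo
        fi
        printf '%s %s\n' "$name" "$driver"
    done
}
```

If the NIC shows up under a name other than eth0, the problem would be interface naming rather than the driver failing to load.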

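To make the NIC work I wrote this script:

IP=192.168.0.1

# Retry until eth0 actually carries the address; unload e1000 after each failure.
while :; do
    ifconfig eth0 "$IP" netmask 255.255.255.0
    # Extract the configured IPv4 address from the "inet addr:a.b.c.d" field.
    UP=$(ifconfig eth0 2> /dev/null | awk -F : '/inet addr/{print $2}' | awk '{print $1}')
    test "$UP" = "$IP" && echo UP && /sbin/ifup eth0 && break
    sleep 1
    rmmod e1000
done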
Comment 10 Jesse Brandeburg 2008-10-06 12:39:52 UTC
I still believe your hardware has a bug; I think we should just close this.
Comment 11 Jesse Brandeburg 2008-10-06 12:42:11 UTC
Also, your script should wait at least 4 seconds for the link to come up.
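
A minimal sketch of such a loop, assuming the wait belongs between loading the module and configuring the interface (the thread does not say exactly where). bring_up, run, LINK_WAIT, MAX_TRIES and DRY_RUN are illustrative names, and the sketch assumes ifconfig exits non-zero when the interface is missing.

```sh
#!/bin/sh
# Sketch only: retry loop with an explicit modprobe and a link wait.
IP=${IP:-192.168.0.1}
IFACE=${IFACE:-eth0}
LINK_WAIT=${LINK_WAIT:-4}      # seconds to wait after loading the module
MAX_TRIES=${MAX_TRIES:-100}    # give up eventually instead of looping forever

# Print each command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

bring_up() {
    tries=0
    while [ "$tries" -lt "$MAX_TRIES" ]; do
        tries=$((tries + 1))
        run modprobe e1000
        run sleep "$LINK_WAIT"
        # Assumption: a non-zero exit status means the address was not set.
        if run ifconfig "$IFACE" "$IP" netmask 255.255.255.0; then
            echo "UP after $tries attempt(s)"
            return 0
        fi
        run rmmod e1000
    done
    echo "giving up after $tries attempts" >&2
    return 1
}
```

`DRY_RUN=1 bring_up` prints the commands of one pass without touching the system; `bring_up` as root performs the real retries.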
Comment 12 Artem S. Tashkinov 2008-10-06 14:10:16 UTC
But why? I know of no other NIC that requires a 4-second delay (after modprobe'ing) to set up a network interface.

I'm closing the bug since I no longer look after that server and no one else in the world has such a problem.
