Bug 14748
Summary: | e1000e NIC not working after reboot | ||
---|---|---|---|
Product: | Drivers | Reporter: | Maciek Sitarz (macieks) |
Component: | Network | Assignee: | Tushar (tushar.n.dave) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | bruce.w.allan, florian, jbrandeb, rjw, tushar.n.dave |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.36 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 14230 | ||
Attachments: |
lspci -vvv output NIC working properly
ethregs output NIC working properly
lspci -vvv output NIC NOT working
ethregs output NIC NOT working properly
control mdi-x mode
Logs for test scenario described in comment #13
adds debug output
Kernel logs with debug output and ethregs |
Description
Maciek Sitarz
2009-12-06 13:04:18 UTC
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Sun, 6 Dec 2009 13:04:20 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14748
>
> Summary: e1000e NIC not working after reboot
> Product: Drivers
> Version: 2.5
> Kernel Version: 2.6.32
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Network
> AssignedTo: drivers_network@kernel-bugs.osdl.org
> ReportedBy: macieks@freesco.pl
> Regression: Yes
>
>
> When I power up my system the NIC is working properly.
> After every reboot the NIC is not working. I mean the eth0 is created, but
> neither dhcpcd gets IP nor static setup helps.
> ifconfig eth0 shows zero packets on Rx and Tx (no errors, overruns, etc.)
>
> logs after modprobing e1000e (NIC working OK):
> Dec 6 12:29:28 mcxR kernel: e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
> Dec 6 12:29:28 mcxR kernel: e1000e: Copyright (c) 1999-2008 Intel Corporation.
> Dec 6 12:29:28 mcxR kernel: e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> Dec 6 12:29:28 mcxR kernel: e1000e 0000:00:19.0: setting latency timer to 64
> Dec 6 12:29:28 mcxR kernel: e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
> Dec 6 12:29:28 mcxR kernel: 0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:15:58:cc:0f:35
> Dec 6 12:29:28 mcxR kernel: 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
> Dec 6 12:29:28 mcxR kernel: 0000:00:19.0: eth0: MAC: 6, PHY: 6, PBA No: ffffff-0ff
> Dec 6 12:29:28 mcxR kernel: e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
> Dec 6 12:29:28 mcxR kernel: e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
> Dec 6 12:29:30 mcxR kernel: e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
>
>
> logs after rebooting system and modprobing e1000e (NIC not working):
> Dec 6 11:57:46 mcxR kernel: e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
> Dec 6 11:57:46 mcxR kernel: e1000e: Copyright (c) 1999-2008 Intel Corporation.
> Dec 6 11:57:46 mcxR kernel: e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> Dec 6 11:57:46 mcxR kernel: e1000e 0000:00:19.0: setting latency timer to 64
> Dec 6 11:57:46 mcxR kernel: e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
> Dec 6 11:57:46 mcxR kernel: 0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:15:58:cc:0f:35
> Dec 6 11:57:46 mcxR kernel: 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
> Dec 6 11:57:46 mcxR kernel: 0000:00:19.0: eth0: MAC: 6, PHY: 6, PBA No: ffffff-0ff
> Dec 6 11:57:48 mcxR kernel: e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
> Dec 6 11:57:48 mcxR kernel: e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
>
>
> Additional info:
> Software:
> - distro: Arch Linux
> - kernel version: 2.6.32
> - e1000e version: 1.0.2-k2
>
>
> Hardware:
> - notebook: Lenovo ThinkPad R61
> - network card: Intel Gigabit
>
> # lspci -v
> 00:19.0 Ethernet controller: Intel Corporation 82566MC Gigabit Network Connection (rev 03)
> Subsystem: Lenovo Device 20ba
> Flags: bus master, fast devsel, latency 0, IRQ 11
> Memory at fe200000 (32-bit, non-prefetchable) [size=128K]
> Memory at fe224000 (32-bit, non-prefetchable) [size=4K]
> I/O ports at 1800 [size=32]
> Capabilities: [c8] Power Management version 2
> Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
> Kernel modules: e1000e
>
> $ uname -a
> Linux mcxR 2.6.32-ARCH #7 SMP PREEMPT Fri Dec 4 15:39:16 CET 2009 x86_64 Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz GenuineIntel GNU/Linux
>
> Thanks.

You don't mention which previous kernel version worked OK. Was it 2.6.31?

On Mon, 7 Dec 2009, Andrew Morton wrote:
> > When I power up my system the NIC is working properly.
> > After every reboot the NIC is not working. I mean the eth0 is created, but
> > neither dhcpcd gets IP nor static setup helps

We have a userspace tool called ethregs downloadable from

http://downloads.sourceforge.net/project/e1000/Register%20Dump%20Tool/1.7.2/ethregs-1.7.2.tar.gz?use_mirror=iweb

If it is not too much trouble, can you build this tool and run it before (when the port is working) and after (when the link didn't come up)? You can attach the output to the bug and reply to this thread; that would be best. Also please include the output of lspci -vvv after the failure.

Thanks,
Jesse

Created attachment 24082
lspci -vvv output NIC working properly
Created attachment 24083
ethregs output NIC working properly
Created attachment 24084
lspci -vvv output NIC NOT working
Created attachment 24085
ethregs output NIC NOT working properly
lspci -vvv and ethregs outputs are attached to Bugzilla.

I checked kernel 2.6.31.6 and the problem exists there also.

PS. I had a problem building ethregs:
gcc -Wall -W -Wno-parentheses -Wstrict-prototypes -Wmissing-prototypes -Winline -DEXTERNAL_RELEASE -g -O2 -o ethregs ethregs.o 8254x.o 8257x.o ichlan.o 82575.o 82576.o 80003es2lan.o 82598.o 82599.o -lpci -lz
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../lib/libpci.a(names-net.o): In function `pci_id_net_lookup':
(.text+0x138): undefined reference to `__res_query'
collect2: ld returned 1 exit status
I built it on another system and used it on my system.

Best regards

On Monday 04 January 2010, Maciej Sitarz wrote:
> On 29.12.2009 16:28, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32. Please verify if it still should
> > be listed and let me know (either way).
>
> I confirm. The problem still exists.
Reply-To: jesse.brandeburg@gmail.com

On Mon, Dec 7, 2009 at 2:01 PM, Brandeburg, Jesse <jesse.brandeburg@intel.com> wrote:
> On Mon, 7 Dec 2009, Andrew Morton wrote:
>> > When I power up my system the NIC is working properly.
>> > After every reboot the NIC is not working. I mean the eth0 is created, but
>> > neither dhcpcd gets IP nor static setup helps
>
> We have a userspace tool called ethregs downloadable from
>
> http://downloads.sourceforge.net/project/e1000/Register%20Dump%20Tool/1.7.2/ethregs-1.7.2.tar.gz?use_mirror=iweb
>
> if it is not too much trouble can you build this tool and run it before
> (when the port is working) and after (when the link didn't come up)
>
> you can attach them to the bug, and reply to this thread would be best.

I've looked at the ethregs dumps. The good news is that it looks like the hardware manages to self-init, but for ethregs-fails.txt, did you load the driver? It appears you did not, or at least didn't do:
# ip link set eth0 up
# ethregs > regs.txt

I also looked at the lspci -vvv information. In both cases MSI was enabled, but in the failing case the value in the data field for the MSI vector is different, which seems a little strange, though I'm not sure it is responsible for the failure.

If the driver was loaded and DHCP failed, what happens when you run ethtool -t eth0 offline?

When the driver is loaded and DHCP fails, can you assign an address manually (and bring the interface up) and have it work?

One more thing, please: can you send two cat /proc/interrupts snapshots taken 10 seconds apart while the driver is loaded and the port is UP but not working? dhcpcd and dhclient both tend to put the port DOWN after they fail to get an address, which is why you may need to run the ip link command above before gathering /proc/interrupts.

Is your BIOS up to date?

Thanks, and sorry for the delay. Let's see if we can figure out what is up.

Jesse

On Tuesday 26 January 2010, Maciej Sitarz wrote:
> On 24.01.2010 23:22, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32. Please verify if it still should
> > be listed and let me know (either way).
>
> The problem still exists. I'm using kernel version 2.6.32.6 now.
>
> I have one more observation:
> After the reboot, when the NIC is not working, both LEDs are on. Not
> blinking, they light all the time, even if I remove the plug.
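Jesse's earlier request for two /proc/interrupts snapshots taken 10 seconds apart amounts to comparing per-IRQ counters between the snapshots. The sketch below is not from the thread. It is a minimal Python illustration that assumes the usual /proc/interrupts layout: a header row of CPU columns, then one `label: per-CPU counts... description` row per IRQ. It sums the per-CPU counts for every IRQ whose description names the interface. `parse_interrupts` and `interrupt_delta` are hypothetical helper names.

```python
from typing import Dict, Tuple


def parse_interrupts(text: str) -> Dict[str, Tuple[int, str]]:
    """Parse one /proc/interrupts snapshot into {irq_label: (total_count, description)}."""
    lines = text.strip().splitlines()
    n_cpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    table: Dict[str, Tuple[int, str]] = {}
    for line in lines[1:]:
        # Split on the first colon only; descriptions may contain colons too.
        irq, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        total = 0
        used = 0
        # Per-CPU counters come first; rows such as ERR/MIS carry a single count.
        while used < min(n_cpus, len(fields)) and fields[used].isdigit():
            total += int(fields[used])
            used += 1
        table[irq.strip()] = (total, " ".join(fields[used:]))
    return table


def interrupt_delta(before: str, after: str, name: str = "eth0") -> Dict[str, int]:
    """Counter growth between two snapshots for every IRQ whose description lists `name`."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    deltas: Dict[str, int] = {}
    for irq, (count, desc) in new.items():
        # Shared IRQs list devices comma-separated, e.g. "uhci_hcd:usb1, eth0".
        if any(tok.strip(",") == name for tok in desc.split()):
            deltas[irq] = count - old.get(irq, (0, ""))[0]
    return deltas
```

To use it, save `cat /proc/interrupts > before.txt`, wait 10 seconds, save `after.txt` the same way, and pass both files' contents to `interrupt_delta`. A delta of 0 on the eth0 vector while the port is UP would mean no interrupts were counted on that vector during the window.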
Created attachment 24750
control mdi-x mode
Can you try the attached patch, with the module parameter set in /etc/modprobe.d/e1000e.conf:
# also try value 2
options e1000e mdix=1
<apply patch>
make M=drivers/net/e1000e modules modules_install
rmmod e1000e; modprobe e1000e mdix=1
This is just a shot in the dark. I made this patch for another issue and thought we should test it here too, just in case.
Some data Maciej provided in LKML:
---
The problem still exists. I'm using kernel version 2.6.32.6 now.
I have one more observation: after the reboot, when the NIC is not working, both LEDs are on. Not blinking, they light all the time, even if I remove the plug.
---

I tried to test all the cases and provide all logs.

(In reply to comment #9)
> Reply-To: jesse.brandeburg@gmail.com
>
> On Mon, Dec 7, 2009 at 2:01 PM, Brandeburg, Jesse
> <jesse.brandeburg@intel.com> wrote:
> > On Mon, 7 Dec 2009, Andrew Morton wrote:
> >> > When I power up my system the NIC is working properly.
> >> > After every reboot the NIC is not working. I mean the eth0 is created, but
> >> > neither dhcpcd gets IP nor static setup helps
> >
> > We have a userspace tool called ethregs downloadable from
> >
> > http://downloads.sourceforge.net/project/e1000/Register%20Dump%20Tool/1.7.2/ethregs-1.7.2.tar.gz?use_mirror=iweb
> >
> > if it is not too much trouble can you build this tool and run it before
> > (when the port is working) and after (when the link didn't come up)
> >
> > you can attach them to the bug, and reply to this thread would be best.
>
> I've looked at the ethregs dumps, the good news is it looks like the
> hardware succeeds to self-init, but on the ethregs-fails.txt did you
> load the driver? it appears you did not, or at least didn't do
> # ip link set eth0 up
> # ethregs > regs.txt
>
> also looked at the lspci -vvv information and in both cases MSI was
> enabled, but in the fails case the value in the data field for the MSI
> vector is different, which seems a a little strange but I'm not sure
> if it is responsible for failure
>
> if the driver was loaded, and failed dhcp, what happens when you run
> ethtool -t eth0 offline?
>
> when the driver is loaded, and the dhcp fails, can you assign an
> address manually (and bring the interface up) and have it work?
>
> one more thing to note please, can you send cat /proc/interrupts from
> 10 seconds apart when the driver is loaded and the port is UP, but not
> working. dhcpcd or dhclient both have a tendency to put the port DOWN
> after they fail to get address, so thats why you may need to do # ip
> link command above before gathering /proc/interrupts.

I did the tests you proposed. I described all the scenarios below and attached the gathered logs.

> is your bios up to date?

I updated the BIOS today and tested, but the problem remained.

> Thanks, sorry for the delay, lets see if we can figure out what is up.

No problem, I tried not to reboot too often :)

Test scenarios:

STATUS: Working
Description: I shut down the computer and started it, then I did the steps below.
before loading module: green LED shines all the time, orange LED blinks
after loading module: green LED shines all the time, orange LED blinks
mkdir working
modprobe e1000e
dhcpcd eth0 # OK
ip link set eth0 up
sleep 10 && cat /proc/interrupts > working/interrupts.log
lspci -vvv > working/lspci_vvv.log
./ethregs > working/ethregs.log

STATUS: Not working (module not loaded)
Description: I rebooted the computer and then I did the steps below.
before loading module: green LED shines all the time, orange LED blinks
mkdir not_working
sleep 10 && cat /proc/interrupts > not_working/interrupts.log
lspci -vvv > not_working/lspci_vvv.log
./ethregs > not_working/ethregs.log

STATUS: Not working (module loaded)
Description: After the previous test I did the steps below.
before loading module: green LED shines all the time, orange LED blinks
after loading module: green LED shines all the time, orange LED shines all the time (!) KEYBOARD LAGS once every few seconds!!!
mkdir not_working_module_loaded
modprobe e1000e
dhcpcd eth0 # times out waiting for carrier
ip link set eth0 up
sleep 10 && cat /proc/interrupts > not_working_module_loaded/interrupts.log
lspci -vvv > not_working_module_loaded/lspci_vvv.log
./ethregs > not_working_module_loaded/ethregs.log

STATUS: Not working (module loaded and then unloaded)
Description: After the previous test I just removed the module.
before loading module: green LED shines all the time, orange LED blinks
after loading module: green LED shines all the time, orange LED shines all the time (!) KEYBOARD LAGS once every few seconds!!!
after removing module: keyboard is working fine again! Green and orange LEDs shine
mkdir not_working_module_unloaded
modprobe -r e1000e
sleep 10 && cat /proc/interrupts > not_working_module_unloaded/interrupts.log
lspci -vvv > not_working_module_unloaded/lspci_vvv.log
./ethregs > not_working_module_unloaded/ethregs.log

STATUS: Patched module mdix=1 WORKS
Description: I shut down the system, started it, loaded the new module, then I did the steps below.
before loading module: green LED shines all the time, orange LED blinks
after loading module: green LED shines all the time, orange LED blinks
mkdir patched_module_mdix1_loaded
modprobe e1000e mdix=1
dhcpcd eth0 # OK
ip link set eth0 up
sleep 10 && cat /proc/interrupts > patched_module_mdix1_loaded/interrupts.log
lspci -vvv > patched_module_mdix1_loaded/lspci_vvv.log
./ethregs > patched_module_mdix1_loaded/ethregs.log

STATUS: Patched module mdix=1 after reboot NOT WORKING
Description: I rebooted the system (the new module was loaded), then I did the steps below.
before loading module: green LED shines all the time, orange LED blinks
after loading module: green LED shines all the time, orange LED shines all the time (!) KEYBOARD LAGS once every few seconds!!!
mkdir patched_module_mdix1_loaded_reboot
modprobe e1000e mdix=1
dhcpcd eth0 # times out waiting for carrier
ip link set eth0 up
sleep 10 && cat /proc/interrupts > patched_module_mdix1_loaded_reboot/interrupts.log
lspci -vvv > patched_module_mdix1_loaded_reboot/lspci_vvv.log
./ethregs > patched_module_mdix1_loaded_reboot/ethregs.log

STATUS: Patched module mdix=2 WORKS
Description: I shut down the system, started it, loaded the new module, then I did the steps below.
before loading module: green LED shines all the time, orange LED blinks
after loading module: green LED shines all the time, orange LED blinks
mkdir patched_module_mdix2_loaded
modprobe e1000e mdix=2
dhcpcd eth0 # OK
ip link set eth0 up
sleep 10 && cat /proc/interrupts > patched_module_mdix2_loaded/interrupts.log
lspci -vvv > patched_module_mdix2_loaded/lspci_vvv.log
./ethregs > patched_module_mdix2_loaded/ethregs.log

STATUS: Patched module mdix=2 after reboot NOT WORKING
Description: I shut down the system, started it, loaded the module and rebooted, then I did the steps below.
before loading module: green LED shines all the time, orange LED blinks
after loading module: green LED shines all the time, orange LED shines all the time (!) KEYBOARD LAGS once every few seconds!!!
mkdir patched_module_mdix2_loaded_reboot
modprobe e1000e mdix=2
dhcpcd eth0 # times out waiting for carrier
ip link set eth0 up
sleep 10 && cat /proc/interrupts > patched_module_mdix2_loaded_reboot/interrupts.log
lspci -vvv > patched_module_mdix2_loaded_reboot/lspci_vvv.log
./ethregs > patched_module_mdix2_loaded_reboot/ethregs.log

Created attachment 24784
Logs for test scenario described in comment #13

Still an issue. Kernel version 2.6.32.7

Still an issue. Kernel version 2.6.32.8

Thank you for your diligent testing, we have someone looking at it.
Sorry if I was too importunate, but I got this message every two weeks:

"This message has been generated automatically as a part of a report of regressions introduced between 2.6.31 and 2.6.32. The following bug entry is on the current list of known regressions introduced between 2.6.31 and 2.6.32. Please verify if it still should be listed and let the tracking team know (either way)."

so I felt obliged to let you know :) I'll just wait now for news from your side.

Reply-To: nicholasx.d.nunley@intel.com

I am looking into this bug, but I am not able to reproduce it on my test machine, so if you could provide some debug info it would be very helpful.

First, could you download the e1000e driver from SourceForge (http://sourceforge.net/projects/e1000/files/) and see if the problem is present there? The in-tree driver and the SourceForge driver are generally kept in sync, but sometimes an update to the kernel is overlooked.

Secondly, please apply the attached patch and provide the debug output when the driver is working and when it is not, as well as the PHY register dumps accessible through ethtool -d ethX. Turning on the debug messages allows us to see whether the driver is encountering any unusual conditions that we may otherwise be ignoring, and printing the PHY registers will allow us to see whether the PHY is being configured correctly.

Thanks,
Nick

Created attachment 25199
adds debug output
The e1000e module compiled from sf.net didn't help. I attached logs from the patched module, but I can't find any lines containing "e1000e: phy reg offset" that you wanted to print out. It didn't show up in any file in my /var/log directory.

Created attachment 25202
Kernel logs with debug output and ethregs
Kernel logs and ethregs output from module from sf.net patched with additional debug messages
Is this issue still present in current mainline kernels?

(In reply to comment #18)
> Sorry if I was too importunate, but I got this message every two weeks:
>
> "This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.31 and 2.6.32.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.31 and 2.6.32. Please verify if it still should
> be listed and let the tracking team know (either way)."
>
> so I felt obliged to let you know :) I'll just wait now for news from you
> side.

There are two sides... One side is the regression tracking view. From that viewpoint, status updates are very much appreciated! Especially towards the end of the release cycle, when Linus has to decide when to cut it. From the bug fixing perspective, I guess, as soon as the bug is acknowledged by a developer and worked upon, it is not that important anymore... Yet a ping from time to time does no harm.

Regards,
Flo

Is this issue still a problem in current mainline kernels?

Regards,
Flo

Still an issue. Kernel version 2.6.36

Regards,
Maciek

Still an issue. Kernel version 2.6.36.2

Regards,
Maciek

Updated kernel version and reassigned to Tushar.

Maciek,
Sorry for the late response. Have you tried the latest e1000e driver from SF (i.e. 1.2.20)? If not, can you give it a try?

The notebook with the network card is broken (graphics card). Right now it's being fixed, so I can't check whether this will fix the problem. But as soon as I get it back I'll try to upgrade the driver and reproduce the issue.

Tushar,
I got my notebook back and I tried to reproduce the problem. I did about 5-6 reboots and it seems to work fine.
I tested on:
$ uname -a
Linux mcxR 2.6.38-ARCH #1 SMP PREEMPT Tue Mar 15 09:36:10 CET 2011 x86_64 Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz GenuineIntel GNU/Linux
$ modinfo e1000e
filename: /lib/modules/2.6.38-ARCH/kernel/drivers/net/e1000e/e1000e.ko.gz
version: 1.2.20-k2
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: 566D897FE2181A99FA51235
We can assume the bug is fixed now.

Is it fixed in 2.6.38 or only in the SourceForge driver? If only in SF, will it be fixed in Linus' tree?

@Maciek: thanks for the feedback!
@florian: the 1.2.20-k2 driver version indicates it was the in-kernel driver. Updating status to Resolved.

[Hm.. bugzilla eating notification emails again :( ]

Thanks for the update and closing down yet another regression. |
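Florian's question (in-kernel or SourceForge driver?) was answered from the version string alone. The sketch below assumes plain `modinfo` output with one `key: value` pair per line. It treats the thread's convention, that a `-k<N>` suffix marks the in-kernel e1000e build, as a heuristic rather than a guarantee. `parse_modinfo` and `looks_in_kernel` are hypothetical helper names, not part of any tool mentioned in the bug.

```python
import re
from typing import Dict


def parse_modinfo(text: str) -> Dict[str, str]:
    """Map `modinfo` output ("key: value" per line) to a dict; the first value of a repeated key wins."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like "Intel Corporation, <...>" stay intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key not in info:
            info[key] = value.strip()
    return info


def looks_in_kernel(version: str) -> bool:
    """Heuristic from this bug: in-kernel e1000e versions end in '-k' plus an optional number."""
    return re.search(r"-k\d*$", version) is not None
```

For the `modinfo e1000e` output Maciek posted, `looks_in_kernel(parse_modinfo(out)["version"])` is true for `1.2.20-k2`. It is equally true for the `1.0.2-k2` driver in the original 2.6.32 logs.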