Most recent kernel where this bug did not occur: None found yet; it seems to affect all recent 2.6.x kernels on my system
Distribution: Ubuntu (Breezy)
Hardware Environment: Foxconn motherboard with r8169 integrated (gigabit ethernet version). lspci -vvv output supplied.
Software Environment: Linux kernel 2.6.15.1 (with Ubuntu Breezy). Output from dmesg supplied.
Problem Description: r8169 gets a bad MAC address. After that occurs, no network connectivity is possible. Attempting to use ifconfig to reconfigure the interface causes a hang. System also hangs at "deconfiguring network interfaces" phase of shutdown.
Steps to reproduce: Boot my system. Networking simply doesn't work.

More information:

The r8169 driver in Linux kernel 2.6.15.1 fails to work with my Realtek 8169 ethernet controller. This is the gigabit ethernet-capable embedded version which is integrated into a Foxconn 925A01-8EKRS2 motherboard. Whenever the system boots, the r8169 driver does not pick up the MAC address for the card; it reports the MAC address as being FF:FF:FF:FF:FF:FF. (I have also tried various 2.6.12, 2.6.13 and 2.6.14 kernels with the same experience.) My first thought was a broken chip, but this box has similar issues with other network cards, which I'll mention later.

When I run ifconfig, it reports:

eth0      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF
          inet addr:192.168.0.9  Bcast:192.168.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:4294967290 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:21 Base address:0x4800

The IP address has been assigned statically in /etc/network/interfaces as I don't use DHCP (and I doubt DHCP would currently work anyway). If I try to use "ifconfig eth0 hw ether" to set the MAC address, ifconfig just hangs.

I am unable to ping anywhere in my network or beyond the gateway, nor can I establish any network connections to any other machine. Given that there appears to be something wrong at the link layer, I wouldn't expect higher layer diagnostics to reveal much, but I figured it was worth a try. In addition, my system tends to hang at the "deconfiguring network interfaces" stage of system shutdown.

I have tried building the r8169 module both with and without gigabit ethernet support, and using MMIO versus PIO, with no luck. From browsing around the web, some people suggest changing the Plug and Play OS setting in the BIOS, but my BIOS doesn't appear to have any such setting. The only option in the PNP area was to reset the ESCD data, so I tried this to see if it would help, but the issue persisted.

Some reports seemed to suggest an issue with IRQ assignment and ACPI, so I tried booting with kernel options including nolapic, acpi=off, acpi=noirq, pci=noacpi, pci=routeirq and various combinations of these. None of them worked (in fact nolapic hangs my system during boot-up).

I include the output from lspci -vvv and dmesg so you can see what is happening. I also used Donald Becker's rtl8169-diag program, but that complained that "A recognized chip has been found, but it does not appear to exist in I/O space".

Incidentally, I tried some known good 3Com and D-Link network cards in this machine (both with and without the Realtek enabled in the BIOS) and I couldn't get them to work either, for similar reasons. If it were just the Realtek I might assume it was a defective chip, but the inability to get any network card working on this machine suggests a deeper issue.

I'm happy to try testing new patches or settings, but would appreciate some tips on what to do next. In all other respects my system is happy and stable; it just lacks a working network connection.

Thanks in advance.

Andrew
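For reference, FF:FF:FF:FF:FF:FF is the Ethernet broadcast address, so it cannot be a valid address for a single interface. The following is a minimal Python sketch of that check, assuming the usual rule that a station address must be neither all-zero nor have the multicast bit set. The function names are hypothetical, the second sample address is made up, and this is not the driver's own code.

```python
def parse_mac(text: str) -> bytes:
    """Parse a colon-separated MAC address such as 'FF:FF:FF:FF:FF:FF'."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    return octets


def is_usable_station_address(mac: bytes) -> bool:
    """Return True if the address could identify a single interface."""
    # The low bit of the first octet marks multicast (and broadcast) addresses.
    is_multicast = bool(mac[0] & 0x01)
    is_all_zero = not any(mac)
    return not is_multicast and not is_all_zero


if __name__ == "__main__":
    # The second address is an arbitrary example, not taken from this report.
    for text in ("FF:FF:FF:FF:FF:FF", "00:11:22:33:44:55"):
        print(text, is_usable_station_address(parse_mac(text)))
```

An all-ones address fails the check because its multicast bit is set, which fits the observation that nothing works at the link layer.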
Created attachment 7207 [details] lspci -vvv output
Created attachment 7208 [details] dmesg output
Please add the dmesg/ifconfig output taken when the 3Com is plugged into a PCI slot. The /proc/interrupts output would be welcome too. May I assume that you have already tried the options in the BIOS setup (if any), as well as trying different PCI slots? How does the dropped count reported by ifconfig evolve over time? Is there anything it could be correlated with (see /proc/interrupts)? -- Ueimor
Created attachment 7226 [details] dmesg output when my 3com card is plugged in
Created attachment 7227 [details] /proc/interrupts when 3com card is plugged in
Created attachment 7228 [details] ifconfig showing Realtek and 3com when both are enabled
Created attachment 7229 [details] lspci -vvv showing 3com and Realtek together
Please see the latest attachments provided. I have also tried the 3Com (as eth1) in a different PCI slot, and also my DLink DFE-530TX instead of the 3Com, with similar results. I can provide further examples if you wish. At the time that I took these latest diagnostics, the network cable was connected to the 3Com (eth1) instead of the Realtek (so you probably won't see "link up" for eth0).

Unfortunately this particular motherboard has only 2 regular PCI slots as it's aimed more at PCI Express cards, but neither slot has worked with any NIC I've tried on any recent Linux 2.6 series kernel. My 3Com and DLink still don't work properly even if the Realtek is completely disabled in the BIOS.

The ifconfig output changes each time I run it. For the r8169, the number of dropped packets starts out at a very large positive number (looks suspiciously close to 2^32 to me). The value then decreases by 1 each time I run ifconfig. The change doesn't seem to be timing-related: no matter how long a delay I leave between runs of ifconfig, the value always decreases by 1.

I have tried to correlate it with /proc/interrupts but I couldn't see any relationship with the values in there. All the /proc/interrupts values are either constant or are increasing by much larger numbers than anything ifconfig shows. Incidentally, I notice that RX dropped packets also decreases by 1 if I cat /proc/net/dev, but I expect that's what ifconfig is doing internally anyway.

In terms of my BIOS settings, the options I have tried were to reset the ESCD data (in case it was a BIOS resource allocation issue) and the option to enable/disable boot ROM for the r8169. There are a few other BIOS settings, such as overriding ESCD resource assignment and manually assigning IRQs to pins, but this seemed like an extreme solution. I have also checked my motherboard manufacturer's website, but there does not seem to be a BIOS update newer than the date my system displays at POST time.
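The "decreases by 1 per read" pattern is what unsigned 32-bit arithmetic produces if the same all-ones value is added on every read: adding 0xFFFFFFFF modulo 2^32 is the same as subtracting 1. The sketch below only illustrates that arithmetic. That each statistics read adds an all-ones register value to a 32-bit counter is an assumption, not something confirmed in this report, and the names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # largest value an unsigned 32-bit counter can hold


def add_u32(counter: int, increment: int) -> int:
    """Add two values the way an unsigned 32-bit counter would (wrapping)."""
    return (counter + increment) & MASK32


def simulate_reads(reads: int, register_value: int = 0xFFFFFFFF):
    """Return the counter value after each read, starting from zero.

    register_value is the (assumed) value added on every statistics read.
    """
    counter = 0
    history = []
    for _ in range(reads):
        counter = add_u32(counter, register_value)
        history.append(counter)
    return history


if __name__ == "__main__":
    for number, value in enumerate(simulate_reads(6), start=1):
        print(f"read {number}: dropped={value}")
```

Under that assumption, the sixth read yields 4294967290 (2^32 - 6), which is the dropped value in the ifconfig output from the original report.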
Created attachment 10172 [details] last merge from Realtek's driver

It would be nice to know how the system behaves with a recent 2.6.20-rcX kernel. In addition, the patch above could help (no warranty though, it's still wet).

-- Ueimor
Please reopen this bug if it's still present with kernel 2.6.20.
I have the same problem. I just bought a new r8169 and also had to boot with irqpoll, as mentioned in the dmesg output, although that didn't solve the problem:

[root@localhost alien]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF
          inet6 addr: fe80::fdff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:30064771065 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:74 Base address:0x6000

[root@localhost alien]# dhclient eth2
Internet Systems Consortium DHCP Client V3.0.5
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/

Listening on LPF/eth2/ff:ff:ff:ff:ff:ff
Sending on   LPF/eth2/ff:ff:ff:ff:ff:ff
Sending on   Socket/fallback
DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 14

[root@localhost alien]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF
          inet6 addr: fe80::fdff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:30064771065 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:74 Base address:0x6000

The dropped count did not increase between the two runs.

dmesg output:

r8169 Gigabit Ethernet driver 2.2LK loaded
PCI: Enabling device 0000:07:0b.0 (0006 -> 0007)
GSI 22 sharing vector 0x4A and IRQ 22
ACPI: PCI Interrupt 0000:07:0b.0[A] -> GSI 18 (level, low) -> IRQ 74
eth1: RTL8100e at 0xffffc20000006000, ff:ff:ff:ff:ff:ff, IRQ 74
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
GSI 23 sharing vector 0x52 and IRQ 23
ACPI: PCI Interrupt 0000:04:01.0[A] -> GSI 17 (level, low) -> IRQ 82
PCI: Setting latency timer of device 0000:04:01.0 to 64
NET: Registered protocol family 17
hda_codec: Unknown model for ALC883, trying auto-probe from BIOS...
r8169: eth2: link up
eth2: no IPv6 routers present
usbcore: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
ip_conntrack version 2.4 (4091 buckets, 32728 max) - 304 bytes per conntrack
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
ip_conntrack_pptp version 3.1 loaded
ip_nat_pptp version 3.0 loaded
nvidia: module license 'NVIDIA' taints kernel.
GSI 24 sharing vector 0x5A and IRQ 24
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 24 (level, low) -> IRQ 90
PCI: Setting latency timer of device 0000:02:00.0 to 64
NVRM: loading NVIDIA UNIX x86_64 Kernel Module 100.14.11 Wed Jun 13 16:33:22 PDT 2007
ClusterIP Version 0.8 loaded successfully
ipt_recent v0.3.1: Stephen Frost <sfrost@snowman.net>. http://snowman.net/projects/ipt_recent/
netfilter PSD loaded - (c) astaro AG
IFWLOG: register target
ACPI: PCI interrupt for device 0000:07:0b.0 disabled
ACPI: PCI Interrupt 0000:07:0b.0[A] -> GSI 18 (level, low) -> IRQ 74
eth1: RTL8100e at 0xffffc20000006000, ff:ff:ff:ff:ff:ff, IRQ 74
r8169: eth2: TBI auto-negotiating
r8169: eth2: link up
eth2: no IPv6 routers present

As you can see, for some reason the chip is detected as eth1, but I don't have that device, only eth2. That may have something to do with my system, but I don't know.

[root@localhost alien]# ifconfig eth1
eth1: error fetching interface information: Device not found

[root@localhost alien]# cat /proc/interrupts
           CPU0       CPU1
  0:    1602275          0    IO-APIC-edge  timer
  1:       2135          0    IO-APIC-edge  i8042
  8:          0          0    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 12:      36493          0    IO-APIC-edge  i8042
 14:     114305          0    IO-APIC-edge  ide0
 50:          0          0   IO-APIC-level  uhci_hcd:usb3
 58:      50565          0         PCI-MSI  libata
 66:       1938          0   IO-APIC-level  uhci_hcd:usb4, eth0
 74:          0          0   IO-APIC-level  eth2
 82:        297          0   IO-APIC-level  HDA Intel
 90:      72785          0   IO-APIC-level  nvidia
185:         26          0   IO-APIC-level  uhci_hcd:usb1
193:     336044          0   IO-APIC-level  uhci_hcd:usb2, ehci_hcd:usb5
NMI:        176       6275
LOC:    1601915    1601892
ERR:          1
MIS:          0

lspci -vvv:

07:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR+
        Latency: 32 (8000ns min, 16000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 74
        Region 0: I/O ports at 7000 [size=256]
        Region 1: Memory at df5fe000 (32-bit, non-prefetchable) [size=256]
        [virtual] Expansion ROM at df400000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

[root@localhost alien]# uname -a
Linux localhost 2.6.17-14mdv-1mv #1 SMP Sat Aug 25 19:50:21 CEST 2007 x86_64 Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz GNU/Linux

I'll try with a Mandriva-patched 2.6.22, and if that doesn't work, a clean 2.6.2x with the attached patch.
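The inet6 address in the ifconfig output above follows directly from the bogus MAC. A link-local address is commonly formed from the MAC by the modified EUI-64 procedure: flip the universal/local bit of the first octet, insert ff:fe in the middle, and prepend fe80::/64. A minimal Python sketch of that derivation, with a hypothetical function name:

```python
import ipaddress


def link_local_from_mac(mac: str) -> str:
    """Derive the fe80::/64 link-local address from a MAC (modified EUI-64)."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    prefix = bytes([0xFE, 0x80]) + bytes(6)  # fe80:0000:0000:0000
    return str(ipaddress.IPv6Address(prefix + interface_id))


if __name__ == "__main__":
    print(link_local_from_mac("ff:ff:ff:ff:ff:ff"))
```

For ff:ff:ff:ff:ff:ff this gives fe80::fdff:ffff:feff:ffff, matching the address shown for eth2, so the IPv6 address is simply a by-product of the all-ones MAC rather than a separate symptom.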
Maarten :
[...]
> [root@localhost alien]# uname -a
> Linux localhost 2.6.17-14mdv-1mv #1 SMP Sat Aug 25 19:50:21 CEST 2007 x86_64

Please go directly to 2.6.23 and send:
- a complete dmesg of the system (the newer r8169 driver adds a bit of information, amongst other things)
- a 'lspci -vvvxxxx'

A test with the current 2.6.24-git would be welcome too.

It would be better to reopen the bug under the "r8169: TBI falsely detected" topic.

-- Ueimor
What is TBI? The 2.6.22.9-1mdv kernel still has the same problem, although the MAC is now detected as ff:ff:ff:ff:ff:fb. DHCP seems to work, since the interface has an IP at bootup, and even ping seems to work, but anything requiring a bigger payload is problematic. Also, the dropped stats are still the same... I'll retest with 2.6.23 vanilla, but where can I get the 24-git version?
> what is TBI ??

TBI = Ten Bit Interface, an alternative to the (G)MII interface. It appears in your dmesg but it almost surely should not.

You can try the attached patch with 2.6.23. Please:
- send a complete 'dmesg'
- send a 'lspci -vvvxxxx'
- open a new bug report (this one is closed/unrelated).

> I'll retest with 2.6.23 vanilla. but where can i get the 24-git version?

ftp://www.kernel.org/pub/linux/kernel/v2.6/snapshots

-- Ueimor
Created attachment 13394 [details] Disable TBI autodetection for the 8100
I have the same issue on 2.6.22 running Ubuntu. It actually works if I clear the BIOS when power is disconnected, but after 12 reboot it is gone again.

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev ff) (prog-if ff)
        !!! Unknown header type 7f
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

After modprobe r8169:

[ 3105.078699] ACPI: PCI interrupt for device 0000:02:00.0 disabled
[ 3119.225412] r8169 Gigabit Ethernet driver 2.2LK loaded
[ 3119.225430] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 17 (level, low) -> IRQ 17
[ 3119.225438] PCI: cache line size of 32 is not supported by device 0000:02:00.0
[ 3119.225443] ACPI: PCI interrupt for device 0000:02:00.0 disabled
[ 3119.225448] r8169: probe of 0000:02:00.0 failed with error -22
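A configuration space that reads back as all ff bytes usually means the device is not responding to configuration reads at all. That is consistent with "Unknown header type 7f", since the header type is the byte at offset 0x0e with the top (multi-function) bit masked off. The sketch below parses a hex dump in the layout shown above and applies both checks. The helper names are hypothetical, and it is an illustration rather than a diagnostic tool.

```python
import re

# One dump line: a hex offset, a colon, then space-separated hex bytes.
DUMP_LINE = re.compile(r"([0-9a-f]+):((?:\s+[0-9a-f]{2})+)\s*")


def parse_config_dump(text: str) -> bytes:
    """Collect the bytes from 'NN: xx xx ...' lines, ignoring other lines.

    Assumes the dump lines appear in order and cover contiguous offsets.
    """
    data = bytearray()
    for line in text.splitlines():
        match = DUMP_LINE.fullmatch(line.strip().lower())
        if match:
            data.extend(int(token, 16) for token in match.group(2).split())
    return bytes(data)


def reads_all_ones(config: bytes) -> bool:
    """True if every byte is 0xff, as when a device does not respond."""
    return len(config) > 0 and all(byte == 0xFF for byte in config)


def header_type(config: bytes) -> int:
    """Header type field: offset 0x0e with the multi-function bit cleared."""
    return config[0x0E] & 0x7F


# The same 64 bytes of ff as in the dump above, rebuilt programmatically.
SAMPLE = "\n".join(
    f"{offset:02x}: " + " ".join(["ff"] * 16) for offset in range(0, 0x40, 0x10)
)

if __name__ == "__main__":
    config = parse_config_dump(SAMPLE)
    print("all ones:", reads_all_ones(config))
    print("header type: %02x" % header_type(config))
```

With every byte reading ff, the vendor and device IDs are not trustworthy either, so values such as the cache line size that the driver later rejects are likely artefacts of the same failed reads. The probe error -22 is -EINVAL on Linux.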
gbj@theforce.com.au 2008-04-10 05:40 :
> I have the same issue on 2.6.22 running Ubuntu

Can you reproduce it with the current -rc kernel? 2.6.22 was suffering from some mmconfig problems. The problem is quite sticky, even after a reboot.
I have installed 2.6.25-rc8. It still has the same problem after reboot. Is there anything I can/should test to help identify the issue? This has been a long-term bug by the look of it.
Jordan, can you add a 'noapic' option to the kernel boot command line as well? Whatever the result, I'd appreciate the outputs of 'lspci -vvxx', 'lspci -H1 -vvxx' and dmesg, with and without the 'noapic' option, for this kernel. There is a pattern. -- Ueimor
Created attachment 15767 [details] lspci -vvxx noapic
Created attachment 15768 [details] lspci -H1 -vvxx noapic
Created attachment 15769 [details] dmesg noapic
Created attachment 15770 [details] ifconfig noapic
Created attachment 15772 [details] lspci -vvxx
Created attachment 15773 [details] lspci -H1 -vvxx
Created attachment 15774 [details] dmesg
Created attachment 15775 [details] ifconfig
The option should have read 'noapic' instead of 'noacpi' but the device was apparently working anyway, right ? If so, is there any way you could capture the same output (dmesg, lspci ..., lspci -H1 ...) when the device is not working ? -- Ueimor
Hmm, maybe I was mistaken that it was not working on 2.6.25-rc8. It seems to work every time now. Even after it has locked up on 2.6.22-14-generic, it seems to come up after a reboot into 2.6.25-rc8.

This line in ifconfig:

    RX packets:3323 errors:0 dropped:3978457284 overruns:0 frame:0

is a concern. I thought that counter was reset on reboot?

-- Grahame
Grahame Jordan :
[...]
> This line in ifconfig:
> RX packets:3323 errors:0 dropped:3978457284 overruns:0 frame:0
> is a concern. I thought that that counter was reset on reboot ?

Please see http://bugzilla.kernel.org/show_bug.cgi?id=11062 for this part. The patch has been submitted for upstream.

-- Ueimor