Bug 5992 - r8169 driver - no network connection, hang at shutdown
Summary: r8169 driver - no network connection, hang at shutdown
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: i386 Linux
Importance: P2 normal
Assignee: Francois Romieu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-01 13:16 UTC by Andrew Akehurst
Modified: 2008-09-24 13:05 UTC
4 users

See Also:
Kernel Version: 2.6.15.1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
lspci -vvv output (1.74 KB, text/plain)
2006-02-01 13:17 UTC, Andrew Akehurst
dmesg output (29.75 KB, text/plain)
2006-02-01 13:18 UTC, Andrew Akehurst
dmesg output when my 3com card is plugged in (30.07 KB, text/plain)
2006-02-03 14:14 UTC, Andrew Akehurst
/proc/interrupts when 3com card is plugged in (621 bytes, text/plain)
2006-02-03 14:15 UTC, Andrew Akehurst
ifconfig showing Realtek and 3com when both are enabled (910 bytes, text/plain)
2006-02-03 14:16 UTC, Andrew Akehurst
lspci -vvv showing 3com and Realtek together (13.07 KB, text/plain)
2006-02-03 14:18 UTC, Andrew Akehurst
last merge from Realtek's driver (11.79 KB, patch)
2007-01-24 15:18 UTC, Francois Romieu
Disable TBI autodetection for the 8100 (632 bytes, patch)
2007-11-04 14:33 UTC, Francois Romieu
lspci -vvxx noapic (20.92 KB, text/plain)
2008-04-16 06:12 UTC, Grahame Jordan
lspci -H1 -vvxx noapic (20.20 KB, text/plain)
2008-04-16 06:13 UTC, Grahame Jordan
dmesg noapic (28.29 KB, text/plain)
2008-04-16 06:14 UTC, Grahame Jordan
ifconfig noapic (922 bytes, text/plain)
2008-04-16 06:14 UTC, Grahame Jordan
lspci -vvxx (20.92 KB, text/plain)
2008-04-16 06:15 UTC, Grahame Jordan
lspci -H1 -vvxx (20.20 KB, text/plain)
2008-04-16 06:15 UTC, Grahame Jordan
dmesg (28.92 KB, text/plain)
2008-04-16 06:16 UTC, Grahame Jordan
ifconfig (942 bytes, text/plain)
2008-04-16 06:16 UTC, Grahame Jordan

Description Andrew Akehurst 2006-02-01 13:16:22 UTC
Most recent kernel where this bug did not occur: None found yet, seems to affect
all recent 2.6.x kernels on my system

Distribution: Ubuntu (Breezy)

Hardware Environment: Foxconn motherboard with an integrated Realtek 8169 (Gigabit
Ethernet version). lspci -vvv output supplied.

Software Environment: Linux kernel 2.6.15.1 (with Ubuntu Breezy). Output from
dmesg supplied.

Problem Description: the r8169 driver reads an invalid MAC address. After that,
no network connectivity is possible. Attempting to reconfigure the interface
with ifconfig causes a hang. The system also hangs at the "deconfiguring network
interfaces" phase of shutdown.

Steps to reproduce: Boot my system. Networking simply doesn't work.

More information:

The r8169 driver in Linux kernel 2.6.15.1 fails to work with my Realtek 8169
Ethernet controller. This is the Gigabit Ethernet-capable embedded version
integrated into a Foxconn 925A01-8EKRS2 motherboard.

Whenever the system boots, the r8169 driver does not pick up the MAC
address for the card; it reports the MAC address as
FF:FF:FF:FF:FF:FF. (I have also tried various 2.6.12, 2.6.13 and 2.6.14
kernels with the same result.) My first thought was a broken chip, but this
box has similar issues with other network cards, which I'll mention later.

When I run ifconfig, this reports:

eth0      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF
          inet addr:192.168.0.9  Bcast:192.168.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:4294967290 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:21 Base address:0x4800

The IP address has been assigned statically in /etc/network/interfaces
as I don't use DHCP (and I doubt DHCP would currently work anyway). If I
try to use "ifconfig eth0 hw ether" to set the MAC address then ifconfig
just hangs.

I am unable to ping anywhere in my network or beyond the gateway, nor
can I establish any network connections to any other machine. Given that
there appears to be something wrong at the link layer, I wouldn't expect
higher layer diagnostics to reveal much, but I figured it was worth a try.

In addition, my system tends to hang at the "deconfiguring network
interfaces" stage of system shutdown.

I have tried building the r8169 module both with and without Gigabit
Ethernet support, and with both MMIO and PIO, with no luck.

Some people on the web suggest changing the Plug and Play OS setting in
the BIOS, but my BIOS doesn't appear to have any such setting. The only
option in the PnP area was to reset the ESCD data, so I tried that to see
if it would help, but the issue persisted.

Some reports seemed to suggest an issue with IRQ assignment and ACPI,
so I tried booting with kernel options including nolapic, acpi=off,
acpi=noirq, pci=noacpi, pci=routeirq and various combinations of these. None of
them worked (in fact nolapic hangs my system during boot-up).

I include the output from lspci -vvv and dmesg so you can see what is
happening. I also used Donald Becker's rtl8169-diag program, but that
complained that "A recognized chip has been found, but it does not
appear to exist in I/O space".

Incidentally, I tried some known-good 3Com and D-Link network cards in
this machine (both with and without the Realtek enabled in the BIOS), and
I couldn't get them to work either, with similar symptoms. If it were just
the Realtek I might assume it was a defective chip, but the inability to
get any network card working on this machine suggests a deeper issue.

I'm happy to test new patches or settings, but would appreciate
some tips on what to do next. In all other respects my system is happy
and stable; it just lacks a working network connection.

Thanks in advance.

Andrew
Comment 1 Andrew Akehurst 2006-02-01 13:17:39 UTC
Created attachment 7207
lspci -vvv output
Comment 2 Andrew Akehurst 2006-02-01 13:18:09 UTC
Created attachment 7208
dmesg output
Comment 3 Francois Romieu 2006-02-01 14:17:47 UTC
Please add the dmesg and ifconfig output with the 3Com plugged into a PCI slot.
The /proc/interrupts output would be welcome too.

May I assume that you have already tried the options in the BIOS setup
(if any), as well as different PCI slots?

How does the dropped count reported by ifconfig evolve over time?
Can it be correlated with anything (see /proc/interrupts)?

-- 
Ueimor
Comment 4 Andrew Akehurst 2006-02-03 14:14:33 UTC
Created attachment 7226
dmesg output when my 3com card is plugged in
Comment 5 Andrew Akehurst 2006-02-03 14:15:16 UTC
Created attachment 7227
/proc/interrupts when 3com card is plugged in
Comment 6 Andrew Akehurst 2006-02-03 14:16:44 UTC
Created attachment 7228
ifconfig showing Realtek and 3com when both are enabled
Comment 7 Andrew Akehurst 2006-02-03 14:18:00 UTC
Created attachment 7229
lspci -vvv showing 3com and Realtek together
Comment 8 Andrew Akehurst 2006-02-03 14:55:45 UTC
Please see the latest attachments provided. I have also tried the 3Com (as eth1)
in a different PCI slot, and also my D-Link DFE-530TX instead of the 3Com, with
similar results. I can provide further examples if you wish.

At the time that I took these latest diagnostics, the network cable was
connected to the 3Com (eth1) instead of the Realtek (so you probably won't see
"link up" for eth0). Unfortunately this particular motherboard has only 2
regular PCI slots, as it's aimed more at PCI Express cards, and neither slot has
worked with any NIC I've tried on any recent Linux 2.6 series kernel.

My 3Com and D-Link still don't work properly even if the Realtek is completely
disabled in the BIOS.

The ifconfig output changes each time I run it. For the r8169, the number of
dropped packets starts out at a very large positive number (looks suspiciously
close to 2^32 to me). The value then decreases by 1 for each time I run
ifconfig. The change doesn't seem to be timing-related: no matter how long a
delay I leave between runs of ifconfig, the value always decreases by 1. I have
tried to correlate it with /proc/interrupts but I couldn't see any relationship
with the values in there. All the /proc/interrupts values are either constant or
are increasing by much larger numbers than anything ifconfig shows.

Incidentally, I notice that the RX dropped count also decreases by 1 if I cat
/proc/net/dev, but I expect that's what ifconfig is doing internally anyway.

In terms of my BIOS settings, the options I have tried were to reset the ESCD
data (in case it was a BIOS resource allocation issue) and the option to
enable/disable boot ROM for the r8169. There are a few other BIOS settings such
as overriding ESCD resource assignment and manually assigning IRQs to pins, but
this seemed like an extreme solution.

I have also checked my motherboard manufacturer's website, but there does not
seem to be a BIOS update newer than the date my system displays at POST time.
Comment 9 Francois Romieu 2007-01-24 15:18:57 UTC
Created attachment 10172
last merge from Realtek's driver

It would be nice to know how the system behaves with a recent 2.6.20-rcX
kernel.

In addition, the patch above could help (no warranty though; it's still
wet).

-- 
Ueimor
Comment 10 Adrian Bunk 2007-03-05 17:12:31 UTC
Please reopen this bug if it's still present with kernel 2.6.20.
Comment 11 Maarten Vanraes 2007-11-03 07:00:20 UTC
I have the same problem. I just bought a new r8169 card and also had to boot with irqpoll, as suggested in the dmesg output, although that didn't solve the problem:

[root@localhost alien]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF
          inet6 addr: fe80::fdff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:30064771065 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:74 Base address:0x6000

[root@localhost alien]# dhclient eth2
Internet Systems Consortium DHCP Client V3.0.5
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/

Listening on LPF/eth2/ff:ff:ff:ff:ff:ff
Sending on   LPF/eth2/ff:ff:ff:ff:ff:ff
Sending on   Socket/fallback
DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on eth2 to 255.255.255.255 port 67 interval 14

[root@localhost alien]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF
          inet6 addr: fe80::fdff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:30064771065 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:74 Base address:0x6000



No increase in the dropped count.

dmesg output
r8169 Gigabit Ethernet driver 2.2LK loaded
PCI: Enabling device 0000:07:0b.0 (0006 -> 0007)
GSI 22 sharing vector 0x4A and IRQ 22
ACPI: PCI Interrupt 0000:07:0b.0[A] -> GSI 18 (level, low) -> IRQ 74
eth1: RTL8100e at 0xffffc20000006000, ff:ff:ff:ff:ff:ff, IRQ 74
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
GSI 23 sharing vector 0x52 and IRQ 23
ACPI: PCI Interrupt 0000:04:01.0[A] -> GSI 17 (level, low) -> IRQ 82
PCI: Setting latency timer of device 0000:04:01.0 to 64
NET: Registered protocol family 17
hda_codec: Unknown model for ALC883, trying auto-probe from BIOS...
r8169: eth2: link up
eth2: no IPv6 routers present
usbcore: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
ip_conntrack version 2.4 (4091 buckets, 32728 max) - 304 bytes per conntrack
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
ip_conntrack_pptp version 3.1 loaded
ip_nat_pptp version 3.0 loaded
nvidia: module license 'NVIDIA' taints kernel.
GSI 24 sharing vector 0x5A and IRQ 24
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 24 (level, low) -> IRQ 90
PCI: Setting latency timer of device 0000:02:00.0 to 64
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  100.14.11  Wed Jun 13 16:33:22 PDT 2007
ClusterIP Version 0.8 loaded successfully
ipt_recent v0.3.1: Stephen Frost <sfrost@snowman.net>.  http://snowman.net/projects/ipt_recent/
netfilter PSD loaded - (c) astaro AG
IFWLOG: register target
ACPI: PCI interrupt for device 0000:07:0b.0 disabled
ACPI: PCI Interrupt 0000:07:0b.0[A] -> GSI 18 (level, low) -> IRQ 74
eth1: RTL8100e at 0xffffc20000006000, ff:ff:ff:ff:ff:ff, IRQ 74
r8169: eth2: TBI auto-negotiating
r8169: eth2: link up
eth2: no IPv6 routers present


As you can see, for some reason the chip is detected as eth1, but I don't have that device, only eth2. That may have something to do with my system, but I'm not sure.

[root@localhost alien]# ifconfig eth1
eth1: error fetching interface information: Device not found

[root@localhost alien]# cat /proc/interrupts
           CPU0       CPU1
  0:    1602275          0    IO-APIC-edge  timer
  1:       2135          0    IO-APIC-edge  i8042
  8:          0          0    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 12:      36493          0    IO-APIC-edge  i8042
 14:     114305          0    IO-APIC-edge  ide0
 50:          0          0   IO-APIC-level  uhci_hcd:usb3
 58:      50565          0         PCI-MSI  libata
 66:       1938          0   IO-APIC-level  uhci_hcd:usb4, eth0
 74:          0          0   IO-APIC-level  eth2
 82:        297          0   IO-APIC-level  HDA Intel
 90:      72785          0   IO-APIC-level  nvidia
185:         26          0   IO-APIC-level  uhci_hcd:usb1
193:     336044          0   IO-APIC-level  uhci_hcd:usb2, ehci_hcd:usb5
NMI:        176       6275
LOC:    1601915    1601892
ERR:          1
MIS:          0


lspci -vvv:

07:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR+
        Latency: 32 (8000ns min, 16000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 74
        Region 0: I/O ports at 7000 [size=256]
        Region 1: Memory at df5fe000 (32-bit, non-prefetchable) [size=256]
        [virtual] Expansion ROM at df400000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-


[root@localhost alien]# uname -a
Linux localhost 2.6.17-14mdv-1mv #1 SMP Sat Aug 25 19:50:21 CEST 2007 x86_64 Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz GNU/Linux


I'll try a Mandriva-patched 2.6.22, and if that doesn't work, a clean 2.6.2x with the attached patch.
Comment 12 Francois Romieu 2007-11-03 07:27:27 UTC
Maarten :
[...]
> [root@localhost alien]# uname -a
> Linux localhost 2.6.17-14mdv-1mv #1 SMP Sat Aug 25 19:50:21 CEST 2007 x86_64

Please go directly to 2.6.23 and send:
- a complete dmesg of the system (the newer r8169 driver adds a bit of
  information, among other things)
- an 'lspci -vvvxxxx'

A test with the current 2.6.24-git would be welcome too.

It would be better to reopen the bug under the "r8169: TBI falsely detected" topic.

-- 
Ueimor
Comment 13 Maarten Vanraes 2007-11-04 03:43:06 UTC
What is TBI?

The 2.6.22.9-1mdv kernel still has the same problem, although the address is now detected as ff:ff:ff:ff:ff:fb.

DHCP seems to work, since the interface has an IP address at boot, and even ping seems to work, but anything requiring a bigger payload is problematic. The dropped stats are also still the same...

I'll retest with vanilla 2.6.23. But where can I get the 2.6.24-git version?
Comment 14 Francois Romieu 2007-11-04 14:32:37 UTC
> what is TBI ??

TBI (Ten Bit Interface) is an alternative to the (G)MII interface.
It appears in your dmesg, but it almost surely should not. You
can try the attached patch with 2.6.23.

Please:
- send a complete 'dmesg'
- send an 'lspci -vvvxxxx'
- open a new bug report (this one is closed/unrelated).

> I'll retest with 2.6.23 vanilla.  but where can i get the 24-git version?

ftp://www.kernel.org/pub/linux/kernel/v2.6/snapshots

-- 
Ueimor
Comment 15 Francois Romieu 2007-11-04 14:33:21 UTC
Created attachment 13394
Disable TBI autodetection for the 8100
Comment 16 Grahame Jordan 2008-04-10 05:40:46 UTC
I have the same issue on 2.6.22 running Ubuntu.
It actually works if I clear the BIOS while the power is disconnected, but after 12 reboots it is gone again.

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev ff) (prog-if ff)
        !!! Unknown header type 7f
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

After modprobe r8169
[ 3105.078699] ACPI: PCI interrupt for device 0000:02:00.0 disabled
[ 3119.225412] r8169 Gigabit Ethernet driver 2.2LK loaded
[ 3119.225430] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 17 (level, low) -> IRQ 17
[ 3119.225438] PCI: cache line size of 32 is not supported by device 0000:02:00.0
[ 3119.225443] ACPI: PCI interrupt for device 0000:02:00.0 disabled
[ 3119.225448] r8169: probe of 0000:02:00.0 failed with error -22
Comment 17 Francois Romieu 2008-04-10 17:07:41 UTC
gbj@theforce.com.au  2008-04-10 05:40 :
> I have the same issue on 2.6.22 running Ubuntu

Can you reproduce it with the current -rc kernel?

2.6.22 was suffering some mmconfig problems.

It is quite sticky, even after reboot.
Comment 18 Grahame Jordan 2008-04-15 05:20:00 UTC
I have installed 2.6.25-rc8; it still has the same problem after reboot.
Is there anything I can or should test to help identify the issue?
By the look of it, this has been a long-standing bug.
Comment 19 Francois Romieu 2008-04-15 14:13:15 UTC
Jordan, can you add a 'noapic' option to the kernel boot command line as well?

Whatever the result, I'd appreciate the outputs of 'lspci -vvxx',
'lspci -H1 -vvxx' and dmesg with and without the 'noapic' option
for this kernel. There is a pattern.

-- 
Ueimor
Comment 20 Grahame Jordan 2008-04-16 06:12:47 UTC
Created attachment 15767
lspci -vvxx noapic
Comment 21 Grahame Jordan 2008-04-16 06:13:25 UTC
Created attachment 15768
lspci -H1 -vvxx noapic
Comment 22 Grahame Jordan 2008-04-16 06:14:09 UTC
Created attachment 15769
dmesg noapic
Comment 23 Grahame Jordan 2008-04-16 06:14:30 UTC
Created attachment 15770
ifconfig noapic
Comment 24 Grahame Jordan 2008-04-16 06:15:16 UTC
Created attachment 15772
lspci -vvxx
Comment 25 Grahame Jordan 2008-04-16 06:15:57 UTC
Created attachment 15773
lspci -H1 -vvxx
Comment 26 Grahame Jordan 2008-04-16 06:16:20 UTC
Created attachment 15774
dmesg
Comment 27 Grahame Jordan 2008-04-16 06:16:43 UTC
Created attachment 15775
ifconfig
Comment 28 Francois Romieu 2008-04-16 14:49:18 UTC
The option should have read 'noapic' instead of 'noacpi', but the device was
apparently working anyway, right?

If so, is there any way you could capture the same output
(dmesg, lspci ..., lspci -H1 ...) when the device is not working ?

-- 
Ueimor
Comment 29 Grahame Jordan 2008-04-20 17:28:15 UTC
Hmm, maybe I was mistaken that it was not working on
2.6.25-rc8. It seems to work every time now: even after it has locked up on 2.6.22-14-generic, it comes up after a reboot into 2.6.25-rc8.

This line in ifconfig:
RX packets:3323 errors:0 dropped:3978457284 overruns:0 frame:0
is a concern. I thought that counter was reset on reboot?

--
Grahame
Comment 30 Francois Romieu 2008-09-24 13:05:30 UTC
Grahame Jordan  :
[...]
> This line in ifconfig:
> RX packets:3323 errors:0 dropped:3978457284 overruns:0 frame:0
> is a concern. I thought that that counter was reset on reboot ?

Please see http://bugzilla.kernel.org/show_bug.cgi?id=11062 for this part.

The patch has been submitted upstream.

-- 
Ueimor
