Bug 33782

Summary: [r8169] No link/carrier detected, sometimes
Product: Drivers
Reporter: Matthew Gyurgyik (pyther)
Component: Network
Assignee: Francois Romieu (romieu)
Status: RESOLVED INSUFFICIENT_DATA
Severity: normal
CC: 1123581321, alan, j.fikar, kirr, romieu, vkrevs
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.0.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg 3.0.0

Description Matthew Gyurgyik 2011-04-21 03:45:16 UTC
Hardware: Thinkpad Edge E420s
Distribution: Arch Linux

If 'ifconfig eth0 down' and 'ifconfig eth0 up' are not run within milliseconds of each other, no link/carrier is detected.

skynet:~ $ (sudo ifconfig eth0 down; sleep 1; sudo ifconfig eth0 up); sleep 5; cat /sys/class/net/eth0/carrier
0

With a 1-second pause between the two commands, the link/carrier is not detected: /sys/class/net/eth0/carrier remains 0 even after waiting for an extended period.

skynet:~ $ (sudo ifconfig eth0 down; sudo ifconfig eth0 up); sleep 5; cat /sys/class/net/eth0/carrier
1


To complicate matters further, sometimes the opposite is true:

skynet:~ $ (sudo ifconfig eth0 down; sleep 1; sudo ifconfig eth0 up); sleep 5; cat /sys/class/net/eth0/carrier
1

skynet:~ $ (sudo ifconfig eth0 down; sudo ifconfig eth0 up); sleep 5; cat /sys/class/net/eth0/carrier
0
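Because the failure is intermittent, it can help to repeat the down/up comparison above many times per pause length. A minimal Python sketch, assuming root privileges and the same ifconfig commands used in this report; read_carrier and cycle_and_check are hypothetical helper names, not part of any tool. The kernel refuses to report carrier while the interface is down, so the sketch returns None when the file cannot be read.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Optional, Sequence


def run_cmd(argv: Sequence[str]) -> None:
    # Run a command and raise CalledProcessError if it fails (ifconfig needs root).
    subprocess.run(list(argv), check=True)


def read_carrier(iface: str, sysfs_net: Path = Path("/sys/class/net")) -> Optional[int]:
    """Return 1 or 0 from <sysfs_net>/<iface>/carrier, or None if unreadable.

    Reading this file while the interface is down fails with EINVAL,
    which shows up here as an OSError.
    """
    try:
        return int((sysfs_net / iface / "carrier").read_text().strip())
    except (OSError, ValueError):
        return None


def cycle_and_check(iface: str, pause: float, settle: float = 5.0,
                    runner: Callable[[Sequence[str]], None] = run_cmd,
                    sysfs_net: Path = Path("/sys/class/net")) -> Optional[int]:
    # Mirror the report: down, optional pause, up, wait, then read carrier.
    runner(["ifconfig", iface, "down"])
    if pause > 0:
        time.sleep(pause)
    runner(["ifconfig", iface, "up"])
    time.sleep(settle)
    return read_carrier(iface, sysfs_net)
```

For example, calling cycle_and_check("eth0", 1.0) and cycle_and_check("eth0", 0.0) ten times each gives a rough count of how often each variant ends with carrier 0. The runner and sysfs_net parameters exist only so the logic can be exercised without real hardware.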

Let me know what other information is needed.
Comment 1 Sebastian N 2011-04-23 06:53:00 UTC
I have a similar issue on my Acer TravelMate with a Realtek r8169 (also Arch Linux, with both 2.6.38.3 and 2.6.39-rc4-20110422-00149-g91e8549 from Linus' tree).
On my system this made Ethernet unusable even before I first ran ifconfig down (though maybe something in the startup scripts does that). This is the first time a kernel update has actually broken Ethernet for me!

Enough ranting: now to the details:

1. Booting the system with kernel 2.6.38.3 or 2.6.39-rc4...: NO_CARRIER (no LOWER_UP).
2. Running 'ifconfig down && ifconfig up': as explained above, the link becomes ready.
3. Booting with acpi=off fixes the problem (though I'd rather keep ACPI on).

dmesg output:

r8169 0000:02:00.0: PME# disabled
r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_UP): wlan0: link is not ready
r8169 0000:02:00.0: PME# enabled
r8169 0000:02:00.0: PME# disabled
r8169 0000:02:00.0: PME# enabled
r8169 0000:02:00.0: PME# disabled
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready
r8169 0000:02:00.0: eth0: link up
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Note the PME# messages: disabling ACPI (acpi=off on the kernel command line) fixes the problem. (Arch Linux is the only OS on this computer.)

About this issue on the netdev list (Francois Romieu):
> Well, as luck would have it, my system will boot today's upstream
> kernel (39-rc4+).  And, I no longer see the problem in that release,
> so it seems it is fixed (or harder to reproduce than I thought).

I vote for harder to reproduce. It is not fixed in 39-rc4+.
Comment 2 j.fikar 2011-06-10 11:40:52 UTC
It may be related to runtime power management; see my last comment on bug https://bugzilla.kernel.org/show_bug.cgi?id=30452

The problem also goes away when I run:
echo on > /sys/devices/pci0000:00/0000:00:0a.0/0000:03:00.0/power/control

so there is no need to switch off ACPI entirely.

I also see the problem on 2.6.39, running Gentoo x86_64.
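The power/control workaround above can also be applied without looking up the full PCI path, because /sys/class/net/<iface>/device is a symlink to the network device's PCI directory. A minimal Python sketch, assuming root privileges; set_runtime_pm and power_control_path are hypothetical helper names. It returns the previous value ("auto" allows runtime suspend, "on" keeps the device powered) so it can be restored later. Like the echo command, the change does not survive a reboot.

```python
from pathlib import Path

# Values accepted by the runtime PM "control" attribute.
RUNTIME_PM_MODES = ("on", "auto")


def power_control_path(iface: str, sysfs_net: Path = Path("/sys/class/net")) -> Path:
    # /sys/class/net/<iface>/device points at the underlying PCI device,
    # e.g. /sys/devices/pci0000:00/0000:00:0a.0/0000:03:00.0 in this report.
    return sysfs_net / iface / "device" / "power" / "control"


def set_runtime_pm(iface: str, mode: str = "on",
                   sysfs_net: Path = Path("/sys/class/net")) -> str:
    """Write mode to the device's power/control file and return the old value."""
    if mode not in RUNTIME_PM_MODES:
        raise ValueError(f"mode must be one of {RUNTIME_PM_MODES}, got {mode!r}")
    path = power_control_path(iface, sysfs_net)
    previous = path.read_text().strip()
    # Same effect as: echo on > .../power/control
    path.write_text(mode + "\n")
    return previous
```

For example, previous = set_runtime_pm("eth0", "on") disables runtime suspend for the NIC, and set_runtime_pm("eth0", previous) puts the old setting back.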
Comment 3 Francois Romieu 2011-08-04 08:56:35 UTC
Can you send the complete dmesg output, including the r8169 XID line?

-- 
Ueimor
Comment 4 j.fikar 2011-08-04 09:36:26 UTC
dmesg | grep r8169:

[    1.516081] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    1.516100] r8169 0000:03:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    1.516129] r8169 0000:03:00.0: setting latency timer to 64
[    1.516162] r8169 0000:03:00.0: irq 42 for MSI/MSI-X
[    1.516300] r8169 0000:03:00.0: eth0: RTL8168e/8111e at 0xffffc90000022000, bc:ae:c5:28:61:95, XID 0c200000 IRQ 42
[   16.749777] r8169 0000:03:00.0: eth0: link down
[   16.749785] r8169 0000:03:00.0: eth0: link down
[   19.743193] r8169 0000:03:00.0: eth0: link up
[304028.067257] r8169 0000:03:00.0: eth0: link down
[304031.723550] r8169 0000:03:00.0: eth0: link up
[521754.093821] r8169 0000:03:00.0: eth0: link down
[521757.674233] r8169 0000:03:00.0: eth0: link up
Comment 5 Francois Romieu 2011-08-04 11:17:08 UTC
(In reply to comment #4)
> dmesg | grep r8169:

Please don't. It removes a lot of information, including the kernel version.

> [    1.516081] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [    1.516100] r8169 0000:03:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [    1.516129] r8169 0000:03:00.0: setting latency timer to 64
> [    1.516162] r8169 0000:03:00.0: irq 42 for MSI/MSI-X
> [    1.516300] r8169 0000:03:00.0: eth0: RTL8168e/8111e at 0xffffc90000022000, bc:ae:c5:28:61:95, XID 0c200000 IRQ 42

8168e support is post 2.6.39 (included).

Which firmware version, if any, does it run?

You should see the specific firmware version with ethtool -i once the device is up.

-- 
Ueimor
Comment 6 j.fikar 2011-08-05 12:05:46 UTC
Sorry, I forgot to mention that I have since upgraded to 3.0.0, so maybe PME works now? I can try turning it on next week.

ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version: rtl_nic/rtl8168e-2.fw
bus-info: 0000:03:00.0
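When collecting this information from several machines, the firmware-version field can be extracted from the output above programmatically. A minimal Python sketch, assuming the "key: value" line format shown here; parse_ethtool_info and driver_info are hypothetical helper names, and driver_info assumes ethtool is installed.

```python
import subprocess
from typing import Dict


def parse_ethtool_info(text: str) -> Dict[str, str]:
    """Turn 'key: value' lines from ethtool -i into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as bus-info contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(iface: str) -> Dict[str, str]:
    # Bring the interface up first so the driver has loaded its firmware.
    out = subprocess.run(["ethtool", "-i", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_info(out)
```

For example, driver_info("eth0")["firmware-version"] would yield the rtl_nic/rtl8168e-2.fw string reported here.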
Comment 7 j.fikar 2011-08-05 12:06:59 UTC
Created attachment 67642
dmesg 3.0.0
Comment 8 Matthew Gyurgyik 2011-08-05 12:11:01 UTC
I've been using 3.0.0 since 2011-07-24 without any problems. For me, this issue is resolved.

Thanks!
Comment 9 j.fikar 2011-08-17 11:50:43 UTC
For me 3.0.0 still shows the bug:
https://bugzilla.kernel.org/show_bug.cgi?id=30452
Comment 10 Matthew Gyurgyik 2011-08-21 17:04:28 UTC
I retract my statement. I'm still having problems with 3.0.3. I would really love to get this solved.
Comment 11 Francois Romieu 2012-08-22 20:03:14 UTC
(In reply to comment #9)
> For me 3.0.0 still shows the bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=30452

The up-down cycle in https://bugzilla.kernel.org/show_bug.cgi?id=30452#c14
suggests that you could be helped by 10953db8e1a278742ef7e64a3d1491802bcfa98b
("r8169: increase the delay parameter of pm_schedule_suspend").

It was added between v3.1 and v3.2.

-- 
Ueimor
Comment 12 j.fikar 2012-08-23 08:39:47 UTC
> The up-down cycle in https://bugzilla.kernel.org/show_bug.cgi?id=30452#c14
> suggests that you could be helped by 10953db8e1a278742ef7e64a3d1491802bcfa98b
> ("r8169: increase the delay parameter of pm_schedule_suspend").
> 
> It was added between v3.1 and v3.2.

I can confirm, running 3.5.1 now and the problem is gone.
Comment 13 Francois Romieu 2012-08-24 22:06:16 UTC
(In reply to comment #12)
[...]
> I can confirm, running 3.5.1 now and the problem is gone.

Thanks.

Matthew, any update ?

-- 
Ueimor
Comment 14 Matthew Gyurgyik 2012-08-24 22:08:23 UTC
Sadly, I cannot provide any updates, as I am no longer in possession of the machine this was happening on. However, thank you for looking into this.