Bug 33782 - [r8169] No link/carrier detected, sometimes
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: Francois Romieu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-21 03:45 UTC by Matthew Gyurgyik
Modified: 2013-12-23 12:08 UTC
CC List: 6 users

See Also:
Kernel Version: 3.0.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg 3.0.0 (57.01 KB, application/octet-stream)
2011-08-05 12:06 UTC, j.fikar

Description Matthew Gyurgyik 2011-04-21 03:45:16 UTC
Hardware: Thinkpad Edge E420s
Distribution: Arch Linux

If ifconfig eth0 down and ifconfig eth0 up are not run within milliseconds of each other, no link/carrier is detected.

skynet:~ $ (sudo ifconfig eth0 down; sleep 1; sudo ifconfig eth0 up); sleep 5; cat /sys/class/net/eth0/carrier
0

With a 1-second pause between the two commands, the link/carrier is not detected. /sys/class/net/eth0/carrier remains 0 even after waiting for extended periods.

skynet:~ $ (sudo ifconfig eth0 down; sudo ifconfig eth0 up); sleep 5; cat /sys/class/net/eth0/carrier
1


To further complicate issues, sometimes the opposite is true:

skynet:~ $ (sudo ifconfig eth0 down; sleep 1; sudo ifconfig eth0 up); sleep 5; cat /sys/class/net/eth0/carrier
1

skynet:~ $ (sudo ifconfig eth0 down; sudo ifconfig eth0 up); sleep 5; cat /sys/class/net/eth0/carrier
0
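
The four runs above could be repeated with a small helper along these lines. This is only a sketch: the function names carrier_state and toggle_and_check are made up for illustration, and it assumes ifconfig and sudo are available as in the commands above.

```shell
#!/bin/sh
# carrier_state FILE
# Print "up", "down" or "unknown" for the given sysfs carrier file.
# Reading the file can fail while the interface is administratively
# down, which is reported as "unknown".
carrier_state() {
    if ! value=$(cat "$1" 2>/dev/null); then
        echo "unknown"
        return
    fi
    case "$value" in
        1) echo "up" ;;
        0) echo "down" ;;
        *) echo "unknown" ;;
    esac
}

# toggle_and_check IFACE DELAY
# Take the interface down, wait DELAY seconds, bring it up again,
# wait 5 seconds for the link to settle, then report the carrier state.
toggle_and_check() {
    iface=$1
    delay=$2
    sudo ifconfig "$iface" down
    sleep "$delay"
    sudo ifconfig "$iface" up
    sleep 5
    carrier_state "/sys/class/net/$iface/carrier"
}

# Example (not run automatically):
#   toggle_and_check eth0 1
#   toggle_and_check eth0 0
```

Running both variants several times in a row should show how often each one ends without a carrier.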

Let me know what other information is needed.
Comment 1 Sebastian N 2011-04-23 06:53:00 UTC
I have a similar issue on my Acer TravelMate with a Realtek r8169 (also Arch Linux, with both 2.6.38.3 and 2.6.39-rc4-20110422-00149-g91e8549 from Linus' tree).
On my system this made Ethernet unusable even before I first ran ifconfig down (well, maybe something in the startup scripts runs it). This is the first time a kernel update actually broke Ethernet!

Enough ranting: now to the details:

1. Booting the system with kernel 2.6.38.3 or 2.6.39-rc4...: NO_CARRIER (no LOWER_UP).
2. Running 'ifconfig down && ifconfig up': as explained above, the link becomes ready.
3. Booting with acpi=off fixes the problem (though I'd rather keep ACPI on).

dmesg output:

r8169 0000:02:00.0: PME# disabled
r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_UP): wlan0: link is not ready
r8169 0000:02:00.0: PME# enabled
r8169 0000:02:00.0: PME# disabled
r8169 0000:02:00.0: PME# enabled
r8169 0000:02:00.0: PME# disabled
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready
r8169 0000:02:00.0: eth0: link up
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Notice the PME# messages: disabling acpi (acpi=off in boot parameters) fixes the problem. (Archlinux is the only system running on this computer).

About this issue on the netdev list: (Francois Romieu):
> Well, as luck would have it, my system will boot today's upstream
> kernel (39-rc4+).  And, I no longer see the problem in that release,
> so it seems it is fixed (or harder to reproduce than I thought).

I vote for harder to reproduce. It is not fixed in 39-rc4+.
Comment 2 j.fikar 2011-06-10 11:40:52 UTC
It may be related to runtime power management; see my last comment on bug https://bugzilla.kernel.org/show_bug.cgi?id=30452

The problem also goes away when I do:
echo on > /sys/devices/pci0000:00/0000:00:0a.0/0000:03:00.0/power/control

So there is no need to switch off ACPI entirely.

I also see the problem on 2.6.39, running Gentoo x86_64.
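
The power/control workaround above can be wrapped in a small helper that works for any device directory. This is a rough sketch only: runtime_pm_control is a made-up name, and the example path /sys/class/net/eth0/device assumes the interface is called eth0.

```shell
#!/bin/sh
# runtime_pm_control DEVDIR [VALUE]
# Print the runtime PM setting ("auto" or "on") found in
# DEVDIR/power/control. If VALUE is given, write it first
# (writing to sysfs normally requires root).
runtime_pm_control() {
    ctl="$1/power/control"
    if [ ! -e "$ctl" ]; then
        echo "no power/control under $1" >&2
        return 1
    fi
    if [ -n "${2:-}" ]; then
        echo "$2" > "$ctl" || return 1
    fi
    cat "$ctl"
}

# Example (as root, not run automatically):
#   runtime_pm_control /sys/class/net/eth0/device      # show current setting
#   runtime_pm_control /sys/class/net/eth0/device on   # disable runtime suspend
```

The setting does not survive a reboot, so it would have to be reapplied from a startup script or similar.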
Comment 3 Francois Romieu 2011-08-04 08:56:35 UTC
Can you send the complete dmesg, including the r8169 XID line?

-- 
Ueimor
Comment 4 j.fikar 2011-08-04 09:36:26 UTC
dmesg | grep r8169:

[    1.516081] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    1.516100] r8169 0000:03:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    1.516129] r8169 0000:03:00.0: setting latency timer to 64
[    1.516162] r8169 0000:03:00.0: irq 42 for MSI/MSI-X
[    1.516300] r8169 0000:03:00.0: eth0: RTL8168e/8111e at 0xffffc90000022000, bc:ae:c5:28:61:95, XID 0c200000 IRQ 42
[   16.749777] r8169 0000:03:00.0: eth0: link down
[   16.749785] r8169 0000:03:00.0: eth0: link down
[   19.743193] r8169 0000:03:00.0: eth0: link up
[304028.067257] r8169 0000:03:00.0: eth0: link down
[304031.723550] r8169 0000:03:00.0: eth0: link up
[521754.093821] r8169 0000:03:00.0: eth0: link down
[521757.674233] r8169 0000:03:00.0: eth0: link up
Comment 5 Francois Romieu 2011-08-04 11:17:08 UTC
(In reply to comment #4)
> dmesg | grep r8169:

Please don't. It removes a lot of information, including the kernel version.

> [    1.516081] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [    1.516100] r8169 0000:03:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [    1.516129] r8169 0000:03:00.0: setting latency timer to 64
> [    1.516162] r8169 0000:03:00.0: irq 42 for MSI/MSI-X
> [    1.516300] r8169 0000:03:00.0: eth0: RTL8168e/8111e at 0xffffc90000022000, bc:ae:c5:28:61:95, XID 0c200000 IRQ 42

8168e support is available from 2.6.39 onward (2.6.39 included).

Which firmware version, if any, does it run?

You should see the specific fw version with ethtool -i once the device is up.

-- 
Ueimor
Comment 6 j.fikar 2011-08-05 12:05:46 UTC
I'm sorry, I forgot to mention that I have meanwhile upgraded to 3.0.0, so maybe PME works now? I can try to turn it on next week.

ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version: rtl_nic/rtl8168e-2.fw
bus-info: 0000:03:00.0
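
For scripted reports, the firmware line can be pulled out of output like the above with a short filter. This is just a sketch; firmware_version is a made-up helper name, and it assumes the "firmware-version:" label is printed as shown here.

```shell
#!/bin/sh
# firmware_version
# Read `ethtool -i` output on stdin and print only the value of the
# "firmware-version:" field (an empty line if the field has no value).
firmware_version() {
    sed -n 's/^firmware-version:[[:space:]]*//p'
}

# Example (not run automatically):
#   ethtool -i eth0 | firmware_version
```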
Comment 7 j.fikar 2011-08-05 12:06:59 UTC
Created attachment 67642
dmesg 3.0.0
Comment 8 Matthew Gyurgyik 2011-08-05 12:11:01 UTC
Been using 3.0.0 since 2011-07-24 without any problems. For me, this issue is resolved.

Thanks!
Comment 9 j.fikar 2011-08-17 11:50:43 UTC
For me 3.0.0 still shows the bug:
https://bugzilla.kernel.org/show_bug.cgi?id=30452
Comment 10 Matthew Gyurgyik 2011-08-21 17:04:28 UTC
I retract my statement. I'm still having problems using 3.0.3. Would really love to get this solved.
Comment 11 Francois Romieu 2012-08-22 20:03:14 UTC
(In reply to comment #9)
> For me 3.0.0 still shows the bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=30452

The up-down cycle in https://bugzilla.kernel.org/show_bug.cgi?id=30452#c14
suggests that you could be helped by 10953db8e1a278742ef7e64a3d1491802bcfa98b
("r8169: increase the delay parameter of pm_schedule_suspend").

It was added between v3.1 and v3.2.

-- 
Ueimor
Comment 12 j.fikar 2012-08-23 08:39:47 UTC
> The up-down cycle in https://bugzilla.kernel.org/show_bug.cgi?id=30452#c14
> suggests that you could be helped by 10953db8e1a278742ef7e64a3d1491802bcfa98b
> ("r8169: increase the delay parameter of pm_schedule_suspend").
> 
> It was added between v3.1 and v3.2.

I can confirm, running 3.5.1 now and the problem is gone.
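
A rough way to check whether a kernel is new enough to carry the commit mentioned above (added between v3.1 and v3.2) is to compare its version against 3.2. The sketch below is only an approximation: kernel_at_least is a made-up helper, it looks at the major and minor numbers only, and it knows nothing about distribution backports or about where in the 3.2 release-candidate series the commit landed.

```shell
#!/bin/sh
# kernel_at_least VERSION MAJOR MINOR
# Succeed if VERSION (e.g. "3.5.1" or "2.6.39-rc4") is at least
# MAJOR.MINOR. Anything after the first "-" is ignored.
kernel_at_least() {
    ver=${1%%-*}        # strip suffixes such as "-rc4" or "-ARCH"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Example (not run automatically):
#   if kernel_at_least "$(uname -r)" 3 2; then
#       echo "kernel should include the pm_schedule_suspend delay change"
#   fi
```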
Comment 13 Francois Romieu 2012-08-24 22:06:16 UTC
(In reply to comment #12)
[...]
> I can confirm, running 3.5.1 now and the problem is gone.

Thanks.

Matthew, any update?

-- 
Ueimor
Comment 14 Matthew Gyurgyik 2012-08-24 22:08:23 UTC
Sadly, I cannot provide any updates, as I am no longer in possession of the machine this was happening on. However, thank you for looking into this.
