Bug 45911

Summary: Realtek RTL8111/8168B ethernet device unusable after suspend cycle.
Product: Drivers
Reporter: Coacher (itumaykin+kernel)
Component: Network
Assignee: Francois Romieu (romieu)
Status: ASSIGNED
Severity: normal
CC: romieu, szg00000, vkrevs
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.x
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  lspci -vvv
  dmesg with the described problem
  lspci -vvv from Sony Vaio SVE1711X1EB laptop
  dmesg from Sony Vaio SVE1711X1EB laptop

Description Coacher 2012-08-13 04:14:33 UTC
Created attachment 77471 [details]
lspci -vvv

Hello.

I have an ASUS F8Va laptop with a built-in Realtek RTL8111/8168B Ethernet card, which uses the r8169 kernel module. The card works fine until I put the laptop through a suspend-resume (or hibernate-power on) cycle.

After such a cycle I am unable to do anything with the card. For example, `ifconfig eth0 up` fails with the message "SIOCSIFFLAGS: Cannot assign requested address", and wicd loops endlessly trying to bring up eth0. Reloading r8169 doesn't help, and unloading it before suspend and loading it again after resume doesn't help either. These messages are printed in dmesg:

[ 6282.222156] r8169 0000:06:00.0: Refused to change power state, currently in D3
[ 6282.233067] r8169 0000:06:00.0: Refused to change power state, currently in D3
[ 6285.570778] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 6285.581117] r8169 0000:06:00.0: Refused to change power state, currently in D3
[ 6285.581143] r8169 0000:06:00.0: cache line size of 32 is not supported
[ 6285.581150] r8169 0000:06:00.0: (unregistered net_device): Mem-Wr-Inval unavailable
[ 6285.581231] r8169 0000:06:00.0: (unregistered net_device): unknown MAC, using family default
[ 6285.592423] r8169 0000:06:00.0: eth0: RTL8168b/8111b at 0xffffc900106a2000, ff:ff:ff:ff:ff:ff, XID 9cf0f8ff IRQ 17
[ 6285.592431] r8169 0000:06:00.0: eth0: jumbo features [frames: 4080 bytes, tx checksumming: ko]

I am running Gentoo amd64 with vanilla kernel 3.5.1, but I have had this problem for quite a long time. I don't remember exactly when it started, but I am fairly sure it was already present in the 3.2 kernel.
Comment 1 Coacher 2012-08-13 04:19:22 UTC
Created attachment 77481 [details]
dmesg with the described problem

I noticed that after suspend-resume I have only this line in dmesg:
r8169 0000:06:00.0: Refused to change power state, currently in D3

After a hibernate-thaw cycle, or after unloading and reloading r8169 following a resume, I get the full output quoted in the description.
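
For reference, the "currently in D3" part of that message corresponds to the two power-state bits of the PMCSR register in the device's PCI Power Management capability. Below is a minimal Python sketch that decodes those bits from a raw config-space dump. It assumes the dump comes from /sys/bus/pci/devices/<address>/config (reading past the first 64 bytes normally needs root), and the function names are hypothetical, not part of any existing tool.

```python
import struct

PCI_STATUS = 0x06            # status register offset
PCI_STATUS_CAP_LIST = 0x10   # "capability list present" bit
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer
PCI_CAP_ID_PM = 0x01         # Power Management capability ID
PCI_PM_CTRL = 4              # PMCSR offset inside the PM capability
PCI_PM_CTRL_STATE_MASK = 0x0003


def pci_power_state(config: bytes) -> str:
    """Decode the power state from raw PCI config space bytes."""
    if len(config) < 64:
        return "unknown"
    # An all-ones vendor/device ID means the device did not answer at all.
    if config[:4] == b"\xff\xff\xff\xff":
        return "no-response"
    status = struct.unpack_from("<H", config, PCI_STATUS)[0]
    if not status & PCI_STATUS_CAP_LIST:
        return "unknown"
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    # Walk the capability list until the PM capability is found.
    while ptr >= 0x40 and ptr + PCI_PM_CTRL + 2 <= len(config) and ptr not in seen:
        seen.add(ptr)
        if config[ptr] == PCI_CAP_ID_PM:
            pmcsr = struct.unpack_from("<H", config, ptr + PCI_PM_CTRL)[0]
            return ("D0", "D1", "D2", "D3hot")[pmcsr & PCI_PM_CTRL_STATE_MASK]
        ptr = config[ptr + 1] & 0xFC
    return "unknown"


def read_power_state(address: str = "0000:06:00.0") -> str:
    """Hypothetical helper: read the dump from sysfs and decode it."""
    with open(f"/sys/bus/pci/devices/{address}/config", "rb") as f:
        return pci_power_state(f.read())
```

Note that a device which answers every read with 0xff would also decode as D3 if only the PMCSR bits were examined, which is why the sketch checks the vendor/device ID first and reports that case separately.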
Comment 2 Coacher 2012-08-15 02:34:03 UTC
A clarification: with suspend-resume alone everything is fine. The problem occurs only after a hibernate-thaw cycle and behaves as described above.
Just tested on 3.4.8: the problem is present there as well.
Comment 3 Coacher 2012-08-15 06:10:14 UTC
Also 100% reproducible on 3.2.27 and 3.0.40
Comment 4 vkrevs 2012-08-31 11:29:37 UTC
Similar issue on my new Sony VAIO SVE1711X1EB laptop on openSUSE 12.1 for x86_64. 

$ rpm -q kernel-desktop
kernel-desktop-3.1.10-1.16.1.x86_64

This laptop comes with a Realtek 8111/8168B Gigabit Ethernet adapter (from lspci -v):

0e:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 07)
        Subsystem: Sony Corporation Device 90ac
        Flags: bus master, fast devsel, latency 0, IRQ 43
        I/O ports at 2000 [size=256]
        Memory at c5004000 (64-bit, prefetchable) [size=4K]
        Memory at c5000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 07-00-00-00-68-4c-e0-00
        Kernel driver in use: r8169

After hibernation, the r8169 driver loads, but NetworkManager fails to get an IP
address and /var/log/messages contains:

Aug 31 08:29:43 starfire rcnetwork[13640]: Starting the NetworkManager
Aug 31 08:29:43 starfire kernel: [ 1233.787223] r8169 0000:0e:00.0: eth0: link down
Aug 31 08:29:43 starfire kernel: [ 1233.788409] ADDRCONF(NETDEV_UP): eth0: link is not ready

I've used "solution" (1) (power off, unplug the power cable for 10 sec, reconnect and restart) from http://en.opensuse.org/SDB:Realtek_8169_driver_problem to deal with it, but I would very much prefer the native kernel driver to work properly.

BTW, everything else on the laptop works just fine - I suppose, thanks for that!
Comment 5 vkrevs 2012-08-31 11:32:06 UTC
Created attachment 78901 [details]
lspci -vvv from Sony Vaio SVE1711X1EB laptop
Comment 6 vkrevs 2012-08-31 11:32:41 UTC
Created attachment 78911 [details]
dmesg from Sony Vaio SVE1711X1EB laptop
Comment 7 Coacher 2012-08-31 15:15:15 UTC
(In reply to comment #4)
> I've used "solution" (1) (power off/unplug power cable for 10 sec, reconnect
> and restart)  from http://en.opensuse.org/SDB:Realtek_8169_driver_problem to
> deal with it, but  would very much prefer the native kernel driver to work
> properly.

Yes, after a restart everything is back to normal, but that completely defeats the purpose of hibernation.
Comment 8 vkrevs 2012-08-31 15:25:46 UTC
I noticed that there is no need for a full power-off, unplug, and restart cycle. On my laptop it is sufficient to disconnect the power cable for ~10 sec after the laptop hibernates and then plug it back in; when the laptop resumes from hibernation, the network adapter is up and running. So it is not much of a burden as long as you don't forget to do it.
Comment 9 Coacher 2012-08-31 15:31:43 UTC
(In reply to comment #8)
> I noticed that there is no need to do a full power off, disconnect the cable,
> restart again cycle. With my laptop it is sufficient to disconnect the power
> cable for ~10 sec after the laptop hibernates, and then plug the power cable
> back, and then, when the laptop resumes from hibernation, the network adapter
> is up and running. So it is not such a burden so long as you don't forget to
> do this.

I will try it, but it won't help when the laptop is not connected to AC power all the time (my case). Sometimes there is simply no power cable to unplug.
Comment 10 Coacher 2012-10-13 07:27:27 UTC
This bug is still present in the 3.6.1 kernel. Is there any debug info I should provide, or any tests I should run? It is really annoying, as it renders the Ethernet card unusable after a hibernation cycle.
Comment 11 Coacher 2013-05-07 14:40:40 UTC
This problem persists in the 3.9.0 kernel. I've discovered that after a hibernate/thaw cycle the MAC address of my Ethernet card is set to "ff:ff:ff:ff:ff:ff". Searching on this new evidence, I found a couple of links describing the same problem:

https://bugzilla.redhat.com/show_bug.cgi?id=503988

https://github.com/aptivate/linux-ischool-classmate/blob/master/r8169-Restore-MAC-address-after-resume-on-buggy-BIOS.patch

The latter provides a patch that could possibly be used to fix this issue.
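
To check for this symptom without digging through dmesg, something like the following Python sketch could be used. It assumes the interface is named eth0 and that the kernel exposes its address in /sys/class/net/<iface>/address; the helper names are made up for illustration.

```python
def is_all_ones_mac(mac: str) -> bool:
    """Return True if the MAC address is ff:ff:ff:ff:ff:ff."""
    octets = mac.strip().lower().split(":")
    return len(octets) == 6 and all(octet == "ff" for octet in octets)


def read_mac(iface: str = "eth0") -> str:
    """Hypothetical helper: read the current MAC address from sysfs."""
    with open(f"/sys/class/net/{iface}/address") as f:
        return f.read().strip()
```

Calling `is_all_ones_mac(read_mac("eth0"))` after a hibernate/thaw cycle would then report whether the card came back with the all-ones address.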
Comment 12 Francois Romieu 2013-05-08 10:13:42 UTC
(In reply to comment #11)
> This problem persists in 3.9.0 kernel. I've discovered that after
> hibernate/thaw cycle MAC address on my ethernet card is set to
> "ff:ff:ff:ff:ff:ff". Upon googling on this new evidence I've found a couple
> of links with the same problems:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=503988
> 
>
> https://github.com/aptivate/linux-ischool-classmate/blob/master/r8169-Restore-MAC-address-after-resume-on-buggy-BIOS.patch
> 
> The latter one provides a patch that possibly can be used to fix this issue.

This change is included in mainline as 9ecb9aabaf634677c77af467f4e3028b09d7bcda.
It targets an RTL8168evl (RTL_GIGA_MAC_VER_34) chipset while yours is an
RTL8168c (RTL_GIGA_MAC_VER_22). We may tweak the test in rtl_rar_set but
the content of the exgmac registers ought to be checked beforehand, as I have
no documentation to compare the layouts of the 8168c and 8168evl. I can't tell
whether your laptop may turn into a brick.

Do you still see these lines with the current kernel:
[...]
[ 6285.581117] r8169 0000:06:00.0: Refused to change power state, currently in D3
[ 6285.581143] r8169 0000:06:00.0: cache line size of 32 is not supported
[ 6285.581150] r8169 0000:06:00.0: (unregistered net_device): Mem-Wr-Inval unavailable
[ 6285.581231] r8169 0000:06:00.0: (unregistered net_device): unknown MAC, using family default
[ 6285.592423] r8169 0000:06:00.0: eth0: RTL8168b/8111b at 0xffffc900106a2000, ff:ff:ff:ff:ff:ff, XID 9cf0f8ff IRQ 17

It does not look like only a matter of the MAC address: the device is still in
D3 and it returns all-ones (0xff) to every register access (see XID).
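
The XID remark can be illustrated with a little arithmetic. The Python sketch below assumes, from memory of the r8169 source of that era, that the driver prints the TxConfig register masked with 0x9cf0f8ff as the XID; the function names are hypothetical. Under that assumption a register read of 0xffffffff yields exactly the XID seen in the log.

```python
# Assumption: mask applied to TxConfig when the driver prints the XID.
XID_MASK = 0x9CF0F8FF


def xid_from_txconfig(txconfig: int) -> int:
    """Compute the printed XID from a raw 32-bit TxConfig read."""
    return txconfig & XID_MASK


def looks_like_dead_read(xid: int) -> bool:
    """True if the XID is what an all-ones register read would produce."""
    return xid == xid_from_txconfig(0xFFFFFFFF)


print(f"{xid_from_txconfig(0xFFFFFFFF):08x}")
```

An XID equal to the mask itself therefore says nothing about the chipset; it only indicates that the read came back as all ones.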

-- 
Ueimor
Comment 13 Francois Romieu 2013-05-08 10:17:18 UTC
(In reply to comment #6)
> Created an attachment (id=78911) [details]
> dmesg from Sony Vaio SVE1711X1EB laptop

[...]
[    8.848249] r8169 0000:0e:00.0: eth0: RTL8168evl/8111evl at 0xffffc900117b6000, 54:53:ed:26:04:0b, XID 0c900800 IRQ 43

It may be fixed by the post-3.8 commit 9ecb9aabaf634677c77af467f4e3028b09d7bcda
("r8169: workaround for missing extended GigaMAC registers").

Can you check it?

Thanks.

-- 
Ueimor
Comment 14 Coacher 2013-05-08 10:42:48 UTC
(In reply to comment #12)
> This change is included in mainline as
> 9ecb9aabaf634677c77af467f4e3028b09d7bcda.
> It targets a RTL8168evl (RTL_GIGA_MAC_VER_34) chipset while your is a
> RTL8168c (RTL_GIGA_MAC_VER_22). We may tweak the test in rtl_rar_set but
> the content of the exgmac registers ought to be checked beforehand as I have
> no documentation to compare the layouts of the 8168c and 8168evl. I can't
> tell
> if your laptop may turn into a brick.

Is there a tool or any other way to give you what's in these registers?

> Do you still see these lines with the current kernel :
> [...]
> [ 6285.581117] r8169 0000:06:00.0: Refused to change power state, currently
> in
> D3
> [ 6285.581143] r8169 0000:06:00.0: cache line size of 32 is not supported
> [ 6285.581150] r8169 0000:06:00.0: (unregistered net_device): Mem-Wr-Inval
> unavailable
> [ 6285.581231] r8169 0000:06:00.0: (unregistered net_device): unknown MAC,
> using family default
> [ 6285.592423] r8169 0000:06:00.0: eth0: RTL8168b/8111b at
> 0xffffc900106a2000,
> ff:ff:ff:ff:ff:ff, XID 9cf0f8ff IRQ 17

As far as I remember, these lines are still there, but I can only check for sure later today when I get back home to that machine.

Thank you for your reply.
Comment 15 Coacher 2013-05-08 19:41:51 UTC
After a suspend/resume cycle everything is fine: the MAC address is restored properly and I am able to connect to the wired network without any problems. After resume, dmesg shows this:

[   76.999899] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[   77.000432] r8169 0000:06:00.0: irq 49 for MSI/MSI-X
[   77.005264] r8169 0000:06:00.0 eth0: RTL8168c/8111c at 0xffffc900117ae000, 00:22:15:43:15:3a, XID 1c4000c0 IRQ 49
[   77.005269] r8169 0000:06:00.0 eth0: jumbo features [frames: 6128 bytes, tx checksumming: ko]

So suspend/resume is fine.

After a hibernate/thaw cycle the problem described above appears, and dmesg shows this:

[  462.456803] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[  462.467117] r8169 0000:06:00.0: Refused to change power state, currently in D3
[  462.467175] r8169 0000:06:00.0: cache line size of 32 is not supported
[  462.467178] r8169 0000:06:00.0 (unregistered net_device): Mem-Wr-Inval unavailable
[  462.467254] r8169 0000:06:00.0 (unregistered net_device): unknown MAC, using family default
[  462.501684] r8169 0000:06:00.0 (unregistered net_device): rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[  462.503835] r8169 0000:06:00.0 eth0: RTL8168b/8111b at 0xffffc90012bb0000, ff:ff:ff:ff:ff:ff, XID 9cf0f8ff IRQ 17
[  462.503839] r8169 0000:06:00.0 eth0: jumbo features [frames: 4080 bytes, tx checksumming: ko]
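
The two probe lines above (after resume and after thaw) can be compared mechanically. The following Python sketch parses the "eth0: <chip> at ..." line and lists the fields that differ. The regular expression is written against the log formats quoted in this report only and the function names are hypothetical; other kernel versions may print the line differently.

```python
import re

# Matches both "r8169 <bdf>: eth0: ..." and "r8169 <bdf> eth0: ..." variants.
PROBE_RE = re.compile(
    r"r8169 (?P<bdf>[0-9a-f:.]+?):? (?P<iface>\w+): (?P<chip>\S+) "
    r"at 0x[0-9a-f]+, (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), "
    r"XID (?P<xid>[0-9a-f]{8}) IRQ (?P<irq>\d+)"
)


def parse_probe(line: str):
    """Return the probe fields as a dict, or None if the line does not match."""
    match = PROBE_RE.search(line)
    return match.groupdict() if match else None


def changed_fields(before: dict, after: dict) -> dict:
    """Map each differing field name to its (before, after) values."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


resume = parse_probe(
    "[   77.005264] r8169 0000:06:00.0 eth0: RTL8168c/8111c at "
    "0xffffc900117ae000, 00:22:15:43:15:3a, XID 1c4000c0 IRQ 49"
)
thaw = parse_probe(
    "[  462.503835] r8169 0000:06:00.0 eth0: RTL8168b/8111b at "
    "0xffffc90012bb0000, ff:ff:ff:ff:ff:ff, XID 9cf0f8ff IRQ 17"
)
for name, (old, new) in changed_fields(resume, thaw).items():
    print(f"{name}: {old} -> {new}")
```

For these two lines the chip name, MAC address, XID and IRQ all differ while the PCI address and interface name stay the same, which fits a device that is misidentified after thaw.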