Bug 202851

Summary: r8169 reverts to 100Mbps and loses overridden MAC address after suspend
Product: Drivers
Reporter: Alex Xu (Hello71) (alex_y_xu)
Component: Network
Assignee: drivers_network (drivers_network)
Status: NEW
Severity: normal
CC: corngood, hkallweit1
Priority: P1
Hardware: All
OS: Linux
Kernel Version: torvalds master
Tree: Mainline
Regression: Yes
Attachments: diff of ethtool before and after suspend
             lspci before suspend

Description Alex Xu (Hello71) 2019-03-09 16:15:39 UTC
steps to reproduce:

1. suspend the machine to RAM.
2. use the network.

expected results:

it works.

actual results:

no packets are received. don't know if packets are correctly sent.

additional information:

5.0 works fine.

running "ip link set eth0 promisc on" resolves the issue, as does wireshark, but only if promiscuous mode is enabled.

dmesg has nothing interesting except for the (gigabit) card reverting to 100Mbps/Full. this is resolved by taking the link down and back up again.

I don't know if removing and re-inserting the module works, as I have it built-in.

other information forthcoming.
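
To make the before/after comparison repeatable, the link speed and MAC address can be snapshotted from sysfs. This is only a sketch: the function names are made up for illustration, and it assumes the standard /sys/class/net/<iface>/address and /sys/class/net/<iface>/speed attributes and the interface name eth0 used above.

```python
from pathlib import Path


def read_link_state(iface="eth0", sysfs_root="/sys/class/net"):
    """Return the MAC address and link speed (Mb/s) exposed in sysfs."""
    base = Path(sysfs_root) / iface
    address = (base / "address").read_text().strip()
    try:
        speed = int((base / "speed").read_text().strip())
    except (OSError, ValueError):
        # Reading "speed" can fail or be unparsable while the link is down.
        speed = None
    return {"address": address, "speed": speed}


def changed_fields(before, after):
    """Return {field: (old, new)} for every field whose value differs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after.get(k)}
```

Calling read_link_state() once before suspend and once after resume, then passing both results to changed_fields(), would list exactly which of the two values moved.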
Comment 1 Alex Xu (Hello71) 2019-03-09 16:45:53 UTC
oh, I see. the problem is related to the MAC address being incorrectly set to 8e:00:00:00:8e:8e on boot, which I worked around by having systemd-networkd override it with the real MAC address. if I remove that configuration, reboot, suspend, and resume again, it works again, albeit reportedly at reduced speed (100Mbps; I haven't tested it, but it's definitely not 10).
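
As a side note, the format of 8e:00:00:00:8e:8e can be checked mechanically. The sketch below (the helper name is hypothetical) only inspects the two flag bits in the first octet of a MAC address; it says nothing about why the driver ends up with this address.

```python
def mac_flags(mac):
    """Decode the two flag bits in the first octet of a MAC address."""
    first_octet = int(mac.split(":")[0], 16)
    return {
        # Bit 0x02: address is locally administered, not vendor-assigned.
        "locally_administered": bool(first_octet & 0x02),
        # Bit 0x01: address is a group (multicast) address.
        "multicast": bool(first_octet & 0x01),
    }
```

For 8e:00:00:00:8e:8e the locally-administered bit is set and the multicast bit is clear, which is consistent with it not being the card's real burned-in address.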
Comment 2 Alex Xu (Hello71) 2019-03-09 16:48:56 UTC
Created attachment 281669 [details]
diff of ethtool before and after suspend

This is from after I removed the MAC address override. With the override in place, the "before" address showed the correct value, whereas the "after" address showed 8e:00:00:00:8e:8e. Presumably, the upper and lower layers of the networking stack disagreed about what the actual address was, so they just gave up.
Comment 3 Alex Xu (Hello71) 2019-03-09 16:50:28 UTC
Created attachment 281671 [details]
lspci before suspend

After suspend, it shows "DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-" as opposed to "CorrErr+".
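
For comparing lspci captures without eyeballing them, the +/- flags on a status line can be parsed into a dictionary. This is an illustrative sketch with made-up helper names; it works on the text of a single line and does not call lspci itself.

```python
import re


def parse_flags(line):
    """Map each 'Name+' or 'Name-' token on an lspci status line to a bool."""
    return {name: sign == "+" for name, sign in re.findall(r"(\w+)([+-])", line)}


def flag_changes(before_line, after_line):
    """Return {flag: (before, after)} for flags that differ between two lines."""
    before, after = parse_flags(before_line), parse_flags(after_line)
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }
```

Feeding it the DevSta line from the pre-suspend attachment and the one quoted above should report CorrErr as the only flag that changed, assuming the remaining flags are identical in both captures.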
Comment 4 Heiner Kallweit 2019-03-10 14:39:03 UTC
Could you please try reverting 58ba566ccbae ("r8169: reset chip synchronously in __rtl8169_resume") and check whether this fixes the issue?
Comment 5 Alex Xu (Hello71) 2019-03-10 17:19:11 UTC
doesn't seem to make any difference.

to be clear, I've had the problem of an invalid MAC address at boot for some time, I think since 4.20 or even before. what's new in master is that the speed drops after suspend, and the overridden address is silently reset to the (broken) MAC.
Comment 6 David McFarland 2019-04-14 23:46:00 UTC
This happens to me quite often after hibernation, but it's not 100%.  If I can find a reliable way to reproduce it, I'll bisect.
Comment 7 David McFarland 2019-05-01 10:53:42 UTC
My repro steps were a bit more complicated than I expected:

- boot linux (ethernet connects at 1G)
- hibernate linux
- boot/wake windows
- hibernate windows
- wake linux (ethernet connects at 10/100M)

I bisected it, and I believe fa6821c is the first commit with this behaviour.
Comment 8 Heiner Kallweit 2019-05-01 13:12:20 UTC
Indeed, looks tricky. Before this commit I think your system considered WoL to be active; afterwards that's no longer the case. The difference is:
Suspend/Hibernate with WoL: PHY speeds down to save power
Suspend/Hibernate w/o WoL: PHY is completely powered down
Resume with WoL: Original (1G) speed advertisement is set and an autoneg is started
Resume w/o WoL: Advertisement isn't touched, PHY is powered up

The Windows Realtek driver may behave similarly to Linux and speed down the PHY when hibernating (behaving like the Linux driver before the commit). When Linux then resumes, the speed isn't touched (see "Resume w/o WoL").
To verify this assumption you could change the WoL speed-down setting of the Windows network driver.
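
The four cases above can be walked through with a toy model. This is not the r8169 code: the Phy class, the function names, and the 100 Mb/s speed-down target are assumptions chosen only to mirror the description, and it assumes the PHY keeps its advertisement across the Linux/Windows hand-over.

```python
from dataclasses import dataclass

FULL_SPEED = 1000  # Mb/s, the speed the link comes up at after a normal boot
LOW_SPEED = 100    # Mb/s, assumed speed-down target


@dataclass
class Phy:
    advertised: int = FULL_SPEED
    powered: bool = True


def suspend(phy, wol):
    if wol:
        phy.advertised = LOW_SPEED  # with WoL: speed down to save power
    else:
        phy.powered = False         # w/o WoL: PHY is completely powered down


def resume(phy, wol):
    phy.powered = True
    if wol:
        phy.advertised = FULL_SPEED  # with WoL: restore advertisement, autoneg
    # w/o WoL: the advertisement is left untouched


def dual_boot_cycle(linux_wol):
    """Linux hibernates, Windows speeds the PHY down, Linux resumes."""
    phy = Phy()
    suspend(phy, wol=linux_wol)  # Linux hibernates
    phy.powered = True           # Windows boots and brings the PHY up
    suspend(phy, wol=True)       # Windows hibernates with speed-down
    resume(phy, wol=linux_wol)   # Linux resumes
    return phy.advertised
```

In this model dual_boot_cycle(linux_wol=True) ends at 1000 and dual_boot_cycle(linux_wol=False) ends at 100, which matches the behaviour reported before and after the bisected commit.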
Comment 9 David McFarland 2019-05-01 21:47:25 UTC
Good call.  Disabling speed-down in windows stopped it from happening.

So should the kernel be making assumptions about the state after resume from hibernation?  I typically dual-boot by hibernating one system at a time, but it's probably a pretty uncommon thing to do.
Comment 10 Heiner Kallweit 2019-05-02 18:00:49 UTC
Right, I suppose quite few people suffer from this issue. However, the following should fix it. And as you can see in Andrew's comment, it's useful for other use cases too: https://patchwork.ozlabs.org/patch/1093818/
Could you test this patch? It's based on the latest linux-next, so it may not apply cleanly on older kernel versions.
Comment 11 David McFarland 2019-05-05 19:11:15 UTC
I just tested it on 5.1.0-rc5 and it does appear to fix the problem.