Steps to reproduce:
1. Suspend the machine to RAM.
2. Use the network.
No packets are received. I don't know whether packets are sent correctly.
5.0 works fine.
Running "ip link set eth0 promisc on" resolves the issue, as does Wireshark, but only if promiscuous mode is enabled.
dmesg shows nothing interesting except that the (gigabit) card falls back to 100Mbps/Full. Taking the link down and back up again resolves this.
I don't know whether removing and re-inserting the module helps, as I have it built in.
Other information forthcoming.
Oh, I see. The problem is related to another one: the MAC address is incorrectly set to 8e:00:00:00:8e:8e on boot, so I configured systemd-networkd to override it with the real MAC address. If I remove that configuration, reboot, and suspend and resume again, networking works again, albeit reportedly at reduced speed (100Mbps; I haven't tested it, but it's definitely not 10).
Created attachment 281669
diff of ethtool before and after suspend
This is from after I removed the MAC address override. Before that, the "before" output showed the correct address, whereas the "after" output showed 8e:00:00:00:8e:8e. Presumably the upper and lower layers of the networking stack disagreed about the actual address, so they just gave up.
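For a lighter-weight way to capture the same two symptoms (MAC address and link speed) before and after suspend, a minimal Python sketch reading the standard sysfs attributes under /sys/class/net could look like this. The helper names (`snapshot`, `diff`) are hypothetical, and the interface name is an assumption you pass in.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Sysfs attributes covering the symptoms discussed here (MAC address, link speed/duplex)
ATTRS = ("address", "speed", "duplex")


def read_attr(iface: str, attr: str, root: str = "/sys/class/net") -> Optional[str]:
    """Read one sysfs attribute, or return None if it can't be read."""
    try:
        return (Path(root) / iface / attr).read_text().strip()
    except OSError:
        # e.g. "speed" fails with EINVAL while the link is down
        return None


def snapshot(iface: str, root: str = "/sys/class/net") -> Dict[str, Optional[str]]:
    """Collect MAC address, speed and duplex for one interface."""
    return {attr: read_attr(iface, attr, root) for attr in ATTRS}


def diff(
    before: Dict[str, Optional[str]], after: Dict[str, Optional[str]]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the attributes whose values changed, as (before, after) pairs."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)
    }
```

Call `snapshot("eth0")` before suspending and again after resuming, then pass both results to `diff()`; with the MAC override removed, you would expect the speed to show up as changed.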
Created attachment 281671
lspci before suspend
After suspend, DevSta shows "CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-", whereas before suspend it showed "CorrErr+".
Could you please try reverting 58ba566ccbae ("r8169: reset chip synchronously in __rtl8169_resume") and check whether this fixes the issue?
Doesn't seem to make any difference.
To be clear, I've had the invalid-MAC-address-at-boot problem for some time, since 4.20 or even earlier, I think. The new problem in master is that the speed drops after suspend and the MAC is silently reset to the (broken) one.
This happens to me quite often after hibernation, but it's not 100%. If I can find a reliable way to reproduce it, I'll bisect.
My repro steps were a bit more complicated than I expected:
- boot linux (ethernet connects at 1G)
- hibernate linux
- boot/wake windows
- hibernate windows
- wake linux (ethernet connects at 10/100M)
I bisected it, and I believe fa6821c is the first commit with this behaviour.
Indeed, looks tricky. Before this commit, I think your system considers WoL to be active; afterwards it no longer does. The difference is:
- Suspend/hibernate with WoL: the PHY speeds down to save power
- Suspend/hibernate without WoL: the PHY is completely powered down
- Resume with WoL: the original (1G) speed advertisement is restored and autonegotiation is restarted
- Resume without WoL: the advertisement isn't touched; the PHY is just powered up
The Windows Realtek driver may behave similarly and speed down the PHY when hibernating (as the Linux driver did before the commit). When Linux then resumes, the speed isn't touched (see "Resume without WoL").
To verify this assumption, you could change the WoL speed-down setting of the Windows network driver.
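To make the four cases above concrete, here is a toy Python model of the dual-boot sequence. It is not the r8169 or phylib code: the `Phy` class, the advertisement sets and `dual_boot()` are hypothetical, speeds are plain Mbps numbers rather than link modes, and the Windows speed-down is assumed to behave like the Linux WoL path.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical advertisement sets in Mbps; real PHYs advertise link modes, not bare numbers.
FULL = frozenset({10, 100, 1000})
SPEED_DOWN = frozenset({10, 100})


@dataclass
class Phy:
    advertised: Set[int] = field(default_factory=lambda: set(FULL))
    powered: bool = True


def suspend(phy: Phy, wol: bool) -> None:
    if wol:
        # With WoL: keep the PHY up, but speed it down to save power
        phy.advertised = set(SPEED_DOWN)
    else:
        # Without WoL: power the PHY down; the advertisement is left as-is
        phy.powered = False


def resume(phy: Phy, wol: bool) -> None:
    if wol:
        # With WoL: restore the original advertisement (autoneg restarts)
        phy.advertised = set(FULL)
    # Without WoL: only power the PHY up; the advertisement isn't touched
    phy.powered = True


def negotiated_speed(phy: Phy) -> int:
    # Assumes a link partner that supports every advertised speed
    return max(phy.advertised) if phy.powered else 0


def dual_boot(linux_wol: bool, windows_speed_down: bool = True) -> int:
    phy = Phy()
    suspend(phy, wol=linux_wol)            # Linux hibernates
    phy.powered = True                     # Windows boots/wakes ...
    phy.advertised = set(FULL)             # ... and links at full speed
    suspend(phy, wol=windows_speed_down)   # Windows hibernates (speed-down modeled as WoL path)
    resume(phy, wol=linux_wol)             # Linux wakes
    return negotiated_speed(phy)
```

In this model, `dual_boot(linux_wol=False)` ends at 100 while `dual_boot(linux_wol=True)` ends at 1000, and turning off the Windows speed-down restores 1000 even without WoL, which is consistent with the assumption being tested here.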
Good call. Disabling speed-down in windows stopped it from happening.
So should the kernel be making assumptions about the state after resume from hibernation? I typically dual-boot by hibernating one system at a time, but it's probably a pretty uncommon thing to do.
Right, I suppose relatively few people suffer from this issue. However, the following patch should fix it, and as you can see from Andrew's comment, it's useful for other use cases too: https://patchwork.ozlabs.org/patch/1093818/
Could you test this patch? It's based on the latest linux-next; on older kernel versions it may not apply cleanly.
I just tested it on 5.1.0-rc5 and it does appear to fix the problem.