Bug 208033 - r8169 wake-on-lan (WOL) works only after a manual suspend/resume cycle
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Stephen Hemminger
Reported: 2020-06-02 19:12 UTC by jackdroido
Modified: 2020-06-05 19:35 UTC

Kernel Version: 5.6.15
Regression: No

Attachments
before setting WoL (1.18 KB, text/plain)
2020-06-03 12:05 UTC, jackdroido
after setting WoL and before first suspend (1.18 KB, text/plain)
2020-06-03 12:06 UTC, jackdroido
after resume from first suspend (1.18 KB, text/plain)
2020-06-03 12:06 UTC, jackdroido

Description jackdroido 2020-06-02 19:12:51 UTC
Hi !

I have a weird problem with wake-on-lan on my Realtek card.

After turning the wake-on-lan flag on (using either ethtool or NetworkManager), I first have to suspend the PC and wake it up once manually (by hitting any key or the power button) before it responds to WoL packets.

After this maneuver, I can wake it up via magic packet from all subsequent suspend transitions AND also from the first shutdown. Then I have to repeat the manual suspend/resume cycle to make it work again.

A similar problem was mentioned at https://forums.centos.org/viewtopic.php?t=71861, but the proposed solution (forcing autonegotiation on) didn't work for me.

The PC is a Gigabyte NUC-like barebone: https://www.gigabyte.com/Mini-PcBarebone/GB-BLCE-4105R-rev-10. The BIOS is up-to-date and doesn't show any "obvious" WoL settings.

Here is some information (just ask if more is needed):

# dmesg | grep XID
[    5.738979] r8169 0000:02:00.0 eth0: RTL8168h/8111h, b4:2e:99:78:73:cf, XID 541, IRQ 126

# lspci -k
(...)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
	Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
	Kernel driver in use: r8169
	Kernel modules: r8169

# lspci -vvv
(...)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
	Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 20
	Region 0: I/O ports at e000 [size=256]
	Region 2: Memory at a1104000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at a1100000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
			 10BitTagComp-, 10BitTagReq-, OBFF Via message/WAKE#, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, TPHComp-, ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
	Capabilities: [170 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [178 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: r8169
	Kernel modules: r8169

# ethtool enp2s0
Settings for enp2s0:
	Supported ports: [ TP	 MII ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  Not reported
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: No
	Link partner advertised FEC modes: No
	Speed: 100Mb/s
	Duplex: Full
	Auto-negotiation: on
	Port: MII
	PHYAD: 0
	Transceiver: external
	Supports Wake-on: pumbg
	Wake-on: g
        Current message level: 0x00000033 (51)
                               drv probe ifdown ifup
	Link detected: yes

# cat /proc/acpi/wakeup 
Device	S-state	  Status   Sysfs node
SIO1	  S3	*disabled  pnp:00:00
HDAS	  S3	*disabled  pci:0000:00:0e.0
PRT0	  S4	*disabled  no-bus:dev1.0
PRT1	  S4	*disabled  no-bus:dev2.0
XHC	  S4	*enabled   pci:0000:00:15.0
XDCI	  S4	*disabled
RP01	  S4	*disabled
PXSX	  S4	*disabled
RP02	  S4	*disabled
PXSX	  S4	*disabled
RP03	  S4	*enabled   pci:0000:00:13.0
PXSX	  S4	*disabled  pci:0000:01:00.0
RP04	  S4	*disabled
PXSX	  S4	*disabled
RP05	  S4	*enabled   pci:0000:00:13.2
PXSX	  S4	*enabled   pci:0000:02:00.0
RP06	  S4	*disabled
PXSX	  S4	*disabled
CNVW	  S4	*disabled
Comment 1 Heiner Kallweit 2020-06-02 22:14:29 UTC
The following looks a little bit suspicious. What is your link partner?
Is it 1Gbps capable? Is it configured for autonegotiation?

Link partner advertised link modes:  Not reported
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: No
	Link partner advertised FEC modes: No
	Speed: 100Mb/s
	Duplex: Full
	Auto-negotiation: on

Did WoL work properly on an earlier kernel version?
Comment 2 Heiner Kallweit 2020-06-03 08:15:45 UTC
The PCI status details show that PME is disabled. This should not be the case after enabling WoL. Do you have an active link whilst enabling WoL? Or do you enable WoL with no link and establish the link later?
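
A minimal sketch of that check, assuming the card sits at 02:00.0 as in the lspci output above. The helper name pme_enable_state is only an illustration; it reads "lspci -vv" text on stdin and prints the PME-Enable flag from the Power Management status line.

```shell
# Print "+" if PME-Enable is set in the lspci -vv text on stdin, "-" if it
# is cleared, or "unknown" if no such flag is found in the input.
pme_enable_state() {
    awk '
        match($0, /PME-Enable[+-]/) {
            # The last character of the match is the flag itself.
            print substr($0, RSTART + RLENGTH - 1, 1)
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}

# Example (run as root so lspci can read the capability list):
#   lspci -vv -s 02:00.0 | pme_enable_state
```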
Comment 3 jackdroido 2020-06-03 08:39:31 UTC
Thanks for replying.

The PC is connected directly to a 10/100 TP-Link router (hence the speed), sadly the only one I have at hand.

First time I tested WoL was around end-2019, so I would say with a 5.4.x kernel. Same result.

WoL is activated at boot via NetworkManager, but I also tried bypassing it and just running:

# ip link set enp2s0 up
# ethtool -s enp2s0 wol g

Same result too.

The link LED on the router is on even during the first suspend (where WoL is not functional).
Comment 4 jackdroido 2020-06-03 08:52:40 UTC
A little context: I ran the commands above just after power-on, before any suspend test.

I can run them again after resuming if it helps...
Comment 5 Heiner Kallweit 2020-06-03 09:47:24 UTC
I'd be interested in the following:

- lspci -vv output for the network card (grepped for PME)
- ethtool -d <if> output

in these three situations (with WoL setting by NetworkManager disabled):
1. before setting WoL
2. after setting WoL and before first suspend
3. after resume from first suspend
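
One possible way to collect these three snapshots consistently is sketched below. The function name snapshot_wol_state and the label names are hypothetical; the PCI address 02:00.0 and the interface enp2s0 are taken from the output earlier in this report.

```shell
# Save the PME lines of "lspci -vv" and the "ethtool -d" register dump to
# <outdir>/<label>.txt. Run as root so lspci can read the capability list.
snapshot_wol_state() {
    label=$1 pci_addr=$2 iface=$3 outdir=${4:-.}
    {
        echo "== lspci -vv -s $pci_addr | grep PME =="
        lspci -vv -s "$pci_addr" | grep PME || echo "(no PME lines found)"
        echo "== ethtool -d $iface =="
        ethtool -d "$iface"
    } > "$outdir/$label.txt"
}

# Intended sequence (WoL setting by NetworkManager disabled):
#   snapshot_wol_state 1-before-wol 02:00.0 enp2s0
#   ethtool -s enp2s0 wol g
#   snapshot_wol_state 2-wol-set 02:00.0 enp2s0
#   ... suspend, wake the machine manually ...
#   snapshot_wol_state 3-after-resume 02:00.0 enp2s0
```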
Comment 6 jackdroido 2020-06-03 12:05:33 UTC
Created attachment 289485
before setting WoL
Comment 7 jackdroido 2020-06-03 12:06:14 UTC
Created attachment 289487
after setting WoL and before first suspend
Comment 8 jackdroido 2020-06-03 12:06:47 UTC
Created attachment 289489
after resume from first suspend
Comment 9 Heiner Kallweit 2020-06-03 20:08:38 UTC
Thanks for testing. All that looks normal. Also I can't reproduce the issue here with RTL8168g.
What you can check in addition: after the first unsuccessful WoL attempt, please run "grep PME /proc/interrupts". Each WoL event should generate a PME interrupt.
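
To compare the counter across suspend cycles, a small helper along these lines may be handy. It is only a sketch: the name pme_total is made up, and it assumes the PME lines carry the "PCIe PME" label and that the per-CPU counters directly follow the IRQ number.

```shell
# Print the total number of PCIe PME interrupts, summed over all CPUs and
# all PME lines. Reads /proc/interrupts unless another file is given.
pme_total() {
    awk '/PCIe PME/ {
             # Field 1 is the IRQ number; the per-CPU counters follow it.
             for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
         }
         END { print sum + 0 }' "${1:-/proc/interrupts}"
}

# Example:
#   before=$(pme_total)
#   ... suspend, send the magic packet, wake the machine ...
#   after=$(pme_total)
#   echo "PME interrupts during this cycle: $((after - before))"
```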
Comment 10 jackdroido 2020-06-03 20:41:49 UTC
Indeed, PME interrupts are there:

$ grep PME /proc/interrupts
 122:          0          0          0          0  IR-PCI-MSI 311296-edge      PCIe PME
 123:          2          0          0          0  IR-PCI-MSI 315392-edge      PCIe PME

(one for the first wakeup via keyboard, one for the second via magic packet)

To rule out problems with the router, I tried connecting the PC directly to my laptop with a crossover connection.
Same result.
Comment 11 Heiner Kallweit 2020-06-03 20:58:01 UTC
If the first (failed) attempt also generated a PME interrupt, then the network chip did all it can do. So maybe the issue is system-specific, in other words it may be a BIOS bug. Another hint that the root cause isn't in the network driver is that you face the same issue with the vendor driver.
Comment 12 jackdroido 2020-06-04 20:37:39 UTC
Thanks for looking at it.

I don't follow you here: "...the first (failed) attempt generated a PME interrupt...", since there is only one PME related to the (successful) WoL (the other comes from the keyboard).

Forgive me in advance, if I misunderstand you.
Comment 13 Heiner Kallweit 2020-06-04 20:45:55 UTC
PMEs are triggered by PCI devices, so a keystroke doesn't generate a PME. My understanding is that during the first suspend you send a WoL packet and, because this doesn't work, press a key afterwards.
I interpret the two PMEs as follows: the first WoL packet generates a PME, but a lower level doesn't translate that PME into a system wakeup.
Comment 14 jackdroido 2020-06-04 21:14:03 UTC
Thanks for clarifying.

One thing I forgot to mention (but I think it is not relevant at this point) is that I also tried playing with the pcie_aspm parameter (with no success), as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1671958 (and related issues).

Got any other advice or things to try (maybe outside the network layer) before I start to dismantle the damn thing? (just kidding :-)
Comment 15 Heiner Kallweit 2020-06-04 21:34:57 UTC
The referenced ASPM issue has been solved for quite some time.
Well, we now know that the first WoL attempt generates a PME.
You could check whether this PME interrupt generates an ACPI interrupt.
To do so, check whether /sys/firmware/acpi/interrupts/gpe_all gets incremented.
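
A minimal sketch of such a check; the helper name gpe_all_count is hypothetical. It takes only the first field of the file, on the assumption that the count comes first on the line.

```shell
# Print the cumulative GPE count. Reads the sysfs file unless another file
# is given; only the first field is used, in case flags follow the number.
gpe_all_count() {
    awk '{ print $1; exit }' "${1:-/sys/firmware/acpi/interrupts/gpe_all}"
}

# Example:
#   before=$(gpe_all_count)
#   ... suspend, send the magic packet, wake the machine ...
#   after=$(gpe_all_count)
#   echo "GPEs during this cycle: $((after - before))"
```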

This info doesn't help me, as I'm just a simple network driver maintainer.
But maybe it helps some other guy looking at the issue who's more familiar with the ACPI subsystem.

Last but not least you could play a little with all wake-related BIOS settings. For example I can remember one quite old Gigabyte BIOS issue that could be resolved by enabling "LAN Boot ROM" in the BIOS.
Comment 16 jackdroido 2020-06-05 19:35:19 UTC
OK, here's the info:

1. first suspend, sent magic packet, then wakeup by keyboard:

+ grep PME /proc/interrupts
 122:          0          0          0          0  IR-PCI-MSI 311296-edge      PCIe PME
 123:          1          0          0          0  IR-PCI-MSI 315392-edge      PCIe PME
+ cat /sys/firmware/acpi/interrupts/gpe_all
       2

2. second suspend, wakeup by magic packet:

+ grep PME /proc/interrupts
 122:          0          0          0          0  IR-PCI-MSI 311296-edge      PCIe PME
 123:          2          0          0          0  IR-PCI-MSI 315392-edge      PCIe PME
+ cat /sys/firmware/acpi/interrupts/gpe_all
       2

3. third suspend, wakeup by keyboard:

+ grep PME /proc/interrupts
 122:          0          0          0          0  IR-PCI-MSI 311296-edge      PCIe PME
 123:          2          0          0          0  IR-PCI-MSI 315392-edge      PCIe PME
+ cat /sys/firmware/acpi/interrupts/gpe_all
       4

So it seems the GPE counter is incremented only for keyboard wakeups.

On the BIOS front, I think I tried toggling every possible entry (including Enable EFI LAN driver, CSM Option ROM, S5 USB Wakeup Support...) without success.
There is even a FAQ entry about this on the vendor site: https://www.gigabyte.com/Support/FAQ/2676.
The setting it mentions was already Disabled by default.
So I tried setting it to Enabled (despite the description explicitly saying that doing so will DISABLE WoL functionality)... and guess what? No change at all, same behaviour!

Feel free to reassign the bug or add "some other guy" to the CC list. And again, thank you for your attention.
