Hi! I have a weird problem with wake-on-lan on my Realtek card. After turning the wake-on-lan flag on (using either ethtool or NetworkManager), I first have to suspend the PC and wake it up once manually (hitting any key or the power button) before it responds to wol packets. After this maneuver, I can wake it via magic packet from all subsequent suspend transitions AND also from the first shutdown. Then I have to repeat the manual suspend/resume cycle to make it work again.

A similar problem was mentioned at https://forums.centos.org/viewtopic.php?t=71861, but the proposed solution (forcing autonegotiation on) didn't work for me.

The PC is a Gigabyte NUC-like barebone: https://www.gigabyte.com/Mini-PcBarebone/GB-BLCE-4105R-rev-10. The BIOS is up to date and doesn't show any "obvious" wol settings.

Here's some info (just ask me if more is needed):

# dmesg | grep XID
[ 5.738979] r8169 0000:02:00.0 eth0: RTL8168h/8111h, b4:2e:99:78:73:cf, XID 541, IRQ 126

# lspci -k
(...)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
        Kernel driver in use: r8169
        Kernel modules: r8169

# lspci -vvv
(...)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 20
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at a1104000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at a1100000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
                         10BitTagComp-, 10BitTagReq-, OBFF Via message/WAKE#, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [100 v2] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Virtual Channel
                Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb: Fixed- WRR32- WRR64- WRR128-
                Ctrl: ArbSelect=Fixed
                Status: InProgress-
                VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                     Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                     Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                     Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170 v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Capabilities: [178 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: r8169
        Kernel modules: r8169

# ethtool enp2s0
Settings for enp2s0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  Not reported
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: No
        Link partner advertised FEC modes: No
        Speed: 100Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: MII
        PHYAD: 0
        Transceiver: external
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
                               drv probe ifdown ifup
        Link detected: yes

# cat /proc/acpi/wakeup
Device  S-state   Status   Sysfs node
SIO1      S3    *disabled  pnp:00:00
HDAS      S3    *disabled  pci:0000:00:0e.0
PRT0      S4    *disabled  no-bus:dev1.0
PRT1      S4    *disabled  no-bus:dev2.0
XHC       S4    *enabled   pci:0000:00:15.0
XDCI      S4    *disabled
RP01      S4    *disabled
PXSX      S4    *disabled
RP02      S4    *disabled
PXSX      S4    *disabled
RP03      S4    *enabled   pci:0000:00:13.0
PXSX      S4    *disabled  pci:0000:01:00.0
RP04      S4    *disabled
PXSX      S4    *disabled
RP05      S4    *enabled   pci:0000:00:13.2
PXSX      S4    *enabled   pci:0000:02:00.0
RP06      S4    *disabled
PXSX      S4    *disabled
CNVW      S4    *disabled
The following looks a little bit suspicious. What is your link partner? Is it 1Gbps capable? Is it configured for autonegotiation?

        Link partner advertised link modes:  Not reported
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: No
        Link partner advertised FEC modes: No
        Speed: 100Mb/s
        Duplex: Full
        Auto-negotiation: on

Did WoL work properly on an earlier kernel version?
The PCI status details show that PME is disabled. This should not be the case after enabling WoL. Do you have an active link whilst enabling WoL? Or do you enable WoL with no link and establish the link later?
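A minimal sketch for checking the PME-Enable flag mentioned above without reading the whole lspci dump. The function name pme_enable_state is made up for illustration; it only looks for the "PME-Enable+" / "PME-Enable-" token that lspci prints in the Power Management "Status:" line.

```shell
# Hypothetical helper: reads `lspci -vv` output on stdin and prints
# "enabled", "disabled" or "unknown" depending on the PME-Enable flag.
# If several devices are piped in, the last flag seen wins, so it is
# best used with a single device selected via `lspci -s`.
pme_enable_state() {
    awk '
        /PME-Enable\+/ { state = "enabled" }
        /PME-Enable-/  { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Example (run as root so that lspci can read the capability list):
#   lspci -vv -s 02:00.0 | pme_enable_state
```

Running it once before and once after "ethtool -s enp2s0 wol g" should show whether setting WoL changes the flag on this system.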
Thanks for replying. The PC is connected directly to a 10/100 TP-Link router (hence the speed), sadly the only one I have at hand. The first time I tested WoL was around the end of 2019, so I would say with a 5.4.x kernel. Same result. WoL is activated at boot via NetworkManager, but I also tried excluding it and just doing:

# ip link set up
# ethtool -s enp2s0 wol g

Same result too. The link LED on the router is on even during the first suspend (where WoL is not functional).
A little context: I ran the commands above just after poweron, before any suspend test. I can run them again after resuming if it helps...
I'd be interested in the following:
- lspci -vv output for the network card (grepped for PME)
- ethtool -d <if> output in these three situations (with WoL setting by NetworkManager disabled):
  1. before setting WoL
  2. after setting WoL and before first suspend
  3. after resume from first suspend
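One way to compare the three register dumps requested above, as a rough sketch. The file names and the helper count_changed_lines are hypothetical; only "ethtool -d" (register dump) and plain diff are assumed to be available.

```shell
# Hypothetical helper: prints how many lines differ between two saved
# `ethtool -d` dumps (each changed line is counted once per file).
count_changed_lines() {
    diff "$1" "$2" | grep -c '^[<>]' || true
}

# Example capture sequence (file names are only an illustration):
#   ethtool -d enp2s0 > regs-1-before-wol.txt
#   ethtool -s enp2s0 wol g
#   ethtool -d enp2s0 > regs-2-after-wol.txt
#   (suspend, resume)
#   ethtool -d enp2s0 > regs-3-after-resume.txt
#   count_changed_lines regs-2-after-wol.txt regs-3-after-resume.txt
```

A count of 0 between stages 2 and 3 would suggest the chip's register state is the same before the first suspend and after the first resume.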
Created attachment 289485: before setting WoL
Created attachment 289487: after setting WoL and before first suspend
Created attachment 289489: after resume from first suspend
Thanks for testing. All that looks normal. Also I can't reproduce the issue here with RTL8168g. What you can check in addition: After first unsuccessful WoL, please do a "grep PME /proc/interrupts". Each WoL event should generate a PME interrupt.
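To make the before/after comparison easier, the per-CPU PME counts can be summed into one number. This is only a sketch: pme_irq_total is a made-up name, and it assumes the usual /proc/interrupts layout, where the per-CPU counters directly follow the IRQ number and the line ends in "PCIe PME".

```shell
# Hypothetical helper: sums the per-CPU counters of all "PCIe PME" lines.
# $1 is a file in /proc/interrupts format (defaults to /proc/interrupts).
pme_irq_total() {
    awk '
        /PCIe PME/ {
            # Field 1 is the IRQ number ("123:"); the numeric fields after
            # it are the per-CPU counts, ending at the first non-number.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
        }
        END { print total + 0 }
    ' "${1:-/proc/interrupts}"
}

# Example:
#   before=$(pme_irq_total)
#   (suspend, send magic packet, resume)
#   after=$(pme_irq_total)
#   echo "PME interrupts during this cycle: $((after - before))"
```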
Indeed, PME interrupts are there:

$ grep PME /proc/interrupts
122: 0 0 0 0 IR-PCI-MSI 311296-edge PCIe PME
123: 2 0 0 0 IR-PCI-MSI 315392-edge PCIe PME

(one for the first wakeup via keyboard, one for the second via magic packet)

To rule out problems with the router, I tried connecting the PC directly to my laptop with a crossover connection. Same result.
If the first (failed) attempt also generated a PME interrupt, then the network chip did all it can do. So maybe the issue is system-specific; in other words, it may be a BIOS bug. Another hint that the root cause isn't in the network driver is that you face the same issue with the vendor driver.
Thanks for looking at it. I don't follow you here: "...the first (failed) attempt generated a PME interrupt...", when there is only one PME related to the (successful) WoL (the other comes from the keyboard). Forgive me in advance, if I misunderstand you.
PMEs are triggered by PCI devices, so a keystroke doesn't generate a PME. My understanding is that at the first suspend you send a WoL packet and, because this doesn't work, press a key afterwards. I interpret the two PMEs to mean that the first WoL packet generates a PME, but a lower level doesn't translate the PME into a system wakeup.
Thanks for clarifying. One thing I forgot to mention (but I think is not relevant at this point) is I also tried playing (with no success) with pcie_aspm parameter as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1671958 (and related issues). Got any other advice or things to try (maybe outside the network layer), before I start to dismantle the damn thing ? (just kidding :-)
The referenced ASPM issue has been solved for quite some time. Well, we now know that the first WoL attempt generates a PME. You could check whether this PME interrupt generates an ACPI interrupt. For this, check whether /sys/firmware/acpi/interrupts/gpe_all gets incremented. This info doesn't help me as I'm just a simple network driver maintainer. But maybe it helps some other guy looking at the issue who's more familiar with the ACPI subsystem. Last but not least, you could play a little with all wake-related BIOS settings. For example, I can remember one quite old Gigabyte BIOS issue that could be resolved by enabling "LAN Boot ROM" in the BIOS.
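A rough sketch of how the gpe_all check could be scripted. Both function names are hypothetical, and the classification is only a reading aid for the counter deltas across one suspend/resume cycle, not a diagnosis.

```shell
# Hypothetical helper: prints the first field (the counter) of gpe_all.
# $1 optionally points at another file with the same format.
read_gpe_all() {
    awk '{ print $1; exit }' "${1:-/sys/firmware/acpi/interrupts/gpe_all}"
}

# Hypothetical helper: $1 = PME interrupt delta, $2 = gpe_all delta,
# both taken across one suspend/resume cycle.
classify_wake() {
    if [ "$1" -gt 0 ] && [ "$2" -eq 0 ]; then
        echo "PME seen, no GPE"
    elif [ "$2" -gt 0 ]; then
        echo "GPE seen"
    else
        echo "no wake-related counter changed"
    fi
}

# Example:
#   gpe_before=$(read_gpe_all)
#   (suspend, send magic packet, resume)
#   gpe_after=$(read_gpe_all)
#   classify_wake 1 $((gpe_after - gpe_before))   # 1 = PME delta, read separately
```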
OK, here's the info:

1. first suspend, sent magic packet, then wakeup by keyboard:

+ grep PME /proc/interrupts
122: 0 0 0 0 IR-PCI-MSI 311296-edge PCIe PME
123: 1 0 0 0 IR-PCI-MSI 315392-edge PCIe PME
+ cat /sys/firmware/acpi/interrupts/gpe_all
2

2. second suspend, wakeup by magic packet:

+ grep PME /proc/interrupts
122: 0 0 0 0 IR-PCI-MSI 311296-edge PCIe PME
123: 2 0 0 0 IR-PCI-MSI 315392-edge PCIe PME
+ cat /sys/firmware/acpi/interrupts/gpe_all
2

3. third suspend, wakeup by keyboard:

+ grep PME /proc/interrupts
122: 0 0 0 0 IR-PCI-MSI 311296-edge PCIe PME
123: 2 0 0 0 IR-PCI-MSI 315392-edge PCIe PME
+ cat /sys/firmware/acpi/interrupts/gpe_all
4

So it seems gpe is incremented only for keyboard wakeup.

On the BIOS front, I think I tried toggling every possible entry (including Enable EFI LAN driver, CSM Option ROM, S5 USB Wakeup Support...) without success. There is even a FAQ entry for this on the vendor site: https://www.gigabyte.com/Support/FAQ/2676. The mentioned setting was already Disabled by default. So I tried to set it to Enabled (despite the description explicitly saying that doing so will DISABLE WoL functionality)... and guess what? No change at all, same behaviour!

Feel free to reassign the bug or add "some other guy" to the CC list. And again, thank you for your attention.