Bug 107421

Summary: r8169 - rtl_counters_cond == 1 (loop: 1000, delay: 10). (klog spam)
Product: Drivers Reporter: Sverd Johnsen (sverd.johnsen)
Component: Network Assignee: drivers_network (drivers_network)
Status: NEW ---    
Severity: normal CC: abhigenie92, arbitraryadirc, eharastasan, esesmu, groeger, hkallweit1, jernej.jakob, joey.corleone, jonathan.p.schuster, khillman, larry_chiang, lopeonline+kernelbugzilla, mike, mirage, nailzuk, stephen, vmxevilstar, wangjiezhe, wfkernel
Priority: P1    
Hardware: x86-64   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=104351
Kernel Version: 4.3.0 Subsystem:
Regression: Yes Bisected commit-id:

Description Sverd Johnsen 2015-11-07 04:02:43 UTC
4.3.0 smp x86_64

[    4.384336] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    4.384747] r8169 0000:0e:00.0 eth0: RTL8101e at 0xffffc90000060000, 00:1b:38:b5:9f:d6, XID 94200000 IRQ 29
[   13.288711] r8169 0000:0e:00.0 enp14s0: renamed from eth0
[   13.317747] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   28.238567] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   28.904171] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   28.923315] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   29.029901] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   30.562934] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   30.740113] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   31.232822] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   32.519930] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   32.536962] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.311612] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.328095] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.474262] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.493735] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.657576] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.674116] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.851311] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   33.867912] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.004213] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.025282] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.086090] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.102623] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.148333] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.165005] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.210847] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.227479] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.274966] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.291638] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.342473] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.358967] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.404815] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.421305] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.466806] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.483506] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.531613] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.548165] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.596182] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.613536] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.675545] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.699661] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.746621] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.763151] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.808652] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.825145] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.871819] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.888318] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.933838] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   34.950440] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.536276] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.552772] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.708147] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.724832] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.889057] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.905662] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.950317] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   35.967924] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   36.096605] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   36.113932] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   36.131298] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   36.148587] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   46.511024] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   48.304603] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   61.528387] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   76.545869] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   91.563233] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  106.580576] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  119.670595] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  121.597940] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  136.615290] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  151.632648] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  166.649970] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  181.667356] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  196.684759] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  211.702155] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  226.719488] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  241.736897] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  256.754369] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  271.771826] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  271.789052] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  286.806618] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  301.824017] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  316.841431] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  331.858860] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  346.876312] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  361.893768] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  376.911245] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  391.928647] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  406.946073] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  421.963516] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  436.980970] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  451.998417] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  467.015869] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  482.033358] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  497.050848] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  512.068346] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[  512.085675] r8169 0000:0e:00.0 enp14s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).



0e:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 01)
        Subsystem: Toshiba America Info Systems RTL8101E/RTL8102E PCI Express Fast Ethernet controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 29
        Region 0: I/O ports at a000 [size=256]
        Region 2: Memory at f8100000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at f8120000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
                Unknown small resource type 05, will not decode more.
        Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <128ns, L1 unlimited
                        ExtTag+ AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [84] Vendor Specific Information: Len=4c <?>
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [12c v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [148 v1] Device Serial Number 02-00-00-00-10-ec-81-36
        Capabilities: [154 v1] Power Budgeting <?>
        Kernel driver in use: r8169
Comment 1 Arthur Borsboom 2015-12-31 10:35:13 UTC
I see the same error in dmesg after booting without any network cable attached, on both cold and warm boots.

Arch Linux kernel: 4.3.3-2-ARCH

[    6.221104] r8169 0000:04:00.1 enp4s0f1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[    6.233220] r8169 0000:04:00.1 enp4s0f1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[    6.386644] r8169 0000:04:00.1 enp4s0f1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[    6.664475] r8169 0000:04:00.1 enp4s0f1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[   10.977972] r8169 0000:04:00.1 enp4s0f1: link down
Comment 2 Ugis G 2016-01-17 15:37:34 UTC
Log spam, without any eth connection
-----------
Linux ArchPC 4.3.3-2-ARCH #1 SMP PREEMPT Wed Dec 23 20:09:18 CET 2015 x86_64 GNU/Linux
09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03)

-----------

jan 17 17:20:47 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:20:47 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:20:57 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:20:57 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:07 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:07 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:17 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:17 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:27 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:27 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:37 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:37 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:47 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:47 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:57 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:21:57 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:22:07 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:22:07 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:22:17 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:22:17 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:22:27 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
jan 17 17:22:27 ArchPC kernel: r8169 0000:09:00.0 enp9s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
Comment 3 Will Frew 2016-01-30 12:10:38 UTC
As above, on 4.3.3-3-ARCH.

$ uname -a
Linux h4lfwit 4.3.3-3-ARCH #1 SMP PREEMPT Wed Jan 20 08:12:23 CET 2016 x86_64 GNU/Linux

$ dmesg | tail -4
[ 1932.335123] r8169 0000:01:00.0 enp1s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1932.364492] r8169 0000:01:00.0 enp1s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1933.489744] r8169 0000:01:00.0 enp1s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1933.519148] r8169 0000:01:00.0 enp1s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).

# lspci -vvv
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 05)
	Subsystem: Toshiba America Info Systems Device fb30
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 29
	Region 0: I/O ports at 2000 [size=256]
	Region 2: Memory at c0104000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at c0100000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [d0] Vital Product Data
		Unknown small resource type 00, will not decode more.
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 2f-01-00-00-36-4c-e0-00
	Kernel driver in use: r8169
	Kernel modules: r8169
Comment 4 Oleg Muraviov 2016-01-31 05:19:38 UTC
Same for me

09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 0a)

Linux hp 4.4.0-gentoo-r1 #11 SMP Sat Jan 30 22:51:38 EET 2016 x86_64 Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz GenuineIntel GNU/Linux

[  413.406739] r8169 0000:09:00.0 eno1: rtl_counters_cond == 1 (loop: 1000, delay: 10).

Card seems to work fine, just annoying messages
Comment 5 Stephen Hemminger 2016-02-03 04:52:21 UTC
It happens for me on my laptop when nothing is connected.
Comment 6 paranoidabhi 2016-07-01 08:43:19 UTC
Notice the same issue with:
abhishek log $ uname -a
Linux hp 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

It floods my syslog. Are there any fixes? I have noticed that it happens after I disconnect my LAN cable.
Comment 7 Neil McCirrus 2016-07-12 18:27:13 UTC
#Archlinux x86_64
>uname -a 
Linux hawker64 4.6.4-1-ck #1 SMP PREEMPT Mon Jul 11 17:37:05 EDT 2016 x86_64 GNU/Linux
>inxi -M 
Machine:   Mobo: ASUSTeK model: P6T SE v: Rev 1.xx Bios: American Megatrends v: 0908 date: 09/21/2010

I too have been experiencing this, but I put it down to the Intel 55x0 chipset errata - Interrupt remapping issue (Intel 5500/5520/X58 chipset revisions 0x13 and 0x22 have errata (#47 and #53) which make the IOMMU interrupt remapping unit unreliable. This erratum causes interruptions, and the interrupt remapping invalidations become unresponsive) https://forums.gentoo.org/viewtopic-t-1030102-start-0.html?sid=59c8eddb43e0553296f93355ea10b42d
Below are some snippets from logs that I *think* may be relevant. Occasionally I get hard lockups where only a hard reboot will suffice; on other occasions I just lose network connectivity; mostly X freezes and I get dropped to a tty with errors about radeon fence/ring. This started happening for me with 4.* kernels; if I roll back to, say, kernel 3.19-1, all is well. So again, I *guess* it's a kernel regression, either that or my issue is a mixture of this and the IOMMU thing. (It happens with both linux-ck and stock Arch Linux kernels.)

perf: interrupt took too long (2711 > 2500), lowering kernel.perf_event_max_sample_rate to 73000
 perf: interrupt took too long (3512 > 3388), lowering kernel.perf_event_max_sample_rate to 56000
 perf: interrupt took too long (4459 > 4390), lowering kernel.perf_event_max_sample_rate to 44000
 perf: interrupt took too long (5613 > 5573), lowering kernel.perf_event_max_sample_rate to 35000
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB !
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB !
hawker64 kernel: radeon 0000:02:00.0: scheduling IB failed (-2).
hawker64 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to schedule IB !
hawker64 kernel: r8169 0000:05:00.0 enp5s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
hawker64 kernel: r8169 0000:05:00.0 enp5s0: rtl_counters_cond == 1 (loop: 1000, delay: 10).
hawker64 kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:
hawker64 kernel:         1-...: (0 ticks this GP) idle=9b2/0/0 softirq=4017531/4017531 fqs=0 
hawker64 kernel:         2-...: (6 GPs behind) idle=dd2/0/0 softirq=2426637/2426637 fqs=0 
hawker64 kernel:         3-...: (3 GPs behind) idle=e78/0/0 softirq=1361775/1361777 fqs=0 
hawker64 kernel:         4-...: (38 GPs behind) idle=3e0/0/0 softirq=440832/440833 fqs=0 
hawker64 kernel:         6-...: (1 GPs behind) idle=efe/0/0 softirq=305520/305520 fqs=0 
hawker64 kernel:         7-...: (1 GPs behind) idle=a2a/0/0 softirq=204822/204822 fqs=0 
hawker64 kernel:         (detected by 0, t=127647 jiffies, g=2627574, c=2627573, q=9503)
hawker64 kernel: rcu_preempt kthread starved for 127647 jiffies! g2627574 c2627573 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
Jul 01 14:13:57 hawker64 login[541]: pam_systemd(login:session): Failed to release session: Connection reset by peer
Jul 01 14:13:57 hawker64 systemd-logind[507]: Failed to abandon session scope: Transport endpoint is not connected
-- Reboot --
Comment 8 Michael Joya 2018-05-30 17:47:35 UTC
I am also getting this with 4.17.0-0.rc7. lspci says my card is: 
Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10)


r8169 0000:04:06.0 enp4s6: rtl_counters_cond == 1 (loop: 1000, delay: 10)
...(spam)...


I compiled this kernel myself since I saw that this was supposed to have been fixed by previous commits:

f51d4a10ac39ecf06b25e7a79121b06f7ed59928
and
e06362369ae1e5b0ba70b66f8703ff08bcb63b23

...however it persists in 4.17.0.


Another interesting fact is that ethtool cannot disable the WoL (Wake-on-LAN) features:

# ethtool enp4s6
Settings for enp4s6:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Half 1000baseT/Full 
	Link partner advertised pause frame use: Symmetric Receive-only
	Link partner advertised auto-negotiation: Yes
	Link partner advertised FEC modes: Not reported
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: pumbg
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: yes

# ethtool -s enp4s6 wol d

... and # ethtool enp4s6 still reports the same information; nothing changes.
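
The before/after comparison described above could be checked mechanically. The following is only a minimal sketch, assuming the output of `ethtool <dev>` has been captured as text; the helper names `wake_on` and `wol_disabled` are hypothetical and not part of ethtool or the kernel.

```python
import re

def wake_on(ethtool_output):
    """Return (supported, current) Wake-on flag strings from `ethtool <dev>` text.

    Either element is None if the corresponding line is missing.
    """
    supported = re.search(r"^[ \t]*Supports Wake-on:[ \t]*(\S+)", ethtool_output, re.M)
    current = re.search(r"^[ \t]*Wake-on:[ \t]*(\S+)", ethtool_output, re.M)
    return (supported.group(1) if supported else None,
            current.group(1) if current else None)

def wol_disabled(ethtool_output):
    """True only if the current Wake-on setting is exactly 'd' (disabled)."""
    return wake_on(ethtool_output)[1] == "d"

# The two relevant lines from the output pasted in this comment.
SAMPLE = "\tSupports Wake-on: pumbg\n\tWake-on: pumbg\n"

print(wake_on(SAMPLE))
print(wol_disabled(SAMPLE))
```

Running it on output captured before and after `ethtool -s enp4s6 wol d` would show whether the setting actually changed.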
Comment 9 Jernej Jakob 2018-12-24 12:28:58 UTC
Still present in 

$ uname -a

Linux vyos 4.19.4-amd64-vyos #1 SMP Thu Dec 13 10:10:42 CET 2018 x86_64 GNU/Linux

$ dmesg |grep r8169

[    1.700147] libphy: r8169: probed
[    1.700762] r8169 0000:01:00.0 eth0: RTL8168e/8111e, aa:aa:aa:00:1e:00, XID 2c200000, IRQ 24
[    1.700767] r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[    1.705981] libphy: r8169: probed
[    1.706263] r8169 0000:02:00.0 eth1: RTL8101e, 00:19:21:df:fc:1e, XID 34000000, IRQ 25
[    1.709367] libphy: r8169: probed
[    1.709994] r8169 0000:03:00.0 eth2: RTL8168e/8111e, aa:aa:aa:00:1d:50, XID 2c200000, IRQ 26
[    1.710000] r8169 0000:03:00.0 eth2: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[    1.710214] r8169 0000:04:02.0: not PCI Express
[    1.713563] libphy: r8169: probed
[    1.713839] r8169 0000:04:02.0 eth3: RTL8169sb/8110sb, c8:3a:35:dd:e3:b8, XID 10000000, IRQ 19
[    1.713844] r8169 0000:04:02.0 eth3: jumbo features [frames: 7152 bytes, tx checksumming: ok]
[   25.660949] r8169 0000:02:00.0 rename3: renamed from eth1
[   26.797862] r8169 0000:01:00.0: invalid short VPD tag 05 at offset 2
[   26.798967] r8169 0000:02:00.0: invalid short VPD tag 05 at offset 2
[   26.802305] r8169 0000:03:00.0: invalid short VPD tag 05 at offset 2
[   26.820900] r8169 0000:03:00.0 rename4: renamed from eth2
[   26.865677] r8169 0000:02:00.0 eth2: renamed from rename3
[   27.912200] r8169 0000:01:00.0 eth1: renamed from eth0
[   27.950468] r8169 0000:03:00.0 eth0: renamed from rename4
[   49.346688] RTL8211C Gigabit Ethernet r8169-410:00: attached PHY driver [RTL8211C Gigabit Ethernet] (mii_bus:phy_addr=r8169-410:00, irq=IGNORE)
[   49.447626] r8169 0000:04:02.0 eth3: Link is Down
[   51.929646] RTL8201CP Ethernet r8169-200:00: attached PHY driver [RTL8201CP Ethernet] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   52.696796] r8169 0000:04:02.0 eth3: Link is Up - 1Gbps/Full - flow control rx/tx
[   53.593780] RTL8211DN Gigabit Ethernet r8169-100:00: attached PHY driver [RTL8211DN Gigabit Ethernet] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
[   53.856398] r8169 0000:01:00.0 eth1: No native access to PCI extended config space, falling back to CSI
[   55.400747] RTL8211DN Gigabit Ethernet r8169-300:00: attached PHY driver [RTL8211DN Gigabit Ethernet] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
[   55.784729] r8169 0000:01:00.0 eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[   57.276860] r8169 0000:03:00.0 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 1455.954345] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1455.965944] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1456.475123] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1456.486725] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1458.145224] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1458.156894] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1458.348005] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1458.359591] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1460.720713] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1460.732311] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 1827.272274] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 2236.992186] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 2237.003751] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 2239.731677] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 2239.743239] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 2267.371138] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 2267.382701] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[ 2427.267260] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000, delay: 10).

$ lspci

00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82945G/GZ/P/PL PCI Express Root Port (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82945G/GZ Integrated Graphics Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 2 (rev 01)
00:1d.0 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #1 (rev 01)
00:1d.1 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #2 (rev 01)
00:1d.2 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #3 (rev 01)
00:1d.3 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #4 (rev 01)
00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.3 SMBus: Intel Corporation NM10/ICH7 Family SMBus Controller (rev 01)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 01)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
04:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10)

https://github.com/vyos/vyos-kernel
Comment 10 Heiner Kallweit 2018-12-24 23:42:24 UTC
It seems that only RTL8101e has the described problem. Or did anybody face this problem with another chip version too?
Comment 11 Jernej Jakob 2018-12-25 01:53:51 UTC
(In reply to Heiner Kallweit from comment #10)
> It seems that only RTL8101e has the described problem. Or did anybody face
> this problem with another chip version too?

I see a variety of chips in this bug report, including RTL8111/8168/8411, RTL8101E/RTL8102E and RTL8169.
The 8101e is most likely mentioned more times because it is only fast ethernet (100M), not gigabit, so people are more likely to have it unplugged and using a gigabit card instead.
Comment 12 Heiner Kallweit 2018-12-25 08:58:51 UTC
Unfortunately the lspci output doesn't really help. What I would need is the dmesg line with chip name and XID (like in yesterday's report). And it would be good if affected people could re-test with a recent kernel version. I won't pick up 3-year-old reports; for certain chip versions this may have been fixed in the meantime.
I'll check with Realtek whether there are any known issues with the statistics counters on specific chip versions.
Comment 13 Heiner Kallweit 2018-12-25 09:48:43 UTC
(In reply to Jernej Jakob from comment #9)
> Still present in 
> 
> $ uname -a
> 
> Linux vyos 4.19.4-amd64-vyos #1 SMP Thu Dec 13 10:10:42 CET 2018 x86_64
> GNU/Linux
> 
> $ dmesg |grep r8169
> 
> [    1.700147] libphy: r8169: probed
> [    1.700762] r8169 0000:01:00.0 eth0: RTL8168e/8111e, aa:aa:aa:00:1e:00,
> XID 2c200000, IRQ 24
> [    1.700767] r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes,
> tx checksumming: ko]
> [    1.705981] libphy: r8169: probed
> [    1.706263] r8169 0000:02:00.0 eth1: RTL8101e, 00:19:21:df:fc:1e, XID
> 34000000, IRQ 25
> [    1.709367] libphy: r8169: probed
> [    1.709994] r8169 0000:03:00.0 eth2: RTL8168e/8111e, aa:aa:aa:00:1d:50,
> XID 2c200000, IRQ 26
> [    1.710000] r8169 0000:03:00.0 eth2: jumbo features [frames: 9200 bytes,
> tx checksumming: ko]
> [    1.710214] r8169 0000:04:02.0: not PCI Express
> [    1.713563] libphy: r8169: probed
> [    1.713839] r8169 0000:04:02.0 eth3: RTL8169sb/8110sb, c8:3a:35:dd:e3:b8,
> XID 10000000, IRQ 19
> [    1.713844] r8169 0000:04:02.0 eth3: jumbo features [frames: 7152 bytes,
> tx checksumming: ok]
> [   25.660949] r8169 0000:02:00.0 rename3: renamed from eth1
> [   26.797862] r8169 0000:01:00.0: invalid short VPD tag 05 at offset 2
> [   26.798967] r8169 0000:02:00.0: invalid short VPD tag 05 at offset 2
> [   26.802305] r8169 0000:03:00.0: invalid short VPD tag 05 at offset 2
> [   26.820900] r8169 0000:03:00.0 rename4: renamed from eth2
> [   26.865677] r8169 0000:02:00.0 eth2: renamed from rename3
> [   27.912200] r8169 0000:01:00.0 eth1: renamed from eth0
> [   27.950468] r8169 0000:03:00.0 eth0: renamed from rename4
> [   49.346688] RTL8211C Gigabit Ethernet r8169-410:00: attached PHY driver
> [RTL8211C Gigabit Ethernet] (mii_bus:phy_addr=r8169-410:00, irq=IGNORE)
> [   49.447626] r8169 0000:04:02.0 eth3: Link is Down
> [   51.929646] RTL8201CP Ethernet r8169-200:00: attached PHY driver
> [RTL8201CP Ethernet] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
> [   52.696796] r8169 0000:04:02.0 eth3: Link is Up - 1Gbps/Full - flow
> control rx/tx
> [   53.593780] RTL8211DN Gigabit Ethernet r8169-100:00: attached PHY driver
> [RTL8211DN Gigabit Ethernet] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
> [   53.856398] r8169 0000:01:00.0 eth1: No native access to PCI extended
> config space, falling back to CSI
> [   55.400747] RTL8211DN Gigabit Ethernet r8169-300:00: attached PHY driver
> [RTL8211DN Gigabit Ethernet] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
> [   55.784729] r8169 0000:01:00.0 eth1: Link is Up - 1Gbps/Full - flow
> control rx/tx
> [   57.276860] r8169 0000:03:00.0 eth0: Link is Up - 100Mbps/Full - flow
> control rx/tx
> [ 1455.954345] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000,
> delay: 10).
> [ 1455.965944] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000,
> delay: 10).
> [ 1456.475123] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000,
> delay: 10).

The issue starts at second 1455 after boot. This leaves a few questions:
- Was interface eth2 brought up at boot? Looks like it's brought up at second 51, because the PHY driver is attached on device open.
- Is something connected to eth2, IOW should a link have been established?
- Any action at second 1455 which could be related to the start of the issue?
Comment 14 Heiner Kallweit 2018-12-25 18:42:45 UTC
(In reply to Jernej Jakob from comment #9)
> Still present in 
> 
> $ uname -a
> 
> Linux vyos 4.19.4-amd64-vyos #1 SMP Thu Dec 13 10:10:42 CET 2018 x86_64
> GNU/Linux
> 
> $ dmesg |grep r8169
> 
[..]
> [ 1455.954345] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000,
> delay: 10).
> [ 1455.965944] r8169 0000:02:00.0 eth2: rtl_counters_cond == 1 (loop: 1000,
> delay: 10).
[..]
Seems like the chip isn't reachable, e.g. because it's in a PCI powersave state.
Because all PCI reads return 0xff, this would also explain why in comment 8 all WoL flags seem to be set:
	Supports Wake-on: pumbg
	Wake-on: pumbg

I would need to know the call trace of the attempt to access the sleeping chip.
Could you apply the following and post the resulting warning?


diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7c252e198..cb2aab4e6 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -768,6 +768,7 @@ static bool rtl_loop_wait(struct rtl8169_private *tp, const struct rtl_cond *c,
 	}
 	netif_err(tp, drv, tp->dev, "%s == %d (loop: %d, delay: %d).\n",
 		  c->msg, !high, n, d);
+	WARN_ON_ONCE(1);
 	return false;
 }
 
-- 
2.20.1
Comment 15 Heiner Kallweit 2019-01-07 15:27:16 UTC
With the following commit the log spam shouldn't occur any longer (even though it's not clear how the chip can be in a PCI power-save state if it's not runtime-suspended). However it will take some time until it gets applied to stable.

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=10262b0b53666cbc506989b17a3ead1e9c3b43b4
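
For anyone who wants to check the runtime-suspend side of this on their own machine, the following is a minimal sketch. It assumes the standard runtime PM attributes (power/runtime_status and power/control) are exposed under /sys/class/net/<if>/device. The function name nic_runtime_pm and the optional second argument (an alternative sysfs root, for testing) are made up for illustration.

```shell
# Print the runtime PM state of the PCI device behind a network interface.
# $1 = interface name, $2 = optional sysfs root (defaults to /sys/class/net).
# nic_runtime_pm is a hypothetical helper name, not part of any tool.
nic_runtime_pm() {
    dev="${2:-/sys/class/net}/$1/device"
    for attr in runtime_status control; do
        if [ -r "$dev/power/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$dev/power/$attr")"
        else
            printf '%s=unavailable\n' "$attr"
        fi
    done
}
```

Running `nic_runtime_pm eth2` while the messages are being logged would show whether the device is reported as runtime-suspended at that moment.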
Comment 16 Jernej Jakob 2019-02-02 13:17:53 UTC
(In reply to Heiner Kallweit from comment #14)
...
> Could you apply the following and post the resulting warning?
> 
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c
> b/drivers/net/ethernet/realtek/r8169.c
> index 7c252e198..cb2aab4e6 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -768,6 +768,7 @@ static bool rtl_loop_wait(struct rtl8169_private *tp,
> const struct rtl_cond *c,
>       }
>       netif_err(tp, drv, tp->dev, "%s == %d (loop: %d, delay: %d).\n",
>                 c->msg, !high, n, d);
> +     WARN_ON_ONCE(1);
>       return false;
>  }
>  
> -- 
> 2.20.1

Applied and tested, but no change - not getting any extra output. Do I need to change any log level settings? My rtl_loop_wait is on line 772 (kernel 4.19.4-amd64-vyos)

When I run "ip link" I get rtl_counters_cond logged on the console 4 times before the command gives any output.

Same with "ethtool eth2", I get two rtl_counters_cond on the console.
Comment 17 Heiner Kallweit 2019-02-02 13:28:17 UTC
WARN_ON_ONCE prints a stack trace plus some additional info to the syslog (dmesg).
Could you please re-test with 4.19.19 or 4.20.6? They include the patch referenced in comment 15.
Comment 18 Jernej Jakob 2019-02-02 14:07:40 UTC
(In reply to Heiner Kallweit from comment #17)
> WARN_ON_ONCE prints a stack trace plus some additional info to the syslog
> (dmesg).
> Could you please re-test with 4.19.19 or 4.20.6? They include the patch
> referenced in comment 15.

I'm testing that patch right now on the same kernel version. I can't easily upgrade the kernel as this is an image-based system (vyos) that's running on that particular hardware. I can however quickly and easily recompile the module and swap out the .ko (with rebooting) as I have everything set up from the first recompilation.
Comment 19 Jernej Jakob 2019-02-02 22:00:10 UTC
(In reply to Heiner Kallweit from comment #15)
> With the following commit the log spam shouldn't occur any longer (even
> though it's not clear how the chip can be in a PCI power-save state if it's
> not runtime-suspended). However it will take some time until it gets applied
> to stable.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/
> ?id=10262b0b53666cbc506989b17a3ead1e9c3b43b4

No difference with this patch either. Interestingly, ethtool seems to show the WOL state fine:

Settings for eth2:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Speed: Unknown!
	Duplex: Unknown! (255)
	Port: MII
	PHYAD: 0
	Transceiver: external
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: ug
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: no

Also there seems to be no difference between ethtool output when it works normally (without rtl_counters_cond logged) and when it doesn't. 
The logs always start a certain time after booting, not immediately. And there is exactly one log every 600 seconds (something polling the NIC?).
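
To confirm the spacing between messages, something like the sketch below could be used. It assumes dmesg prints the default seconds-since-boot timestamps (not `dmesg -T`). The helper name counters_cond_gaps is hypothetical.

```shell
# Print the gap, in whole seconds, between consecutive rtl_counters_cond
# messages. Reads dmesg-style lines ("[ 1455.954345] ...") on stdin.
# counters_cond_gaps is a made-up name for this sketch.
counters_cond_gaps() {
    awk '/rtl_counters_cond/ {
        ts = $0
        sub(/^\[ */, "", ts)     # drop the leading "[" and padding
        sub(/\].*$/, "", ts)     # drop everything after the timestamp
        if (seen) printf "%.0f\n", ts - prev
        prev = ts; seen = 1
    }'
}
```

Usage would be `dmesg | counters_cond_gaps`. Messages that are logged back to back (a few milliseconds apart) show up as gaps of 0.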
Comment 20 Jernej Jakob 2019-02-02 22:24:50 UTC
lspci -vvvnn:

02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller [10ec:8136] (rev 01)
	Subsystem: Hewlett-Packard Company Device [103c:2a57]
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 25
	Region 0: I/O ports at c800 [disabled] [size=256]
	Region 2: Memory at fe9ff000 (64-bit, non-prefetchable) [disabled] [size=4K]
	[virtual] Expansion ROM at fe9c0000 [disabled] [size=128K]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
		Not readable
	Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <128ns, L1 unlimited
			ExtTag+ AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr+ FatalErr+ UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [84] Vendor Specific Information: Len=4c <?>
	Kernel driver in use: r8169

The VPD read error is present at all three PCI-e cards in this system (the one in question is onboard, the other two are RTL8111/8168/8411 add-in cards) but the rtl_counters_cond happens only on this one with no link. One RTL8168 PCI card shows no such error. There are a total of 4 cards, 3 with link up and 1 down.
Comment 21 Heiner Kallweit 2019-02-02 22:43:41 UTC
Quite some things look weird here:

(In reply to Jernej Jakob from comment #20)
> lspci -vvvnn:
> 
> 02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
> RTL8101E/RTL8102E PCI Express Fast Ethernet controller [10ec:8136] (rev 01)
>       Subsystem: Hewlett-Packard Company Device [103c:2a57]
>       Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
>       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
>       Interrupt: pin A routed to IRQ 25
>       Region 0: I/O ports at c800 [disabled] [size=256]
>       Region 2: Memory at fe9ff000 (64-bit, non-prefetchable) [disabled]
> [size=4K]
Region is disabled, this shouldn't be the case.

>       [virtual] Expansion ROM at fe9c0000 [disabled] [size=128K]
>       Capabilities: [40] Power Management version 2
>               Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> PME(D0-,D1+,D2+,D3hot+,D3cold+)
>               Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>       Capabilities: [48] Vital Product Data
> pcilib: sysfs_read_vpd: read failed: Input/output error
>               Not readable

The VPD error is normal.

>       Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
>               Address: 0000000000000000  Data: 0000
>       Capabilities: [60] Express (v1) Endpoint, MSI 00
>               DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <128ns,
> L1 unlimited
>                       ExtTag+ AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
>               DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
>                       RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                       MaxPayload 128 bytes, MaxReadReq 512 bytes
>               DevSta: CorrErr+ UncorrErr+ FatalErr+ UnsuppReq+ AuxPwr+
> TransPend-
>               LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit
> Latency L0s
> unlimited, L1 unlimited
>                       ClockPM- Surprise- LLActRep- BwNot-
>               LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
>                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>               LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
> DLActive- BWMgmt-
> ABWMgmt-
>       Capabilities: [84] Vendor Specific Information: Len=4c <?>

This also doesn't look good. Here it breaks.

>       Kernel driver in use: r8169
> 
> The VPD read error is present at all three PCI-e cards in this system (the
> one in question is onboard, the other two are RTL8111/8168/8411 add-in
> cards) but the rtl_counters_cond happens only on this one with no link. One
> RTL8168 PCI card shows no such error. There are a total of 4 cards, 3 with
> link up and 1 down.

Was there ever a kernel version that didn't show these errors? IOW, is it a regression? Else support for this very old chip version may always have been broken.
Comment 22 Jernej Jakob 2019-02-03 20:59:48 UTC
(In reply to Heiner Kallweit from comment #21)
> Was there ever a kernel version that didn't show these errors? IOW, is it a
> regression? Else support for this very old chip version may always have been
> broken.

I'm not sure, I think it was previously on 3.13, definitely no errors then.

Now I tried plugging a cable in and the link LED came on, but the system didn't detect the link up change (ip link and ethtool both showed link down), then I rebooted it and the link came up, now it works normally so it was likely in some kind of weird powered down state.
lspci now shows:

02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller [10ec:8136] (rev 01)
	Subsystem: Hewlett-Packard Company Device [103c:2a57]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 25
	Region 0: I/O ports at c800 [size=256]
	Region 2: Memory at fe9ff000 (64-bit, non-prefetchable) [size=4K]
	Expansion ROM at fe9c0000 [disabled] [size=128K]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME+
	Capabilities: [48] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
		Not readable
	Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
		Address: 00000000fee01004  Data: 4026
	Capabilities: [60] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <128ns, L1 unlimited
			ExtTag+ AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [84] Vendor Specific Information: Len=4c <?>
	Kernel driver in use: r8169

Region 2 is no longer disabled.
Also the RTL8111 and RTL8169 that are now unplugged show no errors.
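
The difference between the two lspci dumps comes down to the low bits of the PCI command register (I/O, Mem, BusMaster). A small sketch for decoding them, assuming the value is read with `setpci -s 02:00.0 COMMAND` (which prints the register as a hex word). The function name decode_pci_command is made up for illustration, and only the three lowest bits are decoded.

```shell
# Decode the three lowest bits of a PCI command register value.
# $1 = register value in hex without "0x", e.g. the output of
#      setpci -s 02:00.0 COMMAND
# decode_pci_command is a hypothetical helper name.
decode_pci_command() {
    val=$((0x$1))
    out=""
    for pair in "1:I/O" "2:Mem" "4:BusMaster"; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $((val & mask)) -ne 0 ]; then
            out="$out $name+"
        else
            out="$out $name-"
        fi
    done
    printf '%s\n' "${out# }"
}
```

With the device in the broken state from comment 20 this would print `I/O- Mem- BusMaster-`; in the working state above it would print `I/O+ Mem+ BusMaster+`.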
Comment 23 Heiner Kallweit 2019-02-03 21:04:23 UTC
(In reply to Jernej Jakob from comment #22)
> (In reply to Heiner Kallweit from comment #21)
> > Was there ever a kernel version that didn't show these errors? IOW, is it a
> > regression? Else support for this very old chip version may always have
> been
> > broken.
> 
> I'm not sure, I think it was previously on 3.13, definitely no errors then.
> 
> Now I tried plugging a cable in and the link LED came on, but the system
> didn't detect the link up change (ip link and ethtool both showed link
> down), then I rebooted it and the link came up, now it works normally so it
> was likely in some kind of weird powered down state.

Currently there's an issue with wakeup if a cable is plugged in. It's an issue in the PCIe subsystem; the fix is waiting to be applied:
https://patchwork.ozlabs.org/patch/1034385/
Comment 24 Jernej Jakob 2019-02-03 21:35:20 UTC
Neither of those 2 patches will apply on my 4.19 branch. Seems like those functions were added in later commits. Do I need to update and rebuild my whole kernel or can I somehow find which commits I need to backport?
Comment 25 Heiner Kallweit 2019-02-03 21:43:45 UTC
Right, maybe the change reverted by this patch was added after 4.19.

When checking the vendor r8101 driver, I saw that they disable MSI for this chip version and use a legacy interrupt. But I don't know whether disabling MSI could help in your case. This old RTL8101e chip version seems to be somewhat quirky.
Your log says subsystem HP, what kind of device is it, an old HP laptop?
Comment 26 Jernej Jakob 2019-02-03 21:54:24 UTC
It's an old HP Livermore 945GCT-HM mobo that's used as a network router, old and obsolete but otherwise super reliable other than this bug.

Yeah, the removed functions in https://patchwork.ozlabs.org/patch/1034385/ don't exist, but neither do the two in https://patchwork.ozlabs.org/patch/1034384/ that are the supposed replacement. I'm not sure if I should try searching for all the required commits to apply the second patch or if this is a totally unrelated issue.
Comment 27 lopeonline+kernelbugzilla 2019-03-10 14:12:46 UTC
I experienced a variation of this issue now.
I rename my r8169 ethernet adapter to primary-eth using udev rules. It's been working reliably; however, now I booted up and primary-eth was there as usual, but it was not part of primary-bridge, as it is supposed to be.

The kernel had somehow created an additional bridge called "rename3" and added the renamed r8169 adapter primary-eth to the "rename3" bridge.

==================================
details

Below I rename my primary ethernet adapter to primary-eth, catering for booting my hard drive in either of my 2 different computers
===========================================================
/etc/udev/rules.d/70-mainnet-setup-link.rules

#config for when I boot hard drive in computer 1
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:aa:aa:aa:aa:aa", NAME="primary-eth"

#config for when I boot hard drive in computer 2
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="bb:bb:bb:bb:bb:bb", NAME="primary-eth"
===========================================================


I use "computer 1" 99% of the time and haven't used "computer 2" in weeks.

Then primary-eth is part of a bridge: primary-bridge.


However I booted up computer 1 now from a cold boot, and I found primary-eth was part of a BRIDGE called rename3.
I deleted the rename3 bridge.
`brctl delbr rename3`
Then I saw primary-eth automatically got added to primary-bridge.

Then I just had to run `ip link set primary-bridge up state up` and all was well.

Is there a way I can prevent this rename3 BRIDGE from appearing?

primary-eth on "Computer 1" is: 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 11)

I found these bugs, in which an ETHERNET ADAPTER gets renamed to rename3
https://bugzilla.kernel.org/show_bug.cgi?id=107421
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1578141

But in my case it's not an ethernet adapter that's getting renamed. It's a BRIDGE called rename3 being CREATED.

Any ideas?
Comment 28 Heiner Kallweit 2019-03-10 14:28:06 UTC
(In reply to lopeonline+kernelbugzilla from comment #27)
> I experienced a variation of this issue now.
> I rename my r8169 ethernet adapter to primary-eth using udev rules. It's
> been working reliably, however now I booted up and primary-eth was there as
> usual, but it was not part of primary-bridge, as it is supposed to be.
> 
> The kernel had somehow created an additional bridge called "rename3" and
> added the renamed r8169 adapter primary-eth to the "rename3" bridge.
> 
This is a totally different question and not related to this issue at all.
Comment 29 lopeonline+kernelbugzilla 2019-03-10 15:21:46 UTC
My issue was likely caused by a race condition where udev probably tried to rename my eth adapter and br0 (because it takes on the MAC of the eth) to primary-bridge.
So I've cleaned up my udev rules and hopefully it won't happen again.

@Heiner Yes, you're right. It's a different issue.

Mods feel free to delete my 2 posts in this particular bug.

The reason I assumed my issue is related is because of some vague similarities
* My eth uses the same kernel driver as in this bug
* I've had issues with this eth driver for suspend/resume
* An unwanted bridge got created for me, named rename3, in this bug the eth is named rename3. Different but vaguely similar.

Please accept my apology and remove my 2 posts here if possible.

Comment 30 Renato Gallo 2023-01-17 11:22:28 UTC
*lspci|grep -i eth

*02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)


*uname -a
*Linux ghost 6.2.0-rc3 #1 SMP PREEMPT_DYNAMIC Tue Jan 10 13:21:10 CET 2023 x86_64 GNU/Linux


*dmesg -T
*[Tue Jan 17 09:02:35 2023] r8169 0000:02:00.0 enp2s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
*[Tue Jan 17 09:02:35 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
*[Tue Jan 17 09:02:35 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
*[Tue Jan 17 09:02:35 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
*[Tue Jan 17 09:02:35 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
*[Tue Jan 17 09:02:35 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
*[Tue Jan 17 09:02:35 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
*[Tue Jan 17 09:02:35 2023] r8169 0000:02:00.0 enp2s0: rtl_mac_ocp_e00e_cond == 1 (loop: 10, delay: 1000).

.... etc
Comment 31 Renato Gallo 2023-01-23 13:58:27 UTC
Dear Maintainers,

I am having an issue with my ethernet card.
It works when the system boots but after around a couple of hours it disconnects.
I tried different ways to get it working without having to reboot but nothing else seemed to work.
Even rebooting doesn't solve the problem since again, after a couple of hours, it stops working again.
I have googled around and found that some people had this same problem on older kernels but no solution seemed to apply to this rc nor latest stable kernel versions.
I am probably missing something here.
The issue happened also with recent stable 6.1.7 and rc kernel versions.
I am actually testing the latest 6.2-rc4 version.

Following are some data I think might be useful, but if you feel I neglected to give enough information and you need more, please just ask me.

Here is some information about my system:
uname -a
Linux ghost 6.2.0-rc4 #2 SMP PREEMPT_DYNAMIC Tue Jan 17 13:35:46 CET 2023 x86_64 GNU/Linux

gcc --version
gcc (Debian 12.2.0-14) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


/usr/src# lspci|grep -i net
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)

description: Ethernet interface
product: RTL8125 2.5GbE Controller
vendor: Realtek Semiconductor Co., Ltd.
physical id: 0
bus info: pci@0000:02:00.0
logical name: enp2s0
version: ff
serial: b0:25:aa:49:a5:3a
size: 1Gbit/s
capacity: 1Gbit/s
width: 32 bits
clock: 66MHz
capabilities: bus_master vga_palette cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=6.2.0-rc4 duplex=full firmware=rtl8125b-2_0.0.2 07/13/20 latency=255 link=yes maxlatency=255 mingnt=255 multicast=yes port=twisted pair speed=1Gbit/s



lsmod|grep r8169
r8169                 110592  0
mdio_devres            16384  1 r8169
libphy                200704  3 r8169,mdio_devres,realtek

the firmware version I am using is linux-firmware-20221214.tar.gz


Here you can find what happens (dmesg -wT)
[Fri Jan 20 11:04:32 2023] userif-3: sent link up event.

[Fri Jan 20 13:19:41 2023] r8169 0000:02:00.0 enp2s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[Fri Jan 20 13:19:41 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:19:41 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:19:41 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:19:41 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:19:41 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:19:41 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:19:41 2023] r8169 0000:02:00.0 enp2s0: rtl_mac_ocp_e00e_cond == 1 (loop: 10, delay: 1000).
[Fri Jan 20 13:20:17 2023] r8169 0000:02:00.0 enp2s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[Fri Jan 20 13:20:17 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:20:17 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:20:17 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:20:17 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:20:17 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:20:17 2023] r8169 0000:02:00.0 enp2s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
[Fri Jan 20 13:20:18 2023] r8169 0000:02:00.0 enp2s0: rtl_mac_ocp_e00e_cond == 1 (loop: 10, delay: 1000).

I would love to provide a patch of any kind but I am afraid I don't have enough programming skills.

Thanks in advance for your time.
Comment 32 Stephen Hemminger 2023-01-23 16:38:18 UTC
There have been reports of issues on the Realtek chip on some platforms due to buggy implementation of power saving. Look at netdev mailing list history for many threads on r8169.
Comment 33 Heiner Kallweit 2023-01-24 06:46:35 UTC
(In reply to Renato Gallo from comment #31)
> Dear Maintainers,
> 
> I am having an issue with my ethernet card.

You received an answer to the same question on the netdev mailing list already.
https://lore.kernel.org/netdev/37b1001d-688c-fa35-0d8a-cbbbae5e6fa8@gmail.com/T/
Why duplicate the communication?
Comment 34 eharastasan 2023-03-21 11:36:32 UTC
There is an issue with power saving for r8169 as Stephen Hemminger mentioned before. 

A quick fix can be to disable the power saving mode from "/etc/default/grub" by adding "igb.EEE=0" to the variable "GRUB_CMDLINE_LINUX_DEFAULT" as follows:

...
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash igb.EEE=0"
...
Comment 35 Heiner Kallweit 2023-03-21 11:58:49 UTC
(In reply to https://github.com/emidev98 from comment #34)
> There is an issue with power saving for r8169 as Stephen Hemminger mentioned
> before. 
> 
> A quick fix can be to disable the power saving mode from "/etc/default/grub"
> by adding "igb.EEE=0" to the variable "GRUB_CMDLINE_LINUX_DEFAULT" as
> follows:
> 
> ...
> GRUB_CMDLINE_LINUX_DEFAULT="quiet splash igb.EEE=0"
> ...

This doesn't make sense. As the name states the igb.EEE parameter is for the igb driver. If this helps for you then the issue is not with r8169.
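
To see which driver a given interface is actually bound to, one can follow the driver symlink in sysfs. A minimal sketch, assuming the usual /sys/class/net/<if>/device/driver link is present. The helper name nic_driver and its optional second argument (an alternative sysfs root, for testing) are made up for illustration.

```shell
# Print the name of the kernel driver bound to a network interface.
# $1 = interface name, $2 = optional sysfs root (defaults to /sys/class/net).
# nic_driver is a hypothetical helper name.
nic_driver() {
    link="${2:-/sys/class/net}/$1/device/driver"
    [ -L "$link" ] || { echo "no driver bound to $1" >&2; return 1; }
    basename "$(readlink "$link")"
}
```

If `nic_driver enp2s0` prints r8169, then parameters prefixed with `igb.` are not addressed to that driver. `modinfo -p r8169` lists the parameters the r8169 module itself accepts.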
Comment 36 Thomas Groeger 2023-05-06 07:15:19 UTC
(In reply to eharastasan from comment #34)
> There is an issue with power saving for r8169 as Stephen Hemminger mentioned
> before. 
> 
> A quick fix can be to disable the power saving mode from "/etc/default/grub"
> by adding "igb.EEE=0" to the variable "GRUB_CMDLINE_LINUX_DEFAULT" as
> follows:
> 
> ...
> GRUB_CMDLINE_LINUX_DEFAULT="quiet splash igb.EEE=0"
> ...

I had the same issue (network driver crashes after a while) with kernel 5.x and 6.x on debian 11 with proxmox.

But I could resolve the problem by disabling ASPM,
Editing /etc/default/grub with:

GRUB_CMDLINE_LINUX="pcie_aspm=off pcie_port_pm=off"

Then run update-grub and reboot. After this it works fine for me.
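
After the reboot it may be worth verifying that the parameters really reached the kernel. A small sketch, assuming the running command line is available in /proc/cmdline. The helper name cmdline_has and the optional file argument are made up for illustration.

```shell
# Succeed if the given parameter appears verbatim on the kernel command line.
# $1 = parameter (e.g. pcie_aspm=off), $2 = optional file to read
#      (defaults to /proc/cmdline).
# cmdline_has is a hypothetical helper name.
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example: `cmdline_has pcie_aspm=off && echo "ASPM disabled on the command line"`.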
Comment 37 Heiner Kallweit 2023-05-06 10:58:16 UTC
That's something you can do also during runtime w/o disabling ASPM for all devices. Just use the standard sysfs attributes under /sys/class/net/<if>/device/link.
Comment 38 arbitraryadirc 2023-07-23 04:14:39 UTC
(In reply to Heiner Kallweit from comment #37)
> That's something you can do also during runtime w/o disabling ASPM for all
> devices. Just use the standard sysfs attributes under
> /sys/class/net/<if>/device/link.

Could you walk a total beginner through how to do this?
Comment 39 Heiner Kallweit 2023-07-25 19:04:15 UTC
(In reply to arbitraryadirc from comment #38)
> (In reply to Heiner Kallweit from comment #37)
> > That's something you can do also during runtime w/o disabling ASPM for all
> > devices. Just use the standard sysfs attributes under
> > /sys/class/net/<if>/device/link.
> 
> Could you walk a total beginner through how to do this?

See documentation:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/ABI/testing/sysfs-bus-pci?h=next-20230725

Start with disabling the deepest power saving state: L1.2
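
As a rough sketch of what that looks like in practice: the referenced ABI document describes per-link attributes such as l1_2_aspm, l1_1_aspm, l1_aspm and l0s_aspm, which accept 0 or 1. The helper below assumes these files exist under /sys/class/net/<if>/device/link and that it is run as root. The function name set_aspm_state and the optional fourth argument (an alternative sysfs root, for testing) are made up for illustration.

```shell
# Enable or disable one ASPM state for the link of a network device.
# $1 = interface name, $2 = attribute (e.g. l1_2_aspm), $3 = 0 or 1,
# $4 = optional sysfs root (defaults to /sys/class/net).
# set_aspm_state is a hypothetical helper name; writing needs root.
set_aspm_state() {
    f="${4:-/sys/class/net}/$1/device/link/$2"
    if [ ! -e "$f" ]; then
        echo "no $2 attribute for $1 (ASPM control not exposed)" >&2
        return 1
    fi
    echo "$3" > "$f" && echo "$2=$(cat "$f")"
}
```

For example, `set_aspm_state enp2s0 l1_2_aspm 0` would turn off L1.2 for that one link only. The setting does not survive a reboot, so if it helps it would need to be reapplied at boot. If the link directory has no such files, the function reports that and returns an error.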
Comment 40 Kai Hillmann 2023-08-12 09:38:08 UTC
Hello all,

I stumbled over the same problem, I guess: one of the two Realtek 8169 NICs in my system often goes down after some time, and the only possible fix is a reboot. It is not always the same one of the two NICs. This has happened especially since upgrading from Proxmox 7 to Proxmox 8 (which means an upgrade from Debian bullseye to Debian bookworm); before that I did not really notice this behavior. It might have been there before, but it definitely occurred a lot less often than now (at least daily or every second day). It happens especially under somewhat longer and higher (compared to idle) network load (~9-10 MByte/s, but far away from the maximum theoretical or practical link speed of ~115 MByte/s).

@Heiner Kallweit regarding 

"Start with disabling the deepest power saving state: L1.2"

I wanted to try that, but the corresponding paths, e.g. /sys/bus/pci/devices/0000\:00\:1d.0/link/, are empty, so I currently do not know whether I can disable it. I read that this might not be possible because of the BIOS settings, but when I look at the lspci -vvv output it seems to be available?!

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 19
	Region 0: I/O ports at 3000 [size=256]
	Region 2: Memory at 9c204000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at 9c200000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number redacted serial no
	Capabilities: [170 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [178 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=81920ns
		L1SubCtl2: T_PwrOn=150us
	Kernel driver in use: r8169
	Kernel modules: r8169


I experimented a bit with 

echo "performance" > /sys/module/pcie_aspm/parameters/policy

which seems to improve the situation a bit (but with a measurable negative effect on power consumption), but I'm not yet sure whether this really solves it.
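
To see which policy is active before and after such a change, the same file can be read back; the kernel marks the active policy with square brackets. A small sketch, where the helper name active_aspm_policy is an assumption made for illustration:

```shell
# Hypothetical helper: print the currently active ASPM policy.
# The policy file lists all policies and puts the active one in [brackets].
active_aspm_policy() {
    policy_file=${1:-/sys/module/pcie_aspm/parameters/policy}
    # Extract the text between the square brackets.
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$policy_file"
}
```

Running active_aspm_policy without arguments reads the real sysfs file.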

dmesg | grep r8169:

[    1.398724] r8169 0000:01:00.0 eth0: RTL8168h/8111h, redacted mac, XID 541, IRQ 127
[    1.398728] r8169 0000:01:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[    1.418757] r8169 0000:02:00.0 eth1: RTL8168h/8111h, redacted mac, XID 541, IRQ 128
[    1.418761] r8169 0000:02:00.0 eth1: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[    1.490168] r8169 0000:02:00.0 enp2s0: renamed from eth1
[    1.773791] r8169 0000:01:00.0 enp1s0: renamed from eth0
[    5.497016] Generic FE-GE Realtek PHY r8169-0-100:00: attached PHY driver (mii_bus:phy_addr=r8169-0-100:00, irq=MAC)
[    5.701229] r8169 0000:01:00.0 enp1s0: Link is Down
[    5.761156] Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
[    5.937429] r8169 0000:02:00.0 enp2s0: Link is Down
[    8.774445] r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
[    9.358071] r8169 0000:02:00.0 enp2s0: Link is Up - 1Gbps/Full - flow control off

lspci -t -v (stripped down to realtek + parent)
-[0000:00]-+-00.0  Intel Corporation Comet Lake-U v1 4c Host Bridge/DRAM Controller
           +-1d.0-[01]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           +-1d.3-[02]----00.0  Realtek Semiconductor Co., Ltd. 

For power consumption reasons I have not yet tested the suggestion:

GRUB_CMDLINE_LINUX="pcie_aspm=off pcie_port_pm=off"

Has anything changed between kernel 5.15 (used in Proxmox 7) and kernel 6.2 (used in Proxmox 8) that could explain it?

Any other ideas for what could fix the situation, other than going back to an older kernel and/or completely disabling ASPM with the above kernel command line?

I also stumbled over the following dmesg line:

[    0.403207] MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.

Could this influence the r8169 driver problem here?

Please let me know whether I can help by providing more information that would help find and resolve this mysterious issue.
Comment 41 Kai Hillmann 2023-08-12 10:27:07 UTC
(In reply to Kai Hillmann from comment #40)
I also found a kernel trace before the typical 

2023-08-01T23:10:18.072470+02:00 pve kernel: [199380.810315] r8169 0000:02:00.0 enp2s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).

line starts in my logs:

2023-08-01T23:10:18.036574+02:00 pve kernel: [199380.778029] ------------[ cut here ]------------
2023-08-01T23:10:18.036609+02:00 pve kernel: [199380.778046] NETDEV WATCHDOG: enp2s0 (r8169): transmit queue 0 timed out
2023-08-01T23:10:18.036614+02:00 pve kernel: [199380.778082] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
2023-08-01T23:10:18.036940+02:00 pve kernel: [199380.778103] Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables scsi_transport_iscsi msr iptable_filter bpfilter softdog bonding tls sunrpc nfnetlink_log nfnetlink binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match intel_rapl_msr snd_soc_acpi intel_rapl_common intel_tcc_cooling soundwire_bus iwlmvm x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_core snd_compress mac80211 kvm_intel ac97_bus i915 snd_pcm_dmaengine libarc4 kvm drm_buddy snd_hda_intel ttm irqbypass snd_intel_dspcfg snd_intel_sdw_acpi drm_display_helper crct10dif_pclmul snd_hda_codec polyval_clmulni polyval_generic ghash_clmulni_intel cec sha512_ssse3 snd_hda_core btusb
2023-08-01T23:10:18.037392+02:00 pve kernel: [199380.778501]  rc_core snd_hwdep aesni_intel btrtl mei_pxp mei_hdcp btbcm snd_pcm crypto_simd drm_kms_helper btintel cryptd cmdlinepart btmtk iwlwifi i2c_algo_bit spi_nor rapl mei_me snd_timer syscopyarea sysfillrect snd bluetooth ecdh_generic intel_cstate soundcore sysimgblt wmi_bmof pcspkr ecc mtd mei ee1004 cfg80211 intel_pch_thermal acpi_pad acpi_tad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb ums_realtek dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c uas usb_storage spi_intel_pci crc32_pclmul xhci_pci spi_intel xhci_pci_renesas i2c_i801 i2c_smbus ahci r8169 realtek xhci_hcd libahci video wmi pinctrl_cannonlake
2023-08-01T23:10:18.037408+02:00 pve kernel: [199380.778968] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O       6.2.16-5-pve #1
2023-08-01T23:10:18.037410+02:00 pve kernel: [199380.778978] Hardware name: ZOTAC ZBOX-CI622/CI642/CI662NANO/ZBOX-CI622/CI642/CI662NANO, BIOS B418P108 07/08/2021
2023-08-01T23:10:18.037429+02:00 pve kernel: [199380.778984] RIP: 0010:dev_watchdog+0x23a/0x250
2023-08-01T23:10:18.037431+02:00 pve kernel: [199380.779002] Code: 00 e9 2b ff ff ff 48 89 df c6 05 8b 6c 7d 01 01 e8 6b 08 f8 ff 44 89 f1 48 89 de 48 c7 c7 98 65 80 a0 48 89 c2 e8 86 a6 30 ff <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
2023-08-01T23:10:18.037452+02:00 pve kernel: [199380.779011] RSP: 0018:ffffacafc0003e38 EFLAGS: 00010246
2023-08-01T23:10:18.037453+02:00 pve kernel: [199380.779025] RAX: 0000000000000000 RBX: ffff9f3f9261c000 RCX: 0000000000000000
2023-08-01T23:10:18.037459+02:00 pve kernel: [199380.779032] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2023-08-01T23:10:18.037465+02:00 pve kernel: [199380.779038] RBP: ffffacafc0003e68 R08: 0000000000000000 R09: 0000000000000000
2023-08-01T23:10:18.037492+02:00 pve kernel: [199380.779045] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f3f9261c4c8
2023-08-01T23:10:18.037495+02:00 pve kernel: [199380.779052] R13: ffff9f3f9261c41c R14: 0000000000000000 R15: 0000000000000000
2023-08-01T23:10:18.037496+02:00 pve kernel: [199380.779058] FS:  0000000000000000(0000) GS:ffff9f4ea2e00000(0000) knlGS:0000000000000000
2023-08-01T23:10:18.037498+02:00 pve kernel: [199380.779065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-08-01T23:10:18.037505+02:00 pve kernel: [199380.779073] CR2: 00007fe339a08ed7 CR3: 0000000d20610002 CR4: 00000000003726f0
2023-08-01T23:10:18.037506+02:00 pve kernel: [199380.779080] Call Trace:
2023-08-01T23:10:18.037512+02:00 pve kernel: [199380.779086]  <IRQ>
2023-08-01T23:10:18.037537+02:00 pve kernel: [199380.779096]  ? __pfx_dev_watchdog+0x10/0x10
2023-08-01T23:10:18.037543+02:00 pve kernel: [199380.779112]  call_timer_fn+0x29/0x160
2023-08-01T23:10:18.037569+02:00 pve kernel: [199380.779125]  ? __pfx_dev_watchdog+0x10/0x10
2023-08-01T23:10:18.037584+02:00 pve kernel: [199380.779138]  __run_timers+0x259/0x310
2023-08-01T23:10:18.037591+02:00 pve kernel: [199380.779152]  run_timer_softirq+0x1d/0x40
2023-08-01T23:10:18.037594+02:00 pve kernel: [199380.779161]  __do_softirq+0xd6/0x346
2023-08-01T23:10:18.037609+02:00 pve kernel: [199380.779170]  ? hrtimer_interrupt+0x11f/0x250
2023-08-01T23:10:18.037630+02:00 pve kernel: [199380.779187]  __irq_exit_rcu+0xa2/0xd0
2023-08-01T23:10:18.037634+02:00 pve kernel: [199380.779198]  irq_exit_rcu+0xe/0x20
2023-08-01T23:10:18.037641+02:00 pve kernel: [199380.779207]  sysvec_apic_timer_interrupt+0x92/0xd0
2023-08-01T23:10:18.037650+02:00 pve kernel: [199380.779220]  </IRQ>
2023-08-01T23:10:18.037651+02:00 pve kernel: [199380.779226]  <TASK>
2023-08-01T23:10:18.037671+02:00 pve kernel: [199380.779233]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
2023-08-01T23:10:18.037679+02:00 pve kernel: [199380.779244] RIP: 0010:cpuidle_enter_state+0xde/0x6f0
2023-08-01T23:10:18.037686+02:00 pve kernel: [199380.779256] Code: 27 57 60 e8 d4 79 4a ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 02 82 49 ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 12 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c7 04 00 00
2023-08-01T23:10:18.037692+02:00 pve kernel: [199380.779264] RSP: 0018:ffffffffa1003da8 EFLAGS: 00000246
2023-08-01T23:10:18.037709+02:00 pve kernel: [199380.779275] RAX: 0000000000000000 RBX: ffffccafbfc00000 RCX: 0000000000000000
2023-08-01T23:10:18.037711+02:00 pve kernel: [199380.779283] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2023-08-01T23:10:18.037719+02:00 pve kernel: [199380.779288] RBP: ffffffffa1003df8 R08: 0000000000000000 R09: 0000000000000000
2023-08-01T23:10:18.037720+02:00 pve kernel: [199380.779294] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa12c33e0
2023-08-01T23:10:18.037725+02:00 pve kernel: [199380.779300] R13: 0000000000000008 R14: 0000000000000008 R15: 0000b555f472d0f6
2023-08-01T23:10:18.037748+02:00 pve kernel: [199380.779312]  ? cpuidle_enter_state+0xce/0x6f0
2023-08-01T23:10:18.037750+02:00 pve kernel: [199380.779321]  cpuidle_enter+0x2e/0x50
2023-08-01T23:10:18.037768+02:00 pve kernel: [199380.779329]  do_idle+0x216/0x2a0
2023-08-01T23:10:18.037781+02:00 pve kernel: [199380.779341]  cpu_startup_entry+0x1d/0x20
2023-08-01T23:10:18.037790+02:00 pve kernel: [199380.779350]  rest_init+0xdc/0x100
2023-08-01T23:10:18.037792+02:00 pve kernel: [199380.779361]  ? acpi_enable_subsystem+0xe6/0x2a0
2023-08-01T23:10:18.037810+02:00 pve kernel: [199380.779372]  arch_call_rest_init+0xe/0x30
2023-08-01T23:10:18.037817+02:00 pve kernel: [199380.779385]  start_kernel+0x6b0/0xb80
2023-08-01T23:10:18.037824+02:00 pve kernel: [199380.779395]  ? load_ucode_intel_bsp+0x3d/0x80
2023-08-01T23:10:18.037845+02:00 pve kernel: [199380.779408]  x86_64_start_kernel+0x102/0x180
2023-08-01T23:10:18.037863+02:00 pve kernel: [199380.779419]  secondary_startup_64_no_verify+0xe5/0xeb
2023-08-01T23:10:18.037877+02:00 pve kernel: [199380.779435]  </TASK>
2023-08-01T23:10:18.037879+02:00 pve kernel: [199380.779440] ---[ end trace 0000000000000000 ]---

I hope this helps to find the cause of this behavior.
Comment 42 Heiner Kallweit 2023-08-12 12:47:10 UTC
Please note that vendor kernels aren't supported here. Always test with a mainline kernel (ideally self-compiled). Your report doesn't even mention the affected kernel version.

If the ASPM sysfs attributes aren't visible, the most likely reason is that the BIOS claims exclusive access to the ASPM settings. You can override this with pcie_aspm=force. However, according to the lspci output, ASPM is disabled anyway.
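
To check the ASPM state of a link from lspci output, something like the following could be used. It is only a sketch: the helper name aspm_link_state is made up, and it assumes the LnkCtl line format shown in the lspci output earlier in this report.

```shell
# Hypothetical helper: read "lspci -vv" output on stdin and print the
# ASPM state from each LnkCtl line (the text between "ASPM " and ";").
aspm_link_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}
```

A possible invocation, run as root so that lspci can read the capability list, would be: lspci -vv -s 02:00.0 | aspm_link_state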

If the issue doesn't occur with a previous kernel version, please bisect.
Comment 43 Kai Hillmann 2023-08-12 14:09:23 UTC
(In reply to Heiner Kallweit from comment #42)
> Please note that vendor kernels aren't supported here. Always test with a
> (best self-compiled) mainline kernel. Your report doesn't even mentioned the
> affected kernel version.
> 
> If the ASPM sysfs attributes aren't visible, most likely reason is that BIOS
> claims exclusive access to ASPM settings. You can override this with
> pcie_aspm=force. However, according to the lspci output, ASPM is disabled
> anyway.
> 
> If the issue doesn't occur with a previous kernel version, please bisect.

Full ack, you're right. Sorry for pasting it here; I should have first looked into this on the Proxmox side. Just for reference, the thread regarding this problem is here:

https://forum.proxmox.com/threads/system-hanging-after-upgrade-nic-driver.129366 

I'll try to test it against a mainline kernel instead of my current problematic proxmox one: proxmox-kernel-6.2.16-6-pve

I'm not very familiar with bisecting yet, but I'll see whether I can trace the behaviour change down to a change between pve-kernel-5.15.108-1-pve and the current proxmox-kernel-6.2.16-6-pve.
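
For reference, a bisection on a mainline git tree usually starts roughly as sketched below. The wrapper name start_bisect is hypothetical, and which tags to pass as good and bad is an assumption: v5.15 and v6.2 would match the kernel series mentioned above, but the pve kernels are vendor builds, so the actual good and bad points on mainline have to be confirmed by testing first.

```shell
# Hypothetical wrapper around the standard git bisect commands.
# Must be run inside a clone of the kernel git tree.
#   $1 = last known good revision, $2 = first known bad revision
start_bisect() {
    good=$1
    bad=$2
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}
```

After start_bisect v5.15 v6.2, git checks out a commit in between. Build and boot that kernel, test whether the NIC problem occurs, then mark the result with git bisect good or git bisect bad. Repeat until git reports the first bad commit, and finish with git bisect reset.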
Comment 44 Larry Chiang 2023-08-29 03:06:05 UTC
Dear all,

  I found that the cause might be the Realtek driver entering power saving mode.

  My Ubuntu version is 22.04 with kernel 6.2.

  My solution is to download the driver from the Realtek website:

https://www.realtek.com/zh-tw/component/zoo/category/pci-8169-8110

https://www.realtek.com/zh-tw/directly-download?downloadid=28907f3d6ddbf32a2041c1ceadcace96

  Please follow the "readme". A gentle reminder: it is only suitable for kernel 5. If you are using kernel 6, please change netif_napi_add to netif_napi_add_weight; for details, please refer to https://lore.kernel.org/netdev/20221002175650.1491124-1-kuba@kernel.org/t/
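
The rename described above could be applied to a driver source file roughly like this. This is only a sketch: the helper name rename_napi_add is made up, the path of the file to patch depends on the downloaded driver package, and the result should be reviewed before building.

```shell
# Hypothetical helper: replace calls to netif_napi_add( with
# netif_napi_add_weight( in the given source file.
# Existing netif_napi_add_weight( calls are left alone, so running it
# twice does no harm.
rename_napi_add() {
    src=$1
    tmp=$(mktemp) || return 1
    sed 's/netif_napi_add(/netif_napi_add_weight(/g' "$src" > "$tmp" || return 1
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$src" && rm -f "$tmp"
}
```

It takes the path of the source file to patch as its only argument.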

  Thank you

Best Regards,
Larry from Arcadyan