Bug 198567 - r8169: NIC unusable for about 2 mins after reboot
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-23 20:44 UTC by Armin K.
Modified: 2020-04-23 13:19 UTC
CC List: 3 users

See Also:
Kernel Version: 4.14.14
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel trace (5.02 KB, text/plain)
2018-09-28 09:37 UTC, Mario van der Linde
system info (8.74 KB, text/plain)
2018-09-28 09:38 UTC, Mario van der Linde

Description Armin K. 2018-01-23 20:44:25 UTC
After rebooting from Windows or Linux into Linux, the network card is unusable for about 2-3 minutes and the DHCP client cannot acquire an address. dmesg shows the following:

dhclient[1010]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
dhclient[1010]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x215/0x220
kernel: Modules linked in:
kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.13-krejzi #1
kernel: Hardware name: HP HP ProBook 470 G3/8102, BIOS N78 Ver. 01.22 12/06/2017
kernel: task: ffff880237193900 task.stack: ffffc900000bc000
kernel: RIP: 0010:dev_watchdog+0x215/0x220
kernel: RSP: 0018:ffff880240503e70 EFLAGS: 00010282
kernel: RAX: 0000000000000039 RBX: 0000000000000000 RCX: 0000000000000103
kernel: RDX: 0000000000000000 RSI: ffffffff8254b8ec RDI: 00000000ffffffff
kernel: RBP: ffff880240503ea0 R08: 0000000000000399 R09: 0000000000000003
kernel: R10: ffff880240503ee8 R11: 0000000000000001 R12: ffff880235c1d080
kernel: R13: 0000000000000002 R14: ffff8802369c6000 R15: 0000000000000001
kernel: FS:  0000000000000000(0000) GS:ffff880240500000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f32e26f4000 CR3: 000000000260a004 CR4: 00000000003606e0
kernel: Call Trace:
kernel:  <IRQ>
kernel:  ? qdisc_rcu_free+0x40/0x40
kernel:  call_timer_fn.isra.27+0x17/0x80
kernel:  ? qdisc_rcu_free+0x40/0x40
kernel:  run_timer_softirq+0x360/0x3b0
kernel:  ? ktime_get+0x3b/0xa0
kernel:  ? clockevents_program_event+0xc8/0x100
kernel:  __do_softirq+0xd5/0x1e2
kernel:  irq_exit+0xa0/0xb0
kernel:  smp_apic_timer_interrupt+0x5d/0x90
kernel:  apic_timer_interrupt+0x9c/0xb0
kernel:  </IRQ>
kernel: RIP: 0010:cpuidle_enter_state+0x13a/0x210
kernel: RSP: 0018:ffffc900000bfe80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
kernel: RAX: ffff880240500000 RBX: ffff880240528600 RCX: 000000000000001f
kernel: RDX: 20c49ba5e353f7cf RSI: ffffffff824a1a5b RDI: ffffffff8248d85e
kernel: RBP: ffffc900000bfea8 R08: 53f6f738e00e7448 R09: 0000000000000010
kernel: R10: ffffc900000bfe58 R11: 0000000000000386 R12: 0000000000000004
kernel: R13: 0000001e458032c7 R14: 0000001e457163c5 R15: ffffffff8264ece0
kernel:  cpuidle_enter+0x12/0x20
kernel:  call_cpuidle+0x1e/0x30
kernel:  do_idle+0x17a/0x1b0
kernel:  cpu_startup_entry+0x6e/0x70
kernel:  start_secondary+0x176/0x1a0
kernel:  secondary_startup_64+0xa5/0xb0
kernel: Code: 63 8e 60 04 00 00 eb 95 4c 89 f7 c6 05 c6 1d db 00 01 e8 1f cb fd ff 89 d9 4c 89 f6 48 c7 c7 80 dd 55 82 48 89 c2 e8 07 8
kernel: ---[ end trace 22b76ffbadd53151 ]---
kernel: r8169 0000:02:00.0 eth0: link up

After the messages above appear in dmesg (about 2-3 minutes after boot has completed), the connection is established and no further issues occur. The issue does not happen on a cold boot (power off, then turn the laptop on), only when I reboot from either Windows or Linux.
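
For anyone comparing logs, here is a minimal Python sketch that picks the transmit-timeout warning out of a saved dmesg or journal dump. The pattern is modelled only on the message text quoted in this report; the function name `find_tx_timeouts` is a hypothetical helper, not part of any existing tool.

```python
import re

# Modelled on the warning quoted in this report, e.g.
# "NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text):
    """Return (interface, driver, queue) for each watchdog timeout in log_text."""
    return [
        (m.group("iface"), m.group("driver"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(log_text)
    ]
```

Feeding it the text of `dmesg` (for example read from a file) gives one tuple per timeout, which makes it easy to see whether only one interface is affected.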

The kernel is self-built, with r8169 and the required firmware built into the kernel.

02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
	Subsystem: Hewlett-Packard Company RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [103c:8102]
	Flags: bus master, fast devsel, latency 0, IRQ 125
	I/O ports at 4000 [size=256]
	Memory at e2104000 (64-bit, non-prefetchable) [size=4K]
	Memory at e2100000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
	Kernel driver in use: r8169

Let me know if you need more info.
Comment 1 Mario van der Linde 2018-09-28 09:36:55 UTC
I seem to be hit by the same problem (see the attached kernel trace), but the symptoms are slightly different:

The systems (5 HP systems) boot without noticeable problems. But under higher load (they are used as desktop workstations for office work, with user profiles located on an NFS share) they freeze completely because the network driver stalls.

A short summary of the affected systems is attached.

This bug is really annoying. If we can help in any way to fix it, please don't hesitate to ask. We are not unfamiliar with open source, so we may be able to test proposed patches or dig deeper (if you can point us in the right direction).
Comment 2 Mario van der Linde 2018-09-28 09:37:43 UTC
Created attachment 278811
kernel trace
Comment 3 Mario van der Linde 2018-09-28 09:38:14 UTC
Created attachment 278813
system info
Comment 4 Heiner Kallweit 2018-10-28 23:05:53 UTC
Can both of you re-test with 4.19? The exact chip version (the dmesg line with the XID) would also be relevant.
Comment 5 Mario van der Linde 2018-10-30 18:11:00 UTC
r8169 0000:03:00.0 eth0: RTL8168g/8111g, xx:xx:xx:xx:xx:xx, XID 50900800, IRQ 36

On 4.19.0 I could not reproduce the error even though I stressed the link for several hours.
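
For reference, a small Python sketch that extracts the chip name and XID from a probe line like the one quoted above. The regular expression is based solely on that one line, so the exact format on other kernel versions is an assumption; `parse_probe_line` is a hypothetical helper name.

```python
import re

# Based on the single probe line quoted in this comment:
# "r8169 0000:03:00.0 eth0: RTL8168g/8111g, <MAC>, XID 50900800, IRQ 36"
PROBE_RE = re.compile(
    r"r8169 (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<iface>\S+): (?P<chip>[^,]+), .*?XID (?P<xid>[0-9a-fA-F]+)"
)


def parse_probe_line(line):
    """Return a dict with pci, iface, chip and xid, or None if the line does not match."""
    m = PROBE_RE.search(line)
    return m.groupdict() if m else None
```

Lines that are not probe lines (for example "link up" messages) simply yield `None`, so the function can be applied to every line of a dmesg dump.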
Comment 6 Heiner Kallweit 2018-10-30 21:02:54 UTC
(In reply to Mario van der Linde from comment #5)
> r8169 0000:03:00.0 eth0: RTL8168g/8111g, xx:xx:xx:xx:xx:xx, XID 50900800,
> IRQ 36
> 
> On 4.19.0 I could not reproduce the error even though I stressed the link
> for several hours.

Thanks for testing and for the chip version details. Good to hear that 4.19.0 is fine; maybe one of the recent fixes will also resolve the issue on 4.14 once it has been backported.
In your case the issue was on 4.14.13 according to the log. Did you use any previous kernel version without the issue (in other words: is it a regression in 4.14.13)?
Comment 7 Mario van der Linde 2018-10-30 22:29:25 UTC
I think you mean Armin K., the initial bug opener? My trace was thrown on 4.18.

The hardware used in our office was running OpenSUSE Leap 42.3 (kernel 4.4) and had been updated to Leap 15 (kernel 4.12). Both versions caused many problems, but at the time I could not trace them back to the network driver. I suspected the NFS server, network components like switches, etc., but the symptoms were strikingly similar (complete freezing of the desktop for several seconds to minutes, systems not pingable). Unfortunately, I was never there myself when these problems occurred; I was only informed by mail or phone and had to solve the problems remotely.
Due to several other inconveniences I decided to test Archlinux (kernel 4.18, now 4.19 (testing)). Only then did I notice the corresponding kernel traces - shame on me ...

To make a long story short:
 * linux 4.4 (OpenSUSE Leap 42.3):   probably affected
 * linux 4.12 (OpenSUSE Leap 15.0):  probably affected
 * linux 4.18 (Archlinux):           affected
 * linux 4.19 (Archlinux (testing)): not affected
Comment 8 Heiner Kallweit 2018-10-30 22:47:38 UTC
Thanks for the update. Then 4.19 fixes the issue for both of you; for the older kernel versions we will have to see once the recent fixes have been backported. 4.12, however, isn't an LTS version.
IMO this bug can be closed.
Comment 9 Jorge Ferreira 2020-04-09 16:36:34 UTC
This bug reappeared in the 5.5.x kernels, at least in the ones I tried, starting with 5.5.3 (I use Fedora 31, whose 5.5 kernels started at that version).

If I boot the older kernel that is still in /boot (kernel-5.4.3), the problem does not appear.

The trace below happened on the 5.5.15 release, after the system had been running for some time. This is a school server, and this interface is connected to a fiber router.

[ 4504.278857] ------------[ cut here ]------------
[ 4504.278875] NETDEV WATCHDOG: enp11s0 (r8169): transmit queue 0 timed out
[ 4504.278902] WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x248/0x250
[ 4504.278906] Modules linked in: tcp_diag udp_diag raw_diag inet_diag cmac md4 nls_utf8 cifs dns_resolver fscache libdes ppp_mppe ppp_generic slhc libarc4 nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_conntrack_netbios_ns nf_conntrack_broadcast lp parport nf_nat_ftp nf_conntrack_ftp usb_storage sky2 e1000e xt_CHECKSUM ebtable_filter ebtables ip6_tables bridge r8169 dummy bnep xt_MASQUERADE xt_nat iptable_nat nf_nat xt_connmark xt_mac xt_hashlimit xt_DSCP xt_length xt_dscp iptable_mangle nf_log_ipv4 nf_log_common xt_mark xt_recent xt_time xt_statistic xt_ipp2p(OE) compat_xtables(OE) xt_multiport 8021q garp mrp xt_geoip(OE) stp llc xt_conntrack nf_conntrack tun nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_LOG iptable_filter nct6775 hwmon_vid xfs libcrc32c intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btusb btrtl irqbypass btbcm btintel iTCO_wdt bluetooth iTCO_vendor_support crct10dif_pclmul crc32_pclmul eeepc_wmi raid0 asus_wmi
[ 4504.278957]  ghash_clmulni_intel ecdh_generic sparse_keymap raid1 pl2303 intel_cstate ecc rfkill intel_uncore intel_rapl_perf i2c_i801 wmi_bmof lpc_ich mei_me mei nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables i915 nouveau crc32c_intel serio_raw mxm_wmi ttm firewire_ohci i2c_algo_bit firewire_core drm_kms_helper crc_itu_t drm wmi video [last unloaded: e1000e]
[ 4504.278979] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G           OE     5.5.15-200.fc31.x86_64 #1
[ 4504.278980] Hardware name: System manufacturer System Product Name/P8Z68 DELUXE, BIOS 3304 04/17/2012
[ 4504.278983] RIP: 0010:dev_watchdog+0x248/0x250
[ 4504.278986] Code: 85 c0 75 e5 eb 9f 4c 89 ef c6 05 dd ce f0 00 01 e8 3d cf fa ff 44 89 e1 4c 89 ee 48 c7 c7 f8 2f 44 9f 48 89 c2 e8 7a 59 7f ff <0f> 0b eb 80 0f 1f 40 00 66 66 66 66 90 41 57 41 56 49 89 d6 41 55
[ 4504.278988] RSP: 0018:ffffb02f001cce60 EFLAGS: 00010286
[ 4504.278990] RAX: 0000000000000000 RBX: ffff8de84f0e6c00 RCX: 000000000000083f
[ 4504.278991] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
[ 4504.278992] RBP: ffff8de850dd645c R08: 00000000000007cd R09: 0000000000000000
[ 4504.278993] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 4504.278994] R13: ffff8de850dd6000 R14: ffff8de850dd6480 R15: 0000000000000001
[ 4504.278997] FS:  0000000000000000(0000) GS:ffff8de857380000(0000) knlGS:0000000000000000
[ 4504.278998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4504.278999] CR2: 00007f4eb1dd3cb0 CR3: 000000012d60a005 CR4: 00000000000606e0
[ 4504.279001] Call Trace:
[ 4504.279004]  <IRQ>
[ 4504.279011]  ? pfifo_fast_enqueue+0x150/0x150
[ 4504.279016]  call_timer_fn+0x2d/0x130
[ 4504.279019]  __run_timers.part.0+0x16f/0x260
[ 4504.279022]  ? tick_sched_handle+0x22/0x60
[ 4504.279025]  ? tick_sched_timer+0x38/0x80
[ 4504.279027]  ? tick_sched_do_timer+0x70/0x70
[ 4504.279029]  run_timer_softirq+0x26/0x50
[ 4504.279034]  __do_softirq+0xee/0x2ff
[ 4504.279037]  irq_exit+0xe9/0xf0
[ 4504.279040]  smp_apic_timer_interrupt+0x76/0x130
[ 4504.279043]  apic_timer_interrupt+0xf/0x20
[ 4504.279046]  </IRQ>
[ 4504.279051] RIP: 0010:cpuidle_enter_state+0xc6/0x3e0
[ 4504.279054] Code: 90 31 ff e8 3c e1 8e ff 80 7c 24 0f 00 74 17 9c 58 66 66 90 66 90 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 6e 35 95 ff fb 66 66 90 <66> 66 90 45 85 ed 0f 88 40 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d
[ 4504.279055] RSP: 0018:ffffb02f000b3e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 4504.279057] RAX: ffff8de8573aae00 RBX: ffff8de8573b5f00 RCX: 000000000000001f
[ 4504.279058] RDX: 0000000000000000 RSI: 0000000025863596 RDI: 0000000000000000
[ 4504.279060] RBP: ffffffff9f74ef00 R08: 00000418bbf0cbe5 R09: 00000000000026d6
[ 4504.279061] R10: 0000000000000c8d R11: ffff8de8573a9be4 R12: 00000418bbf0cbe5
[ 4504.279062] R13: 0000000000000004 R14: 0000000000000004 R15: ffff8de85374a680
[ 4504.279068]  ? cpuidle_enter_state+0xa4/0x3e0
[ 4504.279071]  cpuidle_enter+0x29/0x40
[ 4504.279076]  do_idle+0x1e4/0x280
[ 4504.279078]  cpu_startup_entry+0x19/0x20
[ 4504.279082]  start_secondary+0x162/0x1b0
[ 4504.279086]  secondary_startup_64+0xb6/0xc0
[ 4504.279091] ---[ end trace 86e552420bb9831d ]---
Comment 10 Jorge Ferreira 2020-04-09 16:41:35 UTC
More info about the machine:

[root@mastergate ~]# lspci 
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:02.0 Display controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b5)
00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Z68 Express Chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
01:00.0 VGA compatible controller: NVIDIA Corporation G72 [GeForce 7200 GS / 7300 SE] (rev a1)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8062 PCI-E IPMI Gigabit Ethernet Controller (rev 14)
04:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
05:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
06:00.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
07:01.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
07:04.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
07:05.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
07:06.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
07:07.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
07:08.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
07:09.0 PCI bridge: PLX Technology, Inc. PEX 8608 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
08:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6315 Series Firewire Controller (rev 01)
09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 01)
0a:00.0 SATA controller: JMicron Technology Corp. JMB362 SATA Controller (rev 10)
0b:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)
0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 01)
0d:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 01)
0f:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

I noticed that this error happens on only one of the RTL interfaces. I will try to check whether the affected interface is the "rev 06" one.
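
One way to check which controller sits behind an interface name is to follow the `device` symlink under `/sys/class/net`. The Python sketch below assumes a PCI-attached NIC; the function name `pci_address_of` and its `sysfs_root` parameter (there only so the function can be pointed at a test directory) are hypothetical.

```python
import os


def pci_address_of(iface, sysfs_root="/sys"):
    """Return the bus address (e.g. '0000:0f:00.0') of the device behind iface.

    Relies on <sysfs_root>/class/net/<iface>/device being a symlink to the
    device directory. Returns None when there is no such link, as is the
    case for virtual interfaces.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device")
    if not os.path.islink(link):
        return None
    # The last path component of the link target is the bus address.
    return os.path.basename(os.path.realpath(link))
```

The returned address can then be compared with the lspci listing above, where the "rev 06" controller is at 0f:00.0.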
Comment 11 Heiner Kallweit 2020-04-13 16:08:42 UTC
Please re-test with 5.5.17.
Comment 12 Jorge Ferreira 2020-04-23 13:19:10 UTC
Sorry for the delay, but I could not get on site until this morning because of the confinement restrictions here in Portugal.

Apparently the 5.5.17 kernel fixes the problem. I will report here if the crash happens again.

Thank you Heiner.
