Bug 201109 - r8169 - crash/lockup since 4.18.5
Summary: r8169 - crash/lockup since 4.18.5
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-12 22:16 UTC by Tony
Modified: 2018-10-04 06:51 UTC
CC List: 1 user

See Also:
Kernel Version: >=4.18.5
Subsystem:
Regression: No
Bisected commit-id:



Description Tony 2018-09-12 22:16:08 UTC
Hi,
This is my first time submitting a bug to the Linux kernel, so I hope I'm doing this right...

Since Linux 4.18.5, I have been getting periodic crashes in my network driver (r8169).
The crashes seem random, but they happen most often under heavy network load.

This started shortly after I upgraded from 4.18.4 to 4.18.5 on 26th August 2018.

Below are two dmesg logs of the resulting NETDEV WATCHDOG transmit queue timeout: the earliest occurrence, on Linux 4.18.5, and the latest, on Linux 4.18.7.

Linux 4.18.5
```
Aug 26 10:27:07 tony-gentoo kernel: ------------[ cut here ]------------
Aug 26 10:27:07 tony-gentoo kernel: NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
Aug 26 10:27:07 tony-gentoo kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x1ee/0x200
Aug 26 10:27:07 tony-gentoo kernel: Modules linked in: tun iptable_nat nf_nat_ipv4 nf_nat bridge stp llc amdkfd amdgpu mfd_core chash gpu_sched ttm efivarfs
Aug 26 10:27:07 tony-gentoo kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.18.5-gentoo #1
Aug 26 10:27:07 tony-gentoo kernel: Hardware name: NOVATECH LTD PC-1944/970A-DS3P, BIOS FDa 06/10/2015
Aug 26 10:27:07 tony-gentoo kernel: RIP: 0010:dev_watchdog+0x1ee/0x200
Aug 26 10:27:07 tony-gentoo kernel: Code: 00 48 63 4d e8 eb 93 4c 89 e7 c6 05 2d ad cf 00 01 e8 e6 38 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 a0 19 d5 8a e8 92 5c 82 ff <0f> 0b eb c0 66 66 >
Aug 26 10:27:07 tony-gentoo kernel: RSP: 0018:ffff8e7b3ecc3e98 EFLAGS: 00010282
Aug 26 10:27:07 tony-gentoo kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Aug 26 10:27:07 tony-gentoo kernel: RDX: 0000000000040400 RSI: 00000000000000f6 RDI: 0000000000000300
Aug 26 10:27:07 tony-gentoo kernel: RBP: ffff8e7b2c042438 R08: 0000000000000392 R09: 0000000000000007
Aug 26 10:27:07 tony-gentoo kernel: R10: ffff8e7b3ecc3ef8 R11: ffffffff8b4d7a2d R12: ffff8e7b2c042000
Aug 26 10:27:07 tony-gentoo kernel: R13: 0000000000000003 R14: ffff8e7b3ecc3ee8 R15: 0000000000000000
Aug 26 10:27:07 tony-gentoo kernel: FS:  0000000000000000(0000) GS:ffff8e7b3ecc0000(0000) knlGS:0000000000000000
Aug 26 10:27:07 tony-gentoo kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 26 10:27:07 tony-gentoo kernel: CR2: 00007f1e8f579010 CR3: 0000000422cf8000 CR4: 00000000000406e0
Aug 26 10:27:07 tony-gentoo kernel: Call Trace:
Aug 26 10:27:07 tony-gentoo kernel:  <IRQ>
Aug 26 10:27:07 tony-gentoo kernel:  ? qdisc_reset+0xe0/0xe0
Aug 26 10:27:07 tony-gentoo kernel:  call_timer_fn+0x26/0x120
Aug 26 10:27:07 tony-gentoo kernel:  run_timer_softirq+0x3b5/0x400
Aug 26 10:27:07 tony-gentoo kernel:  ? tick_sched_timer+0x32/0x70
Aug 26 10:27:07 tony-gentoo kernel:  ? __hrtimer_run_queues+0x113/0x280
Aug 26 10:27:07 tony-gentoo kernel:  __do_softirq+0xd8/0x2ac
Aug 26 10:27:07 tony-gentoo kernel:  irq_exit+0xa9/0xb0
Aug 26 10:27:07 tony-gentoo kernel:  smp_apic_timer_interrupt+0x67/0x120
Aug 26 10:27:07 tony-gentoo kernel:  apic_timer_interrupt+0xf/0x20
Aug 26 10:27:07 tony-gentoo kernel:  </IRQ>
Aug 26 10:27:07 tony-gentoo kernel: RIP: 0010:cpuidle_enter_state+0x98/0x2a0
Aug 26 10:27:07 tony-gentoo kernel: Code: 92 ff 65 8b 3d 99 cb ea 75 e8 04 7e 92 ff 48 89 c3 0f 1f 44 00 00 31 ff e8 35 87 92 ff 45 84 f6 0f 85 bd 01 00 00 fb 4c 29 fb <48> ba cf f7 53 e3 >
Aug 26 10:27:07 tony-gentoo kernel: RSP: 0018:ffff919c01917e98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Aug 26 10:27:07 tony-gentoo kernel: RAX: ffff8e7b3ece0b40 RBX: 000000000079cf23 RCX: 000000000000001f
Aug 26 10:27:07 tony-gentoo kernel: RDX: 00003e93374c9f97 RSI: 000000001fda2e50 RDI: 0000000000000000
Aug 26 10:27:07 tony-gentoo kernel: RBP: 0000000000000002 R08: ffff8e7b3ecdbb00 R09: 00000000ffffffff
Aug 26 10:27:07 tony-gentoo kernel: R10: ffff919c01917e80 R11: 000000000000077d R12: ffff8e7b2be8f000
Aug 26 10:27:07 tony-gentoo kernel: R13: ffffffff8aeb10d8 R14: 0000000000000000 R15: 00003e9336d2d074
Aug 26 10:27:07 tony-gentoo kernel:  ? cpuidle_enter_state+0x8b/0x2a0
Aug 26 10:27:07 tony-gentoo kernel:  do_idle+0x1d8/0x230
Aug 26 10:27:07 tony-gentoo kernel:  cpu_startup_entry+0x6a/0x70
Aug 26 10:27:07 tony-gentoo kernel:  start_secondary+0x183/0x1b0
Aug 26 10:27:07 tony-gentoo kernel:  secondary_startup_64+0xa5/0xb0
Aug 26 10:27:07 tony-gentoo kernel: ---[ end trace ef1c15184b2e86f4 ]---
Aug 26 10:27:07 tony-gentoo kernel: r8169 0000:03:00.0 enp3s0: link up
```

Linux 4.18.7
```
Sep 12 22:07:05 tony-gentoo kernel: ------------[ cut here ]------------
Sep 12 22:07:06 tony-gentoo kernel: NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
Sep 12 22:07:06 tony-gentoo kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x1ee/0x200
Sep 12 22:07:06 tony-gentoo kernel: Modules linked in: tun iptable_nat nf_nat_ipv4 nf_nat bridge stp llc amdkfd amdgpu mfd_core chash gpu_sched ttm efivarfs
Sep 12 22:07:06 tony-gentoo kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.18.7-gentoo #1
Sep 12 22:07:06 tony-gentoo kernel: Hardware name: NOVATECH LTD PC-1944/970A-DS3P, BIOS FDa 06/10/2015
Sep 12 22:07:06 tony-gentoo kernel: RIP: 0010:dev_watchdog+0x1ee/0x200
Sep 12 22:07:06 tony-gentoo kernel: Code: 00 48 63 4d e8 eb 93 4c 89 e7 c6 05 1e 0d d0 00 01 e8 c6 2f fd ff 89 d9 4c 89 e6 48 c7 c7 f0 98 55 be 48 89 c2 e8 27 bf 82 ff <0f> 0b eb c0 66 66 >
Sep 12 22:07:06 tony-gentoo kernel: RSP: 0018:ffffa1c9becc3ea0 EFLAGS: 00010286
Sep 12 22:07:06 tony-gentoo kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Sep 12 22:07:06 tony-gentoo kernel: RDX: 0000000000040400 RSI: 00000000000000f6 RDI: 0000000000000300
Sep 12 22:07:06 tony-gentoo kernel: RBP: ffffa1c9ac076438 R08: 0000000000000005 R09: 0000000000000007
Sep 12 22:07:06 tony-gentoo kernel: R10: 0000000000000000 R11: ffffffffbecd9a2d R12: ffffa1c9ac076000
Sep 12 22:07:06 tony-gentoo kernel: R13: 0000000000000003 R14: ffffa1c9becc3ef0 R15: 0000000000000000
Sep 12 22:07:06 tony-gentoo kernel: FS:  0000000000000000(0000) GS:ffffa1c9becc0000(0000) knlGS:0000000000000000
Sep 12 22:07:06 tony-gentoo kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 12 22:07:06 tony-gentoo kernel: CR2: 00007f681c6d31e8 CR3: 00000003fa5c0000 CR4: 00000000000406e0
Sep 12 22:07:06 tony-gentoo kernel: Call Trace:
Sep 12 22:07:06 tony-gentoo kernel:  <IRQ>
Sep 12 22:07:06 tony-gentoo kernel:  ? pfifo_fast_dequeue+0x160/0x160
Sep 12 22:07:06 tony-gentoo kernel:  call_timer_fn+0x26/0x120
Sep 12 22:07:06 tony-gentoo kernel:  run_timer_softirq+0x38c/0x3c0
Sep 12 22:07:06 tony-gentoo kernel:  ? tick_sched_timer+0x32/0x70
Sep 12 22:07:06 tony-gentoo kernel:  ? __hrtimer_run_queues+0x10b/0x280
Sep 12 22:07:06 tony-gentoo kernel:  ? ktime_get+0x31/0x90
Sep 12 22:07:06 tony-gentoo kernel:  __do_softirq+0xd4/0x2b5
Sep 12 22:07:06 tony-gentoo kernel:  irq_exit+0xa6/0xb0
Sep 12 22:07:06 tony-gentoo kernel:  smp_apic_timer_interrupt+0x67/0x120
Sep 12 22:07:06 tony-gentoo kernel:  apic_timer_interrupt+0xf/0x20
Sep 12 22:07:06 tony-gentoo kernel:  </IRQ>
Sep 12 22:07:06 tony-gentoo kernel: RIP: 0010:cpuidle_enter_state+0xb1/0x290
Sep 12 22:07:06 tony-gentoo kernel: Code: ff e8 b3 ed 92 ff 80 7c 24 03 00 74 12 9c 58 f6 c4 02 0f 85 d1 01 00 00 31 ff e8 5a e6 97 ff fb 48 b8 ff ff ff ff f3 01 00 00 <4c> 29 f3 ba ff ff >
Sep 12 22:07:06 tony-gentoo kernel: RSP: 0018:ffffb52941917e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Sep 12 22:07:06 tony-gentoo kernel: RAX: 000001f3ffffffff RBX: 00000142ee261469 RCX: 000000000000001f
Sep 12 22:07:06 tony-gentoo kernel: RDX: 00000142ee261469 RSI: 0000000000000000 RDI: 0000000000000000
Sep 12 22:07:06 tony-gentoo kernel: RBP: 0000000000000002 R08: 00000523bdfe56cf R09: 00000142ee26f5c0
Sep 12 22:07:06 tony-gentoo kernel: R10: 00000000000003a8 R11: ffffa1c9becdfc28 R12: ffffa1c9abed0c00
Sep 12 22:07:06 tony-gentoo kernel: R13: ffffffffbe6b10d8 R14: 00000142ed68f2cd R15: 0000000000000000
Sep 12 22:07:06 tony-gentoo kernel:  ? cpuidle_enter_state+0x8d/0x290
Sep 12 22:07:06 tony-gentoo kernel:  do_idle+0x1e0/0x210
Sep 12 22:07:06 tony-gentoo kernel:  cpu_startup_entry+0x6a/0x70
Sep 12 22:07:06 tony-gentoo kernel:  start_secondary+0x183/0x1b0
Sep 12 22:07:06 tony-gentoo kernel:  secondary_startup_64+0xa5/0xb0
Sep 12 22:07:06 tony-gentoo kernel: ---[ end trace 62da3b5632f8d82c ]---
Sep 12 22:07:06 tony-gentoo kernel: r8169 0000:03:00.0 enp3s0: link up
```
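Both traces begin with the same `NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out` warning from `dev_watchdog` and end with `r8169 0000:03:00.0 enp3s0: link up`. Below is a minimal Python sketch for pulling those timeout events out of a saved log, for example to check whether another kernel version is also affected. It assumes classic syslog-formatted lines like the ones above. The function name `find_tx_timeouts` and the regular expression are illustrative only, not part of any kernel tooling.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches the watchdog warning seen in the logs above, e.g.
# "NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


class TxTimeout(NamedTuple):
    timestamp: str  # syslog prefix, e.g. "Sep 12 22:07:06"
    iface: str
    driver: str
    queue: int


def find_tx_timeouts(lines: Iterable[str]) -> List[TxTimeout]:
    """Return one TxTimeout per watchdog warning found in syslog-style lines."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            events.append(
                TxTimeout(
                    # Assumes the fixed-width "Mon DD HH:MM:SS" syslog prefix (15 chars).
                    timestamp=line[:15],
                    iface=match["iface"],
                    driver=match["driver"],
                    queue=int(match["queue"]),
                )
            )
    return events


if __name__ == "__main__":
    sample = [
        "Sep 12 22:07:06 tony-gentoo kernel: NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out",
        "Sep 12 22:07:06 tony-gentoo kernel: r8169 0000:03:00.0 enp3s0: link up",
    ]
    for event in find_tx_timeouts(sample):
        print(event)
```

Only the watchdog line is matched. The `link up` line that follows is ignored, so each reported event corresponds to one timeout.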

Hardware details

lspci -k
```
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
        Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
        Kernel driver in use: r8169
```

lshw
```
*-network
     description: Ethernet interface
     product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
     vendor: Realtek Semiconductor Co., Ltd.
     physical id: 0
     bus info: pci@0000:03:00.0
     logical name: enp3s0
     version: 0c
     serial: 40:8d:5c:80:5e:15
     size: 1Gbit/s
     capacity: 1Gbit/s
     width: 64 bits
     clock: 33MHz
     capabilities: pm msi pciexpress msix vpd bus_master cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
     configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
     resources: irq:27 ioport:d000(size=256) memory:fe800000-fe800fff memory:d0900000-d0903fff
```

Note: r8169 is compiled into my kernel (i.e. built in, not a loadable module).

Currently running on Linux 4.18.4, which does not seem to be affected by this issue.

If there's any more information I can provide, please let me know.
Thanks,
Comment 1 Adam Jones 2018-09-29 10:34:21 UTC
I've been seeing the same issue on my Fedora 28 system since upgrading from 4.17.14 to 4.18.8 and later. It only occurs under reasonably high network load.

I get a "transmit queue timed out" error; the link then seems to bounce up and down but cannot send or receive packets. Using ethtool to force the interface down to 100 Mbit/s doesn't seem to revive it.

lspci -v:
```
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
        Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
        Flags: bus master, fast devsel, latency 0, IRQ 127
        I/O ports at e000 [size=256]
        Memory at 81500000 (64-bit, non-prefetchable) [size=4K]
        Memory at a0100000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170] Latency Tolerance Reporting

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
        Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
        Flags: bus master, fast devsel, latency 0, IRQ 128
        I/O ports at d000 [size=256]
        Memory at 81400000 (64-bit, non-prefetchable) [size=4K]
        Memory at a0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170] Latency Tolerance Reporting
```

lspci -n:
```
01:00.0 0200: 10ec:8168 (rev 0c)
        Subsystem: 1458:e000

02:00.0 0200: 10ec:8168 (rev 0c)
        Subsystem: 1458:e000
```
Comment 2 Adam Jones 2018-09-29 10:55:05 UTC
dmesg gives the hardware IDs as:
```
r8169 0000:01:00.0 eth0: RTL8168g/8111g, 40:8d:5c:d0:75:7b, XID 4c000800, IRQ 127
r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]

r8169 0000:02:00.0 eth1: RTL8168g/8111g, 40:8d:5c:d0:75:7a, XID 4c000800, IRQ 128
r8169 0000:02:00.0 eth1: jumbo features [frames: 9200 bytes, tx checksumming: ko]
```

Disabling runtime power management with:
```
echo 'on' > '/sys/bus/pci/devices/0000:01:00.0/power/control'
```
does not seem to prevent the issue.
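The sysfs write above changes runtime power management for a single PCI function. Writing `on` keeps the device powered; `auto` lets the kernel suspend it when idle. Below is a minimal Python sketch of the same toggle, assuming it runs as root. The helper name `set_runtime_pm` is hypothetical. So is its `sysfs_root` parameter, which exists only so the function can be pointed at a scratch directory instead of the real `/sys`.

```python
from pathlib import Path

# Values accepted by the runtime PM power/control attribute.
VALID_MODES = ("on", "auto")


def set_runtime_pm(pci_addr: str, mode: str, sysfs_root: str = "/sys") -> str:
    """Write `mode` to the device's power/control file and return the previous value.

    Mirrors: echo 'on' > /sys/bus/pci/devices/<pci_addr>/power/control
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    control = Path(sysfs_root, "bus", "pci", "devices", pci_addr, "power", "control")
    previous = control.read_text().strip()  # current setting, e.g. "auto"
    control.write_text(mode)
    return previous
```

For example, running `set_runtime_pm("0000:01:00.0", "on")` as root corresponds to the command in this comment. The same call with `"0000:02:00.0"` would cover the second port (eth1).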
Comment 3 Tony 2018-09-29 11:55:06 UTC
This issue has been fixed by the following patch by Heiner Kallweit:
https://www.spinics.net/lists/netdev/msg526003.html

Hopefully it makes its way up the chain into mainline and then down into the stable kernels.
Comment 4 Tony 2018-10-04 06:51:51 UTC
The fix is now in mainline as commit ad5f97faff4231e72b96bd96adbe1b6e977a9b86.
