Bug 209423

Summary: WARN_ON_ONCE() at rtl8169_tso_csum_v2()
Product: Drivers
Reporter: Damian Wrobel (dwrobel)
Component: Network
Assignee: drivers_network (drivers_network)
Status: RESOLVED CODE_FIX
Severity: low
CC: hkallweit1
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.8.11
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kasan-5.8.12-204.fc32.x86_64-1.txt

Description Damian Wrobel 2020-09-29 09:05:49 UTC
Kernel: kernel-5.8.10-200.fc32.x86_64
Hardware: HP t630 Thin Client/8158, BIOS M40 v01.12
Ethernet card (lspci -v excerpt):
 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
	Subsystem: Hewlett-Packard Company Device 8158
	Flags: bus master, fast devsel, latency 0, IRQ 32
	I/O ports at e000 [size=256]
	Memory at fea04000 (64-bit, non-prefetchable) [size=4K]
	Memory at fea00000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
	Capabilities: [170] Latency Tolerance Reporting
	Capabilities: [178] L1 PM Substates
	Kernel driver in use: r8169
	Kernel modules: r8169

I observed the following warning:

[ 6783.350157] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x467/0x7f0 [r8169]
[ 6783.350290] Modules linked in: tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi edac_mce_amd snd_hda_intel kvm_amd snd_intel_dspcfg ccp snd_hda_codec kvm snd_hda_core ax88179_178a irqbypass usbnet snd_hwdep snd_pcm mii hp_wmi sparse_keymap wmi_bmof snd_timer snd sp5100_tco k10temp fam15h_power i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw r8169
[ 6783.350360]  wmi video
[ 6783.350420] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.11-200.fc32.x86_64 #1
[ 6783.350432] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[ 6783.350454] RIP: 0010:rtl8169_start_xmit+0x467/0x7f0 [r8169]
[ 6783.350468] Code: 0f 84 3f 03 00 00 89 f2 2b 55 74 e9 8c fc ff ff b9 01 00 00 00 66 89 8a 90 00 00 00 e9 50 ff ff ff 83 e2 10 0f 85 12 01 00 00 <0f> 0b e9 22 fe ff ff e8 9d eb 7d df 85 c0 0f 85 5c af 00 00 48 8b
[ 6783.350494] RSP: 0018:ffffa91a40003c80 EFLAGS: 00010246
[ 6783.350506] RAX: 0000000000000000 RBX: ffff909e5c64c000 RCX: ffff909e520382e2
[ 6783.350517] RDX: 0000000000000000 RSI: ffff909e52038000 RDI: ffff909e5203fec0
[ 6783.350527] RBP: ffff909e54105300 R08: 0000000000000024 R09: ffff909e52038304
[ 6783.350538] R10: 0000000000000006 R11: ffff909e4f1f1e80 R12: 0000000000000060
[ 6783.350549] R13: 0000000000000258 R14: 0000000000000022 R15: 0000000000000000
[ 6783.350561] FS:  0000000000000000(0000) GS:ffff909e5ec00000(0000) knlGS:0000000000000000
[ 6783.350589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6783.350599] CR2: 00007fc9aa5a3000 CR3: 0000000111a6a000 CR4: 00000000001406f0
[ 6783.350610] Call Trace:
[ 6783.350622]  <IRQ>
[ 6783.350644]  dev_hard_start_xmit+0x8d/0x1d0
[ 6783.350660]  sch_direct_xmit+0xeb/0x2f0
[ 6783.350674]  __dev_queue_xmit+0x710/0x8a0
[ 6783.350685]  ? eth_header+0x26/0xc0
[ 6783.350700]  ip_finish_output2+0x18b/0x560
[ 6783.350714]  __netif_receive_skb_core+0x4f0/0xf40
[ 6783.350730]  ? packet_rcv+0x44/0x490
[ 6783.350742]  __netif_receive_skb_one_core+0x2d/0x70
[ 6783.350762]  process_backlog+0x96/0x160
[ 6783.350777]  net_rx_action+0x13c/0x3e0
[ 6783.350802]  ? usbnet_bh+0x24/0x2b0 [usbnet]
[ 6783.350821]  __do_softirq+0xd9/0x2c4
[ 6783.350835]  asm_call_on_stack+0x12/0x20
[ 6783.350846]  </IRQ>
[ 6783.350858]  do_softirq_own_stack+0x39/0x50
[ 6783.350871]  irq_exit_rcu+0xc2/0x100
[ 6783.350884]  common_interrupt+0x75/0x140
[ 6783.350896]  asm_common_interrupt+0x1e/0x40
[ 6783.350910] RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
[ 6783.350927] Code: 10 ac 6b 60 e8 db c4 7b ff 49 89 c7 0f 1f 44 00 00 31 ff e8 ac dd 7b ff 80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
[ 6783.350945] RSP: 0018:ffffffffa0a03e58 EFLAGS: 00000246
[ 6783.350957] RAX: ffff909e5ec2a2c0 RBX: ffff909e50701800 RCX: 000000000000001f
[ 6783.350968] RDX: 0000000000000000 RSI: 00000000401ec0c7 RDI: 0000000000000000
[ 6783.350979] RBP: ffffffffa0b78960 R08: 0000062b5f288767 R09: 0000000000000007
[ 6783.350989] R10: 000000000000000d R11: 0000000000000003 R12: 0000000000000002
[ 6783.350999] R13: ffff909e50701800 R14: 0000000000000002 R15: 0000062b5f288767
[ 6783.351018]  ? cpuidle_enter_state+0xa4/0x3f0
[ 6783.351031]  cpuidle_enter+0x29/0x40
[ 6783.351044]  do_idle+0x1d5/0x2a0
[ 6783.351056]  cpu_startup_entry+0x19/0x20
[ 6783.351069]  start_kernel+0x7f4/0x804
[ 6783.351086]  secondary_startup_64+0xb6/0xc0

I have only had this machine for a few days, so I cannot tell whether this is reproducible, or how easily.

For convenience, I'm attaching the relevant part of the code with line numbers (I also verified that Fedora's kernel-5.8.11-200.fc32.x86_64 doesn't patch this driver):

   4078 static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp,
   4079                                 struct sk_buff *skb, u32 *opts)
   4080 {
   4081         u32 transport_offset = (u32)skb_transport_offset(skb);
   4082         struct skb_shared_info *shinfo = skb_shinfo(skb);
   4083         u32 mss = shinfo->gso_size;
   4084 
   4085         if (mss) {
   4086                 if (shinfo->gso_type & SKB_GSO_TCPV4) {
   4087                         opts[0] |= TD1_GTSENV4;
   4088                 } else if (shinfo->gso_type & SKB_GSO_TCPV6) {
   4089                         if (skb_cow_head(skb, 0))
   4090                                 return false;
   4091 
   4092                         tcp_v6_gso_csum_prep(skb);
   4093                         opts[0] |= TD1_GTSENV6;
   4094                 } else {
   4095                         WARN_ON_ONCE(1);
   4096                 }
   4097 
   4098                 opts[0] |= transport_offset << GTTCPHO_SHIFT;
   4099                 opts[1] |= mss << TD1_MSS_SHIFT;
   4100         } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
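The branch at line 4095 can be modelled outside the kernel. The following is a userspace approximation, not driver code: it copies the SKB_GSO_TCPV4 and SKB_GSO_TCPV6 bit values as defined in include/linux/skbuff.h around v5.8 (bits 0 and 4), ignores the skb_cow_head() failure path, and the names enum tso_path and classify_gso() are hypothetical. It shows that the WARN is reached whenever gso_size (mss) is non-zero but gso_type carries neither TCP bit, including gso_type == 0.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from the SKB_GSO_* enum in include/linux/skbuff.h
 * (v5.8 era) so that this sketch builds in userspace. */
#define SKB_GSO_TCPV4 (1u << 0)
#define SKB_GSO_TCPV6 (1u << 4)

/* Hypothetical names for the outcomes of the `if (mss)` check. */
enum tso_path {
	TSO_NONE,       /* mss == 0: falls through to the checksum branch (not modelled) */
	TSO_V4,         /* driver sets TD1_GTSENV4 */
	TSO_V6,         /* driver sets TD1_GTSENV6 after tcp_v6_gso_csum_prep() */
	TSO_UNEXPECTED, /* driver reaches WARN_ON_ONCE(1) at line 4095 */
};

/* Mirrors the if / else-if / else chain of rtl8169_tso_csum_v2(),
 * ignoring the skb_cow_head() failure path. */
static enum tso_path classify_gso(uint32_t gso_size, uint32_t gso_type)
{
	if (!gso_size)
		return TSO_NONE;
	if (gso_type & SKB_GSO_TCPV4)
		return TSO_V4;
	if (gso_type & SKB_GSO_TCPV6)
		return TSO_V6;
	return TSO_UNEXPECTED;
}
```

For example, classify_gso(1448, 0) lands in TSO_UNEXPECTED, while the same size with SKB_GSO_TCPV4 set takes the normal TSO path.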
Comment 1 Heiner Kallweit 2020-09-29 16:47:40 UTC
That's interesting. GSO is used, but it's neither TSO nor TSO6.
Do you have any special network setup / network traffic? r8169 is used with basically every consumer mainboard, therefore I'd expect many more such reports if standard network traffic were affected.

With the following patch we get more info about the offending SKB should the warning pop up again.
Note: it's a WARN_ONCE, so you'll see only one warning until the next reboot.

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 6c7c004c2..551f5b8ed 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4097,7 +4097,8 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp,
 			tcp_v6_gso_csum_prep(skb);
 			opts[0] |= TD1_GTSENV6;
 		} else {
-			WARN_ON_ONCE(1);
+			WARN_ONCE(1, "gso_size = %u, gso_type = 0x%08x\n",
+				  mss, shinfo->gso_type);
 		}
 
 		opts[0] |= transport_offset << GTTCPHO_SHIFT;
-- 
2.28.0
Comment 2 Damian Wrobel 2020-09-29 17:41:12 UTC
(In reply to Heiner Kallweit from comment #1)
> That's interesting. GSO is used, but it's neither TSO nor TSO6.
> Do you have any special network setup / network traffic? r8169 is used with
> basically every consumer mainboard,

This is my first machine with a Realtek network card.

Basically, this machine works as an Internet router and has the following interfaces:

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 7c:d3:0a:2d:1b:3b brd ff:ff:ff:ff:ff:ff
    inet 192.168.160.160/24 brd 192.168.160.255 scope global noprefixroute enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::7ed3:aff:fe2d:1b3b/64 scope link 
       valid_lft forever preferred_lft forever
3: enp0s16u2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 80:1f:02:d6:2b:7f brd ff:ff:ff:ff:ff:ff
    inet 10.8.11.16/26 brd 10.8.11.63 scope global dynamic noprefixroute enp0s16u2
       valid_lft 57242sec preferred_lft 57242sec
    inet6 fe80::821f:2ff:fed6:2b7f/64 scope link 
       valid_lft forever preferred_lft forever
4: tun200: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 100
    link/none 
    inet 192.168.200.1/24 brd 192.168.200.255 scope global tun200
       valid_lft forever preferred_lft forever
    inet6 fe80::311f:6a70:3ebe:a9fe/64 scope link stable-privacy 
       valid_lft forever preferred_lft forever

2: This is the built-in Realtek card, connected to the LAN to which a dozen devices are connected (mostly laptops and mobile devices via a separate AP).

3: This is a USB-based "0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet" card connected to the WAN (my local Internet provider).

4: This is an OpenVPN-based interface to which I can connect from mobile devices.

I don't explicitly use any IPv6 connections; however, I let all of the devices set up a link-local IPv6 address.

> With the following patch we get more info about the offending SKB should the
> warning pop up again.
> Note: it's a WARN_ONCE, so you'll see only one warning until the next
> reboot.
> 

Would you please consider merging it upstream into the 5.8 or 5.9 series?
Otherwise I would have to manually patch and rebuild every single Fedora kernel. I could do that once or twice, but only if I knew that it's easily reproducible.
Comment 3 Heiner Kallweit 2020-09-29 18:29:18 UTC
The patch is just meant as a debug patch. Depending on what type of network communication triggers the WARN, it may require an actual fix.
Would be good if you can apply the patch, reboot, and then let's hope that it happens again.
Comment 4 Damian Wrobel 2020-09-29 21:17:28 UTC
(In reply to Heiner Kallweit from comment #3)
> The patch is just meant as a debug patch. Depending on what type of network
> communication triggers the WARN, it may require an actual fix.

The drawback with that approach is that you're relying on a single source to provide you with helpful information instead of increasing the probability of getting this information by merging this change into mainline.

> Would be good if you can apply the patch, reboot, and then let's hope that
> it happens again.

Applied here[1] on 5.8.12; will deploy when it becomes available[2].

[1] https://src.fedoraproject.org/fork/dwrobel/rpms/kernel/c/8d0e67bc254db1889441e88f48b66115abcbb996?branch=f32-kbug-209423
[2] https://copr.fedorainfracloud.org/coprs/dwrobel/kernel-kbug-209423/
Comment 5 Heiner Kallweit 2020-09-30 15:54:17 UTC
(In reply to Damian Wrobel from comment #4)
> (In reply to Heiner Kallweit from comment #3)
> > The patch is just meant as a debug patch. Depending on what type of network
> > communication triggers the WARN, it may require an actual fix.
> 
> The drawback with that approach is that you're relying on a single source to
> provide you with helpful information instead of increasing the probability of
> getting this information by merging this change into mainline.
> 
It's a question of timing. As a new feature, this patch could only make it into 5.10, and it would take some time until I get the first feedback.
However, based on the input we might get from your system, this additional debug output may be included in the fix for the issue you're facing.
Comment 6 Damian Wrobel 2020-10-01 06:31:20 UTC
(In reply to Heiner Kallweit from comment #5)
> (In reply to Damian Wrobel from comment #4)
> > (In reply to Heiner Kallweit from comment #3)
> > > The patch is just meant as a debug patch. Depending on what type of network
> > > communication triggers the WARN, it may require an actual fix.
> > 
> > The drawback with that approach is that you're relying on a single source to
> > provide you with helpful information instead of increasing the probability of
> > getting this information by merging this change into mainline.
> > 
> It's a question of timing.

I understand; however, we also don't know how easily I can reproduce it, if at all.

$ uname -a
Linux localhost.localdomain 5.8.12-201.fc32.x86_64 #1 SMP Tue Sep 29 23:49:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ uptime
 08:30:54 up 1 day, 3 min,  1 user,  load average: 0.00, 0.00, 0.00

$ journalctl -b | grep r8169_main.c | wc -l
0

I'll keep it running...
Comment 7 Damian Wrobel 2020-10-01 19:19:24 UTC
Here it comes:

[86678.377120] ------------[ cut here ]------------
[86678.377155] gso_size = 1448, gso_type = 0x00000000
[86678.377381] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 [r8169]
[86678.377393] Modules linked in: tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek edac_mce_amd snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm_amd snd_hda_intel snd_intel_dspcfg ccp snd_hda_codec kvm snd_hda_core snd_hwdep snd_pcm hp_wmi snd_timer wmi_bmof sparse_keymap irqbypass snd sp5100_tco i2c_piix4 soundcore k10temp fam15h_power rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ax88179_178a usbnet serio_raw r8169 mii
[86678.377442]  wmi video
[86678.377486] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-201.fc32.x86_64 #1
[86678.377495] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[86678.377511] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169]
[86678.377521] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff ff 44 89 ee 48 c7 c7 b0 72 36 c0 c6 05 a4 20 01 00 01 e8 0d 33 d8 e1 <0f> 0b 44 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff
[86678.377533] RSP: 0018:ffffa8f280003c80 EFLAGS: 00010282
[86678.377542] RAX: 0000000000000026 RBX: ffff8d331abc6000 RCX: 0000000000000000
[86678.377551] RDX: ffff8d331b427060 RSI: ffff8d331b418d00 RDI: 0000000000000300
[86678.377559] RBP: ffff8d32b5bb8200 R08: 00000000000003d0 R09: 000000000000000d
[86678.377576] R10: 0000000000000730 R11: ffffa8f280003b15 R12: 00000000000001c0
[86678.377596] R13: 00000000000005a8 R14: 0000000000000022 R15: 000000000000001c
[86678.377606] FS:  0000000000000000(0000) GS:ffff8d331b400000(0000) knlGS:0000000000000000
[86678.377617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[86678.377624] CR2: 00007fa516f64520 CR3: 00000000b6de6000 CR4: 00000000001406f0
[86678.377632] Call Trace:
[86678.377641]  <IRQ>
[86678.377657]  dev_hard_start_xmit+0x8d/0x1d0
[86678.377676]  sch_direct_xmit+0xeb/0x2f0
[86678.377687]  __dev_queue_xmit+0x710/0x8a0
[86678.377713]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[86678.377725]  ? nf_hook_slow+0x3f/0xb0
[86678.377735]  ip_finish_output2+0x2ad/0x560
[86678.377746]  __netif_receive_skb_core+0x4f0/0xf40
[86678.377758]  ? packet_rcv+0x44/0x490
[86678.377770]  __netif_receive_skb_one_core+0x2d/0x70
[86678.377779]  process_backlog+0x96/0x160
[86678.377789]  net_rx_action+0x13c/0x3e0
[86678.377804]  ? usbnet_bh+0x24/0x2b0 [usbnet]
[86678.377815]  __do_softirq+0xd9/0x2c4
[86678.377825]  asm_call_on_stack+0x12/0x20
[86678.377835]  </IRQ>
[86678.377845]  do_softirq_own_stack+0x39/0x50
[86678.377855]  irq_exit_rcu+0xc2/0x100
[86678.377865]  common_interrupt+0x75/0x140
[86678.377875]  asm_common_interrupt+0x1e/0x40
[86678.377885] RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
[86678.377894] Code: e0 ab 6b 5d e8 ab c4 7b ff 49 89 c7 0f 1f 44 00 00 31 ff e8 7c dd 7b ff 80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
[86678.377907] RSP: 0018:ffffffffa3a03e58 EFLAGS: 00000246
[86678.377915] RAX: ffff8d331b42a2c0 RBX: ffff8d3312f3e400 RCX: 000000000000001f
[86678.377923] RDX: 0000000000000000 RSI: 00000000401ec2e2 RDI: 0000000000000000
[86678.377931] RBP: ffffffffa3b78960 R08: 00004ed561df8e36 R09: 0000000000000006
[86678.377939] R10: 000000000000001d R11: 000000000000000e R12: 0000000000000002
[86678.377956] R13: ffff8d3312f3e400 R14: 0000000000000002 R15: 00004ed561df8e36
[86678.377970]  ? cpuidle_enter_state+0xa4/0x3f0
[86678.377980]  cpuidle_enter+0x29/0x40
[86678.377990]  do_idle+0x1d5/0x2a0
[86678.377999]  cpu_startup_entry+0x19/0x20
[86678.378009]  start_kernel+0x7f4/0x804
[86678.378022]  secondary_startup_64+0xb6/0xc0
[86678.378032] ---[ end trace 263bcddb7119c953 ]---
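The reported gso_size of 1448 matches the usual TCP MSS on a 1500-byte MTU path with TCP timestamps enabled (20-byte IPv4 header, 20-byte TCP header, 12 bytes of NOP/NOP/timestamp options), and R13 in the register dump above holds 0x5a8, the same value. Reading this as a sign that the gso_size came from an ordinary TCP flow is an interpretation, not something established in this thread. The arithmetic as a small sketch; the constant names and tcp_mss() are illustrative only:

```c
#include <assert.h>

/* Illustrative header sizes in bytes for IPv4 TCP over Ethernet. */
enum {
	ETH_MTU    = 1500, /* MTU of enp1s0 and tun200 per `ip a` in comment 2 */
	IPV4_HDR   = 20,   /* IPv4 header without options */
	TCP_HDR    = 20,   /* base TCP header */
	TCP_TS_OPT = 12,   /* NOP + NOP + 10-byte timestamp option */
};

/* MSS = bytes of TCP payload per segment once IP and TCP headers
 * (including options) are subtracted from the MTU. */
static int tcp_mss(int mtu, int tcp_opts)
{
	return mtu - IPV4_HDR - TCP_HDR - tcp_opts;
}
```

Without timestamps the same formula gives the familiar 1460.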
Comment 8 Damian Wrobel 2020-10-01 19:22:46 UTC
I also executed:
# echo 1 > /sys/kernel/debug/clear_warn_once
to check if it will reappear.
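A rough userspace model of why only one warning appears per boot and what writing to /sys/kernel/debug/clear_warn_once changes: WARN_ONCE() keeps an "already warned" flag per call site, and clearing those flags re-arms every WARN_ONCE site. The names model_warn_once(), model_clear_warn_once(), warned_4095 and warnings_emitted are hypothetical, and the kernel implementation differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-call-site flag, standing in for the static flag
 * WARN_ONCE() keeps for the r8169 call site at line 4095. */
static bool warned_4095;
static int warnings_emitted; /* counts messages actually printed */

/* Emits the message at most once until the flag is cleared.
 * Like WARN_ONCE(), it evaluates to the condition itself. */
static bool model_warn_once(bool cond, unsigned int gso_size, unsigned int gso_type)
{
	if (cond && !warned_4095) {
		warned_4095 = true;
		warnings_emitted++;
		fprintf(stderr, "gso_size = %u, gso_type = 0x%08x\n",
			gso_size, gso_type);
	}
	return cond;
}

/* Stand-in for `echo 1 > /sys/kernel/debug/clear_warn_once`. */
static void model_clear_warn_once(void)
{
	warned_4095 = false;
}
```

Two consecutive bad packets produce one message; after model_clear_warn_once() the next bad packet produces another.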
Comment 9 Damian Wrobel 2020-10-01 19:27:53 UTC
In case you would like me to test any other modifications, please prepare them using git format-patch, as the Fedora kernel.spec doesn't like plain .diff files.
Comment 10 Heiner Kallweit 2020-10-02 11:16:53 UTC
I received feedback that the issue you're seeing may indicate a bug in the network stack, not in r8169 driver. Could you please check whether the issue still occurs with the following change.


From 54ec9afb36b7e87aaeb45db5e9dcfa9fe78cc672 Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Fri, 2 Oct 2020 13:05:51 +0200
Subject: [PATCH] net: clear gso_size in napi_reuse_skb

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 net/core/dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 62b06523b..8e75399cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 
 	skb->encapsulation = 0;
 	skb_shinfo(skb)->gso_type = 0;
+	skb_shinfo(skb)->gso_size = 0;
 	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 	skb_ext_reset(skb);
 
-- 
2.28.0
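The hypothesis behind this patch, as far as can be read from its subject line, is that napi_reuse_skb() recycles an skb for a later GRO packet and clears gso_type but not gso_size, so a stale gso_size could survive and produce exactly the gso_size != 0, gso_type == 0 combination seen in comment 7. The sketch models only that hypothesis; struct gso_info and the reset functions are hypothetical simplifications, not kernel code. Comment 13 reports that the warning still occurs with this patch applied.

```c
#include <assert.h>

/* Hypothetical stand-in for the two skb_shared_info fields involved. */
struct gso_info {
	unsigned short gso_size;
	unsigned int gso_type;
};

/* Reset as before the patch: only gso_type is cleared. */
static void reuse_reset_old(struct gso_info *s)
{
	s->gso_type = 0;
}

/* Reset with the patch applied: gso_size is cleared as well. */
static void reuse_reset_new(struct gso_info *s)
{
	s->gso_type = 0;
	s->gso_size = 0;
}

/* The combination the r8169 WARN reports: size set, type missing. */
static int gso_inconsistent(const struct gso_info *s)
{
	return s->gso_size != 0 && s->gso_type == 0;
}
```

Starting from a TSO buffer (gso_size 1448, TCPV4 bit set), the old reset leaves an inconsistent buffer, while the patched reset does not.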
Comment 11 Damian Wrobel 2020-10-02 14:06:38 UTC
(In reply to Heiner Kallweit from comment #10)
> I received feedback that the issue you're seeing may indicate a bug in the
> network stack, not in r8169 driver. Could you please check whether the issue
> still occurs with the following change.

Thank you for the feedback. Sure, I'll check it.

Applied on top of the previous patch here[3] on 5.8.12; will deploy when it becomes available.

[3] https://src.fedoraproject.org/fork/dwrobel/rpms/kernel
Comment 12 Heiner Kallweit 2020-10-03 14:07:59 UTC
I found some reports about a similar issue in tun when using docker.

https://forums.unraid.net/bug-reports/prereleases/68-rc1-kernel-tun-unexpected-gso-r634/?page=3

According to comment 2 you have a tun network interface. Are you also running docker on the machine? Or what do you use the tun interface for?
Comment 13 Damian Wrobel 2020-10-05 08:14:24 UTC
It is still reproducible in a build with the patches from:
 - comment #1
 - comment #10

[27121.062075] ------------[ cut here ]------------
[27121.062085] gso_size = 1448, gso_type = 0x00000000
[27121.062220] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 [r8169]
[27121.062225] Modules linked in: tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat edac_mce_amd kvm_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg hp_wmi wmi_bmof snd_hda_codec irqbypass sparse_keymap fam15h_power k10temp snd_hda_core snd_hwdep snd_pcm snd_timer snd sp5100_tco soundcore i2c_piix4 rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel ax88179_178a usbnet serio_raw mii r8169
[27121.062268]  wmi video
[27121.062301] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-202.fc32.x86_64 #1
[27121.062306] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[27121.062317] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169]
[27121.062324] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff ff 44 89 ee 48 c7 c7 b0 82 4e c0 c6 05 a4 20 01 00 01 e8 0d 23 c0 cd <0f> 0b 44 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff
[27121.062333] RSP: 0018:ffffba1080003c80 EFLAGS: 00010282
[27121.062354] RAX: 0000000000000026 RBX: ffff9729973e4000 RCX: ffff97299b418d08
[27121.062359] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff97299b418d00
[27121.062364] RBP: ffff9729383d9200 R08: 00000000000003d5 R09: 000000000000000b
[27121.062369] R10: 0000000000000730 R11: ffffba1080003b15 R12: 0000000000000000
[27121.062373] R13: 00000000000005a8 R14: 0000000000000022 R15: 0000000000000000
[27121.062379] FS:  0000000000000000(0000) GS:ffff97299b400000(0000) knlGS:0000000000000000
[27121.062385] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27121.062389] CR2: 00007f71f400c598 CR3: 00000000b961e000 CR4: 00000000001406f0
[27121.062394] Call Trace:
[27121.062400]  <IRQ>
[27121.062413]  dev_hard_start_xmit+0x8d/0x1d0
[27121.062428]  sch_direct_xmit+0xeb/0x2f0
[27121.062435]  __dev_queue_xmit+0x710/0x8a0
[27121.062455]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[27121.062462]  ? nf_hook_slow+0x3f/0xb0
[27121.062468]  ip_finish_output2+0x2ad/0x560
[27121.062475]  __netif_receive_skb_core+0x4f0/0xf40
[27121.062482]  ? packet_rcv+0x44/0x490
[27121.062488]  __netif_receive_skb_one_core+0x2d/0x70
[27121.062494]  process_backlog+0x96/0x160
[27121.062500]  net_rx_action+0x13c/0x3e0
[27121.062560]  ? usbnet_bh+0x24/0x2b0 [usbnet]
[27121.062569]  __do_softirq+0xd9/0x2c4
[27121.062576]  asm_call_on_stack+0x12/0x20
[27121.062580]  </IRQ>
[27121.062586]  do_softirq_own_stack+0x39/0x50
[27121.062593]  irq_exit_rcu+0xc2/0x100
[27121.062600]  common_interrupt+0x75/0x140
[27121.062605]  asm_common_interrupt+0x1e/0x40
[27121.062612] RIP: 0010:native_safe_halt+0xe/0x10
[27121.062617] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
[27121.062626] RSP: 0018:ffffffff8fa03e08 EFLAGS: 00000246
[27121.062631] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[27121.062635] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffff8fb78960 RDI: ffff972992e75000
[27121.062640] RBP: ffff972999c5ec00 R08: 000018aa9d42aec5 R09: 0000000000000005
[27121.062645] R10: 0000000000000024 R11: 0000000000000013 R12: ffff972999c5ec64
[27121.062649] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[27121.062660]  acpi_safe_halt+0x1b/0x30
[27121.062674]  acpi_idle_enter+0x27e/0x2e0
[27121.062683]  cpuidle_enter_state+0x81/0x3f0
[27121.062689]  cpuidle_enter+0x29/0x40
[27121.062695]  do_idle+0x1d5/0x2a0
[27121.062700]  cpu_startup_entry+0x19/0x20
[27121.062707]  start_kernel+0x7f4/0x804
[27121.062716]  secondary_startup_64+0xb6/0xc0
[27121.062722] ---[ end trace dc4fdf09c3fffc4d ]---

(In reply to Heiner Kallweit from comment #12)
> I found some reports about a similar issue in tun when using docker.
> 
> https://forums.unraid.net/bug-reports/prereleases/68-rc1-kernel-tun-unexpected-gso-r634/?page=3
> 
> According to comment 2 you have a tun network interface. Are you also
> running docker on the machine?
No, I use neither docker nor podman.

> Or what do you use the tun interface for?
I described that in comment #2:

(In reply to Damian Wrobel from comment #2)
> (In reply to Heiner Kallweit from comment #1)
> 
> 4: This is an OpenVPN-based interface to which I can connect from mobile
> devices.
>
Currently I'm running openvpn server version 2.4.9-1.fc32.

I have been running the same router configuration for a few years; the only thing that actually changed over time was the hardware architecture. I started with an RPi3B+, then switched to an RPi4B, and recently to the HP t630. The reason is that, as the speed of my fiber Internet connection increased, the previous hardware wasn't capable of routing it fast enough.

In practice, interface 3 (see comment #2) has always been the same (it's an external USB dongle); what changed recently, when switching to the T630, is that internal interface 2 is no longer Broadcom-based but Realtek-based.

So far (over the last few years) I haven't observed a warning similar to the one reported here; perhaps different Ethernet drivers handle this differently.
Comment 14 Heiner Kallweit 2020-10-05 08:42:00 UTC
Here comes one more debug patch. skb_warn_bad_offload() provides more details about the offending skb, and maybe we get an idea where in the network stack something goes wrong.


From c52e6b215e6fd474842e92108fa51f73dbfb3fef Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 3 Oct 2020 17:09:32 +0200
Subject: [PATCH] net: warn if gso_size is set but gso_type is not

Warn if gso_size is set but gso_type is not.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 62b06523b..4c943b774 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
 {
 	u16 gso_segs = skb_shinfo(skb)->gso_segs;
 
+	if (!skb_shinfo(skb)->gso_type)
+		skb_warn_bad_offload(skb);
+
 	if (gso_segs > dev->gso_max_segs)
 		return features & ~NETIF_F_GSO_MASK;
 
-- 
2.28.0
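The debug check sits in gso_features_check(), which netif_skb_features() calls only for GSO packets (skb_is_gso(), i.e. a non-zero gso_size), and it runs during validate_xmit_skb(), before the driver's transmit routine. A simplified model of that ordering, with a hypothetical struct pkt and two counters standing in for the two warnings (the GSO bit values are copied from skbuff.h of that era), suggests that a packet with gso_size set and gso_type == 0 should be reported by the core stack first and by r8169 second:

```c
#include <assert.h>

#define GSO_TCPV4 (1u << 0) /* SKB_GSO_TCPV4 */
#define GSO_TCPV6 (1u << 4) /* SKB_GSO_TCPV6 */

/* Hypothetical minimal packet descriptor. */
struct pkt {
	unsigned short gso_size;
	unsigned int gso_type;
};

static int stack_reports;  /* models skb_warn_bad_offload() output */
static int driver_reports; /* models the r8169 WARN at line 4095 */

/* Core stack: the debug check only runs for GSO packets
 * (non-zero gso_size) and fires when gso_type is empty. */
static void stack_validate(const struct pkt *p)
{
	if (p->gso_size != 0 && p->gso_type == 0)
		stack_reports++;
}

/* Driver: GSO requested but neither TCP GSO bit is set. */
static void driver_xmit(const struct pkt *p)
{
	if (p->gso_size != 0 && !(p->gso_type & (GSO_TCPV4 | GSO_TCPV6)))
		driver_reports++;
}

/* Validation runs before the driver's transmit routine. */
static void transmit(const struct pkt *p)
{
	stack_validate(p);
	driver_xmit(p);
}
```

A well-formed TSO packet or a plain non-GSO packet triggers neither counter.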
Comment 15 Damian Wrobel 2020-10-05 12:36:05 UTC
(In reply to Heiner Kallweit from comment #14)
> Here comes one more debug patch. skb_warn_bad_offload() provides more
> details about the offending skb,

Great, committed here[4] and building here[5]. Once it's deployed, I will let you know when something pops up...

[4] https://src.fedoraproject.org/fork/dwrobel/rpms/kernel/c/54c17223330ae5010f47411ee8f1860a08de0440?branch=f32-kbug-209423

[5] https://copr.fedorainfracloud.org/coprs/dwrobel/kernel-kbug-209423/build/1694926/
Comment 16 Damian Wrobel 2020-10-08 16:12:00 UTC
It took longer than usual, but here it comes, from a build with the patches from:
 - comment #1
 - comment #10
 - comment #14
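The skb dump below can be partly decoded by hand. This sketch copies the first 66 bytes of the "skb linear" output and assumes untagged Ethernet carrying IPv4 and TCP, as the 0x0800 EtherType and proto=0x0800 indicate; struct decoded, decode() and be16() are hypothetical helpers. It finds a 20-byte IPv4 header, a 32-byte TCP header (timestamps included) from source port 443, and only 68 bytes of TCP payload, which together with the 14-byte Ethernet header account for len=134. That is far below gso_size=568 and consistent with segs=1: the packet needs no segmentation even though gso_size is set.

```c
#include <assert.h>

#define ETH_HDR_LEN 14u /* untagged Ethernet header */

/* Bytes 0x00-0x41 of the "skb linear" dump: Ethernet, IPv4 and TCP headers. */
static const unsigned char hdr[] = {
	0x00, 0x13, 0x3b, 0xa0, 0x01, 0xe8, 0x7c, 0xd3,
	0x0a, 0x2d, 0x1b, 0x3b, 0x08, 0x00, 0x45, 0x00,
	0x00, 0x78, 0xe2, 0xe6, 0x00, 0x00, 0x7b, 0x06,
	0x52, 0xe1, 0xd8, 0x3a, 0xd0, 0xce, 0xc0, 0xa8,
	0xa0, 0x06, 0x01, 0xbb, 0x8b, 0xc6, 0x53, 0x91,
	0xbe, 0x5e, 0x6e, 0x60, 0xbd, 0xe2, 0x80, 0x18,
	0x01, 0x13, 0x5c, 0xf6, 0x00, 0x00, 0x01, 0x01,
	0x08, 0x0a, 0x3d, 0xd6, 0x6a, 0xa3, 0x63, 0xea,
	0x5c, 0xd9,
};

struct decoded {
	unsigned ip_hdr_len;   /* IHL field * 4 */
	unsigned ip_total_len; /* IPv4 total length: IP + TCP headers + payload */
	unsigned tcp_hdr_len;  /* TCP data offset * 4 */
	unsigned sport;        /* TCP source port */
	unsigned payload_len;  /* TCP payload bytes */
};

/* Read a 16-bit field in network (big-endian) byte order. */
static unsigned be16(const unsigned char *p)
{
	return (unsigned)p[0] << 8 | p[1];
}

static struct decoded decode(const unsigned char *frame)
{
	struct decoded d;
	const unsigned char *ip = frame + ETH_HDR_LEN;

	d.ip_hdr_len = (ip[0] & 0x0fu) * 4u;
	d.ip_total_len = be16(ip + 2);

	const unsigned char *tcp = ip + d.ip_hdr_len;
	d.sport = be16(tcp);
	d.tcp_hdr_len = (tcp[12] >> 4) * 4u;
	d.payload_len = d.ip_total_len - d.ip_hdr_len - d.tcp_hdr_len;
	return d;
}
```

The payload itself starts with 17 03 03, which looks like a TLS application-data record, fitting the source port 443.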

[236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
                mac=(778,14) net=(792,20) trans=812
                shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
                csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
                hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
[236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
[236222.967392] skb linear:   00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
[236222.967404] skb linear:   00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
[236222.967415] skb linear:   00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
[236222.967426] skb linear:   00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
[236222.967437] skb linear:   00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
[236222.967454] skb linear:   00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
[236222.967466] skb linear:   00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
[236222.967477] skb linear:   00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
[236222.967488] skb linear:   00000080: d2 a1 ab 1f 26 1d
[236222.967498] ------------[ cut here ]------------
[236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000)
[236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0
[236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
[236222.967776]  ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
[236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1
[236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0
[236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25
[236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
[236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000
[236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
[236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720
[236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729
[236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac
[236222.968000] FS:  0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
[236222.968013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
[236222.968035] Call Trace:
[236222.968047]  <IRQ>
[236222.968064]  netif_skb_features+0x25e/0x2c0
[236222.968084]  ? ipt_do_table+0x333/0x600 [ip_tables]
[236222.968098]  validate_xmit_skb+0x1d/0x300
[236222.968111]  validate_xmit_skb_list+0x48/0x70
[236222.968126]  sch_direct_xmit+0x129/0x2f0
[236222.968140]  __dev_queue_xmit+0x710/0x8a0
[236222.968184]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[236222.968200]  ? nf_hook_slow+0x3f/0xb0
[236222.968214]  ip_finish_output2+0x2ad/0x560
[236222.968229]  __netif_receive_skb_core+0x4f0/0xf40
[236222.968244]  ? packet_rcv+0x44/0x490
[236222.968257]  __netif_receive_skb_one_core+0x2d/0x70
[236222.968277]  process_backlog+0x96/0x160
[236222.968290]  net_rx_action+0x13c/0x3e0
[236222.968312]  ? usbnet_bh+0x24/0x2b0 [usbnet]
[236222.968327]  __do_softirq+0xd9/0x2c4
[236222.968340]  asm_call_on_stack+0x12/0x20
[236222.968350]  </IRQ>
[236222.968362]  do_softirq_own_stack+0x39/0x50
[236222.968376]  irq_exit_rcu+0xc2/0x100
[236222.968389]  common_interrupt+0x75/0x140
[236222.968405]  asm_common_interrupt+0x1e/0x40
[236222.968427] RIP: 0010:native_safe_halt+0xe/0x10
[236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
[236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
[236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
[236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
[236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
[236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[236222.968535]  acpi_safe_halt+0x1b/0x30
[236222.968549]  acpi_idle_enter+0x27e/0x2e0
[236222.968566]  cpuidle_enter_state+0x81/0x3f0
[236222.968589]  cpuidle_enter+0x29/0x40
[236222.968602]  do_idle+0x1d5/0x2a0
[236222.968615]  cpu_startup_entry+0x19/0x20
[236222.968628]  start_kernel+0x7f4/0x804
[236222.968645]  secondary_startup_64+0xb6/0xc0
[236222.968659] ---[ end trace 8a4d7f639ad88505 ]---
[236222.968692] ------------[ cut here ]------------
[236222.968703] gso_size = 568, gso_type = 0x00000000
[236222.968869] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 [r8169]
[236222.968883] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
[236222.968944]  ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
[236222.969019] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.8.12-203.fc32.x86_64 #1
[236222.969032] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[236222.969051] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169]
[236222.969063] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff ff 44 89 ee 48 c7 c7 b0 e2 12 c0 c6 05 a4 20 01 00 01 e8 0d c3 fb e2 <0f> 0b 44 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff
[236222.969080] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
[236222.969092] RAX: 0000000000000025 RBX: ffff8d709a9fc000 RCX: 0000000000000000
[236222.969106] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
[236222.969118] RBP: ffff8d7090f2cd00 R08: 0000000000000441 R09: 0720072007200720
[236222.969129] R10: 0720072007200720 R11: 0720072007200720 R12: 00000000000001d0
[236222.969141] R13: 0000000000000238 R14: 0000000000000022 R15: 000000000000001d
[236222.969154] FS:  0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
[236222.969166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236222.969176] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
[236222.969195] Call Trace:
[236222.969205]  <IRQ>
[236222.969220]  dev_hard_start_xmit+0x8d/0x1d0
[236222.969233]  sch_direct_xmit+0xeb/0x2f0
[236222.969247]  __dev_queue_xmit+0x710/0x8a0
[236222.969280]  ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[236222.969294]  ? nf_hook_slow+0x3f/0xb0
[236222.969306]  ip_finish_output2+0x2ad/0x560
[236222.969320]  __netif_receive_skb_core+0x4f0/0xf40
[236222.969337]  ? packet_rcv+0x44/0x490
[236222.969350]  __netif_receive_skb_one_core+0x2d/0x70
[236222.969363]  process_backlog+0x96/0x160
[236222.969376]  net_rx_action+0x13c/0x3e0
[236222.969395]  ? usbnet_bh+0x24/0x2b0 [usbnet]
[236222.969409]  __do_softirq+0xd9/0x2c4
[236222.969422]  asm_call_on_stack+0x12/0x20
[236222.969432]  </IRQ>
[236222.969443]  do_softirq_own_stack+0x39/0x50
[236222.969455]  irq_exit_rcu+0xc2/0x100
[236222.969468]  common_interrupt+0x75/0x140
[236222.969480]  asm_common_interrupt+0x1e/0x40
[236222.969494] RIP: 0010:native_safe_halt+0xe/0x10
[236222.969505] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
[236222.969525] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
[236222.969536] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[236222.969548] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
[236222.969560] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
[236222.969571] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
[236222.969583] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[236222.969600]  acpi_safe_halt+0x1b/0x30
[236222.969614]  acpi_idle_enter+0x27e/0x2e0
[236222.969628]  cpuidle_enter_state+0x81/0x3f0
[236222.969642]  cpuidle_enter+0x29/0x40
[236222.969654]  do_idle+0x1d5/0x2a0
[236222.969666]  cpu_startup_entry+0x19/0x20
[236222.969686]  start_kernel+0x7f4/0x804
[236222.969700]  secondary_startup_64+0xb6/0xc0
[236222.969713] ---[ end trace 8a4d7f639ad88506 ]---
Comment 17 Heiner Kallweit 2020-10-09 10:40:05 UTC
The analysis so far indicates that an invalid skb is generated somewhere in the network stack, and the r8169 driver complains about it. That said, this is not a problem with the driver itself.
One guess is that it may be a use-after-free. If you'd like to contribute to the problem analysis, you can build your kernel with KASAN enabled.
Comment 18 Damian Wrobel 2020-10-12 09:45:42 UTC
Created attachment 292923
kasan-5.8.12-204.fc32.x86_64-1.txt
Comment 19 Damian Wrobel 2020-10-12 09:53:49 UTC
(In reply to Heiner Kallweit from comment #17)
> One guess is that it may be a use-after-free. If you'd like to contribute to
> the problem analysis, you can build your kernel with KASAN enabled.

Please find attached the dmesg output (attachment 292923) from a boot with KASAN enabled [6] (RPMs for addr2line translation are available here [7]).

I see a lot of `skb_warn_bad_offload` warnings (this time they appeared quite soon after boot):

$ grep skb_warn_bad_offload kasan-5.8.12-204.fc32.x86_64-1.txt | wc -l
56

but not a single KASAN report:

$ grep KASAN kasan-5.8.12-204.fc32.x86_64-1.txt | wc -l
0


[6] https://src.fedoraproject.org/fork/dwrobel/rpms/kernel/c/cea2682b706525c8927afd50404c2fb6ad2a601a?branch=f32-kbug-209423
[7] https://copr.fedorainfracloud.org/coprs/dwrobel/kernel-kbug-209423/build/1702965/
Comment 20 Heiner Kallweit 2021-01-20 06:38:27 UTC
Should be fixed with b160c28548bc ("tcp: do not mess with cloned skbs in tcp_add_backlog()").
Comment 21 Damian Wrobel 2022-08-15 05:25:16 UTC
I can confirm that I no longer observe this issue.