Bug 209423
Summary: WARN_ON_ONCE() at rtl8169_tso_csum_v2()
Product: Drivers
Component: Network
Status: RESOLVED CODE_FIX
Severity: low
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.8.11
Regression: No
Reporter: Damian Wrobel (dwrobel)
Assignee: drivers_network (drivers_network)
CC: hkallweit1
Attachments: kasan-5.8.12-204.fc32.x86_64-1.txt
Description
Damian Wrobel
2020-09-29 09:05:49 UTC
That's interesting. GSO is used, but it's neither TSO nor TSO6.
Do you have any special network setup / network traffic? r8169 is used with basically every consumer mainboard, therefore I'd expect many more such reports if standard network traffic were affected.

With the following patch we get more info about the offending SKB if the warning should pop up again.
Note: It's a WARN_ONCE, therefore you'll see only one warning until the next reboot.

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 6c7c004c2..551f5b8ed 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4097,7 +4097,8 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp,
 			tcp_v6_gso_csum_prep(skb);
 			opts[0] |= TD1_GTSENV6;
 		} else {
-			WARN_ON_ONCE(1);
+			WARN_ONCE(1, "gso_size = %u, gso_type = 0x%08x\n",
+				  mss, shinfo->gso_type);
 		}
 
 		opts[0] |= transport_offset << GTTCPHO_SHIFT;
-- 
2.28.0

(In reply to Heiner Kallweit from comment #1)
> That's interesting. GSO is used, but it's neither TSO nor TSO6.
> Do you have any special network setup / network traffic? r8169 is used with
> basically every consumer mainboard,

This is my first machine with the realtek network card.
Basically this machine works as an internet router and has the following list of interfaces:

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 7c:d3:0a:2d:1b:3b brd ff:ff:ff:ff:ff:ff
    inet 192.168.160.160/24 brd 192.168.160.255 scope global noprefixroute enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::7ed3:aff:fe2d:1b3b/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s16u2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 80:1f:02:d6:2b:7f brd ff:ff:ff:ff:ff:ff
    inet 10.8.11.16/26 brd 10.8.11.63 scope global dynamic noprefixroute enp0s16u2
       valid_lft 57242sec preferred_lft 57242sec
    inet6 fe80::821f:2ff:fed6:2b7f/64 scope link
       valid_lft forever preferred_lft forever
4: tun200: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 100
    link/none
    inet 192.168.200.1/24 brd 192.168.200.255 scope global tun200
       valid_lft forever preferred_lft forever
    inet6 fe80::311f:6a70:3ebe:a9fe/64 scope link stable-privacy
       valid_lft forever preferred_lft forever

2: This is the realtek built-in card connected to the LAN network, to which a dozen devices are connected (mostly laptops and mobile devices via a separate AP).
3: This is a USB-based "0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet" card connected to the WAN (my local Internet provider).
4: This is an openvpn-based interface to which I can connect from mobile devices.

I don't explicitly use any ipv6 connections, however I let all of the devices set up a link-local ipv6 address.
> With the following patch we get more info about the offending SKB if the
> warning should pop up again.
> Note: It's a WARN_ONCE, therefore you'll see only one warning until next
> reboot.
>

Would you please consider merging it upstream into the 5.8 or 5.9 series? Otherwise I would have to manually patch/rebuild every single Fedora kernel. I could do that once or twice, but only if I knew that it's easily reproducible.

The patch is just meant as a debug patch. Depending on what type of network communication triggers the WARN, it may require an actual fix.
It would be good if you could apply the patch and reboot, and then let's hope that it happens again.

(In reply to Heiner Kallweit from comment #3)
> The patch is just meant as a debug patch. Depending on what type of network
> communication triggers the WARN, it may require an actual fix.

The drawback with that approach is that you're relying on a single source to provide you with helpful information, instead of increasing the probability of getting this information by merging this change into mainline.

> Would be good if you can apply the patch, reboot, and then let's hope that
> it happens again.

Applied here[1] on 5.8.12 - will deploy when it becomes available[2].

[1] https://src.fedoraproject.org/fork/dwrobel/rpms/kernel/c/8d0e67bc254db1889441e88f48b66115abcbb996?branch=f32-kbug-209423
[2] https://copr.fedorainfracloud.org/coprs/dwrobel/kernel-kbug-209423/

(In reply to Damian Wrobel from comment #4)
> (In reply to Heiner Kallweit from comment #3)
> > The patch is just meant as a debug patch. Depending on what type of network
> > communication triggers the WARN, it may require an actual fix.
>
> The drawback with that approach is that you're relying on a single source to
> provide you helpful information instead of increasing a probability of
> getting this information by merging this change into mainline.
>

It's a question of timing. As a new feature this patch could make it into 5.10 only, and it would take some time until I get the first feedback. However, based on the input we might get from your system, this additional debug output may be included in the fix for the issue you're facing.

(In reply to Heiner Kallweit from comment #5)
> (In reply to Damian Wrobel from comment #4)
> > (In reply to Heiner Kallweit from comment #3)
> > > The patch is just meant as a debug patch. Depending on what type of network
> > > communication triggers the WARN, it may require an actual fix.
> >
> > The drawback with that approach is that you're relying on a single source to
> > provide you helpful information instead of increasing a probability of
> > getting this information by merging this change into mainline.
> >
> It's a question of timing.

I understand you, however we also don't know how easily I can reproduce it, if at all.

$ uname -a
Linux localhost.localdomain 5.8.12-201.fc32.x86_64 #1 SMP Tue Sep 29 23:49:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ uptime
 08:30:54 up 1 day, 3 min, 1 user, load average: 0.00, 0.00, 0.00

$ journalctl -b | grep r8169_main.c | wc -l
0

I'll keep it running...
Here it comes:

[86678.377120] ------------[ cut here ]------------
[86678.377155] gso_size = 1448, gso_type = 0x00000000
[86678.377381] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 [r8169]
[86678.377393] Modules linked in: tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek edac_mce_amd snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm_amd snd_hda_intel snd_intel_dspcfg ccp snd_hda_codec kvm snd_hda_core snd_hwdep snd_pcm hp_wmi snd_timer wmi_bmof sparse_keymap irqbypass snd sp5100_tco i2c_piix4 soundcore k10temp fam15h_power rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ax88179_178a usbnet serio_raw r8169 mii
[86678.377442] wmi video
[86678.377486] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-201.fc32.x86_64 #1
[86678.377495] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[86678.377511] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169]
[86678.377521] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff ff 44 89 ee 48 c7 c7 b0 72 36 c0 c6 05 a4 20 01 00 01 e8 0d 33 d8 e1 <0f> 0b 44 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff
[86678.377533] RSP: 0018:ffffa8f280003c80 EFLAGS: 00010282
[86678.377542] RAX: 0000000000000026 RBX: ffff8d331abc6000 RCX: 0000000000000000
[86678.377551] RDX: ffff8d331b427060 RSI: ffff8d331b418d00 RDI: 0000000000000300
[86678.377559] RBP: ffff8d32b5bb8200 R08: 00000000000003d0 R09: 000000000000000d
[86678.377576] R10: 0000000000000730 R11: ffffa8f280003b15 R12: 00000000000001c0
[86678.377596] R13: 00000000000005a8 R14: 0000000000000022 R15: 000000000000001c
[86678.377606] FS: 0000000000000000(0000) GS:ffff8d331b400000(0000) knlGS:0000000000000000
[86678.377617] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[86678.377624] CR2: 00007fa516f64520 CR3: 00000000b6de6000 CR4: 00000000001406f0
[86678.377632] Call Trace:
[86678.377641] <IRQ>
[86678.377657] dev_hard_start_xmit+0x8d/0x1d0
[86678.377676] sch_direct_xmit+0xeb/0x2f0
[86678.377687] __dev_queue_xmit+0x710/0x8a0
[86678.377713] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[86678.377725] ? nf_hook_slow+0x3f/0xb0
[86678.377735] ip_finish_output2+0x2ad/0x560
[86678.377746] __netif_receive_skb_core+0x4f0/0xf40
[86678.377758] ? packet_rcv+0x44/0x490
[86678.377770] __netif_receive_skb_one_core+0x2d/0x70
[86678.377779] process_backlog+0x96/0x160
[86678.377789] net_rx_action+0x13c/0x3e0
[86678.377804] ? usbnet_bh+0x24/0x2b0 [usbnet]
[86678.377815] __do_softirq+0xd9/0x2c4
[86678.377825] asm_call_on_stack+0x12/0x20
[86678.377835] </IRQ>
[86678.377845] do_softirq_own_stack+0x39/0x50
[86678.377855] irq_exit_rcu+0xc2/0x100
[86678.377865] common_interrupt+0x75/0x140
[86678.377875] asm_common_interrupt+0x1e/0x40
[86678.377885] RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
[86678.377894] Code: e0 ab 6b 5d e8 ab c4 7b ff 49 89 c7 0f 1f 44 00 00 31 ff e8 7c dd 7b ff 80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
[86678.377907] RSP: 0018:ffffffffa3a03e58 EFLAGS: 00000246
[86678.377915] RAX: ffff8d331b42a2c0 RBX: ffff8d3312f3e400 RCX: 000000000000001f
[86678.377923] RDX: 0000000000000000 RSI: 00000000401ec2e2 RDI: 0000000000000000
[86678.377931] RBP: ffffffffa3b78960 R08: 00004ed561df8e36 R09: 0000000000000006
[86678.377939] R10: 000000000000001d R11: 000000000000000e R12: 0000000000000002
[86678.377956] R13: ffff8d3312f3e400 R14: 0000000000000002 R15: 00004ed561df8e36
[86678.377970] ? cpuidle_enter_state+0xa4/0x3f0
[86678.377980] cpuidle_enter+0x29/0x40
[86678.377990] do_idle+0x1d5/0x2a0
[86678.377999] cpu_startup_entry+0x19/0x20
[86678.378009] start_kernel+0x7f4/0x804
[86678.378022] secondary_startup_64+0xb6/0xc0
[86678.378032] ---[ end trace 263bcddb7119c953 ]---

I also executed:
# echo 1 > /sys/kernel/debug/clear_warn_once
to check if it will reappear.

In case you would like to test any other modifications, please prepare it using git format-patch, as fedora kernel.spec doesn't like plain .diff files.

I received feedback that the issue you're seeing may indicate a bug in the network stack, not in r8169 driver. Could you please check whether the issue still occurs with the following change.

From 54ec9afb36b7e87aaeb45db5e9dcfa9fe78cc672 Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Fri, 2 Oct 2020 13:05:51 +0200
Subject: [PATCH] net: clear gso_size in napi_reuse_skb

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 net/core/dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 62b06523b..8e75399cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6088,6 +6088,7 @@ static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 	skb->encapsulation = 0;
 	skb_shinfo(skb)->gso_type = 0;
+	skb_shinfo(skb)->gso_size = 0;
 	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 	skb_ext_reset(skb);
-- 
2.28.0

(In reply to Heiner Kallweit from comment #10)
> I received feedback that the issue you're seeing may indicate a bug in the
> network stack, not in r8169 driver. Could you please check whether the issue
> still occurs with the following change.

Thank you for the feedback. Sure, I'll check it.

Applied on top of the previous patch here[3] on 5.8.12 - will deploy when becomes available.
[3] https://src.fedoraproject.org/fork/dwrobel/rpms/kernel

I found some reports about a similar issue in tun when using docker.

https://forums.unraid.net/bug-reports/prereleases/68-rc1-kernel-tun-unexpected-gso-r634/?page=3

According to comment 2 you have a tun network interface. Are you also running docker on the machine?
Or what do you use the tun interface for?

It is still reproducible in a build with patches from:
- comment#1,
- comment#10:

[27121.062075] ------------[ cut here ]------------
[27121.062085] gso_size = 1448, gso_type = 0x00000000
[27121.062220] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 [r8169]
[27121.062225] Modules linked in: tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat edac_mce_amd kvm_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg hp_wmi wmi_bmof snd_hda_codec irqbypass sparse_keymap fam15h_power k10temp snd_hda_core snd_hwdep snd_pcm snd_timer snd sp5100_tco soundcore i2c_piix4 rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel ax88179_178a usbnet serio_raw mii r8169
[27121.062268] wmi video
[27121.062301] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-202.fc32.x86_64 #1
[27121.062306] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[27121.062317] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169]
[27121.062324] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff ff 44 89 ee 48 c7 c7 b0 82 4e c0 c6 05 a4 20 01 00 01 e8 0d 23 c0 cd <0f> 0b 44 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff
[27121.062333] RSP: 0018:ffffba1080003c80 EFLAGS: 00010282
[27121.062354] RAX: 0000000000000026 RBX: ffff9729973e4000 RCX: ffff97299b418d08
[27121.062359] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff97299b418d00
[27121.062364] RBP: ffff9729383d9200 R08: 00000000000003d5 R09: 000000000000000b
[27121.062369] R10: 0000000000000730 R11: ffffba1080003b15 R12: 0000000000000000
[27121.062373] R13: 00000000000005a8 R14: 0000000000000022 R15: 0000000000000000
[27121.062379] FS: 0000000000000000(0000) GS:ffff97299b400000(0000) knlGS:0000000000000000
[27121.062385] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27121.062389] CR2: 00007f71f400c598 CR3: 00000000b961e000 CR4: 00000000001406f0
[27121.062394] Call Trace:
[27121.062400] <IRQ>
[27121.062413] dev_hard_start_xmit+0x8d/0x1d0
[27121.062428] sch_direct_xmit+0xeb/0x2f0
[27121.062435] __dev_queue_xmit+0x710/0x8a0
[27121.062455] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[27121.062462] ? nf_hook_slow+0x3f/0xb0
[27121.062468] ip_finish_output2+0x2ad/0x560
[27121.062475] __netif_receive_skb_core+0x4f0/0xf40
[27121.062482] ? packet_rcv+0x44/0x490
[27121.062488] __netif_receive_skb_one_core+0x2d/0x70
[27121.062494] process_backlog+0x96/0x160
[27121.062500] net_rx_action+0x13c/0x3e0
[27121.062560] ? usbnet_bh+0x24/0x2b0 [usbnet]
[27121.062569] __do_softirq+0xd9/0x2c4
[27121.062576] asm_call_on_stack+0x12/0x20
[27121.062580] </IRQ>
[27121.062586] do_softirq_own_stack+0x39/0x50
[27121.062593] irq_exit_rcu+0xc2/0x100
[27121.062600] common_interrupt+0x75/0x140
[27121.062605] asm_common_interrupt+0x1e/0x40
[27121.062612] RIP: 0010:native_safe_halt+0xe/0x10
[27121.062617] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
[27121.062626] RSP: 0018:ffffffff8fa03e08 EFLAGS: 00000246
[27121.062631] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[27121.062635] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffff8fb78960 RDI: ffff972992e75000
[27121.062640] RBP: ffff972999c5ec00 R08: 000018aa9d42aec5 R09: 0000000000000005
[27121.062645] R10: 0000000000000024 R11: 0000000000000013 R12: ffff972999c5ec64
[27121.062649] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[27121.062660] acpi_safe_halt+0x1b/0x30
[27121.062674] acpi_idle_enter+0x27e/0x2e0
[27121.062683] cpuidle_enter_state+0x81/0x3f0
[27121.062689] cpuidle_enter+0x29/0x40
[27121.062695] do_idle+0x1d5/0x2a0
[27121.062700] cpu_startup_entry+0x19/0x20
[27121.062707] start_kernel+0x7f4/0x804
[27121.062716] secondary_startup_64+0xb6/0xc0
[27121.062722] ---[ end trace dc4fdf09c3fffc4d ]---

(In reply to Heiner Kallweit from comment #12)
> I found some reports about a similar issue in tun when using docker.
>
> https://forums.unraid.net/bug-reports/prereleases/68-rc1-kernel-tun-unexpected-gso-r634/?page=3
>
> According to comment 2 you have a tun network interface. Are you also
> running docker on the machine?

No, I don't use docker nor podman.

> Or what do you use the tun interface for?
I described that in comment#2:

(In reply to Damian Wrobel from comment #2)
> (In reply to Heiner Kallweit from comment #1)
>
> 4: This is openvpn based interface to which I can connect to from mobile
> devices.
>

Currently I'm running openvpn server version 2.4.9-1.fc32.

I have been running the same router configuration for a few years; the only thing which actually changed over time was the hardware architecture. I started with an RPi3B+, then switched to an RPi4B and recently to an HP-T630. The reason is that, with the increasing speed of my fiber internet connection, the previous hardware wasn't capable of routing it fast enough.

In practice, interface nr 3 (see comment#2) is always the same (as it's an external USB dongle). What changed recently, when switching to the T630, is that the internal interface nr 2 is no longer Broadcom but Realtek based.

So far (in the last few years) I haven't observed a warning similar to the one reported here - perhaps different ethernet drivers handle this differently.

Here comes one more debug patch. skb_warn_bad_offload() provides more details about the offending skb, and maybe we get an idea where in the network stack something goes wrong.

From c52e6b215e6fd474842e92108fa51f73dbfb3fef Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 3 Oct 2020 17:09:32 +0200
Subject: [PATCH] net: warn if gso_size is set but gso_type is not

Warn if gso_size is set but gso_type is not.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 62b06523b..4c943b774 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3491,6 +3491,9 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
 {
 	u16 gso_segs = skb_shinfo(skb)->gso_segs;
 
+	if (!skb_shinfo(skb)->gso_type)
+		skb_warn_bad_offload(skb);
+
 	if (gso_segs > dev->gso_max_segs)
 		return features & ~NETIF_F_GSO_MASK;
 
-- 
2.28.0

(In reply to Heiner Kallweit from comment #14)
> Here comes one more debug patch. skb_warn_bad_offload() provides more
> details about the offending skb,

Great, committed here[4], building here[5]. Once it's deployed, I will let you know when something pops up...

[4] https://src.fedoraproject.org/fork/dwrobel/rpms/kernel/c/54c17223330ae5010f47411ee8f1860a08de0440?branch=f32-kbug-209423
[5] https://copr.fedorainfracloud.org/coprs/dwrobel/kernel-kbug-209423/build/1694926/

It took longer than usual, but here it comes, from a build with patches from:
- comment#1,
- comment#10,
- comment#14:

[236222.967236] skb len=134 headroom=778 headlen=134 tailroom=31536
                mac=(778,14) net=(792,20) trans=812
                shinfo(txflags=0 nr_frags=0 gso(size=568 type=0 segs=1))
                csum(0x0 ip_summed=1 complete_sw=0 valid=0 level=0)
                hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
[236222.967297] dev name=enp1s0 feat=0x0x00000100000041b2
[236222.967392] skb linear: 00000000: 00 13 3b a0 01 e8 7c d3 0a 2d 1b 3b 08 00 45 00
[236222.967404] skb linear: 00000010: 00 78 e2 e6 00 00 7b 06 52 e1 d8 3a d0 ce c0 a8
[236222.967415] skb linear: 00000020: a0 06 01 bb 8b c6 53 91 be 5e 6e 60 bd e2 80 18
[236222.967426] skb linear: 00000030: 01 13 5c f6 00 00 01 01 08 0a 3d d6 6a a3 63 ea
[236222.967437] skb linear: 00000040: 5c d9 17 03 03 00 3f af 00 01 84 45 e2 36 e4 6a
[236222.967454] skb linear: 00000050: 3d 76 a8 7f d7 12 fa 72 4b d1 d0 74 0d c1 49 77
[236222.967466] skb linear: 00000060: 8b a4 bb 04 e5 aa 03 61 d3 e6 1f c9 0d 3e 46 c8
[236222.967477] skb linear: 00000070: cd 1f 7d ce e8 a7 84 84 01 5d 1f b4 ee 4f 27 63
[236222.967488] skb linear: 00000080: d2 a1 ab 1f 26 1d
[236222.967498] ------------[ cut here ]------------
[236222.967508] r8169: caps=(0x00000100000041b2, 0x0000000000000000)
[236222.967668] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3184 skb_warn_bad_offload+0x72/0xe0
[236222.967691] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
[236222.967776] ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
[236222.967858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.12-203.fc32.x86_64 #1
[236222.967870] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[236222.967895] RIP: 0010:skb_warn_bad_offload+0x72/0xe0
[236222.967908] Code: 8d 95 c8 00 00 00 48 8d 88 e8 01 00 00 48 85 c0 48 c7 c0 d8 d7 15 a4 48 0f 44 c8 4c 89 e6 48 c7 c7 90 7b 47 a4 e8 04 85 72 ff <0f> 0b 5b 5d 41 5c c3 80 7d 00 00 49 c7 c4 3b 28 40 a4 74 ac be 25
[236222.967926] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
[236222.967938] RAX: 0000000000000034 RBX: ffff8d7090f2cd00 RCX: 0000000000000000
[236222.967951] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
[236222.967962] RBP: ffff8d709a9fc000 R08: 0000000000000406 R09: 0720072007200720
[236222.967974] R10: 0720072007200720 R11: 0729073007300730 R12: ffffffffc012e729
[236222.967986] R13: ffffa8f9c0003d3b R14: 0000000000000000 R15: ffff8d70367652ac
[236222.968000] FS: 0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
[236222.968013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236222.968023] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
[236222.968035] Call Trace:
[236222.968047] <IRQ>
[236222.968064] netif_skb_features+0x25e/0x2c0
[236222.968084] ? ipt_do_table+0x333/0x600 [ip_tables]
[236222.968098] validate_xmit_skb+0x1d/0x300
[236222.968111] validate_xmit_skb_list+0x48/0x70
[236222.968126] sch_direct_xmit+0x129/0x2f0
[236222.968140] __dev_queue_xmit+0x710/0x8a0
[236222.968184] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[236222.968200] ? nf_hook_slow+0x3f/0xb0
[236222.968214] ip_finish_output2+0x2ad/0x560
[236222.968229] __netif_receive_skb_core+0x4f0/0xf40
[236222.968244] ? packet_rcv+0x44/0x490
[236222.968257] __netif_receive_skb_one_core+0x2d/0x70
[236222.968277] process_backlog+0x96/0x160
[236222.968290] net_rx_action+0x13c/0x3e0
[236222.968312] ? usbnet_bh+0x24/0x2b0 [usbnet]
[236222.968327] __do_softirq+0xd9/0x2c4
[236222.968340] asm_call_on_stack+0x12/0x20
[236222.968350] </IRQ>
[236222.968362] do_softirq_own_stack+0x39/0x50
[236222.968376] irq_exit_rcu+0xc2/0x100
[236222.968389] common_interrupt+0x75/0x140
[236222.968405] asm_common_interrupt+0x1e/0x40
[236222.968427] RIP: 0010:native_safe_halt+0xe/0x10
[236222.968438] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
[236222.968456] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
[236222.968467] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[236222.968480] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
[236222.968492] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
[236222.968504] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
[236222.968515] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[236222.968535] acpi_safe_halt+0x1b/0x30
[236222.968549] acpi_idle_enter+0x27e/0x2e0
[236222.968566] cpuidle_enter_state+0x81/0x3f0
[236222.968589] cpuidle_enter+0x29/0x40
[236222.968602] do_idle+0x1d5/0x2a0
[236222.968615] cpu_startup_entry+0x19/0x20
[236222.968628] start_kernel+0x7f4/0x804
[236222.968645] secondary_startup_64+0xb6/0xc0
[236222.968659] ---[ end trace 8a4d7f639ad88505 ]---
[236222.968692] ------------[ cut here ]------------
[236222.968703] gso_size = 568, gso_type = 0x00000000
[236222.968869] WARNING: CPU: 0 PID: 0 at drivers/net/ethernet/realtek/r8169_main.c:4095 rtl8169_start_xmit+0x489/0x800 [r8169]
[236222.968883] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag tun nft_nat nft_masq nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip_set_hash_net ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi ccp snd_hda_intel snd_intel_dspcfg kvm snd_hda_codec snd_hda_core snd_hwdep irqbypass snd_pcm snd_timer snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore rfkill_gpio rfkill acpi_cpufreq ip_tables xfs amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
[236222.968944] ghash_clmulni_intel ax88179_178a serio_raw usbnet mii r8169 wmi video
[236222.969019] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.8.12-203.fc32.x86_64 #1
[236222.969032] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.12 02/04/2020
[236222.969051] RIP: 0010:rtl8169_start_xmit+0x489/0x800 [r8169]
[236222.969063] Code: 10 0f 85 43 01 00 00 80 3d bb 20 01 00 00 0f 85 16 fe ff ff 44 89 ee 48 c7 c7 b0 e2 12 c0 c6 05 a4 20 01 00 01 e8 0d c3 fb e2 <0f> 0b 44 8b 44 24 28 8b 74 24 2c 48 8b 8d c8 00 00 00 e9 e9 fd ff
[236222.969080] RSP: 0018:ffffa8f9c0003c80 EFLAGS: 00010282
[236222.969092] RAX: 0000000000000025 RBX: ffff8d709a9fc000 RCX: 0000000000000000
[236222.969106] RDX: ffff8d709b427060 RSI: ffff8d709b418d00 RDI: 0000000000000300
[236222.969118] RBP: ffff8d7090f2cd00 R08: 0000000000000441 R09: 0720072007200720
[236222.969129] R10: 0720072007200720 R11: 0720072007200720 R12: 00000000000001d0
[236222.969141] R13: 0000000000000238 R14: 0000000000000022 R15: 000000000000001d
[236222.969154] FS: 0000000000000000(0000) GS:ffff8d709b400000(0000) knlGS:0000000000000000
[236222.969166] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[236222.969176] CR2: 00007f3cf5ebf010 CR3: 0000000113cc6000 CR4: 00000000001406f0
[236222.969195] Call Trace:
[236222.969205] <IRQ>
[236222.969220] dev_hard_start_xmit+0x8d/0x1d0
[236222.969233] sch_direct_xmit+0xeb/0x2f0
[236222.969247] __dev_queue_xmit+0x710/0x8a0
[236222.969280] ? nf_confirm+0xcb/0xf0 [nf_conntrack]
[236222.969294] ? nf_hook_slow+0x3f/0xb0
[236222.969306] ip_finish_output2+0x2ad/0x560
[236222.969320] __netif_receive_skb_core+0x4f0/0xf40
[236222.969337] ? packet_rcv+0x44/0x490
[236222.969350] __netif_receive_skb_one_core+0x2d/0x70
[236222.969363] process_backlog+0x96/0x160
[236222.969376] net_rx_action+0x13c/0x3e0
[236222.969395] ? usbnet_bh+0x24/0x2b0 [usbnet]
[236222.969409] __do_softirq+0xd9/0x2c4
[236222.969422] asm_call_on_stack+0x12/0x20
[236222.969432] </IRQ>
[236222.969443] do_softirq_own_stack+0x39/0x50
[236222.969455] irq_exit_rcu+0xc2/0x100
[236222.969468] common_interrupt+0x75/0x140
[236222.969480] asm_common_interrupt+0x1e/0x40
[236222.969494] RIP: 0010:native_safe_halt+0xe/0x10
[236222.969505] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d f6 69 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d e6 69 49 00 f4 c3 cc cc 0f 1f 44 00
[236222.969525] RSP: 0018:ffffffffa4a03e08 EFLAGS: 00000246
[236222.969536] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000001f
[236222.969548] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffa4b78960 RDI: ffff8d7092f45c00
[236222.969560] RBP: ffff8d709a288000 R08: 0000d6d7f20a4084 R09: 0000000000000006
[236222.969571] R10: 0000000000000022 R11: 000000000000000f R12: ffff8d709a288064
[236222.969583] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[236222.969600] acpi_safe_halt+0x1b/0x30
[236222.969614] acpi_idle_enter+0x27e/0x2e0
[236222.969628] cpuidle_enter_state+0x81/0x3f0
[236222.969642] cpuidle_enter+0x29/0x40
[236222.969654] do_idle+0x1d5/0x2a0
[236222.969666] cpu_startup_entry+0x19/0x20
[236222.969686] start_kernel+0x7f4/0x804
[236222.969700] secondary_startup_64+0xb6/0xc0
[236222.969713] ---[ end trace 8a4d7f639ad88506 ]---

Analysis result so far is that somewhere
in the network stack an invalid skb is generated, and the r8169 driver complains about this. In other words, it's not a problem with the driver.
One guess is that it may be a use-after-free. If you'd like to contribute to the problem analysis, you can build your kernel with KASAN enabled.

Created attachment 292923
kasan-5.8.12-204.fc32.x86_64-1.txt
(In reply to Heiner Kallweit from comment #17)
> One guess is that it may be a use-after-free. If you'd like to contribute to
> the problem analysis, you can build your kernel with KASAN enabled.

Please find attached the dmesg (attachment#292923) from a boot with KASAN enabled[6] (rpms for addr2line translation are available here[7]).

I see a lot of `skb_warn_bad_offload` warnings (this time they appeared quite soon after boot):
$ grep skb_warn_bad_offload kasan-5.8.12-204.fc32.x86_64-1.txt | wc -l
56

but not a single KASAN report:
$ grep KASAN kasan-5.8.12-204.fc32.x86_64-1.txt | wc -l
0

[6] https://src.fedoraproject.org/fork/dwrobel/rpms/kernel/c/cea2682b706525c8927afd50404c2fb6ad2a601a?branch=f32-kbug-209423
[7] https://copr.fedorainfracloud.org/coprs/dwrobel/kernel-kbug-209423/build/1702965/

Should be fixed with b160c28548bc ("tcp: do not mess with cloned skbs in tcp_add_backlog()").

I can confirm that I no longer observe this issue.