Bug 217777 - kernel: WARNING: CPU: 0 PID: 16269 at net/netfilter/nf_conntrack_core.c:1210
Summary: kernel: WARNING: CPU: 0 PID: 16269 at net/netfilter/nf_conntrack_core.c:1210
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Netfilter/Iptables
Hardware: All Linux
Importance: P3 normal
Assignee: networking_netfilter-iptables@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-08 17:44 UTC by LimeTech
Modified: 2023-09-12 18:11 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Non-functional test case (3.78 KB, application/x-shellscript)
2023-08-11 10:31 UTC, Florian Westphal

Description LimeTech 2023-08-08 17:44:11 UTC
This call trace occurs consistently when using the docker macvlan driver where the macvlan network's parent interface is a Linux bridge.

Below is the typical configuration of br0 and the docker macvlan network:

# commands to configure eth0 / br0 (br0 uses dhcp to obtain IP address)
ip link add name br0 type bridge stp_state 0 forward_delay 0 nf_call_iptables 1 nf_call_ip6tables 1 nf_call_arptables 1
ip link set br0 up
ip link set eth0 down
ip -4 addr flush dev eth0
ip -6 addr flush dev eth0
ip link set eth0 promisc on master br0 up

# command to configure docker macvlan network on br0
docker network create -d macvlan --subnet=10.0.101.0/24 --gateway=10.0.101.1 --aux-address=server=10.0.101.13 -o parent=br0 br0
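
To double-check that the bridge really ended up with the netfilter hooks enabled as configured above, something like the following may help. It is only a sketch: the function name bridge_nf_flags and its optional second argument (an alternate sysfs root, useful for dry runs) are made up for illustration, and it assumes the bridge exposes its settings under /sys/class/net/<bridge>/bridge/.

```shell
#!/bin/sh
# Sketch only: print the bridge netfilter toggles for one bridge.
# bridge_nf_flags and the sysfs-root override are hypothetical helpers,
# not part of iproute2 or docker.
bridge_nf_flags() {
    br="$1"
    root="${2:-/sys/class/net}"   # override to run against a fake tree
    for flag in nf_call_iptables nf_call_ip6tables nf_call_arptables; do
        path="$root/$br/bridge/$flag"
        if [ -r "$path" ]; then
            printf '%s=%s\n' "$flag" "$(cat "$path")"
        else
            # Attribute missing: not a bridge, or bridge netfilter support absent.
            printf '%s=unavailable\n' "$flag"
        fi
    done
}
```

With the setup above, running `bridge_nf_flags br0` would be expected to report 1 for all three flags; any other value suggests the bridge is not configured the way this report assumes.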

We are currently running the 6.1.43 kernel, but this issue has been happening with all previous kernels for at least a couple of years now.  For example:
https://www.spinics.net/lists/netfilter/msg59040.html

syzbot also detected this same issue in 6.5.0-rc2:
https://www.spinics.net/lists/netfilter-devel/msg81831.html

Finally, we also tried creating a vhost interface attached to br0 and setting that as the docker macvlan "parent", with the same result.


Aug  8 10:06:51 ODROID kernel: ------------[ cut here ]------------
Aug  8 10:06:51 ODROID kernel: WARNING: CPU: 0 PID: 16269 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug  8 10:06:51 ODROID kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs cmac cifs asn1_decoder cifs_arc4 cifs_md4 oid_registry dns_resolver md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag emc2103 iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp macvlan bridge stp llc bonding tls r8125(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 iosf_mbi drm_buddy
Aug  8 10:06:51 ODROID kernel: aesni_intel i2c_algo_bit ttm drm_display_helper crypto_simd drm_kms_helper cryptd mei_hdcp mei_pxp nvme rapl i2c_i801 sr_mod input_leds intel_cstate drm i2c_smbus cdrom nvme_core joydev led_class video wmi intel_gtt ahci agpgart backlight libahci mei_me intel_pmc_core i2c_core syscopyarea sysfillrect mei sysimgblt fb_sys_fops thermal fan button unix [last unloaded: r8125(O)]
Aug  8 10:06:51 ODROID kernel: CPU: 0 PID: 16269 Comm: kworker/u8:0 Tainted: P           O       6.1.43-Unraid #2
Aug  8 10:06:51 ODROID kernel: Hardware name: HARDKERNEL ODROID-H2/ODROID-H2, BIOS 5.13 04/27/2020
Aug  8 10:06:51 ODROID kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Aug  8 10:06:51 ODROID kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug  8 10:06:51 ODROID kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Aug  8 10:06:51 ODROID kernel: RSP: 0018:ffffc90000003d98 EFLAGS: 00010202
Aug  8 10:06:51 ODROID kernel: RAX: 0000000000000001 RBX: ffff88814a7ac400 RCX: 9c0c70d57470940d
Aug  8 10:06:51 ODROID kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88814a7ac400
Aug  8 10:06:51 ODROID kernel: RBP: 0000000000000001 R08: 2fcce5ef4761f5d5 R09: 28e40e9ae48c7a5f
Aug  8 10:06:51 ODROID kernel: R10: f5516b05dfc149e9 R11: ffffc90000003d60 R12: ffffffff82a11d00
Aug  8 10:06:51 ODROID kernel: R13: 0000000000011231 R14: ffff88814c989d00 R15: 0000000000000000
Aug  8 10:06:51 ODROID kernel: FS:  0000000000000000(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000
Aug  8 10:06:51 ODROID kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  8 10:06:51 ODROID kernel: CR2: 0000000000537d30 CR3: 000000018567a000 CR4: 0000000000350ef0
Aug  8 10:06:51 ODROID kernel: Call Trace:
Aug  8 10:06:51 ODROID kernel: <IRQ>
Aug  8 10:06:51 ODROID kernel: ? __warn+0xab/0x122
Aug  8 10:06:51 ODROID kernel: ? report_bug+0x109/0x17e
Aug  8 10:06:51 ODROID kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug  8 10:06:51 ODROID kernel: ? handle_bug+0x41/0x6f
Aug  8 10:06:51 ODROID kernel: ? exc_invalid_op+0x13/0x60
Aug  8 10:06:51 ODROID kernel: ? asm_exc_invalid_op+0x16/0x20
Aug  8 10:06:51 ODROID kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug  8 10:06:51 ODROID kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Aug  8 10:06:51 ODROID kernel: ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]
Aug  8 10:06:51 ODROID kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Aug  8 10:06:51 ODROID kernel: nf_hook_slow+0x3a/0x96
Aug  8 10:06:51 ODROID kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Aug  8 10:06:51 ODROID kernel: NF_HOOK.constprop.0+0x79/0xd9
Aug  8 10:06:51 ODROID kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Aug  8 10:06:51 ODROID kernel: __netif_receive_skb_one_core+0x77/0x9c
Aug  8 10:06:51 ODROID kernel: process_backlog+0x8c/0x116
Aug  8 10:06:51 ODROID kernel: __napi_poll.constprop.0+0x28/0x124
Aug  8 10:06:51 ODROID kernel: net_rx_action+0x159/0x24f
Aug  8 10:06:51 ODROID kernel: __do_softirq+0x126/0x288
Aug  8 10:06:51 ODROID kernel: do_softirq+0x7f/0xab
Aug  8 10:06:51 ODROID kernel: </IRQ>
Aug  8 10:06:51 ODROID kernel: <TASK>
Aug  8 10:06:51 ODROID kernel: __local_bh_enable_ip+0x4c/0x6b
Aug  8 10:06:51 ODROID kernel: netif_rx+0x52/0x5a
Aug  8 10:06:51 ODROID kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Aug  8 10:06:51 ODROID kernel: ? _raw_spin_unlock+0x14/0x29
Aug  8 10:06:51 ODROID kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Aug  8 10:06:51 ODROID kernel: process_one_work+0x1a8/0x295
Aug  8 10:06:51 ODROID kernel: worker_thread+0x18b/0x244
Aug  8 10:06:51 ODROID kernel: ? rescuer_thread+0x281/0x281
Aug  8 10:06:51 ODROID kernel: kthread+0xe4/0xef
Aug  8 10:06:51 ODROID kernel: ? kthread_complete_and_exit+0x1b/0x1b
Aug  8 10:06:51 ODROID kernel: ret_from_fork+0x1f/0x30
Aug  8 10:06:51 ODROID kernel: </TASK>
Aug  8 10:06:51 ODROID kernel: ---[ end trace 0000000000000000 ]---
Comment 1 Florian Westphal 2023-08-11 10:31:01 UTC
Created attachment 304816
Non-functional test case

Conntrack has design assumptions that get broken by bridge netfilter.
However, I fail to reproduce this problem.

Attaching a non-functional reproducer.  Can you please confirm if the setup is correct (macvlan on top of bridge)? I tried persistent flooding, but so far to no avail (the clones done by br_flood are processed in serial fashion so not surprising that this doesn't trigger any problem).

I can see problematic skb_clone() in bridge (br_flood and friends), but not in the macvlan driver. By the time the packet makes it to macvlan, the conntrack entry is already confirmed, so skb_clone() is safe.

Is there anything else about the setup that makes a difference?
Comment 2 LimeTech 2023-08-18 17:16:20 UTC
(In reply to Florian Westphal from comment #1)
> Created attachment 304816
> Non-functional test case
> 
> Conntrack has design assumptions that get broken by bridge netfilter.
> However, I fail to reproduce this problem.
> 
> Attaching a non-functional reproducer.  Can you please confirm if the setup
> is correct (macvlan on top of bridge)? I tried persistent flooding, but so
> far to no avail (the clones done by br_flood are processed in serial fashion
> so not surprising that this doesn't trigger any problem).
> 
> I can see problematic skb_clone() in bridge (br_flood and friends), but not
> in the macvlan driver. By the time the packet makes it to macvlan, the conntrack
> entry is already confirmed, so skb_clone() is safe.
> 
> Is there anything else about the setup that makes a difference?

The only reliable way we can reproduce this issue is to start docker and set up a macvlan network where the parent interface is a bridge on the host.  You do not even need to start a container.  See: https://docs.docker.com/network/network-tutorial-macvlan/

If the parent interface is not a bridge, the problem doesn't seem to happen.

If you physically disconnect the ethernet cable, the problem doesn't seem to happen either.
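
For anyone trying to reproduce this, the two conditions above (parent is a bridge, link is up) can be checked with a small helper along these lines. This is a rough sketch, not something we ship: parent_state and its optional sysfs-root argument are hypothetical names, and it assumes the usual /sys/class/net/<dev>/bridge directory and /sys/class/net/<dev>/carrier attribute.

```shell
#!/bin/sh
# Sketch only: report whether a macvlan parent is a bridge and has carrier.
# parent_state and the sysfs-root override are hypothetical.
parent_state() {
    dev="$1"
    root="${2:-/sys/class/net}"   # override to run against a fake tree
    if [ -d "$root/$dev/bridge" ]; then
        echo "is_bridge=yes"
    else
        echo "is_bridge=no"
    fi
    # Reading carrier can fail (e.g. interface administratively down),
    # so fall back to "unknown" instead of aborting.
    carrier=$(cat "$root/$dev/carrier" 2>/dev/null) || carrier=unknown
    echo "carrier=${carrier:-unknown}"
}
```

It is probably worth running it for both the bridge and its physical port (`parent_state br0`, `parent_state eth0`), since the cable state is a property of the port.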
Comment 3 mgutt 2023-09-12 18:11:40 UTC
As a side note: the nf_nat_setup_info errors may have the same cause, since they can be resolved by the same workarounds:
https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/?tab=comments
