This call trace occurs consistently when using the docker macvlan driver where the macvlan network's parent is a Linux bridge. Below is the typical configuration of br0 and the docker macvlan network.

# commands to configure eth0 / br0 (br0 uses dhcp to obtain IP address)
ip link add name br0 type bridge stp_state 0 forward_delay 0 nf_call_iptables 1 nf_call_ip6tables 1 nf_call_arptables 1
ip link set br0 up
ip link set eth0 down
ip -4 addr flush dev eth0
ip -6 addr flush dev eth0
ip link set eth0 promisc on master br0 up

# command to configure docker macvlan network on br0
docker network create -d macvlan --subnet=10.0.101.0/24 --gateway=10.0.101.1 --aux-address=server=10.0.101.13 -o parent=br0 br0

We are currently running the 6.1.43 kernel, but this issue has been happening with all previous kernels for at least a couple of years now. For example:
https://www.spinics.net/lists/netfilter/msg59040.html

Also, syzbot detected this same issue in 6.5.0-rc2:
https://www.spinics.net/lists/netfilter-devel/msg81831.html

Finally, we also tried creating a vhost interface attached to br0 and then set that as the docker macvlan "parent" - same result.

Aug 8 10:06:51 ODROID kernel: ------------[ cut here ]------------
Aug 8 10:06:51 ODROID kernel: WARNING: CPU: 0 PID: 16269 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug 8 10:06:51 ODROID kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs cmac cifs asn1_decoder cifs_arc4 cifs_md4 oid_registry dns_resolver md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag emc2103 iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp macvlan bridge stp llc bonding tls r8125(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 iosf_mbi drm_buddy
Aug 8 10:06:51 ODROID kernel: aesni_intel i2c_algo_bit ttm drm_display_helper crypto_simd drm_kms_helper cryptd mei_hdcp mei_pxp nvme rapl i2c_i801 sr_mod input_leds intel_cstate drm i2c_smbus cdrom nvme_core joydev led_class video wmi intel_gtt ahci agpgart backlight libahci mei_me intel_pmc_core i2c_core syscopyarea sysfillrect mei sysimgblt fb_sys_fops thermal fan button unix [last unloaded: r8125(O)]
Aug 8 10:06:51 ODROID kernel: CPU: 0 PID: 16269 Comm: kworker/u8:0 Tainted: P O 6.1.43-Unraid #2
Aug 8 10:06:51 ODROID kernel: Hardware name: HARDKERNEL ODROID-H2/ODROID-H2, BIOS 5.13 04/27/2020
Aug 8 10:06:51 ODROID kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Aug 8 10:06:51 ODROID kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug 8 10:06:51 ODROID kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Aug 8 10:06:51 ODROID kernel: RSP: 0018:ffffc90000003d98 EFLAGS: 00010202
Aug 8 10:06:51 ODROID kernel: RAX: 0000000000000001 RBX: ffff88814a7ac400 RCX: 9c0c70d57470940d
Aug 8 10:06:51 ODROID kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88814a7ac400
Aug 8 10:06:51 ODROID kernel: RBP: 0000000000000001 R08: 2fcce5ef4761f5d5 R09: 28e40e9ae48c7a5f
Aug 8 10:06:51 ODROID kernel: R10: f5516b05dfc149e9 R11: ffffc90000003d60 R12: ffffffff82a11d00
Aug 8 10:06:51 ODROID kernel: R13: 0000000000011231 R14: ffff88814c989d00 R15: 0000000000000000
Aug 8 10:06:51 ODROID kernel: FS: 0000000000000000(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000
Aug 8 10:06:51 ODROID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 8 10:06:51 ODROID kernel: CR2: 0000000000537d30 CR3: 000000018567a000 CR4: 0000000000350ef0
Aug 8 10:06:51 ODROID kernel: Call Trace:
Aug 8 10:06:51 ODROID kernel: <IRQ>
Aug 8 10:06:51 ODROID kernel: ? __warn+0xab/0x122
Aug 8 10:06:51 ODROID kernel: ? report_bug+0x109/0x17e
Aug 8 10:06:51 ODROID kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug 8 10:06:51 ODROID kernel: ? handle_bug+0x41/0x6f
Aug 8 10:06:51 ODROID kernel: ? exc_invalid_op+0x13/0x60
Aug 8 10:06:51 ODROID kernel: ? asm_exc_invalid_op+0x16/0x20
Aug 8 10:06:51 ODROID kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug 8 10:06:51 ODROID kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Aug 8 10:06:51 ODROID kernel: ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]
Aug 8 10:06:51 ODROID kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Aug 8 10:06:51 ODROID kernel: nf_hook_slow+0x3a/0x96
Aug 8 10:06:51 ODROID kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Aug 8 10:06:51 ODROID kernel: NF_HOOK.constprop.0+0x79/0xd9
Aug 8 10:06:51 ODROID kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Aug 8 10:06:51 ODROID kernel: __netif_receive_skb_one_core+0x77/0x9c
Aug 8 10:06:51 ODROID kernel: process_backlog+0x8c/0x116
Aug 8 10:06:51 ODROID kernel: __napi_poll.constprop.0+0x28/0x124
Aug 8 10:06:51 ODROID kernel: net_rx_action+0x159/0x24f
Aug 8 10:06:51 ODROID kernel: __do_softirq+0x126/0x288
Aug 8 10:06:51 ODROID kernel: do_softirq+0x7f/0xab
Aug 8 10:06:51 ODROID kernel: </IRQ>
Aug 8 10:06:51 ODROID kernel: <TASK>
Aug 8 10:06:51 ODROID kernel: __local_bh_enable_ip+0x4c/0x6b
Aug 8 10:06:51 ODROID kernel: netif_rx+0x52/0x5a
Aug 8 10:06:51 ODROID kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Aug 8 10:06:51 ODROID kernel: ? _raw_spin_unlock+0x14/0x29
Aug 8 10:06:51 ODROID kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Aug 8 10:06:51 ODROID kernel: process_one_work+0x1a8/0x295
Aug 8 10:06:51 ODROID kernel: worker_thread+0x18b/0x244
Aug 8 10:06:51 ODROID kernel: ? rescuer_thread+0x281/0x281
Aug 8 10:06:51 ODROID kernel: kthread+0xe4/0xef
Aug 8 10:06:51 ODROID kernel: ? kthread_complete_and_exit+0x1b/0x1b
Aug 8 10:06:51 ODROID kernel: ret_from_fork+0x1f/0x30
Aug 8 10:06:51 ODROID kernel: </TASK>
Aug 8 10:06:51 ODROID kernel: ---[ end trace 0000000000000000 ]---

Created attachment 304816 [details]
Non-functional test case

Conntrack has design assumptions that get broken by bridge netfilter. However, I fail to reproduce this problem.

Attaching a non-functional reproducer. Can you please confirm if the setup is correct (macvlan on top of bridge)? I tried persistent flooding, but so far to no avail (the clones done by br_flood are processed in serial fashion so not surprising that this doesn't trigger any problem).

I can see problematic skb_clone() in bridge (br_flood and friends), but not in macvlan driver. By the time packet makes it to macvlan the conntrack entry is already confirmed, so skb_clone() is safe.

Is there anything else about the setup that makes a difference?

(In reply to Florian Westphal from comment #1)
> Created attachment 304816 [details]
> Non-functional test case
>
> Conntrack has design assumptions that get broken by bridge netfilter.
> However, I fail to reproduce this problem.
>
> Attaching a non-functional reproducer. Can you please confirm if the setup
> is correct (macvlan on top of bridge)? I tried persistent flooding, but so
> far to no avail (the clones done by br_flood are processed in serial fashion
> so not surprising that this doesn't trigger any problem).
>
> I can see problematic skb_clone() in bridge (br_flood and friends), but not
> in macvlan driver. By the time packet makes it to macvlan the conntrack
> entry is already confirmed, so skb_clone() is safe.
>
> Is there anything else about the setup that makes a difference?

The only reliable way we can reproduce this issue is to start docker, setting up a macvlan network where the parent interface is a bridge on the host. You do not even need to start a container.

see: https://docs.docker.com/network/network-tutorial-macvlan/

If the parent interface is not a bridge, the problem doesn't seem to happen. If you physically disconnect the ethernet cable, the problem doesn't seem to happen either.

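For anyone trying to reproduce this, here is a minimal sketch (not part of the original report) of a helper that checks whether the warning has fired once the macvlan network exists. The function name count_conntrack_warnings is hypothetical. It matches only the WARNING signature from the trace in this report and deliberately ignores the source line number, on the assumption that it differs between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: count kernel-log lines carrying the conntrack WARNING
# signature seen in this report. Reads log text on stdin, prints the count.
count_conntrack_warnings() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status clean.
    grep -c 'WARNING: .*nf_conntrack_core\.c:[0-9]* __nf_conntrack_confirm' || true
}

# Example usage (needs permission to read the kernel log):
#   dmesg | count_conntrack_warnings
```

Running it before and after creating the macvlan network on br0 and comparing the two counts should show whether a new warning was logged in between.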
As a side note: the nf_nat_setup_info errors may have the same source, as they can be solved by the same workarounds:
https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/?tab=comments