Network traffic eventually causes a fatal exception in interrupt. Disabling TSO prevents the bug. Likely related to recent changes to enable TSO?

Crash:
[ 487.231819] Unable to handle kernel paging request at virtual address fffffd9807000008
[ 487.239780] Mem abort info:
[ 487.242570] ESR = 0x96000006
[ 487.245620] EC = 0x25: DABT (current EL), IL = 32 bits
[ 487.250974] SET = 0, FnV = 0
[ 487.254025] EA = 0, S1PTW = 0
[ 487.257170] FSC = 0x06: level 2 translation fault
[ 487.262050] Data abort info:
[ 487.264921] ISV = 0, ISS = 0x00000006
[ 487.268748] CM = 0, WnR = 0
[ 487.271747] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000830fd000
[ 487.278449] [fffffd9807000008] pgd=100000277c353003, p4d=100000277c353003, pud=100000277c352003, pmd=0000000000000000
[ 487.289110] Internal error: Oops: 96000006 [#1] SMP
[ 487.293985] Modules linked in: rfkill fsl_dpaa2_ptp ltc2978 lm90 pmbus_core at24 ptp_qoriq fsl_dpaa2_eth pcs_lynx at803x phylink xgmac_mdio i2c_mux_pca954x i2c_mux sfp mdio_i2c qoriq_thermal qoriq_cpufreq layerscape_edac_mod vfat fat auth_rpcgss fuse sunrpc dpaa2_caam fsl_mc_dpio caam_jr nvme rtc_pcf2127 caamhash_desc mmc_block xhci_plat_hcd caamalg_desc regmap_spi dpaa2_console crct10dif_ce libdes ghash_ce nvme_core dwc3 caam sdhci_of_esdhc ulpi error sdhci_pltfm rtc_fsl_ftm_alarm udc_core sbsa_gwdt ahci_qoriq i2c_imx sdhci gpio_keys
[ 487.341467] CPU: 7 PID: 1772 Comm: sshd Tainted: G W -------- --- 5.18.0-0.rc3.20220422gitd569e86915b7f2f.31.fc37.aarch64 #1
[ 487.354061] Hardware name: SolidRun LX2160A Honeycomb (DT)
[ 487.359535] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 487.366485] pc : kfree+0xac/0x304
[ 487.369799] lr : kfree+0x204/0x304
[ 487.373191] sp : ffff80000c4eb120
[ 487.376493] x29: ffff80000c4eb120 x28: ffff662240c46400 x27: 0000000000000001
[ 487.383621] x26: 0000000000000001 x25: ffff662246da0cc0 x24: ffff66224af78000
[ 487.390748] x23: ffffad184f4ce008 x22: ffffad1850185000 x21: ffffad1838d13cec
[ 487.397874] x20: ffff6601c0000000 x19: fffffd9807000000 x18: 0000000000000000
[ 487.405000] x17: ffffb910cdc49000 x16: ffffad184d7d9080 x15: 0000000000004000
[ 487.412126] x14: 0000000000000008 x13: 000000000000ffff x12: 0000000000000000
[ 487.419252] x11: 0000000000000004 x10: 0000000000000001 x9 : ffffad184d7d927c
[ 487.426379] x8 : 0000000000000000 x7 : 0000000ffffffd1d x6 : ffff662240a94900
[ 487.433505] x5 : 0000000000000003 x4 : 0000000000000009 x3 : ffffad184f4ce008
[ 487.440632] x2 : ffff662243eec000 x1 : 0000000100000100 x0 : fffffc0000000000
[ 487.447758] Call trace:
[ 487.450194] kfree+0xac/0x304
[ 487.453151] dpaa2_eth_free_tx_fd.isra.0+0x33c/0x3e0 [fsl_dpaa2_eth]
[ 487.459507] dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth]
[ 487.464989] dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth]
[ 487.470122] __napi_poll.constprop.0+0x40/0x1a0
[ 487.474645] net_rx_action+0x310/0x3d4
[ 487.478384] __do_softirq+0x23c/0x6b4
[ 487.482036] __irq_exit_rcu+0x104/0x214
[ 487.485862] irq_exit_rcu+0x1c/0x50
[ 487.489339] el1_interrupt+0x38/0x70
[ 487.492907] el1h_64_irq_handler+0x18/0x24
[ 487.496993] el1h_64_irq+0x68/0x6c
[ 487.500384] __ip_finish_output+0x138/0x220
[ 487.504558] ip_finish_output+0x40/0xf4
[ 487.508384] ip_output+0xfc/0x2fc
[ 487.511689] __ip_queue_xmit+0x1c0/0x5e0
[ 487.515601] ip_queue_xmit+0x20/0x30
[ 487.519166] __tcp_transmit_skb+0x3c0/0x7cc
[ 487.523339] tcp_write_xmit+0x310/0x8ac
[ 487.527164] __tcp_push_pending_frames+0x48/0x110
[ 487.531857] tcp_push+0xbc/0x19c
[ 487.535075] tcp_sendmsg_locked+0x2ac/0xad4
[ 487.539247] tcp_sendmsg+0x44/0x6c
[ 487.542639] inet_sendmsg+0x50/0x7c
[ 487.546117] sock_sendmsg+0x60/0x70
[ 487.549595] sock_write_iter+0x98/0xe0
[ 487.553333] new_sync_write+0x124/0x130
[ 487.557159] vfs_write+0x1c8/0x210
[ 487.560550] ksys_write+0xd8/0xec
[ 487.563854] __arm64_sys_write+0x28/0x34
[ 487.567766] invoke_syscall+0x78/0x100
[ 487.571506] el0_svc_common.constprop.0+0x68/0x124
[ 487.576287] do_el0_svc+0x30/0x90
[ 487.579592] el0_svc+0x60/0x1a4
[ 487.582723] el0t_64_sync_handler+0x10c/0x140
[ 487.587070] el0t_64_sync+0x190/0x194
[ 487.590723] Code: 8b130293 b25657e0 d34cfe73 8b131813 (f9400660)
[ 487.596807] ---[ end trace 0000000000000000 ]---
[ 487.601413] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 487.608276] SMP: stopping secondary CPUs
[ 487.612206] Kernel Offset: 0x2d1845400000 from 0xffff800008000000
[ 487.618287] PHYS_OFFSET: 0xffff99fe40000000
[ 487.622457] CPU features: 0x100,00004b09,00001086
[ 487.627150] Memory Limit: none
[ 487.630196] Rebooting in 1 seconds..

Mitigation: ethtool -K ethX tso off
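For readers following the "Mem abort info" lines, here is a minimal sketch of how the ESR value 0x96000006 from the oops breaks down into the EC, IL, ISS, WnR and FSC fields the kernel prints. The bit positions follow the architectural ESR_ELx layout for data aborts; the function name `decode_esr` is only illustrative and is not part of any kernel tooling.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value into the fields shown in the oops.

    Hypothetical helper for illustration; WnR and FSC are only
    meaningful when EC indicates a data abort (as it does here).
    """
    iss = esr & 0x1FFFFFF              # bits [24:0], syndrome
    return {
        "EC": (esr >> 26) & 0x3F,      # bits [31:26], exception class
        "IL": (esr >> 25) & 0x1,       # bit 25, 1 = 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # 0 = fault on a read, 1 = on a write
        "FSC": iss & 0x3F,             # fault status code
    }


fields = decode_esr(0x96000006)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

With the value from the trace this yields EC 0x25 (data abort taken at the current EL), WnR 0 (a read) and FSC 0x06 (level 2 translation fault), matching the log: kfree read from an address that has no mapping.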
I believe this is related to commit 3dc709e0cd47c602a8d1a6747f1a91e9737eeed3
Created attachment 300920 [details] patch
Could you please try out the attached patch and let me know if it makes a difference? Thanks!
(In reply to Ioana Ciornei from comment #2)
> Created attachment 300920 [details]
> patch

@reporter: did you give this a try? It would be good to get this tested soon, as 5.18 might be one week away (maybe two).
Compiling now with the patch. I will update with test results once the build finishes.
For context, I am new to Fedora's build system specifically. I'm still verifying the patch was actually applied on my local build. I did verify the built kernel was booted. I am still seeing a crash after `ethtool -K eth3 tso on` on network traffic (mangled by serial console somewhat): [ 147.442853] DMA-API: fsl_dpaa2_eth dpni.1: device driver frees DMA] [ 147.458170] WARNING: CPU: 6 PID: 1077 at kernel/dma/debug.c:973 ch0 [ 147.466196] Modules linked in: rfkill fsl_dpaa2_ptp lm90 ltc2978 px [ 147.513931] CPU: 6 PID: 1077 Comm: ns-slapd Not tainted 5.18.0-0.r1 [ 147.524284] Hardware name: SolidRun LX2160A Honeycomb (DT) [ 147.529773] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS ) [ 147.536741] pc : check_unmap+0x590/0x930 [ 147.540672] lr : check_unmap+0x590/0x930 [ 147.544602] sp : ffff80000b97b130 [ 147.547919] x29: ffff80000b97b130 x28: ffff2653cc6f9700 x27: 000001 [ 147.555080] x26: 0000000000000003 x25: ffffb641a5032000 x24: ffffb0 [ 147.562240] x23: ffffb641a4e98c38 x22: 0000000000000000 x21: ffffb8 [ 147.569400] x20: ffff2653c27acf00 x19: ffff80000b97b200 x18: ffffff [ 147.576559] x17: 6464612065636976 x16: 65645b20657a6973 x15: 000006 [ 147.583718] x14: 0000000000000006 x13: 6972642065636976 x12: 65642e [ 147.590877] x11: 00000000ffffdfff x10: ffffb641a4f8dfb0 x9 : ffffb4 [ 147.598036] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 000008 [ 147.605194] x5 : 0000000000000001 x4 : 0000000000000000 x3 : 000007 [ 147.612353] x2 : 0000000000000103 x1 : ffff2654049fc000 x0 : 00000f [ 147.619512] Call trace: [ 147.621961] check_unmap+0x590/0x930 [ 147.625544] debug_dma_unmap_page+0x8c/0xa0 [ 147.629736] dma_unmap_page_attrs+0x130/0x1ec [ 147.634099] dpaa2_eth_free_tx_fd.isra.0+0x324/0x3e4 [fsl_dpaa2_et] [ 147.640482] dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth] [ 147.645987] dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth] [ 147.651145] __napi_poll.constprop.0+0x40/0x1a0 [ 147.655685] net_rx_action+0x310/0x3d4 [ 147.659441] __do_softirq+0x23c/0x6b4 [ 
147.663108] do_softirq+0xc4/0xdc [ 147.666429] __local_bh_enable_ip+0x1dc/0x1f0 [ 147.670790] ip_finish_output2+0x230/0x8e0 [ 147.674895] __ip_finish_output+0x12c/0x220 [ 147.679085] ip_finish_output+0x40/0xf4 [ 147.682927] ip_output+0xfc/0x2fc [ 147.686248] __ip_queue_xmit+0x1c0/0x5e0 [ 147.690177] ip_queue_xmit+0x20/0x30 [ 147.693758] __tcp_transmit_skb+0x3c0/0x7cc [ 147.697948] tcp_write_xmit+0x310/0x8ac [ 147.701788] __tcp_push_pending_frames+0x48/0x110 [ 147.706496] tcp_push+0xbc/0x19c [ 147.709730] tcp_sendmsg_locked+0x2ac/0xad4 [ 147.713919] tcp_sendmsg+0x44/0x6c [ 147.717326] inet6_sendmsg+0x50/0x80 [ 147.720908] sock_sendmsg+0x60/0x70 [ 147.724404] __sys_sendto+0xc4/0x130 [ 147.727984] __arm64_sys_sendto+0x34/0x44 [ 147.731997] invoke_syscall+0x78/0x100 [ 147.735754] el0_svc_common.constprop.0+0x104/0x124 [ 147.740639] do_el0_svc+0x30/0x90 [ 147.743962] el0_svc+0x60/0x1a4 [ 147.747110] el0t_64_sync_handler+0x10c/0x140 [ 147.751473] el0t_64_sync+0x190/0x194 [ 147.755140] irq event stamp: 28785 [ 147.758544] hardirqs last enabled at (28784): [<ffffb641a3770990>0 [ 147.768120] hardirqs last disabled at (28785): [<ffffb641a3770b90>0 [ 147.777434] softirqs last enabled at (28778): [<ffffb641a358c980>0 [ 147.786313] softirqs last disabled at (28779): [<ffffb641a26b24e8>c [ 147.794408] ---[ end trace 0000000000000000 ]--- [ 147.799029] DMA-API: Mapped at: [ 147.802172] debug_dma_map_page+0x70/0x110 [ 147.806277] dma_map_page_attrs+0x80/0xc0 [ 147.810293] dpaa2_eth_build_gso_fd.constprop.0+0x300/0x630 [fsl_d] [ 147.817276] __dpaa2_eth_tx+0x478/0x850 [fsl_dpaa2_eth] [ 147.822520] dpaa2_eth_tx+0x74/0x110 [fsl_dpaa2_eth] [ 147.827741] Unable to handle kernel paging request at virtual addr8 [ 147.835673] Mem abort info: [ 147.838528] ESR = 0x96000006 [ 147.841592] EC = 0x25: DABT (current EL), IL = 32 bits [ 147.846955] SET = 0, FnV = 0 [ 147.850018] EA = 0, S1PTW = 0 [ 147.853165] FSC = 0x06: level 2 translation fault [ 147.858091] Data abort info: [ 147.860978] ISV 
= 0, ISS = 0x00000006 [ 147.864820] CM = 0, WnR = 0 [ 147.867835] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000830 [ 147.874546] [fffffc98cd000008] pgd=100000277c353003, p4d=1000002770 [ 147.885411] Internal error: Oops: 96000006 [#1] SMP [ 147.890292] Modules linked in: rfkill fsl_dpaa2_ptp lm90 ltc2978 px [ 147.937766] CPU: 6 PID: 1077 Comm: ns-slapd Tainted: G W 1 [ 147.950712] Hardware name: SolidRun LX2160A Honeycomb (DT) [ 147.956189] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS ) [ 147.963145] pc : kfree+0xac/0x304 [ 147.966463] lr : kfree+0x204/0x304 [ 147.969859] sp : ffff80000b97b260 [ 147.973165] x29: ffff80000b97b260 x28: ffff2653cc6f9700 x27: 000001 [ 147.980300] x26: ffff2653c8d1f800 x25: ffff2653c7360cc0 x24: ffff20 [ 147.987434] x23: ffffb641a46de008 x22: ffffb641a5396000 x21: ffffb8 [ 147.994568] x20: ffff263340000000 x19: fffffc98cd000000 x18: ffffff [ 148.001702] x17: 6464612065636976 x16: ffffb641a29d75c0 x15: 000006 [ 148.008836] x14: 0000000000000008 x13: 000000000000ffff x12: 000000 [ 148.015970] x11: 0000000000000004 x10: 0000000000000001 x9 : ffffbc [ 148.023104] x8 : 0000000000000000 x7 : 0000000ffffff540 x6 : ffff20 [ 148.030238] x5 : 0000000000000003 x4 : 0000000000000102 x3 : ffffb8 [ 148.037372] x2 : ffff2654049fc000 x1 : 0000000000000101 x0 : fffff0 [ 148.044506] Call trace: [ 148.046943] kfree+0xac/0x304 [ 148.049906] dpaa2_eth_free_tx_fd.isra.0+0x358/0x3e4 [fsl_dpaa2_et] [ 148.056273] dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth] [ 148.061766] dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth] [ 148.066911] __napi_poll.constprop.0+0x40/0x1a0 [ 148.071440] net_rx_action+0x310/0x3d4 [ 148.075185] __do_softirq+0x23c/0x6b4 [ 148.078841] do_softirq+0xc4/0xdc [ 148.082149] __local_bh_enable_ip+0x1dc/0x1f0 [ 148.086499] ip_finish_output2+0x230/0x8e0 [ 148.090592] __ip_finish_output+0x12c/0x220 [ 148.094770] ip_finish_output+0x40/0xf4 [ 148.098600] ip_output+0xfc/0x2fc [ 148.101910] __ip_queue_xmit+0x1c0/0x5e0 [ 148.105827] 
ip_queue_xmit+0x20/0x30 [ 148.109398] __tcp_transmit_skb+0x3c0/0x7cc [ 148.113575] tcp_write_xmit+0x310/0x8ac [ 148.117404] __tcp_push_pending_frames+0x48/0x110 [ 148.122101] tcp_push+0xbc/0x19c [ 148.125323] tcp_sendmsg_locked+0x2ac/0xad4 [ 148.129500] tcp_sendmsg+0x44/0x6c [ 148.132896] inet6_sendmsg+0x50/0x80 [ 148.136466] sock_sendmsg+0x60/0x70 [ 148.139951] __sys_sendto+0xc4/0x130 [ 148.143519] __arm64_sys_sendto+0x34/0x44 [ 148.147521] invoke_syscall+0x78/0x100 [ 148.151266] el0_svc_common.constprop.0+0x104/0x124 [ 148.156140] do_el0_svc+0x30/0x90 [ 148.159450] el0_svc+0x60/0x1a4 [ 148.162587] el0t_64_sync_handler+0x10c/0x140 [ 148.166939] el0t_64_sync+0x190/0x194 [ 148.170597] Code: 8b130293 b25657e0 d34cfe73 8b131813 (f9400660) [ 148.176686] ---[ end trace 0000000000000000 ]--- [ 148.181295] Kernel panic - not syncing: Oops: Fatal exception in it [ 148.188161] SMP: stopping secondary CPUs [ 148.192092] Kernel Offset: 0x36419a600000 from 0xffff800008000000 [ 148.198177] PHYS_OFFSET: 0xffffd9ccc0000000 [ 148.202350] CPU features: 0x100,00004b09,00001086 [ 148.207046] Memory Limit: none [ 148.210095] Rebooting in 1 seconds..
I'll try to build with a different uname next to confirm, but I do note that the offsets appear to have changed, and the source was the same git commit as before.
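As a rough illustration of the offset comparison mentioned above, the sketch below parses the `symbol+0xoffset/0xsize` notation used in the call traces and compares the `dpaa2_eth_free_tx_fd.isra.0` frame from the first report with the one from the patched build. The helper name `parse_frame` and the regular expression are assumptions made for this example. A changed function size is consistent with the patch having been compiled in, but it does not prove it on its own.

```python
import re

# symbol+0x<offset>/0x<size>, as printed in kernel call traces
FRAME_RE = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(frame: str) -> tuple[str, int, int]:
    """Return (symbol, offset, function size) for one call-trace frame."""
    match = FRAME_RE.search(frame)
    if match is None:
        raise ValueError(f"not a call-trace frame: {frame!r}")
    return match["symbol"], int(match["offset"], 16), int(match["size"], 16)


# Frames copied from the two kfree traces in this report.
before = parse_frame("dpaa2_eth_free_tx_fd.isra.0+0x33c/0x3e0 [fsl_dpaa2_eth]")
after = parse_frame("dpaa2_eth_free_tx_fd.isra.0+0x358/0x3e4 [fsl_dpaa2_et]")

size_changed = before[2] != after[2]
print(f"{before[0]}: size {before[2]:#x} -> {after[2]:#x}, changed={size_changed}")
```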
Any progress here? Time to get this fixed in 5.18 is running out, as it likely will be released on Sunday.
Created attachment 300998 [details]
fixup annotation field used + dma_unmap before access

Hi, I will give it one more try with the attached patch. It seems that not only is the position of the dma_unmap wrong, but so is the software annotation field used for the DMA unmap size: I was reading it from the wrong field. This patch tries to fix both issues. Unfortunately, I am still unable to reproduce this, even though I have been running a TCP Tx flow for some time now. @reporter, please give this new patch a try.
@reporter, did you happen to give this a try? Also, can I contact you directly by email? Is the one in your username the correct one?
I will try now. If that does not work, I'll check out a source tree and bypass the Fedora kernel build system, which is clearly nice but which I do not understand well. I can receive email at this address.
Unfortunate update: I ended up with a bad flash and need to do some complicated recovery. ETA for the test run is 24h.
Tested now with latest patch, I still am seeing panics after `ethtool -K eth3 tso on`. If it helps, I'm using MTU 9000 and a dpl that breaks out to 4x 10g interfaces and 1x 1g interface, eth3 is one of the 10g PHYs. Now building with 5.18.0-0.rc7.20220519gitf993aed406ea.56.fc37.aarch64. I'll try again to make sure I am using the latest DTB as well, but it appears the issue remains for me. Trace: [ 122.735365] Unable to handle kernel paging request at virtual address fffffd2477000008 [ 122.735919] Unable to handle kernel paging request at virtual address fffffd2477000008 [ 122.743314] Mem abort info: [ 122.751243] Mem abort info: [ 122.751250] ESR = 0x96000006 [ 122.751256] EC = 0x25: DABT (current EL), IL = 32 bits [ 122.751262] SET = 0, FnV = 0 [ 122.751267] EA = 0, S1PTW = 0 [ 122.751272] FSC = 0x06: level 2 translation fault [ 122.751278] Data abort info: [ 122.751282] ISV = 0, ISS = 0x00000006 [ 122.754066] ESR = 0x96000006 [ 122.756852] CM = 0, WnR = 0 [ 122.759922] EC = 0x25: DABT (current EL), IL = 32 bits [ 122.765229] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000083106000 [ 122.768296] SET = 0, FnV = 0 [ 122.771432] [fffffd2477000008] pgd=100000277c353003 [ 122.776300] EA = 0, S1PTW = 0 [ 122.779195] , p4d=100000277c353003 [ 122.783022] FSC = 0x06: level 2 translation fault [ 122.786068] , pud=100000277c352003 [ 122.789048] Data abort info: [ 122.789054] ISV = 0, ISS = 0x00000006 [ 122.789059] CM = 0, WnR = 0 [ 122.789064] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000083106000 [ 122.789071] [fffffd2477000008] pgd=100000277c353003, p4d=100000277c353003, pud=100000277c352003 [ 122.794384] , pmd=0000000000000000 [ 122.801096] , pmd=0000000000000000 [ 122.804145] [ 122.809036] [ 122.812299] Internal error: Oops: 96000006 [#1] SMP [ 122.863418] Modules linked in: rfkill fsl_dpaa2_ptp ltc2978 pmbus_core lm90 at24 at803x ptp_qoriq fsl_dpaa2_eth pcs_lynx phylink xgmac_mdio vfat i2c_mux_pca954x fat i2c_mux sfp mdio_i2c qoriq_thermal layerscape_edac_mod 
qoriq_cpufreq auth_rpcgss fuse sunrpc dpaa2_caam caam_jr fsl_mc_dpio ca amhash_desc xhci_plat_hcd rtc_pcf2127 caamalg_desc mmc_block regmap_spi crct10dif_ce ghash_ce libdes dpaa2_console dwc3 caam nvme nvme_core ulpi sd hci_of_esdhc error ahci_qoriq udc_core sdhci_pltfm sbsa_gwdt rtc_fsl_ftm_alarm sdhci i2c_imx phy_fsl_lynx_28g gpio_keys [ 122.912294] CPU: 3 PID: 1789 Comm: sshd Not tainted 5.18.0-0.rc7.20220519gitf993aed406ea.56.fc37.aarch64 #1 [ 122.922023] Hardware name: SolidRun LX2160A Honeycomb (DT) [ 122.927496] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 122.934447] pc : kfree+0xac/0x304 [ 122.937760] lr : kfree+0x204/0x304 [ 122.941151] sp : ffff80000b9531a0 [ 122.944454] x29: ffff80000b9531a0 x28: ffff493e4ba0c600 x27: 0000000000000001 [ 122.951581] x26: ffff493e46d9f800 x25: ffff493e48ca0cc0 x24: ffff493e4b03d000 [ 122.958708] x23: ffffdc18370de008 x22: ffffdc1837d98000 x21: ffffdc17f6834d04 [ 122.965835] x20: ffff491dc0000000 x19: fffffd2477000000 x18: 0000000000000000 [ 122.972962] x17: 0000000000000000 x16: ffffdc18353d7690 x15: 0000000000000000 [ 122.980089] x14: 0000000000000008 x13: 000000000000ffff x12: 0000000000000000 [ 122.987216] x11: 0000000000000004 x10: 0000000000000001 x9 : ffffdc183517108c [ 122.994342] x8 : 0000000000000000 x7 : 0000000ffffe60c2 x6 : ffff493e40a8e600 [ 123.001468] x5 : 0000000000000003 x4 : 0000000000000102 x3 : ffffdc18370de008 [ 123.008594] x2 : ffff493e54028000 x1 : 0000000100000101 x0 : fffffc0000000000 [ 123.015721] Call trace: [ 123.018156] kfree+0xac/0x304 [ 123.021114] dpaa2_eth_free_tx_fd.isra.0+0x354/0x3e0 [fsl_dpaa2_eth] [ 123.027471] dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth] [ 123.032952] dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth] [ 123.038086] __napi_poll.constprop.0+0x40/0x1a0 [ 123.042608] net_rx_action+0x310/0x3d4 [ 123.046346] __do_softirq+0x23c/0x6b4 [ 123.049999] do_softirq+0xc4/0xdc [ 123.053304] __local_bh_enable_ip+0x1dc/0x1f0 [ 123.057649] ip_finish_output2+0x230/0x8e0 [ 
123.061737] __ip_finish_output+0x12c/0x220 [ 123.065910] ip_finish_output+0x40/0xf4 [ 123.069736] ip_output+0xfc/0x2fc [ 123.073041] __ip_queue_xmit+0x1c0/0x5e0 [ 123.076954] ip_queue_xmit+0x20/0x30 [ 123.080519] __tcp_transmit_skb+0x3c0/0x7cc [ 123.084693] tcp_write_xmit+0x310/0x8ac [ 123.088518] __tcp_push_pending_frames+0x48/0x110 [ 123.093211] tcp_rcv_established+0x420/0x950 [ 123.097469] tcp_v4_do_rcv+0x238/0x32c [ 123.101211] __release_sock+0x64/0x11c [ 123.104951] release_sock+0x44/0xe0 [ 123.108430] tcp_sendmsg+0x54/0x6c [ 123.111822] inet_sendmsg+0x50/0x7c [ 123.115301] sock_sendmsg+0x60/0x70 [ 123.118777] sock_write_iter+0x98/0xe0 [ 123.122515] new_sync_write+0x124/0x130 [ 123.126341] vfs_write+0x1c8/0x210 [ 123.129732] ksys_write+0xd8/0xec [ 123.133036] __arm64_sys_write+0x28/0x34 [ 123.136947] invoke_syscall+0x78/0x100 [ 123.140688] el0_svc_common.constprop.0+0x68/0x124 [ 123.145469] do_el0_svc+0x30/0x90 [ 123.148774] el0_svc+0x60/0x1a4 [ 123.151906] el0t_64_sync_handler+0x10c/0x140 [ 123.156253] el0t_64_sync+0x190/0x194 [ 123.159906] Code: 8b130293 b25657e0 d34cfe73 8b131813 (f9400660) [ 123.165991] ---[ end trace 0000000000000000 ]--- [ 123.170597] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 123.177460] SMP: stopping secondary CPUs [ 124.227686] SMP: failed to stop secondary CPUs 3,8 [ 124.232468] Kernel Offset: 0x5c182d000000 from 0xffff800008000000 [ 124.238550] PHYS_OFFSET: 0xffffb6e240000000 [ 124.242720] CPU features: 0x100,00004b09,00001086 [ 124.247412] Memory Limit: none [ 124.250457] Rebooting in 1 seconds.. [ 125.254065] SMP: stopping secondary CPUs [ 126.304276] SMP: failed to stop secondary CPUs 3,8
Ensuring the latest DTB/DTS did not change the result.

My steps to reproduce:
- MTU 9000 and 10G DAC SFP+
- ethtool -K eth3 tso on
- ssh from another host
- type `dmesg` in ssh session
- observe panic on serial console
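The steps above can be written down as a small dry-run sketch that only prints the commands rather than executing them. The `ethtool -K eth3 tso on` and `tso off` invocations come from this report; setting the MTU with `ip link set dev ... mtu ...` is an assumption about how the interface was configured, and the function names are hypothetical.

```python
import shlex


def repro_commands(iface: str = "eth3", mtu: int = 9000) -> list[list[str]]:
    """Commands to put the interface into the state that triggers the panic."""
    return [
        # Assumption: the reporter's MTU was set with iproute2.
        ["ip", "link", "set", "dev", iface, "mtu", str(mtu)],
        # From the report: enabling TSO is what exposes the bug.
        ["ethtool", "-K", iface, "tso", "on"],
    ]


def mitigation_command(iface: str = "eth3") -> list[str]:
    """The workaround given in the original report."""
    return ["ethtool", "-K", iface, "tso", "off"]


# Dry run: print only, since running these on affected hardware may panic it.
for command in repro_commands():
    print(shlex.join(command))
print("# then ssh in from another host and run dmesg in that session")
print("# workaround:", shlex.join(mitigation_command()))
```

Keeping this as a dry run is deliberate: on an affected kernel the second command followed by TCP transmit traffic is expected to bring the machine down.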
Yeah, I am really sorry that it took me so long to reproduce this. I was running with IOMMU passthrough, which hid the bugs. I just sent a patch set upstream with the fixes: https://patchwork.kernel.org/project/netdevbpf/list/?series=644071&state=%2A&archive=both Anyhow, thanks a lot for reporting this and, again, sorry for the wait.