Bug 215886

Summary: dpaa2: TSO offload on lx2160a causes fatal exception in interrupt
Product: Drivers
Reporter: kernelbugs
Component: Network
Assignee: drivers_network (drivers_network)
Status: NEW
Severity: normal
CC: ciorneiioana, regressions
Priority: P1
Hardware: ARM
OS: Linux
Kernel Version: 5.18.0-0.rc3.20220422gitd569e86915b7f2f.31.fc37.aarch64
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: patch
fixup annotation field used + dma_unmap before access

Description kernelbugs 2022-04-25 18:15:38 UTC
With TSO enabled, network traffic eventually triggers a kernel panic ("Fatal exception in interrupt"). Disabling TSO prevents the crash. Could this be related to the recent changes that enabled TSO?

Crash:
[  487.231819] Unable to handle kernel paging request at virtual address fffffd9807000008
[  487.239780] Mem abort info:
[  487.242570]   ESR = 0x96000006
[  487.245620]   EC = 0x25: DABT (current EL), IL = 32 bits
[  487.250974]   SET = 0, FnV = 0
[  487.254025]   EA = 0, S1PTW = 0
[  487.257170]   FSC = 0x06: level 2 translation fault
[  487.262050] Data abort info:
[  487.264921]   ISV = 0, ISS = 0x00000006
[  487.268748]   CM = 0, WnR = 0
[  487.271747] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000830fd000
[  487.278449] [fffffd9807000008] pgd=100000277c353003, p4d=100000277c353003, pud=100000277c352003, pmd=0000000000000000
[  487.289110] Internal error: Oops: 96000006 [#1] SMP
[  487.293985] Modules linked in: rfkill fsl_dpaa2_ptp ltc2978 lm90 pmbus_core at24 ptp_qoriq fsl_dpaa2_eth pcs_lynx at803x phylink xgmac_mdio i2c_mux_pca954x i2c_mux sfp mdio_i2c qoriq_thermal qoriq_cpufreq layerscape_edac_mod vfat fat auth_rpcgss fuse sunrpc dpaa2_caam fsl_mc_dpio caam_jr nvme rtc_pcf2127 caamhash_desc mmc_block xhci_plat_hcd caamalg_desc regmap_spi dpaa2_console crct10dif_ce libdes ghash_ce nvme_core dwc3 caam sdhci_of_esdhc ulpi error sdhci_pltfm rtc_fsl_ftm_alarm udc_core sbsa_gwdt ahci_qoriq i2c_imx sdhci gpio_keys
[  487.341467] CPU: 7 PID: 1772 Comm: sshd Tainted: G        W        --------  ---  5.18.0-0.rc3.20220422gitd569e86915b7f2f.31.fc37.aarch64 #1
[  487.354061] Hardware name: SolidRun LX2160A Honeycomb (DT)
[  487.359535] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  487.366485] pc : kfree+0xac/0x304
[  487.369799] lr : kfree+0x204/0x304
[  487.373191] sp : ffff80000c4eb120
[  487.376493] x29: ffff80000c4eb120 x28: ffff662240c46400 x27: 0000000000000001
[  487.383621] x26: 0000000000000001 x25: ffff662246da0cc0 x24: ffff66224af78000
[  487.390748] x23: ffffad184f4ce008 x22: ffffad1850185000 x21: ffffad1838d13cec
[  487.397874] x20: ffff6601c0000000 x19: fffffd9807000000 x18: 0000000000000000
[  487.405000] x17: ffffb910cdc49000 x16: ffffad184d7d9080 x15: 0000000000004000
[  487.412126] x14: 0000000000000008 x13: 000000000000ffff x12: 0000000000000000
[  487.419252] x11: 0000000000000004 x10: 0000000000000001 x9 : ffffad184d7d927c
[  487.426379] x8 : 0000000000000000 x7 : 0000000ffffffd1d x6 : ffff662240a94900
[  487.433505] x5 : 0000000000000003 x4 : 0000000000000009 x3 : ffffad184f4ce008
[  487.440632] x2 : ffff662243eec000 x1 : 0000000100000100 x0 : fffffc0000000000
[  487.447758] Call trace:
[  487.450194]  kfree+0xac/0x304
[  487.453151]  dpaa2_eth_free_tx_fd.isra.0+0x33c/0x3e0 [fsl_dpaa2_eth]
[  487.459507]  dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth]
[  487.464989]  dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth]
[  487.470122]  __napi_poll.constprop.0+0x40/0x1a0
[  487.474645]  net_rx_action+0x310/0x3d4
[  487.478384]  __do_softirq+0x23c/0x6b4
[  487.482036]  __irq_exit_rcu+0x104/0x214
[  487.485862]  irq_exit_rcu+0x1c/0x50
[  487.489339]  el1_interrupt+0x38/0x70
[  487.492907]  el1h_64_irq_handler+0x18/0x24
[  487.496993]  el1h_64_irq+0x68/0x6c
[  487.500384]  __ip_finish_output+0x138/0x220
[  487.504558]  ip_finish_output+0x40/0xf4
[  487.508384]  ip_output+0xfc/0x2fc
[  487.511689]  __ip_queue_xmit+0x1c0/0x5e0
[  487.515601]  ip_queue_xmit+0x20/0x30
[  487.519166]  __tcp_transmit_skb+0x3c0/0x7cc
[  487.523339]  tcp_write_xmit+0x310/0x8ac
[  487.527164]  __tcp_push_pending_frames+0x48/0x110
[  487.531857]  tcp_push+0xbc/0x19c
[  487.535075]  tcp_sendmsg_locked+0x2ac/0xad4
[  487.539247]  tcp_sendmsg+0x44/0x6c
[  487.542639]  inet_sendmsg+0x50/0x7c
[  487.546117]  sock_sendmsg+0x60/0x70
[  487.549595]  sock_write_iter+0x98/0xe0
[  487.553333]  new_sync_write+0x124/0x130
[  487.557159]  vfs_write+0x1c8/0x210
[  487.560550]  ksys_write+0xd8/0xec
[  487.563854]  __arm64_sys_write+0x28/0x34
[  487.567766]  invoke_syscall+0x78/0x100
[  487.571506]  el0_svc_common.constprop.0+0x68/0x124
[  487.576287]  do_el0_svc+0x30/0x90
[  487.579592]  el0_svc+0x60/0x1a4
[  487.582723]  el0t_64_sync_handler+0x10c/0x140
[  487.587070]  el0t_64_sync+0x190/0x194
[  487.590723] Code: 8b130293 b25657e0 d34cfe73 8b131813 (f9400660) 
[  487.596807] ---[ end trace 0000000000000000 ]---
[  487.601413] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  487.608276] SMP: stopping secondary CPUs
[  487.612206] Kernel Offset: 0x2d1845400000 from 0xffff800008000000
[  487.618287] PHYS_OFFSET: 0xffff99fe40000000
[  487.622457] CPU features: 0x100,00004b09,00001086
[  487.627150] Memory Limit: none
[  487.630196] Rebooting in 1 seconds..

Mitigation:
ethtool -K ethX tso off
Comment 1 kernelbugs 2022-05-02 01:37:06 UTC
I believe this is related to commit 3dc709e0cd47c602a8d1a6747f1a91e9737eeed3
Comment 2 Ioana Ciornei 2022-05-10 16:33:46 UTC
Created attachment 300920
patch
Comment 3 Ioana Ciornei 2022-05-10 16:35:02 UTC
Could you please try out the attached patch and let me know if it makes a difference?

Thanks!
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-05-15 10:20:00 UTC
(In reply to Ioana Ciornei from comment #2)
> Created attachment 300920
> patch

@reporter: did you give this a try? It would be good to get this tested soon, as the 5.18 release might be one week away (maybe two).
Comment 5 kernelbugs 2022-05-16 02:34:20 UTC
Compiling now with the patch. I will update with test results once it finishes compiling.
Comment 6 kernelbugs 2022-05-16 04:54:04 UTC
For context, I am new to Fedora's build system. I'm still verifying that the patch was actually applied in my local build, but I did verify that the built kernel was the one booted. I am still seeing a crash on network traffic after `ethtool -K eth3 tso on` (the log below is somewhat mangled by the serial console):

[  147.442853] DMA-API: fsl_dpaa2_eth dpni.1: device driver frees DMA]
[  147.458170] WARNING: CPU: 6 PID: 1077 at kernel/dma/debug.c:973 ch0
[  147.466196] Modules linked in: rfkill fsl_dpaa2_ptp lm90 ltc2978 px
[  147.513931] CPU: 6 PID: 1077 Comm: ns-slapd Not tainted 5.18.0-0.r1
[  147.524284] Hardware name: SolidRun LX2160A Honeycomb (DT)
[  147.529773] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS )
[  147.536741] pc : check_unmap+0x590/0x930
[  147.540672] lr : check_unmap+0x590/0x930
[  147.544602] sp : ffff80000b97b130
[  147.547919] x29: ffff80000b97b130 x28: ffff2653cc6f9700 x27: 000001
[  147.555080] x26: 0000000000000003 x25: ffffb641a5032000 x24: ffffb0
[  147.562240] x23: ffffb641a4e98c38 x22: 0000000000000000 x21: ffffb8
[  147.569400] x20: ffff2653c27acf00 x19: ffff80000b97b200 x18: ffffff
[  147.576559] x17: 6464612065636976 x16: 65645b20657a6973 x15: 000006
[  147.583718] x14: 0000000000000006 x13: 6972642065636976 x12: 65642e
[  147.590877] x11: 00000000ffffdfff x10: ffffb641a4f8dfb0 x9 : ffffb4
[  147.598036] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 000008
[  147.605194] x5 : 0000000000000001 x4 : 0000000000000000 x3 : 000007
[  147.612353] x2 : 0000000000000103 x1 : ffff2654049fc000 x0 : 00000f
[  147.619512] Call trace:
[  147.621961]  check_unmap+0x590/0x930
[  147.625544]  debug_dma_unmap_page+0x8c/0xa0
[  147.629736]  dma_unmap_page_attrs+0x130/0x1ec
[  147.634099]  dpaa2_eth_free_tx_fd.isra.0+0x324/0x3e4 [fsl_dpaa2_et]
[  147.640482]  dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth]
[  147.645987]  dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth]
[  147.651145]  __napi_poll.constprop.0+0x40/0x1a0
[  147.655685]  net_rx_action+0x310/0x3d4
[  147.659441]  __do_softirq+0x23c/0x6b4
[  147.663108]  do_softirq+0xc4/0xdc
[  147.666429]  __local_bh_enable_ip+0x1dc/0x1f0
[  147.670790]  ip_finish_output2+0x230/0x8e0
[  147.674895]  __ip_finish_output+0x12c/0x220
[  147.679085]  ip_finish_output+0x40/0xf4
[  147.682927]  ip_output+0xfc/0x2fc
[  147.686248]  __ip_queue_xmit+0x1c0/0x5e0
[  147.690177]  ip_queue_xmit+0x20/0x30
[  147.693758]  __tcp_transmit_skb+0x3c0/0x7cc
[  147.697948]  tcp_write_xmit+0x310/0x8ac
[  147.701788]  __tcp_push_pending_frames+0x48/0x110
[  147.706496]  tcp_push+0xbc/0x19c
[  147.709730]  tcp_sendmsg_locked+0x2ac/0xad4
[  147.713919]  tcp_sendmsg+0x44/0x6c
[  147.717326]  inet6_sendmsg+0x50/0x80
[  147.720908]  sock_sendmsg+0x60/0x70
[  147.724404]  __sys_sendto+0xc4/0x130
[  147.727984]  __arm64_sys_sendto+0x34/0x44
[  147.731997]  invoke_syscall+0x78/0x100
[  147.735754]  el0_svc_common.constprop.0+0x104/0x124
[  147.740639]  do_el0_svc+0x30/0x90
[  147.743962]  el0_svc+0x60/0x1a4
[  147.747110]  el0t_64_sync_handler+0x10c/0x140
[  147.751473]  el0t_64_sync+0x190/0x194
[  147.755140] irq event stamp: 28785
[  147.758544] hardirqs last  enabled at (28784): [<ffffb641a3770990>0
[  147.768120] hardirqs last disabled at (28785): [<ffffb641a3770b90>0
[  147.777434] softirqs last  enabled at (28778): [<ffffb641a358c980>0
[  147.786313] softirqs last disabled at (28779): [<ffffb641a26b24e8>c
[  147.794408] ---[ end trace 0000000000000000 ]---
[  147.799029] DMA-API: Mapped at:
[  147.802172]  debug_dma_map_page+0x70/0x110
[  147.806277]  dma_map_page_attrs+0x80/0xc0
[  147.810293]  dpaa2_eth_build_gso_fd.constprop.0+0x300/0x630 [fsl_d]
[  147.817276]  __dpaa2_eth_tx+0x478/0x850 [fsl_dpaa2_eth]
[  147.822520]  dpaa2_eth_tx+0x74/0x110 [fsl_dpaa2_eth]
[  147.827741] Unable to handle kernel paging request at virtual addr8
[  147.835673] Mem abort info:
[  147.838528]   ESR = 0x96000006
[  147.841592]   EC = 0x25: DABT (current EL), IL = 32 bits
[  147.846955]   SET = 0, FnV = 0
[  147.850018]   EA = 0, S1PTW = 0
[  147.853165]   FSC = 0x06: level 2 translation fault
[  147.858091] Data abort info:
[  147.860978]   ISV = 0, ISS = 0x00000006
[  147.864820]   CM = 0, WnR = 0
[  147.867835] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000830
[  147.874546] [fffffc98cd000008] pgd=100000277c353003, p4d=1000002770
[  147.885411] Internal error: Oops: 96000006 [#1] SMP
[  147.890292] Modules linked in: rfkill fsl_dpaa2_ptp lm90 ltc2978 px
[  147.937766] CPU: 6 PID: 1077 Comm: ns-slapd Tainted: G        W   1
[  147.950712] Hardware name: SolidRun LX2160A Honeycomb (DT)
[  147.956189] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS )
[  147.963145] pc : kfree+0xac/0x304
[  147.966463] lr : kfree+0x204/0x304
[  147.969859] sp : ffff80000b97b260
[  147.973165] x29: ffff80000b97b260 x28: ffff2653cc6f9700 x27: 000001
[  147.980300] x26: ffff2653c8d1f800 x25: ffff2653c7360cc0 x24: ffff20
[  147.987434] x23: ffffb641a46de008 x22: ffffb641a5396000 x21: ffffb8
[  147.994568] x20: ffff263340000000 x19: fffffc98cd000000 x18: ffffff
[  148.001702] x17: 6464612065636976 x16: ffffb641a29d75c0 x15: 000006
[  148.008836] x14: 0000000000000008 x13: 000000000000ffff x12: 000000
[  148.015970] x11: 0000000000000004 x10: 0000000000000001 x9 : ffffbc
[  148.023104] x8 : 0000000000000000 x7 : 0000000ffffff540 x6 : ffff20
[  148.030238] x5 : 0000000000000003 x4 : 0000000000000102 x3 : ffffb8
[  148.037372] x2 : ffff2654049fc000 x1 : 0000000000000101 x0 : fffff0
[  148.044506] Call trace:
[  148.046943]  kfree+0xac/0x304
[  148.049906]  dpaa2_eth_free_tx_fd.isra.0+0x358/0x3e4 [fsl_dpaa2_et]
[  148.056273]  dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth]
[  148.061766]  dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth]
[  148.066911]  __napi_poll.constprop.0+0x40/0x1a0
[  148.071440]  net_rx_action+0x310/0x3d4
[  148.075185]  __do_softirq+0x23c/0x6b4
[  148.078841]  do_softirq+0xc4/0xdc
[  148.082149]  __local_bh_enable_ip+0x1dc/0x1f0
[  148.086499]  ip_finish_output2+0x230/0x8e0
[  148.090592]  __ip_finish_output+0x12c/0x220
[  148.094770]  ip_finish_output+0x40/0xf4
[  148.098600]  ip_output+0xfc/0x2fc
[  148.101910]  __ip_queue_xmit+0x1c0/0x5e0
[  148.105827]  ip_queue_xmit+0x20/0x30
[  148.109398]  __tcp_transmit_skb+0x3c0/0x7cc
[  148.113575]  tcp_write_xmit+0x310/0x8ac
[  148.117404]  __tcp_push_pending_frames+0x48/0x110
[  148.122101]  tcp_push+0xbc/0x19c
[  148.125323]  tcp_sendmsg_locked+0x2ac/0xad4
[  148.129500]  tcp_sendmsg+0x44/0x6c
[  148.132896]  inet6_sendmsg+0x50/0x80
[  148.136466]  sock_sendmsg+0x60/0x70
[  148.139951]  __sys_sendto+0xc4/0x130
[  148.143519]  __arm64_sys_sendto+0x34/0x44
[  148.147521]  invoke_syscall+0x78/0x100
[  148.151266]  el0_svc_common.constprop.0+0x104/0x124
[  148.156140]  do_el0_svc+0x30/0x90
[  148.159450]  el0_svc+0x60/0x1a4
[  148.162587]  el0t_64_sync_handler+0x10c/0x140
[  148.166939]  el0t_64_sync+0x190/0x194
[  148.170597] Code: 8b130293 b25657e0 d34cfe73 8b131813 (f9400660) 
[  148.176686] ---[ end trace 0000000000000000 ]---
[  148.181295] Kernel panic - not syncing: Oops: Fatal exception in it
[  148.188161] SMP: stopping secondary CPUs
[  148.192092] Kernel Offset: 0x36419a600000 from 0xffff800008000000
[  148.198177] PHYS_OFFSET: 0xffffd9ccc0000000
[  148.202350] CPU features: 0x100,00004b09,00001086
[  148.207046] Memory Limit: none
[  148.210095] Rebooting in 1 seconds..
Comment 7 kernelbugs 2022-05-16 04:57:37 UTC
Next, I'll try building with a different uname to confirm, but I do note that the offsets appear to have changed, even though the source was the same git commit as before.
Comment 8 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-05-18 07:49:13 UTC
Any progress here? Time to get this fixed in 5.18 is running out, as it will likely be released on Sunday.
Comment 9 Ioana Ciornei 2022-05-19 16:01:19 UTC
Created attachment 300998
fixup annotation field used + dma_unmap before access

Hi, I will give it one more try with the attached patch.

It seems that not only is the position of the dma_unmap wrong, but so is the software annotation field used for the DMA unmap size: I was reading the size from the wrong field. This patch tries to fix both issues.
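The two rules described in that comment (unmap before the CPU touches the buffer contents, and take the unmap size from the field that was actually filled in at map time) can be illustrated with a small userspace model. This is a sketch, not the driver code: `struct toy_mapping`, `struct toy_swa`, and all `toy_*` functions are hypothetical stand-ins for the kernel DMA API and the driver's software annotation, and the size check only mimics the kind of map/unmap size comparison that DMA-API debugging (CONFIG_DMA_API_DEBUG) performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one DMA mapping; real drivers use the
 * kernel DMA API instead. */
struct toy_mapping {
    size_t mapped_size;   /* size passed at map time */
    bool mapped;          /* true while the device owns the buffer */
};

/* Hypothetical per-frame software annotation. Both fields are sizes, so
 * reading the wrong one compiles cleanly but unmaps the wrong length. */
struct toy_swa {
    size_t sgt_size;      /* size of the mapped table buffer */
    size_t total_len;     /* payload bytes, NOT the mapping size */
};

static void toy_map(struct toy_mapping *m, size_t size)
{
    assert(!m->mapped);   /* no double mapping in this model */
    m->mapped_size = size;
    m->mapped = true;
}

/* Rejects an unmap whose size differs from the map-time size, similar in
 * spirit to the check done by DMA-API debugging. */
static bool toy_unmap(struct toy_mapping *m, size_t size)
{
    if (!m->mapped || size != m->mapped_size)
        return false;
    m->mapped = false;
    return true;
}

/* Reads one table entry; returns -1 if the device still owns the buffer. */
static long toy_read_entry(const struct toy_mapping *m, const int *table, int i)
{
    return m->mapped ? -1 : table[i];
}

/* Fixed ordering: unmap using the correct annotation field first, then
 * walk the table (here: sum the fragment lengths). Returns -1 if the
 * unmap is rejected. */
static long toy_free_tx(struct toy_mapping *m, const struct toy_swa *swa,
                        const int *table, int n_entries)
{
    if (!toy_unmap(m, swa->sgt_size))
        return -1;                               /* wrong size: rejected */

    long total = 0;
    for (int i = 0; i < n_entries; i++)
        total += toy_read_entry(m, table, i);    /* safe: already unmapped */
    return total;
}
```

In this model, calling `toy_unmap()` with `swa.total_len` instead of `swa.sgt_size` fails the size check, and `toy_read_entry()` refuses to read while the buffer is still mapped; these correspond roughly to the two classes of mistake the comment describes.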

Unfortunately, I am still unable to reproduce this, even though I have been running a TCP Tx flow for some time now.

@reporter, please give this new patch a try.
Comment 10 Ioana Ciornei 2022-05-20 15:28:23 UTC
@reporter, did you happen to give this a try? Also, can I contact you directly by email? Is the one in your username the correct one?
Comment 11 kernelbugs 2022-05-21 00:18:18 UTC
I will try now. If that does not work, I'll check out a source tree and bypass the Fedora kernel build system, which is clearly nice but which I don't understand well. I can receive email at this address.
Comment 12 kernelbugs 2022-05-21 05:26:49 UTC
Unfortunate update: I ended up with a bad flash and need to do some complicated recovery. ETA for the test run is 24h.
Comment 13 kernelbugs 2022-05-21 19:52:45 UTC
Tested now with the latest patch; I am still seeing panics after `ethtool -K eth3 tso on`. If it helps, I'm using MTU 9000 and a DPL that breaks out to 4x 10G interfaces and 1x 1G interface; eth3 is one of the 10G PHYs. Now building with 5.18.0-0.rc7.20220519gitf993aed406ea.56.fc37.aarch64.

I'll try again to make sure I am using the latest DTB as well, but it appears the issue remains for me.
 
Trace:

[  122.735365] Unable to handle kernel paging request at virtual address fffffd2477000008
[  122.735919] Unable to handle kernel paging request at virtual address fffffd2477000008
[  122.743314] Mem abort info:
[  122.751243] Mem abort info:
[  122.751250]   ESR = 0x96000006
[  122.751256]   EC = 0x25: DABT (current EL), IL = 32 bits
[  122.751262]   SET = 0, FnV = 0
[  122.751267]   EA = 0, S1PTW = 0
[  122.751272]   FSC = 0x06: level 2 translation fault
[  122.751278] Data abort info:
[  122.751282]   ISV = 0, ISS = 0x00000006
[  122.754066]   ESR = 0x96000006
[  122.756852]   CM = 0, WnR = 0
[  122.759922]   EC = 0x25: DABT (current EL), IL = 32 bits
[  122.765229] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000083106000
[  122.768296]   SET = 0, FnV = 0
[  122.771432] [fffffd2477000008] pgd=100000277c353003
[  122.776300]   EA = 0, S1PTW = 0
[  122.779195] , p4d=100000277c353003
[  122.783022]   FSC = 0x06: level 2 translation fault
[  122.786068] , pud=100000277c352003
[  122.789048] Data abort info:
[  122.789054]   ISV = 0, ISS = 0x00000006
[  122.789059]   CM = 0, WnR = 0
[  122.789064] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000083106000
[  122.789071] [fffffd2477000008] pgd=100000277c353003, p4d=100000277c353003, pud=100000277c352003
[  122.794384] , pmd=0000000000000000
[  122.801096] , pmd=0000000000000000
[  122.804145] 
[  122.809036] 
[  122.812299] Internal error: Oops: 96000006 [#1] SMP
[  122.863418] Modules linked in: rfkill fsl_dpaa2_ptp ltc2978 pmbus_core lm90 at24 at803x ptp_qoriq fsl_dpaa2_eth pcs_lynx phylink xgmac_mdio vfat i2c_mux_pca954x fat i2c_mux sfp mdio_i2c qoriq_thermal layerscape_edac_mod qoriq_cpufreq auth_rpcgss fuse sunrpc dpaa2_caam caam_jr fsl_mc_dpio caamhash_desc xhci_plat_hcd rtc_pcf2127 caamalg_desc mmc_block regmap_spi crct10dif_ce ghash_ce libdes dpaa2_console dwc3 caam nvme nvme_core ulpi sdhci_of_esdhc error ahci_qoriq udc_core sdhci_pltfm sbsa_gwdt rtc_fsl_ftm_alarm sdhci i2c_imx phy_fsl_lynx_28g gpio_keys
[  122.912294] CPU: 3 PID: 1789 Comm: sshd Not tainted 5.18.0-0.rc7.20220519gitf993aed406ea.56.fc37.aarch64 #1
[  122.922023] Hardware name: SolidRun LX2160A Honeycomb (DT)
[  122.927496] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  122.934447] pc : kfree+0xac/0x304
[  122.937760] lr : kfree+0x204/0x304
[  122.941151] sp : ffff80000b9531a0
[  122.944454] x29: ffff80000b9531a0 x28: ffff493e4ba0c600 x27: 0000000000000001
[  122.951581] x26: ffff493e46d9f800 x25: ffff493e48ca0cc0 x24: ffff493e4b03d000
[  122.958708] x23: ffffdc18370de008 x22: ffffdc1837d98000 x21: ffffdc17f6834d04
[  122.965835] x20: ffff491dc0000000 x19: fffffd2477000000 x18: 0000000000000000
[  122.972962] x17: 0000000000000000 x16: ffffdc18353d7690 x15: 0000000000000000
[  122.980089] x14: 0000000000000008 x13: 000000000000ffff x12: 0000000000000000
[  122.987216] x11: 0000000000000004 x10: 0000000000000001 x9 : ffffdc183517108c
[  122.994342] x8 : 0000000000000000 x7 : 0000000ffffe60c2 x6 : ffff493e40a8e600
[  123.001468] x5 : 0000000000000003 x4 : 0000000000000102 x3 : ffffdc18370de008
[  123.008594] x2 : ffff493e54028000 x1 : 0000000100000101 x0 : fffffc0000000000
[  123.015721] Call trace:
[  123.018156]  kfree+0xac/0x304
[  123.021114]  dpaa2_eth_free_tx_fd.isra.0+0x354/0x3e0 [fsl_dpaa2_eth]
[  123.027471]  dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth]
[  123.032952]  dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth]
[  123.038086]  __napi_poll.constprop.0+0x40/0x1a0
[  123.042608]  net_rx_action+0x310/0x3d4
[  123.046346]  __do_softirq+0x23c/0x6b4
[  123.049999]  do_softirq+0xc4/0xdc
[  123.053304]  __local_bh_enable_ip+0x1dc/0x1f0
[  123.057649]  ip_finish_output2+0x230/0x8e0
[  123.061737]  __ip_finish_output+0x12c/0x220
[  123.065910]  ip_finish_output+0x40/0xf4
[  123.069736]  ip_output+0xfc/0x2fc
[  123.073041]  __ip_queue_xmit+0x1c0/0x5e0
[  123.076954]  ip_queue_xmit+0x20/0x30
[  123.080519]  __tcp_transmit_skb+0x3c0/0x7cc
[  123.084693]  tcp_write_xmit+0x310/0x8ac
[  123.088518]  __tcp_push_pending_frames+0x48/0x110
[  123.093211]  tcp_rcv_established+0x420/0x950
[  123.097469]  tcp_v4_do_rcv+0x238/0x32c
[  123.101211]  __release_sock+0x64/0x11c
[  123.104951]  release_sock+0x44/0xe0
[  123.108430]  tcp_sendmsg+0x54/0x6c
[  123.111822]  inet_sendmsg+0x50/0x7c
[  123.115301]  sock_sendmsg+0x60/0x70
[  123.118777]  sock_write_iter+0x98/0xe0
[  123.122515]  new_sync_write+0x124/0x130
[  123.126341]  vfs_write+0x1c8/0x210
[  123.129732]  ksys_write+0xd8/0xec
[  123.133036]  __arm64_sys_write+0x28/0x34
[  123.136947]  invoke_syscall+0x78/0x100
[  123.140688]  el0_svc_common.constprop.0+0x68/0x124
[  123.145469]  do_el0_svc+0x30/0x90
[  123.148774]  el0_svc+0x60/0x1a4
[  123.151906]  el0t_64_sync_handler+0x10c/0x140
[  123.156253]  el0t_64_sync+0x190/0x194
[  123.159906] Code: 8b130293 b25657e0 d34cfe73 8b131813 (f9400660) 
[  123.165991] ---[ end trace 0000000000000000 ]---
[  123.170597] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  123.177460] SMP: stopping secondary CPUs
[  124.227686] SMP: failed to stop secondary CPUs 3,8
[  124.232468] Kernel Offset: 0x5c182d000000 from 0xffff800008000000
[  124.238550] PHYS_OFFSET: 0xffffb6e240000000
[  124.242720] CPU features: 0x100,00004b09,00001086
[  124.247412] Memory Limit: none
[  124.250457] Rebooting in 1 seconds..
[  125.254065] SMP: stopping secondary CPUs
[  126.304276] SMP: failed to stop secondary CPUs 3,8
Comment 14 kernelbugs 2022-05-21 20:01:43 UTC
Making sure I was using the latest DTB/DTS did not change the result. My steps to reproduce:
- MTU 9000 and 10G DAC SFP+
- ethtool -K eth3 tso on
- ssh from another host
- type `dmesg` in ssh session
- observe panic on serial console
Comment 15 Ioana Ciornei 2022-05-22 13:00:00 UTC
Yeah, I am really sorry that it took me so long to reproduce this. I was running with IOMMU passthrough, which hid the bugs.
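Why passthrough could hide this kind of ordering bug can be sketched with a toy IOMMU. The key assumption, which is not stated in the thread, is that the driver converts a DMA address (IOVA) back to a CPU address by asking the IOMMU: with passthrough the IOVA equals the physical address, so the lookup "succeeds" even after the mapping is torn down, while with real translation a lookup after unmap finds nothing and a bogus address can flow onward (for example into `kfree()`). `struct toy_iommu` and all `toy_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy IOMMU with a single translation slot. */
struct toy_iommu {
    bool passthrough;     /* identity mapping: IOVA == physical address */
    bool slot_valid;
    uint64_t slot_iova;
    uint64_t slot_phys;
};

static uint64_t toy_iommu_map(struct toy_iommu *mmu, uint64_t phys)
{
    if (mmu->passthrough)
        return phys;              /* device uses the physical address */
    assert(!mmu->slot_valid);     /* one mapping at a time in this model */
    mmu->slot_valid = true;
    mmu->slot_iova = 0x1000;      /* arbitrary IOVA unrelated to phys */
    mmu->slot_phys = phys;
    return mmu->slot_iova;
}

static void toy_iommu_unmap(struct toy_iommu *mmu, uint64_t iova)
{
    if (!mmu->passthrough && mmu->slot_valid && mmu->slot_iova == iova)
        mmu->slot_valid = false;
}

/* Translates an IOVA back to a physical address; 0 means "not mapped". */
static uint64_t toy_iommu_iova_to_phys(const struct toy_iommu *mmu,
                                       uint64_t iova)
{
    if (mmu->passthrough)
        return iova;
    if (mmu->slot_valid && mmu->slot_iova == iova)
        return mmu->slot_phys;
    return 0;
}

/* Buggy order: tear down the mapping, then ask where the buffer lived. */
static uint64_t toy_resolve_after_unmap(struct toy_iommu *mmu, uint64_t phys)
{
    uint64_t iova = toy_iommu_map(mmu, phys);
    toy_iommu_unmap(mmu, iova);
    return toy_iommu_iova_to_phys(mmu, iova);
}

/* Fixed order: resolve the address while the mapping still exists. */
static uint64_t toy_resolve_before_unmap(struct toy_iommu *mmu, uint64_t phys)
{
    uint64_t iova = toy_iommu_map(mmu, phys);
    uint64_t resolved = toy_iommu_iova_to_phys(mmu, iova);
    toy_iommu_unmap(mmu, iova);
    return resolved;
}
```

With `passthrough = true`, both orderings return the original address, so testing only in that mode cannot tell them apart; with translation enabled, only the resolve-before-unmap order returns a valid address.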


I just sent a patch set upstream with the fixes: https://patchwork.kernel.org/project/netdevbpf/list/?series=644071&state=%2A&archive=both

Anyhow, thanks a lot for reporting this, and again, sorry for the wait.