Created attachment 304519 [details] Partial dmesg output with the 3 seemingly identical stack traces

Since commit 19898ce9cf8a, iwlwifi has generated three seemingly identical kernel stack traces for me. Because I only use the Bluetooth and not the Wi-Fi functionality, this is not a big deal for me, but I thought such an issue was worth reporting nonetheless.

All three traces point at **drivers/iommu/dma-iommu.c:693 __iommu_dma_unmap+0x150/0x160**.

I'm attaching the three stack traces to this bug report, along with other possibly relevant parts of dmesg. Sorry in advance for not cutting at the "cut here" markers, which makes the text considerably longer, but I suspected that the PCI, ACPI, memory and possibly iwlwifi related messages may be of importance, too. If I should cut the stack traces out and attach them as three distinct files (and diff them to see if there's any change between them), let me know. I can also provide a full (but redacted) dmesg output of a git master build, if required.

I did try booting a much more recent git master build with *iommu.passthrough=0 iommu.strict=0* on the kernel command line, but that did not seem to make any difference.

```
19898ce9cf8a33e0ac35cb4c7f68de297cc93cb2 is the first bad commit
commit 19898ce9cf8a33e0ac35cb4c7f68de297cc93cb2
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Wed Jun 21 13:12:07 2023 +0300

    wifi: iwlwifi: split 22000.c into multiple files

    Split the configuration list in 22000.c into four new files,
    per new device family, so we don't have this huge unusable
    file. Yes, this duplicates a few small things, but that's
    still much better than what we have now.

    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
    Link: https://lore.kernel.org/r/20230621130443.7543603b2ee7.Ia8dd54216d341ef1ddc0531f2c9aa30d30536a5d@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>

 drivers/net/wireless/intel/iwlwifi/Makefile     |   1 +
 drivers/net/wireless/intel/iwlwifi/cfg/22000.c  | 939 +-----------------------
 drivers/net/wireless/intel/iwlwifi/cfg/ax210.c  | 452 ++++++++++++
 drivers/net/wireless/intel/iwlwifi/cfg/bz.c     | 523 +++++++++++++
 drivers/net/wireless/intel/iwlwifi/cfg/sc.c     | 214 ++++++
 drivers/net/wireless/intel/iwlwifi/iwl-config.h |   2 +
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c   |   3 +
 7 files changed, 1206 insertions(+), 928 deletions(-)
 create mode 100644 drivers/net/wireless/intel/iwlwifi/cfg/ax210.c
 create mode 100644 drivers/net/wireless/intel/iwlwifi/cfg/bz.c
 create mode 100644 drivers/net/wireless/intel/iwlwifi/cfg/sc.c
```
Created attachment 304520 [details] kernel configuration Sorry for initially forgetting to attach the kernel config, too.
As suggested by Johannes Berg via e-mail, I have re-assigned the issue to Intel Wireless and am providing the device information:

```
00:14.3 Network controller: Intel Corporation Alder Lake-S PCH CNVi WiFi (rev 11)
	DeviceName: Onboard - Ethernet
	Subsystem: Intel Corporation Wi-Fi 6 AX201 160MHz
	Flags: bus master, fast devsel, latency 0, IRQ 18, IOMMU group 5
	Memory at 6308124000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [80] MSI-X: Enable+ Count=16 Masked-
	Capabilities: [100] Latency Tolerance Reporting
	Capabilities: [164] Vendor Specific Information: ID=0010 Rev=0 Len=014 <?>
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi
```

However, as the dmesg I uploaded shows, the driver reports the device as AX211 rather than AX201: `[ 8.745865] iwlwifi 0000:00:14.3: Detected Intel(R) Wi-Fi 6E AX211 160MHz, REV=0x430`
I can confirm a similar 6.5 regression: the machine does not boot unless I blacklist iwlwifi. Bluetooth works, as reported.

03:00.0 Network controller: Intel Corporation Device 2725 (rev 1a)
	Subsystem: Intel Corporation Device 0024
	Flags: bus master, fast devsel, latency 0, IRQ 17
	Memory at e1000000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [40] Express Endpoint, MSI 00
	Capabilities: [80] MSI-X: Enable+ Count=16 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [14c] Latency Tolerance Reporting
	Capabilities: [154] L1 PM Substates
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi
Here's the first stack trace with **CONFIG_DMA_API_DEBUG=y** and **CONFIG_DMA_API_DEBUG_SG=y** set:

```
[ 11.119574] DMA-API: iwlwifi 0000:00:14.3: device driver frees DMA memory with different size [device address=0x00000000fe000000] [map size=8388608 bytes] [unmap size=16384 bytes]
[ 11.119577] WARNING: CPU: 5 PID: 817 at kernel/dma/debug.c:978 check_unmap+0x413/0x950
[ 11.119580] Modules linked in: vfat fat snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_acpi_intel_match snd_soc_acpi snd_soc_hdac_hda iwlmvm(+) snd_sof_pci snd_sof_xtensa_dsp soundwire_intel soundwire_generic_allocation intel_rapl_msr soundwire_cadence soundwire_bus intel_rapl_common snd_sof_intel_hda_mlink intel_tcc_cooling snd_sof_intel_hda mac80211 x86_pkg_temp_thermal snd_sof intel_powerclamp snd_sof_utils kvm_intel snd_hda_ext_core snd_soc_core snd_hda_codec_hdmi snd_compress snd_usb_audio(+) libarc4 kvm snd_hda_intel snd_intel_dspcfg btusb snd_intel_sdw_acpi snd_usbmidi_lib snd_hda_codec btrtl snd_ump iwlwifi btbcm snd_hwdep iTCO_wdt btmtk snd_hda_core snd_rawmidi pmt_telemetry btintel intel_pmc_bxt irqbypass mei_pxp mei_hdcp iTCO_vendor_support pmt_class snd_pcm snd_seq_device bluetooth rapl cfg80211 snd_timer mc intel_cstate mei_me igc joydev snd intel_uncore i2c_i801 mei idma64 thermal ecdh_generic intel_vsec soundcore i2c_smbus pcspkr rfkill virt_dma acpi_tad bfq kyber_iosched acpi_pad zram
[ 11.119608] sch_fq_codel zsmalloc dm_mod fuse hid_logitech_hidpp hid_logitech_dj amdgpu i915 crc32_pclmul i2c_algo_bit drm_ttm_helper drm_suballoc_helper ttm amdxcp ghash_clmulni_intel gpu_sched drm_buddy sha512_ssse3 drm_display_helper cec video wmi pinctrl_alderlake nct6775 nct6775_core hwmon_vid coretemp
[ 11.119618] CPU: 5 PID: 817 Comm: modprobe Tainted: G W T 6.4.0-dma-api-debug-10173-ga901a3568fd2 #140
[ 11.119619] Hardware name: ASUS System Product Name/ROG STRIX Z690-G GAMING WIFI, BIOS 2204 11/30/2022
[ 11.119620] RIP: 0010:check_unmap+0x413/0x950
[ 11.119622] Code: 4c 8b 37 4c 89 4c 24 08 e8 da 7e 83 00 4c 8b 4c 24 08 48 89 c6 4d 89 e8 4c 89 f9 4c 89 f2 48 c7 c7 c8 f0 71 b5 e8 3d 8b f3 ff <0f> 0b 48 c7 c7 4d c0 6a b5 e8 af a1 fc ff 8b 75 4c 48 8d 7d 50 31
[ 11.119623] RSP: 0018:ffffab4d01a738f0 EFLAGS: 00010046
[ 11.119624] RAX: 0000000000000000 RBX: ffffab4d01a73940 RCX: 0000000000000000
[ 11.119625] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 11.119625] RBP: ffff9333c176ba80 R08: 0000000000000000 R09: 0000000000000000
[ 11.119626] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb6591e80
[ 11.119626] R13: 0000000000800000 R14: ffff9333c2791370 R15: 00000000fe000000
[ 11.119627] FS: 00007f2b2f5fb740(0000) GS:ffff933b1f540000(0000) knlGS:0000000000000000
[ 11.119628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.119629] CR2: 000055ede56b3f28 CR3: 0000000109124000 CR4: 0000000000f50ee0
[ 11.119629] PKRU: 55555554
[ 11.119630] Call Trace:
[ 11.119630] <TASK>
[ 11.119631] ? check_unmap+0x413/0x950
[ 11.119632] ? __warn+0x81/0x130
[ 11.119634] ? check_unmap+0x413/0x950
[ 11.119635] ? report_bug+0x1a2/0x1d0
[ 11.119637] ? console_unlock+0x64/0x120
[ 11.119638] ? handle_bug+0x40/0x80
[ 11.119640] ? exc_invalid_op+0x17/0x80
[ 11.119641] ? asm_exc_invalid_op+0x1a/0x20
[ 11.119643] ? check_unmap+0x413/0x950
[ 11.119644] debug_dma_free_coherent+0xfd/0x120
[ 11.119646] ? preempt_count_add+0x7e/0xb0
[ 11.119647] ? _raw_spin_lock_irqsave+0x1b/0x60
[ 11.119649] ? __list_del_entry+0x9/0x30
[ 11.119651] ? _raw_spin_unlock_irqrestore+0x1f/0x50
[ 11.119652] ? __slab_free+0x503/0x550
[ 11.119654] ? _raw_spin_lock_irqsave+0x1b/0x60
[ 11.119655] ? lock_timer_base+0x61/0x90
[ 11.119657] dma_free_attrs+0x51/0xc0
[ 11.119659] iwl_txq_gen2_free_memory+0x3e/0x90 [iwlwifi]
[ 11.119674] iwl_txq_gen2_free+0x58/0xf0 [iwlwifi]
[ 11.119686] iwl_txq_gen2_tx_free+0x39/0x60 [iwlwifi]
[ 11.119696] _iwl_trans_pcie_gen2_stop_device+0x2e7/0x300 [iwlwifi]
[ 11.119707] iwl_trans_pcie_gen2_stop_device+0x58/0x80 [iwlwifi]
[ 11.119717] iwl_mvm_stop_device+0x51/0x80 [iwlmvm]
[ 11.119739] iwl_mvm_start_get_nvm+0x16d/0x1f0 [iwlmvm]
[ 11.119753] iwl_op_mode_mvm_start+0x7c7/0x9c0 [iwlmvm]
[ 11.119767] _iwl_op_mode_start+0x98/0xd0 [iwlwifi]
[ 11.119776] iwl_opmode_register+0x6c/0xe0 [iwlwifi]
[ 11.119785] ? __pfx_iwl_mvm_init+0x10/0x10 [iwlmvm]
[ 11.119801] iwl_mvm_init+0x26/0xff0 [iwlmvm]
[ 11.119815] ? __pfx_iwl_mvm_init+0x10/0x10 [iwlmvm]
[ 11.119827] do_one_initcall+0x5a/0x300
[ 11.119829] do_init_module+0x60/0x250
[ 11.119831] init_module_from_file+0x17f/0x330
[ 11.119833] __x64_sys_finit_module+0x5e/0xc0
[ 11.119834] do_syscall_64+0x5d/0x90
[ 11.119836] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 11.119838] RIP: 0033:0x7f2b2fc6415d
[ 11.119839] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b dc 0c 00 f7 d8 64 89 01 48
[ 11.119840] RSP: 002b:00007ffd6c361dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 11.119841] RAX: ffffffffffffffda RBX: 00005556af01f080 RCX: 00007f2b2fc6415d
[ 11.119841] RDX: 0000000000000000 RSI: 00005556aedce079 RDI: 0000000000000002
[ 11.119842] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 11.119842] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000040000
[ 11.119843] R13: 0000000000000000 R14: 00005556aedce079 R15: 00005556af01f1b0
[ 11.119844] </TASK>
[ 11.119844] ---[ end trace 0000000000000000 ]---
[ 11.119845] DMA-API: Mapped at:
[ 11.119845] debug_dma_alloc_coherent+0x56/0x100
[ 11.119847] dma_alloc_attrs+0xb0/0x110
[ 11.119848] iwl_txq_alloc+0x190/0x270 [iwlwifi]
[ 11.119859] iwl_txq_gen2_init+0xc6/0x150 [iwlwifi]
[ 11.119868] iwl_trans_pcie_gen2_start_fw+0x262/0x5b0 [iwlwifi]
```
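For anyone comparing several of these warnings, the key numbers are in the first DMA-API line: the buffer was mapped with one size and freed with another. Below is a minimal Python sketch that pulls those values out of the line quoted above. The regular expression and the helper name `parse_size_mismatch` are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Log line copied from the trace above (timestamp omitted).
LINE = (
    "DMA-API: iwlwifi 0000:00:14.3: device driver frees DMA memory with "
    "different size [device address=0x00000000fe000000] "
    "[map size=8388608 bytes] [unmap size=16384 bytes]"
)

# Hypothetical pattern written for this one message format.
PATTERN = re.compile(
    r"\[device address=(0x[0-9a-f]+)\] "
    r"\[map size=(\d+) bytes\] \[unmap size=(\d+) bytes\]"
)


def parse_size_mismatch(line):
    """Return (device_address, map_size, unmap_size), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2)), int(match.group(3))


addr, mapped, unmapped = parse_size_mismatch(LINE)
print(f"address  {addr:#x}")
print(f"mapped   {mapped} bytes ({mapped // 2**20} MiB)")
print(f"unmapped {unmapped} bytes ({unmapped // 2**10} KiB)")
print(f"ratio    {mapped // unmapped}x")
```

The mapped size (0x800000) and the device address also appear in R13 and R15 of the register dump, so the warning and the registers agree with each other.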
Created attachment 304534 [details] iwlwifi related dmesg output with DMA-API debugging enabled

Here are the hopefully important bits of the dmesg output from which the stack trace above was cut, along with another stack trace that follows the first one.
Diff of lspci -vv from before commit (driver loaded) to today's master git (driver blacklisted):

@@ -1,9 +1,8 @@
 03:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)
 	Subsystem: Intel Corporation Wi-Fi 6 AX210 160MHz
-	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
+	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-	Latency: 0, Cache Line Size: 64 bytes
-	Interrupt: pin A routed to IRQ 17
+	Interrupt: pin A routed to IRQ 9
 	Region 0: Memory at e1000000 (64-bit, non-prefetchable) [size=16K]
 	Capabilities: [c8] Power Management version 3
 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
@@ -32,7 +31,7 @@
 			 Compliance De-emphasis: -6dB
 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
-	Capabilities: [80] MSI-X: Enable+ Count=16 Masked-
+	Capabilities: [80] MSI-X: Enable- Count=16 Masked-
 		Vector table: BAR=0 offset=00002000
 		PBA: BAR=0 offset=00003000
 	Capabilities: [100 v1] Advanced Error Reporting
@@ -53,6 +52,4 @@
 		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
 			   T_CommonMode=0us LTR1.2_Threshold=64512ns
 		L1SubCtl2: T_PwrOn=18us
-	Kernel driver in use: iwlwifi
 	Kernel modules: iwlwifi
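If someone wants to produce the same kind of comparison for their own card, a rough Python sketch using the standard `difflib` module is shown here. The two lists hold a few lines taken from the diff above; in practice they would come from two saved `lspci -vv` captures. The helper name `lspci_diff` and the `before`/`after` labels are assumptions made for illustration.

```python
import difflib

# A handful of lines from the "driver loaded" capture quoted above.
before = [
    "Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- "
    "Stepping- SERR- FastB2B- DisINTx+",
    "Interrupt: pin A routed to IRQ 17",
    "Capabilities: [80] MSI-X: Enable+ Count=16 Masked-",
    "Kernel driver in use: iwlwifi",
    "Kernel modules: iwlwifi",
]

# The matching lines from the "driver blacklisted" capture.
after = [
    "Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- "
    "Stepping- SERR- FastB2B- DisINTx-",
    "Interrupt: pin A routed to IRQ 9",
    "Capabilities: [80] MSI-X: Enable- Count=16 Masked-",
    "Kernel modules: iwlwifi",
]


def lspci_diff(old_lines, new_lines):
    """Return a unified diff of two lspci captures as a list of lines."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="before", tofile="after", lineterm=""
        )
    )


for line in lspci_diff(before, after):
    print(line)
```

The differences in this particular diff (bus mastering off, MSI-X disabled, no driver bound) are what one would expect from simply not loading the driver, so they describe the blacklisted state rather than the regression itself.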
For the record: another report about this problem: https://lore.kernel.org/all/CAAJw_Zug6VCS5ZqTWaFSr9sd85k%3DtyPm9DEE%2BmV%3DAKoECZM%2BsQ@mail.gmail.com/
Larry looked at the culprit and spotted something that looked suspicious; he posted a patch and is looking for testers: https://lore.kernel.org/all/0068af47-e475-7e8d-e476-c374e90dff5f@lwfinger.net/
Created attachment 304569 [details] tentative fix Can you try this patch?
Indeed, the crashing is resolved. Unfortunately I do not actually use the Wi-Fi capability, so I can't test how well it actually functions.
Created attachment 304578 [details] patch fixing the AX210 family of drivers for 6.5-rc1

Tested on AX210, but the whole family should now work.
Oops, I posted my patch before I saw Johannes's. I can confirm my patch fixes my AX210. I restored the TFH lines in ax210.c so it would be the same as the previously unsplit 22000.c, and restored the following code for these cards that I don't have, so I can't test if this is also needed:

+const struct iwl_cfg iwl_cfg_ma_a0_hr_b0 = {
+	.fw_name_pre = IWL_MA_A_HR_B_FW_PRE,
+	.uhb_supported = true,
+	IWL_DEVICE_AX210,
+	.num_rbds = IWL_NUM_RBDS_AX210_HE,
+};
+
+const struct iwl_cfg iwl_cfg_ma_a0_gf_a0 = {
+	.fw_name_pre = IWL_MA_A_GF_A_FW_PRE,
+	.uhb_supported = true,
+	IWL_DEVICE_AX210,
+	.num_rbds = IWL_NUM_RBDS_AX210_HE,
+};
+
+const struct iwl_cfg iwl_cfg_ma_a0_gf4_a0 = {
+	.fw_name_pre = IWL_MA_A_GF4_A_FW_PRE,
+	.uhb_supported = true,
+	IWL_DEVICE_AX210,
+	.num_rbds = IWL_NUM_RBDS_AX210_HE,
+};
+
+const struct iwl_cfg iwl_cfg_ma_a0_mr_a0 = {
+	.fw_name_pre = IWL_MA_A_MR_A_FW_PRE,
+	.uhb_supported = true,
+	IWL_DEVICE_AX210,
+	.num_rbds = IWL_NUM_RBDS_AX210_HE,
+};
+

I tested with the 83 firmware, but previously I was using the 81 firmware:

iwlwifi 0000:03:00.0: loaded firmware version 83.e8f84e98.0 ty-a0-gf-a0-83.ucode op_mode iwlmvm

I also did this unrelated change (same one Larry spotted):

--- a/drivers/net/wireless/intel/iwlwifi/cfg/22000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/22000.c
@@ -10,7 +10,7 @@
 #include "fw/api/txq.h"
 /* Highest firmware API version supported */
-#define IWL_22000_UCODE_API_MAX 77
+#define IWL_22000_UCODE_API_MAX 81

and removed this duplicate line:

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
index cf27f106d4d5..7f48457a9d09 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
@@ -455,7 +455,6 @@ static ssize_t iwl_dbgfs_amsdu_len_write(struct ieee80211_link_sta *link_sta,
 	if (amsdu_len) {
 		mvm_link_sta->orig_amsdu_len = link_sta->agg.max_amsdu_len;
 		link_sta->agg.max_amsdu_len = amsdu_len;
-		link_sta->agg.max_amsdu_len = amsdu_len;
 		for (i = 0; i < ARRAY_SIZE(link_sta->agg.max_tid_amsdu_len); i++)
 			link_sta->agg.max_tid_amsdu_len[i] = amsdu_len;
 	} else {

Johannes - I'll try your patch.
Jonathan, good find on the use_tfh! FWIW that's basically equivalent to my patch which just removes it and uses 'gen2' in place of it in the rest of the code :)
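To make the `use_tfh` versus `gen2` point concrete for later readers, here is a small Python model of how two config flags that are meant to be identical can produce the kind of alloc/free size mismatch seen in the DMA-API warning. This is only a sketch under stated assumptions: the constants are hypothetical and were picked so the arithmetic reproduces the 8388608 and 16384 byte figures from the log, and the assumption that the shared allocation helper consults `use_tfh` while the gen2-only free helper does not is an inference from the call traces, not a copy of the driver logic.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only to reproduce the sizes in the warning.
ENTRY_BYTES = 256          # assumed size of one descriptor
WINDOW_ENTRIES = 64        # assumed entries in the windowed layout
FULL_RING_ENTRIES = 32768  # assumed entries in the full-ring layout


@dataclass(frozen=True)
class DeviceCfg:
    gen2: bool
    use_tfh: bool  # intended to mirror gen2; easy to forget in a new config


def alloc_bytes(cfg):
    """Shared allocation helper: assumed to pick the layout from use_tfh."""
    entries = WINDOW_ENTRIES if cfg.use_tfh else FULL_RING_ENTRIES
    return ENTRY_BYTES * entries


def free_bytes(cfg):
    """gen2-only free helper: assumed to always use the windowed layout."""
    if not cfg.gen2:
        raise ValueError("gen2 free path called for a non-gen2 device")
    return ENTRY_BYTES * WINDOW_ENTRIES


consistent = DeviceCfg(gen2=True, use_tfh=True)
drifted = DeviceCfg(gen2=True, use_tfh=False)  # flag lost when configs moved

for name, cfg in (("consistent", consistent), ("drifted", drifted)):
    print(name, alloc_bytes(cfg), free_bytes(cfg))
```

With a single flag there is nothing to drift, which is presumably why removing `use_tfh` and keying everything on `gen2` is the more robust of the two fixes.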
Yes Johannes, your patch fixes it for me.
Guys, is this a similar issue of mine? ( https://bugzilla.kernel.org/show_bug.cgi?id=217643 ) It seems to be the same problem (I only noticed the issue while downloading). If so, I'll link this thread.
I'm no expert but I'd expect your issue to be unrelated, because this has nothing to do with utilization. Besides I doubt any of the kernels you're using are new enough that they could even theoretically contain the commit which introduced this bug.
(In reply to Niklāvs Koļesņikovs from comment #16)
> I'm no expert but I'd expect your issue to be unrelated, because this has
> nothing to do with utilization. Besides I doubt any of the kernels you're
> using are new enough that they could even theoretically contain the commit
> which introduced this bug.

Hmm, as far as I know the AX210 got initial support in 5.10, but I'm no expert either, so I don't know why it doesn't work (even on 6.5-rc). Never mind.
(In reply to sepali_ardimento0e from comment #15)
> Guys, is this a similar issue of mine?

No, this issue is caused by a commit merged only for 6.5, whereas you see the problem on older kernels.
Linux 6.5-rc2 has been released with the fix, so I'm closing this as resolved. Thanks to everyone involved in dealing with this during the vacation season.
FWIW, on my XPS 13 9315, 6.5 kernels up till rc2 failed to boot (possibly due to this); rc2 does indeed boot, but the wifi still doesn't work. I've filed https://bugzilla.kernel.org/show_bug.cgi?id=217682 .