Bug 215523

Summary: Firmware crash with 66.f1c864e0.0 (cc-a0-66.ucode) and 68.01d30b0c (cc-a0-68.ucode)
Product: Drivers
Reporter: Udo Steinberg (udo)
Component: network-wireless-intel
Assignee: Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel)
Status: CLOSED CODE_FIX
Severity: normal
CC: golan.ben.ami, luca, regressions, t.wede
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.15.x, 5.16.x, 5.17.x
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: complete dmesg output
complete dmesg output
Disable TWT from client

Description Udo Steinberg 2022-01-23 11:03:09 UTC
Seeing the following firmware crash frequently with
firmware-version: 66.f1c864e0.0 cc-a0-66.ucode

[14699.974055] iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x0.
[14699.974484] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[14699.974496] iwlwifi 0000:03:00.0: Transport status: 0x0000004A, valid: 6
[14699.974509] iwlwifi 0000:03:00.0: Loaded firmware version: 66.f1c864e0.0 cc-a0-66.ucode
[14699.974521] iwlwifi 0000:03:00.0: 0x00000071 | NMI_INTERRUPT_UMAC_FATAL    
[14699.974532] iwlwifi 0000:03:00.0: 0x000022F0 | trm_hw_status0
[14699.974541] iwlwifi 0000:03:00.0: 0x00000000 | trm_hw_status1
[14699.974549] iwlwifi 0000:03:00.0: 0x004FAA46 | branchlink2
[14699.974557] iwlwifi 0000:03:00.0: 0x004F13DE | interruptlink1
[14699.974565] iwlwifi 0000:03:00.0: 0x004F13DE | interruptlink2
[14699.974573] iwlwifi 0000:03:00.0: 0x00005D04 | data1
[14699.974581] iwlwifi 0000:03:00.0: 0x00001000 | data2
[14699.974588] iwlwifi 0000:03:00.0: 0x00000000 | data3
[14699.974596] iwlwifi 0000:03:00.0: 0x5F80B98E | beacon time
[14699.974604] iwlwifi 0000:03:00.0: 0xDA80E7BE | tsf low
[14699.974612] iwlwifi 0000:03:00.0: 0x000001D9 | tsf hi
[14699.974621] iwlwifi 0000:03:00.0: 0x00000000 | time gp1
[14699.974629] iwlwifi 0000:03:00.0: 0x1622E64D | time gp2
[14699.974636] iwlwifi 0000:03:00.0: 0x00000001 | uCode revision type
[14699.974645] iwlwifi 0000:03:00.0: 0x00000042 | uCode version major
[14699.974652] iwlwifi 0000:03:00.0: 0xF1C864E0 | uCode version minor
[14699.974662] iwlwifi 0000:03:00.0: 0x00000340 | hw version
[14699.974668] iwlwifi 0000:03:00.0: 0x00C89000 | board version
[14699.974675] iwlwifi 0000:03:00.0: 0x800EFC03 | hcmd
[14699.974682] iwlwifi 0000:03:00.0: 0x00020000 | isr0
[14699.974689] iwlwifi 0000:03:00.0: 0x00400000 | isr1
[14699.974695] iwlwifi 0000:03:00.0: 0x08F04802 | isr2
[14699.974702] iwlwifi 0000:03:00.0: 0x00C3780C | isr3
[14699.974710] iwlwifi 0000:03:00.0: 0x00000000 | isr4
[14699.974718] iwlwifi 0000:03:00.0: 0x0510001C | last cmd Id
[14699.974725] iwlwifi 0000:03:00.0: 0x00005D04 | wait_event
[14699.974733] iwlwifi 0000:03:00.0: 0x00000054 | l2p_control
[14699.974740] iwlwifi 0000:03:00.0: 0x00000000 | l2p_duration
[14699.974746] iwlwifi 0000:03:00.0: 0x0000000F | l2p_mhvalid
[14699.974754] iwlwifi 0000:03:00.0: 0x000000C7 | l2p_addr_match
[14699.974761] iwlwifi 0000:03:00.0: 0x00000008 | lmpm_pmg_sel
[14699.974768] iwlwifi 0000:03:00.0: 0x00000000 | timestamp
[14699.974776] iwlwifi 0000:03:00.0: 0x00004890 | flow_handler
[14699.974923] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[14699.974933] iwlwifi 0000:03:00.0: Transport status: 0x0000004A, valid: 7
[14699.974945] iwlwifi 0000:03:00.0: 0x20003463 | ADVANCED_SYSASSERT
[14699.974953] iwlwifi 0000:03:00.0: 0x00000000 | umac branchlink1
[14699.974961] iwlwifi 0000:03:00.0: 0x80455A96 | umac branchlink2
[14699.974969] iwlwifi 0000:03:00.0: 0xC00811A4 | umac interruptlink1
[14699.974976] iwlwifi 0000:03:00.0: 0x00000000 | umac interruptlink2
[14699.974983] iwlwifi 0000:03:00.0: 0xDA80E7B1 | umac data1
[14699.974992] iwlwifi 0000:03:00.0: 0x1622E63F | umac data2
[14699.974999] iwlwifi 0000:03:00.0: 0xC5490101 | umac data3
[14699.975006] iwlwifi 0000:03:00.0: 0x00000042 | umac major
[14699.975013] iwlwifi 0000:03:00.0: 0xF1C864E0 | umac minor
[14699.975020] iwlwifi 0000:03:00.0: 0x1622E647 | frame pointer
[14699.975027] iwlwifi 0000:03:00.0: 0xC0885E08 | stack pointer
[14699.975035] iwlwifi 0000:03:00.0: 0x00AB010C | last host cmd
[14699.975042] iwlwifi 0000:03:00.0: 0x00000000 | isr status reg
[14699.975089] iwlwifi 0000:03:00.0: IML/ROM dump:
[14699.975096] iwlwifi 0000:03:00.0: 0x00000003 | IML/ROM error/state
[14699.975135] iwlwifi 0000:03:00.0: 0x00005A9A | IML/ROM data1
[14699.975170] iwlwifi 0000:03:00.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
[14699.975249] iwlwifi 0000:03:00.0: Fseq Registers:
[14699.975269] iwlwifi 0000:03:00.0: 0x60000000 | FSEQ_ERROR_CODE
[14699.975288] iwlwifi 0000:03:00.0: 0x80290021 | FSEQ_TOP_INIT_VERSION
[14699.975308] iwlwifi 0000:03:00.0: 0x00050008 | FSEQ_CNVIO_INIT_VERSION
[14699.975328] iwlwifi 0000:03:00.0: 0x0000A503 | FSEQ_OTP_VERSION
[14699.975348] iwlwifi 0000:03:00.0: 0x80000003 | FSEQ_TOP_CONTENT_VERSION
[14699.975368] iwlwifi 0000:03:00.0: 0x4552414E | FSEQ_ALIVE_TOKEN
[14699.975388] iwlwifi 0000:03:00.0: 0x00100530 | FSEQ_CNVI_ID
[14699.975407] iwlwifi 0000:03:00.0: 0x00000532 | FSEQ_CNVR_ID
[14699.975428] iwlwifi 0000:03:00.0: 0x00100530 | CNVI_AUX_MISC_CHIP
[14699.975450] iwlwifi 0000:03:00.0: 0x00000532 | CNVR_AUX_MISC_CHIP
[14699.975472] iwlwifi 0000:03:00.0: 0x05B0905B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[14699.975494] iwlwifi 0000:03:00.0: 0x0000025B | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[14699.975970] iwlwifi 0000:03:00.0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).
[14699.975991] ieee80211 phy0: Hardware restart was requested
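The firmware version in the dump can be cross-checked against the "Loaded firmware version" line: "uCode version major" 0x00000042 is 66 in decimal, and "uCode version minor" 0xF1C864E0 is the f1c864e0 component, which together give the first two parts of 66.f1c864e0.0. A minimal Python sketch of that check, assuming the dump was saved as plain dmesg text; the helpers parse_error_dump and firmware_version are hypothetical and not part of iwlwifi or any existing tool.

```python
import re
from typing import Dict

# Matches dump lines such as:
# "[14699.974645] iwlwifi 0000:03:00.0: 0x00000042 | uCode version major"
FIELD_RE = re.compile(r"iwlwifi \S+: 0x([0-9A-Fa-f]+) \| (.+?)\s*$")


def parse_error_dump(text: str) -> Dict[str, int]:
    """Collect 'value | name' pairs from an iwlwifi error log dump.

    Hypothetical helper for illustration. If a field name appears more
    than once, the first value seen is kept.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        m = FIELD_RE.search(line)
        if m:
            fields.setdefault(m.group(2), int(m.group(1), 16))
    return fields


def firmware_version(fields: Dict[str, int]) -> str:
    # The major number is shown in decimal, the minor in 8-digit lowercase hex.
    major = fields["uCode version major"]
    minor = fields["uCode version minor"]
    return f"{major}.{minor:08x}"


if __name__ == "__main__":
    sample = """\
[14699.974521] iwlwifi 0000:03:00.0: 0x00000071 | NMI_INTERRUPT_UMAC_FATAL
[14699.974645] iwlwifi 0000:03:00.0: 0x00000042 | uCode version major
[14699.974652] iwlwifi 0000:03:00.0: 0xF1C864E0 | uCode version minor
[14699.974945] iwlwifi 0000:03:00.0: 0x20003463 | ADVANCED_SYSASSERT
"""
    print(firmware_version(parse_error_dump(sample)))  # prints 66.f1c864e0
```

Lines without a "0x… | name" pair, such as "Transport status" or "Hardware restart was requested", are simply skipped.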
Comment 1 Udo Steinberg 2022-01-23 11:03:43 UTC
Created attachment 300306
complete dmesg output
Comment 2 t.wede 2022-01-29 21:43:47 UTC
I can confirm the problem. With the Intel AX210, neither Wi-Fi nor Bluetooth works.
This also affects the vanilla kernels 5.5.18 and 5.6.x!
With kernel 5.14.12 it works.
Comment 3 t.wede 2022-01-29 21:47:06 UTC
Sorry, I meant the problem also affects kernels 5.15.18 and 5.16.x!
Comment 4 t.wede 2022-01-30 11:27:05 UTC
In kernel 5.16.4 the problem persists.

Bluetooth seems to work, but generates the following errors:

[   49.898869] Bluetooth: RFCOMM TTY layer initialized
[   49.898879] Bluetooth: RFCOMM socket layer initialized
[   49.898884] Bluetooth: RFCOMM ver 1.11
[   51.610867] Bluetooth: hci0: Failed to read codec capabilities (-56)
[   51.611863] Bluetooth: hci0: Failed to read codec capabilities (-56)
[   51.612954] Bluetooth: hci0: Failed to read codec capabilities (-56)
[   51.613865] Bluetooth: hci0: Failed to read codec capabilities (-56)
[   51.614945] Bluetooth: hci0: Failed to read codec capabilities (-56)
[   51.615897] Bluetooth: hci0: Failed to read codec capabilities (-56)


For me, Intel AX210 Wi-Fi works with the following temporary workaround:

Replace the Intel Wi-Fi driver with the version from kernel 5.14.21.

Example:
1. Rename linux-5.16.4/drivers/net/wireless/intel/iwlwifi (e.g. to iwlwifi.bak) or delete it.
2. Download the latest 5.14 kernel, 5.14.21.
3. Copy linux-5.14.21/drivers/net/wireless/intel/iwlwifi to linux-5.16.4/drivers/net/wireless/intel/iwlwifi.
4. Recompile the kernel.
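Steps 1 and 3 of this workaround can be scripted. The following is a minimal Python sketch under stated assumptions: both kernel trees are already unpacked (step 2), and the original driver is kept as iwlwifi.bak rather than deleted. The function swap_iwlwifi is hypothetical; the rebuild (step 4) is left to the normal kernel build.

```python
import shutil
from pathlib import Path

# Driver location inside a kernel source tree, as given in the steps above.
IWLWIFI_SUBDIR = Path("drivers/net/wireless/intel/iwlwifi")


def swap_iwlwifi(new_tree: Path, old_tree: Path) -> Path:
    """Replace new_tree's iwlwifi driver with the copy from old_tree.

    Hypothetical helper: performs step 1 (as a rename) and step 3.
    Returns the path of the backup directory.
    """
    target = new_tree / IWLWIFI_SUBDIR
    source = old_tree / IWLWIFI_SUBDIR
    backup = target.with_name("iwlwifi.bak")
    if not source.is_dir():
        raise FileNotFoundError(f"no iwlwifi driver in {old_tree}")
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; refusing to overwrite it")
    # Step 1: keep the original driver as iwlwifi.bak instead of deleting it.
    shutil.move(str(target), str(backup))
    # Step 3: copy the older driver into place.
    shutil.copytree(source, target)
    return backup
```

Assuming both trees sit in the current directory, the call would be swap_iwlwifi(Path("linux-5.16.4"), Path("linux-5.14.21")), followed by the usual kernel rebuild. Keeping the backup makes it easy to restore the 5.16.4 driver once a fixed kernel is available.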
Comment 5 Udo Steinberg 2022-02-08 19:09:36 UTC
Created attachment 300416
complete dmesg output
Comment 6 Udo Steinberg 2022-02-08 19:10:08 UTC
The problem also happens with 5.17.0-rc.
Comment 7 Udo Steinberg 2022-02-08 19:12:22 UTC
It also happens with firmware version 68.01d30b0c.
Comment 8 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-02-09 09:08:04 UTC
Just noticed this bug. I wonder if this is a dupe of Bug 215488, which is to be fixed by https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=master&id=92883a524ae918736a7b8acef98698075507b8c1
Comment 9 Udo Steinberg 2022-02-09 10:02:18 UTC
It is probably a separate issue, because my kernels are compiled with

CONFIG_WLAN_VENDOR_INTEL=y
# CONFIG_IPW2100 is not set
# CONFIG_IPW2200 is not set
# CONFIG_IWL4965 is not set
# CONFIG_IWL3945 is not set
CONFIG_IWLWIFI=y
CONFIG_IWLWIFI_LEDS=y
# CONFIG_IWLDVM is not set
CONFIG_IWLMVM=y
# CONFIG_IWLWIFI_BCAST_FILTERING is not set

so broadcast filtering isn't compiled in.
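The argument here rests on how a kernel .config records options: enabled ones appear as CONFIG_FOO=y (or =m for a module), and disabled ones as a comment line "# CONFIG_FOO is not set". A minimal Python sketch of that check; parse_kconfig and is_enabled are hypothetical names, not part of the kernel's own Kconfig tooling.

```python
import re
from typing import Dict, Optional

# Enabled: "CONFIG_FOO=y" (or =m, or a string/number value).
# Disabled: "# CONFIG_FOO is not set".
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text: str) -> Dict[str, Optional[str]]:
    """Map each CONFIG_ option to its value, or None if explicitly unset."""
    options: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = None
    return options


def is_enabled(options: Dict[str, Optional[str]], name: str) -> bool:
    """True if the option is built in (=y) or built as a module (=m)."""
    return options.get(name) in ("y", "m")


if __name__ == "__main__":
    config = """\
CONFIG_IWLWIFI=y
# CONFIG_IWLDVM is not set
CONFIG_IWLMVM=y
# CONFIG_IWLWIFI_BCAST_FILTERING is not set
"""
    opts = parse_kconfig(config)
    print(is_enabled(opts, "CONFIG_IWLWIFI_BCAST_FILTERING"))  # prints False
```

Options missing from the file entirely are also reported as not enabled, which matches how the kernel build treats them.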
Comment 10 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-02-09 10:08:08 UTC
(In reply to Udo Steinberg from comment #9)
> It is probably a separate issue, because my kernels are compiled with
> […]
> so broadcast filtering isn't compiled in.

But your dump shows the "ADVANCED_SYSASSERT" stuff mentioned in the commit, so I'd say it's worth testing the patch. If it doesn't help, I'll add this regression to the tracking.
Comment 11 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-02-09 10:09:01 UTC
Asking the developers in the other bug for advice might be worth a shot as well.
Comment 12 Udo Steinberg 2022-02-09 12:07:54 UTC
I applied the patch from https://bugzilla.kernel.org/show_bug.cgi?id=215523#c8 to Linux-5.15.22, but the problem persists.
Comment 13 Golan Ben Ami 2022-02-09 13:19:15 UTC
Created attachment 300422
Disable TWT from client

Hi,

I wonder why I missed this bug in my query; I will check.
Anyway, this is a TWT-related issue, which mainly indicates an interoperability issue with the AP.
As this feature is not yet well enough supported in the ecosystem, I'd like to ask you to check the attached patch. If it works for you, I'll upstream it until the feature is better tested in the ecosystem.
Comment 14 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-02-09 13:47:03 UTC
(In reply to Golan Ben Ami from comment #13)

Many thanks for looking into this!

> I wonder why i missed this bug from my query. will check.

Side note: to me it looks like bugs submitted against "Drivers->network-wireless-intel" are not mailed to any developers(¹). Is this known or did I misdiagnose things? 

(¹) At least it looks that way to me: when I first commented here, only the reporter and the person who wrote comment 2 got a mail, according to the info Bugzilla showed.
Comment 16 Udo Steinberg 2022-02-09 22:14:35 UTC
> I assume we missed this bug because the summary doesn't contain
> iwlwifi/intel.
> We should probably relax our query.

I think it would be good to filter on the component field being "network-wireless-intel".
Comment 17 Udo Steinberg 2022-02-09 22:16:12 UTC
(In reply to Golan Ben Ami from comment #13)
> Created attachment 300422
> Disable TWT from client
> Anyway, this is a TWT-related issue, which mainly indicates an interoperability
> issue with the AP. As this feature is not yet well enough supported in the
> ecosystem, I'd like to ask you to check the attached patch. If it works
> for you, I'll upstream it until the feature is better tested in the
> ecosystem.

With this patch I haven't seen the issue reappear for 8 hours, so I think your analysis is spot on. Please go ahead and upstream that patch. Thanks!
Comment 18 Golan Ben Ami 2022-02-11 09:32:04 UTC
Will do. Marking this as resolved in the meanwhile.
Comment 19 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-02-11 12:13:14 UTC
(In reply to Golan Ben Ami from comment #18)
> marking this as resolved in the meanwhile.

I've seen you doing this in another bug as well. Is this really wise? It makes it even harder for users who run into the issue now to find it in this tracker, as only open bugs are searched by default. There is also always a risk that the patch gets forgotten for some reason. I'd say it should only be closed once the fix is mainlined.
Comment 20 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-02-17 06:53:26 UTC
(In reply to Golan Ben Ami from comment #18)
> marking this as resolved in the meanwhile.

What's the status of this? As the culprit apparently was backported, it would be good to get this fixed sooner rather than later. On a quick search I couldn't find a patch on lore that looked similar to the one attached here about a week ago.
Comment 21 Golan Ben Ami 2022-02-27 08:42:34 UTC
I tend to agree; I'm reopening this until Luca upstreams the patch.
Comment 22 Luca Coelho 2022-03-01 07:37:24 UTC
The patch has now been sent out:

https://lore.kernel.org/linux-wireless/20220301072926.153969-1-luca@coelho.fi

I'll close this bug.