Bug 219447 - iwlwifi: ax201: can't connect after resume - loop of Not associated and the session protection is over already - CSA mode=1 related
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless-intel
Hardware: Intel Linux
Importance: P3 high
Assignee: Default virtual assignee for network-wireless-intel
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-10-31 04:45 UTC by Rahul
Modified: 2024-11-19 11:16 UTC
CC List: 4 users

See Also:
Kernel Version: 6.11.5
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
trace-cmd report > trace-cmd.txt (1.27 MB, text/plain)
2024-10-31 13:34 UTC, Rahul
trace.dat (3.22 MB, application/octet-stream)
2024-10-31 13:48 UTC, Rahul
Trace w/ driver patch (3.61 MB, application/octet-stream)
2024-11-02 23:19 UTC, Stars
patch - don't dump FW state upon RFKILL in suspend (4.32 KB, patch)
2024-11-04 06:50 UTC, Emmanuel Grumbach
don't dump FW log on rfkill in wowlan (4.13 KB, patch)
2024-11-04 09:04 UTC, Emmanuel Grumbach
sudo trace-cmd record -T -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg -e iwlwifi_dbg -e console (3.68 MB, application/octet-stream)
2024-11-04 22:56 UTC, Rahul
fix candidate (1.53 KB, application/mbox)
2024-11-05 04:14 UTC, Emmanuel Grumbach
sudo trace-cmd record -T -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg -e iwlwifi_dbg -e console (3.53 MB, application/octet-stream)
2024-11-06 00:27 UTC, Rahul
more prints in queue tracing (2.89 KB, patch)
2024-11-10 09:53 UTC, Emmanuel Grumbach
more prints in queue tracing (3.32 KB, patch)
2024-11-10 10:04 UTC, Emmanuel Grumbach
trace.dat without wake from sleep (3.82 MB, application/octet-stream)
2024-11-10 16:05 UTC, Rahul
how about this (4.34 MB, application/octet-stream)
2024-11-10 16:27 UTC, Rahul
check1 (3.79 MB, application/octet-stream)
2024-11-10 17:30 UTC, Rahul
fix candidate (559 bytes, patch)
2024-11-10 18:52 UTC, Emmanuel Grumbach
fix candidate (7.38 KB, patch)
2024-11-11 20:26 UTC, Emmanuel Grumbach

Description Rahul 2024-10-31 04:45:05 UTC
Wi-Fi does not work after waking from sleep with the latest 6.11 kernel. This regression started with 6.11, as it works correctly on 6.10. To restore functionality, I need to reload the iwlmvm module driver. Disabling power management did not resolve the issue either. Wi-Fi networks are visible, but I am unable to connect to any network, even open ones.
Comment 1 Rahul 2024-10-31 04:53:03 UTC
(In reply to Rahul from comment #0)
> Wi-Fi does not work after waking from sleep with the latest 6.11 kernel.
> This regression started with 6.11, as it works correctly on 6.10. To restore
> functionality, I need to reload the iwlmvm module driver. Disabling power
> management did not resolve the issue either. Wi-Fi networks are visible, but
> I am unable to connect to any network, even open ones.



Operating System: Fedora Workstation 41
Kernel Version: 6.11.5-300.fc41.x86_64
Hardware: Mi Notebook Ultra with Intel Tiger Lake CPU (no dedicated GPU)
Wi-Fi Card: Intel Wi-Fi 6 AX201 (rev 20)
Comment 2 Emmanuel Grumbach 2024-10-31 07:48:59 UTC
Can you please share the kernel log?
It would also be useful to get tracing.
Comment 3 Rahul 2024-10-31 08:28:55 UTC
dmesg output

https://pastebin.com/8fsQtKGa


note:

It occurs randomly but most often happens when waking from sleep. Additionally, changing the frequency of the access point seems to increase the chances of reproducing the issue. In my case, I switch from dynamic to a fixed 5 GHz frequency. After failing to connect to the 5 GHz network, it also becomes unable to connect to any network, even if it is open.
Comment 4 Emmanuel Grumbach 2024-10-31 08:37:36 UTC
Ok... interesting...

can you record tracing of this?

sudo trace-cmd record -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_dbg -e iwlwifi_dbg

?
Comment 5 Rahul 2024-10-31 08:39:28 UTC
(In reply to Emmanuel Grumbach from comment #4)
> Ok... interesting...
> 
> can you record tracing of this?
> 
> sudo trace-cmd record -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_dbg -e
> iwlwifi_dbg
> 
> ?

it shows

sudo trace-cmd record -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_dbg -e iwlwifi_dbg
trace-cmd: No such file or directory
  No events enabled with iwlwifi
Comment 6 Rahul 2024-10-31 08:41:45 UTC
This is the current output when I am unable to connect to Wi-Fi.
Comment 7 Emmanuel Grumbach 2024-10-31 09:14:44 UTC
can I ask you to recompile the kernel with IWLWIFI_TRACING?
Comment 8 Rahul 2024-10-31 09:18:12 UTC
let me try
Comment 9 Omer Sabic 2024-10-31 09:30:49 UTC
I am having the exact same issue with my Lenovo T15 G2 and F40.
Adapter:
Intel Wi-Fi 6E AX210/AX1675 2x2 [Typhoon Peak] driver: iwlwifi

Also for me, 6.11.4 does not fix it, the issue remains. Booting into 6.10 makes it work normally again.
Comment 10 Emmanuel Grumbach 2024-10-31 09:54:28 UTC
@Omer, please share your kernel log, I want to make sure it is really the same issue.
Comment 11 Rahul 2024-10-31 10:06:02 UTC
(In reply to Emmanuel Grumbach from comment #7)
> can I ask you to recompile the kernel with IWLWIFI_TRACING?

CONFIG_IWLWIFI_DEBUG=y
CONFIG_IWLWIFI_DEBUGFS=y
CONFIG_IWLWIFI_DEVICE_TRACING=y

Will these options work?
Comment 13 Emmanuel Grumbach 2024-10-31 10:08:55 UTC
yes.

Thanks!
Comment 14 Rahul 2024-10-31 12:49:01 UTC
(In reply to Emmanuel Grumbach from comment #13)
> yes.
> 
> Thanks!

Done. What should I do next?
Comment 15 Rahul 2024-10-31 13:32:54 UTC
(In reply to Emmanuel Grumbach from comment #4)
> Ok... interesting...
> 
> can you record tracing of this?
> 
> sudo trace-cmd record -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_dbg -e
> iwlwifi_dbg
> 
> ?

After doing this, I ran
trace-cmd report > trace-cmd.txt
and uploaded the output as an attachment.
Comment 16 Rahul 2024-10-31 13:34:20 UTC
Created attachment 307100 [details]
trace-cmd report > trace-cmd.txt
Comment 17 Emmanuel Grumbach 2024-10-31 13:35:05 UTC
Did you reproduce the bug while recording tracing?

Note that tracing needs to be running while you reproduce the bug.
Comment 18 Rahul 2024-10-31 13:36:26 UTC
(In reply to Emmanuel Grumbach from comment #17)
> Did you reproduce the bug while recording tracing?
> 
> Note that tracing needs to be running while you reproduce the bug.

Yes, tracing was running while I reproduced the issue.
Comment 19 Emmanuel Grumbach 2024-10-31 13:45:18 UTC
Can you please compress and attach the raw trace.dat instead of parsing it yourself?

Thanks!
Comment 20 Rahul 2024-10-31 13:48:50 UTC
Created attachment 307101 [details]
trace.dat
Comment 21 Rahul 2024-10-31 13:49:17 UTC
(In reply to Emmanuel Grumbach from comment #19)
> Can you please compress and attach the raw trace.dat instead of parsing it
> yourself?
> 
> Thanks!

done
Comment 22 Omer Sabic 2024-11-01 10:00:20 UTC
(In reply to Emmanuel Grumbach from comment #10)
> @Omer, please share your kernel log, I want to make sure it is really the
> same issue.

Sorry for the late reply.
Meanwhile Fedora released kernel 6.11.5 to the update channel and I upgraded, but the problem persists.
However, if I set "Sleep state" in UEFI to "Windows and Linux" instead of "Linux S3", the problem is gone. But as I understand this consumes more power in sleep when not connected to AC, so it would be great if I could go back to S3.
Here's my dmesg log that you asked for:
https://pastebin.com/B3XhvSZ2
Comment 23 Emmanuel Grumbach 2024-11-01 10:27:27 UTC
@Omar, this is not related to the issue reported by Rahul.

Please open a new bug, CC me, and add the firmware debug dump.
The title of the bug should be:

iwlwifi AX210 ASSERT 87 upon resume from suspend.

Information on how to create such a dump can be found here:

https://wireless.docs.kernel.org/en/latest/en/users/drivers/iwlwifi/debugging.html#firmware-debugging

No need to trigger the dump through debugfs, it'll be created automatically because of the ASSERT.
All you need is to create /sbin/iwlfwdump.sh and to add the udev rule.

Thanks

@Rahul, you provided excellent data, we are analyzing it.
I assume we'll need more data from you soon, once we have a better idea of where to look for the problem. Thank you for your cooperation.
Comment 24 Emmanuel Grumbach 2024-11-01 10:27:49 UTC
(In reply to Emmanuel Grumbach from comment #23)
> @Omar

Sorry, I mean Omer.
Comment 25 Rahul 2024-11-01 14:28:19 UTC
(In reply to Emmanuel Grumbach from comment #23)
 
> @Rahul, you provided excellent data, we are analyzing it.
> I assume we'll need more data from you soon when we'll have a better idea
> of where to look for the problem. Thank you for your cooperation.

Thank you for your assistance. Just let me know if you need anything.
Comment 26 Stars 2024-11-01 19:03:54 UTC
(In reply to Emmanuel Grumbach from comment #23)
> @Omar, this is not related to the issue reported by Rahul.
> 
> Please open a new bug, CC me, and add the firmware debug dump.
> The title of the bug should be:
> 
> iwlwifi AX210 ASSERT 87 upon resume from suspend.
> 
> Information on how to create such a dump can be found here:
> 
> https://wireless.docs.kernel.org/en/latest/en/users/drivers/iwlwifi/
> debugging.html#firmware-debugging
> 
> No need to trigger the dump through debugfs, it'll be created automatically
> because of the ASSERT.
> All you need is to create /sbin/iwlfwdump.sh and to add the udev rule.
> 
> Thanks
> 
> @Rahul, you provided excellent data, we are analyzing it.
> I assume we'll need more data from you soon when we'll have a better idea
> of where to look for the problem. Thank you for your cooperation.

I believe I am hitting the same bug as @Omer. Would you be able to confirm this? If so, should I go ahead and follow the same directions?
Comment 27 Stars 2024-11-01 19:05:06 UTC
> I believe I am having the same bug as @Omer would you be able to confirm
> this? If so should I go ahead and follow the same direction?

Sorry here is the paste... https://pastebin.com/cHVtBcMh
Comment 28 Emmanuel Grumbach 2024-11-02 16:32:12 UTC
@Stars, I can confirm you seem to face the same issue as Rahul.
Comment 29 Stars 2024-11-02 16:54:24 UTC
(In reply to Emmanuel Grumbach from comment #28)
> @Stars, I can confirm you seem to face the same issue as Rahul.

Okay, thank you for confirming. If you require a second set of debug data please let me know if it helps.
Comment 30 Emmanuel Grumbach 2024-11-02 19:29:02 UTC
@Stars, I guess you can provide the same data as Rahul; it will give us more evidence of the problem.
I'll probably need to add more debug output to the code (which will require patching the driver) to see what's really going wrong.
It looks like a race in the Tx scheduling.

Thanks
Comment 31 Emmanuel Grumbach 2024-11-02 19:41:05 UTC
Actually...

sudo trace-cmd record -T -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg -e iwlwifi_dbg

That will be better.

Thanks
Comment 32 Emmanuel Grumbach 2024-11-02 20:21:36 UTC
All right, I need to know why mac80211 stopped the queues.
Can you please add this patch:
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 2150496130ff..2a09ef8510fc 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3853,6 +3853,7 @@ begin:
        spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
 
        if (unlikely(q_stopped)) {
+               pr_err("%s - q_stopped 0x%08x\n", __func__, q_stopped);
                /* mark for waking later */
                set_bit(IEEE80211_TXQ_DIRTY, &txqi->flags);
                return NULL;


This print will appear in the kernel logs, but it would be best to capture it along with the other data through tracing:

sudo trace-cmd record -T -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg -e iwlwifi_dbg -e console

Please also make sure you have CONFIG_MAC80211_MESSAGE_TRACING enabled in your kernel configuration.

Thanks!
Comment 33 Stars 2024-11-02 23:19:46 UTC
Created attachment 307121 [details]
Trace w/ driver patch
Comment 34 Stars 2024-11-02 23:32:06 UTC
(In reply to Emmanuel Grumbach from comment #32)
> All right, I need to know why mac80211 stopped the queues.
> Can you please add this patch:
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 2150496130ff..2a09ef8510fc 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -3853,6 +3853,7 @@ begin:
>         spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
>  
>         if (unlikely(q_stopped)) {
> +               pr_err("%s - q_stopped 0x%08x\n", __func__, q_stopped);
>                 /* mark for waking later */
>                 set_bit(IEEE80211_TXQ_DIRTY, &txqi->flags);
>                 return NULL;
> 
> 
> this print will appear in the kernel logs, but the best would be to capture
> then along with the other data through tracing:
> 
> sudo trace-cmd record -T -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg
> -e iwlwifi_dbg -e console
> 
> Please also make sure you have CONFIG_MAC80211_MESSAGE_TRACING selected in
> the kernel compilation.
> 
> Thanks!

I hope I have done everything right; this was the first time I have compiled a kernel. CONFIG_MAC80211_MESSAGE_TRACING was selected, and the patch was applied. File should be signed to you.
Comment 35 Emmanuel Grumbach 2024-11-03 15:07:53 UTC
First, you did everything right.

Second, I can see a different failure or, to be more precise, an additional failure:

0x00000087 | ADVANCED_SYSASSERT 

Because of that, we go into a mess of FW recovery, and then, we have the queues stuck:

  wpa_supplicant-650   [010]   145.756114: console:              wlp9s0: Inserted STA XX
  wpa_supplicant-650   [010]   145.756119: console:              wlp9s0: authenticate with XX (local address=)
  wpa_supplicant-650   [010]   145.759106: console:              wlp9s0: send auth to XX (try 1/3)
    kworker/10:1-166   [010]   145.760108: console:              ieee80211_tx_dequeue - q_stopped 0x00000001

This means that we have a bug in iwlwifi.
I'll dig.
In the meantime, it would help if you could get a reproduction without the firmware crash (which needs to be debugged regardless).
Comment 36 Emmanuel Grumbach 2024-11-03 15:08:17 UTC
BTW, the firmware crash you saw is exactly what Omer is seeing..
Comment 37 Emmanuel Grumbach 2024-11-03 15:32:10 UTC
Ahh.. Ok.
I think I understand.
Sorry for the noise.
No need for more data at this stage.

Thanks
Comment 38 Omer Sabic 2024-11-03 16:55:50 UTC
(In reply to Emmanuel Grumbach from comment #23)
> @Omar, this is not related to the issue reported by Rahul.
> 
> Please open a new bug, CC me, and add the firmware debug dump.
> The title of the bug should be:
> 
> iwlwifi AX210 ASSERT 87 upon resume from suspend.
> 
> Information on how to create such a dump can be found here:
> 
> https://wireless.docs.kernel.org/en/latest/en/users/drivers/iwlwifi/
> debugging.html#firmware-debugging
> 
> No need to trigger the dump through debugfs, it'll be created automatically
> because of the ASSERT.
> All you need is to create /sbin/iwlfwdump.sh and to add the udev rule.
> 
> Thanks
> 
> @Rahul, you provided excellent data, we are analyzing it.
> I assume we'll need more data from you soon when we'll have a better idea
> of where to look for the problem. Thank you for your cooperation.

I hope I did it ok, here's the report:
https://bugzilla.kernel.org/show_bug.cgi?id=219460

I initially forgot to add you to CC and did it after it was already open :)
Comment 39 Watford 2024-11-04 01:15:20 UTC
I don't want to interfere with the above, but I have the same issue. The Wi-Fi controller doesn't appear in the GUI Network Manager, and I can't connect to Wi-Fi.

System:
Kernel: 6.11.0-9-generic arch: x86_64 bits: 64
Desktop: MATE v: 1.26.2 Distro: Ubuntu 24.10 (Oracular Oriole)
Machine:
Type: Convertible System: HP product: HP ENVY x360 Convertible 15-ed1xxx
v: Type1ProductConfigId serial: CND0506KLY
Mobo: HP model: 8826 v: 48.37 serial: PKGKVD31WEKG0V UEFI: Insyde v: F.15
date: 04/28/2021

If I do 'dmesg | grep iwlwifi' I get:

[   14.038750] iwlwifi_compat: loading out-of-tree module taints kernel.
[   14.046028] Loading modules backported from iwlwifi
[   14.046034] iwlwifi-stack-public:release/core89:12325:dcfcbdc0
[   14.188838] iwlwifi 0000:00:14.3: Detected crf-id 0x3617, cnv-id 0x20000302 wfpm id 0x80000000
[   14.188868] iwlwifi 0000:00:14.3: PCI dev a0f0/0074, rev=0x351, rfid=0x10a100
[   14.188873] iwlwifi 0000:00:14.3: Detected Intel(R) Wi-Fi 6 AX201 160MHz
[   14.199951] iwlwifi 0000:00:14.3: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
[   14.200345] iwlwifi 0000:00:14.3: loaded firmware version 77.85be44d3.0 QuZ-a0-hr-b0-77.ucode op_mode iwlmvm
[   16.681831] iwlwifi 0000:00:14.3: SecBoot CPU1 Status: 0x5893, CPU2 Status: 0x3
[   16.681866] iwlwifi 0000:00:14.3: WFPM_LMAC1_PD_NOTIFICATION: 0x0
[   16.681896] iwlwifi 0000:00:14.3: HPM_SECONDARY_DEVICE_STATE: 0x42
[   16.681927] iwlwifi 0000:00:14.3: WFPM_MAC_OTP_CFG7_ADDR: 0x0
[   16.681956] iwlwifi 0000:00:14.3: WFPM_MAC_OTP_CFG7_DATA: 0x0
[   16.681960] iwlwifi 0000:00:14.3: UMAC CURRENT PC: 0xa05c18
[   16.681964] iwlwifi 0000:00:14.3: LMAC1 CURRENT PC: 0xa05c1c
[   16.681969] iwlwifi 0000:00:14.3: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
[   16.682073] iwlwifi 0000:00:14.3: Start IWL Error Log Dump:
[   16.682077] iwlwifi 0000:00:14.3: Transport status: 0x00000042, valid: 6
[   16.682082] iwlwifi 0000:00:14.3: Loaded firmware version: 77.85be44d3.0 QuZ-a0-hr-b0-77.ucode
[   16.682087] iwlwifi 0000:00:14.3: 0x00000034 | NMI_INTERRUPT_WDG           
[   16.682092] iwlwifi 0000:00:14.3: 0x000022F0 | trm_hw_status0
[   16.682096] iwlwifi 0000:00:14.3: 0x00000000 | trm_hw_status1
[   16.682100] iwlwifi 0000:00:14.3: 0x004C94FA | branchlink2
[   16.682104] iwlwifi 0000:00:14.3: 0x0001531C | interruptlink1
[   16.682108] iwlwifi 0000:00:14.3: 0x0001531C | interruptlink2
[   16.682111] iwlwifi 0000:00:14.3: 0x00014ECA | data1
[   16.682115] iwlwifi 0000:00:14.3: 0x0BADCAFE | data2
[   16.682119] iwlwifi 0000:00:14.3: 0x00000000 | data3
[   16.682122] iwlwifi 0000:00:14.3: 0x00000000 | beacon time
[   16.682126] iwlwifi 0000:00:14.3: 0x00000000 | tsf low
[   16.682130] iwlwifi 0000:00:14.3: 0x00000000 | tsf hi
[   16.682134] iwlwifi 0000:00:14.3: 0x00000000 | time gp1
[   16.682137] iwlwifi 0000:00:14.3: 0x0003D5AB | time gp2
[   16.682141] iwlwifi 0000:00:14.3: 0x00000001 | uCode revision type
[   16.682145] iwlwifi 0000:00:14.3: 0x0000004D | uCode version major
[   16.682149] iwlwifi 0000:00:14.3: 0x85BE44D3 | uCode version minor
[   16.682153] iwlwifi 0000:00:14.3: 0x00000351 | hw version
[   16.682157] iwlwifi 0000:00:14.3: 0x00C89001 | board version
[   16.682161] iwlwifi 0000:00:14.3: 0x00000000 | hcmd
[   16.682164] iwlwifi 0000:00:14.3: 0x00020000 | isr0
[   16.682168] iwlwifi 0000:00:14.3: 0x00000000 | isr1
[   16.682172] iwlwifi 0000:00:14.3: 0x08F00002 | isr2
[   16.682175] iwlwifi 0000:00:14.3: 0x00C0001C | isr3
[   16.682179] iwlwifi 0000:00:14.3: 0x00000000 | isr4
[   16.682183] iwlwifi 0000:00:14.3: 0x00000000 | last cmd Id
[   16.682186] iwlwifi 0000:00:14.3: 0x00014ECA | wait_event
[   16.682190] iwlwifi 0000:00:14.3: 0x00000000 | l2p_control
[   16.682194] iwlwifi 0000:00:14.3: 0x00000000 | l2p_duration
[   16.682197] iwlwifi 0000:00:14.3: 0x00000000 | l2p_mhvalid
[   16.682201] iwlwifi 0000:00:14.3: 0x00000000 | l2p_addr_match
[   16.682205] iwlwifi 0000:00:14.3: 0x0000004B | lmpm_pmg_sel
[   16.682208] iwlwifi 0000:00:14.3: 0x00000000 | timestamp
[   16.682212] iwlwifi 0000:00:14.3: 0x0000F81C | flow_handler
[   16.682258] iwlwifi 0000:00:14.3: Start IWL Error Log Dump:
[   16.682261] iwlwifi 0000:00:14.3: Transport status: 0x00000042, valid: 7
[   16.682265] iwlwifi 0000:00:14.3: 0x20000070 | NMI_INTERRUPT_LMAC_FATAL
[   16.682270] iwlwifi 0000:00:14.3: 0x00000000 | umac branchlink1
[   16.682274] iwlwifi 0000:00:14.3: 0x804561E2 | umac branchlink2
[   16.682278] iwlwifi 0000:00:14.3: 0x804737A2 | umac interruptlink1
[   16.682281] iwlwifi 0000:00:14.3: 0x804661EC | umac interruptlink2
[   16.682285] iwlwifi 0000:00:14.3: 0x00000400 | umac data1
[   16.682289] iwlwifi 0000:00:14.3: 0x804661EC | umac data2
[   16.682292] iwlwifi 0000:00:14.3: 0x00000000 | umac data3
[   16.682296] iwlwifi 0000:00:14.3: 0x0000004D | umac major
[   16.682299] iwlwifi 0000:00:14.3: 0x85BE44D3 | umac minor
[   16.682303] iwlwifi 0000:00:14.3: 0x0003D622 | frame pointer
[   16.682307] iwlwifi 0000:00:14.3: 0xC0887EE4 | stack pointer
[   16.682310] iwlwifi 0000:00:14.3: 0x00000000 | last host cmd
[   16.682314] iwlwifi 0000:00:14.3: 0x00200040 | isr status reg
[   16.682339] iwlwifi 0000:00:14.3: IML/ROM dump:
[   16.682343] iwlwifi 0000:00:14.3: 0x00000003 | IML/ROM error/state
[   16.682370] iwlwifi 0000:00:14.3: 0x00005893 | IML/ROM data1
[   16.682399] iwlwifi 0000:00:14.3: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
[   16.682427] iwlwifi 0000:00:14.3: Fseq Registers:
[   16.682448] iwlwifi 0000:00:14.3: 0x60000000 | FSEQ_ERROR_CODE
[   16.682474] iwlwifi 0000:00:14.3: 0x00290033 | FSEQ_TOP_INIT_VERSION
[   16.682499] iwlwifi 0000:00:14.3: 0x00090006 | FSEQ_CNVIO_INIT_VERSION
[   16.682523] iwlwifi 0000:00:14.3: 0x0000A482 | FSEQ_OTP_VERSION
[   16.682548] iwlwifi 0000:00:14.3: 0x00000003 | FSEQ_TOP_CONTENT_VERSION
[   16.682574] iwlwifi 0000:00:14.3: 0x4552414E | FSEQ_ALIVE_TOKEN
[   16.682599] iwlwifi 0000:00:14.3: 0x20000302 | FSEQ_CNVI_ID
[   16.682624] iwlwifi 0000:00:14.3: 0x01300504 | FSEQ_CNVR_ID
[   16.682664] iwlwifi 0000:00:14.3: 0x20000302 | CNVI_AUX_MISC_CHIP
[   16.682686] iwlwifi 0000:00:14.3: 0x01300504 | CNVR_AUX_MISC_CHIP
[   16.682712] iwlwifi 0000:00:14.3: 0x05B0905B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[   16.682737] iwlwifi 0000:00:14.3: 0x0000025B | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[   16.682759] iwlwifi 0000:00:14.3: 0x00000000 | FSEQ_PREV_CNVIO_INIT_VERSION
[   16.682784] iwlwifi 0000:00:14.3: 0x00290033 | FSEQ_WIFI_FSEQ_VERSION
[   16.682810] iwlwifi 0000:00:14.3: 0x00290033 | FSEQ_BT_FSEQ_VERSION
[   16.682835] iwlwifi 0000:00:14.3: 0x000000D4 | FSEQ_CLASS_TP_VERSION
[   16.682867] iwlwifi 0000:00:14.3: UMAC CURRENT PC: 0x804732b0
[   16.682889] iwlwifi 0000:00:14.3: LMAC1 CURRENT PC: 0xd0
[   16.682914] iwlwifi 0000:00:14.3: Failed to start RT ucode: -110
[   16.682919] iwlwifi 0000:00:14.3: WRT: Collecting data: ini trigger 13 fired (delay=0ms).
[   17.902943] iwlwifi 0000:00:14.3: Failed to run INIT ucode: -110


There are 13 'iwlwifi-ty-a0-gf-a0-xx.ucode' files in /lib/firmware, and the log above shows 'ini trigger 13 fired'.

I have no idea how to solve it. The symptoms are the same, but I don't know whether it is the same issue as the one discussed above.

Thanks for any advice.

W
Comment 40 Emmanuel Grumbach 2024-11-04 04:54:30 UTC
@Watford, this is a different issue. Please open a new bug if you want, but we must not mix issues in the same ticket.
Comment 41 Emmanuel Grumbach 2024-11-04 06:50:20 UTC
Created attachment 307129 [details]
patch - don't dump FW state upon RFKILL in suspend

@Rahul,

can you please add the patch attached?
It'll avoid printing the FW state when it's not really needed and can possibly fix the firmware load failure that you see later on.

Thanks
Comment 42 Rahul 2024-11-04 06:59:53 UTC
(In reply to Emmanuel Grumbach from comment #41)
> Created attachment 307129 [details]
> patch - don't dump FW state upon RFKILL in suspend
> 
> @Rahul,
> 
> can you please add the patch attached?
> It'll avoid printing the FW state when it's not really needed and can
> possibly fix the firmware load failure that you see later on.
> 
> Thanks

Is there anything else to add when compiling the kernel, other than this patch and the following options?
CONFIG_IWLWIFI_DEBUG=y
CONFIG_IWLWIFI_DEBUGFS=y
CONFIG_IWLWIFI_DEVICE_TRACING=y
Comment 43 Emmanuel Grumbach 2024-11-04 07:01:29 UTC
Yes :)

The debug print patch from Comment#32

and CONFIG_MAC80211_MESSAGE_TRACING

Thanks!!
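
For anyone following along, here is a minimal sketch for checking that the four options named in this thread are enabled before rebuilding. The helper name `missing_opts` is hypothetical, and the `/boot/config-$(uname -r)` path in the usage comment is an assumption about where the distribution installs its kernel config.

```shell
#!/bin/sh
# Print every requested option that is NOT set to "=y" in the given
# kernel config file. Prints nothing when all options are enabled.
# Usage: missing_opts CONFIG_FILE OPTION...
missing_opts() {
    cfg=$1
    shift
    for opt in "$@"; do
        # Match the whole line, e.g. "CONFIG_IWLWIFI_DEBUG=y".
        grep -q "^${opt}=y\$" "$cfg" || echo "$opt"
    done
}

# Example (config path is an assumption, adjust for your distribution):
# missing_opts "/boot/config-$(uname -r)" \
#     CONFIG_IWLWIFI_DEBUG CONFIG_IWLWIFI_DEBUGFS \
#     CONFIG_IWLWIFI_DEVICE_TRACING CONFIG_MAC80211_MESSAGE_TRACING
```

An empty result means nothing is missing; any option that is printed still needs to be enabled in the build configuration.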
Comment 44 Stars 2024-11-04 07:40:39 UTC
(In reply to Emmanuel Grumbach from comment #36)
> BTW, the firmware crash you saw is exactly what Omer is seeing..

Shall I migrate over to Omer's bug report? Or was the crash just coincidence?
Comment 45 Emmanuel Grumbach 2024-11-04 07:47:03 UTC
(In reply to Stars from comment #44)
> (In reply to Emmanuel Grumbach from comment #36)
> > BTW, the firmware crash you saw is exactly what Omer is seeing..
> 
> Shall I migrate over to Omer's bug report? Or was the crash just coincidence?

Please do
Comment 46 Rahul 2024-11-04 07:51:16 UTC
(In reply to Emmanuel Grumbach from comment #43)
> Yes :)
> 
> The debug print patch from Comment#32
> 
> and CONFIG_MAC80211_MESSAGE_TRACING
> 
> Thanks!!

Thanks, compiling now.
Comment 47 Rahul 2024-11-04 08:45:15 UTC
(In reply to Rahul from comment #46)
> (In reply to Emmanuel Grumbach from comment #43)
> > Yes :)
> > 
> > The debug print patch from Comment#32
> > 
> > and CONFIG_MAC80211_MESSAGE_TRACING
> > 
> > Thanks!!
> 
> thanks compiling

Are both patches up to date with the latest kernel, 6.11.6? They are not applying.
Comment 48 Emmanuel Grumbach 2024-11-04 09:04:36 UTC
Created attachment 307130 [details]
don't dump FW log on rfkill in wowlan

Hi,

sorry, this is the patch based on 6.11.6
The previous one was based on our internal tree.
Comment 49 Watford 2024-11-04 12:27:13 UTC
(In reply to Emmanuel Grumbach from comment #40)
> @Watford, this is a different issue. Please open a new bug if you want, but
> we must not mix issues in the same ticket.

OK, but it seems I won't have to.
After trying a few other things, last night I did this:

sudo mv /usr/lib/firmware/iwlwifi-ty-a0-gf-a0.pnvm.zst /usr/lib/firmware/iwlwifi-ty-a0-gf-a0.bak

I found this suggestion here on bugzilla.kernel.org, where it was said: ''After upgrading linux-firmware lib,.... always have to use this command `sudo mv /usr/lib/firmware/iwlwifi-ty-a0-gf-a0.pnvm /usr/lib/firmware/iwlwifi-ty-a0-gf-a0.bak` (https://askubuntu.com/questions/1360175/intel-wifi-6-ax210-wifi-not-working-after-update), and after this command, Wi-Fi module works.''

When I started the laptop a few minutes ago, everything was back to normal, as if the weekend had been just a bad dream.

I'll come back here and open a new ticket if the solution doesn't persist (i.e. if the issue comes back).
Thank you for pointing me in the right direction, amateur that I am.

W
Comment 50 Emmanuel Grumbach 2024-11-04 12:30:12 UTC
Hi,

This is really strange... you shouldn't have to disable the PNVM file (which is what you do here) to have things working.
In any case, that's a different issue.
Comment 51 Rahul 2024-11-04 22:55:59 UTC
(In reply to Emmanuel Grumbach from comment #48)
> Created attachment 307130 [details]
> don't dump FW log on rfkill in wowlan
> 
> Hi,
> 
> sorry, this is the patch based on 6.11.6
> The previous one was based on our internal tree.

After applying both patches, the output of "sudo trace-cmd record -T -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg -e iwlwifi_dbg -e console" is attached.
Comment 52 Rahul 2024-11-04 22:56:37 UTC
Created attachment 307137 [details]
sudo trace-cmd record -T -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg -e iwlwifi_dbg -e console
Comment 53 Emmanuel Grumbach 2024-11-05 04:14:09 UTC
Created attachment 307140 [details]
fix candidate

Can you please apply the patch attached?

I believe it should fix the issue, or at least certain occurrences of the issues.
At this point, you can remove all the other patches I asked you to apply and use only this one.

Thanks for your cooperation!
Comment 54 Rahul 2024-11-05 20:06:39 UTC
(In reply to Emmanuel Grumbach from comment #53)
> Created attachment 307140 [details]
> fix candidate
> 
> Can you please apply the patch attached?
> 
> I believe it should fix the issue, or at least certain occurrences of the
> issues.
> At this point, you can remove all the other patches I asked you to apply and
> use only this one.
> 
> Thnks for your cooperation!

It looks like it's fixed the issue, but I will wait for a few days before jumping to a conclusion.
Comment 55 Rahul 2024-11-05 20:11:15 UTC
(In reply to Rahul from comment #54)
> (In reply to Emmanuel Grumbach from comment #53)
> > Created attachment 307140 [details]
> > fix candidate
> > 
> > Can you please apply the patch attached?
> > 
> > I believe it should fix the issue, or at least certain occurrences of the
> > issues.
> > At this point, you can remove all the other patches I asked you to apply
> and
> > use only this one.
> > 
> > Thnks for your cooperation!
> 
> It looks like it's fixed the issue, but I will wait for a few days before
> jumping to a conclusion.

No, it is not fixed.
Comment 56 Emmanuel Grumbach 2024-11-05 20:16:21 UTC
Ok can you please tell us what this patch prints to the logs:

Note, it is very much like the previous one, but not exactly the same.
Please keep the patch attached in comment#53 and add this one on top.
Thanks!


diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 2150496130ff..e92932b20227 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3854,6 +3854,7 @@ begin:
 
        if (unlikely(q_stopped)) {
                /* mark for waking later */
+               pr_err("%s - q_stopped 0x%08x\n", __func__, local->queue_stop_reasons[q]);
                set_bit(IEEE80211_TXQ_DIRTY, &txqi->flags);
                return NULL;
        }
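
To read the value this print produces, the bitmask can be split into individual stop reasons. The sketch below assumes the bit order of `enum queue_stop_reason` as found in mainline net/mac80211/ieee80211_i.h (DRIVER is bit 0, PS bit 1, CSA bit 2, and so on); that ordering is an assumption here and should be checked against the tree being built. The helper name `decode_stop_reasons` is hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: names listed in the bit order of enum queue_stop_reason
# (net/mac80211/ieee80211_i.h, mainline). Verify against your kernel tree.
STOP_REASONS="DRIVER PS CSA AGGREGATION SUSPEND SKB_ADD OFFCHANNEL FLUSH TDLS_TEARDOWN RESERVE_TID IFTYPE_CHANGE"

# Print the names of all bits set in the given mask, or "none".
# Usage: decode_stop_reasons 0x00000004
decode_stop_reasons() {
    mask=$(( $1 ))
    bit=0
    out=""
    for name in $STOP_REASONS; do
        # Test one bit per reason, lowest bit first.
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "${out:-none}"
}

# Example:
# decode_stop_reasons 0x00000005
```

Under the assumed ordering, a mask of 0x00000005 decodes to DRIVER and CSA, meaning both the driver and a channel switch are holding the queue stopped.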
Comment 57 Rahul 2024-11-05 20:23:27 UTC
(In reply to Emmanuel Grumbach from comment #56)
> Ok can you please tell us what this patch prints to the logs:
> 
> Note, it is very much like the previous one, but not exactly the same.
> Please keep the patch attached in comment#53 and add this one on top.
> Thanks!
> 
> 
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 2150496130ff..e92932b20227 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -3854,6 +3854,7 @@ begin:
>  
>         if (unlikely(q_stopped)) {
>                 /* mark for waking later */
> +               pr_err("%s - q_stopped 0x%08x\n", __func__,
> local->queue_stop_reasons[q]);
>                 set_bit(IEEE80211_TXQ_DIRTY, &txqi->flags);
>                 return NULL;
>         }

Is it compatible with 6.11.6? It is not applying.
Comment 58 Rahul 2024-11-05 20:27:48 UTC
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 2150496130ff..2a09ef8510fc 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3830,6 +3830,7 @@
 	spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
 
 	if (unlikely(q_stopped)) {
+       pr_err("%s - q_stopped 0x%08x\n", __func__, local->queue_stop_reasons[q]);
		/* mark for waking later */
 		set_bit(IEEE80211_TXQ_DIRTY, &txqi->flags);
 		return NULL;


Is this one okay?
Comment 59 Emmanuel Grumbach 2024-11-05 20:34:59 UTC
yes, sorry..
Comment 60 Rahul 2024-11-06 00:23:24 UTC
(In reply to Emmanuel Grumbach from comment #59)
> yes, sorry..

Same loop of "Nov 06 05:51:07 kernel: iwlwifi 0000:00:14.3: Not associated and the session protection is over already..."
Comment 61 Rahul 2024-11-06 00:27:07 UTC
Created attachment 307150 [details]
sudo trace-cmd record -T -e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg -e iwlwifi_dbg -e console
Comment 62 Emmanuel Grumbach 2024-11-06 04:40:22 UTC
Can you please start tracing before going to suspend?
I can see that you started tracing upon resume.
Thanks!
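
One possible way to keep tracing active across the whole suspend/resume cycle is to let trace-cmd record for a fixed window: when `trace-cmd record` is followed by a command, it records until that command exits. The sketch below only builds such a command line; the helper name `resume_trace_cmd`, the output file name and the 300 second window are illustrative assumptions, while the event list is the one quoted earlier in this thread.

```shell
#!/bin/sh
# Events copied from the trace-cmd command quoted in this thread.
TRACE_EVENTS="iwlwifi mac80211 cfg80211 mac80211_msg iwlwifi_dbg console"

# Print a trace-cmd command line that records for a fixed window.
# Usage: resume_trace_cmd OUTPUT_FILE SECONDS
resume_trace_cmd() {
    out=$1
    secs=$2
    cmd="trace-cmd record -T -o $out"
    for ev in $TRACE_EVENTS; do
        cmd="$cmd -e $ev"
    done
    # trace-cmd keeps recording until the trailing command (sleep) exits.
    echo "$cmd sleep $secs"
}

# Example: start this first, then suspend, resume and retry the connection
# while it is still running (file name must not contain spaces):
# sudo $(resume_trace_cmd trace.dat 300)
```

Starting the recording before suspending means the trace covers both the suspend path and the failed association after resume.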
Comment 63 Rahul 2024-11-06 15:13:45 UTC
(In reply to Emmanuel Grumbach from comment #62)
> Can you please start tracing before going to suspend?
> I can see that you started tracing upon resume.
> Thanks!

The file is above the attachment size limit, so here is a Drive link:

https://drive.google.com/file/d/1Z_wAd_xwdoWVd3EjS5RgfDI80ot8JfzY/view?usp=drive_link
Comment 64 Rahul 2024-11-06 15:27:05 UTC
It sometimes works after waking up, but then it stops working, which is the known behavior from the initial bug report.
Comment 65 Emmanuel Grumbach 2024-11-06 16:05:35 UTC
Thanks, I'll take a look. 

We need to focus on specific behaviors.

For now, I focus on the authentication frame not being sent. 

If there are cases where we associate successfully and fail later, it's a different issue.
Comment 66 Rahul 2024-11-06 16:07:48 UTC
(In reply to Emmanuel Grumbach from comment #65)
> Thanks, I'll take a look. 
> 
> We need to focus on specific behaviors.
> 
> For now, I focus on the authentication frame not being sent. 
> 
> If there are cases where we associate successfully and fail later, it's a
> different issue.

I think the issue happens on the 5 GHz channel.
Comment 67 Rahul 2024-11-06 16:08:04 UTC
(In reply to Emmanuel Grumbach from comment #65)
> Thanks, I'll take a look. 
> 
> We need to focus on specific behaviors.
> 
> For now, I focus on the authentication frame not being sent. 
> 
> If there are cases where we associate successfully and fail later, it's a
> different issue.

I think the issue happens on the 5 GHz channel.
Comment 68 Emmanuel Grumbach 2024-11-06 20:02:20 UTC
Thanks for the tracing.
I can see strange things there...
First, your AP seems to be switching channel all the time, and every time it changes channel, it prevents us from transmitting data.

This can cause lots of problems, but I do wonder how come it worked on 6.10...
I need to dig more in the data you provided.
Comment 69 Emmanuel Grumbach 2024-11-10 09:53:00 UTC
Created attachment 307193 [details]
more prints in queue tracing

Hi,

I am afraid I have to ask you for more logs :(
I can see something that I cannot explain, hence the need to get more data.

Can you please apply the patch attached and reproduce the problem while recording tracing?

You can compress the trace.dat file if you want.
Thanks
Comment 70 Rahul 2024-11-10 09:55:52 UTC
Does tracing need to be enabled during sleep to wake?
And only this patch, right?
Comment 71 Rahul 2024-11-10 09:56:26 UTC
And only this patch needs to be applied, right?
Comment 72 Emmanuel Grumbach 2024-11-10 10:01:47 UTC
(In reply to Rahul from comment #70)
> Does tracing need to be enabled during sleep to wake?

Yes please

> And only this patch, right?

Please keep the diff from comment#58.

I'll prepare a patch that includes both changes.
Comment 73 Emmanuel Grumbach 2024-11-10 10:04:19 UTC
Created attachment 307194 [details]
more prints in queue tracing
Comment 74 Rahul 2024-11-10 15:59:20 UTC
(In reply to Emmanuel Grumbach from comment #72)
> (In reply to Rahul from comment #70)
> > Does tracing need to be enabled during sleep to wake?
> 
> Yes please
> 
> > And only this patch, right?
> 
> Please keep the diff from comment#58.
> 
> I'll prepare a patch that includes both changes.

It's without sleep-to-wake because I tried it a few times, and it worked. However, this time, after I stopped debugging, the issue occurred, but I had already deleted the old trace by then.
Comment 75 Rahul 2024-11-10 16:05:07 UTC
Created attachment 307195 [details]
trace.dat without wake from sleep
Comment 76 Emmanuel Grumbach 2024-11-10 16:14:39 UTC
I can't see the start of the channel switch :(
Comment 77 Emmanuel Grumbach 2024-11-10 16:17:46 UTC
From what I see, the problem is not related to resume from sleep, but rather to the AP switching channel, and this happens only on the 5.2 GHz band
Comment 78 Rahul 2024-11-10 16:27:41 UTC
Created attachment 307196 [details]
how about this
Comment 79 Rahul 2024-11-10 16:30:42 UTC
(In reply to Emmanuel Grumbach from comment #77)
> From what I see, the problem is not related to resume from sleep, but rather
> to the AP switching channel, and this happens only on the 5.2 GHz band

But after hitting this issue, I also tried 2.4 GHz, and it was unable to connect there as well.
Comment 80 Emmanuel Grumbach 2024-11-10 17:19:41 UTC
(In reply to Rahul from comment #78)
> Created attachment 307196 [details]
> how about this

nope.

Channel switch happened before.

I suggest that you disable wifi. Start tracing, enable wifi and wait until the problem happens.
Comment 81 Rahul 2024-11-10 17:30:37 UTC
Created attachment 307198 [details]
check1

check+1
Comment 82 Emmanuel Grumbach 2024-11-10 17:39:49 UTC
You haven't disabled wifi.

Please do the following.

1) Disable wifi
2) unload iwlmvm iwlwifi mac80211 cfg80211
3) load iwlwifi # that will load all the rest.
4) while Wifi is still disabled, start tracing
5) enable wifi

Unfortunately, none of the latest logs contained useful information.
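
The five steps above can be sketched as a small script. This is a dry-run sketch under stated assumptions, not a definitive procedure: `rfkill block wifi` stands in for "disable wifi" (a desktop toggle or `nmcli radio wifi off` may be what was intended), `trace-cmd start` / `stop` / `extract` are used instead of the foreground `trace-cmd record` seen earlier in this bug so that tracing keeps running while wifi is enabled, and the helper names `run`, `start_capture` and `finish_capture` are made up for illustration. The event list is the one from the trace-cmd command in comment 61.

```shell
#!/bin/sh
# Dry-run by default: each command is printed, not executed.
# Set RUN=1 (as root) to actually run the commands.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Event list copied from the trace-cmd command used earlier in this bug.
EVENTS="-e iwlwifi -e mac80211 -e cfg80211 -e mac80211_msg -e iwlwifi_dbg -e console"

start_capture() {
    # 1) Disable wifi (ASSUMPTION: rfkill is an acceptable way to do it).
    run rfkill block wifi
    # 2) Unload the whole wifi stack.
    run modprobe -r iwlmvm iwlwifi mac80211 cfg80211
    # 3) Reload iwlwifi; the other modules follow.
    run modprobe iwlwifi
    # 4) Start tracing while wifi is still disabled.
    #    EVENTS is deliberately unquoted so it splits into separate words.
    run trace-cmd start -T $EVENTS
    # 5) Enable wifi again.
    run rfkill unblock wifi
}

# Call this once the problem has reproduced.
finish_capture() {
    run trace-cmd stop
    run trace-cmd extract -o trace.dat
}
```

Running `start_capture` with `RUN` unset only prints the commands, which makes it easy to review them first. After the failure shows up, `finish_capture` stops tracing and writes `trace.dat`, which can then be compressed before uploading.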
Comment 83 Rahul 2024-11-10 18:10:19 UTC
(In reply to Emmanuel Grumbach from comment #82)
> You haven't disabled wifi.
> 
> Please do the following.
> 
> 1) Disable wifi
> 2) unload iwlmvm iwlwifi mac80211 cfg80211
> 3) load iwlwifi # that will load all the rest.
> 4) while Wifi is still disabled, start tracing
> 5) enable wifi
> 
> All the last logs didn't contain valuable information unfortunately.

https://drive.google.com/file/d/1vmPkS3LD9uUFGtr_W5jHcp2RpqiZ-Kja/view?usp=drive_link
Comment 84 Rahul 2024-11-10 18:17:08 UTC
If you don’t find it here, it might be because it didn’t connect on 5 GHz in the previous case, so I changed the AP frequency to dynamic.
Comment 85 Emmanuel Grumbach 2024-11-10 18:28:34 UTC
Yes - that one is good.
Will analyze a bit later.
Comment 86 Emmanuel Grumbach 2024-11-10 18:52:05 UTC
Created attachment 307199 [details]
fix candidate

Can you please try the patch attached?

I believe it should fix the problem.
I still need to discuss this with someone though.
Comment 87 Rahul 2024-11-11 05:59:44 UTC
(In reply to Emmanuel Grumbach from comment #86)
> Created attachment 307199 [details]
> fix candidate
> 
> Can you please try the patch attached?
> 
> I believe it should fix the problem.
> I still need to discuss this with someone though.

Yep, it fixes the issue.
Comment 88 Emmanuel Grumbach 2024-11-11 06:03:01 UTC
great.
I'll keep this bug open until I provide a fix that passes code review.
Comment 89 Emmanuel Grumbach 2024-11-11 20:26:45 UTC
Created attachment 307205 [details]
fix candidate

This is (probably) the final version of the fix.
Can you please give it a try?
Comment 90 Rahul 2024-11-12 06:34:30 UTC
(In reply to Emmanuel Grumbach from comment #89)
> Created attachment 307205 [details]
> fix candidate
> 
> This is (probably) the final version of the fix.
> Can you please give it a try?

Now the Wi-Fi isn’t completely dead, but the connection is unstable. After some time, it disconnects and won’t reconnect until the AP is restarted. After restarting the AP, Wi-Fi works for a while, but then the issue repeats.
Comment 91 Rahul 2024-11-12 06:36:19 UTC
Sometimes, reconnecting also solves the issue.
Comment 92 Emmanuel Grumbach 2024-11-12 06:37:20 UTC
Can you try the previous version of the fix again to see if that one worked better?
I am very surprised because the fixes are equivalent, or at least they seem so.
Comment 93 Rahul 2024-11-12 06:39:49 UTC
(In reply to Emmanuel Grumbach from comment #92)
> Can you try the previous version of the fix again to see if that one worked
> better?
> I am very surprised because the fixes are equivalent, or at least they seem so.

One note: the previous test was on 6.11.6, and now I tested with 6.11.7. Does that make a difference?
Comment 94 Emmanuel Grumbach 2024-11-12 07:17:28 UTC
Wait...

Did you recompile and reinstall the backport driver?

Or did you change and recompile the whole kernel?
Comment 95 Rahul 2024-11-12 07:19:58 UTC
(In reply to Emmanuel Grumbach from comment #94)
> Wait...
> 
> Did you recompile and reinstall the backport driver?
> 
I don't know how to do this.

> Or did you change and recompile the whole kernel?

Yep, this way.
Comment 96 Emmanuel Grumbach 2024-11-12 08:57:41 UTC
So kernel version doesn't matter.

Can you please record tracing of that?
Comment 97 Rahul 2024-11-12 09:08:14 UTC
(In reply to Emmanuel Grumbach from comment #96)
> So kernel version doesn't matter.
> 
> Can you please record tracing of that?

Wait, let me test more.
Comment 98 Rahul 2024-11-12 14:16:45 UTC
(In reply to Emmanuel Grumbach from comment #96)
> So kernel version doesn't matter.
> 
> Can you please record tracing of that?

It happened only once, immediately after the first boot of this kernel, and now it’s not happening anymore. It looks good.
Comment 99 Emmanuel Grumbach 2024-11-12 14:33:49 UTC
Thanks

I'll close this bug.
The fix is in final stages of code review and it'll make its way upstream.

I'd like to thank you for your report and cooperation.
I had to make you work hard to provide the data I needed, but thanks to your efforts, the wifi stack in Linux is getting better.

Thank you!
Comment 100 Emmanuel Grumbach 2024-11-12 14:34:11 UTC
Closing. I'll still see any new comment added to this ticket.
Comment 101 Rahul 2024-11-12 14:43:27 UTC
(In reply to Emmanuel Grumbach from comment #99)
> Thanks
> 
> I'll close this bug.
> The fix is in final stages of code review and it'll make its way upstream.
> 
> I'd like to thank you for your report and cooperation.
> I had to make you work hard to provide the data I needed, but thanks to your
> efforts, the wifi stack in Linux is getting better.
> 
> Thank you!

Linux FTW
Comment 102 Rahul 2024-11-19 10:55:53 UTC
(In reply to Emmanuel Grumbach from comment #100)
> Closing. I'll still see any new comment added to this ticket.

Has it been merged upstream, and has it been backported to the 6.11 series?
Comment 103 Emmanuel Grumbach 2024-11-19 11:16:27 UTC
Not yet.
Our maintainer is going to start sending patches soon :)
