Bug 219159 - iwlwifi firmware crash with Intel AX210 on AP changing to 6 GHz channel
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless-intel
Hardware: Intel Linux
Importance: P3 normal
Assignee: Default virtual assignee for network-wireless-intel
Reported: 2024-08-14 18:14 UTC by Miren Radia
Modified: 2025-04-10 16:29 UTC
Regression: No

Attachments
Output of sudo dmesg -T | grep -i -E 'a6:00.0|wlp|iwl|80211' (10.30 KB, application/pgp-encrypted)
2024-08-14 18:14 UTC, Miren Radia
iwlwifi fw crash dump (993.50 KB, application/pgp-encrypted)
2024-08-14 18:17 UTC, Miren Radia
iw reg get output (6.10 KB, text/plain)
2024-08-15 08:33 UTC, Miren Radia
Requested tcpdump during firmware crash (7.83 KB, application/pgp-encrypted)
2024-08-15 09:57 UTC, Miren Radia
Output of iw phy0 channels (7.88 KB, text/plain)
2024-08-20 11:15 UTC, Miren Radia
attachment-22265-0.html (2.01 KB, text/html)
2025-04-10 16:29 UTC, Emmanuel Grumbach

Description Miren Radia 2024-08-14 18:14:58 UTC
Created attachment 306724 [details]
Output of sudo dmesg -T | grep -i -E 'a6:00.0|wlp|iwl|80211'

I am regularly experiencing crashes with the iwlwifi driver when connected to a Wi-Fi network at my workplace. Looking at the kernel and system logs, the crash always seems to happen right after the following line:

AP xx:xx:xx:xx:xx:xx changed bandwidth, new used config is 6375.000 MHz, width 3 (6345.000/0 MHz)

SYSTEM INFORMATION

Device: Framework Laptop 13 (Intel 12th Gen)
Wi-Fi card: Intel AX210
Distribution: Fedora 40 KDE
Kernel: 6.10.3-200.fc40.x86_64 (I experienced the problem with 6.9.13 too)
Firmware Version: 89.202a2f7b.0 ty-a0-gf-a0-89.ucode

WI-FI NETWORK INFORMATION

AP Make: HPE Aruba Networking
AP Model: AP-635 (Wi-Fi 6E)
Authentication: WPA2-PEAP with MS-CHAPv2

STEPS TO REPRODUCE

1. Connect to Wi-Fi network described above
2. Wait a couple of minutes

RESULT

A crash will occur at some point, sometimes on the initial connection and sometimes after around 30 minutes. The Wi-Fi network will disconnect.

EXPECTED RESULT

The firmware doesn't crash and my laptop remains connected to the Wi-Fi network.

THINGS I HAVE TRIED

Set options iwlmvm power_scheme=1 (doesn't help) 
Set options iwlwifi amsdu_size=3 (doesn't help)
Set options iwlwifi disable_11be=1 (doesn't help)
Set options iwlwifi disable_11ax=1 (resolves problem)

Of course, I don't want to disable Wi-Fi 6/6E with the last option if I can avoid it, hence this bug report.

ATTACHMENTS

I have attached the output of dmesg encrypted using the GPG keys listed in https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging. This output contains numerous crashes.
Comment 1 Miren Radia 2024-08-14 18:17:18 UTC
Created attachment 306725 [details]
iwlwifi fw crash dump

Here is a firmware dump from one of the crashes which I got by following the guide in https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging. Again it is encrypted using the GPG keys therein.
Comment 2 Emmanuel Grumbach 2024-08-15 07:06:58 UTC
[Wed Aug 14 18:32:53 2024] wlp166s0: bad HE/EHT 6 GHz operation
[Wed Aug 14 18:32:53 2024] wlp166s0: AP appears to change mode (expected HE, found legacy), disconnect


Can you try to change the channel on which your AP operates in the 6 GHz band?
Apparently it is picking an illegal channel.
Comment 3 Miren Radia 2024-08-15 08:30:19 UTC
Unfortunately I don't manage this wireless network so can't change the channel. However, I can speak to the wireless infrastructure team and ask them about it.

What makes a channel illegal and how do we know this one is?
Comment 4 Miren Radia 2024-08-15 08:33:25 UTC
Created attachment 306730 [details]
iw reg get output

In case this is helpful, here is the output of

iw reg get
Comment 5 Emmanuel Grumbach 2024-08-15 08:59:31 UTC
To determine what goes wrong we need to see the beacon and the assoc response of the AP.
You can record this by using a monitor interface while attempting to connect.

sudo iw wlan0 interface add mon0 type monitor
sudo ifconfig mon0 up
sudo tcpdump -i mon0 -w mon0.pcap
Comment 6 Miren Radia 2024-08-15 09:43:27 UTC
Does this monitoring need to happen during a firmware crash (i.e. do I need to start it and wait for a crash)? Sometimes the connection will be stable for 30 mins or so before it crashes.
Comment 7 Miren Radia 2024-08-15 09:57:18 UTC
Created attachment 306737 [details]
Requested tcpdump during firmware crash

OK, I think I was able to get the tcpdump during a firmware crash. Please find it attached (GPG encrypted as before). Looking at the output of journalctl (not sure how to get dmesg to print the actual time), I can see a crash at 10:41:52, which is within the time range covered by the tcpdump:

Aug 15 10:41:52 framira kernel: wlp166s0: AP xx:xx:xx:xx:xx:xx changed bandwidth, new used config is 6055.000 MHz, width 3 (6025.000/0 MHz)
Aug 15 10:41:52 framira kernel: iwlwifi 0000:a6:00.0: Microcode SW error detected. Restarting 0x0.
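
The frequencies in these log lines can be mapped back to 6 GHz channel numbers to see what configuration the driver ended up with. The following is a minimal Python sketch, not taken from the kernel. It assumes the standard 6 GHz numbering (center frequency = 5950 + 5 x channel, in MHz), that the printed "width 3" is the nl80211 enum value for 80 MHz, and that valid center channels follow the usual 20/40/80/160 MHz grid. All function names are hypothetical.

```python
import re

# Assumption: nl80211 channel-width enum values as printed by mac80211
# (1 = 20 MHz, 2 = 40 MHz, 3 = 80 MHz, 5 = 160 MHz).
NL80211_WIDTH_MHZ = {1: 20, 2: 40, 3: 80, 5: 160}

LOG_RE = re.compile(
    r"new used config is (?P<ctrl>\d+)\.\d+ MHz, "
    r"width (?P<width>\d+) \((?P<cf1>\d+)\.\d+/(?P<cf2>\d+) MHz\)"
)


def freq_to_6ghz_channel(freq_mhz):
    """Convert a 6 GHz center frequency in MHz to its channel number."""
    return (freq_mhz - 5950) // 5


def is_valid_center(channel, width_mhz):
    """Check a channel number against the 6 GHz center grid for a width."""
    n = width_mhz // 20    # number of 20 MHz sub-channels
    first = 2 * n - 1      # 1, 3, 7, 15 for 20/40/80/160 MHz
    return channel >= first and (channel - first) % (4 * n) == 0


def describe(line):
    """Summarise a 'changed bandwidth' log line, or return None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    width = NL80211_WIDTH_MHZ[int(m["width"])]
    center = freq_to_6ghz_channel(int(m["cf1"]))
    return {
        "primary": freq_to_6ghz_channel(int(m["ctrl"])),
        "center": center,
        "width_mhz": width,
        "center_valid_for_width": is_valid_center(center, width),
        "center_valid_for_160": is_valid_center(center, 160),
    }


line = ("wlp166s0: AP xx:xx:xx:xx:xx:xx changed bandwidth, new used config "
        "is 6055.000 MHz, width 3 (6025.000/0 MHz)")
print(describe(line))
```

Under these assumptions, the line above describes an 80 MHz configuration with primary channel 21 centered on channel 15, and channel 15 sits on the 160 MHz center grid rather than the 80 MHz one.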
Comment 8 Emmanuel Grumbach 2024-08-15 10:54:56 UTC
I can see the association shows that your AP advertises this:

6 GHz Operation Information
    Primary Channel: 53
    Control: 0x03
        .... ..11 = Channel Width: 160MHz or 80MHz+80MHz (3)
        .... .0.. = Duplicate Beacon: False
        ..00 0... = Regulatory Info: 0
        00.. .... = Reserved: 0x0
    Channel Center Frequency Segment 0: 47
    Channel Center Frequency Segment 1: 0
    Minimum Rate: 1


This seems to be an invalid 80 MHz channel: since Segment 1 is 0, we treat this as 80 MHz. I still need to check with someone who knows this better than I do.
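
For readers following the dissection above, here is a small Python sketch of how the Control byte splits into its fields, using the bit positions shown in the dump. The channel-to-frequency formula (5950 + 5 x channel MHz) is the standard 6 GHz numbering and is an assumption added here; the function names are hypothetical.

```python
def decode_6ghz_oper_control(control):
    """Split the Control byte of the 6 GHz Operation Information field."""
    return {
        "channel_width": control & 0x03,             # bits 0-1
        "duplicate_beacon": bool(control & 0x04),    # bit 2
        "regulatory_info": (control >> 3) & 0x07,    # bits 3-5
        "reserved": (control >> 6) & 0x03,           # bits 6-7
    }


def channel_to_freq_mhz(channel):
    """6 GHz channel number to center frequency in MHz (assumed numbering)."""
    return 5950 + 5 * channel


fields = decode_6ghz_oper_control(0x03)
print(fields)
print("primary:", channel_to_freq_mhz(53), "MHz")
print("segment 0:", channel_to_freq_mhz(47), "MHz")
```

With the values from the capture, channel width decodes to 3 (160 MHz or 80+80 MHz), primary channel 53 maps to 6215 MHz and segment 0 (channel 47) to 6185 MHz.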
Comment 9 Miren Radia 2024-08-15 12:54:13 UTC
Even if the channel is invalid, surely in that case the wireless driver should ignore it and definitely not crash due to it?
Comment 10 Miren Radia 2024-08-16 13:46:06 UTC
I switched from wpa_supplicant to iwd and am now getting lots of lines like

Aug 16 14:41:51 framira iwd[100110]: invalid HE capabilities for xx:xx:xx:xx:xx:xx

I guess this supports your suggestion that the AP is broadcasting on an invalid channel.

I think my previous point still stands that I don't think it should crash the whole driver.
Comment 11 Emmanuel Grumbach 2024-08-18 06:10:07 UTC
We agree that we should be more robust and not crash when facing those buggy APs. This is why this bug is still open. We'll try to see how we can cope better with those APs.
Comment 12 Miren Radia 2024-08-20 11:15:50 UTC
Created attachment 306761 [details]
Output of iw phy0 channels

I've just noticed that from the output of

iw phy0 channels

it seems as if only 20 MHz widths are supported for 6 GHz. Is this behaviour correct?
Comment 13 Emmanuel Grumbach 2024-09-01 17:04:08 UTC
Can you give this a try?

diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index aed72794d9fe..ef1748771157 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -3129,9 +3129,9 @@ bool ieee80211_chandef_he_6ghz_oper(struct ieee80211_local *local,
                        he_chandef.width = NL80211_CHAN_WIDTH_80;
                        break;
                case IEEE80211_HE_6GHZ_OPER_CTRL_CHANWIDTH_160MHZ:
-                       he_chandef.width = NL80211_CHAN_WIDTH_80;
                        if (!he_6ghz_oper->ccfs1)
-                               break;
+                               return false;
+                       he_chandef.width = NL80211_CHAN_WIDTH_80;
                        if (abs(he_6ghz_oper->ccfs1 - he_6ghz_oper->ccfs0) == 8)
                                he_chandef.width = NL80211_CHAN_WIDTH_160;
                        else
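
The behavioural change in this diff can be illustrated outside the kernel. The Python sketch below mirrors only the 160 MHz case shown above and is not the kernel code itself. The diff is cut off after the final "else", so the 80+80 MHz fallback is an assumption, and the Python names are hypothetical.

```python
from enum import Enum


class Width(Enum):
    W80 = "80 MHz"
    W160 = "160 MHz"
    W80P80 = "80+80 MHz"


def width_for_160mhz_control(ccfs0, ccfs1, patched=True):
    """Mirror the 160 MHz case of ieee80211_chandef_he_6ghz_oper().

    Returns a Width, or None where the patched C code returns false.
    """
    if ccfs1 == 0:
        # Before the patch: fall back to 80 MHz centered on ccfs0.
        # After the patch: reject the element instead.
        return None if patched else Width.W80
    if abs(ccfs1 - ccfs0) == 8:
        return Width.W160
    # Assumption: the truncated "else" branch selects 80+80 MHz.
    return Width.W80P80


# Values advertised by the AP in comment 8: ccfs0 = 47, ccfs1 = 0.
print(width_for_160mhz_control(47, 0, patched=False))
print(width_for_160mhz_control(47, 0, patched=True))
```

With the AP's values from comment 8, the unpatched path yields an 80 MHz channel centered on channel 47, while the patched path rejects the element.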
Comment 14 Emmanuel Grumbach 2024-09-02 07:56:03 UTC
There is a patch in 6.11 that does more validation on the channel, but I don't think it'll be enough to fix the problem you're facing.
Comment 15 Emmanuel Grumbach 2024-09-02 13:15:14 UTC
(In reply to Emmanuel Grumbach from comment #14)
> There is a patch in 6.11 that does more validation on the channel, but I
> don't think it'll be enough to fix the problem you're facing.

Strike that.

It should help.
Please try 6.11 or, if you can, apply [1].

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=91b193d546683558a8799ffb2e2f935d3800633e
Comment 16 Emmanuel Grumbach 2024-09-08 14:03:39 UTC
Did you have a chance to test?
Comment 17 Miren Radia 2024-09-08 14:10:00 UTC
Unfortunately not yet. I have been away from the office with these APs. I will try to find time to test this week.
Comment 18 Miren Radia 2024-09-12 21:51:15 UTC
(In reply to Emmanuel Grumbach from comment #15)
> (In reply to Emmanuel Grumbach from comment #14)
> > There is a patch in 6.11 that does more validation on the channel, but I
> > don't think it'll be enough to fix the problem you're facing.
> 
> Strike that.
> 
> It should help.
> Please try 6.11 or if you can, you can apply [1].
> 
> [1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=91b193d546683558a8799ffb2e2f935d3800633e

I applied this patch and rebuilt 6.10.8 and I no longer observe firmware crashes. However, I still occasionally get disconnected from the AP when it tries to shift me onto the invalid 6 GHz channel. I think that's a separate issue though and I think this issue has been resolved.
Comment 19 Emmanuel Grumbach 2024-09-13 05:15:57 UTC
Thanks.
Comment 20 Avamander 2025-04-09 16:39:39 UTC
I think I've stumbled upon the exact same thing, but now with the guards forbidding valid operation.

> wlp4s0: bad HE/EHT 6 GHz operation
> wlp4s0: AP appears to change mode (expected HE, found legacy), disconnect

The AP is configured to operate on channel 69, with CCFS0 being 71 and CCFS1 being 79.

Which seems completely valid? It's also accepted by other client devices.

It would be very useful if logging were improved so that the aspect that doesn't satisfy Linux would be logged.

This would also help determine which component here is buggy.
Comment 21 Avamander 2025-04-09 16:44:03 UTC
Oh, and I'm currently running 6.11.0-21-generic (#21-Ubuntu) with linux-firmware 20240913 (gita34e7a5f-0ubuntu2.6).
Comment 22 Emmanuel Grumbach 2025-04-09 17:43:40 UTC
That is absolutely not the same problem.
Your AP seems to be sending broken information.
Comment 23 Avamander 2025-04-09 22:36:27 UTC
Okay, that's good to know. I just couldn't tell if these fixes were just to avoid a firmware bug and broke actually valid setups, or if they were added to achieve actually (more) standards-compliant operation. (I mean, 6GHz is still broken on Windows, so is LAR...)

Though again, the error messages printed as a result of these patches are not sufficient for determining which part is "broken information". So far I've got at least two vendors that seem to find it acceptable.

In any case, if you could refer to any section of any relevant standard that specifies how it's wrong it would be very appreciated. If I had that information I could try and go pass it on to HP.

As it stands right now, two major vendors seem incompatible with each other.
Comment 24 Emmanuel Grumbach 2025-04-10 04:21:25 UTC
You can refer to https://en.wikipedia.org/wiki/List_of_WLAN_channels to see what's valid and what is not.
Comment 25 Avamander 2025-04-10 09:11:24 UTC
I would need more than a Wikipedia article to say that though. I would guess the first response would anyways be that it's "central frequency" not "valid channel for primary operation". But I'll try.
Comment 26 Avamander 2025-04-10 14:50:58 UTC
Okay, so 802.11ax-2021 page 453 (table 26-15) says that for a 6GHz AP with 160MHz BSS channel width, CCFS1 must be greater than zero and the absolute difference between CCFS1 and CCFS0 must be exactly eight.

So if CCFS0 is 71 (also a valid channel) then 79 is the required value for CCFS1.

What's actually incorrect here?
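
The arithmetic in this comment can be checked with a short Python sketch that also reports which rule fails, in the spirit of the logging request in comment 20. It covers only the two conditions quoted above, plus an extra containment check for the primary channel that is an assumption added here (20 MHz channels spaced four channel numbers apart). It does not reproduce everything cfg80211 validates, and the function name is hypothetical.

```python
def check_160mhz_6ghz(primary, ccfs0, ccfs1):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Rules as quoted from table 26-15 for a 160 MHz BSS.
    if ccfs1 <= 0:
        problems.append("CCFS1 must be greater than zero")
    elif abs(ccfs1 - ccfs0) != 8:
        problems.append("|CCFS1 - CCFS0| must be exactly 8")
    # Assumption: the primary 20 MHz channel must lie inside the 80 MHz
    # segment at CCFS0 (center +/- 6) and the 160 MHz block at CCFS1 (+/- 14).
    if abs(primary - ccfs0) > 6:
        problems.append("primary channel outside the 80 MHz segment at CCFS0")
    if ccfs1 > 0 and abs(primary - ccfs1) > 14:
        problems.append("primary channel outside the 160 MHz block at CCFS1")
    return problems


print(check_160mhz_6ghz(primary=69, ccfs0=71, ccfs1=79))   # values from comment 20
print(check_160mhz_6ghz(primary=53, ccfs0=47, ccfs1=0))    # values from comment 8
```

Under these checks the values from comment 20 produce no complaints, while the values from comment 8 fail only the CCFS1 rule.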
Comment 27 Emmanuel Grumbach 2025-04-10 16:29:56 UTC
Created attachment 307947 [details]
attachment-22265-0.html

Sorry then, I was confused. I suggest you report the problem on the wireless mailing list.

But clearly cfg80211 is complaining about a problem in the AP.