Bug 217782 - Random loss of wireless access with RTL8822CE
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: All Linux
Importance: P3 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
 
Reported: 2023-08-10 14:17 UTC by Marco
Modified: 2023-12-19 08:39 UTC
Regression: No


Attachments
Kernel log from the device itself (149.56 KB, text/plain)
2023-08-10 14:17 UTC, Marco
rtw88 coex_info dump when the adapter work (3.22 KB, text/plain)
2023-08-10 14:21 UTC, Marco
rtw88 coex_info dump when the adapter is dead (3.22 KB, text/plain)
2023-08-10 14:22 UTC, Marco
rtw88 phy_info dump when the adapter work (1006 bytes, text/plain)
2023-08-10 14:22 UTC, Marco
rtw88 phy_info dump when the adapter is dead (1009 bytes, text/plain)
2023-08-10 14:22 UTC, Marco
add debug log for issue (4.39 KB, application/mbox)
2023-08-25 05:56 UTC, timlee
Kernel dmesg (30.78 KB, text/plain)
2023-12-14 13:51 UTC, Marco

Description Marco 2023-08-10 14:17:45 UTC
Created attachment 304808
Kernel log from the device itself

I encountered an extremely rare but annoying loss of wireless connectivity on a Steam Deck, which uses an RTL8822CE. While Wi-Fi and Bluetooth peripherals are in use, the kernel log randomly starts filling with messages like:
[ 8234.212938] rtw_8822ce 0000:03:00.0: coex request time out
[ 8234.216128] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8234.226284] rtw_8822ce 0000:03:00.0: coex request time out
[ 8234.229458] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8234.239607] rtw_8822ce 0000:03:00.0: coex request time out
[ 8234.242779] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8234.256280] rtw_8822ce 0000:03:00.0: coex request time out
[ 8234.259470] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8234.272954] rtw_8822ce 0000:03:00.0: coex request time out
[ 8235.725876] rtw_8822ce 0000:03:00.0: firmware failed to leave lps state
[ 8235.729078] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8235.732206] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8235.735346] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8237.759933] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8237.861966] rtw_8822ce 0000:03:00.0: firmware failed to leave lps state
[ 8237.865074] rtw_8822ce 0000:03:00.0: failed to send h2c command

It keeps repeating the lps and h2c messages over and over until I fully reconnect to the network. The dmesg output and the debug register dumps from rtw88 are attached.

Bluetooth peripherals keep working as if nothing happened, while wireless is completely dead. As soon as the device is powered off and on, the adapter starts working fine again.
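
For anyone triaging a similar log, a minimal Python sketch like the one below can tally the three driver messages shown above. The message strings are copied from this report; the function name and the sample text are only illustrative and are not part of rtw88 or any existing tool.

```python
import re
from collections import Counter

# Message fragments copied from the dmesg excerpt in this report.
SIGNATURES = (
    "coex request time out",
    "failed to send h2c command",
    "firmware failed to leave lps state",
)

# Matches "[ 8234.212938] rtw_8822ce 0000:03:00.0: <message>".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+rtw_8822ce\s+\S+:\s+(.*)$")


def tally_rtw88_errors(dmesg_text):
    """Return (Counter of known messages, timestamp of the first hit or None)."""
    counts = Counter()
    first_seen = None
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if message in SIGNATURES:
            counts[message] += 1
            if first_seen is None:
                first_seen = timestamp
    return counts, first_seen


# Hypothetical sample: four lines taken from the excerpt above.
SAMPLE = """\
[ 8234.212938] rtw_8822ce 0000:03:00.0: coex request time out
[ 8234.216128] rtw_8822ce 0000:03:00.0: failed to send h2c command
[ 8235.725876] rtw_8822ce 0000:03:00.0: firmware failed to leave lps state
[ 8235.729078] rtw_8822ce 0000:03:00.0: failed to send h2c command
"""

if __name__ == "__main__":
    counts, first_seen = tally_rtw88_errors(SAMPLE)
    print(first_seen, dict(counts))
```

Feeding it the full attached kernel log instead of the sample would show when the first failure happened and how the three messages are distributed.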

Any further debugging steps?
Comment 1 Marco 2023-08-10 14:19:47 UTC
Correction: I mean the wireless adapter, not the whole computer. After power cycling the wireless adapter, it starts working fine again.
Comment 2 Marco 2023-08-10 14:21:44 UTC
Created attachment 304809
rtw88 coex_info dump when the adapter work
Comment 3 Marco 2023-08-10 14:22:06 UTC
Created attachment 304810
rtw88 coex_info dump when the adapter is dead
Comment 4 Marco 2023-08-10 14:22:29 UTC
Created attachment 304811
rtw88 phy_info dump when the adapter work
Comment 5 Marco 2023-08-10 14:22:48 UTC
Created attachment 304812
rtw88 phy_info dump when the adapter is dead
Comment 6 Ping-Ke Shih 2023-08-11 02:18:18 UTC
Just to let you know, I got your report and my colleague is working on this issue. He will get back to you if there is any update.
Comment 7 timlee 2023-08-25 05:55:26 UTC
Hi Marco,

We have tried to reproduce the issue but failed.

Could you kindly apply the attached patch and provide the log if the issue happens again?

Thanks.
Comment 8 timlee 2023-08-25 05:56:10 UTC
Created attachment 304935
add debug log for issue
Comment 9 Marco 2023-08-25 07:58:32 UTC
(In reply to timlee from comment #7)

Hi,

I recently switched to another distro (https://github.com/ublue-os/bazzite) because of other random issues; it uses the latest mainline stable kernel.

Here the wireless issues are even worse than on SteamOS: just yesterday I had literally 5/10 random wireless lockups (meaning everything is still connected, but no wireless traffic going in or out of the adapter is detected). In dmesg, however, I really can't see anything meaningful compared to 6.1 LTS. The best log I got today (no drops yet, unfortunately) is:

ago 25 08:40:52 *** kernel: wlo1: authenticate with ***
ago 25 08:40:52 *** kernel: wlo1: 80 MHz not supported, disabling VHT
ago 25 08:40:52 *** kernel: rtw_8822ce 0000:03:00.0: failed to do dpk calibration
ago 25 08:40:52 *** kernel: wlo1: send auth to *** (try 1/3)
ago 25 08:40:52 *** kernel: wlo1: authenticated
ago 25 08:40:52 *** kernel: wlo1: associate with *** (try 1/3)
ago 25 08:40:52 *** kernel: wlo1: RX AssocResp from *** (capab=0x1411 status=0 aid=2)
ago 25 08:40:52 *** kernel: wlo1: associated
ago 25 08:40:53 *** kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlo1: link becomes ready
ago 25 08:40:55 *** kernel: warning: `ThreadPoolForeg' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211
ago 25 08:40:56 *** kernel: input: Steam Deck as /devices/pci0000:00/0000:00:08.1/0000:04:00.4/usb3/3-3/3-3:1.2/0003:28DE:1205.0003/input/input22
ago 25 08:40:58 *** kernel: input: Microsoft X-Box 360 pad 0 as /devices/virtual/input/input23
ago 25 08:41:00 *** kernel: cs35l41 spi-VLV1776:00: DSP1: Legacy support not available
ago 25 08:41:00 *** kernel: cs35l41 spi-VLV1776:01: DSP1: Legacy support not available
ago 25 08:41:08 *** kernel: rtw_8822ce 0000:03:00.0: timed out to flush queue 1
ago 25 08:41:08 *** kernel: wlo1: disconnect from AP *** for new auth to ***
ago 25 08:41:08 *** kernel: rtw_8822ce 0000:03:00.0: timed out to flush queue 1
ago 25 08:41:08 *** kernel: rtw_8822ce 0000:03:00.0: timed out to flush queue 1
ago 25 08:41:08 *** kernel: wlo1: authenticate with ***
ago 25 08:41:09 *** kernel: wlo1: send auth to *** (try 1/3)
ago 25 08:41:09 *** kernel: wlo1: authenticated
ago 25 08:41:09 *** kernel: wlo1: associate with *** (try 1/3)
ago 25 08:41:09 *** kernel: wlo1: RX ReassocResp from *** (capab=0x1011 status=0 aid=1)
ago 25 08:41:09 *** kernel: wlo1: associated

Fedora immutable builds (or the derivative I'm using now) seem to have trouble keeping logs older than the last update, so I don't have older details (or yesterday's issues, sadly). As soon as I'm able to reproduce them, I'll check dmesg before rebooting to see if I can capture additional data.

Regarding the patch, I'll try applying it as soon as I figure out how to do this on Fedora immutable.

In my opinion, it would be ideal to have this debug code in the kernel by default, hidden behind a driver parameter, so people don't need to recompile the kernel just to get additional information about these issues.

I would love to debug this issue further, but there seems to be no public documentation anywhere on this adapter (or the whole rtl88/89 family), so my debugging capabilities are quite limited.

Regards,

Marco.
Comment 10 timlee 2023-08-31 02:39:56 UTC
Hi Marco,

Thanks for your reply.
We have also seen a similar disconnect issue without a meaningful log, like you did, but it is even rarer and harder to debug than on SteamOS because the connection can return to normal later.

Could you kindly help to reproduce the disconnect issue on SteamOS? It could provide more log output when the wireless issue happens.

Thanks for the support!
Comment 11 timlee 2023-08-31 03:16:13 UTC
Supplementary information for comment 10: we also saw a similar issue on another platform/distro, not Bazzite.
Comment 12 Marco 2023-09-07 16:47:12 UTC
The last two weeks have been spotless, no issues so far.

The annoying thing is the randomness of it. One day it might be unusable, and then for a month it works fine with no issues whatsoever.

I'll keep tracking it and let you know when it breaks again.

It's definitely better on the 6.4 series kernel compared to 6.1 LTS, so at least the driver seems to be on the right track (assuming it's the driver and not the firmware of the chip itself).

Marco.
Comment 13 Marco 2023-09-16 09:08:42 UTC
(In reply to timlee from comment #10)

Sadly I cannot reinstall SteamOS, since it forces itself onto the integrated flash storage, where I have installed Bazzite.

I've tried ChimeraOS but it's not using the stock Valve kernel, so it's a no go for debugging.

HoloISO can't boot on my hardware, so that option is also a failure for testing.

I don't know how I can help reproduce this, but the Steam forums are full of people with random lockup issues while using 5 GHz wireless networks (one example: https://steamcommunity.com/app/1675200/discussions/1/3186864498794889745/). Maybe contacting Valve again would be the right way to track this down, but at this point I doubt it will ever be fixed (I'm not sure whom to blame, and frankly I'm just interested in a working wireless adapter at this point).

On my end, since the driver is basically a black box with no documentation on what the hardware is doing, digging deeper is practically impossible. The firmware itself, AFAICS, is basically barren of any useful debugging strings (the newer rtw89 firmware has quite a few more debugging messages, judging from a casual run of strings on one of the provided binaries). Even with them, though, they would still be pointless, since the driver does not print anything about where the firmware gets stuck.

Compared to 5 years ago, Realtek wireless has improved quite a bit (god, I still remember the pain of making your WiFi N USB adapter work even on Windows; it never worked properly), but even today it's still not quite there. I seriously hope that your Windows drivers are better than the Linux ones, though.

On a side note, besides Bluetooth issues (I'm not sure if that's your fault, though), things seem to be a little better on 6.4. Having more documentation on how your hardware works, so that people outside your company can fix issues (I'm not sure how many devs work on the Linux drivers, but I doubt it's a lot), would benefit not only the affected users but also the sales of your hardware.

Sorry for the rant, but I'm a little bit tired of all this random stuff.

Marco.
Comment 14 Marco 2023-11-19 10:14:32 UTC
And today it decided to break on me again, on the latest kernel 6.5.11. It bounces back and forth between my two different APs for no apparent reason (I haven't moved at all, so there should be no change in signal power), without staying connected to either of them long enough to restore a proper connection, as shown in the kernel dmesg below:

[10887.280537] wlo1: authenticate with b0:be:76:xx:xx:xx
[10887.563598] wlo1: send auth to b0:be:76:xx:xx:xx (try 1/3)
[10887.625235] wlo1: authenticate with b0:be:76:xx:xx:xx
[10887.625289] wlo1: send auth to b0:be:76:xx:xx:xx (try 1/3)
[10887.668587] wlo1: authenticated
[10887.669510] wlo1: associate with b0:be:76:xx:xx:xx (try 1/3)
[10887.679997] wlo1: RX AssocResp from b0:be:76:xx:xx:xx (capab=0x111 status=0 aid=1)
[10887.680426] wlo1: associated
[10887.716743] wlo1: Limiting TX power to 0 (0 - 3) dBm as advertised by b0:be:76:xx:xx:xx
[12103.034593] wlo1: disconnect from AP b0:be:76:xx:xx:xx for new auth to a6:91:b1:xx:xx:xx
[12103.057337] wlo1: authenticate with a6:91:b1:xx:xx:xx
[12103.460956] wlo1: send auth to a6:91:b1:xx:xx:xx (try 1/3)
[12103.469418] wlo1: authenticated
[12103.470331] wlo1: associate with a6:91:b1:xx:xx:xx (try 1/3)
[12103.472644] wlo1: RX ReassocResp from a6:91:b1:xx:xx:xx (capab=0x1011 status=0 aid=3)
[12103.472989] wlo1: associated
[12103.483758] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:xx
[12171.003588] wlo1: disconnect from AP a6:91:b1:xx:xx:xx for new auth to b0:be:76:xx:xx:xx
[12171.036051] wlo1: authenticate with b0:be:76:xx:xx:xx
[12171.350309] rtw_8822ce 0000:03:00.0: failed to do dpk calibration
[12171.437730] wlo1: send auth to b0:be:76:xx:xx:xx (try 1/3)
[12171.441143] wlo1: authenticated
[12171.443135] wlo1: associate with b0:be:76:xx:xx:xx (try 1/3)
[12171.450928] wlo1: RX ReassocResp from b0:be:76:xx:xx:xx (capab=0x111 status=30 aid=1)
[12171.450963] wlo1: b0:be:76:xx:xx:xx rejected association temporarily; comeback duration 1000 TU (1024 ms)
[12172.510177] wlo1: associate with b0:be:76:xx:xx:xx (try 2/3)
[12172.534628] wlo1: RX ReassocResp from b0:be:76:xx:xx:xx (capab=0x111 status=0 aid=1)
[12172.534969] wlo1: associated
[12172.551571] wlo1: Limiting TX power to 0 (0 - 3) dBm as advertised by b0:be:76:xx:xx:xx
[12321.061502] wlo1: Connection to AP b0:be:76:xx:xx:xx lost
[12321.647787] wlo1: authenticate with a6:91:b1:xx:xx:xx
[12322.062376] wlo1: send auth to a6:91:b1:xx:xx:xx (try 1/3)
[12322.065796] wlo1: authenticated
[12322.066827] wlo1: associate with a6:91:b1:xx:xx:xx (try 1/3)
[12322.068998] wlo1: RX AssocResp from a6:91:b1:xx:xx:xx (capab=0x1011 status=0 aid=3)
[12322.069338] wlo1: associated
[12322.113560] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:xx
[12490.756070] wlo1: disconnect from AP a6:91:b1:xx:xx:xx for new auth to b0:be:76:xx:xx:xx
[12490.788639] wlo1: authenticate with b0:be:76:xx:xx:xx
[12491.188333] wlo1: send auth to b0:be:76:xx:xx:xx (try 1/3)
[12491.205784] wlo1: authenticated
[12491.206728] wlo1: associate with b0:be:76:xx:xx:xx (try 1/3)
[12491.214779] wlo1: RX ReassocResp from b0:be:76:xx:xx:xx (capab=0x131 status=53 aid=0)
[12491.214794] wlo1: b0:be:76:xx:xx:xx denied association (code=53)
[12491.222526] wlo1: authenticate with b0:be:76:xx:xx:xx
[12491.511825] wlo1: send auth to b0:be:76:xx:xx:xx (try 1/3)
[12491.573742] wlo1: authenticate with b0:be:76:xx:xx:xx
[12491.573800] wlo1: send auth to b0:be:76:xx:xx:xx (try 1/3)
[12491.617381] wlo1: authenticated
[12491.618754] wlo1: associate with b0:be:76:xx:xx:xx (try 1/3)
[12491.627558] wlo1: RX AssocResp from b0:be:76:xx:xx:xx (capab=0x111 status=0 aid=1)
[12491.627945] wlo1: associated
[12491.714259] wlo1: Limiting TX power to 23 (26 - 3) dBm as advertised by b0:be:76:xx:xx:xx
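
The bouncing in the log above can be made easier to see with a small script. This is only a sketch: it assumes the "disconnect from AP ... for new auth to ..." wording stays exactly as printed here, and the function names are made up for illustration.

```python
import re

# Matches roam lines such as:
# "[12103.034593] wlo1: disconnect from AP <old> for new auth to <new>"
ROAM_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\S+: disconnect from AP (\S+) for new auth to (\S+)$"
)


def find_roams(dmesg_text):
    """Return a list of (timestamp, old_bssid, new_bssid) roam events."""
    roams = []
    for line in dmesg_text.splitlines():
        match = ROAM_RE.match(line.strip())
        if match:
            roams.append((float(match.group(1)), match.group(2), match.group(3)))
    return roams


def ping_pong_intervals(roams):
    """Seconds between consecutive roams that go straight back to the previous AP."""
    intervals = []
    for previous, current in zip(roams, roams[1:]):
        # A ping-pong is a roam that exactly reverses the one before it.
        if current[1] == previous[2] and current[2] == previous[1]:
            intervals.append(round(current[0] - previous[0], 3))
    return intervals


# The three roam lines from the log above, plus one unrelated line.
SAMPLE = """\
[12103.034593] wlo1: disconnect from AP b0:be:76:xx:xx:xx for new auth to a6:91:b1:xx:xx:xx
[12103.472989] wlo1: associated
[12171.003588] wlo1: disconnect from AP a6:91:b1:xx:xx:xx for new auth to b0:be:76:xx:xx:xx
[12490.756070] wlo1: disconnect from AP a6:91:b1:xx:xx:xx for new auth to b0:be:76:xx:xx:xx
"""

if __name__ == "__main__":
    roams = find_roams(SAMPLE)
    print(len(roams), ping_pong_intervals(roams))
```

Note that the "Connection to AP ... lost" line at 12321 is not counted as a roam by this pattern, so the script understates how often the link actually changed.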

A couple of odd things I noticed:

[12171.350309] rtw_8822ce 0000:03:00.0: failed to do dpk calibration
These pop up over and over again on this device at random. If it's a critical failure coming from the driver (honestly, I have no clue what dpk is, and a quick search of the kernel source didn't turn up anything conclusive), please fix it. If it's just a warning, avoid printing it over and over again: print it once, and make it clear to the user what it is and what problems it can cause.

[12172.551571] wlo1: Limiting TX power to 0 (0 - 3) dBm as advertised by b0:be:76:xx:xx:xx

Limiting TX power to 0? I mean, I can see why the heck it's not transmitting anything if the power is limited to nothing. Where is that value coming from? Bugged firmware or what?
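
To check the numbers in these lines, here is a rough Python sketch. It assumes the three values are, in order, the resulting limit, the power advertised by the AP, and the reduction, and that the result is clamped at 0 dBm. The clamp is inferred from the "0 (0 - 3)" line in the log, not taken from the mac80211 source, and the helper names are hypothetical.

```python
import re

LIMIT_RE = re.compile(
    r"Limiting TX power to (-?\d+) \((-?\d+) - (-?\d+)\) dBm as advertised by (\S+)"
)


def parse_tx_limit(line):
    """Return (limit, advertised, reduction, bssid), or None if the line does not match."""
    match = LIMIT_RE.search(line)
    if not match:
        return None
    limit, advertised, reduction = (int(match.group(i)) for i in (1, 2, 3))
    return limit, advertised, reduction, match.group(4)


def expected_limit(advertised, reduction):
    # Assumption: the result never goes below 0 dBm, as "0 (0 - 3)" suggests.
    return max(0, advertised - reduction)


# Lines copied from the log in this comment.
LINES = [
    "[12172.551571] wlo1: Limiting TX power to 0 (0 - 3) dBm as advertised by b0:be:76:xx:xx:xx",
    "[12103.483758] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:xx",
    "[12491.714259] wlo1: Limiting TX power to 23 (26 - 3) dBm as advertised by b0:be:76:xx:xx:xx",
]

if __name__ == "__main__":
    for line in LINES:
        limit, advertised, reduction, bssid = parse_tx_limit(line)
        print(bssid, limit, limit == expected_limit(advertised, reduction))
```

Under that assumption the 0 dBm lines are just the arithmetic applied to an advertised power of 0, which would point at what the AP advertises rather than at the adapter.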

Marco
Comment 15 Marco 2023-11-19 10:21:47 UTC
Oh, now the 0 TX values make more sense if dpk is relevant to automatic gain control:
https://github.com/torvalds/linux/blob/037266a5f7239ead1530266f7d7af153d2a867fa/drivers/net/wireless/realtek/rtw88/rtw8822c.h#L105

Reasons for this stuff failing, Realtek?
Comment 16 Marco 2023-11-26 13:22:18 UTC
Today again, while starting a download for a game, it repeatedly stopped transmitting, and after each reconnection (four times) it set the TX power to 0. There was no failed dpk calibration immediately before this, but I'll check further back in the kernel log to see if it appeared earlier.

Any help from Realtek here?
Comment 17 Marco 2023-11-26 15:46:36 UTC
OK, it seems that the TX power of 0 is just a red herring from my AP, see https://wireless.wiki.kernel.org/en/users/drivers/ath10k/faq#why_the_tx_rate_reported_to_user_space_is_wrong

So this seems to be again the usual firmware lockup with no useful messages in dmesg. Fun.
Comment 18 timlee 2023-11-28 06:13:28 UTC
Hi Marco,

Thanks for your analysis.

We want to check: when the issue happens "while starting a download for a game," does the network still work before you reconnect?

Is there any log you can provide?

Could you disable PS (power save) to check whether the issue still happens, and provide information about the AP, including its name and settings?

Thanks!
Comment 19 Marco 2023-11-28 10:23:40 UTC
(In reply to timlee from comment #18)

Nope, the network does not work on the Deck. It works fine on all my other devices, if you were referring to the actual radio in the AP. The AP is an Archer C7 V5 (it uses a combination of QCA9560 2.4 GHz + QCA9880 5 GHz radios) running the latest version of OpenWRT. The settings for the two radios are shown below, as reported by hostapd:

WiFi 2.4 GHz
hostapd: Configuration file: data: driver=nl80211 logger_syslog=127 logger_syslog_level=2 logger_stdout=127 logger_stdout_level=2 country_code=IT ieee80211d=1 hw_mode=g supported_rates=60 90 120 180 240 360 480 540 basic_rates=60 120 240 beacon_int=100 chanlist=6 #num_global_macaddr=1 ieee80211n=1 ht_coex=0 ht_capab=[SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1][DSSS_CCK-40] channel=6  interface=phy1-ap0 bssid=b0:be:76:xx:xx:xx ctrl_interface=/var/run/hostapd ap_isolate=1 bss_load_update_period=60 chan_util_avg_period=600 disassoc_low_ack=1 skip_inactivity_poll=0 preamble=1 wmm_enabled=1 ignore_broadcast_ssid=0 uapsd_advertisement_enabled=1 utf8_ssid=1 multi_ap=0 sae_require_mfp=1 sae_pwe=2 wpa_passphrase=xxx wpa_psk_file=/var/run/hostapd-phy1-ap0.psk auth_algs=1 wpa=2 wpa_pairwise=CCMP ssid=xxx bridge=br-lan wds_bridge= snoop_iface=br-lan ft_iface=br-lan mobility_domain=2d9f ft_psk_generate_local=1 ft_over_ds=0 reassociation_deadline=1000 wpa_disable_eapol_key_retries=1

WiFi 5 GHz
hostapd: Configuration file: data: driver=nl80211 logger_syslog=127 logger_syslog_level=2 logger_stdout=127 logger_stdout_level=2 country_code=IT ieee80211d=1 ieee80211h=1 hw_mode=a beacon_int=100 chanlist=36 40 44 48 52 56 60 64 100 104 108 112 116 120 124 128 132 136 140 144 tx_queue_data2_burst=2.0 #num_global_macaddr=1 ieee80211n=1 ht_coex=0 ht_capab=[HT40+][LDPC][SHORT-GI-20][SHORT-GI-40][TX-STBC][RX-STBC1][MAX-AMSDU-7935][DSSS_CCK-40] ieee80211ac=1 vht_oper_chwidth=1 vht_oper_centr_freq_seg0_idx=-6 vht_capab=[RXLDPC][SHORT-GI-80][TX-STBC-2BY1][RX-ANTENNA-PATTERN][TX-ANTENNA-PATTERN][RX-STBC-1][MAX-MPDU-11454][MAX-A-MPDU-LEN-EXP7] channel=acs_survey  interface=phy0-ap0 bssid=b0:be:76:xx:xx:xx ctrl_interface=/var/run/hostapd ap_isolate=1 bss_load_update_period=60 chan_util_avg_period=600 disassoc_low_ack=1 skip_inactivity_poll=0 preamble=1 wmm_enabled=1 ignore_broadcast_ssid=0 uapsd_advertisement_enabled=1 utf8_ssid=1 multi_ap=0 sae_require_mfp=1 sae_pwe=2 wpa_passphrase=xxx

I have another AP provided by the operator that internally runs a Broadcom SoC IIRC, but I have no control over its specific radio settings. Both APs use the same SSID and the same password on all 4 radios. I've tried splitting the networks and disabling power management before, to no avail; I'll try again now and report back.

As an aside, what is the problem with low power management on modern wireless adapters, and why is it always the first thing mentioned as a cause of issues? Are these driver/firmware issues specific to your drivers, or is it something deeper?
Comment 20 Marco 2023-11-28 16:01:14 UTC
Just had the same issue after disabling wireless power management, so that's not the cause. BTW, I'm not sure if the reported 0 dBm is expected when ath10k is used as an AP, or if it only happens when it's used as a client. Might be worth looking into, since today the card is correctly reporting a sensible value advertised by the AP (26 - 3 = 23 dBm).
Comment 21 Marco 2023-12-03 18:47:58 UTC
It seems that this issue can still happen even if the data rate is idle; it's just rarer. Any progress on this from Realtek?
Comment 22 Marco 2023-12-04 11:35:51 UTC
Since I updated to 6.6.2, it seems to have become as crap as on SteamOS 5.13 and 6.1 LTS again. It immediately stops receiving data as soon as I start downloading anything from Steam. I'll take a look at the rtw88 changes merged in 6.6, but I doubt I'll see anything relevant.

Realtek, any progress here?
Comment 23 timlee 2023-12-04 13:29:20 UTC
Hi Marco,

Sorry for the late response.

[ 8234.216128] rtw_8822ce 0000:03:00.0: failed to send h2c command

When you first hit the "stop transmitting" case (the log above), we thought it was unknown firmware behavior after entering/leaving PS mode. Indeed, we have found that PS mode triggered by the firmware led to some issues in the past, and we have fixed some of them.

But we can't reproduce an issue like yours on the Steam Deck we have in hand with SteamOS, even after updating the image or the rtw88 driver of SteamOS and testing with different APs, nor on a general notebook or Chromebook.

And we can't tell what is happening in the case you are hitting now, because we don't see the same log after you switched to Bazzite.

About "It seems that this issue can still happen even if the data rate is idle":
Do you mean that you are not downloading a file, and just a ping fails?

We have merged a debug patch upstream to help us dump the firmware status:
https://lore.kernel.org/linux-wireless/20231016053554.744180-1-pkshih@realtek.com/T/#t 
Could you enable the debug mask to check whether any log is shown when the transmission stops?

Here are some issues we have met before; could you kindly check whether either of the following improves your case?

* Set RTW_DBG_PHY in the debug mask and disable EDCCA via /sys/kernel/debug/ieee80211/phyX/rtw88/edcca_enable 
(write 0, and you should see "EDCCA disabled, cannot be set")

* Set the module parameter disable_aspm (the rtw_pci_disable_aspm variable in the driver) to true to disable PCIe ASPM.

Thanks again for your support and patience; we will keep debugging the issues of the Realtek driver.
Comment 24 Marco 2023-12-04 15:47:52 UTC
(In reply to timlee from comment #23)

Yeah, I've tracked down one issue, but it probably wasn't caused by your driver. For whatever reason, the system wasn't applying the correct regdom when it was set through cfg80211 module parameters (even though the parameter to cfg80211 was set correctly). I have no clue why it failed, but it did. Using an alternative method to force the regdom seems to have remedied the random stalls when idle, and iw reg get reports the correct country now. I still need to do additional testing, though.

Sorry for the noise.

BTW, thanks for the patch above; at least now I can enable it to see if anything gets printed in dmesg. However, after switching to a more recent kernel I haven't managed to get any h2c errors back in the logs, so I doubt I'll get much out of it.

I find it odd that the adapter sometimes reports 0 dBm as requested by the AP and sometimes the correct value (26 - 3 in my country); I'm not sure if that is related to the mesh network I have at home. It might be worth a look, but it seems to be a bug in the ath10k driver, although I'm not sure if it is visible only when ath10k is used as a client or also as an AP (as in my case). If it keeps giving me headaches like that, I might look for an alternative OpenWRT node (not that any other client has similar issues on my network, AFAIK, so I would like to avoid it if I can).

I've never seen any PCIe errors in the logs, so I doubt it's that, but I can try to disable ASPM for now.

Thanks for the help, timlee.
Comment 25 Marco 2023-12-05 18:27:42 UTC
Well, after setting rtw_pci.disable_aspm it already seems to have improved quite a bit. Lately I could trigger this easily by running a speedtest and immediately get a hang; after setting disable_aspm, however, it seems to keep trucking along fine. I've enabled the PHY debug flag, so if it happens again I'll surely post the relevant logs here.

It's too early to tell, but since it seems to be triggered quite easily by starting a download on a fast connection (2.5 Gbit here), it makes sense that power state transitions are at least one of the factors here.

Let me know if I can debug this stuff further, and thanks for the help for now,

Marco.
Comment 26 Marco 2023-12-06 15:40:25 UTC
After disabling ASPM I managed to reproduce it again, but enabling debug for PHY didn't print anything at all. I've set the mask to 0xffffffff to force all debug messages on, and I'll see if I can get something out of it this time.
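
For reference, the mask is just a bitwise OR of per-category flags. The sketch below shows the arithmetic only. The bit values are my assumption of what the rtw88 debug header defines and should be checked against drivers/net/wireless/realtek/rtw88/debug.h in the running kernel; only RTW_DBG_PHY and 0xffffffff are actually mentioned in this thread, and the helper names are made up.

```python
# Assumed bit values; verify against drivers/net/wireless/realtek/rtw88/debug.h
# in the kernel tree you are running before relying on them.
RTW_DBG_BITS = {
    "RTW_DBG_PHY": 0x00000008,
    "RTW_DBG_FW": 0x00000010,
    "RTW_DBG_COEX": 0x00000040,
    "RTW_DBG_PS": 0x00000400,
}

# The value used in this comment to enable every category at once.
RTW_DBG_ALL = 0xFFFFFFFF


def build_debug_mask(*names):
    """OR together the assumed bits for the given category names."""
    mask = 0
    for name in names:
        mask |= RTW_DBG_BITS[name]
    return mask


def format_mask(mask):
    """Format a mask as a 32-bit hex string, e.g. for pasting into a config."""
    return f"0x{mask & RTW_DBG_ALL:08x}"


if __name__ == "__main__":
    print(format_mask(build_debug_mask("RTW_DBG_PHY")))
    print(format_mask(build_debug_mask("RTW_DBG_PHY", "RTW_DBG_FW", "RTW_DBG_PS")))
    print(format_mask(RTW_DBG_ALL))
```

Where the resulting value gets written depends on the kernel and is deliberately left out here. Starting with a narrow mask keeps dmesg readable; 0xffffffff is the blunt fallback when the narrow mask prints nothing.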
Comment 27 timlee 2023-12-11 11:38:27 UTC
Hi Marco,

About the "wlo1: Limiting TX power to 0 (0 - 3) dBm as advertised" message:
we think it is not what triggers the disconnect, because rtw88 ignores it and follows the power limit table that is approved for each country and stored in the driver.

Thanks for your help again~
Comment 28 Marco 2023-12-11 13:32:01 UTC
Since I updated the BIOS back to 119, I haven't managed to reproduce it yet, even with ASPM left on. It seems that ASPM on older BIOS versions of the Deck (116 at least) might be a factor.

I had one last annoying issue last night while playing games online (so not much traffic), where the adapter kept stuttering and switching APs every couple of minutes for no apparent reason (the device was stationary, so it shouldn't have seen any signal variation to speak of). The log is below. The only oddity that jumps out at me is dpk calibration failing quite often; I'm not sure if it's related. Kernel 6.6.5.

[ 1270.383475] wlo1: disconnect from AP b0:be:76:xx:xx:xd for new auth to a6:91:b1:xx:xx:x9
[ 1270.402852] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 1270.818581] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 1270.932081] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 2/3)
[ 1270.940778] wlo1: authenticated
[ 1270.943024] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 1270.947352] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=3)
[ 1270.947734] wlo1: associated
[ 1270.986684] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
[ 1318.407478] wlo1: disconnect from AP a6:91:b1:xx:xx:x9 for new auth to b0:be:76:xx:xx:xd
[ 1318.429378] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 1318.835130] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 1318.842463] wlo1: b0:be:76:xx:xx:xd denied authentication (status 1)
[ 1319.379824] wlo1: authenticate with b0:be:76:xx:xx:xe
[ 1319.379847] wlo1: 80 MHz not supported, disabling VHT
[ 1319.765205] wlo1: send auth to b0:be:76:xx:xx:xe (try 1/3)
[ 1319.790447] wlo1: b0:be:76:xx:xx:xe denied authentication (status 1)
[ 1320.680261] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 1321.095388] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 1321.098774] wlo1: authenticated
[ 1321.099610] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 1321.105404] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=2)
[ 1321.105911] wlo1: associated
[ 1321.205891] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
[ 1335.408852] wlo1: disconnect from AP a6:91:b1:xx:xx:x9 for new auth to b0:be:76:xx:xx:xd
[ 1335.412284] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 1335.724552] rtw_8822ce 0000:03:00.0: failed to do dpk calibration
[ 1335.811918] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 1335.916444] wlo1: send auth to b0:be:76:xx:xx:xd (try 2/3)
[ 1335.923094] wlo1: b0:be:76:xx:xx:xd denied authentication (status 1)
[ 1338.953545] wlo1: authenticate with b0:be:76:xx:xx:xe
[ 1338.953600] wlo1: 80 MHz not supported, disabling VHT
[ 1339.333902] wlo1: send auth to b0:be:76:xx:xx:xe (try 1/3)
[ 1339.337344] wlo1: b0:be:76:xx:xx:xe denied authentication (status 1)
[ 1342.768292] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 1343.085410] rtw_8822ce 0000:03:00.0: failed to do dpk calibration
[ 1343.174807] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 1343.178516] wlo1: authenticated
[ 1343.179258] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 1343.184705] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=2)
[ 1343.185498] wlo1: associated
[ 1343.218766] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
[ 1401.359251] wlo1: disconnect from AP a6:91:b1:xx:xx:x9 for new auth to b0:be:76:xx:xx:xd
[ 1401.383639] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 1401.783175] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 1401.786645] wlo1: b0:be:76:xx:xx:xd denied authentication (status 1)
[ 1404.817248] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 1405.233055] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 1405.239779] wlo1: authenticated
[ 1405.240462] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 1405.243053] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=2)
[ 1405.243429] wlo1: associated
[ 1405.296230] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
[ 1529.370815] wlo1: disconnect from AP a6:91:b1:xx:xx:x9 for new auth to b0:be:76:xx:xx:xd
[ 1529.391188] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 1529.792847] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 1529.898336] wlo1: send auth to b0:be:76:xx:xx:xd (try 2/3)
[ 1529.905439] wlo1: b0:be:76:xx:xx:xd denied authentication (status 1)
[ 1532.935979] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 1533.345881] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 1533.349289] wlo1: authenticated
[ 1533.350231] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 1533.355788] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=2)
[ 1533.356138] wlo1: associated
[ 1533.395386] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
[ 2152.456287] wlo1: disconnect from AP a6:91:b1:xx:xx:x9 for new auth to b0:be:76:xx:xx:xd
[ 2152.468053] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 2152.872575] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 2152.881019] wlo1: b0:be:76:xx:xx:xd denied authentication (status 1)
[ 2155.911887] wlo1: authenticate with b0:be:76:xx:xx:xe
[ 2155.911930] wlo1: 80 MHz not supported, disabling VHT
[ 2156.309570] wlo1: send auth to b0:be:76:xx:xx:xe (try 1/3)
[ 2156.315323] wlo1: b0:be:76:xx:xx:xe denied authentication (status 1)
[ 2159.751757] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 2160.091201] rtw_8822ce 0000:03:00.0: failed to do dpk calibration
[ 2160.178594] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 2160.182513] wlo1: authenticated
[ 2160.183042] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 2160.185236] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=2)
[ 2160.185622] wlo1: associated
[ 2160.241065] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
[ 2534.150716] input: Microsoft X-Box 360 pad 0 as /devices/virtual/input/input38
[ 3591.360181] wlo1: deauthenticating from a6:91:b1:xx:xx:x9 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 3596.525417] wlo1: authenticate with b0:be:76:xx:xx:xe
[ 3596.525442] wlo1: 80 MHz not supported, disabling VHT
[ 3596.910193] wlo1: send auth to b0:be:76:xx:xx:xe (try 1/3)
[ 3596.964992] wlo1: authenticate with b0:be:76:xx:xx:xe
[ 3596.965065] wlo1: send auth to b0:be:76:xx:xx:xe (try 1/3)
[ 3597.007819] wlo1: authenticated
[ 3597.008644] wlo1: associate with b0:be:76:xx:xx:xe (try 1/3)
[ 3597.011088] wlo1: RX AssocResp from b0:be:76:xx:xx:xe (capab=0x431 status=0 aid=5)
[ 3597.011339] wlo1: associated
[ 3900.854467] wlo1: disconnect from AP b0:be:76:xx:xx:xe for new auth to b0:be:76:xx:xx:xd
[ 3900.877908] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 3901.280654] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 3901.288085] wlo1: b0:be:76:xx:xx:xd denied authentication (status 53)
[ 3901.656737] wlo1: authenticate with b0:be:76:xx:xx:xe
[ 3901.656766] wlo1: 80 MHz not supported, disabling VHT
[ 3901.985230] rtw_8822ce 0000:03:00.0: failed to do dpk calibration
[ 3902.022647] wlo1: send auth to b0:be:76:xx:xx:xe (try 1/3)
[ 3902.032366] wlo1: authenticated
[ 3902.033065] wlo1: associate with b0:be:76:xx:xx:xe (try 1/3)
[ 3902.042123] wlo1: RX ReassocResp from b0:be:76:xx:xx:xe (capab=0x431 status=0 aid=5)
[ 3902.042376] wlo1: associated
[ 4206.030649] wlo1: disconnect from AP b0:be:76:xx:xx:xe for new auth to b0:be:76:xx:xx:xd
[ 4206.055178] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 4206.370577] rtw_8822ce 0000:03:00.0: failed to do dpk calibration
[ 4206.458968] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 4206.466508] wlo1: b0:be:76:xx:xx:xd denied authentication (status 53)
[ 4209.500543] wlo1: authenticate with b0:be:76:xx:xx:xe
[ 4209.500568] wlo1: 80 MHz not supported, disabling VHT
[ 4209.882055] wlo1: send auth to b0:be:76:xx:xx:xe (try 1/3)
[ 4209.914164] wlo1: authenticated
[ 4209.915443] wlo1: associate with b0:be:76:xx:xx:xe (try 1/3)
[ 4209.927227] wlo1: RX ReassocResp from b0:be:76:xx:xx:xe (capab=0x431 status=0 aid=5)
[ 4209.927476] wlo1: associated
[ 4513.946059] wlo1: disconnect from AP b0:be:76:xx:xx:xe for new auth to b0:be:76:xx:xx:xd
[ 4513.965453] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 4514.373306] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 4514.379897] wlo1: b0:be:76:xx:xx:xd denied authentication (status 53)
[ 4517.404733] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 4517.819373] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 4517.822738] wlo1: authenticated
[ 4517.823790] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 4517.826008] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=2)
[ 4517.826370] wlo1: associated
[ 4517.829296] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
[ 5104.409275] wlo1: disconnect from AP a6:91:b1:xx:xx:x9 for new auth to b0:be:76:xx:xx:xd
[ 5104.432665] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 5104.833578] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 5104.840206] wlo1: b0:be:76:xx:xx:xd denied authentication (status 1)
[ 5107.868860] wlo1: authenticate with b0:be:76:xx:xx:xe
[ 5107.868880] wlo1: 80 MHz not supported, disabling VHT
[ 5108.246591] wlo1: send auth to b0:be:76:xx:xx:xe (try 1/3)
[ 5108.249970] wlo1: b0:be:76:xx:xx:xe denied authentication (status 1)
[ 5109.134583] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 5109.548603] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 5109.551964] wlo1: authenticated
[ 5109.553059] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 5109.558489] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=2)
[ 5109.558850] wlo1: associated
[ 5109.628471] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
[ 6146.438688] wlo1: disconnect from AP a6:91:b1:xx:xx:x9 for new auth to b0:be:76:xx:xx:xd
[ 6146.467213] wlo1: authenticate with b0:be:76:xx:xx:xd
[ 6146.872044] wlo1: send auth to b0:be:76:xx:xx:xd (try 1/3)
[ 6146.878687] wlo1: b0:be:76:xx:xx:xd denied authentication (status 1)
[ 6149.910555] wlo1: authenticate with a6:91:b1:xx:xx:x9
[ 6150.322088] wlo1: send auth to a6:91:b1:xx:xx:x9 (try 1/3)
[ 6150.325458] wlo1: authenticated
[ 6150.326473] wlo1: associate with a6:91:b1:xx:xx:x9 (try 1/3)
[ 6150.332014] wlo1: RX ReassocResp from a6:91:b1:xx:xx:x9 (capab=0x1011 status=0 aid=2)
[ 6150.332384] wlo1: associated
[ 6150.428886] wlo1: Limiting TX power to 30 (30 - 0) dBm as advertised by a6:91:b1:xx:xx:x9
Comment 29 Marco 2023-12-12 09:27:18 UTC
Yeah, as of today it happened again, but even with debug maxed out nothing is printed on the console. At all. The adapter in the wireless menu still seems to update bandwidth information and such, but no data is transmitted in or out. The only relevant message appears when I forcefully disable and re-enable the adapter, which makes sense. I'll try disabling ASPM again and see if anything changes. I can't disable EDCCA, since there is no rtw88 debug folder for the wireless adapter under /sys/kernel/debug/, so unfortunately I can't test that.

Let me know what else I can do to debug this.
Comment 30 timlee 2023-12-13 11:20:48 UTC
Hi Marco,

From the log in comment #28, your Steam Deck tries to connect to three BSSIDs: a6:91:b1:xx:xx:x9, b0:be:76:xx:xx:xd and b0:be:76:xx:xx:xe (the last two should be the same AP), and for some reason it keeps roaming between them.

1. The disconnects from b0:be:76:xx:xx:xd and b0:be:76:xx:xx:xe could be security related. Could you try setting the security to open to see whether the denied authentication disappears?
(b0:be:76:xx:xx:xd denied authentication status 53)

2. Please provide the full phy_info while connected to each of the different APs:

cat /sys/kernel/debug/ieee80211/phyX/rtw89/phy_info (you need to be root)
We want to check whether the system tries to roam to another AP with a much higher RSSI than the connected AP.

3. Could you provide the log of the application that the Bazzite system uses to connect to the AP, such as wpa_supplicant or iwd? If you have a sniffer log, it would greatly help us analyze the issue.

Thanks!
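
The denied-authentication entries in the log from comment #28 can be tallied per BSSID and status code, which makes the pattern easier to see. Below is a minimal sketch in Python; the function name `summarize_denials` and the sample text are illustrative only (the sample lines are copied from the log above). The status names are the ones used in the Linux `ieee80211.h` header as far as I know (1 = unspecified failure, 53 = invalid PMKID), so treat them as an assumption to verify against your kernel sources.

```python
import re
from collections import Counter

# Names for the two status codes seen in the log (assumed from the
# Linux ieee80211.h header: WLAN_STATUS_UNSPECIFIED_FAILURE and
# WLAN_STATUS_INVALID_PMKID). Other codes are reported as "unknown".
STATUS_NAMES = {1: "unspecified failure", 53: "invalid PMKID"}

# Matches mac80211 lines such as:
#   wlo1: b0:be:76:xx:xx:xd denied authentication (status 53)
DENIED = re.compile(r"(\S+) denied authentication \(status (\d+)\)")


def summarize_denials(log_text):
    """Count denied authentications per (BSSID, status code)."""
    counts = Counter()
    for match in DENIED.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts


sample = """\
[ 3901.288085] wlo1: b0:be:76:xx:xx:xd denied authentication (status 53)
[ 4206.466508] wlo1: b0:be:76:xx:xx:xd denied authentication (status 53)
[ 5104.840206] wlo1: b0:be:76:xx:xx:xd denied authentication (status 1)
[ 5108.249970] wlo1: b0:be:76:xx:xx:xe denied authentication (status 1)
"""

for (bssid, status), count in sorted(summarize_denials(sample).items()):
    name = STATUS_NAMES.get(status, "unknown")
    print(f"{bssid} status {status} ({name}): {count}")
```

To run it on a full log, the output of `dmesg` or `journalctl -k` can be passed to `summarize_denials` instead of `sample`.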
Comment 31 timlee 2023-12-13 11:27:24 UTC
(In reply to timlee from comment #30)
> Hi Marco,
> 
> From the log in comment#28, your Steam Deck try to connect three SSID
> a6:91:b1:xx:xx:x9 , b0:be:76:xx:xx:xd, b0:be:76:xx:xx:xe  (the last two
> should be same AP), but have some reason to roamding.
> 
> 1. Disconnect reason for  b0:be:76:xx:xx:xd b0:be:76:xx:xx:xe could be
> security related, could you try to set the security to open to see if the
> denied authentication  disappear.
> (b0:be:76:xx:xx:xd denied authentication status 53)
> 
> 2. provide the all phy_info when connecting different SSID
> 
> cat /sys/kernel/debug/ieee80211/phyX"/rtw89/phy_info (you need use "root")
> we want to check if system try to roam to other AP with much higher RSSI
> than 
>  connected AP.
> 
> 3. Could you provide log of the application, like wpa_supplicant or iwd for
> system to connect AP in bazzite  system? And if you have the sniffer log
> could greatly help us to analyze the issue
> 
> Thanks!

Sorry, the correct path is /sys/kernel/debug/ieee80211/phyX/rtw88/phy_info
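
Since the phy index (the X in phyX) differs between systems, the dump can be collected for every phy at once. This is a minimal sketch in Python, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root; the helper name `read_phy_info` is hypothetical. If no rtw88 directory exists, the result is simply empty, which may mean the kernel was built without rtw88 debugfs support.

```python
import glob
import os


def read_phy_info(debugfs_root="/sys/kernel/debug"):
    """Return {phy name: phy_info text} for every phy with an rtw88 dir."""
    pattern = os.path.join(
        debugfs_root, "ieee80211", "phy*", "rtw88", "phy_info"
    )
    dumps = {}
    for path in sorted(glob.glob(pattern)):
        # Path layout: <root>/ieee80211/<phy>/rtw88/phy_info
        phy = path.split(os.sep)[-3]
        try:
            with open(path) as handle:
                dumps[phy] = handle.read()
        except OSError as exc:
            # Keep going so one unreadable file does not hide the others.
            dumps[phy] = f"<unreadable: {exc}>"
    return dumps


if __name__ == "__main__":
    for phy, text in read_phy_info().items():
        print(f"== {phy} ==")
        print(text)
```

Running it once per connected AP and saving each output gives the set of dumps requested in item 2.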
Comment 33 Marco 2023-12-14 06:36:59 UTC
I have a possible hypothesis regarding the authentication problems. One AP is set to WPA2/WPA3 mixed mode, and the other to WPA2 only (since it's a stupid provider router with no WPA3 support). Does the adapter properly handle switching from WPA3 to WPA2 under the same SSID?
Comment 34 Marco 2023-12-14 13:50:31 UTC
Heh, more logs for you, Realtek. I'll mention that I didn't have any connection issues, and that the USB stack was broken by another, unrelated driver, but the log contains quite a lot of warnings like:

dic 14 13:15:30 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: failed to get tx report from firmware
dic 14 13:35:25 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: failed to get tx report from firmware
dic 14 13:36:23 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: failed to get tx report from firmware
dic 14 13:39:35 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: failed to get tx report from firmware
dic 14 13:48:01 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: failed to get tx report from firmware

And when the sleep failed:
dic 14 14:23:41 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: firmware failed to leave lps state
dic 14 14:23:43 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: firmware failed to leave lps state
dic 14 14:23:45 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: firmware failed to leave lps state
dic 14 14:23:47 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: firmware failed to leave lps state
dic 14 14:23:49 DeckDiMarco kernel: rtw_8822ce 0000:03:00.0: firmware failed to leave lps state

This was while the Deck was transitioning to sleep, and I might have pressed the power button to wake it up, so it might just be a side effect of the kernel getting stuck while entering sleep and never completing, rather than an issue with the adapter itself. Still, I wanted to report it anyway; maybe there is something useful in there. Full log attached in the next comment.
Comment 35 Marco 2023-12-14 13:51:04 UTC
Created attachment 305598 [details]
Kernel dmesg
Comment 36 timlee 2023-12-15 03:38:42 UTC
Hi Marco,

About the authentication problems: I think it is possible that the different settings of APs sharing the same SSID cause a problem.
Perhaps you are connected to the AP with WPA3 but try to roam to the AP with WPA2, or to an AP with WPA3 that sends beacons without a WPA2 IE; these cases may cause roaming to fail as in your log.

Also, connection issues sometimes come from the application, such as wpa_supplicant or iwd, possibly due to its version or settings. We will try to handle the different security settings in the driver, but we can't help with the application settings ^^".
So you don't have the connection issue now, right?

About the log with "firmware failed to leave lps" and "failed to get tx report from firmware": do you mean these only happen with "the USB stack broken from another unrelated driver"? Which band was the AP on at that time, 2.4 GHz or 5 GHz?
If 2.4 GHz, could you connect to 5 GHz and check again?
Also, do these messages appear when the USB stack is not broken?

Thanks!
Comment 37 Marco 2023-12-15 07:42:41 UTC
Yeah, I've matched my APs by forcing them to WPA2 only for now. Since I'm mainly in the area covered by the WPA3 AP, I hoped to keep the additional security on, but for now it's fine to just use WPA2. Sad that GPON routers are so expensive, otherwise I would already have swapped it for an OpenWrt-supported router. I'll see if that's the issue.

Second part: yeah, if you look at the logs, USB was broken by a DRD issue, so I'm not sure how useful they are for debugging.

Regarding the stalls, I haven't been able to get one again, but even with full debug logging enabled nothing useful is printed. The adapter just stops working until a disable/enable cycle. I'm not sure what else I can do to debug this.
Comment 38 timlee 2023-12-19 03:22:25 UTC
Hi Marco,

Thanks for your kind help. If you hit any issue or capture any log, please share it with us.
Comment 39 timlee 2023-12-19 07:23:34 UTC
Hi Marco,

Have you disabled ASPM now? Could you enable it again and check whether the stall happens?

Thanks!
Comment 40 timlee 2023-12-19 07:23:44 UTC
Hi Marco,

Have you disabled ASPM now? Could you enable it again and check whether the stall happens?

Thanks!
Comment 41 Marco 2023-12-19 08:39:18 UTC
(In reply to timlee from comment #40)
> Hi Macro,
> 
> Do you disable aspm now? Could you open it again and check if the stall
> happen?
> 
> Thanks!

As of #26, yes, I can also reproduce them with ASPM disabled. The problem is that nothing is printed in dmesg, even with the debug mask set to 0xffffffff. Anything else to try?
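
To confirm which settings are actually in effect during a stall, the module parameters can be read back from sysfs. This is a minimal sketch in Python; the module and parameter names (`rtw88_core` with `debug_mask` and `disable_lps_deep`, `rtw88_pci` with `disable_aspm`) are assumptions based on the mainline rtw88 driver and may differ on other kernel versions. The helper name `read_params` is hypothetical, and missing parameters are reported as `None` rather than raising.

```python
import os

# Assumed module/parameter names from mainline rtw88; adjust if your
# kernel exposes them under different names.
PARAMS = {
    "rtw88_core": ["debug_mask", "disable_lps_deep"],
    "rtw88_pci": ["disable_aspm"],
}


def read_params(sysfs_modules="/sys/module", params=PARAMS):
    """Return {"module.param": value or None} for each listed parameter."""
    values = {}
    for module, names in params.items():
        for name in names:
            path = os.path.join(sysfs_modules, module, "parameters", name)
            try:
                with open(path) as handle:
                    values[f"{module}.{name}"] = handle.read().strip()
            except OSError:
                # Parameter missing or unreadable on this kernel.
                values[f"{module}.{name}"] = None
    return values


if __name__ == "__main__":
    for key, value in read_params().items():
        print(f"{key} = {value}")
```

Attaching this output next to the dmesg capture shows whether the ASPM and debug mask settings were applied at the time of the stall.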
