Bug 217016
Summary: | rtw88 regression bisected: resume after suspend breaks, reboots instead | ||
---|---|---|---|
Product: | Drivers | Reporter: | Paul Gover (pmw.gover) |
Component: | network-wireless | Assignee: | drivers_network-wireless (drivers_network-wireless) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | gary.chang, mario.limonciello, pkshih |
Priority: | P1 | ||
Hardware: | AMD | ||
OS: | Linux | ||
Kernel Version: | 6.1.1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | git bisect log, Draft fix v1 | ||
Please send this report as an email to:
gary.chang AT realtek.com
s.hauer AT pengutronix.de
None of them are subscribed to the kernel's bugzilla.

I've tried kernel 6.2-rc7; the bug remains.

Kernel config for the commit identified by the bisection can be found here:
https://www.dropbox.com/s/nq95rwze856m39j/bisected.config.gz?dl=0
Boot log with said config and kernel:
https://www.dropbox.com/s/gpudjk0wnm9ufiw/bisected.boot.log.gz?dl=0
Both gzipped.

Doh! The above boot log was for the bisect good kernel! The following link is for a boot log with the bisect bad kernel:
https://www.dropbox.com/s/wbgltknrfyzb2ql/bisected.bad.log.gz?dl=0
My apologies.

Please email the developers.

(In reply to Artem S. Tashkinov from comment #4)
> Please email the developers.
Did so yesterday. Just putting the links here so people can find them.

I think I can reproduce it with the steps below:
1. ifconfig wlan0 up
2. iw wlan0 connect AP
3. use GUI to suspend
4. press space to resume
5. ifconfig wlan0 up
6. iw wlan0 connect AP <--- system stall
If I revert "wifi: rtw88: add flag check before enter or leave IPS", it does not stall in step 6. I also tried 8822CE, but it works well with or without the patch. I think that is why we didn't find the issue before submitting. Could you help to check if this symptom is identical to yours?

My system uses wpa_supplicant, dhcpcd and acpi. I'm not sure where "ifconfig" comes in that stack. I tried stopping dhcpcd and then suspending, but got the same hardware error and reboot as described before. However ... I stopped wpa_supplicant and then suspended, and the system then resumed OK with wpa_supplicant still stopped. When I restarted wpa_supplicant, the system crashed with the same hardware error and rebooted. I think that says the symptoms are identical. This was on kernel 6.1.11, which is what I currently use; I have others available if that's of use.

Thanks for the additional information. I think the symptoms are identical as well. I will prepare a patch soon.

Created attachment 303723
Draft fix v1

I have posted a draft patch (attachment 303723). Please give it a try.

That works for me. I've tested on 6.2.0-rc7 compiled with both gcc and clang/LTO and 6.1.11 (which is what I'd normally use). I also tested suspending while on mains power and resuming on batteries. All good! Thanks for the speedy resolution.

I see there's a new 6.1.12 stable kernel. I've just tested that with the patch. The patch works OK there too.

Thanks for the testing. I will think more about this fix, and then submit the patch soon.

I have sent a patch upstream:
https://lore.kernel.org/linux-wireless/20230216053633.20366-1-pkshih@realtek.com/T/#u
The content of this patch is identical to the draft version I posted here. Could you please post a Tested-by tag there? Thank you.

4a267bc5ea8f1 ("wifi: rtw88: use RTW_FLAG_POWERON flag to prevent to power on/off twice")
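
Going only by the title of that commit, the idea is to track the power state in a flag so that a second power-on or power-off request becomes a no-op. The following is a toy model of that pattern in Python, not the rtw88 code and not the content of the patch; the class, method and counter names are hypothetical, and `powered_on` merely plays the role that a flag such as RTW_FLAG_POWERON might play.

```python
class ToyWifiDevice:
    """Hypothetical stand-in for a device; this is NOT the rtw88 driver."""

    def __init__(self):
        # Assumption: one boolean records whether the hardware is powered.
        self.powered_on = False
        # Counters stand in for the real hardware power sequences.
        self.hw_power_on_calls = 0
        self.hw_power_off_calls = 0

    def power_on(self):
        """Power the device on once; return False if it was already on."""
        if self.powered_on:
            return False
        self.hw_power_on_calls += 1
        self.powered_on = True
        return True

    def power_off(self):
        """Power the device off once; return False if it was already off."""
        if not self.powered_on:
            return False
        self.hw_power_off_calls += 1
        self.powered_on = False
        return True


if __name__ == "__main__":
    dev = ToyWifiDevice()
    dev.power_on()
    dev.power_on()   # second request is ignored
    dev.power_off()
    dev.power_off()  # second request is ignored
    print(dev.hw_power_on_calls, dev.hw_power_off_calls)
```

With a guard like this, callers on different paths (for example a resume handler and an interface-up request) can each ask for power without the hardware sequence running twice.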

Created attachment 303708
git bisect log

Suspend/resume was working OK on kernel 6.0.13 and has been broken ever since 6.1.1 (I've not tried kernels between those, except in the bisect). All subsequent 6.1 kernels exhibit the same behaviour, viz: suspend works OK, but on resume there's a flicker, and then it reboots. Sometimes the screen gets restored to its contents at the time of suspend, but less than a second later it starts rebooting.

To reproduce, simply boot, suspend, and resume.

Subsequent dmesg output includes the following "Emergency" output:

x86/pm: family 0x15 cpu detected, MSR saving is needed during suspending.
mce: [Hardware Error]: Machine check events logged
[Hardware Error]: System Fatal error.
[Hardware Error]: CPU:0 (15:70:0) MC4_STATUS[Over|UE|-|AddrV|PCC|-]: 0xf600000000070f0f
[Hardware Error]: Error Addr: 0x00000000f10011d4
[Hardware Error]: MC4 Error (node 2): Watchdog timeout due to lack of progress.
[Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)

I'll send emails as per "Reporting issues" in the kernel documentation.
I'll attach dmesg and configs later.
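
For readers who want to check the MC4_STATUS value by hand, here is a small Python sketch that decodes the top flag bits of an x86 machine-check status register. It covers only the architectural bits 63 to 57; the function and table names are made up for this example, the mapping of the log's "UE" to the UC bit is an assumption about how the kernel prints it, and the last bracketed field in the log is not decoded here.

```python
# Architectural flag bits of an x86 MCi_STATUS register (bits 63..57).
MCI_STATUS_FLAGS = [
    (63, "Val"),    # register contents are valid
    (62, "Over"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error (assumed to be the log's "UE")
    (60, "En"),     # error reporting was enabled
    (59, "MiscV"),  # MCi_MISC holds valid data
    (58, "AddrV"),  # MCi_ADDR holds a valid address
    (57, "PCC"),    # processor context may be corrupt
]


def decode_mci_status(status):
    """Return the names of the flag bits that are set in `status`."""
    return [name for bit, name in MCI_STATUS_FLAGS if (status >> bit) & 1]


if __name__ == "__main__":
    # Value copied from the dmesg output in this report.
    print(decode_mci_status(0xF600000000070F0F))
```

For the value in this report the set bits are Val, Over, UC, En, AddrV and PCC, with MiscV clear, which agrees with the `Over|UE|-|AddrV|PCC` fields printed in the log.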