Bug 206411 - rtwpci: driver crashes after upgrading to 5.5
Summary: rtwpci: driver crashes after upgrading to 5.5
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
Reported: 2020-02-04 17:35 UTC by Iulian Costan
Modified: 2020-09-14 05:19 UTC
CC List: 9 users

Kernel Version: 5.5
Tree: Mainline
Regression: No


Attachments
rtwpci driver crash (155.79 KB, text/plain)
2020-02-04 17:35 UTC, Iulian Costan
A shell script that helps build and install the patched version of rtw88 on Arch Linux (1.37 KB, application/x-shellscript)
2020-06-05 03:55 UTC, i

Description Iulian Costan 2020-02-04 17:35:22 UTC
Created attachment 287111
rtwpci driver crash

After upgrading the kernel from 5.4 to 5.5, the Realtek rtwpci wireless driver started to crash.


Please see the log snippet below and attached logs:


Jan 28 08:51:24 drakarys kernel: ------------[ cut here ]------------
Jan 28 08:51:24 drakarys kernel: failed to read DBI register, addr=0x0719
Jan 28 08:51:24 drakarys kernel: WARNING: CPU: 7 PID: 1692 at drivers/net/wireless/realtek/rtw88/pci.c:1104 rtw_dbi_read8.constprop.0+0xa0/0xb0 [rtwpci]
Jan 28 08:51:24 drakarys kernel: CPU: 7 PID: 1692 Comm: kworker/u24:8 Not tainted 5.5.0-arch1-1 #1
Jan 28 08:51:24 drakarys kernel: Hardware name: LENOVO 81LF/LNVNB161216, BIOS 9VCN12WW 08/06/2018
Jan 28 08:51:24 drakarys kernel: Workqueue: phy0 ieee80211_beacon_connection_loss_work [mac80211]
Jan 28 08:51:24 drakarys kernel: RIP: 0010:rtw_dbi_read8.constprop.0+0xa0/0xb0 [rtwpci]
Jan 28 08:51:24 drakarys kernel: Code: be ed 03 00 00 48 8b 40 40 e8 0c 35 93 d5 5b 5d 41 88 04 24 31 c0 41 5c c3 be 19 07 00 00 48 c7 c7 b8 02 ad c0 e8 02 2e dc d4 <0f> 0b b8 fb ff ff ff 5b 5d 41 5c c3 0f 1f 40 00 0f 1f 44 00 00 55
Jan 28 08:51:24 drakarys kernel: RSP: 0018:ffff93c9c28f7d60 EFLAGS: 00010286
Jan 28 08:51:24 drakarys kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jan 28 08:51:24 drakarys kernel: RDX: 0000000000000001 RSI: 0000000000000096 RDI: 00000000ffffffff
Jan 28 08:51:24 drakarys kernel: RBP: ffff8a64bd309e80 R08: 0000000000000491 R09: 0000000000000001
Jan 28 08:51:24 drakarys kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff93c9c28f7d87
Jan 28 08:51:24 drakarys kernel: R13: 0000000000000010 R14: ffff8a64bd30db98 R15: 0ffff8a64c716480
Jan 28 08:51:24 drakarys kernel: FS:  0000000000000000(0000) GS:ffff8a64ce3c0000(0000) knlGS:0000000000000000
Jan 28 08:51:24 drakarys kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 28 08:51:24 drakarys kernel: CR2: 00007ffdeb857e28 CR3: 000000041e20a001 CR4: 00000000003606e0
Jan 28 08:51:24 drakarys kernel: Call Trace:
Jan 28 08:51:24 drakarys kernel:  rtw_pci_link_ps+0x4f/0x90 [rtwpci]
Jan 28 08:51:24 drakarys kernel:  rtw_leave_lps+0x7f/0x140 [rtw88]
Jan 28 08:51:24 drakarys kernel:  rtw_ops_config+0x9d/0xa0 [rtw88]
Jan 28 08:51:24 drakarys kernel:  ieee80211_hw_config+0x7f/0x3c0 [mac80211]
Jan 28 08:51:24 drakarys kernel:  ieee80211_recalc_ps.part.0+0xf1/0x160 [mac80211]
Jan 28 08:51:24 drakarys kernel:  ieee80211_mgd_probe_ap.part.0+0xaf/0x140 [mac80211]
Jan 28 08:51:24 drakarys kernel:  process_one_work+0x1e2/0x3b0
Jan 28 08:51:24 drakarys kernel:  worker_thread+0x4a/0x3d0
Jan 28 08:51:24 drakarys kernel:  kthread+0xfb/0x130
Jan 28 08:51:24 drakarys kernel:  ? process_one_work+0x3b0/0x3b0
Jan 28 08:51:24 drakarys kernel:  ? kthread_park+0x90/0x90
Jan 28 08:51:24 drakarys kernel:  ret_from_fork+0x35/0x40
Jan 28 08:51:24 drakarys kernel: ---[ end trace d1c988c9c1185d9e ]---
Jan 28 08:51:24 drakarys kernel: rtw_pci 0000:07:00.0: failed to read ASPM, ret=-5
Jan 28 08:51:24 drakarys kernel: rtw_pci 0000:07:00.0: failed to send h2c command
Jan 28 08:51:24 drakarys kernel: rtw_pci 0000:07:00.0: firmware failed to restore hardware setting
Jan 28 08:51:24 drakarys kernel: rtw_pci 0000:07:00.0: failed to send h2c command
Jan 28 08:51:24 drakarys kernel: rtw_pci 0000:07:00.0: failed to send h2c command
Jan 28 08:51:24 drakarys kernel: audit: type=1106 audit(1580194284.911:109): pid=2681 uid=0 auid=1000 ses=2 msg='op=PAM:session_close grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
Jan 28 08:51:24 drakarys kernel: audit: type=1104 audit(1580194284.911:110): pid=2681 uid=0 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_unix,pam_permit,pam_env acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
Jan 28 08:51:25 drakarys kernel: rtw_pci 0000:07:00.0: failed to send h2c command
Jan 28 08:51:25 drakarys kernel: rtw_pci 0000:07:00.0: sta 2c:fd:a1:c3:50:28 with macid 0 left
Jan 28 08:51:25 drakarys kernel: rtw_pci 0000:07:00.0: failed to send h2c command
Jan 28 08:51:25 drakarys kernel: rtw_pci 0000:07:00.0: failed to send h2c command
Jan 28 08:51:27 drakarys kernel: ------------[ cut here ]------------

With the latest 5.5.1 kernel I only get the following errors: the driver no longer crashes, but the wireless still does not work.

Feb 04 17:57:51 drakarys kernel: rtw_pci 0000:07:00.0: mac power on failed
Feb 04 17:57:51 drakarys kernel: rtw_pci 0000:07:00.0: failed to power on mac
Feb 04 17:57:54 drakarys kernel: rtw_pci 0000:07:00.0: mac power on failed
Feb 04 17:57:54 drakarys kernel: rtw_pci 0000:07:00.0: failed to power on mac
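The log pairs `failed to read DBI register` with `failed to read ASPM, ret=-5`, and -5 is `-EIO` on Linux. One plausible reading, an assumption not confirmed anywhere in this thread, is that the DBI read polls a busy flag a bounded number of times and gives up with `-EIO` when the card never answers. The user-space C sketch below models only that pattern; `mock_dev`, `dbi_busy()`, `dbi_read8()` and the retry count are hypothetical stand-ins, not the rtw88 implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical retry budget; not taken from the rtw88 source. */
#define DBI_RETRY_CNT 20

/* Hypothetical mock device whose DBI engine may never finish a read. */
struct mock_dev {
	int busy_polls_left; /* polls until the busy flag clears; -1 = never */
	uint8_t link_cfg;    /* byte returned once the read completes */
};

/* Model of the hardware busy flag: nonzero while the read is pending. */
static int dbi_busy(struct mock_dev *dev)
{
	if (dev->busy_polls_left < 0)
		return 1;               /* the card never completes the read */
	if (dev->busy_polls_left == 0)
		return 0;
	dev->busy_polls_left--;
	return 1;
}

/*
 * Poll the busy flag at most DBI_RETRY_CNT times. On timeout, warn and
 * return -EIO (-5 on Linux), leaving *value untouched.
 */
static int dbi_read8(struct mock_dev *dev, uint16_t addr, uint8_t *value)
{
	int cnt;

	assert(dev != NULL && value != NULL);
	for (cnt = 0; cnt < DBI_RETRY_CNT; cnt++) {
		if (!dbi_busy(dev)) {
			*value = dev->link_cfg;
			return 0;
		}
	}
	fprintf(stderr, "failed to read DBI register, addr=0x%04x\n",
		(unsigned int)addr);
	return -EIO;
}
```

A device that never clears its busy flag makes `dbi_read8()` print the warning and return -5, the same value that appears as `ret=-5` in the log; a device that answers within the retry budget returns 0 and fills in the byte.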
Comment 1 i 2020-02-07 02:57:14 UTC
I also have this problem, with the same "mac power on failed" and "failed to power on mac" errors. The problem sometimes disappears after restarting into Windows and then back into Linux.
Comment 2 Dimitris 2020-03-22 11:10:25 UTC
I'm facing a similar issue: the rtwpci driver constantly uses 10-20% of the CPU, and as a result my HP Envy x360 13-ag0011nv is thermal throttling. I'm running the 5.4.26-1-lts kernel on Arch Linux, and I have this issue with all kernels up to the latest version in the Arch repos.
Comment 3 Trevor Campbell 2020-04-06 06:09:57 UTC
I'm also getting this problem on Fedora 31 with kernel 5.5, more specifically kernel-core-5.5.13-200.fc31.

This problem seems to have been introduced in the 5.5 kernel by this commit:
https://github.com/torvalds/linux/commit/d2e2c47e65af7310ad7d40ebf4cbb1d898719ec2
My C coding skills are pretty limited, though, so it may be something else closely related.

I have no trouble with this driver in the 5.4 kernel.
Comment 4 Trevor Campbell 2020-04-29 22:42:45 UTC
Why was a working driver replaced by this if those who did it are not going to try to get it working?

How can I go back to the old driver which actually works? Very frustrating.
Comment 5 Yen-Hsuan Chuang 2020-05-07 02:26:25 UTC
Hey, the patch enables PCI CLKREQ to reduce power consumption in the power-save state. But it seems that this mechanism has inter-op issues with some platforms.

Maybe we should add a module parameter or something similar to disable it without rebuilding the driver. But as far as I know, the maintainers don't like module parameters, because they make maintenance more difficult (they need to be taken care of in configuration files or elsewhere). So if there is a better way to do it, please give me a hint. Or, if too many people are suffering from this, I can try the evil module parameter again, but only when there's no other way to go.

Unfortunately, if you want to get it working now, you'll need to modify the code manually. Either reverting the patch or returning early from the CLKREQ and ASPM functions will do. For example:


diff --git a/pci.c b/pci.c
index 9f5edb8e..d9469ebf 100644
--- a/pci.c
+++ b/pci.c
@@ -1199,6 +1199,8 @@ static void rtw_pci_clkreq_set(struct rtw_dev *rtwdev, bool enable)
        u8 value;
        int ret;

+       return;
+
        ret = rtw_dbi_read8(rtwdev, RTK_PCIE_LINK_CFG, &value);
        if (ret) {
                rtw_err(rtwdev, "failed to read CLKREQ_L1, ret=%d", ret);
@@ -1218,6 +1220,8 @@ static void rtw_pci_aspm_set(struct rtw_dev *rtwdev, bool enable)
        u8 value;
        int ret;

+       return;
+
        ret = rtw_dbi_read8(rtwdev, RTK_PCIE_LINK_CFG, &value);
        if (ret) {
                rtw_err(rtwdev, "failed to read ASPM, ret=%d", ret);
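Judging from the diff, both functions read one byte from `RTK_PCIE_LINK_CFG` over DBI and bail out on error; presumably the rest of each function, which the diff does not show, flips one bit and writes the byte back. The added `return;` skips the whole sequence, so the card's link settings are never touched. The sketch below models that read-modify-write in user space under those assumptions; the mock register, the bit masks and `link_cfg_set_bit()` are hypothetical, and unlike the driver functions it returns an error code so the behavior can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit positions; the real masks are not shown in the diff. */
#define MOCK_BIT_L1_SW_EN  (1u << 3)
#define MOCK_BIT_CLKREQ_EN (1u << 4)

/* Hypothetical stand-in for the link-config byte behind DBI. */
struct mock_link {
	uint8_t cfg;
	bool read_fails; /* simulate the DBI read timeout seen in the log */
};

static int mock_dbi_read8(struct mock_link *l, uint8_t *value)
{
	if (l->read_fails)
		return -EIO;
	*value = l->cfg;
	return 0;
}

static void mock_dbi_write8(struct mock_link *l, uint8_t value)
{
	l->cfg = value;
}

/* Read-modify-write one bit; on a failed read, report and change nothing. */
static int link_cfg_set_bit(struct mock_link *l, uint8_t mask, bool enable,
			    const char *what)
{
	uint8_t value;
	int ret;

	assert(l != NULL && what != NULL);
	ret = mock_dbi_read8(l, &value);
	if (ret) {
		fprintf(stderr, "failed to read %s, ret=%d\n", what, ret);
		return ret;
	}
	if (enable)
		value |= mask;
	else
		value &= (uint8_t)~mask;
	mock_dbi_write8(l, value);
	return 0;
}
```

A failed read leaves the byte unchanged and only reports the error; the early `return;` in the patch goes one step further and skips even the read.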
Comment 6 i 2020-05-07 14:50:43 UTC
Tested the patch in comment 5 on 5.6.10 and it works. Thanks, Yen-Hsuan!


Note that you may need to restore the wireless hardware to "a normal state" (one in which Linux <5.5 can use it; a simple approach is to connect to a wireless network from Windows) before using the patched rtw88 module.
Comment 7 Trevor Campbell 2020-06-05 03:45:00 UTC
Hi Yen-Hsuan,

Thanks for looking into this. I would be happy to test a patched kernel, but I'm afraid patching and building kernels is a bit beyond my skill set and abilities.
Comment 8 i 2020-06-05 03:55:17 UTC
Created attachment 289515
A shell script that helps build and install the patched version of rtw88 on Arch Linux

Hi Trevor,

I've uploaded a shell script, adapted from the PKGBUILD of Arch Linux's linux package and tested on Arch Linux, which might be a helpful reference for you.


Hi Yen-Hsuan,

I've tested this patch and found that it doesn't always work. What I did was:

1. Boot from Windows and connect to the Internet
2. Boot again from patched 5.6.10, it works
3. Boot again from 5.4.43 (LTS, which doesn't have this problem), it works
4. Boot again from patched 5.6.10, it doesn't work
Comment 9 Yen-Hsuan Chuang 2020-06-16 02:16:27 UTC
Some platforms don't cut power to the card on "shutdown", and if so, the power sequence may be different. If you physically remove the battery or the Wi-Fi card and boot again, it should work.
Comment 10 Hongjia Cao 2020-08-24 07:48:37 UTC
Please give us a runtime configuration method to revert to the former behavior. I encounter the same problem running Debian testing on my ASUS laptop. With Debian testing the kernel is upgraded several times a year, and recompiling it after every upgrade is inconvenient. Thanks.

(In reply to Yen-Hsuan Chuang from comment #5)
Comment 11 i 2020-08-24 09:56:00 UTC
> Please give us a runtime config method to revert to the former behavior. I
> encounter the same problem running debian testing on my ASUS laptop. The
> kernel upgrades several times a year with debian testing. Compiling the
> kernel each time it upgrades is inconvenient. Thanks. 

On Linux kernel v5.8.3+ (more precisely, any kernel that contains this commit: [1]), add this setting to a file in /etc/modprobe.d:

options rtw88_pci disable_aspm=1

Then reboot. Note that this might only work if your wireless card has not already been put into the bad state by the change. If it has, booting into Windows and connecting to a Wi-Fi network usually restores the device's state.
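The `disable_aspm=1` setting is a module parameter (that is what `options` lines in /etc/modprobe.d set), so it turns the compile-time `return;` from comment 5 into a switch chosen at load time. The sketch below models only the resulting control flow in user space: the power-save hook checks a flag before touching the card. `rtw_pci_link_ps` appears in the call trace above, but its real body is not shown in this thread, so `mock_link_ps()`, `mock_aspm_set()`, the `disable_aspm` variable and the access counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag modeling the disable_aspm module parameter. */
static bool disable_aspm;

/* Counts how often the mock touched the card's link settings. */
static int link_cfg_accesses;

/* Stands in for the DBI read-modify-write done by the ASPM setter. */
static void mock_aspm_set(bool enable)
{
	(void)enable;
	link_cfg_accesses++;
}

/*
 * Hypothetical model of the power-save hook: entering or leaving power
 * save toggles ASPM unless it was disabled when the module was loaded.
 */
static void mock_link_ps(bool enter)
{
	if (disable_aspm)
		return;
	mock_aspm_set(enter);
}
```

With the flag set, entering and leaving power save no longer reaches the card at all, which is how the option can avoid the failing DBI access without rebuilding the module.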

If you can't use kernel 5.8.3+ for any reason, use a Linux LTS version.

1. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/wireless/realtek/rtw88/pci.c?h=v5.8.3&id=b8e8613492b4b3489dc382a4d1d848af7b2d6a5f
Comment 12 Trevor Campbell 2020-09-03 07:25:46 UTC
Installed Kernel 5.8.4, 5.8.4-200.fc32.x86_64 to be exact, from standard Fedora repo and added "options rtw88_pci disable_aspm=1" to /etc/modprobe.d/rtw88_pci.conf.

Everything seems to be working well. I have been running it for 2 days now.

Thanks to all involved.
Comment 13 Iulian Costan 2020-09-12 07:06:48 UTC
I confirm that 5.8.* kernel on Arch Linux works like a charm.

Thank you!

P.S. I am going to mark the defect as resolved.
Comment 14 Yen-Hsuan Chuang 2020-09-14 05:19:34 UTC
I think we should find a way to add a quirk list for the Lenovo series (or whatever machines cannot use ASPM).

Can anyone help with this?
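One way to build such a quirk list is a table of DMI vendor/product strings consulted when the driver probes the card; in the kernel this is usually done with `struct dmi_system_id` and `dmi_check_system()`. The user-space sketch below models only the matching logic. Its single entry uses the `LENOVO 81LF` hardware name from the original report purely as an illustration; whether that model actually needs the quirk is an assumption, and `aspm_quirk`, `aspm_quirks` and `platform_needs_aspm_off()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: DMI sys_vendor and product_name to match. */
struct aspm_quirk {
	const char *vendor;
	const char *product;
};

/* Illustrative entry taken from the report's "Hardware name" line. */
static const struct aspm_quirk aspm_quirks[] = {
	{ "LENOVO", "81LF" },
};

/*
 * Return true when the machine matches a quirk entry, meaning the driver
 * should behave as if disable_aspm=1 had been given.
 */
static bool platform_needs_aspm_off(const char *vendor, const char *product)
{
	size_t i;

	assert(vendor != NULL && product != NULL);
	for (i = 0; i < sizeof(aspm_quirks) / sizeof(aspm_quirks[0]); i++) {
		if (strcmp(vendor, aspm_quirks[i].vendor) == 0 &&
		    strcmp(product, aspm_quirks[i].product) == 0)
			return true;
	}
	return false;
}
```

Exact string comparison is a deliberate choice in this sketch; the kernel's `DMI_MATCH()` matches substrings, while `DMI_EXACT_MATCH()` requires the whole string to match.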
