Bug 208535
Summary: | S3 Mode Bug MSR - unchecked MSR access error | ||
---|---|---|---|
Product: | Drivers | Reporter: | sander44 (ionut_n2001) |
Component: | Platform | Assignee: | drivers_platform (drivers_platform) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | ashok.raj, bp, promarbler14, steffen, tony.luck |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.7.8 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | final fix |
Description
sander44
2020-07-13 14:50:08 UTC
Can you add to your command line: "debug ignore_loglevel log_buf_len=16M no_console_suspend systemd.log_target=null" boot with it and send full dmesg from that boot? Thx.

Hello. I am using 5.9.2 the second day and after resuming the first time i saw this:

Nov 3 16:03:26 kent kernel: smpboot: Scheduler frequency invariance went wobbly, disabling!
Nov 3 16:03:26 kent kernel: Enabling non-boot CPUs ...
Nov 3 16:03:26 kent kernel: x86: Booting SMP configuration:
Nov 3 16:03:26 kent kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
Nov 3 16:03:26 kent kernel: unchecked MSR access error: RDMSR from 0x123 at rIP: 0xffffffffae09c56c (update_srbds_msr+0x2c/0x60)
Nov 3 16:03:26 kent kernel: Call Trace:
Nov 3 16:03:26 kent kernel: smp_store_cpu_info+0x40/0x60
Nov 3 16:03:26 kent kernel: start_secondary+0x36/0x100
Nov 3 16:03:26 kent kernel: secondary_startup_64+0xb6/0xc0
Nov 3 16:03:26 kent kernel: unchecked MSR access error: WRMSR to 0x123 (tried to write 0x0000000000000000) at rIP: 0xffffffffae09c58f (update_srbds_msr+0x4f/0x60)
Nov 3 16:03:26 kent kernel: Call Trace:
Nov 3 16:03:26 kent kernel: smp_store_cpu_info+0x40/0x60
Nov 3 16:03:26 kent kernel: start_secondary+0x36/0x100
Nov 3 16:03:26 kent kernel: secondary_startup_64+0xb6/0xc0
Nov 3 16:03:26 kent kernel: microcode: sig=0x806ea, pf=0x80, revision=0x96
Nov 3 16:03:26 kent kernel: CPU1 is up
Nov 3 16:03:26 kent kernel: smpboot: Booting Node 0 Processor 2 APIC 0x4

Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz, Stepping 10.
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)

No systemd here. Not booting before next Monday (i hope; just in case). I have debug on the command line. Thank you.

Yeah, known issue. We're working on it. I'll ping you to test a patch once we have a one. Thx.

(In reply to Steffen Nurpmeso from comment #2)
> Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz, Stepping 10.

Do you have an SGX option in the BIOS? If so, try turning it off and see if the warning disappears. Thx.

Lenovo Ideapad 530S-14IKB. Seems to have support for SGX, but i have not looked in BIOS since i bought it in April 2019 :) I need the password .. i will look and try it out tomorrow, if there is a setting. I grep(1)ed the kernel and found tools/power/x86/turbostat/turbostat and that says

$ ./turbostat 2>&1 |grep -i sgx
CPUID(7): SGX

So it seems to be enabled. (This is kernel 4.19 again because RTW88 is too broken, i cannot go 5.9. But will try BIOS and 5.9.4 tomorrow.)

Yes, with SGX enabled in the BIOS there is no access error with 5.9.5. So - i keep it on. Thanks.

On kernel 5.8.17, with the exact same CPU and host bridge as the person above, but on a Dell board. My BIOS also has the option to configure SGX, and I discovered 3 options: enabled, disabled, and "software controlled". Mine was set to "software controlled", and I get the same fault as above upon a suspend/resume. Setting the BIOS option to "disabled" resulted in the error persisting. With the BIOS option for SGX set to "enabled" (at 128MB for the enclave size), the fault is no longer present. Since the warning doesn't seem to affect normal operation, I set it back to "software controlled" so that I can discover when a fix is released.

It was a kernel bug, patch posted here. Can someone check and see if this works?
https://lore.kernel.org/lkml/20201110135247.422-1-yu.c.chen@intel.com/T/#u

Disable SGX in BIOS again and retry? I will build the new 5.9 and report tomorrow (BIOS pass etc.), ok?

I currently have SGX not enabled in the BIOS. With Linux 5.9.8 and the patch, I don't see the MSR access warning. I also don't see the "microcode: sig=0x806ea, pf=0x80, revision=0xb4/0xd6" or any other microcode messages on resume, either. grep microcode /proc/cpuinfo shows 0xd6, so I assume the reason for no message is because the microcode is already updated to the latest revision.

Me too, no more such MSR message. Thanks!
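For anyone who wants to check a saved log for the warning discussed above after a suspend/resume cycle, here is a minimal sketch in Python. The names `MSR_ERROR`, `find_msr_errors` and `SAMPLE` are made up for illustration, and the pattern is derived only from the two message shapes quoted in this report (RDMSR and WRMSR), so other kernel versions may word the line differently.

```python
import re

# Matches the two message shapes quoted in this report, e.g.:
#   unchecked MSR access error: RDMSR from 0x123 at rIP: 0x... (func+0x2c/0x60)
#   unchecked MSR access error: WRMSR to 0x123 (tried to write 0x...) at rIP: 0x... (func+0x4f/0x60)
# Assumption: other kernels may format this line differently.
MSR_ERROR = re.compile(
    r"unchecked MSR access error: (?P<op>RDMSR|WRMSR) (?:from|to) (?P<msr>0x[0-9a-fA-F]+)"
    r".*?at rIP: 0x[0-9a-fA-F]+ \((?P<func>[^+)]+)\+"
)


def find_msr_errors(log_text):
    """Return (operation, msr, function) tuples for each unchecked MSR access error."""
    return [(m["op"], m["msr"], m["func"]) for m in MSR_ERROR.finditer(log_text)]


# Excerpt of the log lines posted in this report.
SAMPLE = """\
Nov 3 16:03:26 kent kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
Nov 3 16:03:26 kent kernel: unchecked MSR access error: RDMSR from 0x123 at rIP: 0xffffffffae09c56c (update_srbds_msr+0x2c/0x60)
Nov 3 16:03:26 kent kernel: unchecked MSR access error: WRMSR to 0x123 (tried to write 0x0000000000000000) at rIP: 0xffffffffae09c58f (update_srbds_msr+0x4f/0x60)
Nov 3 16:03:26 kent kernel: CPU1 is up
"""

if __name__ == "__main__":
    for op, msr, func in find_msr_errors(SAMPLE):
        print(op, msr, func)
```

An empty result on a log captured after resume would match the "no more such MSR message" outcome reported above; feeding it the text of a saved dmesg or syslog file works the same way as the sample string.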
Created attachment 293691 [details]
final fix
Here's a more complete fix if anyone wants to give it a run. This should work regardless of SGX setting in the BIOS.
Thx.
Hello! Me rather not unless absolutely necessary (on Saturday then please ;). I am not using kernel 5.9 because it does not work for me (RTW88: lots of crashes, issue 209263), still staying on 4.19 but for experiments. (And have lots of work on hold.) Thanks for fixing!

Well, I'm no wireless drivers guy by any stretch of the imagination but a couple of things that spring up to me which you could try, from looking at this:

* Remove that CONFIG_EXTRA_FIRMWARE option in your .config and let the driver request its own firmware. It has a bunch of fw images it might request and you could be missing some. So make sure you have them all installed and let the driver load them. That's from looking at that warning "purge skb(s) not reported by firmware".
* drop that proprietary zfs module. It might be innocent but it might be corrupting stuff so remove it completely and build a stock, upstream kernel without any out-of-tree crap.

If you then can reproduce it with the latest upstream kernel - that's 5.10-rc4 atm, send a proper bug report to the driver maintainers:

$ ./scripts/get_maintainer.pl -f drivers/net/wireless/realtek/rtw88/
Yan-Hsuan Chuang <yhchuang@realtek.com> (maintainer:REALTEK WIRELESS DRIVER (rtw88))
Kalle Valo <kvalo@codeaurora.org> (maintainer:NETWORKING DRIVERS (WIRELESS))
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING DRIVERS)
Jakub Kicinski <kuba@kernel.org> (maintainer:NETWORKING DRIVERS)
linux-wireless@vger.kernel.org (open list:REALTEK WIRELESS DRIVER (rtw88))
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
linux-kernel@vger.kernel.org (open list)

Anyway, just a couple of ideas. HTH.

Hello, very kind, thanks for the hints. :) Ok i will try out your patch with the RC kernel .. on saturday, ok? I lag behind my daily work it is not true, and i will likely have to look around new configuration items, too :(

Regarding wireless issue: I have no ZFS module, i (still - you are on #btrfs IRC?) use one big BTRFS partition here now. (I am interested though, since FreeBSD now also uses OpenZFS and i am interested in replacing encfs for specific directories -> ZFS encryption, and using zvol:/umes for VMs.) Other than that BTRFS works just great but one "corrupt" VM file i had, i now use a different cache strategy :-)

I first ran ArchLinux 5.8.3 and included the firmware that gets loaded, according to dmesg. It seems this driver needs two firmwares, one for operation and one for suspend/resume, but for 8822BE only one exists. That RTW88 driver was broken for 5.8, now 5.9, i will see on Saturday, maybe 5.10 does it. (I do not use an initramfs.) Ciao and good night!

(In reply to Steffen Nurpmeso from comment #16)
> I have no ZFS module

Bah, sorry about that. That's another guy. I thought you were doing a monologue on that bug. :-)

Forgot to update, whoops. With kernel 5.9.8 and the latest git patch, there are no issues on resume under any BIOS SGX configuration.

Thanks for testing and reporting back!

Hello! I have no MSR line here with 5.10.0-rc4 (27bba9c532a8d21050b94224ffd310ad0058c353 indeed) and your final patch. (SGX disabled in BIOS.) Also not after resuming? Thank you! :) Ciao, and a nice weekend i wish!

Cool, thanks for testing. I think we're done here. Thanks to all involved folks for the help!