Bug 208535 - S3 Mode Bug MSR - unchecked MSR access error
Summary: S3 Mode Bug MSR - unchecked MSR access error
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Platform
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_platform@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-13 14:50 UTC by sander44
Modified: 2020-11-22 11:17 UTC
CC List: 5 users

See Also:
Kernel Version: 5.7.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
final fix (6.12 KB, application/mbox)
2020-11-16 14:09 UTC, Borislav Petkov

Description sander44 2020-07-13 14:50:08 UTC
Hi Kernel Team,

With S3, observe this issue:
...
[ 1138.726807] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xffffffffa62726aa (native_read_msr+0xa/0x30)
[ 1138.726809] Call Trace:
[ 1138.726813]  update_srbds_msr+0x38/0x80
[ 1138.726815]  identify_secondary_cpu+0x76/0xa0
[ 1138.726816]  smp_store_cpu_info+0x49/0x60
[ 1138.726817]  start_secondary+0x62/0x1c0
[ 1138.726820]  secondary_startup_64+0xa4/0xb0
[ 1138.726823] unchecked MSR access error: WRMSR to 0x123 (tried to write 0x0000000000000000) at rIP: 0xffffffffa62728a8 (native_write_msr+0x8/0x30)
[ 1138.726824] Call Trace:
[ 1138.726825]  update_srbds_msr+0x61/0x80
[ 1138.726826]  identify_secondary_cpu+0x76/0xa0
[ 1138.726827]  smp_store_cpu_info+0x49/0x60
[ 1138.726829]  start_secondary+0x62/0x1c0
[ 1138.726830]  secondary_startup_64+0xa4/0xb0

uname -a
Linux os1 5.7.8-vanilla #1 SMP Mon Jul 13 17:02:58 EEST 2020 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.7.8-vanilla root=UUID=12a96ddf-02b2-4621-ae62-6236b9471e9d ro quiet splash intel_iommu=on drm.debug=0 i915.enable_gvt=1 vt.handoff=7


dmesg
[ 1138.455632] PM: suspend entry (deep)
[ 1138.461947] Filesystems sync: 0.006 seconds
[ 1138.468016] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 1138.469856] OOM killer disabled.
[ 1138.469856] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 1138.471120] printk: Suspending console(s) (use no_console_suspend to debug)
[ 1138.471641] e1000e: EEE TX LPI TIMER: 00000011
[ 1138.487907] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1138.488372] sd 0:0:0:0: [sda] Stopping disk
[ 1138.640578] ACPI: EC: interrupt blocked
[ 1138.700667] ACPI: Preparing to enter system sleep state S3
[ 1138.701921] ACPI: EC: event blocked
[ 1138.701922] ACPI: EC: EC stopped
[ 1138.701923] PM: Saving platform NVS memory
[ 1138.701977] Disabling non-boot CPUs ...
[ 1138.702441] IRQ 128: no longer affine to CPU1
[ 1138.703480] smpboot: CPU 1 is now offline
[ 1138.708280] IRQ 130: no longer affine to CPU2
[ 1138.709291] smpboot: CPU 2 is now offline
[ 1138.714703] IRQ 16: no longer affine to CPU3
[ 1138.715731] smpboot: CPU 3 is now offline
[ 1138.722561] ACPI: Low-level resume complete
[ 1138.722641] ACPI: EC: EC started
[ 1138.722642] PM: Restoring platform NVS memory
[ 1138.724890] Enabling non-boot CPUs ...
[ 1138.724934] x86: Booting SMP configuration:
[ 1138.724935] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 1138.726807] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xffffffffa62726aa (native_read_msr+0xa/0x30)
[ 1138.726809] Call Trace:
[ 1138.726813]  update_srbds_msr+0x38/0x80
[ 1138.726815]  identify_secondary_cpu+0x76/0xa0
[ 1138.726816]  smp_store_cpu_info+0x49/0x60
[ 1138.726817]  start_secondary+0x62/0x1c0
[ 1138.726820]  secondary_startup_64+0xa4/0xb0
[ 1138.726823] unchecked MSR access error: WRMSR to 0x123 (tried to write 0x0000000000000000) at rIP: 0xffffffffa62728a8 (native_write_msr+0x8/0x30)
[ 1138.726824] Call Trace:
[ 1138.726825]  update_srbds_msr+0x61/0x80
[ 1138.726826]  identify_secondary_cpu+0x76/0xa0
[ 1138.726827]  smp_store_cpu_info+0x49/0x60
[ 1138.726829]  start_secondary+0x62/0x1c0
[ 1138.726830]  secondary_startup_64+0xa4/0xb0
[ 1138.726834] microcode: sig=0x806e9, pf=0x40, revision=0x9a
[ 1138.728701] CPU1 is up
[ 1138.728728] smpboot: Booting Node 0 Processor 2 APIC 0x1
[ 1138.729164] microcode: sig=0x806e9, pf=0x40, revision=0xd6
[ 1138.729498] CPU2 is up
[ 1138.729525] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 1138.730214] CPU3 is up
[ 1138.731732] ACPI: Waking up from system sleep state S3
[ 1138.736955] ACPI: EC: interrupt unblocked
[ 1138.736966] pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
[ 1138.755106] pcieport 0000:00:1c.7: Intel SPT PCH root port ACS workaround enabled
[ 1138.755117] pcieport 0000:00:1c.5: Intel SPT PCH root port ACS workaround enabled
[ 1138.776505] ACPI: EC: event unblocked
[ 1138.777454] sd 0:0:0:0: [sda] Starting disk
[ 1138.784986] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1138.918258] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1138.985074] iwlwifi 0000:3a:00.0: FW already configured (0) - re-configuring
[ 1139.090333] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1139.091736] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1139.091738] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1139.091740] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1139.091882] ata1.00: supports DRM functions and may not be fully accessible
[ 1139.093362] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1139.093364] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1139.093365] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1139.093428] ata1.00: supports DRM functions and may not be fully accessible
[ 1139.093894] ata1.00: configured for UDMA/133
[ 1139.465522] acpi LNXPOWER:00: Turning OFF
[ 1139.465607] OOM killer enabled.
[ 1139.465608] Restarting tasks ... done.
[ 1139.474294] video LNXVIDEO:00: Restoring backlight state
[ 1139.475671] PM: suspend exit
[ 1139.654453] e1000e 0000:00:1f.6 eno1: NIC Link is Down
[ 1139.967686] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1140.102764] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1140.170443] iwlwifi 0000:3a:00.0: FW already configured (0) - re-configuring
[ 1146.451378] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 1146.451464] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[ 1153.029145] PM: suspend entry (deep)
[ 1153.042303] Filesystems sync: 0.013 seconds
[ 1153.048736] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 1153.052583] OOM killer disabled.
[ 1153.052584] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 1153.054191] printk: Suspending console(s) (use no_console_suspend to debug)
[ 1153.055468] e1000e: EEE TX LPI TIMER: 00000011
[ 1153.070965] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1153.071354] sd 0:0:0:0: [sda] Stopping disk
[ 1153.144503] ACPI: EC: interrupt blocked
[ 1153.203667] ACPI: Preparing to enter system sleep state S3
[ 1153.204833] ACPI: EC: event blocked
[ 1153.204834] ACPI: EC: EC stopped
[ 1153.204835] PM: Saving platform NVS memory
[ 1153.204885] Disabling non-boot CPUs ...
[ 1153.206378] smpboot: CPU 1 is now offline
[ 1153.210291] smpboot: CPU 2 is now offline
[ 1153.213530] smpboot: CPU 3 is now offline
[ 1153.217738] ACPI: Low-level resume complete
[ 1153.217818] ACPI: EC: EC started
[ 1153.217818] PM: Restoring platform NVS memory
[ 1153.220076] Enabling non-boot CPUs ...
[ 1153.220122] x86: Booting SMP configuration:
[ 1153.220122] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 1153.220492] microcode: sig=0x806e9, pf=0x40, revision=0x9a
[ 1153.222372] CPU1 is up
[ 1153.222399] smpboot: Booting Node 0 Processor 2 APIC 0x1
[ 1153.222835] microcode: sig=0x806e9, pf=0x40, revision=0xd6
[ 1153.223190] CPU2 is up
[ 1153.223219] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 1153.223945] CPU3 is up
[ 1153.225415] ACPI: Waking up from system sleep state S3
[ 1153.230269] ACPI: EC: interrupt unblocked
[ 1153.230410] pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
[ 1153.250156] pcieport 0000:00:1c.5: Intel SPT PCH root port ACS workaround enabled
[ 1153.250558] pcieport 0000:00:1c.7: Intel SPT PCH root port ACS workaround enabled
[ 1153.271560] ACPI: EC: event unblocked
[ 1153.273705] sd 0:0:0:0: [sda] Starting disk
[ 1153.282453] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1153.415295] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1153.482563] iwlwifi 0000:3a:00.0: FW already configured (0) - re-configuring
[ 1153.585378] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1153.586779] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1153.586781] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1153.586782] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1153.586925] ata1.00: supports DRM functions and may not be fully accessible
[ 1153.588447] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1153.588449] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1153.588450] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1153.588514] ata1.00: supports DRM functions and may not be fully accessible
[ 1153.588981] ata1.00: configured for UDMA/133
[ 1153.790472] acpi LNXPOWER:00: Turning OFF
[ 1153.790563] OOM killer enabled.
[ 1153.790564] Restarting tasks ... done.
[ 1153.802695] video LNXVIDEO:00: Restoring backlight state
[ 1153.804786] PM: suspend exit
[ 1153.966304] e1000e 0000:00:1f.6 eno1: NIC Link is Down
[ 1154.277784] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1154.418024] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1154.511418] iwlwifi 0000:3a:00.0: FW already configured (0) - re-configuring
[ 1161.082327] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 1161.082383] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[ 1200.288275] PM: suspend entry (deep)
[ 1200.295681] Filesystems sync: 0.007 seconds
[ 1200.300621] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 1200.304365] OOM killer disabled.
[ 1200.304367] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 1200.305970] printk: Suspending console(s) (use no_console_suspend to debug)
[ 1200.306996] e1000e: EEE TX LPI TIMER: 00000011
[ 1200.326258] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1200.326656] sd 0:0:0:0: [sda] Stopping disk
[ 1200.439687] ACPI: EC: interrupt blocked
[ 1200.498705] ACPI: Preparing to enter system sleep state S3
[ 1200.499867] ACPI: EC: event blocked
[ 1200.499868] ACPI: EC: EC stopped
[ 1200.499869] PM: Saving platform NVS memory
[ 1200.499920] Disabling non-boot CPUs ...
[ 1200.500411] IRQ 128: no longer affine to CPU1
[ 1200.501438] smpboot: CPU 1 is now offline
[ 1200.505577] smpboot: CPU 2 is now offline
[ 1200.508201] IRQ 125: no longer affine to CPU3
[ 1200.508212] IRQ 130: no longer affine to CPU3
[ 1200.509233] smpboot: CPU 3 is now offline
[ 1200.514029] ACPI: Low-level resume complete
[ 1200.514109] ACPI: EC: EC started
[ 1200.514109] PM: Restoring platform NVS memory
[ 1200.516356] Enabling non-boot CPUs ...
[ 1200.516401] x86: Booting SMP configuration:
[ 1200.516402] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 1200.516771] microcode: sig=0x806e9, pf=0x40, revision=0x9a
[ 1200.518649] CPU1 is up
[ 1200.518679] smpboot: Booting Node 0 Processor 2 APIC 0x1
[ 1200.519128] microcode: sig=0x806e9, pf=0x40, revision=0xd6
[ 1200.519482] CPU2 is up
[ 1200.519510] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 1200.520220] CPU3 is up
[ 1200.521756] ACPI: Waking up from system sleep state S3
[ 1200.526595] ACPI: EC: interrupt unblocked
[ 1200.526762] pcieport 0000:00:1c.0: Intel SPT PCH root port ACS workaround enabled
[ 1200.545163] pcieport 0000:00:1c.7: Intel SPT PCH root port ACS workaround enabled
[ 1200.545486] pcieport 0000:00:1c.5: Intel SPT PCH root port ACS workaround enabled
[ 1200.566531] ACPI: EC: event unblocked
[ 1200.568649] sd 0:0:0:0: [sda] Starting disk
[ 1200.577480] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1200.710113] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1200.777045] iwlwifi 0000:3a:00.0: FW already configured (0) - re-configuring
[ 1200.884472] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1200.885949] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1200.885952] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1200.885953] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1200.886096] ata1.00: supports DRM functions and may not be fully accessible
[ 1200.887649] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1200.887652] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1200.887653] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1200.887718] ata1.00: supports DRM functions and may not be fully accessible
[ 1200.888187] ata1.00: configured for UDMA/133
[ 1201.199965] acpi LNXPOWER:00: Turning OFF
[ 1201.200056] OOM killer enabled.
[ 1201.200057] Restarting tasks ... done.
[ 1201.215410] video LNXVIDEO:00: Restoring backlight state
[ 1201.216719] PM: suspend exit
[ 1201.376483] e1000e 0000:00:1f.6 eno1: NIC Link is Down
[ 1201.677618] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1201.810761] iwlwifi 0000:3a:00.0: Applying debug destination EXTERNAL_DRAM
[ 1201.878683] iwlwifi 0000:3a:00.0: FW already configured (0) - re-configuring
[ 1207.133600] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 1207.133677] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready

lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Iris Plus Graphics 650 (rev 06)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)
00:1c.7 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #8 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point LPC Controller/eSPI Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
3a:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
3b:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5229 PCI Express Card Reader (rev 01)

lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           142
Model name:                      Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
Stepping:                        9
CPU MHz:                         702.442
CPU max MHz:                     4000,0000
CPU min MHz:                     400,0000
BogoMIPS:                        6999.82
Virtualization:                  VT-x
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                        4 MiB
NUMA node0 CPU(s):               0-3

Host OS:
Ubuntu 20.04

DMI:
Intel(R) Client Systems NUC7i7BNH/NUC7i7BNB, BIOS BNKBL357.86A.0078.2019.0425.1314 04/25/2019
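
For anyone triaging similar logs, the two "unchecked MSR access error" lines in the dmesg above can be pulled out mechanically. The following is a minimal Python sketch and not part of the original report; the regular expression and the helper name find_msr_errors are illustrative assumptions based only on the line format shown in this bug.

```python
import re

# Pattern modelled on the "unchecked MSR access error" lines quoted in this bug.
MSR_ERROR = re.compile(
    r"unchecked MSR access error: (?P<op>RDMSR from|WRMSR to) (?P<msr>0x[0-9a-fA-F]+)"
    r".*? at rIP: 0x[0-9a-fA-F]+ \((?P<symbol>[^+)]+)\+"
)


def find_msr_errors(log_text):
    """Return (operation, msr_number, symbol) for every unchecked MSR access line."""
    hits = []
    for line in log_text.splitlines():
        match = MSR_ERROR.search(line)
        if match:
            op = "read" if match.group("op").startswith("RDMSR") else "write"
            hits.append((op, int(match.group("msr"), 16), match.group("symbol")))
    return hits


# Three lines copied from the dmesg in the description.
msr_sample = """\
[ 1138.726807] unchecked MSR access error: RDMSR from 0x123 at rIP: 0xffffffffa62726aa (native_read_msr+0xa/0x30)
[ 1138.726823] unchecked MSR access error: WRMSR to 0x123 (tried to write 0x0000000000000000) at rIP: 0xffffffffa62728a8 (native_write_msr+0x8/0x30)
[ 1138.728701] CPU1 is up
"""

for op, msr, symbol in find_msr_errors(msr_sample):
    print(f"{op} of MSR {msr:#x} faulted in {symbol}")
```

On a live system the text would typically come from the saved dmesg output rather than an inline string.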
Comment 1 Borislav Petkov 2020-08-16 19:50:38 UTC
Can you add to your command line:

"debug ignore_loglevel log_buf_len=16M no_console_suspend systemd.log_target=null"

boot with it and send full dmesg from that boot?

Thx.
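
To confirm that the requested options actually made it onto the kernel command line after rebooting, something like the following could be used. This is a hedged sketch: missing_params is a hypothetical helper, and the sample string is the reporter's command line from the description; on a live system the text would be read from /proc/cmdline instead.

```python
# Options requested in this comment.
WANTED = [
    "debug",
    "ignore_loglevel",
    "log_buf_len=16M",
    "no_console_suspend",
    "systemd.log_target=null",
]


def missing_params(cmdline, wanted=WANTED):
    """Return the requested options that are not present as whole words in cmdline."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


# Command line as posted in the description (before adding the debug options).
reported_cmdline = (
    "BOOT_IMAGE=/boot/vmlinuz-5.7.8-vanilla "
    "root=UUID=12a96ddf-02b2-4621-ae62-6236b9471e9d ro quiet splash "
    "intel_iommu=on drm.debug=0 i915.enable_gvt=1 vt.handoff=7"
)

print("still missing:", missing_params(reported_cmdline))
```

Matching on whole whitespace-separated words avoids treating drm.debug=0 as if it were the plain debug option.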
Comment 2 Steffen Nurpmeso 2020-11-03 17:21:31 UTC
Hello.

I have been using 5.9.2 for two days, and after resuming for the first time I saw this:
Nov  3 16:03:26 kent kernel: smpboot: Scheduler frequency invariance went wobbly, disabling!

Nov  3 16:03:26 kent kernel: Enabling non-boot CPUs ...
Nov  3 16:03:26 kent kernel: x86: Booting SMP configuration:
Nov  3 16:03:26 kent kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
Nov  3 16:03:26 kent kernel: unchecked MSR access error: RDMSR from 0x123 at rIP: 0xffffffffae09c56c (update_srbds_msr+0x2c/0x60)
Nov  3 16:03:26 kent kernel: Call Trace:
Nov  3 16:03:26 kent kernel:  smp_store_cpu_info+0x40/0x60
Nov  3 16:03:26 kent kernel:  start_secondary+0x36/0x100
Nov  3 16:03:26 kent kernel:  secondary_startup_64+0xb6/0xc0
Nov  3 16:03:26 kent kernel: unchecked MSR access error: WRMSR to 0x123 (tried to write 0x0000000000000000) at rIP: 0xffffffffae09c58f (update_srbds_msr+0x4f/0x60)
Nov  3 16:03:26 kent kernel: Call Trace:
Nov  3 16:03:26 kent kernel:  smp_store_cpu_info+0x40/0x60
Nov  3 16:03:26 kent kernel:  start_secondary+0x36/0x100
Nov  3 16:03:26 kent kernel:  secondary_startup_64+0xb6/0xc0
Nov  3 16:03:26 kent kernel: microcode: sig=0x806ea, pf=0x80, revision=0x96
Nov  3 16:03:26 kent kernel: CPU1 is up
Nov  3 16:03:26 kent kernel: smpboot: Booting Node 0 Processor 2 APIC 0x4

Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz, Stepping 10.
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
No systemd here.  Not booting before next Monday (i hope; just in case).  I have debug on the command line.
Thank you.
Comment 3 Borislav Petkov 2020-11-03 17:49:19 UTC
Yeah, known issue. We're working on it. I'll ping you to test a patch once we have one.

Thx.
Comment 4 Borislav Petkov 2020-11-04 17:25:11 UTC
(In reply to Steffen Nurpmeso from comment #2)
> Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz, Stepping 10.

Do you have an SGX option in the BIOS? If so, try turning it off and see if the warning disappears.

Thx.
Comment 5 Steffen Nurpmeso 2020-11-04 23:02:40 UTC
Lenovo Ideapad 530S-14IKB.
Seems to have support for SGX, but i have not looked in BIOS since i bought it in April 2019 :)
I need the password .. i will look and try it out tomorrow, if there is a setting.
Comment 6 Steffen Nurpmeso 2020-11-04 23:13:09 UTC
I grep(1)ed the kernel and found

  tools/power/x86/turbostat/turbostat

and that says

  $ ./turbostat 2>&1 |grep -i sgx
  CPUID(7): SGX

So it seems to be enabled.  (This is kernel 4.19 again because RTW88 is too broken, i cannot go 5.9.  But will try BIOS and 5.9.4 tomorrow.)
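
As a small illustration of what a "CPUID(7): SGX" line refers to: the SGX feature flag is documented as bit 2 of EBX for CPUID leaf 7, subleaf 0. The sketch below only decodes that bit from a register value that has already been obtained by some other means; the helper name and the sample EBX values are made up for illustration, and it is an assumption that the turbostat line reflects exactly this bit.

```python
SGX_BIT = 2  # CPUID.(EAX=07H, ECX=0):EBX bit 2 is the SGX feature flag


def cpuid7_ebx_has_sgx(ebx):
    """Return True if the SGX feature bit is set in the given EBX value."""
    return bool(ebx & (1 << SGX_BIT))


# Hypothetical register values, chosen only to exercise both outcomes.
ebx_with_sgx = 0b0100
ebx_without_sgx = 0b1011

print(cpuid7_ebx_has_sgx(ebx_with_sgx), cpuid7_ebx_has_sgx(ebx_without_sgx))
```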
Comment 7 Steffen Nurpmeso 2020-11-05 16:39:48 UTC
Yes, with SGX enabled in the BIOS there is no access error with 5.9.5.
So - i keep it on.
Thanks.
Comment 8 Adric Blake 2020-11-06 04:50:33 UTC
I am on kernel 5.8.17, with the exact same CPU and host bridge as the person above, but on a Dell board.

My BIOS also has the option to configure SGX, and I discovered 3 options: enabled, disabled, and "software controlled". Mine was set to "software controlled", and I get the same fault as above upon a suspend/resume. Setting the BIOS option to "disabled" resulted in the error persisting. With the BIOS option for SGX set to "enabled" (at 128MB for the enclave size), the fault is no longer present.

Since the warning doesn't seem to affect normal operation, I set it back to "software controlled" so that I can discover when a fix is released.
Comment 9 Ashok Raj 2020-11-10 14:36:27 UTC
It was a kernel bug, patch posted here. Can someone check and see if this works?

https://lore.kernel.org/lkml/20201110135247.422-1-yu.c.chen@intel.com/T/#u
Comment 10 Steffen Nurpmeso 2020-11-10 17:49:52 UTC
Disable SGX in BIOS again and retry?
I will build the new 5.9 and report tomorrow (BIOS pass etc.), ok?
Comment 11 Adric Blake 2020-11-11 04:53:32 UTC
I currently have SGX not enabled in the BIOS.

With Linux 5.9.8 and the patch, I don't see the MSR access warning.

I don't see the "microcode: sig=0x806ea, pf=0x80, revision=0xb4/0xd6" line or any other microcode messages on resume, either. grep microcode /proc/cpuinfo shows 0xd6, so I assume there is no message because the microcode is already updated to the latest revision.
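
The grep check above can be extended to compare the revision across all CPUs. The following is a rough Python sketch, assuming the usual "processor" and "microcode" fields of /proc/cpuinfo on x86; the helper name microcode_revisions and the inline sample text are illustrative, and on a real machine the text would be read from /proc/cpuinfo.

```python
def microcode_revisions(cpuinfo_text):
    """Map each processor number to its microcode revision from /proc/cpuinfo text."""
    revisions = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            revisions[cpu] = int(value, 16)
    return revisions


# Trimmed, made-up excerpt in /proc/cpuinfo layout, using the 0xd6 value from this comment.
cpuinfo_sample = (
    "processor\t: 0\n"
    "microcode\t: 0xd6\n"
    "\n"
    "processor\t: 1\n"
    "microcode\t: 0xd6\n"
)

revs = microcode_revisions(cpuinfo_sample)
if len(set(revs.values())) > 1:
    print("CPUs report different microcode revisions:", revs)
else:
    print("all CPUs report the same microcode revision")
```

A mismatch between CPUs, like the 0x9a and 0xd6 revisions visible in the original dmesg, would show up here as more than one distinct value.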
Comment 12 Steffen Nurpmeso 2020-11-11 15:15:08 UTC
Me too, no more such MSR message.
Thanks!
Comment 13 Borislav Petkov 2020-11-16 14:09:21 UTC
Created attachment 293691
final fix

Here's a more complete fix if anyone wants to give it a run. This should work regardless of SGX setting in the BIOS.

Thx.
Comment 14 Steffen Nurpmeso 2020-11-16 21:10:50 UTC
Hello!
I would rather not unless absolutely necessary (on Saturday then, please ;).
I am not using kernel 5.9 because it does not work for me (RTW88: lots of crashes, issue 209263); I am still staying on 4.19 except for experiments. (And have lots of work on hold.)
Thanks for fixing!
Comment 15 Borislav Petkov 2020-11-16 21:40:24 UTC
Well, I'm no wireless drivers guy by any stretch of the imagination but a couple of things that spring up to me which you could try, from looking at this:

* Remove that CONFIG_EXTRA_FIRMWARE option in your .config and let the driver request its own firmware. It has a bunch of fw images it might request and you could be missing some. So make sure you have them all installed and let the driver load them. That's from looking at that warning "purge skb(s) not reported by firmware".

* drop that proprietary zfs module. It might be innocent but it might be corrupting stuff so remove it completely and build a stock, upstream kernel without any out-of-tree crap.

If you then can reproduce it with the latest upstream kernel - that's 5.10-rc4 atm, send a proper bug report to the driver maintainers:

$ ./scripts/get_maintainer.pl -f drivers/net/wireless/realtek/rtw88/
Yan-Hsuan Chuang <yhchuang@realtek.com> (maintainer:REALTEK WIRELESS DRIVER (rtw88))
Kalle Valo <kvalo@codeaurora.org> (maintainer:NETWORKING DRIVERS (WIRELESS))
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING DRIVERS)
Jakub Kicinski <kuba@kernel.org> (maintainer:NETWORKING DRIVERS)
linux-wireless@vger.kernel.org (open list:REALTEK WIRELESS DRIVER (rtw88))
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
linux-kernel@vger.kernel.org (open list)

Anyway, just a couple of ideas.

HTH.
Comment 16 Steffen Nurpmeso 2020-11-16 22:24:34 UTC
Hello, very kind, thanks for the hints. :)
OK, I will try out your patch with the RC kernel on Saturday, ok? I am lagging unbelievably far behind on my daily work, and I will likely have to look through new configuration items, too :(

Regarding wireless issue:

I have no ZFS module, i (still - you are on #btrfs IRC?) use one big BTRFS partition here now. (I am interested though, since FreeBSD now also uses OpenZFS and i am interested in replacing encfs for specific directories -> ZFS encryption, and using zvol:/umes for VMs.)  Other than that BTRFS works just great but one "corrupt" VM file i had, i now use a different cache strategy :-)

I first ran ArchLinux 5.8.3 and included the firmware that gets loaded, according to dmesg. It seems this driver needs two firmwares, one for operation and one for suspend/resume, but for 8822BE only one exists. That RTW88 driver was broken for 5.8, now 5.9, i will see on Saturday, maybe 5.10 does it.  (I do not use an initramfs.)

Ciao and good night!
Comment 17 Borislav Petkov 2020-11-17 09:04:34 UTC
(In reply to Steffen Nurpmeso from comment #16)
> I have no ZFS module

Bah, sorry about that. That's another guy. I thought you were doing a monologue on that bug. :-)
Comment 18 Adric Blake 2020-11-20 19:58:39 UTC
Forgot to update, whoops.

With kernel 5.9.8 and the latest git patch, there are no issues on resume under any BIOS SGX configuration.
Comment 19 Borislav Petkov 2020-11-20 22:06:01 UTC
Thanks for testing and reporting back!
Comment 20 Steffen Nurpmeso 2020-11-21 23:45:28 UTC
Hello!
I have no MSR line here with 5.10.0-rc4 (27bba9c532a8d21050b94224ffd310ad0058c353 indeed) and your final patch. (SGX disabled in BIOS.) Also not after resuming.
Thank you! :)
Ciao, and a nice weekend i wish!
Comment 21 Borislav Petkov 2020-11-22 11:17:28 UTC
Cool, thanks for testing. I think we're done here.

Thanks to all involved folks for the help!
