I use a Dell OptiPlex 7050, and the kernel hangs when shutting down the computer. Similar symptoms have been reported on several forums, and all of the reporters are using Dell computers:
https://bbs.archlinux.org/viewtopic.php?pid=2124429
https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
https://forum.artixlinux.org/index.php/topic,5997.0.html

I tested various kernels, and this bug seems to be caused by commit 88afbb21d4b36fee6acaa167641f9f0fc122f01b.
CC'ing Thomas Gleixner, the author of the bad commit.
(In reply to Yanjun Yang from comment #0)
> I use Dell OptiPlex 7050, and kernel hangs when shutting down the computer.
> Similar symptom has been reported on some forums, and all of them are using
> Dell computers:
> https://bbs.archlinux.org/viewtopic.php?pid=2124429
> https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
> https://forum.artixlinux.org/index.php/topic,5997.0.html
>
> Tested with various kernel and this bug seems to be caused by commit:
> 88afbb21d4b36fee6acaa167641f9f0fc122f01b.

Can you attach the kernel log (dmesg/journalctl)?
A bisect log might be good as well, as this bisected to a merge commit, which sometimes means that the bisection went sideways.

FWIW, here is what looks like another downstream report (only noticed it by chance): https://bugzilla.redhat.com/show_bug.cgi?id=2241279
Created attachment 305208: kernel dmesg
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #3)
> A bisect log might be good as well, as this bisected to a merge commit,
> which sometimes means that the bisection went sideways.
>
> FWIW, here is what looks like another downstream report (only noticed it by
> chance): https://bugzilla.redhat.com/show_bug.cgi?id=2241279

I lost my bisect log. It was my first bisect session, and I took some shortcuts during it. From the commit message, I thought 88afbb21d4b3 was a promising candidate and checked it out first. The results are tagged in the following short log, IIRC.

* 88afbb21d4b3 Merge tag 'x86-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip [BAD]
|\
| * 45e34c8af58f x86/smp: Put CPUs into INIT on shutdown if possible [GOOD]
| * 6087dd5e86ff x86/smp: Split sending INIT IPI out into a helper function [GOOD]
| * d7893093a741 x86/smp: Cure kexec() vs. mwait_play_dead() breakage
| * f9c9987bf52f x86/smp: Use dedicated cache-line for mwait_play_dead() [GOOD]
| * 2affa6d6db28 x86/smp: Remove pointless wmb()s from native_stop_other_cpus()
| * 9b040453d444 x86/smp: Dont access non-existing CPUID leaf
| * 1f5e7eb7868e x86/smp: Make stop_other_cpus() more robust
* | cd336f6562d3 Merge tag 'timers-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip [GOOD]

The following is the reflog of my session:

2dde18cd1d8f (tag: v6.5) HEAD@{9}: checkout: moving from 45e34c8af58f23db4474e2bfe79183efec09a18b to 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
45e34c8af58f HEAD@{10}: checkout: moving from 6087dd5e86ff03a8cd4cffdf463a7f457e65cbff to 45e34c8af58f23db4474e2bfe79183efec09a18b
6087dd5e86ff HEAD@{11}: checkout: moving from f9c9987bf52f4e42e940ae217333ebb5a4c3b506 to 6087dd5e86ff03a8cd4cffdf463a7f457e65cbff
f9c9987bf52f HEAD@{12}: checkout: moving from 88afbb21d4b36fee6acaa167641f9f0fc122f01b to f9c9987bf52f4e42e940ae217333ebb5a4c3b506
88afbb21d4b3 (HEAD) HEAD@{13}: checkout: moving from f9c9987bf52f4e42e940ae217333ebb5a4c3b506 to 88afbb21d4b36fee6acaa167641f9f0fc122f01b
f9c9987bf52f HEAD@{14}: checkout: moving from 88afbb21d4b36fee6acaa167641f9f0fc122f01b to f9c9987bf52f4e42e940ae217333ebb5a4c3b506
88afbb21d4b3 (HEAD) HEAD@{15}: checkout: moving from f9c9987bf52f4e42e940ae217333ebb5a4c3b506 to 88afbb21d4b36fee6acaa167641f9f0fc122f01b
f9c9987bf52f HEAD@{16}: checkout: moving from 88afbb21d4b36fee6acaa167641f9f0fc122f01b to f9c9987bf52f4e42e940ae217333ebb5a4c3b506
88afbb21d4b3 (HEAD) HEAD@{17}: checkout: moving from e5ce2f196fb9ab35fe18dcfd2bc17883db7bbe33 to 88afbb21d4b36fee6acaa167641f9f0fc122f01b
e5ce2f196fb9 HEAD@{18}: checkout: moving from cd336f6562d3d7646a9cf071b902db200a1dd77b to e5ce2f196fb9ab35fe18dcfd2bc17883db7bbe33
cd336f6562d3 HEAD@{19}: checkout: moving from e5ce2f196fb9ab35fe18dcfd2bc17883db7bbe33 to cd336f6562d3d7646a9cf071b902db200a1dd77b
e5ce2f196fb9 HEAD@{20}: checkout: moving from 59035135b32280fd394ba5765c6f4de24f48353e to e5ce2f196fb9ab35fe18dcfd2bc17883db7bbe33
59035135b322 HEAD@{21}: checkout: moving from cd336f6562d3d7646a9cf071b902db200a1dd77b to 59035135b32280fd394ba5765c6f4de24f48353e
cd336f6562d3 HEAD@{22}: checkout: moving from 5dfe7a7e52ccdf60dfd11ccbe509e4365ea721ca to cd336f6562d3d7646a9cf071b902db200a1dd77b
5dfe7a7e52cc HEAD@{23}: checkout: moving from 7cffdbe3607a6cc2dc02d135e13732ec36bc4e28 to 5dfe7a7e52ccdf60dfd11ccbe509e4365ea721ca
7cffdbe3607a HEAD@{24}: checkout: moving from 0aa69d53ac7c30f6184f88f2e310d808b32b35a5 to 7cffdbe3607a6cc2dc02d135e13732ec36bc4e28
0aa69d53ac7c HEAD@{25}: checkout: moving from 2605e80d3438c77190f55b821c6575048c68268e to 0aa69d53ac7c30f6184f88f2e310d808b32b35a5
2605e80d3438 HEAD@{26}: checkout: moving from 6e17c6de3ddf3073741d9c91a796ee696914d8a0 to 2605e80d3438c77190f55b821c6575048c68268e
6e17c6de3ddf HEAD@{27}: checkout: moving from 3a8a670eeeaa40d87bd38a587438952741980c18 to 6e17c6de3ddf3073741d9c91a796ee696914d8a0
3a8a670eeeaa HEAD@{28}: checkout: moving from b775d6c5859affe00527cbe74263de05cfe6b9f9 to 3a8a670eeeaa40d87bd38a587438952741980c18
b775d6c5859a HEAD@{29}: checkout: moving from 2dde18cd1d8fac735875f2e4987f11817cc0bc2c to b775d6c5859affe00527cbe74263de05cfe6b9f9
2dde18cd1d8f (tag: v6.5) HEAD@{30}: checkout: moving from 6995e2de6891c724bfeb2db33d7b87775f913ad1 to v6.5
6995e2de6891 (tag: v6.4) HEAD@{31}: checkout: moving from master to v6.4
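For reference, the search that git bisect performs on a linear range can be sketched as a binary search over commits ordered oldest to newest. This is a minimal Python illustration under stated assumptions, not git's implementation: the is_bad callback is a hypothetical stand-in for a manual build-and-shutdown test, and the series is the x86/smp topic branch from the short log above, listed oldest first.

```python
from typing import Callable, Sequence

# x86/smp commits merged by 88afbb21d4b3, oldest first (taken from the short log).
SERIES = [
    "1f5e7eb7868e",  # x86/smp: Make stop_other_cpus() more robust
    "9b040453d444",  # x86/smp: Dont access non-existing CPUID leaf
    "2affa6d6db28",  # x86/smp: Remove pointless wmb()s from native_stop_other_cpus()
    "f9c9987bf52f",  # x86/smp: Use dedicated cache-line for mwait_play_dead()
    "d7893093a741",  # x86/smp: Cure kexec() vs. mwait_play_dead() breakage
    "6087dd5e86ff",  # x86/smp: Split sending INIT IPI out into a helper function
    "45e34c8af58f",  # x86/smp: Put CPUs into INIT on shutdown if possible
]


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits are ordered oldest -> newest, the last one is bad, and
    results are monotonic: once a commit is bad, every later one is bad too.
    A single wrong (e.g. flaky) answer breaks that assumption and can send
    the search into the wrong half without any warning.
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit lies in commits[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # mid is bad: culprit is mid or earlier
        else:
            lo = mid + 1    # mid is good: culprit is strictly later
    return commits[lo]


if __name__ == "__main__":
    tested = []

    def is_bad(commit: str) -> bool:
        # Hypothetical oracle: treat only 45e34c8af58f as bad.
        tested.append(commit)
        return commit == "45e34c8af58f"

    print(first_bad(SERIES, is_bad))  # prints 45e34c8af58f
    print(tested)                     # prints ['f9c9987bf52f', '6087dd5e86ff']
```

With seven commits the search needs only two probes here, which is also why a single misjudged result (for example, a shutdown that happens to succeed on a bad kernel) is enough to yield the wrong culprit.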
I'm not a git bisect expert, hence the following advice might be bad, so feel free to ignore it. But I think it might be wise to confirm again whether 88afbb21d4b3's first parent (i.e. cd336f6562d3d7) really was good; and if it is, recheck whether 88afbb21d4b3's second parent (i.e. 45e34c8af58f) really is bad.
[FWIW, maybe the merge really is the problem, as it afaics resolves a merge conflict]
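The suggested check amounts to a small decision table over a merge and its two parents (in git, 88afbb21d4b3^1 and 88afbb21d4b3^2). The Python sketch below encodes it; the function name and return labels are invented for illustration, and each boolean stands for the outcome of a manual shutdown test on that commit.

```python
def diagnose_merge(first_parent_bad: bool, second_parent_bad: bool,
                   merge_bad: bool) -> str:
    """Suggest where a regression observed around a merge commit comes from.

    Returns a short, made-up label describing the next step to take.
    """
    if not merge_bad:
        if first_parent_bad or second_parent_bad:
            # A bad parent with a good merge contradicts itself:
            # the reproducer may be flaky, so retest.
            return "inconsistent"
        return "not-merge"      # merge and parents are fine; look elsewhere
    if first_parent_bad and second_parent_bad:
        # Shared history before the merge base, or a culprit on each side.
        return "both-parents"
    if first_parent_bad:
        return "first-parent"   # bisect along the mainline side
    if second_parent_bad:
        return "second-parent"  # bisect merge base..second parent (topic branch)
    # Both parents good but the merge bad: the merge itself (e.g. its
    # conflict resolution) or an interaction between the two sides.
    return "merge"


if __name__ == "__main__":
    # First session: cd336f6562d3 good, 45e34c8af58f good, 88afbb21d4b3 bad.
    print(diagnose_merge(False, False, True))  # prints merge
    # If 45e34c8af58f turns out bad on a retest:
    print(diagnose_merge(False, True, True))   # prints second-parent
```

The "merge" outcome is the one that would make the conflict resolution itself suspect; any other outcome points back to an ordinary bisection on one side.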
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #6)
> I'm not a git bisect expert, hence the following advice might be bad, so
> feel free to ignore it. But I think it might be wise to confirm again if
> 88afbb21d4b3's first parent (e.g. cd336f6562d3d7) really was good; and if it
> is, recheck if 88afbb21d4b3's second parent (e.g. 45e34c8af58f) really is
> bad.

Thanks for the advice. I did another test today, and it turns out 45e34c8af58f is the bad commit. Maybe the bug is not 100% reproducible, because last time I did double-check the parent commits (as can be seen from the reflog).
(In reply to Yanjun Yang from comment #8)
> (In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from
> comment #6)
> > I'm not a git bisect expert, hence the following advice might be bad, so
> > feel free to ignore it. But I think it might be wise to confirm again if
> > 88afbb21d4b3's first parent (e.g. cd336f6562d3d7) really was good; and if it
> > is, recheck if 88afbb21d4b3's second parent (e.g. 45e34c8af58f) really is
> > bad.
>
> Thanks for the advice. I did another test today, and it turns out 45e34c8af58f
> is the bad commit. Maybe it's not 100% reproducible, because last time I did
> double check the parents commits (can see from the reflog).

Sorry for the misleading info. I read my reflog again, and 45e34c8af58f was not checked a second time.
(In reply to Yanjun Yang from comment #9)
> Sorry for the miss leading info.

Happens, don't worry about it.
We ran into a similar problem with Dell systems. It turned out that the BIOS SMM handler was picking the lowest APIC ID as the master, rather than picking the first CPU to arrive in SMM.

Can you confirm whether this is a Meteor Lake system? There is a BIOS fix for it, which we have validated internally, that should resolve the issue.

Can you add the output of:
rdmsr -a 0x1b
turbostat --show CPU,APIC,X2APIC
(In reply to Ashok Raj from comment #11)
> We ran into a similar problem with DELL systems. It turned out the BIOS SMM
> handler was picking the lowest APICID as the master, rather than picking
> the first one to arrive in SMM.
>
> Can you confirm if this is MeteorLake system? There is a BIOS fix for it
> that should resolve the issue which we have validated internally.
>
> Can add rdmsr -a 0x1b
>
> and turbostat --show CPU,APIC,X2APIC

From dmesg:
[ 1.609507] smpboot: CPU0: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz (family: 0x6, model: 0x9e, stepping: 0x9)

How could an MTL system have found its way to customers when it hasn't even been released yet?
Thanks, I got carried away since this was fresh in my mind. But maybe the BIOS bug also exists in some earlier systems, if it happens to be the same problem.

Is it possible to get the turbostat output? Just to check whether the BSP is the one with the lowest APIC ID.
(In reply to Ashok Raj from comment #13)
> Thanks, i got carried away since this was fresh in my mind. But maybe the
> BIOS bug exists in some earlier ones if it happens to be the same problem.
>
> Is it possible to get the turbostat o/p? Just to confirm if the BSP isn't
> the one with the lowest APICID?

I ran the following commands using a good kernel. Should I try them with the "bad" kernel?

❯ sudo rdmsr -a 0x1b
fee00d00
fee00c00
fee00c00
fee00c00
❯ sudo turbostat --show CPU,APIC,X2APIC
turbostat version 2023.03.17 - Len Brown <lenb@kernel.org>
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.3.13_1 root=UUID=8e25188f-b653-464f-a786-64ca8d597633 ro loglevel=4
CPUID(0): GenuineIntel 0x16 CPUID levels
CPUID(1): family:model:stepping 0x6:9e:9 (6:158:9) microcode 0xf4
CPUID(0x80000000): max_extended_levels: 0x80000008
CPUID(1): SSE3 MONITOR SMX EIST TM2 TSC MSR ACPI-TM HT TM
CPUID(6): APERF, TURBO, DTS, PTM, HWP, HWPnotify, HWPwindow, HWPepp, No-HWPpkg, EPB
cpu1: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH TURBO)
CPUID(7): SGX No-Hybrid
cpu1: MSR_IA32_FEATURE_CONTROL: 0x00000005 (Locked )
CPUID(0x15): eax_crystal: 2 ebx_tsc: 284 ecx_crystal_hz: 0
TSC: 3408 MHz (24000000 Hz * 284 / 2 / 1000000)
CPUID(0x16): base_mhz: 3400 max_mhz: 3800 bus_mhz: 100
cpu1: MSR_MISC_PWR_MGMT: 0x00401cc0 (ENable-EIST_Coordination DISable-EPB DISable-OOB)
RAPL: 4033 sec. Joule Counter Range, at 65 Watts
cpu1: MSR_PLATFORM_INFO: 0x88080838f1012200
8 * 100.0 = 800.0 MHz max efficiency frequency
34 * 100.0 = 3400.0 MHz base frequency
cpu1: MSR_IA32_POWER_CTL: 0x002c005d (C1E auto-promotion: DISabled)
cpu1: MSR_TURBO_RATIO_LIMIT: 0x24252526
36 * 100.0 = 3600.0 MHz max turbo 4 active cores
37 * 100.0 = 3700.0 MHz max turbo 3 active cores
37 * 100.0 = 3700.0 MHz max turbo 2 active cores
38 * 100.0 = 3800.0 MHz max turbo 1 active cores
cpu1: MSR_CONFIG_TDP_NOMINAL: 0x00000022 (base_ratio=34)
cpu1: MSR_CONFIG_TDP_LEVEL_1: 0x00000000 ()
cpu1: MSR_CONFIG_TDP_LEVEL_2: 0x00000000 ()
cpu1: MSR_CONFIG_TDP_CONTROL: 0x80000000 ( lock=1)
cpu1: MSR_TURBO_ACTIVATION_RATIO: 0x00000000 (MAX_NON_TURBO_RATIO=0 lock=0)
cpu1: MSR_PKG_CST_CONFIG_CONTROL: 0x1e008006 (UNdemote-C3, UNdemote-C1, demote-C3, demote-C1, locked, pkg-cstate-limit=6 (pc8))
/dev/cpu_dma_latency: 2000000000 usec (default)
current_driver: intel_idle
current_governor: menu
current_governor_ro: menu
cpu1: POLL: CPUIDLE CORE POLL IDLE
cpu1: C1: MWAIT 0x00
cpu1: C1E: MWAIT 0x01
cpu1: C3: MWAIT 0x10
cpu1: C6: MWAIT 0x20
cpu1: C7s: MWAIT 0x33
cpu1: C8: MWAIT 0x40
cpu1: cpufreq driver: intel_pstate
cpu1: cpufreq governor: powersave
cpufreq intel_pstate no_turbo: 0
cpu1: MSR_MISC_FEATURE_CONTROL: 0x00000000 (L2-Prefetch L2-Prefetch-pair L1-Prefetch L1-IP-Prefetch)
cpu0: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu0: MSR_HWP_CAPABILITIES: 0x010f2226 (high 38 guar 34 eff 15 low 1)
cpu0: MSR_HWP_REQUEST: 0x80002608 (min 8 max 38 des 0 epp 0x80 window 0x0 pkg 0x0)
cpu0: MSR_HWP_INTERRUPT: 0x00000001 (EN_Guaranteed_Perf_Change, Dis_Excursion_Min)
cpu0: MSR_HWP_STATUS: 0x00000000 (No-Guaranteed_Perf_Change, No-Excursion_Min)
cpu0: EPB: 6 (balanced)
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x00000208 (65 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x42828a001b8208 (UNlocked)
cpu0: PKG Limit #1: ENabled (65.000 Watts, 8.000000 sec, clamp ENabled)
cpu0: PKG Limit #2: ENabled (81.250 Watts, 0.002441* sec, clamp DISabled)
cpu0: MSR_VR_CURRENT_CONFIG: 0x00000000
cpu0: PKG Limit #4: 0.000000 Watts (UNlocked)
cpu0: MSR_DRAM_POWER_LIMIT: 0x5400de00000000 (UNlocked)
cpu0: DRAM Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C) (100 default - 0 offset)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88450800 (31 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)
cpu1: MSR_PKGC3_IRTL: 0x0000884e (valid, 79872 ns)
cpu1: MSR_PKGC6_IRTL: 0x00008876 (valid, 120832 ns)
cpu1: MSR_PKGC7_IRTL: 0x00008894 (valid, 151552 ns)
cpu1: MSR_PKGC8_IRTL: 0x000088fa (valid, 256000 ns)
cpu1: MSR_PKGC9_IRTL: 0x0000894c (valid, 339968 ns)
cpu1: MSR_PKGC10_IRTL: 0x00008bf2 (valid, 1034240 ns)
CPU APIC X2APIC
-   -    -
0   0    0
1   2    2
2   4    4
3   6    6
CPU APIC X2APIC
-   -    -
0   0    0
1   2    2
2   4    4
3   6    6
CPU APIC X2APIC
-   -    -
0   0    0
1   2    2
2   4    4
3   6    6
^CCPU APIC X2APIC
-   -    -
0   0    0
1   2    2
2   4    4
3   6    6
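The two outputs above can be cross-checked to answer the BSP question. In the IA32_APIC_BASE MSR (0x1b), bit 8 is the BSP flag, bit 10 enables x2APIC mode, bit 11 is the APIC global enable, and bits 12 and up hold the APIC base address. The Python sketch below decodes the rdmsr values and checks whether the BSP also has the lowest APIC ID in the turbostat table. It assumes that rdmsr -a prints one value per logical CPU in CPU-number order; the helper names are hypothetical.

```python
# IA32_APIC_BASE (MSR 0x1b) bit fields, per the Intel SDM.
APIC_BASE_BSP = 1 << 8    # set only on the bootstrap processor
APIC_BASE_EXTD = 1 << 10  # x2APIC mode enabled
APIC_BASE_EN = 1 << 11    # APIC globally enabled

# Values printed by "rdmsr -a 0x1b" on the good kernel (assumed CPU0..CPU3 order).
RDMSR_OUTPUT = ["fee00d00", "fee00c00", "fee00c00", "fee00c00"]

# One of the CPU/APIC/X2APIC tables printed by turbostat.
TURBOSTAT_TABLE = """\
CPU APIC X2APIC
-   -    -
0   0    0
1   2    2
2   4    4
3   6    6
"""


def decode_apic_base(value: int) -> dict:
    """Split an IA32_APIC_BASE value into its flag bits and base address."""
    return {
        "bsp": bool(value & APIC_BASE_BSP),
        "x2apic": bool(value & APIC_BASE_EXTD),
        "enabled": bool(value & APIC_BASE_EN),
        "base": value & ~0xFFF,  # low 12 bits hold flags and reserved bits
    }


def parse_x2apic_ids(table: str) -> dict:
    """Map logical CPU number -> x2APIC ID from a turbostat table."""
    ids = {}
    for line in table.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].isdigit():  # skip header and '-' rows
            cpu, _apic, x2apic = (int(f) for f in fields)
            ids[cpu] = x2apic
    return ids


def bsp_has_lowest_apic_id(msr_values: list, apic_ids: dict) -> bool:
    """Return True if the CPU flagged as BSP also has the smallest APIC ID."""
    bsps = [cpu for cpu, raw in enumerate(msr_values)
            if decode_apic_base(int(raw, 16))["bsp"]]
    if len(bsps) != 1:
        raise ValueError(f"expected exactly one BSP, found {bsps}")
    return apic_ids[bsps[0]] == min(apic_ids.values())


if __name__ == "__main__":
    ids = parse_x2apic_ids(TURBOSTAT_TABLE)
    print(bsp_has_lowest_apic_id(RDMSR_OUTPUT, ids))  # prints True
```

On these good-kernel readings the check returns True: CPU 0 carries the BSP flag (fee00d00) and has APIC ID 0. All four values also have bit 10 set, so every CPU is running in x2APIC mode.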
On 15/10/2023 12:26, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217995
>
> --- Comment #14 from Yanjun Yang (yangyj.ee@gmail.com) ---
> (In reply to Ashok Raj from comment #13)
>> Thanks, i got carried away since this was fresh in my mind. But maybe the
>> BIOS bug exists in some earlier ones if it happens to be the same problem.
>>
>> Is it possible to get the turbostat o/p? Just to confirm if the BSP isn't
>> the one with the lowest APICID?
>
> I ran following command using a good kernel, should I try it with the "bad"
> kernel?
>
> [...]

Yes; do so as comparison.
(In reply to Bagas Sanjaya from comment #15)
> On 15/10/2023 12:26, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=217995
> >
> > --- Comment #14 from Yanjun Yang (yangyj.ee@gmail.com) ---
> > I ran following command using a good kernel, should I try it with the "bad"
> > kernel?
> >
> > [...]
>
> Yes; do so as comparison.
❯ sudo turbostat --show CPU,APIC,X2APIC
turbostat version 2023.03.17 - Len Brown <lenb@kernel.org>
Kernel command line: root=UUID=8e25188f-b653-464f-a786-64ca8d597633 ro loglevel=4
CPUID(0): GenuineIntel 0x16 CPUID levels
CPUID(1): family:model:stepping 0x6:9e:9 (6:158:9) microcode 0xf4
CPUID(0x80000000): max_extended_levels: 0x80000008
CPUID(1): SSE3 MONITOR SMX EIST TM2 TSC MSR ACPI-TM HT TM
CPUID(6): APERF, TURBO, DTS, PTM, HWP, HWPnotify, HWPwindow, HWPepp, No-HWPpkg, EPB
cpu3: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH TURBO)
CPUID(7): SGX No-Hybrid
cpu3: MSR_IA32_FEATURE_CONTROL: 0x00000005 (Locked )
CPUID(0x15): eax_crystal: 2 ebx_tsc: 284 ecx_crystal_hz: 0
TSC: 3408 MHz (24000000 Hz * 284 / 2 / 1000000)
CPUID(0x16): base_mhz: 3400 max_mhz: 3800 bus_mhz: 100
cpu3: MSR_MISC_PWR_MGMT: 0x00401cc0 (ENable-EIST_Coordination DISable-EPB DISable-OOB)
RAPL: 4033 sec. Joule Counter Range, at 65 Watts
cpu3: MSR_PLATFORM_INFO: 0x88080838f1012200
8 * 100.0 = 800.0 MHz max efficiency frequency
34 * 100.0 = 3400.0 MHz base frequency
cpu3: MSR_IA32_POWER_CTL: 0x002c005d (C1E auto-promotion: DISabled)
cpu3: MSR_TURBO_RATIO_LIMIT: 0x24252526
36 * 100.0 = 3600.0 MHz max turbo 4 active cores
37 * 100.0 = 3700.0 MHz max turbo 3 active cores
37 * 100.0 = 3700.0 MHz max turbo 2 active cores
38 * 100.0 = 3800.0 MHz max turbo 1 active cores
cpu3: MSR_CONFIG_TDP_NOMINAL: 0x00000022 (base_ratio=34)
cpu3: MSR_CONFIG_TDP_LEVEL_1: 0x00000000 ()
cpu3: MSR_CONFIG_TDP_LEVEL_2: 0x00000000 ()
cpu3: MSR_CONFIG_TDP_CONTROL: 0x80000000 ( lock=1)
cpu3: MSR_TURBO_ACTIVATION_RATIO: 0x00000000 (MAX_NON_TURBO_RATIO=0 lock=0)
cpu3: MSR_PKG_CST_CONFIG_CONTROL: 0x1e008006 (UNdemote-C3, UNdemote-C1, demote-C3, demote-C1, locked, pkg-cstate-limit=6 (pc8))
/dev/cpu_dma_latency: 2000000000 usec (default)
current_driver: intel_idle
current_governor: menu
current_governor_ro: menu
cpu3: POLL: CPUIDLE CORE POLL IDLE
cpu3: C1: MWAIT 0x00
cpu3: C1E: MWAIT 0x01
cpu3: C3: MWAIT 0x10
cpu3: C6: MWAIT 0x20
cpu3: C7s: MWAIT 0x33
cpu3: C8: MWAIT 0x40
cpu3: cpufreq driver: intel_pstate
cpu3: cpufreq governor: powersave
cpufreq intel_pstate no_turbo: 0
cpu3: MSR_MISC_FEATURE_CONTROL: 0x00000000 (L2-Prefetch L2-Prefetch-pair L1-Prefetch L1-IP-Prefetch)
cpu0: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu0: MSR_HWP_CAPABILITIES: 0x010f2226 (high 38 guar 34 eff 15 low 1)
cpu0: MSR_HWP_REQUEST: 0x80002608 (min 8 max 38 des 0 epp 0x80 window 0x0 pkg 0x0)
cpu0: MSR_HWP_INTERRUPT: 0x00000001 (EN_Guaranteed_Perf_Change, Dis_Excursion_Min)
cpu0: MSR_HWP_STATUS: 0x00000000 (No-Guaranteed_Perf_Change, No-Excursion_Min)
cpu0: EPB: 6 (balanced)
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x00000208 (65 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x42828a001b8208 (UNlocked)
cpu0: PKG Limit #1: ENabled (65.000 Watts, 8.000000 sec, clamp ENabled)
cpu0: PKG Limit #2: ENabled (81.250 Watts, 0.002441* sec, clamp DISabled)
cpu0: MSR_VR_CURRENT_CONFIG: 0x00000000
cpu0: PKG Limit #4: 0.000000 Watts (UNlocked)
cpu0: MSR_DRAM_POWER_LIMIT: 0x5400de00000000 (UNlocked)
cpu0: DRAM Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C) (100 default - 0 offset)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88440800 (32 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00000003 (100 C, 100 C)
cpu3: MSR_PKGC3_IRTL: 0x0000884e (valid, 79872 ns)
cpu3: MSR_PKGC6_IRTL: 0x00008876 (valid, 120832 ns)
cpu3: MSR_PKGC7_IRTL: 0x00008894 (valid, 151552 ns)
cpu3: MSR_PKGC8_IRTL: 0x000088fa (valid, 256000 ns)
cpu3: MSR_PKGC9_IRTL: 0x0000894c (valid, 339968 ns)
cpu3: MSR_PKGC10_IRTL: 0x00008bf2 (valid, 1034240 ns)
The culprit was reverted in mainline, so 6.6-rc6 should fix things. It would be good if someone could confirm this. The patch is en route for the next 6.5.y release, too.
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #17)
> The culprit was reverted in mainline, 6.6-rc6 thus should fix things. Might
> be good if anyone could confirm this. Patch is en route for the next 6.5.y
> release, too.

Tested, and 6.6-rc6 fixed this bug for me.
Thank you, version 6.5.8 fixes the problem on my Dell Precision 3420 running Arch Linux.