Bug 110941
Summary: | Skylake / Intel i5-6200U hangs during boot on "HWP enabled" message | ||
---|---|---|---|
Product: | Power Management | Reporter: | anders.ekbom |
Component: | intel_pstate | Assignee: | Chen Yu (yu.c.chen) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | andy-bugzilla.kernel.org, ashley, cracker9900, cvwdhcybbjvnvv, dgadomski, dinyar.rabady+kernel, Dragonlance156156, fpswhtjjuk, geiger.mario, kernel.org, kernel, kernel, lmhxkjiihzcucb, lubko+kernel, mabawsa, marco.spirig, martinlauridsen, nelhage, prarit, quachtlc, rcvanvo, roli, rui.zhang, russell.jones, saunders.52, shafi.yussuf, srinivas.pandruvada, tim.williams.public, wil.kaedin, yrjans, yzhou61 |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.4.0-040400-generic | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Boot sequence
acpidump for a thinkpad yoga 260
Picture of Yoga 260 power config BIOS screen
HWP Interrupt disable
Hardinfo report for Yoga 260
ThinkPad Yoga 260 spec sheet
Hang i7-6500U X1 Yoga Grub
X1 Yoga BIOS parameters
/proc/cpuinfo
Lenovo X1 Carbon 4th /proc/cpuinfo & MSRs 0x19c, 0x19b, 0x64f
Test patch to fix system freeze
Redirect thermal interrupt to OS |
Forgot to add that boot works with kernel 3.19 without any parameters.

Could you please do the following:
1. Boot the system with intel_pstate=no_hwp.
2. After boot-up, run
   # cpuid | grep 'hardware P-State control'
   (you might need to install the cpuid tool). This step confirms whether your system supports HWP.
3. If step 2 reports 'true', then type:
   # modprobe msr
   # rdmsr 0x770
   # wrmsr 0x770 1
   to see if it hangs (you might need to install msr-tools).

This is on Ubuntu 15.10 with the 4.4 kernel:
$ cpuid | grep 'hardware P-State control'
hardware P-State control = false
hardware P-State control = false
hardware P-State control = false
hardware P-State control = false

(In reply to anders.ekbom from comment #3)
> This is on Ubuntu 15.10 with the 4.4-kernel:
>
> $ cpuid | grep 'hardware P-State control'
> hardware P-State control = false
> hardware P-State control = false
> hardware P-State control = false
> hardware P-State control = false
Thanks. Actually, the cpuid tool is broken here: it reports the wrong HWP status because, to query the HWP feature, it checks EDX after CPUID leaf 80000007H, which is not where Intel reports it. How about testing step 3 directly?

Yup, it hangs on the last command:
# modprobe msr
# rdmsr 0x770
0
# wrmsr 0x770 1
[immediately hangs]

Then this seems to be a hardware bug to me: even though CPUID.06H:EAX bit 7 reports HWP support, we cannot enable it by writing 1 to MSR 0x770. I think either we should update to the latest BIOS, or we should report this bug to Lenovo. BTW, the reason the cpuid tool uses leaf 80000007H to check for hardware P-state support is that AMD CPUs report it there, whereas Intel CPUs use leaf 06H.

AFAIK, I have the latest BIOS (1.14), but I could be mistaken. Is there anything else I can do to push this issue forward?
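As explained above, the cpuid tool's "hardware P-State control" line comes from CPUID leaf 80000007H (EDX bit 7, AMD's hardware P-state bit), while Intel reports HWP in leaf 06H, EAX bit 7. The following is a minimal C sketch of the difference, assuming GCC or Clang on x86: `<cpuid.h>` and `__get_cpuid` are compiler-provided helpers, not something from this report, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang helper header, x86 only */
#endif

/* Intel: CPUID.06H:EAX bit 7 = HWP base registers (MSRs 0x770-0x777). */
static bool intel_hwp_bit(uint32_t leaf06_eax)
{
    return (leaf06_eax >> 7) & 1u;
}

/* AMD: CPUID.80000007H:EDX bit 7 = hardware P-state control.
 * This is the bit the cpuid tool printed; it says nothing about Intel HWP. */
static bool amd_hwpstate_bit(uint32_t leaf80000007_edx)
{
    return (leaf80000007_edx >> 7) & 1u;
}

/* Ask the running CPU. Returns false on non-x86 builds or if
 * leaf 06H is not supported. */
static bool cpu_reports_intel_hwp(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(6, &eax, &ebx, &ecx, &edx))
        return false;
    return intel_hwp_bit(eax);
#else
    return false;
#endif
}
```

On the machines in this report, the kernel does see HWP (the i7-6500U /proc/cpuinfo posted later in the thread lists the `hwp` flag). That fits the cpuid tool's "false" coming from the AMD leaf rather than from leaf 06H.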
Fun fact: the link to the "latest" BIOS on http://support.lenovo.com/se/sv/products/Laptops-and-netbooks/ThinkPad-Yoga-Series-laptops/ThinkPad-Yoga-260/downloads/DS105460 is wrong. It should be https://download.lenovo.com/pccbbs/mobiles/n1guj03w.exe (1.14) instead of https://download.lenovo.com/pccbbs/mobiles/n1guj02w.exe (1.12).

Adding the kernel parameter nolapic also allows boot, even without intel_pstate=no_hwp. I am not sure if that helps identify the root of the problem. Please let me know if there is anything I can test. I am running kernel 4.5-rc2, but this worked as far back as 4.3.3. I have put the output of dmesg when booting with this parameter here: http://pastebin.com/F4SbgBJ6

Don't know if this is useful, but I tried adding nolapic, and even though booting works, it takes significantly longer. The "Ubuntu..." splash screen is displayed for ~30 seconds (usually ~1 second). Also:
- cpuid | grep 'hardware P-State control' only lists 1 line.
- The Intel 8260 wifi card isn't recognized.

Do you have logs with 4.3.3 booting with HWP?

(In reply to Srinivas Pandruvada from comment #10)
> Do you have logs with 4.3.3 booting with HWP?
Here are the logs from booting with only the nolapic parameter on kernel 4.3.3. Note that I was using an Arch Linux live disk, booting from USB, because I do not have kernel 4.3.3 installed on my internal drive. http://pastebin.com/q5HD51xn As anders said, it takes much longer to boot, because it is only using one thread on one core. Also, I had no problems with the wifi card.

On your comment #8, "I am running kernel 4.5-rc2, but this worked as far back as 4.3.3": I thought you could boot 4.3.3 without the nolapic parameter, and the only issue is with 4.5.x, where you need nolapic.

(In reply to Srinivas Pandruvada from comment #12)
> On your comment #8
> "I am running kernel 4.5-rc2, but this worked as far back as 4.3.3"
> I thought you can boot 4.3.3 without nolapic parameter, only issue is with
> 4.5.x where you need nolapic.
4.3.3 boots with ONE OF intel_pstate=no_hwp OR nolapic. Here is the dmesg with no_hwp instead, on 4.3.3: http://pastebin.com/rJnDY8yv

To be more clear: everything from 4.3.3 up appears to have the same problem (it will only boot with ONE OF the above-mentioned parameters). Even 4.5-rc3 is like this, though that is not surprising because rc3 did not change any Intel processor drivers.

Let's first check whether Windows running on this system was using HWP. Boot with no_hwp and run
acpidump > acpidump.out
Most distros have an acpidump package. Attach the acpidump.out file.

Created attachment 203821
acpidump for a thinkpad yoga 260
I have attached the acpidump. I removed the MSDM table because it contains my Windows product key, but the rest is intact.
I tried the latest kernel, 4.5-rc4, on a Lenovo Yoga 900. The system can boot with HWP, so this is something specific to the Yoga 260. I wanted to confirm whether Windows used HWP mode on this system, but the ACPI tables use some dynamic loading, so I can't confirm whether Windows would have used HWP mode. I suspect something specific to the BIOS on this system. Try a few things:
- Try just adding processor.max_cstate=0 to the command line with the latest kernel, and see if you can boot.
- Instead of nolapic, try just the nolapic_timer option.
- Can you boot a kernel prior to 4.3.3 (without any additional kernel command-line options)? If you can, then git bisect can be used to find the commit that introduced this issue.

(In reply to Srinivas Pandruvada from comment #18)
I tried processor.max_cstate=0 and nolapic_timer separately with kernel 4.5-rc4, and it still gets stuck at "intel_pstate: HWP enabled" (you see that message if you set the kernel to debug).

As per your last suggestion, I tried a couple of older kernel versions, though I did not have to go far. I knew that kernel 3.19 worked without any parameters (as anders also mentioned in comment #1) because Linux Mint uses it, and I had previously been able to boot a Linux Mint live USB. So I installed 3.19.3 on this machine, and sure enough it boots without any parameters. I did not downgrade any of my other packages, so it got stuck trying to draw the GUI for my display manager's login screen, but the kernel has completely booted by that point, so that should not be related.

I then tried the next version up, 4.0.0. Again, it hung at the same spot (intel_pstate: HWP enabled), so it is possibly something that was added going from 3.19.3 to 4.0.0.

One other note: I have a Windows/Linux dual-boot setup, so if you know of a way to find out from the Windows side whether Windows is using HWP, I can try that.
Final note: I just noticed that Lenovo released a BIOS update on February 9th, so I will go install that and comment back if it affects anything. http://support.lenovo.com/ca/en/products/Laptops-and-netbooks/ThinkPad-Yoga-Series-laptops/ThinkPad-Yoga-260/downloads/DS105460

Alas, updating the BIOS has not changed anything. I tried a number of the things that were already discussed and got the same result.

The intel_pstate support for Skylake was introduced after 3.19:

commit 7ab0256e57ae4423fbfb6b6c1611578c634702c9
Author: Kristen Carlson Accardi <kristen@linux.intel.com>
Date:   Wed Jan 28 13:53:28 2015 -0800

    intel_pstate: Add support for SkyLake

Before that, acpi-cpufreq was used. Since 4.0, intel_pstate supports Skylake, and that is when this problem appeared. Is there any hardware support option in your BIOS menu? If there is, please try to disable it.
-- hardware support option
++ hardware pstate support option

Created attachment 203851
Picture of Yoga 260 power config BIOS screen
These settings should be what you are looking for. Which should I try disabling?
I tried booting with CPU Power Management disabled. No change. I then also tried disabling SpeedStep. No change.

Well, I've no idea which option it is. I saw this option half a year ago, and it seems not to be exposed anymore. As Srinivas suggested checking whether Windows is using HWP, I think we could write a small application to dump the HWP registers and build it with gcc on Windows. However, I have never confirmed that this works. I only found this video, which I hope is helpful: https://www.youtube.com/watch?v=k3w0igwp-FM

#include <stdio.h>

/* RDMSR is privileged: this only works in kernel mode (ring 0). Run from an
 * ordinary user-mode program, the instruction faults. */
unsigned long long native_read_msr(unsigned int msr)
{
	unsigned long long low, high;

	/* ECX selects the MSR; the 64-bit result comes back in EDX:EAX. */
	asm volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (msr));
	return low | (high << 32);
}

#define rdmsr(msr, low, high) \
do { \
	unsigned long long __val = native_read_msr((msr)); \
	(void)((low) = (unsigned int)__val); \
	(void)((high) = (unsigned int)(__val >> 32)); \
} while (0)

int main(void)
{
	unsigned int l, h;

	/* rdmsr() is a statement macro with no value, so print native_read_msr() directly. */
	printf("msr:%x value:0x%llx\n", 0x770, native_read_msr(0x770));

	rdmsr(0x770, l, h);
	printf("msr:%x low 32b:%x, high 32b:%x\n", 0x770, l, h);
	return 0;
}

I am trying to find a procedure to verify HWP on Windows. When you boot Windows, please do the following steps:
http://www.softpedia.com/dyn-postdownload.php/c913c1aa397d194b28bed06882fed70c/56c602f5/9d51/4/2?tsf=0
Install the SetupRw tool. There is an icon called "MSR". Select it, and it will open a dialog with some predefined MSRs. In that window, select the icon called "User", and it will pop up a dialog. In that dialog add:
MSR_770=0x770 and select "Add"
MSR_771=0x771 and select "Add"
MSR_772=0x772 and select "Add"
MSR_773=0x773 and select "Add"
MSR_774=0x774 and select "Add"
MSR_777=0x777 and select "Add"
This will add new lines with the values, displayed for all CPUs. Maybe you can take a screenshot and attach it.

I was unable to get that RW program to work, because it kept crashing the whole system. Instead, I installed the Windows Driver Kit, which includes WinDbg. WinDbg can perform a live kernel debug session, which gives me the rdmsr command.
Long story short, I got the values from the MSRs. It looks like RW was crashing whenever I tried to get it to read a nonexistent MSR.
lkd> rdmsr 0x770
msr[770] = 00000000`00000001
lkd> rdmsr 0x771
msr[771] = 00000000`0105171c
lkd> rdmsr 0x772
no such msr
lkd> rdmsr 0x773
msr[773] = 00000000`00000001
lkd> rdmsr 0x774
msr[774] = 0000019e`7f001c01
lkd> rdmsr 0x775
no such msr
lkd> rdmsr 0x776
no such msr
lkd> rdmsr 0x777
msr[777] = 00000000`00000000

Thanks for providing the info. We can't reproduce this on the other Yogas we have. Could you try one more thing, building on what you did in comment 5? Before
# wrmsr 0x770 1 [immediately hangs]
run
# wrmsr 0x773 0
So the sequence will be:
# modprobe msr
# rdmsr 0x770
0
# wrmsr 0x773 0
# rdmsr 0x773
# wrmsr 0x770 1
I also attached a change that can be applied to the latest kernel. It does the same thing in the driver.

Created attachment 204461
HWP Interrupt disable
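The WinDbg values above are easier to read once the HWP MSRs are split into fields. The following is a sketch assuming the field layouts from the Intel SDM (volume 3, HWP chapter):
- IA32_HWP_CAPABILITIES (0x771) holds highest, guaranteed, most-efficient and lowest performance in bytes 0-3.
- IA32_HWP_REQUEST (0x774) holds minimum, maximum and desired performance plus the energy-performance preference in bytes 0-3, and the activity window in bits 41:32.
- IA32_HWP_INTERRUPT (0x773) bit 0 enables the guaranteed-performance-change notification.

The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_HWP_CAPABILITIES (MSR 0x771) */
struct hwp_caps {
    unsigned highest, guaranteed, most_efficient, lowest;
};

static struct hwp_caps decode_hwp_caps(uint64_t v)
{
    struct hwp_caps c = {
        .highest        = (unsigned)(v & 0xff),
        .guaranteed     = (unsigned)((v >> 8) & 0xff),
        .most_efficient = (unsigned)((v >> 16) & 0xff),
        .lowest         = (unsigned)((v >> 24) & 0xff),
    };
    return c;
}

/* IA32_HWP_REQUEST (MSR 0x774) */
struct hwp_req {
    unsigned min, max, desired, epp, activity_window;
    bool package_control;
};

static struct hwp_req decode_hwp_req(uint64_t v)
{
    struct hwp_req r = {
        .min             = (unsigned)(v & 0xff),
        .max             = (unsigned)((v >> 8) & 0xff),
        .desired         = (unsigned)((v >> 16) & 0xff), /* 0 = autonomous */
        .epp             = (unsigned)((v >> 24) & 0xff),
        .activity_window = (unsigned)((v >> 32) & 0x3ff), /* raw encoding */
        .package_control = (v >> 42) & 1,
    };
    return r;
}

/* IA32_HWP_INTERRUPT (MSR 0x773): bit 0 = notify on guaranteed performance
 * change, bit 1 = notify on excursion to minimum. */
static bool hwp_guaranteed_change_irq(uint64_t v) { return v & 1; }
static bool hwp_excursion_min_irq(uint64_t v) { return (v >> 1) & 1; }
```

Decoding the Windows readings gives the following:
- 0x771: highest 0x1c, guaranteed 0x17, most efficient 0x05, lowest 0x01.
- 0x774: min 0x01, max 0x1c, desired 0, EPP 0x7f.
- 0x770 = 1 and 0x773 = 1: Windows appears to have run with HWP enabled and the guaranteed-change notification turned on. This is the interrupt the 0x773 experiment above tries to disable.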
(In reply to Srinivas Pandruvada from comment #30)
> So the sequence will be
>
> # modprobe msr
> # rdmsr 0x770
> 0
> # wrmsr 0x773 0
> # rdmsr 0x773
>
> # wrmsr 0x770 1
This still hangs and does not work. I have not tried the driver patch.

We want to buy this laptop so that I can match your system and reproduce the problem. Can you provide the exact model you bought, including its configuration? I think "anders.ekbom@gmail.com" also had this issue. If he can also provide his system information, that would be great.

I have a ThinkPad Yoga 260, type 20FD-001MXS (bought from https://www.dustin.se/product/5010891621/thinkpad-yoga-260)
CPU: Intel Core i5-6200U @ 2.30GHz
RAM: 8GB
Will attach a hardinfo report for details.

Created attachment 206141
Hardinfo report for Yoga 260
Report generated using the hardinfo application in Ubuntu.
Created attachment 206191
ThinkPad Yoga 260 spec sheet
Here are the specs for my model. Purchased from Lenovo Canada in mid-January.
Same problem on X1 Yoga with i7-6500U.

Can you try to boot with idle=nomwait and see if you can boot (without intel_pstate=no_hwp or intel_pstate=disable)?

I tried and it did not boot.

(In reply to geiger.mario from comment #39)
> I tried and it did not boot.
I will second this.

I have ordered this system, but the model number is not an exact match for the Lenovo Yoga 260 with the same processors sold in the US. I hope I can reproduce. Some folks have had success booting with intel_idle.max_cstate=0.

Created attachment 206651
Hang i7-6500U X1 Yoga Grub
intel_idle.cstate=0 does not solve the problem.
Maybe I did not put the parameter correctly.
I attach a photo of my GRUB screen.
Sorry, it should be max_cstate:
intel_idle.max_cstate=0

In fact, I did try with intel_idle.max_cstate=0. I just made the error when writing the last message, sorry (see attached picture).

I just tried this too, with the same result as usual: it gets stuck at boot after enabling HWP.

Are you all booting just with UEFI, or using the legacy CSM / BIOS? The people I see having problems booting Skylake ThinkPads all seem to be trying to use the legacy BIOS (which is not enabled by default).

I am booting with UEFI only. I have legacy CSM/BIOS disabled in the BIOS settings.

Created attachment 206711
X1 Yoga BIOS parameters
This is a photo of one page of my BIOS settings.
Lenovo builds this system to order, so I can't get one overnight; it needs one week to ship.

I just received a brand new ThinkPad X1 Carbon 4th generation, and it also exhibits exactly this issue. Booting under UEFI, the kernel halts at "HWP enabled".

Can you check booting without HWP (intel_pstate=no_hwp)? Some suggestions from CPU architects:
rdmsr 0x199
If bit 6 is clear, then try to set bits 6, 7 and 12 in this MSR. Then follow the steps in comment 32. If you can give me the contents of 0x199, I can give you the write command to set bits 6, 7 and 12.

Sorry, the MSR to check is not 0x199, it is 0x1AA.
rdmsr 0x1AA
If bit 6 is clear, then try to set bits 6, 7 and 12 in this MSR. Then follow the steps in comment 32.

I also have a brand new ThinkPad X1 Carbon 4th generation. I have just installed Ubuntu Xenial with a 4.4 kernel (4.4.0-10-generic), and I am having this same problem.
- "acpi=off" allows me to boot with wireless but no audio / display brightness / suspend / etc.
- "nolapic" gives me audio / display brightness / etc. but no wireless (iwlwifi and iwlmvm have some problem).
- "intel_pstate=no_hwp" allows me to boot with everything working, as far as I can tell.

Booting with "intel_pstate=no_hwp" and running "sudo rdmsr 0x1AA" gives "401cc0", if that helps.

Thanks, Kevin. We are trying to root-cause this. Your MSR value suggests intel_pstate should work without "no_hwp", but obviously there is some issue.

I have been battling with my Yoga 260 for a week. These steps finally let me install openSUSE Tumbleweed:
1) Install the latest BIOS: http://support.lenovo.com/us/en/products/laptops-and-netbooks/thinkpad-yoga-series-laptops/thinkpad-yoga-260/downloads/DS105461
2) Intel firmware: http://support.lenovo.com/us/en/products/laptops-and-netbooks/thinkpad-yoga-series-laptops/thinkpad-yoga-260/downloads/DS112240
3) Important! Reset the PC using the hard switch (poke a blunt needle in when the computer switches on): https://support.lenovo.com/us/en/documents/pd101202
4) Install your OS (kernel >= 4.4) with the kernel parameter intel_pstate=no_hwp

Interestingly, openSUSE Tumbleweed does not need the parameter added via update-grub after the install.

Every time you run Windows, you have to hard-reset the laptop. Otherwise the whole experience gets a bit seat-of-the-pants: lots of network and psmouse errors spew out to dmesg, and the touchscreen and pen refuse to work. The hotkey events also change. TBH, Windows 10 itself is a bit flaky on the current BIOS, so I would expect to see a few updates very soon.

Oh, FYI, I tried:
- Ubuntu 15.10 (failed)
- 16.10 (always needed no_hwp)
- Debian Testing (always needed no_hwp)
- Fedora 23 (no network even with the dongle)
- Rawhide (always needed no_hwp)
- Arch (ouch)

The only real success so far was Tumbleweed (I'm writing on it now).

cpuid | grep 'hardware P-State control'
hardware P-State control = false
hardware P-State control = false
hardware P-State control = false
hardware P-State control = false
on openSUSE, so I guess my initial install parameters 'stuck' somewhere.

(In reply to Dougie Murray from comment #56)
> I have been battling for a week with my yoga 260.
> These steps made mine finally be able to install opensuse tumbleweed.
> ...
I've been running Arch with the intel_pstate=no_hwp flag for the last month and a half without issue (aside from needing the flag, which I just added to GRUB as the default). I did not do any of your other steps. All hardware worked out of the box once I installed the correct drivers.

Also, Srinivas: my MSR 0x1AA has the same value as Kevin's, 0x401cc0, so bits 6, 7 and 12 are already set.

That's great, Rowan. Just out of interest, what drivers did you install to get the IIO devices (accelerometer, compass) and the fingerprint reader recognized? I kind of got the laptop to switch to tablet mode intermittently using modified versions of Sagrland scripts.
I think I will RMA this unit, if everything is working for you.

Hi Dougie Murray, when you boot your system with intel_pstate=no_hwp, what is the value of
rdmsr 0x773
(You need the msr-tools package for rdmsr.)

If I run "sudo rdmsr 0x773", I get "0". I also get the same result as Dougie Murray for "cpuid | grep 'hardware P-State control'" (i.e. all "false").

(In reply to Dougie Murray from comment #60)
> That's great Rowan.
> Just out of interest what drivers did you install to get the iio recognized
> (accelerometer, compass) and the fingerprint reader?
> I kind of got laptop to switch to tablet intermittently using modified
> versions of Sagrland scripts.
> I think I will RMA this unit, if everything is working with you.
I should have said "all basic hardware". I have not got those three components working, or maybe they do work and I just don't have anything set up to use them. I have been very busy with school, so I haven't had time yet to get hardware that is not required for basic usage working on Linux. If you want to discuss this further, click on my name and send me an email; we don't want to use this bug report to discuss things that are not related to the bug :)

I have involved CPU architects, so please be patient with my requests as I pass them on. Can anyone send the output of:
cat /proc/cpuinfo
rdmsr 0x19C
rdmsr 0x19B
rdmsr 0x64F
Note that this is a dual-core CPU which is hyperthreaded.
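The MSR 0x1AA check suggested earlier in this thread is a plain mask operation: if bit 6 is clear, set bits 6, 7 and 12. This sketch uses hypothetical helper names. The thread does not say what these bits control, and bit 7 is later described as publicly undocumented, so the code only performs the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_BIT(n) (UINT64_C(1) << (n))

/* Bits 6, 7 and 12 of MSR 0x1AA, as named in the comments above. */
static const uint64_t misc_pwr_mgmt_mask = MSR_BIT(6) | MSR_BIT(7) | MSR_BIT(12);

/* The suggested procedure only applies when bit 6 is clear. */
static bool needs_update(uint64_t val)
{
    return (val & MSR_BIT(6)) == 0;
}

/* True if every bit in mask is set in val. */
static bool all_bits_set(uint64_t val, uint64_t mask)
{
    return (val & mask) == mask;
}

/* Value to write back with the requested bits set. */
static uint64_t with_bits_set(uint64_t val, uint64_t mask)
{
    return val | mask;
}
```

For the value 0x401cc0 reported by Kevin and Rowan, `all_bits_set()` returns true and `needs_update()` returns false. This matches Rowan's reading that the procedure has nothing to change on these machines.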
root@rivendell:~# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
stepping : 3
microcode : 0x74
cpu MHz : 532.492
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm hwp hwp_notify hwp_act_window hwp_epp intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
bugs :
bogomips : 5183.82
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
stepping : 3
microcode : 0x74
cpu MHz : 459.062
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm hwp hwp_notify hwp_act_window hwp_epp intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
bugs :
bogomips : 5183.82
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
stepping : 3
microcode : 0x74
cpu MHz : 400.664
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm hwp hwp_notify hwp_act_window hwp_epp intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
bugs :
bogomips : 5183.82
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
stepping : 3
microcode : 0x74
cpu MHz : 399.953
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm hwp hwp_notify hwp_act_window hwp_epp intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
bugs :
bogomips : 5183.82
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

root@rivendell:~# rdmsr 0x19C
883c2800
root@rivendell:~# rdmsr 0x19B
3
root@rivendell:~# rdmsr 0x64F
1d000000

Created attachment 208051
/proc/cpuinfo
Since my output was different, here is mine. Attached is the cpuinfo, and the MSR values were:
$ sudo rdmsr 0x19C
88400000
$ sudo rdmsr 0x19B
3
$ sudo rdmsr 0x64F
10200000
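The 0x19C and 0x19B values posted above can be decoded the same way. This sketch assumes the IA32_THERM_STATUS and IA32_THERM_INTERRUPT layouts from the Intel SDM and shows only a few fields. 0x64F is left out because its bit meanings are not given in this thread. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected IA32_THERM_STATUS (MSR 0x19C) fields. */
struct therm_status {
    bool thermal_status;    /* bit 0: sensor high-temperature output */
    bool power_limit_log;   /* bit 11: sticky log of power limitation */
    bool current_limit_log; /* bit 13: sticky log of current limitation */
    unsigned readout;       /* bits 22:16: degrees C below TjMax */
    unsigned resolution;    /* bits 30:27: readout resolution in degrees C */
    bool valid;             /* bit 31: readout valid */
};

static struct therm_status decode_therm_status(uint64_t v)
{
    struct therm_status s = {
        .thermal_status    = v & 1,
        .power_limit_log   = (v >> 11) & 1,
        .current_limit_log = (v >> 13) & 1,
        .readout           = (unsigned)((v >> 16) & 0x7f),
        .resolution        = (unsigned)((v >> 27) & 0xf),
        .valid             = (v >> 31) & 1,
    };
    return s;
}

/* IA32_THERM_INTERRUPT (MSR 0x19B): bit 0 enables the high-temperature
 * interrupt, bit 1 the low-temperature interrupt. */
static bool therm_high_irq_enabled(uint64_t v) { return v & 1; }
static bool therm_low_irq_enabled(uint64_t v) { return (v >> 1) & 1; }
```

Decoding the two reports gives the following:
- 0x883c2800 (i7-6500U): a valid readout 60 degrees C below TjMax, with the power-limitation and current-limitation log bits set.
- 0x88400000: 64 degrees C below TjMax, with no log bits set.
- 0x19B = 3: both temperature-threshold interrupts are enabled.

These are single snapshots. They show that limit events had been logged, not that those events caused the hang.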
Cool. I get the same values as most people for the MSRs, and I will try to report bugs to the appropriate projects.
# rdmsr 0x1AA
401cc0

Hi all. I've been following this thread and decided to create a user to contribute my two cents. I own a ThinkPad Yoga 260, and I wanted to install Linux Mint 17.3. With the default kernel 3.19 I had no wifi, so I updated to the latest stable kernel at the time, 4.4.3-040403. With this kernel, I was not able to get past GRUB without using no_hwp, and using it gave me glitchy graphics. I found that the latest kernel supported by Mint was 4.2.0-30, so I installed that kernel. It runs with no glitchy graphics, and wifi also works. I have it installed alongside the OEM Windows 10 that came with the laptop. My "only" issue is with an external monitor: sometimes I get a black screen when resuming from hibernation, and sometimes hibernation does not work (black screen, but the machine does not power off).

Created attachment 208221
Lenovo X1 Carbon 4th /proc/cpuinfo & MSRs 0x19c, 0x19b, 0x64f
Srinivas, here you go.
Finally I got my system, and I see the freeze. After booting in intel_pstate=no_hwp mode:
root@otcpl-ubuntu-test:/home/testuser# rdmsr 0x1aa
401cc0
You may have a different value than me. Clear bit 7 and write it back:
root@otcpl-ubuntu-test:/home/testuser# wrmsr 0x1aa 0x401c40
root@otcpl-ubuntu-test:/home/testuser# rdmsr 0x1aa
401c40
root@otcpl-ubuntu-test:/home/testuser# wrmsr 0x770 0x01
root@otcpl-ubuntu-test:/home/testuser# rdmsr 0x770
1
The system doesn't freeze now. Can anybody try these steps?

Created attachment 208331
Test patch to fix system freeze
If anybody wants to give the attached patch (attached with comment 72) a try, that would be great.

Hi Srinivas, I'd love to try, and this is likely quite a stupid question, but when I run "sudo rdmsr 0x1aa" I get "rdmsr: open: No such file or directory".

Update: setting the register values as specified did not work for me (but I'm running kernel 4.2.0-30).

@Srinivas Pandruvada I tried
wrmsr 0x1aa 0x401c40 && wrmsr 0x770 0x01
on Ubuntu 16.04, and the system did not freeze:
uname -a
Linux yoga260 4.4.0-11-generic #26-Ubuntu SMP Sat Mar 5 14:25:21 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Processor: i7-6500U. My rdmsr 0x1aa value was exactly the same as yours, 401cc0. I did not compile a kernel to test the patch, though.

This patch also seems to fix the ThinkPad T460s freezing on lid close while on battery power; that is, it seems to solve bug 113551.

(In reply to Srinivas Pandruvada from comment #73)
> If anybody wants to give a try with the attached patch (attached with
> comment 72), it will be great.
Srinivas, thanks for the patch. I tested it on a Lenovo X1 Carbon 4th Generation, and the kernel boots successfully 50% of the time.

AFAICT bit 7 of MSR 0x1AA is publicly undocumented. I thought maybe the value only needed to be cleared once per power cycle, so I modified the patch to only write the bit if it was cleared. That didn't work either ...

P.

(In reply to Srinivas Pandruvada from comment #71)
> System doesn't freeze now.
> Can anybody try these steps?
I got the same value for rdmsr 0x1aa, and it does not freeze.

(In reply to Prarit Bhargava from comment #78)
> (In reply to Srinivas Pandruvada from comment #73)
> > If anybody wants to give a try with the attached patch (attached with
> > comment 72), it will be great.
>
> Srinivas, thanks for the patch. I tested it on a Lenovo X1 Carbon 4th
> Generation and the kernel boots successfully 50% of the time.
>
> AFAICT bit 7 on msr 0x1AA is publicly undocumented, and I thought maybe the
> value was only to be cleared once on power cycle so I modified the patch to
> only write the bit if it was cleared. That didn't work either ...
>
> P.
I've put the system into a tight reboot loop with the kernel parameter "intel_pstate=no_hwp" and added
# default is 0x401cc0
wrmsr 0x1aa 0x401c40
rdmsr 0x1aa
wrmsr 0x770 0x01
rdmsr 0x770
to /etc/rc.local. I'll leave it this way for a few hours to see if anything pops out. Each reboot takes less than a minute ...

P.

So the userspace reboot test in comment #80 worked 100% of the time. I put printk()s into the intel_pstate_hwp_enable() function to read back the values of 0x1AA and 0x770 and confirm the values that were set, and I was surprised that the reboot test worked 100% of the time.

Srinivas -- this also appears to work (applied ON TOP of your patch), but I cannot give a good argument as to why it works. The only explanation I can think of is that writing the two MSRs back-to-back causes this problem, which points to some sort of hardware settling/configuration delta ... any ideas?

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index a460f51..0f4a9ba 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -27,6 +27,7 @@
 #include <linux/debugfs.h>
 #include <linux/acpi.h>
 #include <linux/vmalloc.h>
+#include <linux/delay.h>
 #include <trace/events/power.h>
 
 #include <asm/div64.h>
@@ -538,6 +539,7 @@ static void intel_pstate_hwp_enable(struct cpudata *cpudata)
 	if (!err) {
 		val &= ~BIT(7);
 		wrmsrl_on_cpu(cpudata->cpu, MSR_MISC_PWR_MGMT, val);
+		mdelay(1); // hardware settle?
 	}
 	wrmsrl_on_cpu(cpudata->cpu, MSR_PM_ENABLE, 0x1);
 }

P.

Good observation. I think there is a race condition. Let me check with the CPU architects.

Is this really a kernel bug, or is this a BIOS/FW bug that we're fixing with a kernel patch?
If it is a BIOS/FW bug, then I'd like to see a check for (val & BIT(7)) and a
pr_warn(FW_BUG "intel_pstate: HWP enabled but bit 7 on 0x1AA is set. Please contact your hardware vendor for an updated BIOS\n");
otherwise this bug will never get fixed in the vendor's BIOS.

P.

Comment on attachment 208331
Test patch to fix system freeze
Not valid.
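For reference, the user-space experiment from comments 71 and 80 can be sketched in C. It clears bit 7 of MSR 0x1AA and then writes 1 to MSR 0x770, with the optional ~1 ms pause that Prarit added via mdelay(1). The sketch goes through the Linux msr driver's /dev/cpu/N/msr files, which is what msr-tools' rdmsr/wrmsr use.

This is a diagnostic sketch only:
- The function names are hypothetical.
- Bit 7 of 0x1AA is undocumented.
- The kernel patch built on this idea was declared not valid.
- On the affected laptops, writing 0x770 can freeze the machine.

Like the single-CPU wrmsr invocations above, it touches one CPU.

```c
#define _POSIX_C_SOURCE 200809L /* pread/pwrite/nanosleep */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define MSR_MISC_PWR_MGMT 0x1aa
#define MSR_PM_ENABLE     0x770

/* The msr driver (modprobe msr) exposes /dev/cpu/N/msr: the file offset
 * selects the MSR and each access moves 8 bytes. Needs root. */
static int msr_access(unsigned cpu, uint32_t msr, uint64_t *val, int do_write)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%u/msr", cpu);
    int fd = open(path, do_write ? O_WRONLY : O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = do_write ? pwrite(fd, val, sizeof *val, (off_t)msr)
                         : pread(fd, val, sizeof *val, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof *val ? 0 : -1;
}

static uint64_t clear_bit7(uint64_t v)
{
    return v & ~(UINT64_C(1) << 7);
}

/* Hypothetical helper: read 0x1AA, clear bit 7, write it back, optionally
 * pause settle_ns nanoseconds (must be < 1e9), then set IA32_PM_ENABLE.
 * Returns 0 on success, -1 on any failure (no writes if the read fails). */
static int hwp_enable_experiment(unsigned cpu, long settle_ns)
{
    uint64_t val;
    if (msr_access(cpu, MSR_MISC_PWR_MGMT, &val, 0) != 0)
        return -1;
    val = clear_bit7(val);
    if (msr_access(cpu, MSR_MISC_PWR_MGMT, &val, 1) != 0)
        return -1;
    if (settle_ns > 0) {
        struct timespec ts = { .tv_sec = 0, .tv_nsec = settle_ns };
        nanosleep(&ts, NULL);
    }
    uint64_t enable = 1;
    return msr_access(cpu, MSR_PM_ENABLE, &enable, 1);
}
```

With settle_ns = 0, this mirrors the back-to-back rc.local writes. With settle_ns = 1000000L, it mirrors the mdelay(1) variant. `clear_bit7(0x401cc0)` yields the 0x401c40 used in the transcripts above.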
Comment on attachment 208331
Test patch to fix system freeze

>From f0907c7bbd0f61dae7dabc0a105ce5080cfc10fc Mon Sep 17 00:00:00 2001
>From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>Date: Tue, 8 Mar 2016 21:43:58 -0800
>Subject: [PATCH] cpufreq: intel_pstate: Fix system freeze on HWP
>
>Fix system freeze on enabling Hardware P States (HWP) on some systems.
>
>Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>---
> drivers/cpufreq/intel_pstate.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>index cd83d47..a460f51 100644
>--- a/drivers/cpufreq/intel_pstate.c
>+++ b/drivers/cpufreq/intel_pstate.c
>@@ -531,6 +531,14 @@ static void __init intel_pstate_sysfs_expose_params(void)
> 
> static void intel_pstate_hwp_enable(struct cpudata *cpudata)
> {
>+	int err;
>+	u64 val;
>+
>+	err = rdmsrl_on_cpu(cpudata->cpu, MSR_MISC_PWR_MGMT, &val);
>+	if (!err) {
>+		val &= ~BIT(7);
>+		wrmsrl_on_cpu(cpudata->cpu, MSR_MISC_PWR_MGMT, val);
>+	}
> 	wrmsrl_on_cpu(cpudata->cpu, MSR_PM_ENABLE, 0x1);
> }
> 
>-- 
>1.9.1
>

No, the debugging is still going on, so there is no solution or conclusion yet. I was told that the kernel patch is not valid and not allowed, so we can't use it.

*** Bug 113351 has been marked as a duplicate of this bug. ***

My ThinkPad X260 (i7-6600U) is also affected. Setting intel_pstate=no_hwp helps for now.

Created attachment 209371
Redirect thermal interrupt to OS
Check the commit log for the analysis of the freeze. Please test without the kernel command-line option "intel_pstate=no_hwp". This patch can be applied on top of the 4.5 kernel, and it shouldn't be difficult to backport.
If it doesn't work, boot with intel_pstate=no_hwp and run:
dmesg | grep -i lvt
You should see
[ 0.203107] acpi_set_hwp_native_thermal_lvt_osc
[ 0.204813] acpi_set_hwp_native_thermal_lvt_osc successful
[ 0.204817] acpi_set_hwp_native_thermal_lvt_osc
[ 0.204820] acpi_set_hwp_native_thermal_lvt_osc
[ 0.204822] acpi_set_hwp_native_thermal_lvt_osc
[ 0.204824] acpi_set_hwp_native_thermal_lvt_osc
[ 0.204826] acpi_set_hwp_native_thermal_lvt_osc
[ 0.204828] acpi_set_hwp_native_thermal_lvt_osc
[ 0.204830] acpi_set_hwp_native_thermal_lvt_osc
Also, please remove the patch I suggested in comment 72 if you are using it, as that patch is not a correct one. The patch suggested in comment 90 should hopefully fix this for everyone.

This new patch also seems to fix the issue with the T460s freezing on lid close while on battery power, as mentioned in bug 113551.

Posted the patch to the mailing list, so please try it.
https://patchwork.kernel.org/patch/8606621/

(In reply to Srinivas Pandruvada from comment #93)
> Posted patch to mailing list. So please try.
>
> https://patchwork.kernel.org/patch/8606621/

With the above patch I am able to boot (2016 ThinkPad X1 Carbon) without "intel_pstate=no_hwp" or any other extra kernel command-line arguments. So far so good. Let me know if you need any additional info.

(In reply to Srinivas Pandruvada from comment #90)
> Created attachment 209371 [details]
> Redirect thermal interrupt to OS
> [...]

Srinivas, with this patch the 4th generation X1 Carbon boots. There was an additional report of the latest upstream kernel (with the latest graphics drivers) plus Intel HWP causing a system hang when using external displays. I am trying to get additional testing in that configuration to see whether this patch resolves those issues as well.

P.
(In reply to Srinivas Pandruvada from comment #93)
> Posted patch to mailing list. So please try.
>
> https://patchwork.kernel.org/patch/8606621/

I tried compiling the 4.5 kernel with this patch under Mint 17.3. I was still not able to boot without the intel_pstate=no_hwp flag; the fan starts spinning like crazy. After booting (still using the no_hwp parameter), I checked dmesg and there is no mention of "lvt". It is quite possible I've done something wrong when compiling the kernel, as it is the first time I've ever done such a thing.

- Martin

(In reply to Prarit Bhargava from comment #95)
>
> Srinivas, with this patch the 4th generation Carbon X1 boots. There was an
> additional report of latest upstream (latest graphics drivers) + intel HWP
> causing a system hang when using external displays. I am trying to get
> additional testing in that configuration to see if this patch resolves those
> issues as well.

Srinivas, the external display test passes as well. Was it really just that the thermal interrupt was firing and no handler was available, or was it an unhandled interrupt flood? /me wants to know so he can review your patch ;)

P.

Prarit,
These systems use the config TDP (configurable TDP) feature and set it to a low value to prevent thermal issues. So whenever a workload can't be handled within the TDP setting, the processor (HWP) acts and notifies that it took action. This notification is a thermal interrupt. If the BIOS has an issue handling this interrupt, the freeze can happen at any time (on SKL, the BIOS acts as the default handler of thermal interrupts). It was a coincidence that the reporters of this bug experienced the freeze during boot (boot in Linux is aggressive); it could have happened at any time, even after a successful boot.

@martin
Try the patch attached to this bugzilla at comment 90. It has debug messages; the upstream patch will not print debug messages.
After applying the patch:
make -j4 bzImage
make -j4 modules
sudo make install
sudo make -j4 modules_install

(In reply to Srinivas Pandruvada from comment #99)
> @martin
> Try the patch attached to this bugzilla at comment 90. This has debug
> messages. The upstream patch will not print debug messages.
> After applying patch
> make -j4 bzImage, make -j4 modules
> sudo make install, sudo make -j4 modules_install

Thanks to your e-mail and assistance, I managed to compile the kernel. I can boot it without using the no_hwp parameter. However, I am experiencing glitchy graphics, which does not happen with e.g. 4.2.0 using no_hwp or with kernel 3.19. Thanks for the assistance!

Hello Srinivas,
Do you think your patch would be accepted if you made it depend on a boot parameter? For example:
intel_pstate.thinterrupts=bios would be the default, preserving the existing behaviour
intel_pstate.thinterrupts=os would use your patch and pass control to the OS
(I'm not sure what the best names for the parameters would be.)

This is already accepted. I think it will be in 4.6-rc2.

(In reply to Srinivas Pandruvada from comment #102)
> This is already accepted. I think this will be in 4.6.-rc2.

That's great news :) Could you link to the discussion/bug where it was accepted, please? Would the accepted patch work on 4.4.6+? For context: I'd like to get it into the Ubuntu Xenial (16.04) kernel (which is 4.4-based and has many, if not all, patches from 4.4.6 in Ubuntu's version 4.4.0-15.31) before Xenial comes out next month, if possible.

Hi Russell,
It is in the ACPI maintainer's tree:
https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=42341f87ba1bee4c5be95038c24abb69cbcf361a
It should work for any 4.4 kernel. I will submit it for the stable tree as well.

Hello Srinivas,
Did you get a chance to submit this for the 4.4 kernel (or do you mean the 4.5 kernel? The kernel.org front page is a bit confusing about which of 4.4 and 4.5 is stable)?

Hi Russell,
Not yet submitted to stable.
The maintainer for ACPI and power wants to keep this patch in 4.6-rc for a couple of weeks, to check for any regressions.

commit a21211672c9a1d730a39aa65d4a5b3414700adfb
Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Date:   Wed Mar 23 21:07:39 2016 -0700

    ACPI / processor: Request native thermal interrupt handling via _OSC

The patch has shipped in 4.6-rc2.

Just as a follow-up: Lenovo released a new BIOS, 1.40
http://support.lenovo.com/cz/cs/products/Laptops-and-netbooks/ThinkPad-Yoga-Series-laptops/ThinkPad-Yoga-260/downloads/DS105461
which updates the CPU microcode to 0x84 but does not solve anything; "intel_pstate=no_hwp" is still required.
Ubuntu released a new kernel, 4.4.0-18, for 16.04 LTS, which includes this patch:
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_4.4.0-18.34/changelog
https://bugs.launchpad.net/intel/+bug/1559923
This fixes the issue, and no extra kernel parameters are required anymore, so the patch is now getting a handful of testers.
--
Big thanks to everyone involved.
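To confirm that a kernel carrying the fix really was booted without the workaround, one can look at /proc/cmdline, which holds the command line of the running kernel. Below is a minimal sketch with hypothetical helper names. It checks for an exact whitespace-separated token such as intel_pstate=no_hwp, and it is a simplification that does not handle quoted arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified separator test for kernel command-line arguments (no quote handling). */
static int is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: return 1 if `param` (e.g. "intel_pstate=no_hwp")
 * appears as a whole token in `cmdline`, 0 otherwise. A longer token
 * that merely starts with `param` does not count as a match. */
int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen;
    const char *p;

    assert(cmdline != NULL && param != NULL);
    plen = strlen(param);
    if (plen == 0)
        return 0;
    for (p = strstr(cmdline, param); p != NULL; p = strstr(p + 1, param)) {
        int starts_token = (p == cmdline) || is_sep(p[-1]);
        int ends_token = p[plen] == '\0' || is_sep(p[plen]);
        if (starts_token && ends_token)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: read the first line of /proc/cmdline into buf.
 * Returns 0 on success, -1 on failure. */
int read_proc_cmdline(char *buf, size_t size)
{
    FILE *f;
    char *ok;

    assert(buf != NULL && size > 0);
    f = fopen("/proc/cmdline", "r");
    if (f == NULL)
        return -1;
    ok = fgets(buf, (int)size, f);
    fclose(f);
    return ok != NULL ? 0 : -1;
}
```

For example: `char buf[4096]; if (read_proc_cmdline(buf, sizeof buf) == 0 && cmdline_has_param(buf, "intel_pstate=no_hwp")) puts("still booted with the no_hwp workaround");`. Matching whole tokens avoids false positives from a longer argument that happens to start with the same text.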
Created attachment 200221 [details]
Boot sequence

On a ThinkPad Yoga 260 with an i5-6200U CPU, Ubuntu 15.10 with kernel 4.4.0-040400-generic stops during the boot process at the message "Intel_pstate: HWP enabled".

Adding "intel_pstate=no_hwp" to /etc/default/grub or as a boot parameter solves the problem, as does "intel_pstate=disable".
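The boot stops at the point where intel_pstate enables HWP. In the test patch quoted in this thread, that step is `wrmsrl_on_cpu(cpudata->cpu, MSR_PM_ENABLE, 0x1)`. On a system booted with the workaround, the current state can be inspected read-only through the msr driver (`modprobe msr`), which exposes /dev/cpu/N/msr: a pread() of 8 bytes at an offset equal to the MSR address returns that register's value. This is a minimal sketch, assuming MSR_PM_ENABLE is at 0x770 and that bit 0 is the HWP enable bit (as in Intel's documentation of IA32_PM_ENABLE). The helper names are hypothetical. The code deliberately never writes the register, because enabling HWP is exactly the step at which these machines froze.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for pread() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed address of MSR_PM_ENABLE, the register intel_pstate writes 0x1 to. */
#define MSR_PM_ENABLE_ADDR 0x770u

/* Return 1 if bit `bit` of `val` is set, 0 otherwise. */
int msr_bit_is_set(uint64_t val, unsigned int bit)
{
    assert(bit < 64);
    return (int)((val >> bit) & 1u);
}

/* Assumption: bit 0 of MSR_PM_ENABLE is the HWP enable bit. */
int hwp_is_enabled(uint64_t pm_enable)
{
    return msr_bit_is_set(pm_enable, 0);
}

/* Read MSR `reg` on logical CPU `cpu` via /dev/cpu/<cpu>/msr.
 * Needs root and the msr module. Returns 0 on success, -1 on error.
 * Read-only: this function never writes an MSR. */
int read_msr(int cpu, uint32_t reg, uint64_t *out)
{
    char path[64];
    int fd;
    ssize_t n;

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* The MSR number is the file offset; one read returns 8 bytes. */
    n = pread(fd, out, sizeof(*out), (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

For example, as root: `uint64_t v; if (read_msr(0, MSR_PM_ENABLE_ADDR, &v) == 0) printf("HWP enabled: %d\n", hwp_is_enabled(v));`. On a kernel booted with intel_pstate=no_hwp this would normally be expected to print 0.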