Created attachment 196761 [details] Kernel configuration

I got a new MSI laptop after my previous one died. It is a GE72 6QF with an Intel i7 6700HQ processor, which seems to cause trouble for the Linux kernel.

If I try to boot without any options, the kernel freezes almost instantly: I just have time to see a few lines on the screen before everything goes dark. There is no way to use sysrq to reboot, so I have to do a hard power-off with the power button.

I can boot using either the acpi=off or the nolapic option; the corresponding logs are attached.

With acpi=off, all the cores of my processor seem to be correctly detected and usable, but I'm afraid that with power management off, cooling might not work correctly and it is therefore not safe to run like this.

With nolapic, only one core of the processor is detected and usable, which makes the machine really slow, but everything else seems to work correctly.

I also tried various other options:
* acpi=ht
* acpi=strict
* acpi=noirq
* pci=noacpi
* pnpacpi=off
* noapic
* lapic=notscdeadline
* acpi_osi=Linux
* acpi_os_name="Windows 2015" (the laptop comes with Windows 10)
* acpi.power_nocheck=1

Most of them don't change anything. With some of them I got a few more lines before the kernel froze, but it never reached the point where the root filesystem was mounted, so I have no logs.

There are some options in the BIOS (which is up to date, I flashed it to the latest version) related to hyperthreading and power states. I tried playing with them but it doesn't seem to change anything, so I left them at their default values.

I had the same kind of problems with my two previous laptops (also from MSI) but was able to fix them quickly by editing the DSDT. But that was 4 and 6 years ago, and fixing this one seems to be out of my reach (I can't even get the damn DSDT decompiled; some of the ACPI tables make iasl segfault).
Here I attach the kernel config, the log files, the output of lspci and dmidecode, and the ACPI tables. Feel free to ask me to try other configurations or boot options, or to provide more logs or anything else; I will try to provide them as fast as possible (I need this machine to do my work, and I'm pretty inefficient with Windows).
Created attachment 196771 [details] Log file with acpi=off
Created attachment 196781 [details] Log file with nolapic
Created attachment 196791 [details] Output of lspci -vvnn
Created attachment 196801 [details] Output of dmidecode
Created attachment 196811 [details] ACPI tables
Created attachment 196821 [details] Cpuinfo with acpi=off
Created attachment 196831 [details] Cpuinfo with nolapic
Created attachment 197351 [details] Kernel Configuration for 4.4-rc5
Created attachment 197361 [details] dmesg with kernel 4.4-rc5 and acpi=off
Created attachment 197371 [details] dmesg with kernel 4.4-rc5 and nolapic
I investigated this in more depth this weekend.

First, there was an EC firmware update on the MSI website that I missed when I flashed the BIOS, so I applied it, but it hasn't changed a single thing, not even the ACPI tables.

I also switched to the latest 4.4-rc5 kernel and did a full review of the configuration. The new configuration is given in the previous attachments, along with the new dmesg for the boot options acpi=off and nolapic.

I tried to use the kernel's microcode update module, but it seems that the latest Intel microcode data package does not contain microcode for this processor. When running iucode_tool to generate an initrd file, I get:
iucode_tool -S --write-earlyfw=/boot/ucode.cpio /lib/firmware/intel-ucode/*
iucode_tool: system has processor(s) with signature 0x000506e3
iucode_tool: No valid microcodes were selected, nothing to do...

I have finally been able to decompile the ACPI tables, which allowed me to make a guess at what the problem might be.
There are two errors in the kernel log when using nolapic that seem big enough to me to cause real trouble.

The first error is:
[ 0.000027] ACPI: Core revision 20150930
[ 0.019261] ACPI Error: [\_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup failure, AE_NOT_FOUND (20150930/dswload-210)
[ 0.019268] ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20150930/psobject-227)
[ 0.019294] ACPI Exception: AE_NOT_FOUND, (SSDT:xh_rvp11) while loading table (20150930/tbxfload-193)
[ 0.026871] ACPI Error: 1 table load failures, 9 successful (20150930/tbxfload-214)

The error itself appears to happen when parsing the SSDT5 table (xh_rvp11), which seems to be related to USB/XHCI.
It says that the device HS11 is not found, but when I decompiled the SSDT5 table, HS11 is resolved as an external DeviceObj, probably coming from the DSDT.
iasl has no problem compiling this file back without any errors.
I guess the error happens because in the DSDT, the HS11 Device is created inside an If at the root of the table (meaning not inside any Scope or Method):
If (LEqual (PCHV (), SPTH))
{
    Scope (_SB.PCI0.XHC.RHUB)
    {
        Device (HS11)
        {
            Name (_ADR, 0x0B) // _ADR: Address
            Device (CAM0)
It seems to be related to the laptop webcam, by the way (the webcam appears as Bus 001 Device 011 in lsusb), which can be deactivated by a switch and is deactivated at boot time.
The PCHV Method is also declared at the root of the table:
Name (SPTH, One)
Name (SPTL, 0x02)
Method (PCHV, 0, NotSerialized)
{
    If (LEqual (PCHS, One))
    {
        Return (SPTH) /* \SPTH */
    }

    If (LEqual (PCHS, 0x02))
    {
        Return (SPTL) /* \SPTL */
    }

    Return (Zero)
}
PCHS appears in
OperationRegion (PNVA, SystemMemory, PNVB, PNVL)
Field (PNVA, AnyAcc, Lock, Preserve)
{
    RCRV, 32,
    PCHS, 16,
    PCHG, 16,
also at the root of the table.
If I understand correctly, it means that PCHS is some value that is read from memory and/or hardware?
So it might be possible that it is not yet initialized when the DSDT is being loaded? Or, if it corresponds to the activation status of the webcam, the webcam might be deactivated and the value would be SPTL. In that case the device would not be created when parsing the DSDT, which would result in the error later when parsing the SSDT5 table.

For the second error, it might be the same kind of problem.
[ 0.204210] ACPI : EC: EC description table is found, configuring boot EC
[ 0.204224] ACPI : EC: EC started
[ 0.213212] ACPI Error: [^^^PEG0.PEGP.EASP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
[ 0.213218] ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC._REG] (Node ffff8804730d2af0), AE_NOT_FOUND (20150930/psparse-542)
[ 0.213231] ACPI : EC: Fail in evaluating the _REG object of EC device. Broken bios is suspected.
[ 0.217187] ACPI Error: [^^^PEG0.PEGP.EASP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
[ 0.217192] ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC._REG] (Node ffff8804730d2af0), AE_NOT_FOUND (20150930/psparse-542)

Apparently the ECDT is loaded correctly, but when activating it through the _REG method of the EC device defined in the DSDT, it fails because EASP is not found.
When I decompile the DSDT, using the SSDTs as external tables with the -e switch, ^^^PEG0.PEGP.EASP is resolved as an external _SB_.PCI0.PEG0.PEGP.EASP of type UnknownObj.
I think it comes from the SSDT6 table (SaSsdt), which seems to be related mostly to PCI and graphics devices.
Note that I can compile back neither the DSDT nor the SSDT6 table, because decompiling them leaves some unresolved external methods.

In the SSDT6 table, EASP is defined inside an If at the root of the table:
If (CondRefOf (\_SB.PCI0.PEG0.PEGP))
{
    Scope (\_SB.PCI0.PEG0.PEGP)
    {
        OperationRegion (PCIS, PCI_Config, Zero, 0x0100)
        Field (PCIS, AnyAcc, NoLock, Preserve)
        {
            PVID, 16,
            PDID, 16,
            Offset (0x88),
            EASP, 2,
\_SB.PCI0.PEG0.PEGP is resolved as an external DeviceObj, which is defined in the DSDT, without any condition this time.

I have no idea whether my reasoning is correct, whether it might help solve the problem, or how.
But again, if you need more information I'm still available.
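As a side note on the iucode_tool output above, the reported signature can be decoded by hand. The following is a minimal sketch in Python, assuming the standard x86 CPUID leaf 1 layout and the usual family-model-stepping naming of files under /lib/firmware/intel-ucode/; the function names are hypothetical and are not part of iucode_tool.

```python
def decode_cpuid_signature(signature: int) -> dict:
    """Split an x86 CPUID signature (leaf 1, EAX) into family, model and stepping."""
    stepping = signature & 0xF
    base_model = (signature >> 4) & 0xF
    base_family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model bits only count for base families 6 and 15.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


def intel_ucode_filename(signature: int) -> str:
    """Hypothetical helper: the file name a matching microcode blob would carry."""
    sig = decode_cpuid_signature(signature)
    return "{family:02x}-{model:02x}-{stepping:02x}".format(**sig)


print(intel_ucode_filename(0x000506E3))  # 06-5e-03
```

If no file with that name exists in the microcode package, that would be consistent with the "No valid microcodes were selected" message.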
Add Lv. Lv, can you please take a look at this? There are some ACPI errors, but most of them shouldn't make the system unbootable, except the EC _REG failure where the ECDT is involved (but I'm not sure).
I have the same problem on my new MSI GE62 6QF. When the boot process freezes, I can see the following message on the screen: EC: Fail in evaluating the _REG object of EC device. Broken bios is suspected.
Created attachment 197541 [details] dmesg log

I was able to run the Arch Linux live CD (with no freezes, and got a log) with the following BIOS settings:
SpeedStep: Enabled or Disabled
Boot mode: Legacy (only)
Linux kernel settings: defaults for the Arch Linux live CD

I hope this helps to troubleshoot the error.
Hi,

I tried playing with the BIOS options previously and it didn't change anything, but I couldn't remember whether I did that before or after I flashed the BIOS to the newest version. The latest BIOS version exposes more CPU options:
* SpeedStep (enabled by default)
* Virtualization (enabled by default)
* HyperThreading (enabled by default)
* C-states (enabled by default)
* VT-d (disabled by default)

So I reverted all the BIOS settings to their defaults, disabled FastBoot, set BootMode back to "UEFI with CSM" (I will need UEFI to boot Windows as long as I don't have the Linux kernel and nvidia drivers working), and configured UEFI to boot from the external hard drive if present. Then I played with the 3 options that were most likely to have something to do with this bug (SpeedStep, HyperThreading and C-states).

The conclusion is that whatever SpeedStep and HyperThreading are set to, the kernel freezes if C-states are enabled, and if they are disabled I can boot either 4.4-rc5 or 4.3.3-gentoo without any acpi or lapic options.

If I boot with C-states disabled and both SpeedStep and HyperThreading enabled, the latter two seem to work perfectly: all 8 virtual cores are detected and working, and the conservative cpufreq policy seems to work correctly. I monitored the CPU frequency during a kernel build and it was stepping from 800MHz to 2600MHz, with a lot of in-between values, separately on all the virtual cores.

The kernel logs on 4.4-rc5 seem to be the same as before, with the same ACPI errors still occurring.

In conclusion, I would say the freezes are most likely caused by the C-states. I can live without them for now, even if it would be better to have them working (this laptop's battery life is already quite short).

If you have patches to test, I will try them, and if you need more information, I will provide it too.
Now I have to install the nvidia-drivers (hoping that the _REG error will not make this driver fail, as it is related to the graphics devices) to see if I can switch completely to Linux ^^
One option may be worth a try: intel_idle.max_cstate=0. This disables the default intel_idle driver and falls back to the ACPI idle driver.
Re-enabling the C-states in the BIOS and booting with the kernel parameter intel_idle.max_cstate=0 works perfectly as far as I can see. So the bug causing the freezes is apparently inside the intel_idle cpuidle driver.
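For anyone checking whether the workaround was actually picked up by the running kernel, here is a minimal sketch that extracts the value from a kernel command line. The function name is hypothetical, and the parser only handles the exact spelling intel_idle.max_cstate.

```python
from typing import Optional


def intel_idle_max_cstate(cmdline: str) -> Optional[int]:
    """Return the last intel_idle.max_cstate value found on a kernel command line."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if sep and key == "intel_idle.max_cstate":
            try:
                value = int(raw, 0)
            except ValueError:
                # Ignore a malformed value and keep any earlier valid one.
                continue
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(intel_idle_max_cstate(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

A misspelt parameter such as intel.idle.max_cstate=0 is deliberately not matched, because the part before the dot has to be the module name for the parameter to reach intel_idle. The effective value should also be readable from /sys/module/intel_idle/parameters/max_cstate when the driver is built in.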
Thanks for the test, I'll move the bug to intel_idle.
CC Yu, he might also want to take a look.
I have an MSI GS40 6QE (the same i7 6700HQ). Fortunately for me, Arch Linux ran for longer before the kernel froze, so I even thought the freeze was caused by running Xorg. I didn't even notice a kernel freeze without Xorg ( https://bbs.archlinux.org/viewtopic.php?pid=1585836 ). Here's my dmesg: https://gist.github.com/Deathangel908/8d654e7575314b3aabc3 and my dmidecode: https://gist.github.com/anonymous/5962155853d36ff40c7b
There's a chance the C-state issues on your Skylake systems are not caused by kernel bugs, but rather by faulty CPU microcode.

You need to run a very up-to-date kernel for Skylake (often more up to date than what is available in stable/LTS distros) as well as very up-to-date CPU microcode in the BIOS/UEFI (often more recent than what is available from your system vendor!).

Up-to-date Skylake microcode is revision 0x56 or higher at the moment. You will notice Intel is *not* distributing any Skylake microcode updates in the public Linux microcode distribution yet, so it depends solely on your system BIOS/UEFI.

Linux seems to run fine most of the time with microcode 0x49 and newer, but this is in no way certain. We know from reports that Windows 10 requires microcode newer than that to be able to run several software packages and to avoid crashing, and that might apply to Linux just as well.

Still, this does *not* rule out the possibility of a Linux kernel bug, the same way it does not rule out a firmware bug, since the ACPI tables in that MSI laptop are not to be trusted. It does mean MSI owes you a BIOS/UEFI update, given the old microcode reported in /proc/cpuinfo, though.
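To compare a machine against the revisions mentioned above, the microcode field of /proc/cpuinfo can be parsed directly. This is a minimal sketch; the function names are hypothetical, and the 0x56 default is simply the "up to date at the moment" figure from this comment, not an official threshold.

```python
import re


def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect every distinct microcode revision listed in /proc/cpuinfo text."""
    pattern = r"^microcode\s*:\s*(0x[0-9a-fA-F]+)"
    return {int(rev, 16) for rev in re.findall(pattern, cpuinfo_text, re.MULTILINE)}


def has_old_microcode(cpuinfo_text: str, minimum: int = 0x56) -> bool:
    """True if any logical CPU reports a revision below the given minimum."""
    return any(rev < minimum for rev in microcode_revisions(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
        print(sorted(hex(rev) for rev in microcode_revisions(text)))
        print("older than 0x56:", has_old_microcode(text))
    except OSError:
        print("could not read /proc/cpuinfo")
```

A result below the minimum does not prove that microcode is the cause of the freezes; it only shows that the vendor BIOS ships an older revision.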
On vacation, expect no response from me, sorry for the inconvenience.
I am having the same problem on a late 2015 Dell XPS 13 (9350), which also has a Skylake chipset. I have the latest Dell BIOS update (1.1.7), resulting in microcode revision 0x5e (according to /proc/cpuinfo), and I am running Arch with more or less HEAD of torvalds (a881643, somewhere between 4.4-rc6 and 4.4-rc7). The intel_idle.max_cstate=0 argument suppresses my problem, so I believe this is the same bug. If updating to rc7, which is compiling as we speak, fixes the problem, I will post an update; unless I say otherwise, assume the problem persists in rc7. I am happy to provide more information if anyone can think of something that can help resolve the problem.
(In reply to Johannes Larsen from comment #23) > the skylake chipset. I have the latest Dell BIOS update (1.1.7) resulting in > microcode revision 0x5e(according to /proc/cpuinfo), and I am running ARCH > with more or less HEAD of torvalds (a881643, somewhere between 4.4-rc6 and > 4.4-rc7). Thanks for the report. This likely means we do have an intel-idle issue, instead of a firmware or cpu microcode issue.
Hi, Wendy, do we have a i7 6700hq in hand ? thanks. Yu
(In reply to Chen Yu from comment #25) > Hi, Wendy, > do we have a i7 6700hq in hand ? thanks. > Yu We have SKL I7 6700 CPU, but it was installed on the reference platform board, not product from market
The exact same behavior happens with an Asus GL552V, so it seems the bug is not specific to the MSI GE72 6QF. Same CPU, exact same problem with 4.4. acpi=off is enough to boot.

There are some logs and information on the system here: https://bbs.archlinux.org/viewtopic.php?id=206790 (jump straight to the logs, since the beginning is just me complaining about the lack of HID drivers and explaining how newer kernels don't work).

I'll try to attach the logs I get so that you can tell whether it is the same bug or a different one. I'm sorry, but I am not competent enough to tell myself. Ask for any information you need :)
Created attachment 198431 [details] dmesg w/ acpi=off on GL552V
Created attachment 198441 [details] dmidecode w/ acpi=off on GL552V
Created attachment 198451 [details] lscpu w/ acpi=off on GL552V
(In reply to Ludovic Magerand from comment #11)
> I investigated this more in depth this week-end.
>
> First, there was a EC firmware update on MSI website that I missed when I
> flashed the bios, so I applied it, but it haven't change a single thing, not
> even the ACPI tables.
>
> I also switched to the latest 4.4-rc5 kernel version and did a full review
> of the configuration, the new configuration is given in previous
> attachements, along with the new dmesg for boot options acpi=off and nolapic.
>
> I tried to use the microcode update module of the kernel, but it seems that
> the latest intel microcode data package does not contain the microcode for
> this processor, when running iucode_tool to generate an initrd file, I get
> iucode_tool -S --write-earlyfw=/boot/ucode.cpio /lib/firmware/intel-ucode/*
> iucode_tool: system has processor(s) with signature 0x000506e3
> iucode_tool: No valid microcodes were selected, nothing to do...
>
> I have been able to finaly decompile the ACPI tables, which allowed me to
> make a guess on what the problem might be.
> There is two error in the kernel log when using nolapic that seems big
> enought to me to result in real troubles.
>
> First error is
> [ 0.000027] ACPI: Core revision 20150930
> [ 0.019261] ACPI Error: [\_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup
> failure, AE_NOT_FOUND (20150930/dswload-210)
> [ 0.019268] ACPI Exception: AE_NOT_FOUND, During name lookup/catalog
> (20150930/psobject-227)
> [ 0.019294] ACPI Exception: AE_NOT_FOUND, (SSDT:xh_rvp11) while loading
> table (20150930/tbxfload-193)
> [ 0.026871] ACPI Error: 1 table load failures, 9 successful
> (20150930/tbxfload-214)
>
> The error itself appears to happen when parsing the SSDT5 table (xh_rvp11)
> which seems to be related to USB/XHCI.
> It says that the device HS11 is not found, but when I decompiled the SSDT5
> table, HS11 is resolved as an external DeviceObj, coming from the DSDT
> probably.
> iasl has no problem to compile this file back without any errors.
>
> I guess the error happens because in the DSDT, the HS11 Device is created
> inside an If at the root of the table (meaning not in any Scope or Method,
> or nothing) :
> If (LEqual (PCHV (), SPTH))
> {

ACPICA upstream has a commit to deal with this kind of module-level code.
I'm working to correct it and enable it for Linux.
Maybe you can wait a while and try again with 4.6 kernels.

> Scope (_SB.PCI0.XHC.RHUB)
> {
> Device (HS11)
> {
> Name (_ADR, 0x0B) // _ADR: Address
> Device (CAM0)
> It seems to be related to the laptop webcam by the way (the webcam appears
> as Bus 001 Device 011 in lsusb), which can be deactived by a switch and is
> deactivated at boot time.
> The PCHV Method is declared at the root of the table also
> Name (SPTH, One)
> Name (SPTL, 0x02)
> Method (PCHV, 0, NotSerialized)
> {
> If (LEqual (PCHS, One))
> {
> Return (SPTH) /* \SPTH */
> }
>
> If (LEqual (PCHS, 0x02))
> {
> Return (SPTL) /* \SPTL */
> }
>
> Return (Zero)
> }
> PCHS appears in
> OperationRegion (PNVA, SystemMemory, PNVB, PNVL)
> Field (PNVA, AnyAcc, Lock, Preserve)
> {
> RCRV, 32,
> PCHS, 16,
> PCHG, 16,
> also at the root of the table.
> If I get it correctly, it means that PCHS is some value that is read from
> the memory and/or hardware ?
> So it might be possible that it is not already initialized when the DSDT
> table is loading ? Or if it corresponds to the activation status of the
> webcam, it might be deactivated and being SPTL. It would make the device not
> created when parsing the DSDT, and result in the error later when parsing
> the SSDT5 table.

Current ACPICA's AML interpreter won't execute the above "If" block before loading SSDT5.

>
> For the second error, it might be the same kind of problem.
> [ 0.204210] ACPI : EC: EC description table is found, configuring boot EC
> [ 0.204224] ACPI : EC: EC started
> [ 0.213212] ACPI Error: [^^^PEG0.PEGP.EASP] Namespace lookup failure,
> AE_NOT_FOUND (20150930/psargs-359)
> [ 0.213218] ACPI Error: Method parse/execution failed
> [\_SB.PCI0.LPCB.EC._REG] (Node ffff8804730d2af0), AE_NOT_FOUND
> (20150930/psparse-542)
> [ 0.213231] ACPI : EC: Fail in evaluating the _REG object of EC device.
> Broken bios is suspected.
> [ 0.217187] ACPI Error: [^^^PEG0.PEGP.EASP] Namespace lookup failure,
> AE_NOT_FOUND (20150930/psargs-359)
> [ 0.217192] ACPI Error: Method parse/execution failed
> [\_SB.PCI0.LPCB.EC._REG] (Node ffff8804730d2af0), AE_NOT_FOUND
> (20150930/psparse-542)
>
> Apparently the ECDT is loading correctly and when activating it, using the
> _REG method of the EC device defined in the DSDT, it fails because EASP is
> not found.

For ECDT, _REG is not required to be evaluated.
This is a bug in the current EC driver, and I have a patch to correct it.
Again, you should wait for the 4.6 kernels and retry.

> When I decompile the DSDT, using the SSDTs as external tables with the -e
> switch, ^^^PEG0.PEGP.EASP is resolved as an external
> _SB_.PCI0.PEG0.PEGP.EASP of type UnknownObj.
> I think it comes from the SSDT6 table (SaSsdt), which seems to be related
> mostly to PCI and graphical devices.
> Note that I can't compile back neither the DSDT nor the SSDT6 table because
> when decompiling it there are some unresolved external methods.
>
> In the SSDT6 table, EASP is defined inside an If at the root of the table
> If (CondRefOf (\_SB.PCI0.PEG0.PEGP))
> {
> Scope (\_SB.PCI0.PEG0.PEGP)
> {
> OperationRegion (PCIS, PCI_Config, Zero, 0x0100)
> Field (PCIS, AnyAcc, NoLock, Preserve)
> {
> PVID, 16,
> PDID, 16,
> Offset (0x88),
> EASP, 2,
> \_SB.PCI0.PEG0.PEGP is resolved as a external DeviceObj, which is defined in
> the DSDT, without any condition this time.

It's just because _REG is evaluated before executing this block.
I think it can be solved by the EC fix.

>
> I have no idea if my reasoning is correct, and/or if it might help solve the
> problem, and how if so.
> But again, if you need more information I'm still available.

I'm not sure. You can wait and try.
I hope the intel_idle problem is just because of the SSDT5 loading failure.

Thanks
-Lv
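The ordering problem described in this reply (the root-level "If" block not being executed before SSDT5 is loaded) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not ACPICA code, the class and function names are hypothetical, and the condition PCHV () == SPTH is simply assumed to be true.

```python
class ToyNamespace:
    """Toy ACPI namespace: a set of object paths plus deferred module-level blocks."""

    def __init__(self, execute_module_level_code):
        self.objects = set()
        self.deferred = []
        self.execute_module_level_code = execute_module_level_code

    def load_table(self, defines=(), conditional_blocks=(), externals=()):
        """Load one table and return the external names that failed to resolve."""
        self.objects.update(defines)
        for condition, names in conditional_blocks:
            if self.execute_module_level_code:
                if condition():
                    self.objects.update(names)
            else:
                # Deferred: nothing inside the If block exists yet.
                self.deferred.append((condition, names))
        return [name for name in externals if name not in self.objects]


HS11 = r"\_SB.PCI0.XHC.RHUB.HS11"


def unresolved_after_boot(execute_module_level_code):
    ns = ToyNamespace(execute_module_level_code)
    # DSDT: HS11 only exists inside a root-level If (condition assumed true here).
    ns.load_table(defines={r"\_SB.PCI0.XHC.RHUB"},
                  conditional_blocks=[(lambda: True, {HS11})])
    # SSDT5 (xh_rvp11) refers to HS11 as an External.
    return ns.load_table(externals=[HS11])


print(unresolved_after_boot(False))  # HS11 is missing, like the AE_NOT_FOUND case
print(unresolved_after_boot(True))   # nothing is missing
```

In the model, deferring the block reproduces the lookup failure, and executing it at load time makes the failure disappear, which matches the explanation given for the HS11 error.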
Well, as the same freezing problem seems to appear on other laptop models and from other manufacturers, I would now think that this bug is really related to the CPU, and not to the ACPI tables (or interpreter), especially since disabling the intel_idle driver fixes the freezing problem completely.

But it is good to know that the ACPI interpreter will be corrected, because I guess that the ACPI errors might cause other troubles (for example, Xorg segfaults when I try to use the nvidia driver and device). Waiting for 4.6 is a long way off, though; we have not even reached the first release of 4.4.

I don't think that the problem causing the freezes is in the CPU microcode, because if it were, I guess it would also cause trouble for the OS installed by default on the laptop, and that is not the case. The only thing that seems to have problems under that OS is the Intel integrated graphics, whose driver crashes frequently when using the nvidia device (for games or for CUDA computation).

But if Intel releases an update to this microcode, I will test it. For now, according to cpuinfo, I'm running version 0x39, which comes from the latest BIOS provided by MSI, and the last time I checked there was no update available from Intel.
Ludovic, just one small detail. Would you mind picking up a Mint 17 (or Ubuntu 14.04, I guess the result would be the same) live CD and trying to boot from it using only nouveau.modeset=0 and nothing else? In my specific case it boots, so this may be a regression. Arch with 4.2 (December ISO) also boots, this time with i915.preliminary_hw_support=1 (besides turning off modeset for nouveau), and the same OS with 4.3 doesn't. If this bug is a regression, it might be easier to find.
Hi,

Here are my test results with intel_idle.max_cstate set to each value:

cstate | booting | wake up after sleep mode
0      | T       | T
1      | T       | F
2      | T       | F
3      | T       | T
4      | T       | T/F (sometimes doesn't wake up)
6      | T       | F

So I can use 0 or 3 for now.
Please let me know if you need more detailed information.
In my case the problem seems to be reproducible with intel_idle.max_cstate >= 3, and, as opposed to Denis, whether I am booting normally, from hibernation or from suspend does not seem to make an impact. A post, [1], in a Dell forum thread about running linux on the Dell 9350 laptop suggested adding i915 as an early loaded module in the initramfs. I tried adding it, and by doing so I am not able to reproduce the problem when intel_idle.max_cstate is unset (I have not tried setting it, but presumably that also works). So I believe this problem might be caused by loading the i915 module when the CPU is idling, or maybe if it changes C-state during loading. [1]: http://en.community.dell.com/techcenter/os-applications/f/4613/t/19659067?pi22229=8#20859687
I am facing a similar problem on my new MSI GE62 6QC (Skylake i7 6700HQ, Intel HM170). The only difference is that when the kernel freezes, my screen stays ON. When the system is started with no extra options and stops, the three lines on the screen are as follows:
[0.17xxxxx] ACPI; EC: Fail in evaluating the _REG object of EC device. Broken BIOS is suspected.
[4.66xxxxx] nouveau E[ PIBUS][0000:01:00.0] HUB0: 0x6013d4 0x00005700 (0x1f408200)
[4.66xxxxx] nouveau E[ PIBUS][0000:01:00.0] HUB0: 0x10ecc0 0xffffffff (0x1d40822c)

The intel_idle.max_cstate=0 directive takes the boot process much further, but even this does not bring my system up. At some point the normal boot messages are interrupted by a bunch of lines like this:
… apparmor.service
[ 83.635044] iwlwifi 0000:02:00.0: Unsupported splx structure
[ 108.098261] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [plymouthd:231]
[ 136.093973] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [plymouthd:231]
[ 142.341017] INFO: rcu_sched self-detected stall on CPU { 5} (t=15000 jiffies g=1180 c=1179 q=0)
[ 168.098261] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [plymouthd:231]
[ 196.098261] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [plymouthd:231]
[ 224.098261] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [plymouthd:231]
[ 240.098052] INFO: task systemd:1 blocked for more than 120 seconds.
[ 240.098656] Tainted: G L 4.2.0-16-generic#19Ubuntu
[ 240.099262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.099920] INFO: task kworker/0:1:78 blocked for more than 120 seconds.

Lines like this repeat; only the time and CPU number change, and it seems this would go on forever.

Booting with acpi=off brings the system up, but since some devices (at least my touchpad) become inoperable with this option, I didn't run it for very long; I just grabbed dmesg, dmidecode, lspci and cpuinfo.
If it helps the investigation, I'll happily provide any required info or run kernels with options of interest, so please feel free to involve me. Regards!
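When comparing several such logs, it can help to count the soft lockup reports per CPU and task instead of reading them line by line. The following is a minimal sketch; the regular expression is an assumption based on the lines quoted in this comment and tolerates the missing spaces of a hand-copied log.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#5 stuck for 22s! [plymouthd:231]"
# and the hand-typed variant "BUG:soft lockup- CPU#5 ...".
SOFT_LOCKUP = re.compile(
    r"BUG:\s*soft lockup\s*-\s*CPU#(\d+) stuck for (\d+)s! \[([^\]]+):(\d+)\]"
)


def count_soft_lockups(log_text: str) -> Counter:
    """Count soft lockup reports, keyed by (cpu number, task name)."""
    return Counter(
        (int(m.group(1)), m.group(3)) for m in SOFT_LOCKUP.finditer(log_text)
    )


if __name__ == "__main__":
    import sys

    for (cpu, task), hits in count_soft_lockups(sys.stdin.read()).most_common():
        print(f"CPU#{cpu} {task}: {hits}")
```

It could be fed with the output of dmesg on a system that eventually boots, or with a log transcribed from a photo of the screen.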
It seems that many people have this problem when trying to boot. Can you help check whether the system boots with the following appended to the command line (other command-line options remain unchanged): 'init=/bin/bash nomodeset text'? Or do you have serial output?
None. Just [0.17xxxxx] ACPI; EC: Fail in evaluating the _REG object of EC device. Broken BIOS is suspected. And blinking cursor at the beginning of next line. If you meant serial port - unfortunately no, it is not available on my laptop...
Hello,

I have an MSI GS40 6QE with a Skylake 6700HQ.

I was able to install and boot Debian jessie with the default 3.16 kernel. I get "ACPI; EC: Fail in evaluating the _REG object of EC device. Broken BIOS is suspected." but boot and Xorg with nouveau started successfully. I didn't notice other problems, but I didn't stay long on this kernel as not all hardware is supported (LAN/Wi-Fi).

I tried jessie's backported 4.2 and 4.3 kernels and get the same ACPI error. But those kernels are unusable because of "NMI watchdog: BUG:soft lockup- CPU#5 stuck for 22s!". acpi=off stops the lockup issue but also disables hardware (touchpad).

I'll keep going with 4.4, and I can help if you need more hands to test kernel patches or gather information about this particular hardware.
Dear engineers, can we have any more information on how you're progressing with this particular issue?

It severely cripples at least two (three if the Dell XPS has the same problem) of the best-selling laptops on the market today, all from different brands. The bug has the highest priority level (if that is what P1 means), so I assume you're working on it. Have you figured anything out yet?

I came here to post what I think is more information, but Andrey and "small+kernel@pasglop.net" already said basically the same thing.

Here are some photographs (the first two) of the system with "udev.log-priority=debug nomodeset i915.modeset=0 debug ignore_loglevel earlyprintk=efi,keep log_buf_len=16M"
http://imgur.com/a/cYHC3

The photos after those two are for "nomodeset i915.modeset=0 ignore_loglevel earlyprintk=efi,keep" I think, but I'm not certain.

Both cases loop like that forever, printing the same messages every time but in a different order.
(In reply to Cláudio Pereira from comment #40)
> Dear engineers, can we have any more information on how you're progressing
> with this particular issue?
>
> It severely cripples at least two (three if the Dell XPS has the same
> problem) of the best-selling laptops on the market today, all from different
> brands. The bug has the highest priority level (if it is P1) so I assume
> you're working on it. Did you already figured out anything?
>
> I came here to post what I think to be more information, but Andrey and
> "small+kernel@pasglop.net" already said basically the same.
>
> Here are some photographs (the first two) of the system with
> "udev.log-priority=debug nomodeset i915.modeset=0 debug ignore_loglevel
> earlyprintk=efi,keep log_buf_len=16M"
> http://imgur.com/a/cYHC3
>
> The photos after those two are for "nomodeset i915.modeset=0 ignore_loglevel
> earlyprintk=efi,keep" I think, yet I'm not certain.
>
> Both cases loop like that forever saying the same every time but in
> different orders.

According to your first picture, it seems that CPU0 is blocked while initializing the clock source. Is that the first warning message appearing on the monitor? I want to make sure whether it is the root cause of this problem. How about adding 'notsc' to your command line?

Besides, if you have time, can you please test whether comment #37 works for you (you might need to recompile the kernel with USB 2.0/USB 3.0 built in)?
A short update on the "init=/bin/bash" command-line option.

With both options "intel_idle.max_cstate=0" and "init=/bin/bash" used at the same time and the "quiet splash" keywords removed, I was able not just to boot Ubuntu 15.10 but also to install it.

Sometimes the installed system encounters a kernel panic on boot. A trailing slash in "init=/bin/bash/" helps against this...
(In reply to Andrey from comment #42)
> A short update on "init=/bin/bash" command line option.
>
> With both options "intel_idle.max_cstate=0" and "init=/bin/bash" used at the
> same time and "quiet splash" keywords removed, I was able not just to boot
> Ubuntu 15.10 but also to install it.
>
> Sometimes the installed system encounters kernel panic on boot. A trailing
> slash in "init=/bin/bash/" helps against this...

Hi, do you mean that if the command line is "init=/bin/bash/ nomodeset text", the system cannot boot, while with "intel_idle.max_cstate=0 init=/bin/bash/ nomodeset text" everything goes well? I think we should first confirm whether this is related to graphics or is actually caused by C-states.
Chen Yu, exactly (I checked once again).
The complete initial boot options string in my case is:
"file=/cdrom/preseed/ubuntu-mate.seed boot=casper initrd=/casper/initrdlz quiet splash ---"
1. Inserting "init=/bin/bash nomodeset text" before "---" leads to the well-known error "ACPI: EC: Fail in evaluating _REG object...", then boot stops. I also get the same result with only the option "intel_idle.max_cstate=0".
2. Putting "intel_idle.max_cstate=0 init=/bin/bash/ nomodeset text" at the same place in the boot options string lets the system boot fine in my case.
Thank you!
Sorry, I was on vacation the last two weeks and had only Wi-Fi to connect, but I hadn't installed the Wi-Fi tools, so I didn't work on this bug. Moreover, as I needed a fully working Linux environment and access to my nvidia GPU to do my work, I installed VirtualBox on the default OS so I can use that OS to access the nvidia GPU and still have my Linux system to work in.

Anyway, I think we have 3 different bugs in all this:
* An ACPI bug that seems to affect mostly the MSI laptops, which prevents the EC from being correctly initialized (but it seems to work more or less correctly anyway). This bug should be fixed one day by some updates to ACPICA.
* A bug in the C-state handling on the Skylake architecture that causes complete system freezes, and which can be worked around using "intel_idle.max_cstate=0".
* Many bugs in the i915 driver, including one with modesetting for the Skylake architecture. This one doesn't happen on my system; I guess that's because I'm running the latest 4.4 rc and compiled the kernel with the option to include preliminary hardware support (which does include some fixes to the modesetting code). There are still other bugs in this driver anyway, as I have some segfault traces in the logs related to it, but they don't make the system freeze; I can run it for hours.

Andrey, I don't know which kernel version you are running, but I suggest you try enabling the preliminary support in the i915 driver using the kernel option i915.preliminary_hw_support=1
Hi, Ludovic. The most recent kernel I found among Linux distributions was kernel 4.2, featured in the Ubuntu (MATE) distribution v.15.10. If there are downloadable Linux distributions with kernels > 4.2, please let me know and I will try them. Enabling support in the i915 driver with the "i915.preliminary_hw_support=1" option doesn't seem to change anything...
You can try the Archlinux livecd; it comes with a 4.3.3 kernel which, as far as I know, is one of the most up to date.
By the way, during lunch I realized I had misunderstood what you were trying to do in the last comments: I thought that some people were having trouble with cstates disabled and that disabling modeset removed the problem. But it seems you were trying to boot with cstates enabled and modeset disabled. Therefore I did the following test: boot kernel 4.4 rc 5 (with the config attached previously) with intel.idle.max_cstate from 1 to 8 (which is the size of the array skl_cstates in drivers/idle/intel_idle.c) and i915.modeset=0. The result is as follows:
* cstate 1 to 7 were able to boot; in dmesg I have "max_cstate 7 reached" and no more errors about the i915 driver segfault in the kernel log
* cstate 8 caused a complete kernel freeze as previously
As I think there are two different bugs involved here, I also tried to boot with only intel.idle.max_cstate from 8 to 1. As previously, cstate 8 again caused a complete kernel freeze, but cstate 7 to 1 were able to boot correctly, with the message "max_cstate N reached" in the kernel logs and the segfault of the i915 driver being back. So I think there really is a problem in the intel cstate code, it is not related to the i915 driver, and the problem is probably just with cstate 8 (named C10-SKL).
During all these tests, I only ran the kernel for a few minutes (the time to check that /proc/cpuinfo was correct and what was in dmesg). Tonight when I leave my office, I will boot with max_cstate=7 and I will see tomorrow if it is still up. I will also create a bug report for the i915 driver, as I now know that the bug seems to be related to modeset.
(In reply to Chen Yu from comment #41)
> (In reply to Cláudio Pereira from comment #40)
> > Dear engineers, can we have any more information on how you're progressing
> > with this particular issue?
> >
> > It severely cripples at least two (three if the Dell XPS has the same
> > problem) of the best-selling laptops on the market today, all from different
> > brands. The bug has the highest priority level (if it is P1) so I assume
> > you're working on it. Did you already figured out anything?
> >
> > I came here to post what I think to be more information, but Andrey and
> > "small+kernel@pasglop.net" already said basically the same.
> >
> > Here are some photographs (the first two) of the system with
> > "udev.log-priority=debug nomodeset i915.modeset=0 debug ignore_loglevel
> > earlyprintk=efi,keep log_buf_len=16M"
> > http://imgur.com/a/cYHC3
> >
> > The photos after those two are for "nomodeset i915.modeset=0 ignore_loglevel
> > earlyprintk=efi,keep" I think, yet I'm not certain.
> >
> > Both cases loop like that forever saying the same every time but in
> > different orders.
>
> According to your first picture, it seems that CPU0 blocked at initializing
> the clock source. Is it the first warnning message appearing on the monitor?
> I want to make sure if it is the first cause for this problem.
> How about adding 'notsc' in your command line?
> besides, if you have time, can you please help test if #Comment 37 works for
> you(you might need to recompile the kernel with USB2.0/USB3.0 built-in.

'notsc' apparently does nothing. The first error that appears with it is "NMI watchdog; Watchdog detected hard LOCKUP on cpu 0" 0.69 seconds after booting. Yet I'm not sure if it is the same error that appeared without 'notsc'.
There's also a warning which I'm not sure if is related "Using host bridge windows from ACPI: if necessary, use "pci=nocrs" and report a bug" and "[Firmware bug]: ACPI: BIOS _OSI(Linux) query ignored" Neither comment 37 nor setting the cstate work. Only acpi=off did the trick so far. I tried to compile a kernel with USB built in, but unfortunately after it compiled, it had some trouble installing. I'm not really experienced building kernels, and unfortunately don't have the time to learn right now. If you want I can give you access to this machine. I can't figure out what is going wrong but am desperate to have it working since I need it for college. Anything you need just ask.
I had the same issue.
MSI GE72-6QD
Skylake 6700HQ
Dual graphics - Nvidia and Intel
Mint 17.3 KDE
Ubuntu 14.04/15.04/15.10
I could not boot any kernel 4.3 or higher without a crash before being able to log in. I found elsewhere about the intel_idle fix passed as a kernel boot flag. This worked for me. With the MSI bios update .107 they introduced the ability to disable "cstates". I am now using that instead of passing boot flags.
Mint 17.3 KDE - Kernel 4.4-rc8 from the drm-intel-nightly branch (self compiled)
Everything boots as long as cstates are disabled in the bios. I also have the "broken _EC" error. That was introduced for me when I upgraded the bios to .105. Prior to that I did not get that error message, though I was not able to disable cstates in the bios. I upgraded to MSI's .110 bios this morning and have yet to check whether I can enable cstates or not.
Update - I cannot boot with cstates enabled. Still. Microcode for MSI has been updated to 55; with the bios update. Previously it was 39.
As promised, I tested the kernel with only intel.idle.max_cstate=7 yesterday. I ran 4.4-rc5 for about half an hour doing various things (upgrading the kernel to 4.4-rc8, updating the system, ... and also nothing). Then I booted the new 4.4-rc8 kernel and worked for one hour trying to get Xorg working; I had no freezes and no watchdog CPU stall messages in the kernel logs. I left the system up all night, and it was still up and running this morning. Is there a way to check how much time the processor stayed in each cstate, just to be sure that the kernel actually uses them?
As a workaround for the freezes with a 4.4 kernel, I think using the kernel option intel.idle.max_cstate=7 is the best for now. It might also work on 4.3 kernels (which are supposed to support the Skylake processor family), but I don't have any to test currently.
Susan, you should try the workaround I just mentioned; it will enable nearly all cstates, which is probably fine.
Ludovic M. I followed your suggestion of trying to limit cstates to 7. This has worked so far. The kernel flag I had to pass though was "intel_idle.max_cstate=7" So far so good on 4.4-rc8 (drm-intel-nightly).
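The spelling matters here: the flag reported to work is "intel_idle.max_cstate", with an underscore between "intel" and "idle". A minimal Python sketch for checking a command line string (for example the contents of /proc/cmdline) follows; the function names are made up for illustration, and the second helper only looks for the dotted spelling that appeared earlier in this thread.

```python
import re


def max_cstate_from_cmdline(cmdline):
    """Return the integer given to intel_idle.max_cstate, or None if it is absent."""
    for token in cmdline.split():
        match = re.fullmatch(r"intel_idle\.max_cstate=(\d+)", token)
        if match:
            return int(match.group(1))
    return None


def dotted_spelling_tokens(cmdline):
    """Return tokens that use the dotted spelling 'intel.idle.max_cstate='."""
    return [t for t in cmdline.split() if t.startswith("intel.idle.max_cstate=")]
```

Calling max_cstate_from_cmdline(open("/proc/cmdline").read()) on a running system should show whether the limit was actually passed to the kernel.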
(In reply to Ludovic Magerand from comment #51) > Is there a way to check how much time the processor stayed in each cstate, > just to be sure that the kernel actually use them ? /sys/devices/system/cpu/cpuX/cpuidle/stateY/time should tell how long the processor stayed in that idle state. See Documentation/cpuidle/sysfs.txt for more information.
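As a rough illustration of reading those files, here is a small Python sketch. It assumes the standard cpuidle sysfs layout (one stateY directory per idle state, each with name, time and usage files); the function name and the returned tuple layout are choices made for this example only.

```python
from pathlib import Path


def read_idle_states(cpuidle_dir):
    """Return (index, name, time_us, usage) for each stateY directory, ordered by index.

    'time' is the total time spent in the state, in microseconds;
    'usage' is the number of times the state was entered.
    """
    states = []
    for state in Path(cpuidle_dir).glob("state[0-9]*"):
        index = int(state.name[len("state"):])
        name = (state / "name").read_text().strip()
        time_us = int((state / "time").read_text())
        usage = int((state / "usage").read_text())
        states.append((index, name, time_us, usage))
    return sorted(states)
```

For example, read_idle_states("/sys/devices/system/cpu/cpu0/cpuidle") lists the states of the first CPU; a state whose time stays at 0 after a long idle period was never entered.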
Ok, I tested on the gentoo 4.3.3 kernel. After 10 minutes I did 'cat /sys/devices/system/cpu/cpu?/cpuidle/state?/time' and there was a value in every one (the last one for each CPU being a bit higher, but as I didn't do anything stressful on the system, it seems legitimate that the CPU spent more time in the last cstate). So all the cstates from 1 to 7 are working fine on both 4.4 and 4.3.3 kernels. The problem is really just with the last cstate. I think I can't do more to help until someone has a patch to test, so I guess it's up to you :)
Hi, I can confirm that using "intel_idle.max_cstate=7" on 4.4-rc8 (drm-intel-nightly) the system is bootable and wakes up after sleep. But I see some weird warnings in syslog
....
WARNING: CPU: 3 PID: 893 at /home/kernel/COD/linux/drivers/gpu/drm/i915/intel_display.c:13896 intel_prepare_plane_fb+0x269/0x2d0 [i915]()
....
and still
[ 0.251309] ACPI Error: [^^^PEG0.PEGP.EASP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
[ 0.251313] ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC._REG] (Node ffff8804730edaf0), AE_NOT_FOUND (20150930/psparse-542)
[ 0.251325] ACPI : EC: Fail in evaluating the _REG object of EC device. Broken bios is suspected.
[ 0.283299] ACPI Error: [^^^PEG0.PEGP.EASP] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
[ 0.283302] ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC._REG] (Node ffff8804730edaf0), AE_NOT_FOUND (20150930/psparse-542)
[ 0.285500] ACPI: Executed 24 blocks of module-level executable AML code
[ 0.291346] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[ 0.294647] ACPI: Dynamic OEM Table Load:
[ 0.294652] ACPI: SSDT 0xFFFF880470808C00 0003CF (v02 PmRef Cpu0Cst 00003001 INTL 20120913)
[ 0.295490] ACPI: Dynamic OEM Table Load:
[ 0.295495] ACPI: SSDT 0xFFFF880470C52800 0005EA (v02 PmRef Cpu0Ist 00003000 INTL 20120913)
[ 0.297357] ACPI: Dynamic OEM Table Load:
[ 0.297362] ACPI: SSDT 0xFFFF880470C53000 0005AA (v02 PmRef ApIst 00003000 INTL 20120913)
[ 0.298366] ACPI: Dynamic OEM Table Load:
[ 0.298369] ACPI: SSDT 0xFFFF880470FFA000 000119 (v02 PmRef ApCst 00003000 INTL 20120913)
[ 0.302766] ACPI: Interpreter enabled
[ 0.302774] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20150930/hwxface-580)
[ 0.302781] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20150930/hwxface-580)
[ 0.302799] ACPI: (supports S0 S3 S4 S5)
[ 0.302800] ACPI: Using IOAPIC for interrupt routing
I'm using discrete graphics for now.
I tried Intel graphics but had some problems with sleep mode.
Created attachment 199051 [details] cstate=7 & 4.4-rc8
*** Bug 110371 has been marked as a duplicate of this bug. ***
Same problem on the Ghost Pro 6QE (I have the 4K version). After updating the BIOS and EC firmware earlier today, I can finally boot 4.x kernels. With Ubuntu 15.10, the Intel graphics stack, and the 20160109 drm-intel-nightly 4.4.0-994 kernel, setting the cstates parameter gets a boot. However, if I do not set nouveau.modeset=0, I get a CPU lockup. Using the nvidia-355 drivers (tried 352 and 358 as well) gives me complete functionality of the nvidia card. However, it cannot switch to the Intel GPU for some reason, which I suspect is another bug in the i915 driver (it used to switch fine on Ubuntu 14.04 and the 3.19 kernel).
I updated the bios of my GL552VW , its at version 216 now, I think it was at 210. After doing so I gave the newer kernels another go. Boot parameters: "rw intel_idle.max_cstate=7 nouveau.blacklist=1 acpi_osi=! acpi_backlight=native" Still doesn't work on 4.3, BUT it does on 4.4. It all works just fine. Of course that there are no proprietary drivers for 4.4 just yet (not that I like them, but nouveau has no HW accel in the 960M) so the discrete GPU isn't used, but the Intel APU totally works, suspend and resume included. I'll keep an eye on this bug to know about the fix, but the workaround works. What am I losing by limiting cstates to 7?
(In reply to Cláudio Pereira from comment #59) > Of course that there are no proprietary drivers for 4.4 just yet (not that I > like them, but nouveau has no HW accel in the 960M) so the discrete GPU > isn't used, but the Intel APU totally works, suspend and resume included. I'm using the 4.4.4-994 drm-intel-nightly kernel and nvidia-355 (352, 358, and 361 work too) and nvidia HW acceleration is fine. However now, on the Intel GPU it cannot suspend or logout, the kernel just hangs when attempting that. I fixed my previous problem of it not switching to the Intel GPU by updating my GuC firmware to the latest version that Intel provides on 01.org
(In reply to Cláudio Pereira from comment #59)
> After doing so I gave the newer kernels another go.
> Boot parameters: "rw intel_idle.max_cstate=7 nouveau.blacklist=1 acpi_osi=!
> acpi_backlight=native"

I don't think it's a good idea to boot with acpi_osi=! as it basically tells the ACPI firmware that you don't have any OS installed, and many ACPI firmwares check which version of the OS is running to enable/disable some parts. Have you tried removing it?

> Still doesn't work on 4.3, BUT it does on 4.4. It all works just fine.
> Of course that there are no proprietary drivers for 4.4 just yet (not that I
> like them, but nouveau has no HW accel in the 960M) so the discrete GPU
> isn't used, but the Intel APU totally works, suspend and resume included.
>
> I'll keep an eye on this bug to know about the fix, but the workaround works.
> What am I losing by limiting cstates to 7?

You won't lose much, just the deepest cstate. It means that when idle, your processor cores won't be put into the deepest power saving state (which seems to be so deep that the kernel can't get out of it :D). It will probably affect battery life a little, but less than using intel_idle.max_cstate=0, which reverts to the ACPI states, which are far less efficient at power saving.
Update: issue is still present in both stable 4.4 and the drm-intel kernel from http://cgit.freedesktop.org/drm-intel
The problem is still relevant.
Kernel: 4.4.0-3-ARCH
Intel i7 6700HQ microcode: CPU0 sig=0x506e3, pf=0x20, revision=0x39
MSI PE70 6QE
BIOS E1795IMS.10C 12/10/2015
GRUB_CMDLINE_LINUX_DEFAULT="intel_idle.max_cstate=7 acpi_osi=Linux acpi_backlight=native"
Brightness control is not working.
Without intel_idle.max_cstate the OS freezes.
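Since several reports in this thread compare microcode revisions, a tiny Python sketch for pulling the revision out of a dmesg "microcode:" line may help when collecting data. The function name is hypothetical and the pattern only assumes the line format shown in this report, with the revision printed in hexadecimal.

```python
import re


def microcode_revision(dmesg_line):
    """Return the microcode revision as an integer, or None if the line has none."""
    match = re.search(r"microcode:.*\brevision=0x([0-9a-fA-F]+)", dmesg_line)
    return int(match.group(1), 16) if match else None
```

With the line from this report the function returns 0x39, which makes it easy to compare machines before and after a BIOS update.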
Created attachment 200461 [details] dmesg_acpi.txt
(In reply to Lev Lybin from comment #63) > The problem still relevant. > Kernel: 4.4.0-3-ARCH > Intel i7 6700HQ microcode: CPU0 sig=0x506e3, pf=0x20, revision=0x39 > MSI PE70 6QE > BIOS E1795IMS.10C 12/10/2015 > GRUB_CMDLINE_LINUX_DEFAULT="intel_idle.max_cstate=7 acpi_osi=Linux > acpi_backlight=native" > Brightness control is not working. Backlight is another problem, please file a new bug for it and provide dmesg/acpidump there, thanks.
(In reply to Lev Lybin from comment #63) > The problem still relevant. > Kernel: 4.4.0-3-ARCH > Intel i7 6700HQ microcode: CPU0 sig=0x506e3, pf=0x20, revision=0x39 > MSI PE70 6QE > BIOS E1795IMS.10C 12/10/2015 > GRUB_CMDLINE_LINUX_DEFAULT="intel_idle.max_cstate=7 acpi_osi=Linux > acpi_backlight=native" > Brightness control is not working. > Without intel_idle.max_cstate OS freezes. Same processor. Same laptop manufacturer. MSI GE72. Only use the cstate flag. For me the OSI flag doesn't do anything and the backlight flag breaks brightness. When using only the cstate flag everything appears to work; including brightness.
Yeah, the OSI and backlights flags are unnecessary. Your GRUB_CMDLINE_LINUX_DEFAULT should just be "intel_idle.max_cstate=7 nouveau.modeset=0". Also you should update your BIOS from MSI since they updated the microcode to v49 or v55 (depending on your model). We should all be getting another microcode update now that Intel's patching the Skylake errata.
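For anyone scripting this change, here is a minimal Python sketch that adds or replaces one parameter inside a GRUB_CMDLINE_LINUX_DEFAULT line. It works on the text of a single line only; the function name is made up for this example, and it assumes the value is wrapped in double quotes as in the reports above.

```python
import re


def set_kernel_param(grub_line, param):
    """Return grub_line with param set, replacing any earlier setting of the same key."""
    match = re.fullmatch(r'(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")', grub_line.strip())
    if not match:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")
    prefix, body, suffix = match.groups()
    key = param.split("=", 1)[0]
    # Drop any existing setting of the same key, then append the new one.
    params = [p for p in body.split() if p.split("=", 1)[0] != key]
    params.append(param)
    return prefix + " ".join(params) + suffix
```

After changing /etc/default/grub, the GRUB configuration still has to be regenerated (for example with update-grub on Ubuntu, or grub-mkconfig -o /boot/grub/grub.cfg) before the new command line takes effect.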
(In reply to Rashed Abdel-Tawab from comment #67) > Yeah, the OSI and backlights flags are unnecessary. Your > GRUB_CMDLINE_LINUX_DEFAULT should just be "intel_idle.max_cstate=7 > nouveau.modeset=0". Also you should update your BIOS from MSI since they > updated the microcode to v49 or v55 (depending on your model). We should all > be getting another microcode update now that Intel's patching the Skylake > errata. I've replaced nvidia to nouveau, added nouveau.modeset=0, backlights works fine. Thanks. I have the latest version of the BIOS (v39 microcode), waiting...
Great. Has anyone else encountered the kernel panic when trying to close the X session while on the Intel GPU? I'll try to get a log for it tomorrow since I know it's useless reporting it without proper logs.
Just now I've updated the microcode on v55. The problem is relevant. Do you need some information e.g. dmesg, acpidump etc?
I have a new problem if use intel driver: skype video blinking blue during incoming call. if replace intel on modesetting, backlights doesn't works, but no problem with video. Kernel 4.4 https://bugs.launchpad.net/ubuntu/+source/skype/+bug/1078068/comments/24 https://www.reddit.com/r/archlinux/comments/41ht3e/annoying_skylake_issue_plus_skype_issue/
Is there a status on this issue? I am still facing issues with the current kernel... In case you want logs from my machine (ASUS ROG GL552VW, i7-6700 HQ, GTX 960M) just let me know :)
(In reply to Sjoerd Furth from comment #72) > Is there a status on this issue? I am still facing issues with the current > kernel... > > In case you want logs from my machine (ASUS ROG GL552VW, i7-6700 HQ, GTX > 960M) just let me know :) I am on an ROG GL752VW, i7-6700, this issue seems resolved for me as of 4.5rc1.
The problem is relevant for me on 4.5rc4.
The problem is still here in 4.5.0-rc6
If you mean the problem in comment 11, it is fixed: http://www.spinics.net/lists/linux-acpi/msg63550.html But the series contains things that need more time to review, so you have to wait a bit longer. Thanks and best regards -Lv
Linux PE70 4.5.0-rc6-mainline #1 SMP PREEMPT Tue Mar 1 22:41:17 ICT 2016 x86_64 GNU/Linux
Intel i7 6700HQ microcode: CPU0 sig=0x506e3, pf=0x20, revision=0x55
MSI PE70 6QE
Without intel_idle.max_cstate OS freezes.
And I still get these messages:
[ 0.021271] ACPI Error: [\_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup failure, AE_NOT_FOUND (20160108/dswload-210)
[ 0.029990] ACPI Error: 1 table load failures, 9 successful (20160108/tbxfload-215)
[ 0.243388] ACPI Error: [^^^PEG0.PEGP.EASP] Namespace lookup failure, AE_NOT_FOUND (20160108/psargs-360)
[ 0.243393] ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC._REG] (Node ffff8804730e34b0), AE_NOT_FOUND (20160108/psparse-542)
[ 3.841840] acpi_call: Cannot get handle: Error: AE_NOT_FOUND
[ 3.851290] acpi_call: Cannot get handle: Error: AE_NOT_FOUND
@Lv Zheng , thank you.
(In reply to Lv Zheng from comment #76)
> If you mean the problem in comment 11, it is fixed:
> http://www.spinics.net/lists/linux-acpi/msg63550.html
> But the series contains things that need more time to review, so you have to
> wait a bit longer.
>
> Thanks and best regards
> -Lv

Dear Lv,
First of all, thanks for your efforts. Only I am not quite sure which part will be fixed by that link. Is it that the boot flag intel_idle.max_cstate won't be needed anymore, or does it have something to do with the ACPI tables (or both)?
With kind regards,
Sjoerd Furth
I believe Lv linked that ACPI bug as a fix for the MSI laptops that are presenting issues with ACPI.
Let's focus this report on the boot failure that requires "intel_idle.max_cstate=7" as a workaround. Please file other bug reports for issues not directly related to that failure.
Created attachment 208761 [details]
debug patch to disable c8 + C9 on selected SKL-H systems

Please report if the attached patch allows your system to boot with no "intel_idle.max_cstate=" (or acpi=off) cmdline workaround.

If it is working as intended, you should see something like this in dmesg:

dmesg | grep idle

intel_idle: MWAIT substates: 0x11142120
intel_idle: v0.4.1 model 0x5E
intel_idle: lapic_timer_reliable_states 0xffffffff
intel_idle: SGX present 0x29c6fbf
intel_idle: state C8-SKL is disabled
intel_idle: state C9-SKL is disabled

grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
should show that C8-SKL and C9-SKL are no longer present.

If your BIOS has a SETUP option to enable SGX and you enable it, then you should be able to boot without this patch, and this patch will print another line about SGX being enabled, but you will not see the bit about C8-SKL and C9-SKL being disabled, and you should see them in sysfs using the grep above.

If this patch fails to fix your boot issue, please boot with "intel_idle.max_cstate=7" and show the output from "dmesg | grep idle"
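To make the dmesg check above easy to repeat across machines, a short Python sketch can pull out the states that intel_idle reports as disabled. The function name is made up for illustration; the pattern relies only on the message format quoted in this comment.

```python
import re


def disabled_idle_states(dmesg_text):
    """Return the state names from 'intel_idle: state <name> is disabled' lines, in order."""
    return re.findall(r"intel_idle: state (\S+) is disabled", dmesg_text)
```

On a system where the debug patch took effect, the returned list would be expected to contain C8-SKL and C9-SKL; an empty list suggests the patch did not trigger (or that SGX is enabled, as described above).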
Thank you Len Brown.
I'm successfully running a 4.5-rc7 kernel with your patch and without the "intel_idle.max_cstate=7" I needed before. My hardware is MSI GS40 "phantom" 6QE with i7-6700HQ CPU @ 2.60GHz.

dmesg :
[ 0.000000]Command line: BOOT_IMAGE=/vmlinuz-4.5.0-rc7-phantom root=/dev/mapper/pcsd-root ro text nomodeset
(...)
[ 0.778740] intel_idle: MWAIT substates: 0x11142120
[ 0.778741] intel_idle: v0.4.1 model 0x5E
[ 0.778741] intel_idle: lapic_timer_reliable_states 0xffffffff
[ 0.778743] intel_idle: SGX present 0x29c6fbf
[ 0.778744] intel_idle: state C8-SKL is disabled
[ 0.778745] intel_idle: state C9-SKL is disabled

grep . /sys/devices/system/cpu/cpu0/cpuidle/*/* :
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/residency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:7674964
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:3133
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/residency:2
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:6542990
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:18820
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:MWAIT 0x01
/sys/devices/system/cpu/cpu0/cpuidle/state2/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C1E-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state2/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state2/residency:20
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:12444721
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:22193
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:70
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state3/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state3/residency:100
/sys/devices/system/cpu/cpu0/cpuidle/state3/time:2071866
/sys/devices/system/cpu/cpu0/cpuidle/state3/usage:4613
/sys/devices/system/cpu/cpu0/cpuidle/state4/desc:MWAIT 0x20
/sys/devices/system/cpu/cpu0/cpuidle/state4/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state4/latency:85
/sys/devices/system/cpu/cpu0/cpuidle/state4/name:C6-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state4/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state4/residency:200
/sys/devices/system/cpu/cpu0/cpuidle/state4/time:76102469
/sys/devices/system/cpu/cpu0/cpuidle/state4/usage:68295
/sys/devices/system/cpu/cpu0/cpuidle/state5/desc:MWAIT 0x33
/sys/devices/system/cpu/cpu0/cpuidle/state5/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state5/latency:124
/sys/devices/system/cpu/cpu0/cpuidle/state5/name:C7s-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state5/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state5/residency:800
/sys/devices/system/cpu/cpu0/cpuidle/state5/time:235877869
/sys/devices/system/cpu/cpu0/cpuidle/state5/usage:113583
/sys/devices/system/cpu/cpu0/cpuidle/state6/desc:MWAIT 0x60
/sys/devices/system/cpu/cpu0/cpuidle/state6/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state6/latency:890
/sys/devices/system/cpu/cpu0/cpuidle/state6/name:C10-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state6/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state6/residency:5000
/sys/devices/system/cpu/cpu0/cpuidle/state6/time:488816024
/sys/devices/system/cpu/cpu0/cpuidle/state6/usage:29191

Thanks, and best regards.

sdavid
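A dump like the one above can be checked mechanically. The following Python sketch, with hypothetical function names, assumes the usual output of the grep command (one "path:value" entry per line) and groups the entries by state directory, so the list of exposed state names can be compared against what the patch is expected to leave in place.

```python
def parse_cpuidle_dump(text):
    """Map 'stateN' -> {attribute: value} from 'path:value' lines of a cpuidle grep dump."""
    states = {}
    for line in text.splitlines():
        path, sep, value = line.strip().partition(":")
        parts = path.split("/")
        if not sep or len(parts) < 2:
            continue
        state, attr = parts[-2], parts[-1]
        # Ignore anything that is not a stateN directory (for example the command itself).
        if state.startswith("state") and state[len("state"):].isdigit():
            states.setdefault(state, {})[attr] = value
    return states


def state_names(states):
    """Return the idle state names ordered by state index."""
    ordered = sorted(states, key=lambda s: int(s[len("state"):]))
    return [states[s]["name"] for s in ordered if "name" in states[s]]
```

For the dump in this comment, state_names() would end with C7s-SKL followed directly by C10-SKL, which is what one would expect when C8-SKL and C9-SKL are absent.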
Works fine :) Thanks.
Are we losing anything from our systems by using this patch? I mean, this essentially seems to disable features our processors have, am I wrong? #61 made me think this is all about power saving features, and laptops with this processor are usually power-hungry machines, so every bit counts.
Also, since this is a bugfix, will it get backported into currently supported mainstream kernels? E.g. it would be a shame if the upcoming Ubuntu LTS had trouble dealing with this for having 4.4 instead of 4.6.
There is some information here, page 66: http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/desktop-6th-gen-core-family-datasheet-vol-1.pdf
As I understand it, C7-C10 are similar. We have C7 and C10, as this shows:
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
But I haven't fully understood why C8 and C9 are disabled. Is it a processor bug? Are they disabled by microcode?
(In reply to Lv Zheng from comment #76) > If you mean the problem in comment 11, it is fixed: > http://www.spinics.net/lists/linux-acpi/msg63550.html > But the series contains things that need more time to review, so you have to > wait a bit longer. > > Thanks and best regards > -Lv about the ACPI error : I understood it was a separated issue. Is there a dedicated report for this issue that I could also follow ? I searched bugzilla tracking under ACPI topics but didn't found any reports that match. I can create it, it's just I'm new here and I don't know what's are the rules. Thanks and best regards. sdavid
I think this problem is solved here CTRL+F "Lv Zheng (7)": http://lkml.iu.edu/hypermail/linux/kernel/1603.1/05278.html
(In reply to Lev Lybin from comment #88) > I think this problem is solved here CTRL+F "Lv Zheng (7)": > http://lkml.iu.edu/hypermail/linux/kernel/1603.1/05278.html Thank you !
Re: comment #85 Yes, when intel_idle disables C8 and C9, the OS loses the ability to directly request those idle states. However, we do this only when C10 is enabled. So the processor can still enter C10 (which saves more energy than C8,C9) and the processor can still choose to "demote" those C10 requests to C8,C9 residency if it determines that is a better match for the expected latency. So I don't expect this workaround to have a measurable impact except in academic scenarios. Note that, by comparison, ACPI mode generally exports C1/C7/C10 -- so even with C8, C9 removed from intel_idle, it offers more fine-grain C-state selection than ACPI, which is what Windows uses...
(In reply to Len Brown from comment #82) > Created attachment 208761 [details] > debug patch to disable c8 + C9 on selected SKL-H systems > > Please report if the attached patch allows your system to boot > with no "intel_idle.max_cstate=" (or acpi=off) cmdline workaround. > > If it is working as intended, you should see something like this in dmesg: > > dmesg | grep idle > > intel_idle: MWAIT substates: 0x11142120 > intel_idle: v0.4.1 model 0x5E > intel_idle: lapic_timer_reliable_states 0xffffffff > intel_idle: SGX present 0x29c6fbf > intel_idle: state C8-SKL is disabled > intel_idle: state C9-SKL is disabled > > grep . /sys/devices/system/cpu/cpu0/cpuidle/*/* > should show that C8-SKL and C9-SKL are not longer present. > > If your BIOS has a SETUP option to enable SGX and you enable it, > then you should be able to boot without this patch, and this patch > will print another line about SGX being enabled, but you will not > see the bit about C8-SKL and C9-SKL being disabled, and you should > see them in sysfs using the grep above. > > If this patch fails to fix your boot issue, > please boot with "intel_idle.max_cstate=7" > and show the output from "dmesg | grep idle" Dear Lev, Today I tested the patch on kernel 4.4.5 (Arch current stable). It is also working there. 
[sjoerd@Sjoerd-Laptop-Arch-Linux ~]$ dmesg | grep idle
[ 0.000000] Command line: \boot\vmlinuz-linux-custom root=/dev/sdb5 rw initrd=/boot/initramfs-linux-custom.img
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
[ 0.039644] process: using mwait in idle threads
[ 0.209242] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[ 0.220851] cpuidle: using governor ladder
[ 0.234201] cpuidle: using governor menu
[ 0.390041] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 0.445249] intel_idle: MWAIT substates: 0x11142120
[ 0.445250] intel_idle: v0.4.1 model 0x5E
[ 0.445251] intel_idle: lapic_timer_reliable_states 0xffffffff
[ 0.445252] intel_idle: SGX present 0x29c6fbf
[ 0.445253] intel_idle: state C8-SKL is disabled
[ 0.445254] intel_idle: state C9-SKL is disabled
[ 1.435604] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x255cb5c6a11, max_idle_ns: 440795249002 ns

[sjoerd@Sjoerd-Laptop-Arch-Linux ~]$ grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/residency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:10148
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:74
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/residency:2
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:9223221
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:48074
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:MWAIT 0x01
/sys/devices/system/cpu/cpu0/cpuidle/state2/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C1E-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state2/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state2/residency:20
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:9280773
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:13186
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:70
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state3/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state3/residency:100
/sys/devices/system/cpu/cpu0/cpuidle/state3/time:720935
/sys/devices/system/cpu/cpu0/cpuidle/state3/usage:795
/sys/devices/system/cpu/cpu0/cpuidle/state4/desc:MWAIT 0x20
/sys/devices/system/cpu/cpu0/cpuidle/state4/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state4/latency:85
/sys/devices/system/cpu/cpu0/cpuidle/state4/name:C6-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state4/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state4/residency:200
/sys/devices/system/cpu/cpu0/cpuidle/state4/time:10644507
/sys/devices/system/cpu/cpu0/cpuidle/state4/usage:4403
/sys/devices/system/cpu/cpu0/cpuidle/state5/desc:MWAIT 0x33
/sys/devices/system/cpu/cpu0/cpuidle/state5/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state5/latency:124
/sys/devices/system/cpu/cpu0/cpuidle/state5/name:C7s-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state5/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state5/residency:800
/sys/devices/system/cpu/cpu0/cpuidle/state5/time:12803186
/sys/devices/system/cpu/cpu0/cpuidle/state5/usage:4540
/sys/devices/system/cpu/cpu0/cpuidle/state6/desc:MWAIT 0x60
/sys/devices/system/cpu/cpu0/cpuidle/state6/disable:0
/sys/devices/system/cpu/cpu0/cpuidle/state6/latency:890
/sys/devices/system/cpu/cpu0/cpuidle/state6/name:C10-SKL
/sys/devices/system/cpu/cpu0/cpuidle/state6/power:0
/sys/devices/system/cpu/cpu0/cpuidle/state6/residency:5000
/sys/devices/system/cpu/cpu0/cpuidle/state6/time:49171424
/sys/devices/system/cpu/cpu0/cpuidle/state6/usage:2143

Thank you for your hard work!

With kind regards,
Sjoerd Furth
fix shipped upstream in v4.6-rc1:

commit d70e28f57e14a481977436695b0c9ba165472431
Author: Len Brown <len.brown@intel.com>
Date: Sun Mar 13 00:33:48 2016 -0500

    intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled

This patch will need to be applied where intel_idle has SKL support. For the upstream kernel, that is Linux-4.3, 4.4, and 4.5.

closed.
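As a rough aid for readers checking their own machines, the version range named above can be expressed as a small Python sketch. The function name is hypothetical, and the result is only a hint: it looks at the major.minor series alone, so a 4.3, 4.4 or 4.5 stable or distribution kernel that already carries a backport of this commit would still be flagged.

```python
def in_affected_mainline_series(release):
    """Return True if a kernel release string belongs to the 4.3, 4.4 or 4.5 series."""
    base = release.split("-")[0]  # drop suffixes such as "-rc8" or "-3-ARCH"
    major, minor = (int(part) for part in base.split(".")[:2])
    return (4, 3) <= (major, minor) <= (4, 5)
```

For example, the string printed by uname -r can be passed in directly; a True result means it is worth checking whether the kernel in use includes the commit above.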
(In reply to sdavid from comment #87) > (In reply to Lv Zheng from comment #76) > > If you mean the problem in comment 11, it is fixed: > > http://www.spinics.net/lists/linux-acpi/msg63550.html > > But the series contains things that need more time to review, so you have > to > > wait a bit longer. > > > > Thanks and best regards > > -Lv > > about the ACPI error : > I understood it was a separated issue. > Is there a dedicated report for this issue that I could also follow ? > I searched bugzilla tracking under ACPI topics but didn't found any reports > that match. > I can create it, it's just I'm new here and I don't know what's are the > rules. > > Thanks and best regards. > > sdavid Thanks for the ping. You can find the related fix on this bug entry: https://bugzilla.kernel.org/show_bug.cgi?id=102421 Several fixes of them are upstreamed. The issue is more serious than expected. Though there are only error logs bugging us around, the errors in fact indicate many issues related to the ACPI subsystem initialization. So you can have your cases tested there or file another bug and assign it to me. Thanks -Lv
*** Bug 112261 has been marked as a duplicate of this bug. ***
Hello, I'm trying to install the system but I keep getting the same errors:
iwlwifi 0000:02:00.0: Unsupported splx structure
nmi watchdog: bug: soft lockup - cpu # 1 stuck for 23s!
...
It is the same with a preinstalled operating system and with the microcode ucode-intel-20170714.1 installed. Even switching the system to Legacy mode changes nothing. What else can be done?
specifications msi gp-62-lp-466 Intel i7 6700HQ bios version 2016-01-26