This is an upstream resubmission of this openSUSE bug report: https://bugzilla.opensuse.org/show_bug.cgi?id=1011254

My computer will not boot unless acpi=off is set in the boot parameters for OpenSuSE Leap 42.2 (I have tried acpi=ht, acpi=off and acpi=noirq).

I have also tried booting these other kernels (they also fail to boot):
kernel-default-4.1.35-12.1.g2e75991.x86_64.rpm
kernel-default-4.8.10-1.1.gd1ec066.x86_64.rpm

The system will attempt to load, then reset (screen blanks, computer power cycles). The reset always occurs, but not always at the same point. It can happen anywhere from a second or two after loading the initrd up to about 6 seconds after (based on the output I can glimpse with quiet off).

I have updated my motherboard to the most recent BIOS (E7918IMS.2A0).

This other bug may or may not be relevant: https://bugzilla.opensuse.org/show_bug.cgi?id=990003

The openSUSE people suggested lodging this bug upstream. Thanks
I have done many, many tests with kernels from system rescue CDs. None of the kernels I tried would boot and stay stable without extra parameters. By stable I mean the system did not reset/reboot automatically [1]. More recent kernels would boot clean (i.e. with no parameters) but would reset on a right mouse click/menu operation.

However, the following kernels would all boot and be stable with the nohz=off parameter:
3.10.25
3.10.32
3.10.55
3.12.7

The following needed acpi=off to boot and be stable. That is, nohz=off was not enough to ensure a booted, stable system:
3.13.5
3.14.20
3.18.34
4.1.27

In one case (3.13.5) nohz=off gave a stable system on alternate reboots: first boot unstable, reboot stable, reboot unstable, etc. (rebooted 7 times).

Something seems to have happened between 3.12.7 and 3.13.5. Up to 3.12.7, nohz=off was enough to give me a stable system. From 3.13.5, I needed to use acpi=off instead to get a stable system.

Note:
[1] To test stability I typed uname -r into the console, ran the internet browser, right-clicked to get a menu, and opened the applications menu. Typically it would reboot within seconds of either a console or X starting (the USB boots to a command line; you then run startx). If the system made it to X, right-clicking the mouse seemed to trigger a reset.

Could someone take a look at this please?
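For anyone wanting to narrow this down further, the observations above can be written down as data and the transition point located mechanically. This is only a rough sketch: the dictionary simply restates the versions listed in this comment, and the helper names (version_key, find_transition) are made up for illustration.

```python
# Kernel version -> boot parameter that was needed for a stable system,
# copied from the lists in the comment above.
OBSERVED = {
    "3.10.25": "nohz=off",
    "3.10.32": "nohz=off",
    "3.10.55": "nohz=off",
    "3.12.7": "nohz=off",
    "3.13.5": "acpi=off",   # nohz=off was only stable on alternate reboots
    "3.14.20": "acpi=off",
    "3.18.34": "acpi=off",
    "4.1.27": "acpi=off",
}


def version_key(version):
    """Turn '3.12.7' into (3, 12, 7) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def find_transition(observed):
    """Return the first adjacent pair of versions whose required
    parameter differs, or None if the parameter never changes."""
    ordered = sorted(observed, key=version_key)
    for prev, cur in zip(ordered, ordered[1:]):
        if observed[prev] != observed[cur]:
            return prev, cur
    return None
```

Calling find_transition(OBSERVED) returns the pair of tested versions that brackets the change, which is the range a bisection would have to cover.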
does the problem still exist if you use boot option idle=poll?
If I use idle=poll the system will boot and seems to be stable (40 minutes and counting). The system fan runs noticeably louder, although system load seems small.

Tested:
Leap 42.2 (kernel 4.8.10-1.gd1ec066-default) (3 times, i.e. booted, seemed stable, tried again anyway)
Sys rescue CD 4.8.0 (kernel 4.4.28) (3 times)
Sys rescue CD 4.0.0 (kernel 3.10.25)
then what about idle=nomwait?
and what about intel_idle.max_cstate=1
idle=nomwait: will not boot (fails after a few seconds, well before X loads)
intel_idle.max_cstate=1: boots, seems stable.
(kernel: 4.8.10)
then what about intel_idle.max_cstate=2 or 3?
intel_idle.max_cstate=3: fails to boot
intel_idle.max_cstate=2: seems to be stable
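When comparing intel_idle.max_cstate values like this, it can help to see which idle states the kernel actually exposes on the machine. The following is a minimal sketch, assuming the usual cpuidle sysfs layout under /sys/devices/system/cpu/cpu0/cpuidle (stateN directories containing name, latency and usage files). The function name read_cpuidle_states is hypothetical, and how the state indices relate to a given max_cstate value is not verified here.

```python
from pathlib import Path


def read_cpuidle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of dicts describing each stateN directory found,
    ordered by state index. Returns [] if the directory is absent
    (for example when cpuidle is not available)."""
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []

    def index_of(path):
        # Directory names look like 'state0', 'state1', ...
        return int(path.name[len("state"):])

    states = []
    for state_dir in sorted(base.glob("state[0-9]*"), key=index_of):

        def read(attr):
            # Missing attributes are reported as None rather than raising.
            f = state_dir / attr
            return f.read_text().strip() if f.is_file() else None

        states.append({
            "index": index_of(state_dir),
            "name": read("name"),
            "latency_us": read("latency"),
            "usage": read("usage"),
        })
    return states
```

Printing the result of read_cpuidle_states() after booting with each parameter would show which states were available in each test.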
Where can I find out whether it's better to boot with intel_idle.max_cstate=2 vs acpi=off?
It's better to use intel_idle.max_cstate=2, which keeps ACPI enabled.
with intel_idle.max_cstate=2 I am getting random reboots. They occur infrequently (once a week to once a fortnight). I thought they might be related to my corsair mouse, but they occur without the mouse present. Should I open a new bug report?
Please attach the dmesg output from a boot with intel_idle.max_cstate=1.
Created attachment 255223
Dmesg for intel_idle.max_cstate=1
Attached
Actually, the reboots are more frequent. Might be using the computer more. However, probably once every day or two. Sometimes more than once a day.
(In reply to brendan_os from comment #15)
> Actually, the reboots are more frequent. Might be using the computer more.
> However, probably once every day or two. Sometimes more than once a day.

What do you mean? I thought intel_idle.max_cstate=1 would be sufficient to stop the reboot issue, no?
No, when max_cstate=2 I'm getting random reboots. The screen goes black without warning and the machine restarts - see comment 11.
(In reply to brendan_os from comment #17)
> No, when max_cstate=2 I'm getting random reboots. The screen goes black
> without warning and the machine restarts - see comment 11.

And with intel_idle.max_cstate=1, the problem never occurs, right?
Yes. I have now edited the defaults to be cstate=1. If I have reboots I will post.
[ 0.067784] smpboot: CPU0: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (family: 0x6, model: 0x3c, stepping: 0x3)

I guess this is a Haswell processor, right? Please attach the lspci output to confirm.
>/sbin/lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.2 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 3 (rev d0)
00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family Z97 LPC Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
02:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet Controller (rev 13)
03:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03)
I didn't get any useful information from the lspci output, but anyway, according to

[ 0.067784] smpboot: CPU0: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (family: 0x6, model: 0x3c, stepping: 0x3)

and

#define INTEL_FAM6_HASWELL_CORE 0x3C

this should be a Haswell (HSW) platform.
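The same check can be done mechanically on a dmesg line. This is a small sketch, assuming the smpboot line has the "family: ..., model: ..., stepping: ..." form quoted in this thread. The constant reuses the 0x3C value from the #define above; the function names are hypothetical.

```python
import re

# Model number quoted from the kernel #define in the comment above.
INTEL_FAM6_HASWELL_CORE = 0x3C

# Matches the "(family: 0x6, model: 0x3c, stepping: 0x3)" part of the line.
SMPBOOT_RE = re.compile(
    r"family:\s*(0x[0-9a-fA-F]+),\s*"
    r"model:\s*(0x[0-9a-fA-F]+),\s*"
    r"stepping:\s*(0x[0-9a-fA-F]+)"
)


def parse_cpu_id(line):
    """Return (family, model, stepping) as ints, or None if the line
    does not contain the expected fields."""
    match = SMPBOOT_RE.search(line)
    if match is None:
        return None
    return tuple(int(group, 16) for group in match.groups())


def is_haswell_core(line):
    """True if the line reports family 6 and the Haswell core model."""
    ids = parse_cpu_id(line)
    return ids is not None and ids[0] == 6 and ids[1] == INTEL_FAM6_HASWELL_CORE
```

Feeding it the smpboot line from the attached dmesg gives family 6, model 0x3c, stepping 3, which matches the Haswell define.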
Please reboot into BIOS SETUP and make sure that the system is using SETUP *DEFAULTS* and is not overclocked.

You might also consider booting into memtest and running that overnight.

Another thing to try may be, in the BIOS SETUP, disabling high frequency performance states.

In general, this looks like an electrical hardware problem, rather than a Linux kernel issue.
I'm not overclocking. Everything is auto.

The system was fine up to and including kernel 3.11.10-34.1, see https://bugzilla.opensuse.org/show_bug.cgi?id=990003

After that kernel I would need to use nohz=off to boot (OpenSuSE 13.1). I tried staying on that kernel for as long as possible, then tried to upgrade to Leap, hoping a newer kernel (v4+) would fix it. When I upgraded to OpenSuSE Leap 42.2 it wouldn't boot without acpi=off / intel_idle.max_cstate=1.

I've assumed that since it works with the older kernel it's not a hardware problem. Is that fair?

I will need to research what to disable regarding high frequency performance states. I think the BIOS has Intel Turbo Boost set.
Looks like the hardware problem assessment was correct. I have replaced my power supply and can now boot with the kernel's default boot options. I have run the machine for several hours and no random reboots.
good to know. Bug closed