The kernel hangs at boot when CONFIG_INTEL_IDLE=y and CONFIG_ACPI=y. The problem doesn't appear when ACPI is disabled or the command line option intel_idle.max_cstate=0 is used.
Created attachment 26614 [details] kernel_log.txt.gz
Can the hang be reproduced without this boot option?
acpi_enforce_resources=lax
Can the hang be reproduced with CONFIG_INTEL_IDLE=n?
Is the attached kernel log a success or a failure? It is running acpi_idle, so it must be that CONFIG_INTEL_IDLE=n there.
please re-open if this is reproducible with 2.6.35.stable, 2.6.36, or later.
I don't have the rights to re-open the bug, but I confirm the bug in the latest kernel version currently available, i.e. 2.6.37-rc2-git6. I have also tried removing the boot option acpi_enforce_resources=lax, but the problem is still there. To summarize, the hang is reproducible every time with:
- CONFIG_INTEL_IDLE=y and acpi_enforce_resources=lax
- CONFIG_INTEL_IDLE=y without acpi_enforce_resources=lax
Created attachment 37712 [details] dmesg-2.6.37-rc2-git6.log
Here is the dmesg log of a working kernel. When CONFIG_INTEL_IDLE=y, the kernel hangs at the line:
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
Created attachment 37722 [details] config-2.6-37-rc2-git6.gz Here is the config file of the working kernel
The working config has CONFIG_INTEL_IDLE=n, while the failing config has it enabled.
For the working config, please show the output from:
grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
For the failing config, please verify that it can boot with "intel_idle.max_cstate=0", then change the 0 to a 1, then a 2, etc. until the failure comes back, and report the lowest value at which the failure occurs.
With the working config I get:
# grep . /sys/devices/system/cpu/cpuidle/*
/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:ladder
With the failing config I can boot with intel_idle.max_cstate equal to 0 or 1, but it starts to hang with intel_idle.max_cstate=2.
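The stepwise search requested above (raise intel_idle.max_cstate one step per reboot until the hang returns) can be sketched as follows. This is only an illustration: the function name `lowest_failing_max_cstate` and the hand-recorded `recorded` table are hypothetical, and each outcome still has to be obtained by actually rebooting the machine.

```python
from typing import Callable, Optional


def lowest_failing_max_cstate(boots_ok: Callable[[int], bool],
                              highest: int = 9) -> Optional[int]:
    """Return the smallest max_cstate value for which the boot fails.

    boots_ok(value) must report whether the kernel booted with
    intel_idle.max_cstate=value. Returns None if every tested value booted.
    """
    for value in range(highest + 1):
        if not boots_ok(value):
            return value
    return None


# Outcomes noted by hand after each reboot (values as reported above:
# 0 and 1 boot, 2 hangs). True means the kernel booted.
recorded = {0: True, 1: True, 2: False}

result = lowest_failing_max_cstate(lambda v: recorded[v], highest=2)
print(f"lowest failing value: intel_idle.max_cstate={result}")
```

A linear search is used rather than a bisection because only a handful of values exist and each step costs a reboot either way.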
For the working config, please show the output from:
grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
> # CONFIG_NO_HZ is not set
Please try CONFIG_NO_HZ=y
FWIW, the config in comment #6 boots for me on Fedora 13 -- (though I had to add EXT4) I tried it on a few different Core i7 machines.
(In reply to comment #9)
> Please try CONFIG_NO_HZ=y
With dynamic ticks and CONFIG_INTEL_IDLE=y the kernel boots correctly.
(In reply to comment #10)
> FWIW, the config in comment #6 boots for me on Fedora 13 --
> (though I had to add EXT4) I tried it on a few different
> Core i7 machines.
Yes, that is with CONFIG_INTEL_IDLE=n
>> FWIW, the config in comment #6 boots for me on Fedora 13 --
>> (though I had to add EXT4) I tried it on a few different
>> Core i7 machines.
>
> Yes, that is with CONFIG_INTEL_IDLE=n
It still boots for me with CONFIG_INTEL_IDLE=y.
Any difference if the failing config is booted with "nolapic_timer"?
I'd like to see that acpi_idle and intel_idle are using the same states. Taking your working config with...
CONFIG_NO_HZ=y
CONFIG_INTEL_IDLE=y
please show the output from
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
and then reboot with "intel_idle.max_cstate=0" and show the same output for acpi_idle.
BTW, is the latest BIOS being used, and is it using SETUP defaults?
(In reply to comment #12)
> any difference if the failing config is booted with "nolapic_timer"?
CONFIG_NO_HZ=n + CONFIG_INTEL_IDLE=y + nolapic_timer works
> I'd like to see that acpi_idle and intel_idle are using
> the same states. Taking your working config with...
> CONFIG_NO_HZ=y
> CONFIG_INTEL_IDLE=y
>
> please show the output from
> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
/sys/devices/system/cpu/cpuidle/current_driver:intel_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:menu
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:C0
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:16691157
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:15677
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:3
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:NHM-C1
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:4294967294
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:456107803
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:2517336
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:20
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:NHM-C3
/sys/devices/system/cpu/cpu0/cpuidle/state2/power:4294967293
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:10896533025
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:8510403
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x20
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:200
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:NHM-C6
/sys/devices/system/cpu/cpu0/cpuidle/state3/power:4294967292
/sys/devices/system/cpu/cpu0/cpuidle/state3/time:21110859942
/sys/devices/system/cpu/cpu0/cpuidle/state3/usage:6648787
> and then reboot with "intel_idle.max_cstate=0"
> and show the same output for acpi_idle.
In this case I don't have a cpuidle folder inside cpuX but only the cpuidle folder reported above in #8
> BTW. is the latest BIOS being used,
> and is it using SETUP defaults?
BIOS 0302, the customizations are the DDR3 frequency and QPI link data rate. I have also tried
- latest BIOS version
- setup defaults
but nothing changes
As CONFIG_NO_HZ=n + CONFIG_INTEL_IDLE=y + nolapic_timer works, and CONFIG_NO_HZ=y + CONFIG_INTEL_IDLE=y works, it seems that the failure may be specific to LAPIC timer one-shot mode.
> > ... show the same output for acpi_idle.
> In this case I don't have a cpuidle folder inside cpuX but only the cpuidle
> folder reported above in #8
This means that you have no C-states when running in ACPI mode. Please double check that they are not disabled in BIOS SETUP, and please double check what the default BIOS setting for this is. If you can enable them, give that a go and see if we can get the output of
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
for acpi_idle.
Please attach the output from acpidump
---
Please boot the failing config with intel_idle.max_cstate=1 and then reboot, increasing the '1', and report the lowest number where the failure comes back.
(In reply to comment #14)
> As CONFIG_NO_HZ=n + CONFIG_INTEL_IDLE=y + nolapic_timer works,
> and CONFIG_NO_HZ=y + CONFIG_INTEL_IDLE=y works,
> it seems that the failure may be specific
> to LAPIC timer one-shot mode.
Any hint to debug this?
> This means that you have no C-states when running in ACPI mode.
> Please double check that they are not disabled in BIOS SETUP,
> please double check what the default BIOS setting for this is.
> If you can enable them, give that a go and see if we can get
> the grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
> for acpi_idle.
ok, I had disabled Intel C-states in the BIOS. Here is the result:
/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:ladder
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:C0
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH INTEL MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:1
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:4294967294
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:10068558
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:41999
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH INTEL MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:17
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2
/sys/devices/system/cpu/cpu0/cpuidle/state2/power:4294967293
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:74892494
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:157805
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH INTEL MWAIT 0x20
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:17
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
/sys/devices/system/cpu/cpu0/cpuidle/state3/power:4294967292
/sys/devices/system/cpu/cpu0/cpuidle/state3/time:5229991003
/sys/devices/system/cpu/cpu0/cpuidle/state3/usage:2348180
> please attach the output from acpidump
attached
> please boot the failing config with
> intel_idle.max_cstate=1
> and then reboot, increasing the '1'
> and report the lowest number where the failure comes back.
the lowest number reproducing the problem is 2
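To check whether intel_idle and acpi_idle expose the same states, the two `grep .` listings posted in this thread can be compared mechanically. The sketch below is one possible way to do that, not an official tool: the helper names (`parse_cpuidle`, `mwait_hint`, `compare`) are made up for illustration, and the sample text is abridged to the desc and latency lines reported above.

```python
import re
from collections import defaultdict

# Matches per-state lines such as
# /sys/devices/system/cpu/cpu0/cpuidle/state1/latency:3
STATE_LINE = re.compile(
    r"^/sys/devices/system/cpu/cpu(\d+)/cpuidle/state(\d+)/(\w+):(.*)$"
)


def parse_cpuidle(text):
    """Map (cpu, state index) -> {attribute: value} from `grep .` output."""
    states = defaultdict(dict)
    for line in text.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            cpu, state, attr, value = match.groups()
            states[(int(cpu), int(state))][attr] = value
    return dict(states)


def mwait_hint(desc):
    """Return the MWAIT hint in a desc string as an int, or None if absent."""
    match = re.search(r"MWAIT (0x[0-9a-fA-F]+)", desc)
    return int(match.group(1), 16) if match else None


def compare(first, second):
    """Return (state, hint1, hint2, latency1, latency2) for shared states."""
    rows = []
    for cpu, state in sorted(set(first) & set(second)):
        a, b = first[(cpu, state)], second[(cpu, state)]
        rows.append((state,
                     mwait_hint(a.get("desc", "")),
                     mwait_hint(b.get("desc", "")),
                     a.get("latency"), b.get("latency")))
    return rows


# Abridged from the listings posted in this thread (cpu0 only).
INTEL_IDLE = """\
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:3
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:20
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x20
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:200
"""

ACPI_IDLE = """\
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH INTEL MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:1
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH INTEL MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:17
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH INTEL MWAIT 0x20
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:17
"""

rows = compare(parse_cpuidle(INTEL_IDLE), parse_cpuidle(ACPI_IDLE))
for state, hint_a, hint_b, lat_a, lat_b in rows:
    print(f"state{state}: MWAIT hint {hint_a} vs {hint_b}, "
          f"latency {lat_a} vs {lat_b}")
```

On this data the MWAIT hints agree for states 1 to 3, while the advertised latencies differ between the two drivers.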
Created attachment 38362 [details] acpi_dump.txt.gz
When C-states are enabled in the BIOS, do you still see the failure when running intel_idle? There is a very similar sighting in bug #20722 where they see an issue only when:
C-states disabled in BIOS
INTEL_IDLE=y;
NO_HZ=n (implies ladder governor);
HIGH_RES_TIMERS=n;
intel_idle.max_cstate>1
and changing any of those makes the failure go away.
(In reply to comment #17)
> When C-states are enabled in the BIOS, do you still see the failure
> when running intel_idle? There is a very similar sighting in
> bug #20722 where they see an issue only when:
>
> C-states disabled in BIOS
> INTEL_IDLE=y;
> NO_HZ=n (implies ladder governor);
> HIGH_RES_TIMERS=n;
> intel_idle.max_cstate>1
>
> and changing any of those makes the failure go away.
With C-states enabled and intel_idle I still get the hang (with NO_HZ=n and with or without HIGH_RES_TIMERS, so it seems HIGH_RES_TIMERS doesn't have an impact in my case).
I'm attaching now a comparison of ACPI and MSR dumps with C-states enabled/disabled
Created attachment 38382 [details] acpidump_cstates_disabled.txt.gz
Created attachment 38392 [details] acpidump_cstates_enabled.txt.gz
Created attachment 38402 [details] msr_cstates_disabled.out.gz
Created attachment 38412 [details] msr_cstates_enabled.out.gz
>> As CONFIG_NO_HZ=n + CONFIG_INTEL_IDLE=y + nolapic_timer works,
>> and CONFIG_NO_HZ=y + CONFIG_INTEL_IDLE=y works,
>> it seems that the failure may be specific
>> to LAPIC timer one-shot mode.
Looks like I said that backwards -- too many double-negatives :-)
Normal tickless idle (CONFIG_NO_HZ=y) is working fine, so LAPIC one-shot mode is working fine. Old-style tickful idle (CONFIG_NO_HZ=n) is failing, but it works with nolapic_timer -- so that suggests LAPIC timer periodic mode is broken.
> Any hint to debug this?
Now that ACPI C-states are enabled in your BIOS, and it appears that intel_idle and acpi_idle are using the same states, can you reproduce this issue with acpi_idle via CONFIG_INTEL_IDLE=n? (or simply boot with intel_idle.max_cstate=0)
If acpi_idle is behaving the same way as intel_idle, then
processor.max_cstate=1 should work
processor.max_cstate=2 should fail
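The reasoning above can be made explicit by tabulating the outcomes reported so far and asking which settings every hanging configuration shares. The sketch below is an illustration only: the `observations` table is transcribed by hand from earlier comments, it assumes the max_cstate tests were run with CONFIG_NO_HZ=n (the failing config), and `common_to_failures` is a made-up helper name.

```python
# Each entry: (settings, booted). Transcribed from the comments in this
# thread; max_cstate None means the option was not given on the command line.
observations = [
    ({"INTEL_IDLE": "y", "NO_HZ": "n", "nolapic_timer": False, "max_cstate": None}, False),
    ({"INTEL_IDLE": "y", "NO_HZ": "n", "nolapic_timer": True, "max_cstate": None}, True),
    ({"INTEL_IDLE": "y", "NO_HZ": "y", "nolapic_timer": False, "max_cstate": None}, True),
    # Assumption: these two were run with the failing (NO_HZ=n) config.
    ({"INTEL_IDLE": "y", "NO_HZ": "n", "nolapic_timer": False, "max_cstate": 1}, True),
    ({"INTEL_IDLE": "y", "NO_HZ": "n", "nolapic_timer": False, "max_cstate": 2}, False),
    ({"INTEL_IDLE": "n", "NO_HZ": "n", "nolapic_timer": False, "max_cstate": None}, True),
]


def common_to_failures(obs):
    """Return the settings whose value is identical in every failing entry."""
    failing = [cfg for cfg, booted in obs if not booted]
    if not failing:
        return {}
    first = failing[0]
    return {key: value for key, value in first.items()
            if all(cfg[key] == value for cfg in failing)}


shared = common_to_failures(observations)
print(shared)
```

Every hang in the table has intel_idle enabled, the periodic tick (NO_HZ=n), and the LAPIC timer in use, which is consistent with the suspicion that LAPIC timer periodic mode is the broken piece.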
It's great that kernel bugzilla is back. Can you please verify whether the problem still exists in the latest upstream kernel? Can you also please verify whether the problem still exists if you follow the suggestions in comment #24?
Right now I'm using the kernel 3.3.0-rc1-wl-00064-g52a3f5d. I can't reproduce the problem anymore but to be honest I don't remember all the configurations tested :-) Now INTEL_IDLE=y works with NO_HZ=y or NO_HZ=n without using nolapic_timer.
Okay, I'll close this bug for now. Please feel free to reopen it if you hit the problem again. :)