Bug 16105 - intel_idle boot hang if CONFIG_NO_HZ=n - Asus P6T DELUXE X58 motherboard
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_idle
Hardware: All Linux
Importance: P1 normal
Assignee: power-management_intel_idle@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-02 10:16 UTC by Fabio Rossi
Modified: 2012-02-02 02:50 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.37-rc2-git6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel_log.txt.gz (12.92 KB, application/gzip)
2010-06-02 10:18 UTC, Fabio Rossi
dmesg-2.6.37-rc2-git6.log (50.04 KB, text/plain)
2010-11-20 10:52 UTC, Fabio Rossi
config-2.6-37-rc2-git6.gz (15.40 KB, application/gzip)
2010-11-20 10:54 UTC, Fabio Rossi
acpi_dump.txt.gz (61.79 KB, application/gzip)
2010-11-27 23:24 UTC, Fabio Rossi
acpidump_cstates_disabled.txt.gz (61.79 KB, application/gzip)
2010-11-28 10:42 UTC, Fabio Rossi
acpidump_cstates_enabled.txt.gz (61.80 KB, application/gzip)
2010-11-28 10:43 UTC, Fabio Rossi
msr_cstates_disabled.out.gz (1.18 KB, application/gzip)
2010-11-28 10:43 UTC, Fabio Rossi
msr_cstates_enabled.out.gz (1.26 KB, application/gzip)
2010-11-28 10:43 UTC, Fabio Rossi

Description Fabio Rossi 2010-06-02 10:16:40 UTC
The kernel hangs at boot when CONFIG_INTEL_IDLE=y and CONFIG_ACPI=y. The problem doesn't appear when ACPI is disabled or the command line option intel_idle.max_cstate=0 is used.
Comment 1 Fabio Rossi 2010-06-02 10:18:16 UTC
Created attachment 26614
kernel_log.txt.gz
Comment 2 Len Brown 2010-10-19 03:35:10 UTC
can the hang be reproduced without this boot option?

acpi_enforce_resources=lax

Can the hang be reproduced with CONFIG_INTEL_IDLE=n ?
Is the attached kernel log a success or a failure?
it is running acpi_idle, so it must be that
CONFIG_INTEL_IDLE=n there.
Comment 3 Len Brown 2010-10-22 06:03:12 UTC
please re-open if this is reproducible with 2.6.35.stable, 2.6.36, or later.
Comment 4 Fabio Rossi 2010-11-20 10:46:59 UTC
I don't have the rights to re-open the bug but I confirm the bug in the latest version of the kernel currently available, i.e. 2.6.37-rc2-git6.

I have also tried removing the boot option acpi_enforce_resources=lax, but the problem is still there. To summarize, the hang is reproducible every time with:
- CONFIG_INTEL_IDLE=y and acpi_enforce_resources=lax
- CONFIG_INTEL_IDLE=y without acpi_enforce_resources=lax
Comment 5 Fabio Rossi 2010-11-20 10:52:15 UTC
Created attachment 37712
dmesg-2.6.37-rc2-git6.log

Here is the dmesg log of a working kernel. When CONFIG_INTEL_IDLE=y, the kernel hangs at the line:

HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
Comment 6 Fabio Rossi 2010-11-20 10:54:52 UTC
Created attachment 37722
config-2.6-37-rc2-git6.gz

Here is the config file of the working kernel
Comment 7 Len Brown 2010-11-23 16:10:15 UTC
the working config has CONFIG_INTEL_IDLE=n,
while the failing config has it enabled.

For the working config, please show the output from:

grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*

For the failing config, please verify that it can boot with
"intel_idle.max_cstate=0", and then change the 0 to a 1,
then a 2 etc until the failure comes back, and report
the lowest value which allows the failure to occur.
Comment 8 Fabio Rossi 2010-11-23 23:00:03 UTC
With the working config I get

# grep . /sys/devices/system/cpu/cpuidle/*
/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:ladder

With the failing config I can boot with intel_idle.max_cstate equal to 0 or 1 but it starts to hang with intel_idle.max_cstate=2
Comment 9 Len Brown 2010-11-24 05:44:29 UTC
for the working config, please show the output from:

grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*

> # CONFIG_NO_HZ is not set

Please try CONFIG_NO_HZ=y
Comment 10 Len Brown 2010-11-24 16:48:30 UTC
FWIW, the config in comment #6 boots for me on Fedora 13 --
(though I had to add EXT4)  I tried it on a few different
Core i7 machines.
Comment 11 Fabio Rossi 2010-11-26 12:36:46 UTC
(In reply to comment #9)

> Please try CONFIG_NO_HZ=y

With dynamic ticks and CONFIG_INTEL_IDLE=y the kernel is booting correctly.


(In reply to comment #10)

> FWIW, the config in comment #6 boots for me on Fedora 13 --
> (though I had to add EXT4)  I tried it on a few different
> Core i7 machines.

Yes, that is with CONFIG_INTEL_IDLE=n
Comment 12 Len Brown 2010-11-26 20:02:32 UTC
>> FWIW, the config in comment #6 boots for me on Fedora 13 --
>> (though I had to add EXT4)  I tried it on a few different
>> Core i7 machines.
>
>Yes, that is with CONFIG_INTEL_IDLE=n

it still boots for me with CONFIG_INTEL_IDLE=y

any difference if the failing config is booted with "nolapic_timer"?

I'd like to see that acpi_idle and intel_idle are using
the same states.  Taking your working config with...
CONFIG_NO_HZ=y
CONFIG_INTEL_IDLE=y

please show the output from
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

and then reboot with "intel_idle.max_cstate=0"
and show the same output for acpi_idle.

BTW. is the latest BIOS being used,
and is it using SETUP defaults?
Comment 13 Fabio Rossi 2010-11-27 01:08:07 UTC
(In reply to comment #12)

> any difference if the failing config is booted with "nolapic_timer"?

CONFIG_NO_HZ=n + CONFIG_INTEL_IDLE=y + nolapic_timer works
 
> I'd like to see that acpi_idle and intel_idle are using
> the same states.  Taking your working config with...
> CONFIG_NO_HZ=y
> CONFIG_INTEL_IDLE=y
> 
> please show the output from
> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

/sys/devices/system/cpu/cpuidle/current_driver:intel_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:menu
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:C0
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:16691157
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:15677
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:3
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:NHM-C1
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:4294967294
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:456107803
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:2517336
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:20
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:NHM-C3
/sys/devices/system/cpu/cpu0/cpuidle/state2/power:4294967293
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:10896533025
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:8510403
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x20
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:200
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:NHM-C6
/sys/devices/system/cpu/cpu0/cpuidle/state3/power:4294967292
/sys/devices/system/cpu/cpu0/cpuidle/state3/time:21110859942
/sys/devices/system/cpu/cpu0/cpuidle/state3/usage:6648787

> and then reboot with "intel_idle.max_cstate=0"
> and show the same output for acpi_idle.

In this case I don't have a cpuidle folder inside cpuX but only the cpuidle folder reported above in #8
 
> BTW. is the latest BIOS being used,
> and is it using SETUP defaults?

BIOS 0302, the customizations are the DDR3 frequency and QPI link data rate. I have also tried
- latest BIOS version
- setup defaults
but nothing changes
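
For anyone collecting the same data programmatically, the per-state attributes that the grep one-liner prints can also be gathered with a short script. This is only a minimal sketch, assuming the sysfs layout visible in the output above (cpuN/cpuidle/stateM/{name,desc,latency,usage,time}); the function name is hypothetical and nothing like this was used in this bug.

```python
from pathlib import Path


def read_cpuidle_states(cpu_dir):
    """Return one dict per cpuidle state directory found under cpu_dir/cpuidle.

    cpu_dir is a path such as /sys/devices/system/cpu/cpu0 (assumed layout).
    An empty list means the CPU exposes no cpuidle directory at all.
    """
    states = []
    cpuidle = Path(cpu_dir) / "cpuidle"
    if not cpuidle.is_dir():
        return states
    # Sort on the numeric suffix so that state10 would follow state9.
    state_dirs = sorted(cpuidle.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    for state_dir in state_dirs:
        entry = {"state": state_dir.name}
        for attr in ("name", "desc", "latency", "usage", "time"):
            attr_file = state_dir / attr
            if attr_file.is_file():
                entry[attr] = attr_file.read_text().strip()
        states.append(entry)
    return states


if __name__ == "__main__":
    for state in read_cpuidle_states("/sys/devices/system/cpu/cpu0"):
        print(state)
```

An empty result corresponds to the "no cpuidle folder inside cpuX" case described above.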
Comment 14 Len Brown 2010-11-27 20:37:40 UTC
As CONFIG_NO_HZ=n + CONFIG_INTEL_IDLE=y + nolapic_timer works,
and CONFIG_NO_HZ=y + CONFIG_INTEL_IDLE=y works,
it seems that the failure may be specific
to LAPIC timer one-shot mode.

> > ...  show the same output for acpi_idle.

> In this case I don't have a cpuidle folder inside cpuX but only the cpuidle
> folder reported above in #8

This means that you have no C-states when running in ACPI mode.
Please double check that they are not disabled in BIOS SETUP,
please double check what the default BIOS setting for this is.
If you can enable them, give that a go and see if we can get
the grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
for acpi_idle.

please attach the output from acpidump
---
please boot the failing config with
intel_idle.max_cstate=1
and then reboot, increasing the '1'
and report the lowest number where the failure comes back.
Comment 15 Fabio Rossi 2010-11-27 23:23:07 UTC
(In reply to comment #14)

> As CONFIG_NO_HZ=n + CONFIG_INTEL_IDLE=y + nolapic_timer works,
> and CONFIG_NO_HZ=y + CONFIG_INTEL_IDLE=y works,
> it seems that the failure may be specific
> to LAPIC timer one-shot mode.

Any hint to debug this?

> This means that you have no C-states when running in ACPI mode.
> Please double check that they are not disabled in BIOS SETUP,
> please double check what the default BIOS setting for this is.
> If you can enable them, give that a go and see if we can get
> the grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
> for acpi_idle.

ok, I had disabled Intel C-states in the BIOS. Here is the result:

/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:ladder
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:C0
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH INTEL MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:1
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:4294967294
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:10068558
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:41999
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH INTEL MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:17
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2
/sys/devices/system/cpu/cpu0/cpuidle/state2/power:4294967293
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:74892494
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:157805
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH INTEL MWAIT 0x20
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:17
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
/sys/devices/system/cpu/cpu0/cpuidle/state3/power:4294967292
/sys/devices/system/cpu/cpu0/cpuidle/state3/time:5229991003
/sys/devices/system/cpu/cpu0/cpuidle/state3/usage:2348180
 
> please attach the output from acpidump

attached

> please boot the failing config with
> intel_idle.max_cstate=1
> and then reboot, increasing the '1'
> and report the lowest number where the failure comes back.

the lowest number reproducing the problem is 2
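
To check whether acpi_idle and intel_idle really use the same states, the two `grep .` listings can be compared by MWAIT hint rather than by eye. The following is a rough sketch, assuming the line format shown in the outputs in this thread (path, colon, value, with the hint embedded in desc as "MWAIT 0x.."); the helper names are made up for illustration.

```python
import re

# Matches e.g. ".../cpu0/cpuidle/state2/desc:MWAIT 0x10"
LINE_RE = re.compile(r"/cpu(\d+)/cpuidle/state(\d+)/(\w+):(.*)$")


def parse_grep_output(text):
    """Map (cpu, state index) -> {attribute: value} from `grep .` output."""
    states = {}
    for line in text.splitlines():
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # skips current_driver / current_governor_ro lines
        cpu, index, attr, value = match.groups()
        states.setdefault((int(cpu), int(index)), {})[attr] = value
    return states


def mwait_hint(desc):
    """Return the MWAIT hint in a desc string as an int, or None if absent."""
    match = re.search(r"MWAIT (0x[0-9a-fA-F]+)", desc)
    return int(match.group(1), 16) if match else None


def compare_hints(first, second):
    """List (key, hint_in_first, hint_in_second) for states present in both."""
    rows = []
    for key in sorted(set(first) & set(second)):
        rows.append((key,
                     mwait_hint(first[key].get("desc", "")),
                     mwait_hint(second[key].get("desc", ""))))
    return rows
```

Comparing hints as integers treats "MWAIT 0x0" and "MWAIT 0x00" as the same state. Note that this only compares the hints; the latency values reported by the two drivers differ in the listings above and would need a separate check.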
Comment 16 Fabio Rossi 2010-11-27 23:24:06 UTC
Created attachment 38362
acpi_dump.txt.gz
Comment 17 Len Brown 2010-11-28 08:10:24 UTC
When C-states are enabled in the BIOS, do you still see the failure
when running intel_idle?  There is a very similar sighting in
bug #20722 where they see an issue only when:

    C-states disabled in BIOS
    INTEL_IDLE=y;
    NO_HZ=n (implies ladder governor);
    HIGH_RES_TIMERS=n;
    intel_idle.max_cstate>1

and changing any of those makes the failure go away.
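
The build-time part of that combination can be checked against a kernel .config file. A minimal sketch follows, assuming the usual .config syntax ("CONFIG_X=y" and "# CONFIG_X is not set"); the function names are hypothetical, and the BIOS C-state setting and intel_idle.max_cstate are runtime conditions that this does not cover.

```python
NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text):
    """Parse .config text into {option: value}.

    Lines of the form '# CONFIG_X is not set' are recorded as 'n'.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            options[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
    return options


def matches_reported_combination(options):
    """True if INTEL_IDLE=y, NO_HZ=n and HIGH_RES_TIMERS=n, as listed above.

    Options missing from the file are treated as 'n' (an assumption).
    """
    return (options.get("CONFIG_INTEL_IDLE", "n") == "y"
            and options.get("CONFIG_NO_HZ", "n") == "n"
            and options.get("CONFIG_HIGH_RES_TIMERS", "n") == "n")
```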
Comment 18 Fabio Rossi 2010-11-28 10:39:00 UTC
(In reply to comment #17)

> When C-states are enabled in the BIOS, do you still see the failure
> when running intel_idle?  There is a very similar sighting in
> bug #20722 where they see an issue only when:
> 
>     C-states disabled in BIOS
>     INTEL_IDLE=y;
>     NO_HZ=n (implies ladder governor);
>     HIGH_RES_TIMERS=n;
>     intel_idle.max_cstate>1
> 
> and changing any of those makes the failure go away.

With C-states enabled and intel_idle I still get the hang (with NO_HZ=n and with or without HIGH_RES_TIMERS, so it seems HIGH_RES_TIMERS doesn't have an impact in my case).
Comment 19 Fabio Rossi 2010-11-28 10:42:08 UTC
I'm attaching now a comparison of ACPI and MSR dumps with C-states enabled/disabled
Comment 20 Fabio Rossi 2010-11-28 10:42:44 UTC
Created attachment 38382
acpidump_cstates_disabled.txt.gz
Comment 21 Fabio Rossi 2010-11-28 10:43:01 UTC
Created attachment 38392
acpidump_cstates_enabled.txt.gz
Comment 22 Fabio Rossi 2010-11-28 10:43:28 UTC
Created attachment 38402
msr_cstates_disabled.out.gz
Comment 23 Fabio Rossi 2010-11-28 10:43:46 UTC
Created attachment 38412
msr_cstates_enabled.out.gz
Comment 24 Len Brown 2011-08-01 16:06:23 UTC
>> As CONFIG_NO_HZ=n + CONFIG_INTEL_IDLE=y + nolapic_timer works,
>> and CONFIG_NO_HZ=y + CONFIG_INTEL_IDLE=y works,
>> it seems that the failure may be specific
>> to LAPIC timer one-shot mode.

Looks like I said that backwards -- too many double-negatives:-)

Normal Tickless idle (CONFIG_NO_HZ=y) is working fine,
so lapic one-shot mode is working fine.

Old style tickfull idle (CONFIG_NO_HZ=n) is failing,
but it works with nolapic_timer -- so that suggests
LAPIC timer periodic mode is broken.

> Any hint to debug this?

Now that ACPI C-states are enabled in your BIOS,
and it appears that intel_idle and acpi_idle are using
the same states, can you reproduce this issue with acpi via
CONFIG_INTEL_IDLE=n? (or simply boot with intel_idle.max_cstate=0)

If acpi_idle is behaving the same way as intel_idle, then
processor.max_cstate=1 should work
processor.max_cstate=2 should fail
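
When cycling through these boot options it is easy to lose track of which ones a given boot actually used. A small sketch for recording that, assuming the command line is read from /proc/cmdline; the helper names are hypothetical and the watched list only contains parameters mentioned in this bug.

```python
# Boot parameters discussed in this bug report.
WATCHED = (
    "intel_idle.max_cstate",
    "processor.max_cstate",
    "nolapic_timer",
    "acpi_enforce_resources",
)


def parse_cmdline(cmdline):
    """Split a kernel command line into {parameter: value}.

    Bare flags such as nolapic_timer map to True. Quoted values containing
    spaces are not handled; this is a simplification.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def idle_related(params):
    """Return only the parameters from WATCHED that are present."""
    return {key: params[key] for key in WATCHED if key in params}


if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        print(idle_related(parse_cmdline(handle.read())))
```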
Comment 25 Zhang Rui 2012-01-18 02:11:49 UTC
It's great that kernel bugzilla is back.

can you please verify if the problem still exists in the latest upstream
kernel?
can you please verify if the problem still exists if you follow the suggestions in comment #24?
Comment 26 Fabio Rossi 2012-01-31 12:18:01 UTC
Right now I'm using the kernel 3.3.0-rc1-wl-00064-g52a3f5d. I can't reproduce the problem anymore but to be honest I don't remember all the configurations tested :-)

Now INTEL_IDLE=y works with NO_HZ=y or NO_HZ=n without using nolapic_timer.
Comment 27 Zhang Rui 2012-02-02 02:50:02 UTC
okay.
I'll close this bug for now. Please feel free to reopen it if you hit the problem again. :)
