Bug 29992

Summary: boot hang 2.6.37.1 regression w/ intel_idle and CONFIG_NO_HZ=n - asus p7p55d le
Product: Power Management Reporter: De Ganseman Amaury (amaury.deganseman)
Component: intel_idle Assignee: Shaohua (shaohua.li)
Status: REJECTED UNREPRODUCIBLE    
Severity: normal CC: acpi-bugzilla, amaury.deganseman, florian, graham.anderson, lenb, maciej.rutecki, rjw, shaohua.li
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.37.1 and 2.6.37.2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 27352    
Attachments: My .config without INTEL_IDLE
My dmesg
/proc/cpuinfo

Description De Ganseman Amaury 2011-02-27 10:38:36 UTC
The kernel hangs at boot with the intel_idle option enabled, but only on the 2.6.37.1 and 2.6.37.2 kernels.

I have no problem with 2.6.37.

My .config is attached.


This may be related to this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=26502
Comment 1 De Ganseman Amaury 2011-02-27 10:39:30 UTC
Created attachment 49472
My .config without INTEL_IDLE
Comment 2 Len Brown 2011-03-01 02:18:36 UTC
can't be related to bug 26502 because that failure
is specific to acpi_idle, and here you're running intel_idle.

can you revert the two intel_idle patches that were
applied to 2.6.37.stable and re-test?
(you can just grab the intel_idle.c from 2.6.37.0 and compile that)

also, please describe the machine model
and supply the output from dmesg

do any cmdline params work around the hang?
eg "nolapic_timer" or "hpet=disable" etc
Comment 3 De Ganseman Amaury 2011-03-02 10:36:34 UTC
It works with kernel 2.6.37.2 and intel_idle.c from 2.6.37.0.

I made sure to enable the intel_idle kernel option, run make clean, and recompile.
Comment 4 De Ganseman Amaury 2011-03-02 10:37:16 UTC
Created attachment 49882
My dmesg
Comment 5 De Ganseman Amaury 2011-03-02 10:37:34 UTC
Created attachment 49892
/proc/cpuinfo
Comment 6 De Ganseman Amaury 2011-03-02 10:38:07 UTC
I forgot to mention my motherboard: ASUS P7P55D LE
Comment 7 Len Brown 2011-03-03 01:51:06 UTC
The difference between what worked and what failed was these two patches:


commit 0f212b87548cc4598fb7c77d92bfef23d5ee4d1a
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Mon Jan 24 08:00:01 2011 +0000

    fix a shutdown regression in intel_idle
    
    commit ec30f343d61391ab23705e50a525da1d55395780 upstream.


commit 0f076e96eae1e03f5fd988911c7062dee22e14a6
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Mon Jan 10 09:38:12 2011 +0800

    intel_idle: open broadcast clock event
    
    commit 2a2d31c8dc6f1ebcf5eab1d93a0cb0fb4ed57c7c upstream.

BTW, does the upstream 2.6.38-rc kernel also fail?
Also, I assume the failing kernel works properly if you
boot with "intel_idle.max_cstate=0", yes?
Comment 8 Len Brown 2011-03-03 01:54:29 UTC
> cpuidle: using governor ladder

# CONFIG_NO_HZ is not set

Does this problem go away with CONFIG_NO_HZ=y ?
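
For anyone checking their own build against the question above: a minimal Python sketch, assuming the usual kernel .config text format ("CONFIG_X=value" or "# CONFIG_X is not set"). The helper name config_option_state and the sample text are hypothetical, made up for illustration; they are not taken from the attached .config.

```python
def config_option_state(config_text, option):
    """Return the option's value (e.g. 'y' or 'm'), or None if unset or absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Disabled options are recorded as a comment line in .config.
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Illustrative sample only, not the reporter's actual .config.
sample = "# CONFIG_NO_HZ is not set\nCONFIG_INTEL_IDLE=y\n"
print(config_option_state(sample, "CONFIG_NO_HZ"))
print(config_option_state(sample, "CONFIG_INTEL_IDLE"))
```

To check a real build, read the .config from the kernel build directory into a string and pass it as config_text.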
Comment 9 De Ganseman Amaury 2011-03-04 18:10:57 UTC
Yes, it goes away with CONFIG_NO_HZ=y.

I have no time today to try a .38-rc kernel.
I'll try tomorrow.


Thanks


N.B.: intel_idle.max_cstate=0 <---- Where do I have to put that?
Comment 10 Florian Mickler 2011-03-05 00:48:38 UTC
That would go on the kernel command line (hit e on the selected kernel in GRUB and append it there).
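
To confirm after booting that the parameter actually took effect, the running kernel's command line can be read from /proc/cmdline. A minimal Python sketch, assuming simple whitespace-separated parameters with no quoted values; the helper name cmdline_param and the sample command line are hypothetical:

```python
def cmdline_param(cmdline, name):
    """Return the value for name=value, True for a bare flag, None if absent."""
    result = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            # If a parameter is repeated, this sketch keeps the last occurrence.
            result = value if sep else True
    return result


# Illustrative command line only; on a live system use
# open("/proc/cmdline").read() instead of this sample string.
sample = "root=/dev/sda1 ro intel_idle.max_cstate=0"
print(cmdline_param(sample, "intel_idle.max_cstate"))
print(cmdline_param(sample, "nolapic_timer"))
```

The same check covers the other parameters asked about in this report, such as "nolapic_timer" (a bare flag) and "hpet=disable".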
Comment 11 Shaohua 2011-03-07 01:01:32 UTC
please try a .38-rc kernel.
does reverting the two commits mentioned in comment #7 fix the problem?
Comment 12 De Ganseman Amaury 2011-03-07 12:50:41 UTC
It's OK with the .38-rc7

To revert the two commits, I used the intel_idle.c from 2.6.37.0 with the .37.2 kernel.
Comment 13 Shaohua 2011-03-08 07:46:43 UTC
since .38-rc7 works, the intel_idle changes might not be the cause. we might need to backport some other timer-related change to 2.6.37.
any chance you can try "nolapic_timer" or "hpet=disable"? this will help us isolate the problem.
Comment 14 Graham Anderson 2011-03-11 10:49:38 UTC
I am experiencing this issue with 2.6.37.1 & 2.6.37.2 as well

Hardware profile: http://www.smolts.org/client/show/pub_1a849a4e-cf37-4b09-92a6-304e1f8d9968

Downstream bug report: https://bugzilla.novell.com/show_bug.cgi?id=675161

Test with other params as per comment #13 in this report

pass only: intel_idle.max_cstate=0
result: System boots normally and behaves as expected with no further issues

pass only: nolapic_timer
result: hang/crash during boot as per this bug report

pass only: hpet=disable
result: system sometimes boots, but completely locks up soon after
Comment 15 Len Brown 2011-03-22 02:09:03 UTC
De Ganseman Amaury,
please clarify comment #12.  Did un-modified 2.6.38-rc work,
or did you revert patches from it?  At this point 2.6.38 is released,
so please report if that, unmodified, works.  Also, I assume that
you are still building with CONFIG_NO_HZ=n, since that was the only
way to provoke the failure?

Graham, does this issue go away with CONFIG_NO_HZ=y for you also?

Graham, does reverting the two intel_idle patches make 2.6.37 work,
or make 2.6.38 work?
Comment 16 Florian Mickler 2011-05-01 14:30:57 UTC
...Ping...
Comment 17 Len Brown 2011-08-01 18:49:09 UTC
Please reply to questions above and re-open if still an issue
with a modern kernel.
Comment 18 De Ganseman Amaury 2011-08-01 19:08:48 UTC
No issues with newer kernels.