Bug 10093

Summary: 2.6.25-current-git blocks for 10-15 secs on boot unless CONFIG_CPU_IDLE=n - Apple
Product: ACPI
Reporter: Rafael J. Wysocki (rjw)
Component: Power-Processor
Assignee: Venkatesh Pallipadi (venki)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, bunk, kernel, venki
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9832
Attachments:
  kernel config
  dmesg of normal boot with debug on
  max.cstate=1 15 sec hang at ladder
  Add few debug printk in cpuidle
  Remove poll idle from cpuidle
  cpu_idle_wait patch
  cpu_idle_wait patch rebased with rc8

Description Rafael J. Wysocki 2008-02-24 17:30:34 UTC
Subject         : 2.6.25-current-git hangs on boot
Submitter       : Soeren Sonnenburg <kernel@nn7.de>
Date            : 2008-02-23 18:55
References      : http://lkml.org/lkml/2008/2/23/263
References      : http://marc.info/?l=linux-acpi&m=120387537018467&w=4
Handled-By      : "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>

This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Len Brown 2008-03-10 21:31:32 UTC
Did 2.6.24 with CONFIG_CPU_IDLE=y work?
Comment 2 Venkatesh Pallipadi 2008-03-13 17:28:31 UTC
Soeren,

I tried reproducing this on a Macbook Pro locally with latest git and an earlier rc (25-rc2). I used the config that you had sent earlier. I don't seem to be able to reproduce this problem.

Can you please retry with latest git and see whether the CPUIDLE problem is still there? Please also attach the CONFIG you used to this bugzilla.

Also, if you can still reproduce the problem:
with ACPI PROCESSOR configured in the kernel (not as a module) and CPUIDLE enabled, try booting with processor.max_cstate=2 and see whether that helps.

Thanks,
Venki
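
When running the processor.max_cstate experiments, it can help to confirm that the option actually reached the kernel command line. The following is a minimal userspace sketch, not kernel code: the helper name cmdline_max_cstate is hypothetical, and it only parses a command line string such as the contents of /proc/cmdline. It does not tell you whether the ACPI processor driver was built in.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns N for the first "processor.max_cstate=N" word found in a kernel
 * command line string, or -1 if the option is absent or malformed.
 */
int cmdline_max_cstate(const char *cmdline)
{
	static const char key[] = "processor.max_cstate=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Accept the key only at the start of a whitespace-separated word. */
		if (p == cmdline || isspace((unsigned char)p[-1])) {
			const char *num = p + keylen;
			char *end;
			long v;

			/* The value must start with a digit (no sign, no blanks). */
			if (!isdigit((unsigned char)*num))
				return -1;
			v = strtol(num, &end, 10);
			/* The number must end the word and fit in an int. */
			if (v > INT_MAX ||
			    (*end != '\0' && !isspace((unsigned char)*end)))
				return -1;
			return (int)v;
		}
		p += keylen;	/* matched inside another word, keep looking */
	}
	return -1;
}
```

A caller would read /proc/cmdline into a buffer and pass it to cmdline_max_cstate(); a result of -1 means the option was not given in the expected form.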
Comment 3 Soeren Sonnenburg 2008-03-21 07:50:38 UTC
OK, I am just back from vacation and tried current git 457fb605834504af294916411be128a9b21fc3f6.

The CONFIG_CPU_IDLE=y hang is still there and yes it worked all fine with 2.6.24*.

Please note that it does not *always* hang but only from time to time.
Comment 4 Soeren Sonnenburg 2008-03-21 07:55:23 UTC
I cannot attach the config (as bugzilla displays an internal error) so I am sending it to Venkatesh directly.
Comment 5 Venkatesh Pallipadi 2008-03-21 08:08:49 UTC
Does it hang at a specific point during boot? Or at various times?
Comment 6 Soeren Sonnenburg 2008-03-21 08:29:42 UTC
Always at the "cpuidle: using governor" line (I think ladder).
Comment 7 Soeren Sonnenburg 2008-03-21 08:30:07 UTC
Created attachment 15374 [details]
kernel config
Comment 8 Venkatesh Pallipadi 2008-03-21 15:16:12 UTC
Does
processor.max_cstate=2
or
processor.max_cstate=1
boot option help?
(Note: ACPI->PROCESSOR needs to be configured as "in kernel" and not as a module for the above boot options to work.)
Comment 10 Soeren Sonnenburg 2008-03-22 12:16:25 UTC
I tried max_cstate=1 and managed to get it to hang too...
Comment 11 Venkatesh Pallipadi 2008-03-22 14:55:11 UTC
Hmm.. There were only a few CPUIDLE patches that went in after .24. We should be able to check them individually. My first suspect is this patch. Can you remove this change from latest git and see whether that helps?

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9a0b841586c3c6c846effdbe75885c2ebc0031b0
Comment 12 Soeren Sonnenburg 2008-03-24 14:39:12 UTC
I've applied this patch and rebooted about 10 times and the hang seems gone...
Comment 13 Rafael J. Wysocki 2008-03-28 16:05:05 UTC
Patch : http://marc.info/?l=linux-kernel&m=120674502201007&w=4
Comment 14 Soeren Sonnenburg 2008-03-29 11:25:07 UTC
I've applied this patch (using latest git) and rebooted >10 times. I did not see #10117 occur, and the cpuidle bug (this bug) happened about 3 times. However, after waiting for 5 seconds I decided to press the power-off button and voila, it *always* continued to boot...
Comment 15 Rafael J. Wysocki 2008-03-29 11:37:21 UTC
I don't quite understand your comment.  I _guess_ the patch didn't help?
Comment 16 Soeren Sonnenburg 2008-03-29 11:53:12 UTC
It fixed #10117 but not #10093.
Comment 17 Venkatesh Pallipadi 2008-03-31 09:58:19 UTC
Soeren,

With the patch from comment #13, can you try the processor.max_cstate parameter experiment again? With that patch applied, the expectation is that it should never hang with processor.max_cstate=1. Also try processor.max_cstate=2 and 3 to see which one shows this hang.

Also, in the 3 hangs you mentioned in comment #14, where exactly does it hang? Does it always hang at the menu/ladder governor message or at different places?
You said it continues to boot when the power button is pressed. Does it also continue to boot when any keyboard key is pressed?

Thanks.
Comment 18 Soeren Sonnenburg 2008-03-31 10:08:22 UTC
Venkatesh,

I will do this experiment. Regarding the 3 hangs of comment #14: Yes they always happened with the ladder governor message being last.
Comment 19 Venkatesh Pallipadi 2008-03-31 10:12:29 UTC
One more question:

Can you boot with the parameter "debug" and attach the dmesg output once the system has fully booted?
Comment 20 Soeren Sonnenburg 2008-04-02 13:40:04 UTC
I just realized that it will continue booting whether I press a key or not. It is just that it *sometimes* waits for 5-10 seconds before continuing.
Comment 21 Soeren Sonnenburg 2008-04-03 23:35:34 UTC
I've now booted with processor.max_cstate=1 - short hang was there again. I am attaching the dmesg with debug on a normal boot.
Comment 22 Soeren Sonnenburg 2008-04-03 23:36:11 UTC
Created attachment 15603 [details]
dmesg of normal boot with debug on
Comment 23 Venkatesh Pallipadi 2008-04-03 23:51:30 UTC
Was this the dmesg of the boot during which it waited for 15 sec?
If not, can you post the dmesg from a boot where that happens with max_cstate=1?
Also, after exactly which message does it wait?
Comment 24 Soeren Sonnenburg 2008-04-03 23:57:32 UTC
No this was the dmesg for the *normal boot* with debug on the cmdline. It always waits at ladder governor. I am attaching the max_cstate=1 15sec dmesg now.
Comment 25 Soeren Sonnenburg 2008-04-03 23:58:19 UTC
Created attachment 15604 [details]
max.cstate=1 15 sec hang at ladder
Comment 26 Venkatesh Pallipadi 2008-04-04 16:26:33 UTC
Created attachment 15607 [details]
Add few debug printk in cpuidle
Comment 27 Venkatesh Pallipadi 2008-04-04 16:27:17 UTC
Created attachment 15608 [details]
Remove poll idle from cpuidle
Comment 28 Venkatesh Pallipadi 2008-04-04 16:31:02 UTC
Soeren,

Can you try
git + (patch in comment #13) + (patch in comment #26) + (patch in comment #27)

And see whether you still see the hang.
If you still see the hang
{
    what are the last few messages you see before the hang?
} Else /* If you do not see the hang with all the patches */
{
    try git + (patch in comment #13) + (patch in comment #26)
    and see whether you still see the hang
    if you see the hang
    {
          what are the last few messages before the hang?
    }
}


Thanks.
Comment 29 Rafael J. Wysocki 2008-04-06 13:59:12 UTC
References : http://lkml.org/lkml/2008/4/5/34
Comment 30 Venkatesh Pallipadi 2008-04-07 12:01:57 UTC
Soeren,

One more patch to try for this ladder governor block.
http://www.ussg.iu.edu/hypermail/linux/kernel/0802.1/0259.html

(
 Note that this patch will _not_ affect the other hang at
ACPI: ACPI0007:01 is registered as cooling_device1
ACPI: Processor [CPU1] (supports 8 throttling states)
 reported in bug #10117
)
Comment 31 Soeren Sonnenburg 2008-04-08 00:04:41 UTC
Venkatesh,

Could you please attach the patch or point me to a raw version of it (not an html page)?

Also, should I revert the other patches before trying this one, or how should this be used?

Thanks.
Comment 32 Venkatesh Pallipadi 2008-04-08 11:31:59 UTC
Created attachment 15668 [details]
cpu_idle_wait patch
Comment 33 Venkatesh Pallipadi 2008-04-08 11:33:35 UTC
Attached the cpu_idle_wait patch that I referred to in comment #30.

Please try this patch along with (patch in comment #13) + (patch in comment #26) over rc8.
Comment 34 Soeren Sonnenburg 2008-04-09 02:07:29 UTC
The patch has failed hunks, both on current git and on rc8, when applied after #13 and #26...
Comment 35 Venkatesh Pallipadi 2008-04-09 10:11:56 UTC
Created attachment 15703 [details]
cpu_idle_wait patch rebased with rc8
Comment 36 Venkatesh Pallipadi 2008-04-09 10:17:52 UTC
The reason I am asking you to check this patch in #35 is that this code is known to cause the kind of problems you are having (some delay while booting, at the time of switching governors).
Something similar was originally reported here:
http://kerneltrap.org/mailarchive/linux-kernel/2008/1/8/546527

Steven said that his patch, which is in .24, fixed the problem for him. But it looks like there is still some race in there. The patch from #35 simplifies cpu_idle_wait altogether, so I expect that this race should not be there any more.
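
To illustrate the general idea of a simple cpu_idle_wait, here is a userspace analogy using pthreads. It is a sketch under assumptions and does not reproduce the patch in #35: the simulated CPUs, NCPUS, start_cpus, stop_cpus, set_idle_handler and handler_on_cpu are all hypothetical names. Each simulated CPU sleeps in its idle loop until kicked; cpu_idle_wait kicks every CPU and returns only after each one has gone around its loop once and re-read the installed handler.

```c
#include <pthread.h>
#include <stdbool.h>

#define NCPUS 2				/* hypothetical number of simulated CPUs */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static pthread_t threads[NCPUS];
static unsigned long kick_gen;		/* bumped by every cpu_idle_wait() */
static unsigned long seen_gen[NCPUS];	/* last kick each CPU acknowledged */
static int idle_handler;		/* id of the installed idle handler */
static int handler_in_use[NCPUS];	/* handler each CPU picked up last */
static bool stopping;

/* Simulated per-CPU idle loop: re-read the handler, then sleep until kicked. */
static void *idle_loop(void *arg)
{
	int cpu = (int)(long)arg;

	pthread_mutex_lock(&lock);
	while (!stopping) {
		handler_in_use[cpu] = idle_handler;
		seen_gen[cpu] = kick_gen;
		pthread_cond_broadcast(&cond);		/* acknowledge the kick */
		while (!stopping && seen_gen[cpu] == kick_gen)
			pthread_cond_wait(&cond, &lock);	/* "idle" */
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Kick every CPU out of idle and wait until each has passed through its loop.
 * Must only be called while the simulated CPUs are running. */
void cpu_idle_wait(void)
{
	pthread_mutex_lock(&lock);
	unsigned long gen = ++kick_gen;
	pthread_cond_broadcast(&cond);
	for (int cpu = 0; cpu < NCPUS; cpu++)
		while (seen_gen[cpu] < gen)
			pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
}

/* Install a new handler id, then make sure no CPU still uses the old one. */
void set_idle_handler(int id)
{
	pthread_mutex_lock(&lock);
	idle_handler = id;
	pthread_mutex_unlock(&lock);
	cpu_idle_wait();
}

int handler_on_cpu(int cpu)
{
	pthread_mutex_lock(&lock);
	int id = handler_in_use[cpu];
	pthread_mutex_unlock(&lock);
	return id;
}

void start_cpus(void)
{
	for (long cpu = 0; cpu < NCPUS; cpu++)
		pthread_create(&threads[cpu], NULL, idle_loop, (void *)cpu);
}

void stop_cpus(void)
{
	pthread_mutex_lock(&lock);
	stopping = true;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	for (int cpu = 0; cpu < NCPUS; cpu++)
		pthread_join(threads[cpu], NULL);
}
```

In this model, after start_cpus() a call to set_idle_handler(1) returns only once every simulated CPU reports handler 1 via handler_on_cpu(). There is no per-CPU bookkeeping that a sleeping CPU has to clear on its own, so the caller is not left waiting for a CPU that never wakes up. Build with pthread support (for example -pthread with GCC).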
Comment 37 Soeren Sonnenburg 2008-04-09 11:17:11 UTC
Well done! I only applied the patch from #35 and now the system boots *much* faster, without the 1-2 second waits (if not even longer waits) I had with all the previous attempts...
Comment 38 Venkatesh Pallipadi 2008-04-09 14:05:56 UTC
That's good news. Did you try rebooting multiple times, and could you no longer reproduce the 10-15 second hang at the ladder governor message?
If yes, I will go ahead and push the patch in #35 towards ingo/thomas/hpa.
Comment 39 Rafael J. Wysocki 2008-04-09 14:08:19 UTC
Regressions list annotation:
Patch : http://bugzilla.kernel.org/attachment.cgi?id=15703&action=view
Comment 40 Soeren Sonnenburg 2008-04-09 22:15:14 UTC
Of course I *always* rebooted about 10 times before drawing conclusions... but this patch is different in that I

a) don't see any short wait (not even the 1-3 seconds that I am seeing usually)
b) don't see the long waits >10 seconds
Comment 41 Venkatesh Pallipadi 2008-04-09 22:35:21 UTC
OK. Thanks.
That's great news...
Comment 42 Adrian Bunk 2008-04-10 15:46:09 UTC
fixed by commit 783e391b7b5b273cd20856d8f6f4878da8ec31b3