Subject : 2.6.25-current-git hangs on boot
Submitter : Soeren Sonnenburg <email@example.com>
Date : 2008-02-23 18:55
References : http://lkml.org/lkml/2008/2/23/263
References : http://marc.info/?l=linux-acpi&m=120387537018467&w=4
Handled-By : "Pallipadi, Venkatesh" <firstname.lastname@example.org>
This entry is being used for tracking a regression from 2.6.24. Please don't
close it until the problem is fixed in the mainline.
Did 2.6.24 with CONFIG_CPU_IDLE=y work?
I tried reproducing this locally on a MacBook Pro with latest git and an earlier rc (25-rc2), using the config that you had sent earlier. I can't seem to reproduce the problem.
Can you please retry with latest git and see whether the CPUIDLE problem is still there? Please also attach the CONFIG you used to this bugzilla.
Also, if you can still reproduce the problem:
with ACPI PROCESSOR configured in the kernel (not as a module) and CPUIDLE enabled, try booting with processor.max_cstate=2 and see whether that helps.
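When repeating the max_cstate experiments it helps to confirm that the parameter actually reached the kernel command line for a given boot. The following is only a minimal sketch: the helper name max_cstate_from_cmdline is made up for illustration, and keeping the last occurrence when the parameter is given more than once is an assumption, not something stated in this report.

```python
def max_cstate_from_cmdline(cmdline):
    """Return the processor.max_cstate value found in a kernel command
    line string, or None if it is absent or not an integer.

    Assumption: if the parameter appears more than once, the last
    occurrence is the one reported.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "processor.max_cstate" and sep:
            try:
                value = int(val)
            except ValueError:
                value = None
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

On the booted system, max_cstate_from_cmdline(read_cmdline()) should return 2 after booting with processor.max_cstate=2. This only checks that the option was passed, not that the processor driver honoured it.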
OK, I am just back from vacation and tried current git 457fb605834504af294916411be128a9b21fc3f6.
The CONFIG_CPU_IDLE=y hang is still there and yes it worked all fine with 2.6.24*.
Please note that it does not *always* hang, only from time to time.
I cannot attach the config (as bugzilla displays an internal error) so I am sending it to Venkatesh directly.
Does it hang at a specific point during boot? Or at various times?
Always at the "cpuidle: using governor" line
(ladder, I think).
Created attachment 15374 [details]
boot option help?
(Note: ACPI->PROCESSOR needs to be configured as "in kernel" and not as a module for the above boot options to work.)
I tried cstate=1 and managed to get it to hang too...
Hmm.. There were only a few CPUIDLE patches that went in after .24. We should be able to check them individually. My first suspect is this patch. Can you remove this change from latest git and see whether that helps?
I've applied this patch and rebooted about 10 times and the hang seems gone...
Patch : http://marc.info/?l=linux-kernel&m=120674502201007&w=4
I've applied this patch (using latest git) and rebooted >10 times. I did not see #10117 occur, but the cpuidle bug (this bug) happened about 3 times. However, after waiting for 5 seconds I decided to press the power off button and voila, it *always* continued to boot...
I don't quite understand your comment. I _guess_ the patch didn't help?
It fixed #10117 but not #10093.
With the patch from comment #13, can you try the processor.max_cstate parameter experiment again? With that patch applied, the expectation is that it should never hang with processor.max_cstate=1. Also try processor.max_cstate=2 and 3 to see which one sees this hang.
Also, in the 3 hangs you mentioned in comment #14, where exactly does it hang? Does it always hang at the menu/ladder governor message or at different places?
You said it continues to boot with power button pressed. Does it also continue to boot with any keyboard key pressed?
I will do this experiment. Regarding the 3 hangs of comment #14: Yes they always happened with the ladder governor message being last.
One more question:
Can you boot with the parameter "debug" and attach the dmesg output once the system has fully booted?
I just realized that it will continue booting whether I press a key or not. It is just that it might wait for 5-10 seconds *sometimes* before continuing.
I've now booted with processor.max_cstate=1 - short hang was there again. I am attaching the dmesg with debug on a normal boot.
Created attachment 15603 [details]
dmesg of normal boot with debug on
Was this the dmesg of the boot during which it waited for 15 sec?
If not, can you post the dmesg when that happens with max_cstate=1.
And also, after exactly which message did it wait?
No, this was the dmesg for the *normal boot* with debug on the cmdline. It always waits at the ladder governor message. I am attaching the max_cstate=1 15 sec dmesg now.
Created attachment 15604 [details]
max_cstate=1 15 sec hang at ladder
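To answer "after exactly which message did it wait" from a saved dmesg rather than by watching the console, one option is to look for the largest jump between consecutive printk timestamps. This is a rough sketch only: it assumes the dmesg was captured with printk timestamps enabled (lines starting with "[ seconds.micros]"), which may not be the case for the attachments here, and the function name largest_gaps is made up for illustration.

```python
import re

# Matches "[   12.345678] message", optionally preceded by a "<N>" log level.
TS_LINE = re.compile(r"^(?:<\d+>)?\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gaps(dmesg_text, top=3):
    """Return up to `top` tuples (gap_seconds, message_before, message_after),
    sorted with the largest gap first. Lines without a timestamp are skipped.
    """
    entries = []
    for line in dmesg_text.splitlines():
        m = TS_LINE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))

    gaps = []
    for (t0, msg0), (t1, msg1) in zip(entries, entries[1:]):
        gaps.append((t1 - t0, msg0, msg1))

    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]
```

Calling largest_gaps(open("dmesg.txt").read()) on a dmesg saved from a slow boot (the file name is a placeholder) lists the messages that preceded the longest pauses. If the wait really sits at the governor switch, the "cpuidle: using governor" line should show up as message_before of the top entry.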
Created attachment 15607 [details]
Add few debug printk in cpuidle
Created attachment 15608 [details]
Remove poll idle from cpuidle
Can you try
git + (patch in comment #13) + (patch in comment #26) + (patch in comment #27)
and see whether you still see the hang?
If you still see the hang:
  what are the last few messages you see before the hang?
Else (you do not see the hang with all the patches):
  try git + (patch in comment #13) + (patch in comment #26)
  and see whether you still see the hang.
  If you see the hang:
    what are the last few messages before the hang?
References : http://lkml.org/lkml/2008/4/5/34
One more patch to try for this ladder governor block.
Note that this patch will _not_ affect the other hang at
ACPI: ACPI0007:01 is registered as cooling_device1
ACPI: Processor [CPU1] (supports 8 throttling states)
reported in bug #10117
Could you please attach the patch or point me to a raw version of it (not an HTML page)?
Also, should I revert the other patches before trying this one, or how should it be used?
Created attachment 15668 [details]
Attached the cpu_idle_wait patch that I referred to in comment #30.
Please try this patch along with (patch in comment #13) + (patch in comment #26) over rc8.
The patch has failed hunks, for both current git and rc8, when applied after #13 and #26...
Created attachment 15703 [details]
cpu_idle_wait patch rebased with rc8
The reason I am asking you to check the patch in #35 is that this code is known to cause the kind of problems you are having (some delay while booting, at the time of switching governors).
Something similar was originally reported here:
Steven said that his patch that is in .24 fixed the problem for him. But it looks like there is still some race in there. The above patch from #35 simplifies cpu_idle_wait altogether, so I am expecting that this race should not be there any more.
Well done! I only applied the patch from #35 and now the system boots *much* faster, without the 1-2 second waits (if not even longer waits) I had with all the previous attempts...
That's good news. Did you try rebooting multiple times, and could you no longer reproduce the 10-15 second hang at the ladder governor message?
If yes, I will go ahead and push the patch in #35 towards ingo/thomas/hpa.
Regressions list annotation:
Patch : http://bugzilla.kernel.org/attachment.cgi?id=15703&action=view
Of course, I *always* rebooted about 10 times before drawing conclusions... but this patch is different in that I
a) don't see any short wait (not even the 1-3 seconds that I am seeing usually)
b) don't see the long waits >10 seconds
That's great news...
Fixed by commit 783e391b7b5b273cd20856d8f6f4878da8ec31b3