Bug 12269 - Boot slowdown/timeout (2.6.27-2.6.28 regression)
Status: CLOSED UNREPRODUCIBLE
Product: ACPI
Classification: Unclassified
Component: EC
Hardware: All Linux
Importance: P1 normal
Assigned To: Alexey Starikovskiy
Reported: 2008-12-21 15:06 UTC by Robin Bankhead
Modified: 2009-08-13 03:05 UTC (History)
Kernel Version: 2.6.27
Tree: Mainline
Regression: Yes


Attachments
dmesg from kernel 2.6.27.8 (slow/freezing boot) (123.17 KB, text/plain)
2008-12-21 15:12 UTC, Robin Bankhead
dmesg from kernel 2.6.28-rc2 (normal boot-time) (81.00 KB, text/plain)
2008-12-21 15:20 UTC, Robin Bankhead
dmesg from kernel 2.6.27 (123.12 KB, text/plain)
2009-03-12 16:36 UTC, Robin Bankhead
.config for 2.6.27 kernel (55.26 KB, text/plain)
2009-03-12 16:46 UTC, Robin Bankhead
Dmesg and .config from 2.6.28-ubuntu (51.71 KB, application/x-bzip)
2009-03-14 11:00 UTC, Giovanni Galizia

Description Robin Bankhead 2008-12-21 15:06:03 UTC
Latest working kernel version: 2.6.26.8
Earliest failing kernel version: 2.6.27.*
Distribution: Gentoo
Hardware Environment: x86 (Celeron Northwood 2.6GHz on Toshiba A30-104 laptop)
Software Environment: gcc-4.3.2, glibc-2.8_p20080602-r0
Problem Description:

Gentoo kernel maintainers have asked me to kick this upstream. It is a regression that is fixed in 2.6.28, but they (and I) would like to get it fixed in the 2.6.27 tree.

Starting from 2.6.27, the kernel starts to boot normally,
then just around the time init begins, output slows to a crawl and may hang
altogether.  This persists right through init and during the KDE4 desktop
session, and causes various errors on login, e.g. "the process for the
desktop:/ protocol died unexpectedly".

Speed can be regained by keyboard activity, e.g. holding down the ALT key. 
Touchpad responsiveness times out frequently, but can be temporarily restored
by using the keyboard.

The bug does not occur when I boot with acpi=off.

In the console, typing is as responsive as normal, but console-output programs
like powertop are very slow.

CPU fan activity remains constant, even when the system is idle and CPU
activity is low according to top and powertop.

To follow: dmesg output from 2.6.27.8 (bad) and 2.6.28-rc2 (good).  Please let me know if you require any further info, and see also:
http://bugs.gentoo.org/show_bug.cgi?id=244292
Comment 1 Robin Bankhead 2008-12-21 15:12:49 UTC
Created attachment 19408 [details]
dmesg from kernel 2.6.27.8 (slow/freezing boot)
Comment 2 Robin Bankhead 2008-12-21 15:20:27 UTC
Created attachment 19409 [details]
dmesg from kernel 2.6.28-rc2 (normal boot-time)
Comment 3 Alexey Starikovskiy 2008-12-21 16:06:15 UTC
You could just copy drivers/acpi/ec.c from .28 to .27 to check if the slowdown is caused by it.
Comment 4 Robin Bankhead 2008-12-29 04:03:16 UTC
Sorry for the delay (away for Christmas). Copying the file doesn't work, I get the following build errors:

  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK     include/linux/compile.h
  CC      drivers/acpi/ec.o
distcc[23946] ERROR: compile /root/.ccache/ec.tmp.pengi.22021.i on localhost failed
drivers/acpi/ec.c: In function 'acpi_ec_transaction_unlocked':
drivers/acpi/ec.c:286: error: too few arguments to function 'acpi_disable_gpe'
drivers/acpi/ec.c:309: error: too few arguments to function 'acpi_enable_gpe'
drivers/acpi/ec.c: In function 'ec_parse_device':
drivers/acpi/ec.c:782: warning: passing argument 4 of 'acpi_evaluate_integer' from incompatible pointer type
drivers/acpi/ec.c:788: warning: passing argument 4 of 'acpi_evaluate_integer' from incompatible pointer type
drivers/acpi/ec.c: In function 'ec_install_handlers':
drivers/acpi/ec.c:905: error: too few arguments to function 'acpi_enable_gpe'
drivers/acpi/ec.c: In function 'acpi_ec_suspend':
drivers/acpi/ec.c:1044: error: too few arguments to function 'acpi_disable_gpe'
drivers/acpi/ec.c: In function 'acpi_ec_resume':
drivers/acpi/ec.c:1053: error: too few arguments to function 'acpi_enable_gpe'
make[2]: *** [drivers/acpi/ec.o] Error 1
make[1]: *** [drivers/acpi] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [drivers] Error 2
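
Every hard error in the log above is a "too few arguments" complaint about acpi_enable_gpe or acpi_disable_gpe, which indicates that the 2.6.28 ec.c calls them with fewer arguments than the 2.6.27 headers declare. A minimal sketch, not part of the original report, for tallying such diagnostics from a saved build log. The function name summarize_gcc_errors is hypothetical, and the regular expression assumes the plain "file:line: kind: message" layout seen above:

```python
import re
from collections import Counter

# Matches gcc diagnostics shaped like "path/file.c:123: error: message".
# Lines such as "file.c: In function 'foo':" carry no line number and are skipped.
DIAG_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+): (?P<kind>error|warning): (?P<msg>.*)$"
)

def summarize_gcc_errors(log_text):
    """Count diagnostics per (kind, message) pair in a build log."""
    counts = Counter()
    for line in log_text.splitlines():
        m = DIAG_RE.match(line.strip())
        if m:
            counts[(m.group("kind"), m.group("msg"))] += 1
    return counts

# Three of the error lines from the log above, used as sample input.
sample = """\
drivers/acpi/ec.c:286: error: too few arguments to function 'acpi_disable_gpe'
drivers/acpi/ec.c:309: error: too few arguments to function 'acpi_enable_gpe'
drivers/acpi/ec.c:905: error: too few arguments to function 'acpi_enable_gpe'
"""
for (kind, msg), n in summarize_gcc_errors(sample).most_common():
    print(f"{n}x {kind}: {msg}")
```

Grouping by message rather than by line number makes it easier to see that only two call signatures need adapting for the file to build in the older tree.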
Comment 5 Robin Bankhead 2009-01-03 13:42:54 UTC
OK, something has changed.  I just tested with the final 2.6.28 kernel (previous successful results were with rc2) and the slowdown is back.

Is there still other info needed from me, by the way?
Comment 6 Robin Bankhead 2009-01-17 07:32:01 UTC
Some observations from using powertop with the 2.6.28 kernel.  Note that although most other activity in the console (on a TTY, no desktop loaded) is at normal speed, including the output of verbose commands like "ps aux", powertop and top redraw incredibly slowly (and this can only partially be mitigated by keyboard activity).

Under 2.6.28, the wakeups-from-idle per second are in 5-6 figures, whereas under 2.6.26 they were ~350 at all times.  The most wakeups are caused by schedule_hrtimeout_range and PS/2 keyboard/mouse/touchpad (each with about 39% of wakeups). PS/2 keyboard/mouse/touchpad is on the list under 2.6.26, but lower down the list, and the other item isn't there at all. What does that item mean?

I'm just hoping this may provide some clues to those who can interpret them, i.e. not me.  I also don't know why this remains marked as NEEDINFO but as I said, if there is more I can provide, please just ask.
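
Because powertop itself redraws slowly on the affected kernels, a rough cross-check might be to sample /proc/interrupts twice and compare the counters. The following is a minimal sketch, not something used in this report. The helper names are hypothetical, and interrupt counts are only a proxy for powertop's wakeup figures (timer-driven wakeups such as schedule_hrtimeout_range do not appear as a separate row):

```python
import time

def parse_interrupts(text):
    """Return {irq_label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    totals = {}
    for line in lines[1:]:
        parts = line.split()
        if not parts or not parts[0].endswith(":"):
            continue
        # Rows such as ERR: may carry fewer counters than there are CPUs.
        counts = [int(p) for p in parts[1:1 + ncpus] if p.isdigit()]
        totals[parts[0].rstrip(":")] = sum(counts)
    return totals

def interrupt_rates(before, after, seconds):
    """Per-second rate for each IRQ label, highest first."""
    rates = {k: (after[k] - before.get(k, 0)) / seconds for k in after}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

def sample_rates(interval=5.0, path="/proc/interrupts"):
    """Read the counters twice on a live Linux system and return the rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    start = time.monotonic()
    time.sleep(interval)
    elapsed = time.monotonic() - start  # measured gap, in case the sleep overruns
    with open(path) as f:
        second = parse_interrupts(f.read())
    return interrupt_rates(first, second, elapsed)
```

Running sample_rates() once with the keyboard left alone and once while holding a key down would show whether the i8042 (PS/2) rows only dominate when there is actual input.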
Comment 7 Giovanni Galizia 2009-01-25 07:21:07 UTC
I have the same problem on my Sony Vaio PCG-GRT816M (Debian/Sid)
under 2.6.28 (vanilla kernel): several delays during boot
(and also when running "top" in the console). Pressing any key helps
to regain speed.
Similar bugs are
this: http://bugzilla.kernel.org/show_bug.cgi?id=12311
and this: http://bugzilla.kernel.org/show_bug.cgi?id=12439
Comment 8 Robin Bankhead 2009-01-25 18:23:15 UTC
I don't believe either of the bugs linked are comparable to this. This bug is not "episodic" as seems to be described in the linked bugs. It is a permanent condition, requiring continuous input activity (either holding down/rapidly tapping a key or "scrubbing" the touchpad) for any progress to occur regardless of what part of the boot/init process it has reached. "tapping" a key would have almost no effect.  This condition persists once the desktop is loaded.

Going back to my previous post above, I've observed that if kdm is stopped, the major cause of wakeups (51%+) is PS/2 keyboard/mouse/touchpad according to powertop. Based on the visible output, it's approximately when this component is loaded that the slowdown starts. I'd like to explore this observation further, but I'm not sure what use it would be if I booted a kernel built without PS/2 support (and therefore no means of input). Any suggestions?
Comment 9 Giovanni Galizia 2009-01-26 07:41:12 UTC
Actually, the second one seems to be the same: "Kernel stops loading until any key is pressed". I'll follow the other thread to see whether it is in fact related to ours. It's also curious that all the affected PCs are laptops. Do you know of other people having this problem?
Anyway, today I tried the acpi=off kernel option, and the bug didn't occur.
As for your idea, it may be a good way to find out whether the PS/2 driver is the problem. You could also copy the PS/2 driver from .28-rc2.
Comment 10 Robin Bankhead 2009-01-27 13:52:34 UTC
I'm game to give it a try, but is there any way of safely shutting down if I have no keyboard or touchpad control?
Comment 11 Giovanni Galizia 2009-01-29 08:19:09 UTC
You can use a usb keyboard or mouse.
Comment 12 Robin Bankhead 2009-02-01 07:29:18 UTC
I'm not sure if I went about this right, but I tried my no-PS/2 idea.  All I could actually disable was "Mice", as the generic keyboard and mouse support items are hard-selected.  Generic input device support is selected by VT which is in turn selected by "!S390 && EMBEDDED" (even though I'm manifestly not configged for either of these) - and turning off VT is not something I'd do without further advice anyway...

Anyway, the result of this change is that the boot hangs completely (at a much earlier stage), and is not responsive to keyboard activity even though I don't think my keyboard was disabled.  Magic SysRq had no effect either, so I had to do a hard poweroff.  All in all this doesn't seem like a plan of attack that's going to get us very far, but I'm willing to keep at it if anyone can advise further on how to get useful results. NB: I don't have any non-PS/2 input devices, unfortunately.

I'd really like to make some headway on this, and will do whatever I can to get there, as I've just learned that the newer kernels may improve the (currently desperate) Intel graphics performance.
Comment 13 Robin Bankhead 2009-02-01 13:03:33 UTC
Another rebuild, another set of results! This time I disabled CPUFREQ (I'm not sure if my Celeron chip supports any of the schemes anyway), changing nothing else, and the boot failure no longer happened (the slowness was still there though).

It occurs to me, though, that the PS/2 issue may well be a red herring.  Thinking about it: surely the reason powertop rated it as the highest cause of wakeups was because I was holding down a key throughout the data-collecting period!

I sought to try and discount this by entering a screen session, then issuing:

sleep 5; powertop -d > ptop.txt

then detaching from that screen and waiting, while leaving the keyboard alone.  Result: 20 seconds later I reattached to the screen, but the operation was still ongoing.  It didn't exit until I'd held down a key for a couple of seconds (less than 5, so this seems to suggest that powertop had begun to execute while I was detached, at least).

What do you make of the contrast between the instant output of commands like ps, ls and cat, and the SLOOOW operation of those like top or powertop?
Comment 14 Robin Bankhead 2009-03-01 04:37:18 UTC
Can anyone give me further suggestions for debugging this issue? I'm very eager to make progress with it and game for any suggestions, but I don't know where to start on my own.
Comment 15 Robin Bankhead 2009-03-07 10:15:41 UTC
Still having the same issue with 2.6.29_rc7, BTW.
Comment 16 Giovanni Galizia 2009-03-10 10:16:12 UTC
Try to boot with kernel options: noapic nolapic
Comment 17 Zhang Rui 2009-03-10 19:20:40 UTC
please boot kernel 2.6.29-rc7 with boot option "initcall_debug",
and attach the dmesg output.
Comment 18 Robin Bankhead 2009-03-11 06:49:41 UTC
(In reply to comment #16)
> Try to boot with kernel options: noapic nolapic
> 

This prevents the bug! What does it do?

I do however have chronic corruption occurring with composited desktops, but I assume this is unrelated.

@Zhang: Do you still want the dmesg in light of the above? I've removed that kernel now and returned to 2.6.28, but am happy to rebuild the rc version if you want.
Comment 19 Zhang Rui 2009-03-11 20:23:18 UTC
Is "noapic" sufficient to work around this problem?

Robin, we have not root-caused the problem yet,
so please boot 2.6.27.8 with "initcall_debug" and "printk.time=1", and WITHOUT "ignore_loglevel", and attach the dmesg output.
Please attach your kernel config file as well.
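
With "initcall_debug", the kernel logs how long each initcall took, so a saved dmesg can be sorted to find the slow ones. A minimal sketch, not part of the thread, assuming the log lines look like "initcall <symbol> returned <code> after <N> msecs" (or "usecs" on other kernel versions). The function name slowest_initcalls is hypothetical, and the sample lines are made up for illustration rather than taken from the attached logs:

```python
import re

# Assumed line shape; the optional "[timestamp]" prefix from printk.time=1
# is tolerated because the pattern is searched for, not anchored.
INITCALL_RE = re.compile(
    r"initcall (?P<fn>\S+) returned (?P<ret>-?\d+) "
    r"after (?P<n>\d+) (?P<unit>msecs|usecs)"
)

def slowest_initcalls(dmesg_text, top=10):
    """Return [(function, duration_in_usecs, return_code)], slowest first."""
    rows = []
    for line in dmesg_text.splitlines():
        m = INITCALL_RE.search(line)
        if not m:
            continue
        usecs = int(m.group("n")) * (1000 if m.group("unit") == "msecs" else 1)
        fn = m.group("fn").split("+")[0]  # drop the "+0x0/0x5a" offset suffix
        rows.append((fn, usecs, int(m.group("ret"))))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top]

# Made-up sample lines, for illustration only.
sample = """\
[    0.512000] initcall acpi_ec_init+0x0/0x5a returned 0 after 12 msecs
[    1.000000] initcall i8042_init+0x0/0x3c0 returned 0 after 480 msecs
"""
for fn, usecs, ret in slowest_initcalls(sample):
    print(f"{usecs:>10} us  ret={ret}  {fn}")
```

Comparing the top entries between a good boot (for example with "noapic") and a bad one would narrow down which initcall first shows the delay.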
Comment 20 Robin Bankhead 2009-03-12 02:18:03 UTC
OK, will do. One more question: as I've had this issue with the whole 2.6.27 line, would it be better to do this with 2.6.27.0 ?

I'll also check if noapic alone does the job, but gotta run off to work right now.
Comment 21 Robin Bankhead 2009-03-12 12:49:45 UTC
To answer the other question, "noapic" on its own is enough to prevent the problem.
Comment 22 Robin Bankhead 2009-03-12 16:36:02 UTC
Created attachment 20509 [details]
dmesg from kernel 2.6.27

Here's the dmesg from the original 2.6.27 kernel with the boot params you requested. The problem was actually even worse this time than when I first encountered it with Gentoo's patched version of this kernel. The basic behaviour was the same throughout init, but once I'd gotten as far as kdm and vt-switched back to the console, I found that logging-in seemingly timed out as the prompt never appeared, and my cursor was a single underscore that I could move up and around the console screen (never seen anything like this before). I ^C'd and found myself on a non-logged-in bash prompt, from which a second attempt to login was successful. However, I found simple operations like cd seemed to time out. I couldn't even shut down and had to use SysRq to reboot.

I didn't keep my original .config from 2.6.27, but this one was created in the same manner (running make oldconfig using my 2.6.26 .config, accepting mostly the defaults, then turning on more debug options) so should be fairly similar. It's coming below.
Comment 23 Robin Bankhead 2009-03-12 16:46:32 UTC
Created attachment 20510 [details]
.config for 2.6.27 kernel

Here's the .config that produced the above dmesg.

Ran with boot params: acpi=on initcall_debug printk.time=1  video=uvesafb:1024x768-32,mtrr:3,ywrap

(Note: have also tried intelfb, it doesn't affect the slowdown issue)
Comment 24 Giovanni Galizia 2009-03-14 11:00:59 UTC
Created attachment 20521 [details]
Dmesg and .config from 2.6.28-ubuntu

"noapic" is sufficient to work around the problem.
Here's the .config, and the dmesg with and without noapic.
Now my system is ubuntu 9.04.
Comment 25 Len Brown 2009-08-13 03:05:01 UTC
no activity in this bug report in 5 months.
please re-open if this is still an issue in the latest stable kernel.
