Kernel Bug Tracker – Bug 12269
Boot slowdown/timeout (2.6.27-2.6.28 regression)
Last modified: 2009-08-13 03:05:01 UTC
Latest working kernel version: 18.104.22.168
Earliest failing kernel version: 2.6.27.*
Hardware Environment: x86 (Celeron Northwood 2.6GHz on Toshiba A30-104 laptop)
Software Environment: gcc-4.3.2, glibc-2.8_p20080602-r0
Gentoo kernel maintainers have asked me to kick this upstream. It is a regression that is fixed in 2.6.28, but they (and I) would like to get it fixed in the 2.6.27 tree.
Starting from 2.6.27, the kernel starts to boot normally,
then just around the time init begins, output slows to a crawl and may hang
altogether. This persists right through init and during the KDE4 desktop
session, and causes various errors on login, e.g. "the process for the
desktop:/ protocol died unexpectedly".
Speed can be regained by keyboard activity, e.g. holding down the ALT key.
Touchpad responsiveness times out frequently, but can be temporarily restored
by using the keyboard.
The bug does not occur when I boot with acpi=off.
In the console, typing is as responsive as normal, but console-output programs
like powertop are very slow.
CPU fan activity remains constant, even when the system is idle and CPU
activity is low according to top and powertop.
To follow: dmesg output from 22.214.171.124 (bad) and 2.6.28-rc2 (good). Please let me know if you require any further info, and see also:
Created attachment 19408 [details]
dmesg from kernel 126.96.36.199 (slow/freezing boot)
Created attachment 19409 [details]
dmesg from kernel 2.6.28-rc2 (normal boot-time)
You could just copy drivers/acpi/ec.c from .28 to .27 to check if the slowdown is caused by it.
Sorry for the delay (away for Christmas). Copying the file doesn't work, I get the following build errors:
distcc ERROR: compile /root/.ccache/ec.tmp.pengi.22021.i on localhost failed
drivers/acpi/ec.c: In function 'acpi_ec_transaction_unlocked':
drivers/acpi/ec.c:286: error: too few arguments to function 'acpi_disable_gpe'
drivers/acpi/ec.c:309: error: too few arguments to function 'acpi_enable_gpe'
drivers/acpi/ec.c: In function 'ec_parse_device':
drivers/acpi/ec.c:782: warning: passing argument 4 of 'acpi_evaluate_integer' from incompatible pointer type
drivers/acpi/ec.c:788: warning: passing argument 4 of 'acpi_evaluate_integer' from incompatible pointer type
drivers/acpi/ec.c: In function 'ec_install_handlers':
drivers/acpi/ec.c:905: error: too few arguments to function 'acpi_enable_gpe'
drivers/acpi/ec.c: In function 'acpi_ec_suspend':
drivers/acpi/ec.c:1044: error: too few arguments to function 'acpi_disable_gpe'
drivers/acpi/ec.c: In function 'acpi_ec_resume':
drivers/acpi/ec.c:1053: error: too few arguments to function 'acpi_enable_gpe'
make: *** [drivers/acpi/ec.o] Error 1
make: *** [drivers/acpi] Error 2
make: *** Waiting for unfinished jobs....
make: *** [drivers] Error 2
OK, something has changed. I just tested with the final 2.6.28 kernel (previous successful results were with rc2) and the slowdown is back.
Is there still other info needed from me, by the way?
Some observations from using powertop with the 2.6.28 kernel. Note that although most other activity in the console (on a TTY, no desktop loaded) is at normal speed, including the output of verbose commands like "ps aux", powertop and top redraw incredibly slowly (and this can only partially be mitigated by keyboard activity).
Under 2.6.28, the wakeups-from-idle per second are in 5-6 figures, whereas under 2.6.26 they were ~350 at all times. The most wakeups are caused by schedule_hrtimeout_range and PS/2 keyboard/mouse/touchpad (each with about 39% of wakeups). PS/2 keyboard/mouse/touchpad is on the list under 2.6.26, but lower down the list, and the other item isn't there at all. What does that item mean?
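As a cross-check on the powertop figures, interrupt rates can also be read straight from /proc/interrupts without any redrawing UI. The following is only a minimal sketch: the function names are hypothetical, and it assumes the usual layout (a header row of CPU columns, then one "label: counts description" row per interrupt source). It counts interrupts, which is related to but not the same thing as powertop's wakeups-from-idle.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    totals = {}
    lines = text.splitlines()
    if not lines:
        return totals
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if not counts:
            continue
        # Whatever follows the per-CPU counts is the controller/device text.
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def interrupt_rates(before, after, seconds):
    """Return [(label, interrupts_per_second, description)], busiest first."""
    rates = []
    for label, (count, description) in after.items():
        previous = before.get(label, (0, description))[0]
        rates.append((label, (count - previous) / seconds, description))
    return sorted(rates, key=lambda r: r[1], reverse=True)


def sample(seconds=5.0, path="/proc/interrupts"):
    """Take two snapshots and compute rates over the measured elapsed time."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    start = time.monotonic()
    time.sleep(seconds)
    elapsed = time.monotonic() - start  # may exceed `seconds` on a slow box
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_rates(before, after, elapsed)
```

Calling sample() with the keyboard left alone, and again while holding a key, would show whether the PS/2 interrupt line only dominates when a key is held.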
I'm just hoping this may provide some clues to those who can interpret them, i.e. not me. I also don't know why this remains marked as NEEDINFO but as I said, if there is more I can provide, please just ask.
I have the same problem on my Sony Vaio PCG-GRT816M (Debian/Sid)
under 2.6.28 (vanilla kernel): several delays during boot
(and also when running "top" in the console). Pressing any key helps
to regain speed.
Similar bugs are
and this: http://bugzilla.kernel.org/show_bug.cgi?id=12439
I don't believe either of the bugs linked are comparable to this. This bug is not "episodic" as seems to be described in the linked bugs. It is a permanent condition, requiring continuous input activity (either holding down/rapidly tapping a key or "scrubbing" the touchpad) for any progress to occur regardless of what part of the boot/init process it has reached. "tapping" a key would have almost no effect. This condition persists once the desktop is loaded.
Going back to my previous post above, I've observed that if kdm is stopped, the major cause of wakeups (51%+) is PS/2 keyboard/mouse/touchpad according to powertop. Based on the visible output, it's approximately when this component is loaded that the slowdown starts. I'd like to explore this observation further, but I'm not sure what use it would be if I booted a kernel built without PS/2 support (and therefore no means of input). Any suggestions?
Really, the second one seems the same: "Kernel stops loading until any key is pressed". I'll try to work out from the other thread whether it is in fact related to ours. It's also curious that all the affected PCs are laptops. Do you know other people having this problem?
Anyway, today I tried with the acpi=off kernel option, and the bug didn't occur.
About your idea, it may be a good way to find out whether the PS/2 driver is the problem. You could also copy the PS/2 driver from .28-rc2.
I'm game to give it a try, but is there any way of safely shutting down if I have no keyboard or touchpad control?
You can use a usb keyboard or mouse.
I'm not sure if I went about this right, but I tried my no-PS/2 idea. All I could actually disable was "Mice", as the generic keyboard and mouse support items are hard-selected. Generic input device support is selected by VT which is in turn selected by "!S390 && EMBEDDED" (even though I'm manifestly not configged for either of these) - and turning off VT is not something I'd do without further advice anyway...
Anyway, the result of this change is that the boot hangs completely (at a much earlier stage), and is not responsive to keyboard activity even though I don't think my keyboard was disabled. Magic SysReq had no effect either, so I had to do a hard poweroff. All in all this doesn't seem like a plan of attack that's going to get us very far, but I'm willing to keep at it if anyone can advise further on how to get useful results. NB: I don't have any non-PS/2 input devices, unfortunately.
I'd really like to make some headway on this, and will do whatever I can to get there, as I've just learned that the newer kernels may improve the (currently desperate) Intel graphics performance.
Another rebuild, another set of results! This time I disabled CPUFREQ (I'm not sure if my Celeron chip supports any of the schemes anyway), changing nothing else, and the boot failure no longer happened (the slowness was still there though).
It occurs, though, that the PS/2 issue may well be a red herring. Thinking about it: surely the reason powertop rated it as the highest cause of wakeups was because I was holding-down a key throughout the data-collecting period!
I sought to try and discount this by entering a screen session, then issuing:
sleep 5; powertop -d > ptop.txt
then detaching from that screen and waiting, while leaving the keyboard alone. Result: 20 seconds later I reattached to the screen, but the operation was still ongoing. It didn't exit until I'd held down a key for a couple of seconds (less than 5, so this seems to suggest that powertop had begun to execute while I was detached, at least).
What do you make of the contrast between the instant output of commands like ps, ls and cat, and the SLOOOW operation of those like top or powertop?
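One possible reading of that contrast, which is an assumption and not something confirmed in this report, is that top and powertop wait on a timer between redraws, whereas ps, ls and cat run to completion without sleeping. If timer wakeups are arriving late, a short sleep should return noticeably later than requested. A minimal sketch to measure that (function names are hypothetical):

```python
import time


def sleep_overshoot(interval=0.1, rounds=20):
    """Request `rounds` sleeps of `interval` seconds; return how late each was."""
    overshoots = []
    for _ in range(rounds):
        start = time.monotonic()
        time.sleep(interval)
        overshoots.append(time.monotonic() - start - interval)
    return overshoots


def summarize(overshoots):
    """Return the worst and the average lateness, in seconds."""
    return {
        "max": max(overshoots),
        "mean": sum(overshoots) / len(overshoots),
    }
```

On a healthy system the overshoot should be small; values of whole seconds with the keyboard idle, shrinking while a key is held, would point at delayed timer interrupts instead of the input driver.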
Can anyone give me further suggestions for debugging this issue? I'm very eager to make progress with it and game for any suggestions, but I don't know where to start on my own.
Still having the same issue with 2.6.29_rc7, BTW.
Try to boot with kernel options: noapic nolapic
please boot kernel 2.6.29-rc7 with boot option "initcall_debug",
and attach the dmesg output.
(In reply to comment #16)
> Try to boot with kernel options: noapic nolapic
This prevents the bug! What does it do?
I do however have chronic corruption occurring with composited desktops, but I assume this is unrelated.
@Zhang: Do you still want the dmesg in light of the above? I've removed that kernel now and returned to 2.6.28, but am happy to rebuild the rc version if you want.
is "noapic" sufficient to work around this problem?
Robin, we have not root caused the problem yet.
so please boot 188.8.131.52 with "initcall_debug" and "printk.time=1", and WITHOUT "ignore_loglevel", and attach the dmesg output.
please attach your kernel config file as well.
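For anyone reading such a dmesg, the initcall_debug lines can be ranked by duration to spot a slow initcall. This is only a sketch under assumptions: it expects lines of the form "initcall NAME+0x.../0x... returned N after M msecs" (or "usecs", since the unit differs between kernel versions), optionally prefixed by a printk timestamp. The function name slowest_initcalls is hypothetical.

```python
import re

# Matches e.g. "initcall foo+0x0/0x10 returned 0 after 12 msecs"
INITCALL_RE = re.compile(
    r"initcall\s+(?P<fn>\S+)\s+returned\s+(?P<ret>-?\d+)"
    r"\s+after\s+(?P<n>\d+)\s+(?P<unit>msecs|usecs)"
)


def slowest_initcalls(dmesg_text, top=10):
    """Return up to `top` tuples (usecs, function, return_code), slowest first."""
    results = []
    for line in dmesg_text.splitlines():
        match = INITCALL_RE.search(line)
        if not match:
            continue
        n = int(match.group("n"))
        # Normalise to microseconds so both unit styles compare directly.
        usecs = n * 1000 if match.group("unit") == "msecs" else n
        # Drop the "+offset/size" suffix from the symbol name.
        function = match.group("fn").split("+", 1)[0]
        results.append((usecs, function, int(match.group("ret"))))
    results.sort(reverse=True)
    return results[:top]
```

A typical use would be slowest_initcalls(open("dmesg.txt").read()), where dmesg.txt is a saved copy of the boot log.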
OK, will do. One more question: as I've had this issue with the whole 2.6.27 line, would it be better to do this with 184.108.40.206 ?
I'll also check if noapic alone does the job, but gotta run off to work right now.
To answer the other question, "noapic" on its own is enough to prevent the problem.
Created attachment 20509 [details]
dmesg from kernel 2.6.27
Here's the dmesg from original 2.6.27 kernel with the boot params you requested. The problem was actually even worse this time than when I first encountered it with Gentoo's patched version of this kernel. The basic behaviour was the same throughout init, but once I'd gotten as far as kdm and vt-switched back to the console, I found that logging-in seemingly timed out as the prompt never appeared, and my cursor was a single underscore that I could move up and around the console screen (never seen anything like this before). I ^C'd and found myself on a non-logged-in bash prompt, from which a second attempt to login was successful. However, I found simple operations like cd seemed to time out. I couldn't even shutdown and had to use SysReq to reboot.
I didn't keep my original .config from 2.6.27, but this one was created in the same manner (running make oldconfig using my 2.6.26 .config, accepting mostly the defaults, then turning on more debug options) so should be fairly similar. It's coming below.
Created attachment 20510 [details]
.config for 2.6.27 kernel
Here's the .config that produced the above dmesg.
Ran with boot params: acpi=on initcall_debug printk.time=1 video=uvesafb:1024x768-32,mtrr:3,ywrap
(Note: have also tried intelfb, it doesn't affect the slowdown issue)
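When comparing runs with and without the workaround, it helps to record exactly which parameters the running kernel was booted with. A small sketch, assuming the parameters are read from /proc/cmdline and contain no quoted values with spaces (the helper names are hypothetical):

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def apic_workaround_active(params):
    """True if the 'noapic' flag discussed in this report is present."""
    return "noapic" in params


def read_running_cmdline(path="/proc/cmdline"):
    """Read and parse the command line of the currently running kernel."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

For the parameters listed above, parse_cmdline would report acpi as "on", initcall_debug as a bare flag, and no noapic entry.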
Created attachment 20521 [details]
Dmesg and .config from 2.6.28-ubuntu
"noapic" is sufficient to work around the problem.
Here's the .config, and the dmesg with and without noapic.
Now my system is ubuntu 9.04.
no activity in this bug report in 5 months.
please re-open if this is still an issue in the latest stable kernel.