Bug 16495 (boot_extremely_slow)

Summary: Kernel 2.6.35 takes almost half an hour to boot
Product: Platform Specific/Hardware
Reporter: Claudio M. Camacho (claudiomkd)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, cebbert, dmitry.torokhov, florian, hpa, maciej.rutecki, oskarw85.spam, rjw, tglx
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.35.2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 16055
Attachments: .config for 2.6.35 (exactly the same as my working 2.6.34.1)
             Patch to test on top of 2.6.35.4 for the hpet issue

Description Claudio M. Camacho 2010-08-02 17:16:23 UTC
Created attachment 27323
.config for 2.6.35 (exactly the same as my working 2.6.34.1)

I just compiled the brand new 2.6.35.

I have a Vaio SR11M with an Intel processor Core 2 Duo Penryn P8400.

When I booted my system, it took about half an hour to reach the console login (getty). During boot the whole system appears stuck (the cursor stops blinking, although I can press Caps Lock and the LED lights up). It takes about 5 minutes for the udev scripts to run.

After about 7 minutes, the kernel reports that it was unable to load the iwlwifi-5000-ucode firmware (this is probably a separate bug, but wifi worked perfectly with 2.6.34.1). After this the boot process continues, but very, very slowly: a new line appears only every 2 or 3 minutes, so the kernel does seem to be doing something.

After waiting for almost half an hour, the system finally boots and I can see the display manager, but on console 0 (in text mode). Again (although this is probably a separate topic), I see ioctl errors about wlan0; my wireless device is not recognized.

I use Debian Sid (up to date) with a vanilla 2.6.35 kernel built with the same options as 2.6.34.1 (which works perfectly in the exact same environment). My system is 64-bit (Debian amd64).

Any clues why 2.6.35 boots so extremely slowly?

Please find enclosed my .config-2.6.35.

--Claudio
Comment 1 Claudio M. Camacho 2010-08-14 07:44:32 UTC
Hi,

I just ran some tests: I downloaded 2.6.35.2 and tried to boot it. Here is what I found:

When the system boots, the kernel loads perfectly, without any stop. Then, udev starts to initialize. The system freezes at the point where udev says: "Waiting for /dev to be populated..".

But the system is not actually frozen; the problem seems to be the console. If I keep pressing keys on my keyboard, the boot continues. Each time I press a key, the system does "something" and then stops again. So, as long as I keep pressing keys, the console keeps printing boot messages (from the rc scripts and so on).

Once I reach the X display manager, the system works perfectly. In fact, I am writing right now from my browser on 2.6.35.2.

To sum up, it seems that only the rc messages at boot are frozen, and I have to keep pressing keys to make the system react and continue booting. Once the system reaches the login, everything is completely normal.

Here is the hardware/software that could be relevant:

- Intel GM45 (using KMS)
- Intel ICH9 chipset
- udev 160-1 (from Debian Sid)
- sysvinit 2.88dsf-11 (from Debian Sid)

Until you have time to look into this, I will have to boot by pressing keys on my keyboard until I reach the X display manager.

Thank you in advance, once again.


--Claudio
Comment 2 Andrew Morton 2010-08-26 21:49:54 UTC
(marked as a regression)

(recategorised as x86_64)

(cc Dmitry)

I've seen numerous reports of this "kernel needs me to keep hitting the keyboard to be able to boot" problem but I don't recall ever having seen an explanation for why this happens.

Is it familiar to any of you guys?
Comment 3 H. Peter Anvin 2010-08-26 21:58:08 UTC
In general, "need to hit the keyboard" usually implies that some kind of interrupts aren't being received and so implies a problem with the interrupt controller or timer device.  Hitting the keyboard provides a source of interrupts and therefore allows the boot to proceed.

This is the first bug report I remember seeing with 2.6.35 for an Intel processor, though.  Bug 16636 had similar behavior, but that one only affects AMD processors.

Claudio, is there any way you can do a "git bisect" on this from the last known good kernel?
Comment 4 Claudio M. Camacho 2010-08-29 20:08:41 UTC
Hi,

Sorry for answering so late, but I was on holiday.

This problem happens both on my Debian and on Ubuntu 10.10 (alpha), both
with kernel 2.6.35.x. However, Ubuntu seems to have patched the kernel
somehow: it boots normally, and only the shutdown needs interrupts
(e.g. from the keyboard).

2.6.36-rc1 sometimes boots fine, but sometimes it also needs keyboard
input. Unlike 2.6.35.x, though, it does not happen every time.

I could try to bisect, but I have almost no free time. Should I start from
2.6.35.4 onwards?



Comment 5 Claudio M. Camacho 2010-09-10 21:25:55 UTC
Hello,

I don't have time to test or bisect. However, I discovered that this bug is half-fixed in 2.6.36-rc3: most of the time it boots normally without my having to generate interrupts, but sometimes (maybe 30% of the time) I have to keep pressing a key until I get to the display manager.

I assume X works fine because I am interacting with the mouse and the keyboard, which provide a source of interrupts.

I just thought it would be good to let you know.

Kind regards,


--Claudio
Comment 6 Chuck Ebbert 2010-09-14 23:03:16 UTC
The first thing to try is adding "nohz=off highres=off" to the boot options. You could also try "hpet=0".
Comment 7 Claudio M. Camacho 2010-09-15 05:13:19 UTC
Hello Chuck,

Thank you for the tip! It is "nohz=off" that makes it work properly!

I am at work, so I only tested 2.6.36-rc4, but I assume the same applies to 2.6.35.x.

I tried all the combinations, and the system works normally only with the option "nohz=off". The highres and hpet options are not relevant to this bug, at least on my laptop.

If you need further details, please let me know. Hopefully there is a fix that can still be included in 2.6.36, or else in 2.6.37.

Looking forward to hearing from you.

Kind regards,


--Claudio
Comment 8 Claudio M. Camacho 2010-09-15 05:31:20 UTC
Hello again,

Alright, I just tested "nohz=off" with 2.6.35.4 and it also makes it work!

Hence, to sum up:

Using "nohz=off" with 2.6.35.x and 2.6.36-rc4 makes the system work again. If I remove that option, I have to press the keyboard (I guess to generate interrupts).

Now, what could be the fix for this bug? I assume "nohz=off" disables the "tickless" option.

I am looking forward to hearing from you.

Kind regards,


--Claudio
Comment 9 Thomas Gleixner 2010-09-18 12:38:45 UTC
Claudio,

this looks like the HPET bug we fixed a couple of days ago. Can you
test the following patch?

http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=patch;h=995bd3bb5c78f3ff71339803c0b8337ed36d64fb

Thanks,

	tglx
Comment 10 Claudio M. Camacho 2010-09-19 19:10:47 UTC
Hi,

Against which kernel version exactly should I apply this patch?

Thanks in advance,


--Claudio
Comment 11 Florian Mickler 2010-09-21 05:03:44 UTC
It is already in mainline, so you could just try 2.6.36-rc5, which Linus just released.
Comment 12 Claudio M. Camacho 2010-09-21 05:22:07 UTC
Hi,

It is fixed with 2.6.36-rc5. You can close this.

Thank you very much for your effort!



--Claudio
Comment 13 Andrew Morton 2010-09-21 05:31:47 UTC
Confused.  AFAICT 995bd3bb5c78f3ff71339803c0b8337ed36d64fb is not present in 2.6.36-rc5.  Nor is it present in linux-next.

And the bug report is against 2.6.35.2 but the commit to which tglx linked has no Cc:stable in the changelog.
Comment 14 Florian Mickler 2010-09-21 06:44:59 UTC
Indeed. Now I'm confused too. 995bd3bb5c78f3ff71339803c0b8337ed36d64fb is actually in some of the x86 git trees, not in mainline. Sorry about that.

There were some other reports of similar issues, so it could have been fixed in mainline nonetheless.

I will close this as fixed. 

Claudio, if it happens again to you, shout loud and clear. 

Thx,
Flo
Comment 15 Andrew Morton 2010-09-21 06:48:11 UTC
(In reply to comment #14)
> 
> I will close this as fixed. 
> 

I'll reopen it because the report is against 2.6.35.x and it isn't fixed there.

If it's really fixed in 2.6.36-rc5 then we want to find out what fixed it and backport that.
Comment 16 Florian Mickler 2010-09-21 07:48:57 UTC
Created attachment 30892
Patch to test on top of 2.6.35.4 for the hpet issue

Right. Claudio, could you test if the attached patch from Thomas on top of 2.6.35.4 fixes the issue for you?

(Patch taken out of 2.6.35.5)
Comment 17 Claudio M. Camacho 2010-09-21 08:25:59 UTC
Hi,

Tested and working! So 2.6.35.4 with that patch is working for my Vaio
SR11M.

Thank you!



Comment 18 Thomas Gleixner 2010-09-21 09:53:39 UTC
> --- Comment #15 from Andrew Morton <akpm@linux-foundation.org>  2010-09-21 06:48:11 ---
> I'll reopen it because the report is against 2.6.35.x and it isn't fixed
> there.
> 
> If it's really fixed in 2.6.36-rc5 then we want to find out what fixed it and
> backport that.

It's commit 54ff7e59 in Linus' tree, which has a cc:stable tag, and
it's in 2.6.35.5 as commit 8ba36a90f. So we can close this one.

Thanks,

	tglx