Bug 17411 - stable regression from 2.6.35.1: After uncompressing the kernel, at boot time, the server hangs.
Status: CLOSED CODE_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: All Linux
Importance: P1 normal
Assignee: platform_i386
Reported: 2010-08-30 09:06 UTC by Florian Mickler
Modified: 2010-09-17 19:53 UTC

Kernel Version: 2.6.32.2
Regression: Yes

Description Florian Mickler 2010-08-30 09:06:46 UTC
In Bug #16173 Guan Xin reported a stable regression:

> Comment #33 From Guan Xin 2010-08-16 01:27:15
>
> This problem did not occur for me with 2.6.35.1. It is new for me with 2.6.35.2.
>
> UP:
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 13
> model name      : Intel(R) Pentium(R) M processor 1.60GHz
> stepping        : 8
>
> Comment #34 From Guan Xin 2010-08-16 01:35:50
> 
> Created an attachment (id=27457)
> config with which the problem can be reproduced on ThinkPad T43 1871-FU1,
> 1.5GB Mem, with 2.6.35.2
>
Comment 1 Florian Mickler 2010-08-30 09:08:43 UTC
Took the subject line from bug #16173.
Comment 2 Eric W. Biederman 2010-08-30 17:25:32 UTC
Odd.  You have a UP kernel with IOAPICS enabled.

My changes only landed between 2.6.35.3 and 2.6.35.4, so while in theory I might have broken something, in this case I am off the hook.

Does 2.6.35.4 work?

Is it possible for you to get any console output at all from the kernel?
Perhaps with early_printk?

Could you bisect between 2.6.35.1 and 2.6.35.2?
There are few enough changes it should be possible to find the one that breaks the boot for you fairly quickly. 

Eric
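
For readers unfamiliar with the bisection suggested above, here is a minimal sketch of the idea in Python. It is not the actual workflow: in practice `git bisect` drives the search, and each test means building and booting a kernel. The patch names, the count of 64, and the `culprit` value are hypothetical stand-ins, since the real list of changes between 2.6.35.1 and 2.6.35.2 is not given in this report. The sketch assumes the patches are ordered oldest to newest and that every kernel from the first bad patch onward fails to boot.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of kernels tested).

    Assumes commits are ordered oldest to newest, the last one is
    known bad, and everything after the first bad commit is also bad.
    """
    lo, hi = 0, len(commits) - 1      # invariant: the answer lies in [lo, hi]
    tested = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tested += 1                   # one build-and-boot cycle
        if is_bad(commits[mid]):
            hi = mid                  # breakage is at mid or earlier
        else:
            lo = mid + 1              # breakage was introduced after mid
    return commits[hi], tested


# Hypothetical patch series standing in for the 2.6.35.1..2.6.35.2 changes.
patches = [f"patch-{i:03d}" for i in range(1, 65)]
culprit = "patch-037"                 # hypothetical breaking patch

# Stand-in for "build this kernel and see whether it boots".
found, builds = first_bad(patches, lambda c: c >= culprit)
print(found, builds)
```

With 64 candidate patches the search needs only 6 test boots, which is why a bisect between two adjacent stable releases tends to finish quickly.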
Comment 3 Guan Xin 2010-08-30 21:51:47 UTC
2.6.35.4 works for UP and SMP. 2.6.35.2/3 work if configured for SMP. 2.6.35.2/3 don't work if configured for UP, regardless of whether the local APIC is enabled or disabled.

2.6.35.2 stops for a few seconds right after "Freeing unused kernel memory" then reboots.

2.6.35.3 crashes on starting udev, with kernel dumps showing many errors in intel_agp, then halts the computer instead of rebooting.

Is it a known bug that was fixed in 2.6.35.4? If not, I can try further later this week. I'm a little busy at the beginning of the new semester.
Comment 4 Eric W. Biederman 2010-08-31 00:04:35 UTC
Ok.  So what you see in 2.6.35.2 and 2.6.35.3 is userspace going wonky.
The "Freeing unused kernel memory" is the very last message printed before
init is started.

A lot of things changed in 2.6.35.4. I don't see one that sounds exactly
like your issue, but I would not be surprised if it was fixed.

It sounds like you are going to be running your computer on 2.6.35.4 for a while
so I would recommend staying with that and counting this bug as solved unless
2.6.35.4 starts showing this problem.

I don't know the bug, but the kernel is vast and I haven't been paying much
attention beyond fixing those things I have broken lately.  Every patch in
2.6.35.4 represents a bug someone knows about and has fixed, so the odds are
that whatever bit you has been properly fixed.

I do appreciate the willingness to dig more into the problem, but I don't think that would be a good use of anyone's time, since 2.6.35.4 works for you.

Eric
Comment 5 Guan Xin 2010-08-31 07:29:05 UTC
Not necessarily userspace. After init starts, kernel-space code keeps running until power off. I don't see anything that specifically addresses this in ChangeLog-2.6.35.4, either. When I am at home this evening I will test 2.6.35.4 with a generic config to see if it's really gone.
Comment 6 Florian Mickler 2010-09-01 18:33:54 UTC
One way to verify whether it was fixed or just papered over is to bisect the patches that were added in 2.6.35.4. If you get to the patch that fixes it, the description might make it clear what the issue is/was.

I'm closing this for now as fixed under the assumption that it was indeed fixed and not just papered over.

If you get to the bottom of this and have a patch that fixes the underlying cause (if it is unfixed) and needs to be applied to current mainline, then it's best to follow the guidelines in Documentation/SubmittingPatches.
Comment 7 Guan Xin 2010-09-17 19:53:13 UTC
(In reply to comment #6)
> One thing to verify if it was fixed or just papered over is bisecting the
> patches that were added in 2.6.35.4. If you get to the patch that fixes it,
> the description might make it clear what the issue is/was.
>
> I'm closing this for now as fixed under the assumption that it was indeed
> fixed and not just papered over.
>
> If you get to the bottom of this and have a patch that fixes the underlying
> cause (if it is unfixed) and need to be applied to current mainline, then
> it's best to follow the guidelines in Documentation/SubmittingPatches.

Yes, that's OK. My T43 died, and I have little chance of getting such an old computer again.
I cannot investigate this problem any further.
