Bug 8040

Summary: Hang before INIT when CONFIG_HIGHMEM64G=y
Product: Memory Management
Component: Other
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: blocking
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.20
Subsystem:
Regression: ---
Bisected commit-id:
Reporter: Nilshar (Nilshar)
Assignee: Andrew Morton (akpm)
CC: protasnb

Description Nilshar 2007-02-19 05:59:40 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.20-rc2
Distribution: Debian Sarge
Hardware Environment: Dell PowerEdge 850
Software Environment: 
Problem Description: 
I had a hang just after "Freeing unused kernel memory", where INIT is supposed to
start. It's not a total freeze: the keyboard is working, I can ctrl-alt-del, etc.,
but INIT is not starting.
This does not happen with CONFIG_HIGHMEM4G=y, only with CONFIG_HIGHMEM64G=y.

Diff between the non-working (<) and working (>) 2.6.20 configs:

181,182c181,182
< # CONFIG_HIGHMEM4G is not set
< CONFIG_HIGHMEM64G=y
---
> CONFIG_HIGHMEM4G=y
> # CONFIG_HIGHMEM64G is not set
185d184
< CONFIG_X86_PAE=y
191c190
< CONFIG_RESOURCES_64BIT=y
---
> # CONFIG_RESOURCES_64BIT is not set

With 2.6.20-rc2 and earlier, everything works fine with CONFIG_HIGHMEM64G=y.
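
The option differences listed in the diff can also be extracted programmatically. The following is a minimal Python sketch, not part of the original report: the helper names `parse_config` and `config_diff` are hypothetical, the sample texts contain only the options shown in the diff, and an option missing from a file is assumed to be disabled.

```python
import re

def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options

def config_diff(a_text, b_text):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    Assumption: an option absent from a file counts as 'n'.
    """
    a, b = parse_config(a_text), parse_config(b_text)
    return {
        name: (a.get(name, "n"), b.get(name, "n"))
        for name in sorted(set(a) | set(b))
        if a.get(name, "n") != b.get(name, "n")
    }

# Sample data limited to the options shown in the diff.
HANGING = """\
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_X86_PAE=y
CONFIG_RESOURCES_64BIT=y
"""
WORKING = """\
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
# CONFIG_RESOURCES_64BIT is not set
"""

for name, (hanging, working) in config_diff(HANGING, WORKING).items():
    print(f"{name}: hanging={hanging} working={working}")
```

Treating "not set" and "absent" alike keeps the comparison symmetric, which matters here because CONFIG_X86_PAE simply disappears from the 4G config.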

Steps to reproduce:
Compile a kernel with CONFIG_HIGHMEM64G=y and boot it on a Dell PowerEdge 850.
Comment 1 Nilshar 2007-02-19 06:04:30 UTC
*** Bug 8039 has been marked as a duplicate of this bug. ***
Comment 2 Hrvoje 2007-03-05 07:38:50 UTC
This bug also happens on Xeon 3000 model Supermicro
Comment 3 js 2007-03-07 22:17:44 UTC
Please see bug 8148 too.
I have neither of those kernel options enabled, and on an older K6 600 MHz
based system I see the exact same symptoms.
Comment 4 Nilshar 2007-03-14 03:13:24 UTC
Any news on this bug, please?
Comment 5 Anonymous Emailer 2007-03-14 04:17:22 UTC
Reply-To: akpm@linux-foundation.org

> On Wed, 14 Mar 2007 03:13:25 -0700 bugme-daemon@bugzilla.kernel.org wrote:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=8040
>
> ------- Additional Comments From Nilshar@gmail.com  2007-03-14 03:13 -------
> Any news on that bug please ?

None whatsoever.  Three people are reporting this and it's a drop-dead
showstopper for a 2.6.21 release so we just have to wait until someone
wakes up and thinks about it.

It would be very useful if one of the reporters could perform a git-bisect
search to identify the offending change, please.

I would dearly like to point you at a document or web page which describes
kernel-git-bisect-for-newbies, but afaik there isn't such a thing, which is
a huge failing.
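
For readers unfamiliar with the procedure, the start of a bisect session for this report could be sketched roughly as follows. This is a hedged illustration in Python, not an official guide: the helper `bisect_plan` is hypothetical, the tags `v2.6.20-rc2` (last kernel reported good) and `v2.6.20` (bad) are taken from the bug description, and the build-and-boot test at each step still has to be done by hand.

```python
def bisect_plan(good, bad):
    """Return the git commands that open a bisect session between two revisions."""
    return [
        ["git", "bisect", "start"],
        ["git", "bisect", "bad", bad],    # revision known to hang
        ["git", "bisect", "good", good],  # revision known to boot
    ]

# After each build-and-boot test, report the outcome with one of these;
# git then checks out the next revision to try.
MARK_BOOTS = ["git", "bisect", "good"]
MARK_HANGS = ["git", "bisect", "bad"]
FINISH = ["git", "bisect", "reset"]  # return to the original checkout

for cmd in bisect_plan("v2.6.20-rc2", "v2.6.20"):
    print(" ".join(cmd))
```

The commands are only printed here; inside a kernel git checkout they could be executed with `subprocess.run(cmd, check=True)`.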

Comment 6 Randy Dunlap 2007-03-14 04:25:46 UTC
On Wed, 14 Mar 2007, Andrew Morton wrote:

> It would be very useful if one of the reporters could perform a git-bisect
> search to identify the offending change, please.
>
> I would dearly like to point you at a document or web page which describes
> kernel-git-bisect-for-newbies, but afaik there isn't such a thing, which is
> a huge failing.

I have one of those one-of-my-machines-won't-boot-2.6.21-rc* problems,
and I expect that I'll try to use git bisect on it, in which case
I will also document it.

Comment 7 Michal Piotrowski 2007-03-14 04:31:03 UTC
On 14/03/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> It would be very useful if one of the reporters could perform a git-bisect
> search to identify the offending change, please.
>
> I would dearly like to point you at a document or web page which describes
> kernel-git-bisect-for-newbies, but afaik there isn't such a thing, which is
> a huge failing.

"Linux testers handbook" should be translated in a few weeks.

Here is a "git-bisect basics" movie :)
http://www.youtube.com/watch?v=R7_LY-ceFbE

Regards,
Michal

Comment 8 Nilshar 2007-03-14 04:33:35 UTC
I'll try git-bisect. I'm not sure what it is exactly, but I can certainly do it to
help resolve this bug. I'll search Google, but if you have a good
link, I can use it :)
Comment 9 Leroy Raymond van Logchem 2007-03-14 14:47:16 UTC
Bisecting went well, after 13 compiles this commit was found:

a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f is first bad commit
commit a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f
Author: Roland McGrath <roland <at> redhat.com>
Date:   Fri Jan 26 00:56:46 2007 -0800

    [PATCH] Fix CONFIG_COMPAT_VDSO

    I wouldn't mind if CONFIG_COMPAT_VDSO went away entirely.  But if it's there,
    it should work properly.  Currently it's quite haphazard: both real vma and
    fixmap are mapped, both are put in the two different AT_* slots, sysenter
    returns to the vma address rather than the fixmap address, and core dumps yet
    are another story.

    This patch makes CONFIG_COMPAT_VDSO disable the real vma and use the fixmap
    area consistently.  This makes it actually compatible with what the old vdso
    implementation did.

    Signed-off-by: Roland McGrath <roland <at> redhat.com>
    Cc: Ingo Molnar <mingo <at> elte.hu>
    Cc: Paul Mackerras <paulus <at> samba.org>
    Cc: Benjamin Herrenschmidt <benh <at> kernel.crashing.org>
    Cc: Andi Kleen <ak <at> suse.de>
    Signed-off-by: Andrew Morton <akpm <at> osdl.org>
    Signed-off-by: Linus Torvalds <torvalds <at> linux-foundation.org>

:040000 040000 802ab3366a651ecba28c8677fa84a9f7c506392b
f44adc4dcdab733e5965b68ccd0d643f0a550a80 M      arch
:040000 040000 be1e217152d8b3fcd05f09aa2b3f4f9dcb8208aa
46cc86427e861350dd3fef9469474c55119f27ce M      include

I had both CONFIG_COMPAT_VDSO=y and CONFIG_HIGHMEM64G=y configured.
Using a 4GB Supermicro 7044 SMP dual Xeon. Details upon request.

--
Leroy
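
The number of compiles is consistent with how bisection scales: each test halves the remaining range, so roughly log2(N) builds are needed for N candidate commits. The following is a minimal Python sketch under simplifying assumptions: history is treated as linear, the function name `bisect_first_bad` is hypothetical, and the commit count and position of the bad commit are illustrative values, not the real size of the bisected range.

```python
def bisect_first_bad(n_commits, is_bad):
    """Binary-search a linear history where commit 0 is good and the last is bad.

    Returns (index_of_first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, n_commits - 1  # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1  # one build-and-boot cycle
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    return hi, tested

first_bad = 5000  # illustrative position of the offending commit
index, builds = bisect_first_bad(8193, lambda i: i >= first_bad)
print(index, builds)
```

With a range of a few thousand commits, the search finishes in about a dozen builds, which matches the order of magnitude reported in this comment.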
Comment 10 Nilshar 2007-03-15 01:00:54 UTC
So if I set CONFIG_COMPAT_VDSO=n I should be able to boot ?
Comment 11 Leroy Raymond van Logchem 2007-03-15 15:36:25 UTC
Chuck Ebbert at redhat.com asked:

> Can you please double check this by trying with/without again -- sometimes
> bisects go bad.

As requested, I redid the test, this time without git, using the kernel.org
tarballs. The results, still using the same .config:
linux-2.6.20.tar.gz   : bad
linux-2.6.20.1.tar.gz : bad (boot log identical)
linux-2.6.20.2.tar.gz : good
linux-2.6.20.3.tar.gz : good
(triple checked)

Really strange. Nilshar, please try these kernels too with:
CONFIG_COMPAT_VDSO=y
CONFIG_HIGHMEM64G=y

Nilshar did try and says 2.6.20.3 works fine. So only 2.6.20 and 2.6.20.1 had
this 'hang' at boot behaviour.
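
To confirm that a test kernel was really built with the two options named in this comment, its .config can be checked before booting. A minimal Python sketch; the helper names `enabled_options` and `has_reported_combination` are hypothetical and the sample text is illustrative:

```python
def enabled_options(config_text):
    """Return the set of options set to 'y' in a kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name.startswith("CONFIG_") and value == "y":
            enabled.add(name)
    return enabled

# The combination named in the reports above.
REPORTED = {"CONFIG_COMPAT_VDSO", "CONFIG_HIGHMEM64G"}

def has_reported_combination(config_text):
    """True when both reported options are built in."""
    return REPORTED <= enabled_options(config_text)

sample = "CONFIG_COMPAT_VDSO=y\nCONFIG_HIGHMEM64G=y\n# CONFIG_HIGHMEM4G is not set\n"
print(has_reported_combination(sample))
```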
Comment 12 Natalie Protasevich 2008-02-14 23:19:12 UTC
Since there have been no more reports for later kernel releases, I guess the bug can be closed as fixed. Please reopen if you believe the problem is still there.