Bug 11608

Summary: 2.6.27-rc6 BUG: unable to handle kernel paging request
Product: Memory Management
Reporter: Rafael J. Wysocki (rjw)
Component: Page Allocator
Assignee: Andrew Morton (akpm)
Status: CLOSED INSUFFICIENT_DATA
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27-rc6
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 11167

Description Rafael J. Wysocki 2008-09-21 12:15:20 UTC
Subject    : 2.6.27-rc6 BUG: unable to handle kernel paging request
Submitter  : John Daiker <daikerjohn@gmail.com>
Date       : 2008-09-16 23:00
References : http://marc.info/?l=linux-kernel&m=122160611517267&w=4

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Chuck Ebbert 2008-09-21 15:14:43 UTC
Oops: 000b

Bit 3 is set: the processor detected 1s in reserved bits of the page directory.
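As a sketch of how the "Oops: 000b" value breaks down, the snippet below decodes the low five bits of the x86 page-fault error code as the Intel SDM defines them (P, W/R, U/S, RSVD, I/D). The table layout and helper names are hypothetical and are not taken from the kernel source. For 0x000b it reports a supervisor-mode write to a present page with RSVD set. RSVD means a reserved bit was set in one of the paging-structure entries used for the translation, which need not be the page directory itself.

```python
# Low bits of the x86 page-fault error code, per the Intel SDM.
# Table layout and helper names are illustrative only (hypothetical).
PF_ERROR_BITS = [
    # (bit, name, meaning when clear, meaning when set)
    (0, "P", "non-present page", "protection violation on a present page"),
    (1, "W/R", "read access", "write access"),
    (2, "U/S", "supervisor-mode access", "user-mode access"),
    (3, "RSVD", "no reserved-bit violation",
     "reserved bit set in a paging-structure entry"),
    (4, "I/D", "data access", "instruction fetch"),
]


def decode_pf_error(code):
    """Map each flag name to True/False for a page-fault error code."""
    return {name: bool((code >> bit) & 1) for bit, name, _, _ in PF_ERROR_BITS}


def describe_pf_error(code):
    """Return one human-readable line per decoded bit."""
    return [
        f"bit {bit} {name}: {on if (code >> bit) & 1 else off}"
        for bit, name, off, on in PF_ERROR_BITS
    ]


if __name__ == "__main__":
    # 0x000b is the value printed as "Oops: 000b" in this report.
    for line in describe_pf_error(0x000B):
        print(line)
```

Run as-is, it prints P, W/R and RSVD as set and U/S and I/D as clear, i.e. a kernel-mode write that tripped a reserved-bit check.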
Comment 2 Rafael J. Wysocki 2008-09-26 16:04:55 UTC
On Thursday, 25 of September 2008, Nick Piggin wrote:
> On Wed, Sep 24, 2008 at 08:46:55PM -0400, Chuck Ebbert wrote:
> > On Sun, 21 Sep 2008 20:54:23 +0200 (CEST)
> > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > 
> > > This message has been generated automatically as a part of a report
> > > of recent regressions.
> > > 
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.26.  Please verify if it still should be listed and let me know
> > > (either way).
> > > 
> > > 
> > > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=11608
> > > Subject     : 2.6.27-rc6 BUG: unable to handle kernel paging request
> > > Submitter   : John Daiker <daikerjohn@gmail.com>
> > > Date        : 2008-09-16 23:00 (6 days old)
> > > References  : http://marc.info/?l=linux-kernel&m=122160611517267&w=4
> > > 
> > > 
> > 
> > As I said in the bugzilla entry:
> > 
> >   Oops: 000b
> > 
> >   Bit 3 is set -- the processor detected 1's in reserved bits of the page directory.
> > 
> > That can't be good...
> 
> [54384.988151] BUG: unable to handle kernel paging request at ffff8800601dd000
> [54384.992095] IP: [<ffffffff80375457>] clear_page_c+0x7/0x10
> [54384.992095] PGD 202063 PUD 8067 PMD 65d54163 PTE 80002020601dd163
> [54384.992095] Oops: 000b [1] SMP DEBUG_PAGEALLOC
> 
> I initially suspected PAT (maybe via DEBUG_PAGEALLOC), but let's see if the
> 3rd line here is useful.
> 
>      xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...RR.actuwp
> PGD:                                         001000000010000001100011
> 
>      xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...RR.actuwp
> PUD:                                                 1000000001100111
> 
>      xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...Rs.actuwp
> PMD:                                 01100101110101010100000101100011
> 
>      xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...gP.actuwp
> PTE: 1000000000000000001000000010000001100000000111011101000101100011
>      3210987654321098765432109876543210987654321098765432109876543210
> 
> Is this a 36-bit physical address CPU? In which case you have 2 bits in
> the pte that are outside "maxphys". Or if it is a 40-bit CPU, then you
> have just 1 bit outside maxphys, in which case I'd say it is memory
> corruption (maybe a hardware bug, maybe a scribble from elsewhere). So
> I'm wrong about PAT.
> 
> Interestingly, the PMD also has a 1 set in a reserved bit (page global),
> but according to the Intel docs, the CPU doesn't check that bit, so it
> is not faulting there.
> 
> Does the machine survive memtest? Is the bug reproducible? If the
> answer is no to either of these, I think we can take it off the
> regression list. Otherwise, is it possible to track down to a specific
> commit?
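To make the bit counting in the analysis above concrete, here is a minimal sketch that takes the PTE and faulting address from the oops, extracts the PTE's frame address, and lists which bits sit at or above a given MAXPHYADDR. The helper names are hypothetical. The PAGE_OFFSET value is an assumption: the x86-64 direct-map base used by kernels of this era, consistent with ffff8800601dd000 being a direct-map address.

```python
# Sketch: compare the faulting PTE's frame with the expected physical address
# and list bits above MAXPHYADDR. Helper names are hypothetical.
PAGE_SHIFT = 12
# Assumption: x86-64 direct-map base (__PAGE_OFFSET) for this kernel era.
PAGE_OFFSET = 0xFFFF880000000000


def pte_frame(pte):
    """Physical frame address held in a 4 KiB PTE (bits 51..12)."""
    addr_mask = ((1 << 52) - 1) & ~((1 << PAGE_SHIFT) - 1)
    return pte & addr_mask


def reserved_bits_set(entry, maxphyaddr):
    """Set bits in [maxphyaddr, 51]; the CPU treats these as reserved.

    Bit 63 (NX) is left out: it is a valid bit when EFER.NXE is enabled.
    """
    return [b for b in range(maxphyaddr, 52) if (entry >> b) & 1]


def differing_bits(a, b):
    """Bit positions where a and b differ."""
    x = a ^ b
    return [bit for bit in range(x.bit_length()) if (x >> bit) & 1]


if __name__ == "__main__":
    pte = 0x80002020601DD163        # PTE from the oops
    fault_va = 0xFFFF8800601DD000   # faulting address from the oops
    expected = fault_va - PAGE_OFFSET  # assumes a direct-map address
    print(hex(pte_frame(pte)), hex(expected))
    print("differing bits:", differing_bits(pte_frame(pte), expected))
    for maxphys in (36, 40):
        print(f"MAXPHYADDR={maxphys}:", reserved_bits_set(pte, maxphys))
```

Under those assumptions the script prints a frame of 0x2020601dd000 against an expected 0x601dd000, differing only in bits 37 and 45. With MAXPHYADDR 36 both bits fall in the reserved range; with MAXPHYADDR 40 only bit 45 does, which matches the two cases described in the mail.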
Comment 3 Rafael J. Wysocki 2008-11-16 10:58:39 UTC
No response from the reporter.  Closing.