Bug 10732 - REGRESSION: 2.6.26-rc2-git4: X server failed start on X61s laptop
REGRESSION: 2.6.26-rc2-git4: X server failed start on X61s laptop
Status: CLOSED CODE_FIX
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
All Linux
: P1 normal
Assigned To: Theodore Tso
:
Depends on:
Blocks: 10492
  Show dependency treegraph
 
Reported: 2008-05-17 03:11 UTC by Rafael J. Wysocki
Modified: 2008-05-29 13:51 UTC (History)
3 users (show)

See Also:
Kernel Version: 2.6.26-rc2-git4
Tree: Mainline
Regression: Yes


Attachments
Xorg.0.log using a 2.6.26-rc2-git5 based kernel with commit 1c12c4c reverted (50.23 KB, text/x-log)
2008-05-17 08:10 UTC, Theodore Tso
Details
Patch to print PAT debugging information. (2.96 KB, text/x-patch)
2008-05-17 08:31 UTC, Theodore Tso
Details
Messages for the X server failure case with debugpat enabled. (94.94 KB, application/x-bzip)
2008-05-17 08:32 UTC, Theodore Tso
Details

Description Rafael J. Wysocki 2008-05-17 03:11:47 UTC
Subject    : REGRESSION: 2.6.26-rc2-git4: X server failed start on X61s laptop
Submitter  : "Theodore Ts'o" <tytso@mit.edu>
Date       : 2008-05-17 7:32
References : http://marc.info/?l=linux-kernel&m=121100994722524&w=4

This entry is being used for tracking a regression from 2.6.25.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2008-05-17 07:41:23 UTC
Caused by:

commit 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
Author: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Wed May 14 16:05:51 2008 -0700

    mprotect: prevent alteration of the PAT bits

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Comment 2 Theodore Tso 2008-05-17 08:03:58 UTC
On Sat, May 17, 2008 at 09:49:07AM +0000, Sitsofe Wheeler wrote:
> On Sat, 17 May 2008 03:32:05 -0400, Theodore Ts'o wrote:
> 
> > The X server failed to come up using the Intel driver, and forced a
> > fallback to the 800x600 VESA server.  This was on my Lenovo X61s laptop
> >
> > The git bisect identified this commit as the guilty one:
> > 
> > 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b is first bad commit commit
> > 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b Author: Venki Pallipadi
> > <venkatesh.pallipadi@intel.com> Date:   Wed May 14 16:05:51 2008 -0700
> > 
> >     mprotect: prevent alteration of the PAT bits
> 
> This sounds related to
> http://groups.google.com/group/linux.kernel/browse_frm/thread/1dba81049ce4ad16/546aff3e91055ab7#546aff3e91055ab7 ...

Yep, although that thread seemed to people trying to pursuade
themselves that there would be no problems with existing X servers,
and I have a demonstration that indeed, this patch *does* cause
problems and causes a regression.

I've been able to confirm that reverting this commit allows my X
server (driving an Intel GM965/GL960 graphics controller) to function
correctly on a 2.6.26-rc2-git5 kernel with the ext3.

What version of the X server am I using?

ii  xserver-xorg              1:7.2-5ubuntu13           the X.Org X server
ii  xserver-xorg-core         2:1.3.0.0.dfsg-12ubuntu8. X.Org X server -- core server
ii  xserver-xorg-video-intel  2:2.1.1-0ubuntu9.1        X.Org X server -- Intel i8xx, i9xx display driver

My Xorg.0.log file is attached.

						- Ted

Comment 3 Theodore Tso 2008-05-17 08:10:43 UTC
Created attachment 16172 [details]
Xorg.0.log using a 2.6.26-rc2-git5 based kernel with commit 1c12c4c reverted
Comment 4 Theodore Tso 2008-05-17 08:31:08 UTC
Created attachment 16173 [details]
Patch to print PAT debugging information.

Per Venki's request, I adapted Ingo's patch so it would apply against 2.6.26-rc2-git5, and tried it without reverting commit 1c12c4c (which definitely does solve my problem).
Comment 5 Theodore Tso 2008-05-17 08:32:52 UTC
Created attachment 16174 [details]
Messages for the X server failure case with debugpat enabled.
Comment 6 Theodore Tso 2008-05-17 08:41:45 UTC
On Sat, May 17, 2008 at 06:21:18AM -0700, Pallipadi, Venkatesh wrote:
> 
> Can you please send the complete dmesg after X failure, with patch here
> and debugpat boot option.
> http://www.ussg.iu.edu/hypermail/linux/kernel/0805.0/2657.html

With debugpat enabled, the resulting dump that landed in
/var/log/messages was something like 1.6 megabytes, so it totally
overflowed the dmesg buffer.

I've created a Bugzilla attachment of the bzip'ed messages excerpt here:

http://bugzilla.kernel.org/attachment.cgi?id=16174&action=view

The patch above was only available in HTML-only mode (and I couldn't
find Ingo's patch on lkml.org, so I had to reconstruct it, and then
forward part it to 2.6.26-rc2-git5).  What I actually used can be
found here:

http://bugzilla.kernel.org/attachment.cgi?id=16173&action=view

I'm not sure this will be helpful for you, but this should be
completely trivial for you to reproduce; just use Ubuntu Gutsy on
Intel Video based laptop, with a 2.6.26-rc2-git4 or more recent
kernel, and watch it fail.  

For me, I'll just locally revert commit 1c12c4cf since otherwise X
becomes more-or-less unsuable.  (Getting used to 1024x768 after using
a 1600x1200 display is one thing; 800x600 is quite another, and the
performance using the VESA driver is quite abysmal.)

						- Ted

Comment 7 Venkatesh Pallipadi 2008-05-17 09:00:35 UTC
 

>-----Original Message-----
>From: Theodore Tso [mailto:tytso@mit.edu] 
>Sent: Saturday, May 17, 2008 8:42 AM
>To: Pallipadi, Venkatesh
>Cc: linux-kernel@vger.kernel.org; Ingo Molnar; Siddha, Suresh 
>B; bugme-daemon@bugzilla.kernel.org
>Subject: [Bug 10732] REGRESSION: 2.6.26-rc2-git4: X server 
>failed start onX61s laptop
>
>
>On Sat, May 17, 2008 at 06:21:18AM -0700, Pallipadi, Venkatesh wrote:
>> 
>> Can you please send the complete dmesg after X failure, with 
>patch here
>> and debugpat boot option.
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0805.0/2657.html
>
>With debugpat enabled, the resulting dump that landed in
>/var/log/messages was something like 1.6 megabytes, so it totally
>overflowed the dmesg buffer.
>
>I've created a Bugzilla attachment of the bzip'ed messages 
>excerpt here:
>
>http://bugzilla.kernel.org/attachment.cgi?id=16174&action=view
>
>The patch above was only available in HTML-only mode (and I couldn't
>find Ingo's patch on lkml.org, so I had to reconstruct it, and then
>forward part it to 2.6.26-rc2-git5).  What I actually used can be
>found here:
>
>http://bugzilla.kernel.org/attachment.cgi?id=16173&action=view
>
>I'm not sure this will be helpful for you, but this should be
>completely trivial for you to reproduce; just use Ubuntu Gutsy on
>Intel Video based laptop, with a 2.6.26-rc2-git4 or more recent
>kernel, and watch it fail.  
>
>For me, I'll just locally revert commit 1c12c4cf since otherwise X
>becomes more-or-less unsuable.  (Getting used to 1024x768 after using
>a 1600x1200 display is one thing; 800x600 is quite another, and the
>performance using the VESA driver is quite abysmal.)
>

Thanks for the log Ted. From first looks, I don't see any PAT related
failures
in the log. And as the revert of the commit 1c12c4cf is fixing the
problem here,
adding the X guys to the thread.

I am thinking that may be X is depending on mprotect changes somehow and
failing when
it cannot change PAT attributes with mprotect call. Keith earlier
mentioned that X
will not depend on this. May be something to do with earlier X version.
Keith?

Thanks,
Venki

Comment 8 Arjan van de Ven 2008-05-17 09:36:46 UTC
On Sat, 17 May 2008 11:41:31 -0400

> The patch above was only available in HTML-only mode

as a totally unrelated sidenote, you do know about the "dehtmldiff"
command in patchutils, right?

Comment 9 Theodore Tso 2008-05-17 09:53:57 UTC
On Sat, May 17, 2008 at 09:02:32AM -0700, Pallipadi, Venkatesh wrote:
> Thanks for the log Ted. From first looks, I don't see any PAT
> related failures in the log. 

I forgot to add that I had tried booting with the "nopat" option, and
that didn't change anything, so yeah, it doesn't seem likely to be
related.

						- Ted

Comment 10 Anonymous Emailer 2008-05-17 11:12:11 UTC
Reply-To: keithp@keithp.com

On Sat, 2008-05-17 at 09:02 -0700, Pallipadi, Venkatesh wrote:

> I am thinking that may be X is depending on mprotect changes somehow and
> failing when
> it cannot change PAT attributes with mprotect call. Keith earlier
> mentioned that X
> will not depend on this. May be something to do with earlier X version.
> Keith?

It looks like we can turn off read/write access, but we can't turn it
back on. So,

        mprotect (map->memory, map->size, PROT_NONE);
        mprotect (map->memory, map->size, PROT_READ|PROT_WRITE);
 
leaves the memory with no access.

Eric saw this last week; I don't know whether he debugged into it
further.

Which repository is commit 1c12c4cf in?

Comment 11 Anonymous Emailer 2008-05-17 11:33:26 UTC
Reply-To: nix.or.die@googlemail.com

Keith Packard wrote:

> On Sat, 2008-05-17 at 09:02 -0700, Pallipadi, Venkatesh wrote:
> 
>> I am thinking that may be X is depending on mprotect changes somehow and
>> failing when
>> it cannot change PAT attributes with mprotect call. Keith earlier
>> mentioned that X
>> will not depend on this. May be something to do with earlier X version.
>> Keith?
> 
> It looks like we can turn off read/write access, but we can't turn it
> back on. So,
> 
>         mprotect (map->memory, map->size, PROT_NONE);
>         mprotect (map->memory, map->size, PROT_READ|PROT_WRITE);
>  
> leaves the memory with no access.
> 
> Eric saw this last week; I don't know whether he debugged into it
> further.
> 
> Which repository is commit 1c12c4cf in?


It is in linus-git tree , http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c12c4cf9411eb130b245fa8d0fbbaf989477c7b


Comment 12 Theodore Tso 2008-05-17 11:47:26 UTC
On Sat, May 17, 2008 at 08:32:38PM +0200, Gabriel C wrote:
> > 
> > Which repository is commit 1c12c4cf in?
> 
> 
> It is in linus-git tree , http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
> 

...post 2.6.26-rc2-git3.  For Linux kernels 2.6.26-rc2-git4 and
newer, you need to revert this commit or the X Server shipped with
Ubuntu Gutsy will die horribly when run on an X61s laptop with the
Intel 965GM video chipset.

So the question is what's the right fix to keep the kernel compatible
with the X server, besides just reverting the commit entirely?

					- Ted

Comment 13 Hugh Dickins 2008-05-19 14:26:12 UTC
On Sat, 17 May 2008, Theodore Tso wrote:
> On Sat, May 17, 2008 at 08:32:38PM +0200, Gabriel C wrote:
> > > 
> > > Which repository is commit 1c12c4cf in?
> > 
> > It is in linus-git tree , http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
> 
> ...post 2.6.26-rc2-git3.  For Linux kernels 2.6.26-rc2-git4 and
> newer, you need to revert this commit or the X Server shipped with
> Ubuntu Gutsy will die horribly when run on an X61s laptop with the
> Intel 965GM video chipset.
> 
> So the question is what's the right fix to keep the kernel compatible
> with the X server, besides just reverting the commit entirely?

[PATCH] x86: fix mprotect's NX handling on PAE

2.6.26-rc3 with CONFIG_X86_PAE may leave the NX bit set when PROT_EXEC
is intending to clear it, causing irqbalance and others to segfault,
and X startup to fail.

This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's
supposed to be: whereas it's been wrong forever with PAE, staying 32-bit
where 64-bit is needed.

Jeremy Fitzhardinge already has patches to fix that in Ingo's tree;
but if they're not considered 2.6.26 material, or people want a quick
two-liner to get working, here's a shorter hack to fix up pte_modify.

I apologize to those we've broken, and to Venki Pallipadi: it was I who
persuaded him to change his working patch to make that false assumption.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 include/asm-x86/pgtable.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- 2.6.26-rc3/include/asm-x86/pgtable.h	2008-05-19 11:19:03.000000000 +0100
+++ linux/include/asm-x86/pgtable.h	2008-05-19 21:51:08.000000000 +0100
@@ -57,8 +57,8 @@
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
 			 _PAGE_DIRTY)
 
-#define _PAGE_CHG_MASK	(PTE_MASK | _PAGE_PCD | _PAGE_PWT |		\
-			 _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _PAGE_CHG_MASK	(((pteval_t) PTE_MASK & ~_PAGE_NX) | 		\
+			 _PAGE_PCD | _PAGE_PWT | _PAGE_ACCESSED | _PAGE_DIRTY)
 
 #define _PAGE_CACHE_MASK	(_PAGE_PCD | _PAGE_PWT)
 #define _PAGE_CACHE_WB		(0)

Comment 14 Jeremy Fitzhardinge 2008-05-19 15:06:18 UTC
Hugh Dickins wrote:
> On Sat, 17 May 2008, Theodore Tso wrote:
>   
>> On Sat, May 17, 2008 at 08:32:38PM +0200, Gabriel C wrote:
>>     
>>>> Which repository is commit 1c12c4cf in?
>>>>         
>>> It is in linus-git tree , http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
>>>       
>> ...post 2.6.26-rc2-git3.  For Linux kernels 2.6.26-rc2-git4 and
>> newer, you need to revert this commit or the X Server shipped with
>> Ubuntu Gutsy will die horribly when run on an X61s laptop with the
>> Intel 965GM video chipset.
>>
>> So the question is what's the right fix to keep the kernel compatible
>> with the X server, besides just reverting the commit entirely?
>>     
>
> [PATCH] x86: fix mprotect's NX handling on PAE
>
> 2.6.26-rc3 with CONFIG_X86_PAE may leave the NX bit set when PROT_EXEC
> is intending to clear it, causing irqbalance and others to segfault,
> and X startup to fail.
>
> This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
> mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's
> supposed to be: whereas it's been wrong forever with PAE, staying 32-bit
> where 64-bit is needed.
>
> Jeremy Fitzhardinge already has patches to fix that in Ingo's tree;
> but if they're not considered 2.6.26 material, or people want a quick
> two-liner to get working, here's a shorter hack to fix up pte_modify.
>   

 From what I can tell, my patches haven't caused any regressions, and so 
if they fix a real bug we could consider them for .26.

    J

Comment 15 Linus Torvalds 2008-05-19 16:10:50 UTC

On Mon, 19 May 2008, Hugh Dickins wrote:
> 
> This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
> mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's
> supposed to be: whereas it's been wrong forever with PAE, staying 32-bit
> where 64-bit is needed.

Can we *please* just fix PTE_MASK?

And can we agree to never EVER use that PAGE_MASK thing (which was only 
ever meant to work on *addresses*) for any pte operations (including the 
definition of PTE_MASK)? Because PAGE_MASK is very much the word-size, and 
in 32-bit PAE, the page table entry is bigger.

IOE, PTE_MASK should be a "pteval_t". And it should have absolutely 
*nothing* to do with PAGE_MASK. EVER.

IOW, maybe something like this?

And no, I haven't tested this at all. But it should make PTE_MASK have
 (a) the right type ("pteval_t", not "long" - the latter is pure and utter 
     crap)
 (b) the right value (proper mask, not a sign-extended long - again, the 
     latter is pure and utter crap)

but for all I know there might be some broken code that depends on the 
current incorrect and totally broken #defines, so this needs testing and 
thinking about.

It also causes these warnings on 32-bit PAE:

	  AS      arch/x86/kernel/head_32.o
	arch/x86/kernel/head_32.S: Assembler messages:
	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed

and I do not see why (the end result seems to be identical).

Ingo, comments?

Oh, and those #define's should be moved from <asm/page.h> to 
<asm/pgtable.h>, I think. They have nothing to do with pages (despite the 
name of "physical_page_mask", and really are meaningful only in the 
context of some kind of page table entry.

		Linus

---
 include/asm-x86/page.h |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/asm-x86/page.h b/include/asm-x86/page.h
index b381f4a..34b4845 100644
--- a/include/asm-x86/page.h
+++ b/include/asm-x86/page.h
@@ -10,8 +10,8 @@
 
 #ifdef __KERNEL__
 
-#define PHYSICAL_PAGE_MASK	(PAGE_MASK & __PHYSICAL_MASK)
-#define PTE_MASK		(_AT(long, PHYSICAL_PAGE_MASK))
+#define PHYSICAL_PAGE_MASK	(__PHYSICAL_MASK & ~__PHYSICAL_LOW_BITS)
+#define PTE_MASK		(_AT(pteval_t, PHYSICAL_PAGE_MASK))
 
 #define PMD_PAGE_SIZE		(_AC(1, UL) << PMD_SHIFT)
 #define PMD_PAGE_MASK		(~(PMD_PAGE_SIZE-1))
@@ -24,6 +24,7 @@
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
 
+#define __PHYSICAL_LOW_BITS	_AT(phys_addr_t, (PAGE_SIZE-1))
 #define __PHYSICAL_MASK		_AT(phys_addr_t, (_AC(1,ULL) << __PHYSICAL_MASK_SHIFT) - 1)
 #define __VIRTUAL_MASK		((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - 1)
 

Comment 16 Linus Torvalds 2008-05-19 19:34:09 UTC

On Mon, 19 May 2008, Linus Torvalds wrote:
> 
> IOE, PTE_MASK should be a "pteval_t". And it should have absolutely 
> *nothing* to do with PAGE_MASK. EVER.
> 
> IOW, maybe something like this?

.. and in addition to that, perhaps something like this too?

Making the actual _PAGE_XYZ bits be of the proper type makes using bitops 
and negation on them automatically DTRT, so that we don't need those 
insane casts in pte_mkclean() and friends.

Again, totally untested, but we really should just get the types right, 
instead of getting them wrogn and then messing around with various silly 
and incorrect work-arounds for the fact that we got them wrong.

And while it's untested - if this cleanup is buggy (or even generates 
different code), then we're doing something really wrong somewhere.

		Linus

---
 include/asm-x86/pgtable.h |   49 +++++++++++++++++++++++----------------------
 1 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
index 55c3a0e..b38b540 100644
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -21,27 +21,28 @@
 #define _PAGE_BIT_NX           63       /* No execute: only valid after cpuid check */
 
 /*
- * Note: we use _AC(1, L) instead of _AC(1, UL) so that we get a
- * sign-extended value on 32-bit with all 1's in the upper word,
- * which preserves the upper pte values on 64-bit ptes:
+ * Note: we use _AT(pteval_t,1) so that we get the right type
+ * even when we're using PAE and everything is 64-bit with a
+ " 32-bit word-size. That's important for various bit masks.
  */
-#define _PAGE_PRESENT	(_AC(1, L)<<_PAGE_BIT_PRESENT)
-#define _PAGE_RW	(_AC(1, L)<<_PAGE_BIT_RW)
-#define _PAGE_USER	(_AC(1, L)<<_PAGE_BIT_USER)
-#define _PAGE_PWT	(_AC(1, L)<<_PAGE_BIT_PWT)
-#define _PAGE_PCD	(_AC(1, L)<<_PAGE_BIT_PCD)
-#define _PAGE_ACCESSED	(_AC(1, L)<<_PAGE_BIT_ACCESSED)
-#define _PAGE_DIRTY	(_AC(1, L)<<_PAGE_BIT_DIRTY)
-#define _PAGE_PSE	(_AC(1, L)<<_PAGE_BIT_PSE)	/* 2MB page */
-#define _PAGE_GLOBAL	(_AC(1, L)<<_PAGE_BIT_GLOBAL)	/* Global TLB entry */
-#define _PAGE_UNUSED1	(_AC(1, L)<<_PAGE_BIT_UNUSED1)
-#define _PAGE_UNUSED2	(_AC(1, L)<<_PAGE_BIT_UNUSED2)
-#define _PAGE_UNUSED3	(_AC(1, L)<<_PAGE_BIT_UNUSED3)
-#define _PAGE_PAT	(_AC(1, L)<<_PAGE_BIT_PAT)
-#define _PAGE_PAT_LARGE (_AC(1, L)<<_PAGE_BIT_PAT_LARGE)
+#define _PAGE_BIT(x)	(_AT(pteval_t,1) << (x))
+#define _PAGE_PRESENT	_PAGE_BIT(_PAGE_BIT_PRESENT)
+#define _PAGE_RW	_PAGE_BIT(_PAGE_BIT_RW)
+#define _PAGE_USER	_PAGE_BIT(_PAGE_BIT_USER)
+#define _PAGE_PWT	_PAGE_BIT(_PAGE_BIT_PWT)
+#define _PAGE_PCD	_PAGE_BIT(_PAGE_BIT_PCD)
+#define _PAGE_ACCESSED	_PAGE_BIT(_PAGE_BIT_ACCESSED)
+#define _PAGE_DIRTY	_PAGE_BIT(_PAGE_BIT_DIRTY)
+#define _PAGE_PSE	_PAGE_BIT(_PAGE_BIT_PSE)	/* 2MB page */
+#define _PAGE_GLOBAL	_PAGE_BIT(_PAGE_BIT_GLOBAL)	/* Global TLB entry */
+#define _PAGE_UNUSED1	_PAGE_BIT(_PAGE_BIT_UNUSED1)
+#define _PAGE_UNUSED2	_PAGE_BIT(_PAGE_BIT_UNUSED2)
+#define _PAGE_UNUSED3	_PAGE_BIT(_PAGE_BIT_UNUSED3)
+#define _PAGE_PAT	_PAGE_BIT(_PAGE_BIT_PAT)
+#define _PAGE_PAT_LARGE _PAGE_BIT(_PAGE_BIT_PAT_LARGE)
 
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
-#define _PAGE_NX	(_AC(1, ULL) << _PAGE_BIT_NX)
+#define _PAGE_NX	_PAGE_BIT(_PAGE_BIT_NX)
 #else
 #define _PAGE_NX	0
 #endif
@@ -209,22 +210,22 @@ static inline int pmd_large(pmd_t pte)
 
 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(pteval_t)_PAGE_DIRTY);
+	return __pte(pte_val(pte) & ~_PAGE_DIRTY);
 }
 
 static inline pte_t pte_mkold(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(pteval_t)_PAGE_ACCESSED);
+	return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
 }
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(pteval_t)_PAGE_RW);
+	return __pte(pte_val(pte) & ~_PAGE_RW);
 }
 
 static inline pte_t pte_mkexec(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(pteval_t)_PAGE_NX);
+	return __pte(pte_val(pte) & ~_PAGE_NX);
 }
 
 static inline pte_t pte_mkdirty(pte_t pte)
@@ -249,7 +250,7 @@ static inline pte_t pte_mkhuge(pte_t pte)
 
 static inline pte_t pte_clrhuge(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(pteval_t)_PAGE_PSE);
+	return __pte(pte_val(pte) & ~_PAGE_PSE);
 }
 
 static inline pte_t pte_mkglobal(pte_t pte)
@@ -259,7 +260,7 @@ static inline pte_t pte_mkglobal(pte_t pte)
 
 static inline pte_t pte_clrglobal(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~(pteval_t)_PAGE_GLOBAL);
+	return __pte(pte_val(pte) & ~_PAGE_GLOBAL);
 }
 
 static inline pte_t pte_mkspecial(pte_t pte)

Comment 17 Hugh Dickins 2008-05-19 21:14:41 UTC
On Mon, 19 May 2008, Linus Torvalds wrote:
> On Mon, 19 May 2008, Hugh Dickins wrote:
> > 
> > This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
> > mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's
> > supposed to be: whereas it's been wrong forever with PAE, staying 32-bit
> > where 64-bit is needed.
> 
> Can we *please* just fix PTE_MASK?

That's very much what I'd prefer too.  Jeremy has patches in Ingo's
tree to do that, which have been tested - though perhaps not in
combination with the PAT pte_modify changes.  I did check that they're
not incompatible in theory, but I sure better try them out later today.

> And can we agree to never EVER use that PAGE_MASK thing (which was only 
> ever meant to work on *addresses*) for any pte operations (including the 
> definition of PTE_MASK)? Because PAGE_MASK is very much the word-size, and 
> in 32-bit PAE, the page table entry is bigger.
> 
> IOE, PTE_MASK should be a "pteval_t". And it should have absolutely 
> *nothing* to do with PAGE_MASK. EVER.

Yes, Jeremy makes it a pteval_t.  (My builds and Ingo's builds succeed,
but I've not worked out how that goes down in assembly: there was an
_AT macro in there before, which you've kept too - Jeremy?)

> IOW, maybe something like this?
> 
> And no, I haven't tested this at all. But it should make PTE_MASK have
>  (a) the right type ("pteval_t", not "long" - the latter is pure and utter 
>      crap)
>  (b) the right value (proper mask, not a sign-extended long - again, the 
>      latter is pure and utter crap)
> 
> but for all I know there might be some broken code that depends on the 
> current incorrect and totally broken #defines, so this needs testing and 
> thinking about.

Yes, I'm highly resistant to taking untested patches here.  The two-liner
I sent last night was about my fifth attempt to get it working, and I did
start off from a small PTE_MASK correction which didn't work at all.  It
looked rather like yours, I guess I missed the  __PHYSICAL_LOW_BITS part.
Jeremy's goes a lot further, he'll know the gotchas better.

> It also causes these warnings on 32-bit PAE:
> 
> 	  AS      arch/x86/kernel/head_32.o
> 	arch/x86/kernel/head_32.S: Assembler messages:
> 	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> 	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
> 
> and I do not see why (the end result seems to be identical).
> 
> Ingo, comments?
> 
> Oh, and those #define's should be moved from <asm/page.h> to 
> <asm/pgtable.h>, I think. They have nothing to do with pages (despite the 
> name of "physical_page_mask", and really are meaningful only in the 
> context of some kind of page table entry.

Jeremy still has them in asm/page.h.  Like your subsequent pte bit
cleanups, that can be added later: the important thing is to get X
working again on 32-bit NX systems.

Hugh

Comment 18 Jeremy Fitzhardinge 2008-05-19 23:32:40 UTC
Linus Torvalds wrote:
> On Mon, 19 May 2008, Hugh Dickins wrote:
>   
>> This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
>> mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's
>> supposed to be: whereas it's been wrong forever with PAE, staying 32-bit
>> where 64-bit is needed.
>>     
>
> Can we *please* just fix PTE_MASK?
>
> And can we agree to never EVER use that PAGE_MASK thing (which was only 
> ever meant to work on *addresses*) for any pte operations (including the 
> definition of PTE_MASK)? Because PAGE_MASK is very much the word-size, and 
> in 32-bit PAE, the page table entry is bigger.
>
> IOE, PTE_MASK should be a "pteval_t". And it should have absolutely 
> *nothing* to do with PAGE_MASK. EVER.
>
> IOW, maybe something like this?
>   

That's pretty close to the core of my patches (just reposted), which 
have been cooking in x86.git for a week or so.

One thing I'd take from your patch is something like your 
__PHYSICAL_LOW_BITS definition, since its a bit clearer than what I 
did.  (I haven't updated my patch before posting just because I wanted 
to post exactly as tested.)

> And no, I haven't tested this at all. But it should make PTE_MASK have
>  (a) the right type ("pteval_t", not "long" - the latter is pure and utter 
>      crap)
>  (b) the right value (proper mask, not a sign-extended long - again, the 
>      latter is pure and utter crap)
>
> but for all I know there might be some broken code that depends on the 
> current incorrect and totally broken #defines, so this needs testing and 
> thinking about.
>
> It also causes these warnings on 32-bit PAE:
>
> 	  AS      arch/x86/kernel/head_32.o
> 	arch/x86/kernel/head_32.S: Assembler messages:
> 	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> 	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
>
> and I do not see why (the end result seems to be identical).
>
> Ingo, comments?
>
> Oh, and those #define's should be moved from <asm/page.h> to 
> <asm/pgtable.h>, I think. They have nothing to do with pages (despite the 
> name of "physical_page_mask", and really are meaningful only in the 
> context of some kind of page table entry.
>
> 		Linus
>
> ---
>  include/asm-x86/page.h |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/asm-x86/page.h b/include/asm-x86/page.h
> index b381f4a..34b4845 100644
> --- a/include/asm-x86/page.h
> +++ b/include/asm-x86/page.h
> @@ -10,8 +10,8 @@
>  
>  #ifdef __KERNEL__
>  
> -#define PHYSICAL_PAGE_MASK	(PAGE_MASK & __PHYSICAL_MASK)
> -#define PTE_MASK		(_AT(long, PHYSICAL_PAGE_MASK))
> +#define PHYSICAL_PAGE_MASK	(__PHYSICAL_MASK & ~__PHYSICAL_LOW_BITS)
> +#define PTE_MASK		(_AT(pteval_t, PHYSICAL_PAGE_MASK))
>  
>  #define PMD_PAGE_SIZE		(_AC(1, UL) << PMD_SHIFT)
>  #define PMD_PAGE_MASK		(~(PMD_PAGE_SIZE-1))
> @@ -24,6 +24,7 @@
>  /* to align the pointer to the (next) page boundary */
>  #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
>  
> +#define __PHYSICAL_LOW_BITS	_AT(phys_addr_t, (PAGE_SIZE-1))
>  #define __PHYSICAL_MASK		_AT(phys_addr_t, (_AC(1,ULL) << __PHYSICAL_MASK_SHIFT) - 1)
>  #define __VIRTUAL_MASK		((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - 1)
>  
>   

    J

Comment 19 Jeremy Fitzhardinge 2008-05-19 23:33:09 UTC
Hugh Dickins wrote:
>> And can we agree to never EVER use that PAGE_MASK thing (which was only 
>> ever meant to work on *addresses*) for any pte operations (including the 
>> definition of PTE_MASK)? Because PAGE_MASK is very much the word-size, and 
>> in 32-bit PAE, the page table entry is bigger.
>>
>> IOE, PTE_MASK should be a "pteval_t". And it should have absolutely 
>> *nothing* to do with PAGE_MASK. EVER.
>>     
>
> Yes, Jeremy makes it a pteval_t.  (My builds and Ingo's builds succeed,
> but I've not worked out how that goes down in assembly: there was an
> _AT macro in there before, which you've kept too - Jeremy?)
>   

I got rid of a bunch of _AT() uses because the constants aren't used in 
.S files anywhere.  Also, I couldn't see how to represent a 64-bit 
constant in assembler, so I wasn't sure of their correctness (the as 
manual is irritatingly vague on the matter).

> Yes, I'm highly resistant to taking untested patches here.  The two-liner
> I sent last night was about my fifth attempt to get it working, and I did
> start off from a small PTE_MASK correction which didn't work at all.  It
> looked rather like yours, I guess I missed the  __PHYSICAL_LOW_BITS part.
> Jeremy's goes a lot further, he'll know the gotchas better.
>   

__PHYSICAL_LOW_BITS is a bit more elegant than what I did there (the 
problem is getting a physaddr_t-width PAGE_MASK).  But the formulation 
in my patch certainly works.

    J

Comment 20 H. Peter Anvin 2008-05-20 08:09:13 UTC
What do you mean "representing a 64-bit constant in assembly"?  Why are you assuming that there is anything special to it?
Comment 21 Theodore Tso 2008-05-27 05:50:53 UTC
This looks like it was fixed by Jeremy's PTE_MASK fixes in 2.6.26-rc3-git2. 

Note You need to log in before you can comment on or make changes to this bug.