Bug 10732
Summary: | REGRESSION: 2.6.26-rc2-git4: X server failed start on X61s laptop | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Rafael J. Wysocki (rjw) |
Component: | i386 | Assignee: | Theodore Tso (tytso) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | mingo, tytso, venki |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.26-rc2-git4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 10492 | ||
Attachments: |
Xorg.0.log using a 2.6.26-rc2-git5 based kernel with commit 1c12c4c reverted
Patch to print PAT debugging information. Messages for the X server failure case with debugpat enabled. |
Description
Rafael J. Wysocki
2008-05-17 03:11:47 UTC
Caused by: commit 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b Author: Venki Pallipadi <venkatesh.pallipadi@intel.com> Date: Wed May 14 16:05:51 2008 -0700 mprotect: prevent alteration of the PAT bits Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> On Sat, May 17, 2008 at 09:49:07AM +0000, Sitsofe Wheeler wrote: > On Sat, 17 May 2008 03:32:05 -0400, Theodore Ts'o wrote: > > > The X server failed to come up using the Intel driver, and forced a > > fallback to the 800x600 VESA server. This was on my Lenovo X61s laptop > > > > The git bisect identified this commit as the guilty one: > > > > 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b is first bad commit commit > > 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b Author: Venki Pallipadi > > <venkatesh.pallipadi@intel.com> Date: Wed May 14 16:05:51 2008 -0700 > > > > mprotect: prevent alteration of the PAT bits > > This sounds related to > > http://groups.google.com/group/linux.kernel/browse_frm/thread/1dba81049ce4ad16/546aff3e91055ab7#546aff3e91055ab7 > ... Yep, although that thread seemed to people trying to pursuade themselves that there would be no problems with existing X servers, and I have a demonstration that indeed, this patch *does* cause problems and causes a regression. I've been able to confirm that reverting this commit allows my X server (driving an Intel GM965/GL960 graphics controller) to function correctly on a 2.6.26-rc2-git5 kernel with the ext3. What version of the X server am I using? ii xserver-xorg 1:7.2-5ubuntu13 the X.Org X server ii xserver-xorg-core 2:1.3.0.0.dfsg-12ubuntu8. X.Org X server -- core server ii xserver-xorg-video-intel 2:2.1.1-0ubuntu9.1 X.Org X server -- Intel i8xx, i9xx display driver My Xorg.0.log file is attached. - Ted Created attachment 16172 [details]
Xorg.0.log using a 2.6.26-rc2-git5 based kernel with commit 1c12c4c reverted
Created attachment 16173 [details]
Patch to print PAT debugging information.
Per Venki's request, I adapted Ingo's patch so it would apply against 2.6.26-rc2-git5, and tried it without reverting commit 1c12c4c (which definitely does solve my problem).
Created attachment 16174 [details]
Messages for the X server failure case with debugpat enabled.
On Sat, May 17, 2008 at 06:21:18AM -0700, Pallipadi, Venkatesh wrote: > > Can you please send the complete dmesg after X failure, with patch here > and debugpat boot option. > http://www.ussg.iu.edu/hypermail/linux/kernel/0805.0/2657.html With debugpat enabled, the resulting dump that landed in /var/log/messages was something like 1.6 megabytes, so it totally overflowed the dmesg buffer. I've created a Bugzilla attachment of the bzip'ed messages excerpt here: http://bugzilla.kernel.org/attachment.cgi?id=16174&action=view The patch above was only available in HTML-only mode (and I couldn't find Ingo's patch on lkml.org, so I had to reconstruct it, and then forward part it to 2.6.26-rc2-git5). What I actually used can be found here: http://bugzilla.kernel.org/attachment.cgi?id=16173&action=view I'm not sure this will be helpful for you, but this should be completely trivial for you to reproduce; just use Ubuntu Gutsy on Intel Video based laptop, with a 2.6.26-rc2-git4 or more recent kernel, and watch it fail. For me, I'll just locally revert commit 1c12c4cf since otherwise X becomes more-or-less unsuable. (Getting used to 1024x768 after using a 1600x1200 display is one thing; 800x600 is quite another, and the performance using the VESA driver is quite abysmal.) - Ted >-----Original Message----- >From: Theodore Tso [mailto:tytso@mit.edu] >Sent: Saturday, May 17, 2008 8:42 AM >To: Pallipadi, Venkatesh >Cc: linux-kernel@vger.kernel.org; Ingo Molnar; Siddha, Suresh >B; bugme-daemon@bugzilla.kernel.org >Subject: [Bug 10732] REGRESSION: 2.6.26-rc2-git4: X server >failed start onX61s laptop > > >On Sat, May 17, 2008 at 06:21:18AM -0700, Pallipadi, Venkatesh wrote: >> >> Can you please send the complete dmesg after X failure, with >patch here >> and debugpat boot option. >> http://www.ussg.iu.edu/hypermail/linux/kernel/0805.0/2657.html > >With debugpat enabled, the resulting dump that landed in >/var/log/messages was something like 1.6 megabytes, so it totally >overflowed the dmesg buffer. > >I've created a Bugzilla attachment of the bzip'ed messages >excerpt here: > >http://bugzilla.kernel.org/attachment.cgi?id=16174&action=view > >The patch above was only available in HTML-only mode (and I couldn't >find Ingo's patch on lkml.org, so I had to reconstruct it, and then >forward part it to 2.6.26-rc2-git5). What I actually used can be >found here: > >http://bugzilla.kernel.org/attachment.cgi?id=16173&action=view > >I'm not sure this will be helpful for you, but this should be >completely trivial for you to reproduce; just use Ubuntu Gutsy on >Intel Video based laptop, with a 2.6.26-rc2-git4 or more recent >kernel, and watch it fail. > >For me, I'll just locally revert commit 1c12c4cf since otherwise X >becomes more-or-less unsuable. (Getting used to 1024x768 after using >a 1600x1200 display is one thing; 800x600 is quite another, and the >performance using the VESA driver is quite abysmal.) > Thanks for the log Ted. From first looks, I don't see any PAT related failures in the log. And as the revert of the commit 1c12c4cf is fixing the problem here, adding the X guys to the thread. I am thinking that may be X is depending on mprotect changes somehow and failing when it cannot change PAT attributes with mprotect call. Keith earlier mentioned that X will not depend on this. May be something to do with earlier X version. Keith? Thanks, Venki On Sat, 17 May 2008 11:41:31 -0400
> The patch above was only available in HTML-only mode
as a totally unrelated sidenote, you do know about the "dehtmldiff"
command in patchutils, right?
On Sat, May 17, 2008 at 09:02:32AM -0700, Pallipadi, Venkatesh wrote:
> Thanks for the log Ted. From first looks, I don't see any PAT
> related failures in the log.
I forgot to add that I had tried booting with the "nopat" option, and
that didn't change anything, so yeah, it doesn't seem likely to be
related.
- Ted
Reply-To: keithp@keithp.com On Sat, 2008-05-17 at 09:02 -0700, Pallipadi, Venkatesh wrote: > I am thinking that may be X is depending on mprotect changes somehow and > failing when > it cannot change PAT attributes with mprotect call. Keith earlier > mentioned that X > will not depend on this. May be something to do with earlier X version. > Keith? It looks like we can turn off read/write access, but we can't turn it back on. So, mprotect (map->memory, map->size, PROT_NONE); mprotect (map->memory, map->size, PROT_READ|PROT_WRITE); leaves the memory with no access. Eric saw this last week; I don't know whether he debugged into it further. Which repository is commit 1c12c4cf in? Reply-To: nix.or.die@googlemail.com Keith Packard wrote: > On Sat, 2008-05-17 at 09:02 -0700, Pallipadi, Venkatesh wrote: > >> I am thinking that may be X is depending on mprotect changes somehow and >> failing when >> it cannot change PAT attributes with mprotect call. Keith earlier >> mentioned that X >> will not depend on this. May be something to do with earlier X version. >> Keith? > > It looks like we can turn off read/write access, but we can't turn it > back on. So, > > mprotect (map->memory, map->size, PROT_NONE); > mprotect (map->memory, map->size, PROT_READ|PROT_WRITE); > > leaves the memory with no access. > > Eric saw this last week; I don't know whether he debugged into it > further. > > Which repository is commit 1c12c4cf in? It is in linus-git tree , http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c12c4cf9411eb130b245fa8d0fbbaf989477c7b On Sat, May 17, 2008 at 08:32:38PM +0200, Gabriel C wrote:
> >
> > Which repository is commit 1c12c4cf in?
>
>
> It is in linus-git tree ,
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
>
...post 2.6.26-rc2-git3. For Linux kernels 2.6.26-rc2-git4 and
newer, you need to revert this commit or the X Server shipped with
Ubuntu Gutsy will die horribly when run on an X61s laptop with the
Intel 965GM video chipset.
So the question is what's the right fix to keep the kernel compatible
with the X server, besides just reverting the commit entirely?
- Ted
On Sat, 17 May 2008, Theodore Tso wrote: > On Sat, May 17, 2008 at 08:32:38PM +0200, Gabriel C wrote: > > > > > > Which repository is commit 1c12c4cf in? > > > > It is in linus-git tree , > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c12c4cf9411eb130b245fa8d0fbbaf989477c7b > > ...post 2.6.26-rc2-git3. For Linux kernels 2.6.26-rc2-git4 and > newer, you need to revert this commit or the X Server shipped with > Ubuntu Gutsy will die horribly when run on an X61s laptop with the > Intel 965GM video chipset. > > So the question is what's the right fix to keep the kernel compatible > with the X server, besides just reverting the commit entirely? [PATCH] x86: fix mprotect's NX handling on PAE 2.6.26-rc3 with CONFIG_X86_PAE may leave the NX bit set when PROT_EXEC is intending to clear it, causing irqbalance and others to segfault, and X startup to fail. This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's supposed to be: whereas it's been wrong forever with PAE, staying 32-bit where 64-bit is needed. Jeremy Fitzhardinge already has patches to fix that in Ingo's tree; but if they're not considered 2.6.26 material, or people want a quick two-liner to get working, here's a shorter hack to fix up pte_modify. I apologize to those we've broken, and to Venki Pallipadi: it was I who persuaded him to change his working patch to make that false assumption. Signed-off-by: Hugh Dickins <hugh@veritas.com> --- include/asm-x86/pgtable.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- 2.6.26-rc3/include/asm-x86/pgtable.h 2008-05-19 11:19:03.000000000 +0100 +++ linux/include/asm-x86/pgtable.h 2008-05-19 21:51:08.000000000 +0100 @@ -57,8 +57,8 @@ #define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ _PAGE_DIRTY) -#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_ACCESSED | _PAGE_DIRTY) +#define _PAGE_CHG_MASK (((pteval_t) PTE_MASK & ~_PAGE_NX) | \ + _PAGE_PCD | _PAGE_PWT | _PAGE_ACCESSED | _PAGE_DIRTY) #define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT) #define _PAGE_CACHE_WB (0) Hugh Dickins wrote: > On Sat, 17 May 2008, Theodore Tso wrote: > >> On Sat, May 17, 2008 at 08:32:38PM +0200, Gabriel C wrote: >> >>>> Which repository is commit 1c12c4cf in? >>>> >>> It is in linus-git tree , >>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c12c4cf9411eb130b245fa8d0fbbaf989477c7b >>> >> ...post 2.6.26-rc2-git3. For Linux kernels 2.6.26-rc2-git4 and >> newer, you need to revert this commit or the X Server shipped with >> Ubuntu Gutsy will die horribly when run on an X61s laptop with the >> Intel 965GM video chipset. >> >> So the question is what's the right fix to keep the kernel compatible >> with the X server, besides just reverting the commit entirely? >> > > [PATCH] x86: fix mprotect's NX handling on PAE > > 2.6.26-rc3 with CONFIG_X86_PAE may leave the NX bit set when PROT_EXEC > is intending to clear it, causing irqbalance and others to segfault, > and X startup to fail. > > This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b > mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's > supposed to be: whereas it's been wrong forever with PAE, staying 32-bit > where 64-bit is needed. > > Jeremy Fitzhardinge already has patches to fix that in Ingo's tree; > but if they're not considered 2.6.26 material, or people want a quick > two-liner to get working, here's a shorter hack to fix up pte_modify. > From what I can tell, my patches haven't caused any regressions, and so if they fix a real bug we could consider them for .26. J
On Mon, 19 May 2008, Hugh Dickins wrote:
>
> This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b
> mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's
> supposed to be: whereas it's been wrong forever with PAE, staying 32-bit
> where 64-bit is needed.
Can we *please* just fix PTE_MASK?
And can we agree to never EVER use that PAGE_MASK thing (which was only
ever meant to work on *addresses*) for any pte operations (including the
definition of PTE_MASK)? Because PAGE_MASK is very much the word-size, and
in 32-bit PAE, the page table entry is bigger.
IOE, PTE_MASK should be a "pteval_t". And it should have absolutely
*nothing* to do with PAGE_MASK. EVER.
IOW, maybe something like this?
And no, I haven't tested this at all. But it should make PTE_MASK have
(a) the right type ("pteval_t", not "long" - the latter is pure and utter
crap)
(b) the right value (proper mask, not a sign-extended long - again, the
latter is pure and utter crap)
but for all I know there might be some broken code that depends on the
current incorrect and totally broken #defines, so this needs testing and
thinking about.
It also causes these warnings on 32-bit PAE:
AS arch/x86/kernel/head_32.o
arch/x86/kernel/head_32.S: Assembler messages:
arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
and I do not see why (the end result seems to be identical).
Ingo, comments?
Oh, and those #define's should be moved from <asm/page.h> to
<asm/pgtable.h>, I think. They have nothing to do with pages (despite the
name of "physical_page_mask", and really are meaningful only in the
context of some kind of page table entry.
Linus
---
include/asm-x86/page.h | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/asm-x86/page.h b/include/asm-x86/page.h
index b381f4a..34b4845 100644
--- a/include/asm-x86/page.h
+++ b/include/asm-x86/page.h
@@ -10,8 +10,8 @@
#ifdef __KERNEL__
-#define PHYSICAL_PAGE_MASK (PAGE_MASK & __PHYSICAL_MASK)
-#define PTE_MASK (_AT(long, PHYSICAL_PAGE_MASK))
+#define PHYSICAL_PAGE_MASK (__PHYSICAL_MASK & ~__PHYSICAL_LOW_BITS)
+#define PTE_MASK (_AT(pteval_t, PHYSICAL_PAGE_MASK))
#define PMD_PAGE_SIZE (_AC(1, UL) << PMD_SHIFT)
#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1))
@@ -24,6 +24,7 @@
/* to align the pointer to the (next) page boundary */
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
+#define __PHYSICAL_LOW_BITS _AT(phys_addr_t, (PAGE_SIZE-1))
#define __PHYSICAL_MASK _AT(phys_addr_t, (_AC(1,ULL) << __PHYSICAL_MASK_SHIFT) - 1)
#define __VIRTUAL_MASK ((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - 1)
On Mon, 19 May 2008, Linus Torvalds wrote:
>
> IOE, PTE_MASK should be a "pteval_t". And it should have absolutely
> *nothing* to do with PAGE_MASK. EVER.
>
> IOW, maybe something like this?
.. and in addition to that, perhaps something like this too?
Making the actual _PAGE_XYZ bits be of the proper type makes using bitops
and negation on them automatically DTRT, so that we don't need those
insane casts in pte_mkclean() and friends.
Again, totally untested, but we really should just get the types right,
instead of getting them wrogn and then messing around with various silly
and incorrect work-arounds for the fact that we got them wrong.
And while it's untested - if this cleanup is buggy (or even generates
different code), then we're doing something really wrong somewhere.
Linus
---
include/asm-x86/pgtable.h | 49 +++++++++++++++++++++++----------------------
1 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
index 55c3a0e..b38b540 100644
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -21,27 +21,28 @@
#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */
/*
- * Note: we use _AC(1, L) instead of _AC(1, UL) so that we get a
- * sign-extended value on 32-bit with all 1's in the upper word,
- * which preserves the upper pte values on 64-bit ptes:
+ * Note: we use _AT(pteval_t,1) so that we get the right type
+ * even when we're using PAE and everything is 64-bit with a
+ " 32-bit word-size. That's important for various bit masks.
*/
-#define _PAGE_PRESENT (_AC(1, L)<<_PAGE_BIT_PRESENT)
-#define _PAGE_RW (_AC(1, L)<<_PAGE_BIT_RW)
-#define _PAGE_USER (_AC(1, L)<<_PAGE_BIT_USER)
-#define _PAGE_PWT (_AC(1, L)<<_PAGE_BIT_PWT)
-#define _PAGE_PCD (_AC(1, L)<<_PAGE_BIT_PCD)
-#define _PAGE_ACCESSED (_AC(1, L)<<_PAGE_BIT_ACCESSED)
-#define _PAGE_DIRTY (_AC(1, L)<<_PAGE_BIT_DIRTY)
-#define _PAGE_PSE (_AC(1, L)<<_PAGE_BIT_PSE) /* 2MB page */
-#define _PAGE_GLOBAL (_AC(1, L)<<_PAGE_BIT_GLOBAL) /* Global TLB entry */
-#define _PAGE_UNUSED1 (_AC(1, L)<<_PAGE_BIT_UNUSED1)
-#define _PAGE_UNUSED2 (_AC(1, L)<<_PAGE_BIT_UNUSED2)
-#define _PAGE_UNUSED3 (_AC(1, L)<<_PAGE_BIT_UNUSED3)
-#define _PAGE_PAT (_AC(1, L)<<_PAGE_BIT_PAT)
-#define _PAGE_PAT_LARGE (_AC(1, L)<<_PAGE_BIT_PAT_LARGE)
+#define _PAGE_BIT(x) (_AT(pteval_t,1) << (x))
+#define _PAGE_PRESENT _PAGE_BIT(_PAGE_BIT_PRESENT)
+#define _PAGE_RW _PAGE_BIT(_PAGE_BIT_RW)
+#define _PAGE_USER _PAGE_BIT(_PAGE_BIT_USER)
+#define _PAGE_PWT _PAGE_BIT(_PAGE_BIT_PWT)
+#define _PAGE_PCD _PAGE_BIT(_PAGE_BIT_PCD)
+#define _PAGE_ACCESSED _PAGE_BIT(_PAGE_BIT_ACCESSED)
+#define _PAGE_DIRTY _PAGE_BIT(_PAGE_BIT_DIRTY)
+#define _PAGE_PSE _PAGE_BIT(_PAGE_BIT_PSE) /* 2MB page */
+#define _PAGE_GLOBAL _PAGE_BIT(_PAGE_BIT_GLOBAL) /* Global TLB entry */
+#define _PAGE_UNUSED1 _PAGE_BIT(_PAGE_BIT_UNUSED1)
+#define _PAGE_UNUSED2 _PAGE_BIT(_PAGE_BIT_UNUSED2)
+#define _PAGE_UNUSED3 _PAGE_BIT(_PAGE_BIT_UNUSED3)
+#define _PAGE_PAT _PAGE_BIT(_PAGE_BIT_PAT)
+#define _PAGE_PAT_LARGE _PAGE_BIT(_PAGE_BIT_PAT_LARGE)
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
-#define _PAGE_NX (_AC(1, ULL) << _PAGE_BIT_NX)
+#define _PAGE_NX _PAGE_BIT(_PAGE_BIT_NX)
#else
#define _PAGE_NX 0
#endif
@@ -209,22 +210,22 @@ static inline int pmd_large(pmd_t pte)
static inline pte_t pte_mkclean(pte_t pte)
{
- return __pte(pte_val(pte) & ~(pteval_t)_PAGE_DIRTY);
+ return __pte(pte_val(pte) & ~_PAGE_DIRTY);
}
static inline pte_t pte_mkold(pte_t pte)
{
- return __pte(pte_val(pte) & ~(pteval_t)_PAGE_ACCESSED);
+ return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
}
static inline pte_t pte_wrprotect(pte_t pte)
{
- return __pte(pte_val(pte) & ~(pteval_t)_PAGE_RW);
+ return __pte(pte_val(pte) & ~_PAGE_RW);
}
static inline pte_t pte_mkexec(pte_t pte)
{
- return __pte(pte_val(pte) & ~(pteval_t)_PAGE_NX);
+ return __pte(pte_val(pte) & ~_PAGE_NX);
}
static inline pte_t pte_mkdirty(pte_t pte)
@@ -249,7 +250,7 @@ static inline pte_t pte_mkhuge(pte_t pte)
static inline pte_t pte_clrhuge(pte_t pte)
{
- return __pte(pte_val(pte) & ~(pteval_t)_PAGE_PSE);
+ return __pte(pte_val(pte) & ~_PAGE_PSE);
}
static inline pte_t pte_mkglobal(pte_t pte)
@@ -259,7 +260,7 @@ static inline pte_t pte_mkglobal(pte_t pte)
static inline pte_t pte_clrglobal(pte_t pte)
{
- return __pte(pte_val(pte) & ~(pteval_t)_PAGE_GLOBAL);
+ return __pte(pte_val(pte) & ~_PAGE_GLOBAL);
}
static inline pte_t pte_mkspecial(pte_t pte)
On Mon, 19 May 2008, Linus Torvalds wrote: > On Mon, 19 May 2008, Hugh Dickins wrote: > > > > This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b > > mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's > > supposed to be: whereas it's been wrong forever with PAE, staying 32-bit > > where 64-bit is needed. > > Can we *please* just fix PTE_MASK? That's very much what I'd prefer too. Jeremy has patches in Ingo's tree to do that, which have been tested - though perhaps not in combination with the PAT pte_modify changes. I did check that they're not incompatible in theory, but I sure better try them out later today. > And can we agree to never EVER use that PAGE_MASK thing (which was only > ever meant to work on *addresses*) for any pte operations (including the > definition of PTE_MASK)? Because PAGE_MASK is very much the word-size, and > in 32-bit PAE, the page table entry is bigger. > > IOE, PTE_MASK should be a "pteval_t". And it should have absolutely > *nothing* to do with PAGE_MASK. EVER. Yes, Jeremy makes it a pteval_t. (My builds and Ingo's builds succeed, but I've not worked out how that goes down in assembly: there was an _AT macro in there before, which you've kept too - Jeremy?) > IOW, maybe something like this? > > And no, I haven't tested this at all. But it should make PTE_MASK have > (a) the right type ("pteval_t", not "long" - the latter is pure and utter > crap) > (b) the right value (proper mask, not a sign-extended long - again, the > latter is pure and utter crap) > > but for all I know there might be some broken code that depends on the > current incorrect and totally broken #defines, so this needs testing and > thinking about. Yes, I'm highly resistant to taking untested patches here. The two-liner I sent last night was about my fifth attempt to get it working, and I did start off from a small PTE_MASK correction which didn't work at all. It looked rather like yours, I guess I missed the __PHYSICAL_LOW_BITS part. Jeremy's goes a lot further, he'll know the gotchas better. > It also causes these warnings on 32-bit PAE: > > AS arch/x86/kernel/head_32.o > arch/x86/kernel/head_32.S: Assembler messages: > arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; > integer 0 assumed > arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; > integer 0 assumed > > and I do not see why (the end result seems to be identical). > > Ingo, comments? > > Oh, and those #define's should be moved from <asm/page.h> to > <asm/pgtable.h>, I think. They have nothing to do with pages (despite the > name of "physical_page_mask", and really are meaningful only in the > context of some kind of page table entry. Jeremy still has them in asm/page.h. Like your subsequent pte bit cleanups, that can be added later: the important thing is to get X working again on 32-bit NX systems. Hugh Linus Torvalds wrote: > On Mon, 19 May 2008, Hugh Dickins wrote: > >> This comes from an assumption in 1c12c4cf9411eb130b245fa8d0fbbaf989477c7b >> mprotect: prevent alteration of the PAT bits, that PTE_MASK is what it's >> supposed to be: whereas it's been wrong forever with PAE, staying 32-bit >> where 64-bit is needed. >> > > Can we *please* just fix PTE_MASK? > > And can we agree to never EVER use that PAGE_MASK thing (which was only > ever meant to work on *addresses*) for any pte operations (including the > definition of PTE_MASK)? Because PAGE_MASK is very much the word-size, and > in 32-bit PAE, the page table entry is bigger. > > IOE, PTE_MASK should be a "pteval_t". And it should have absolutely > *nothing* to do with PAGE_MASK. EVER. > > IOW, maybe something like this? > That's pretty close to the core of my patches (just reposted), which have been cooking in x86.git for a week or so. One thing I'd take from your patch is something like your __PHYSICAL_LOW_BITS definition, since its a bit clearer than what I did. (I haven't updated my patch before posting just because I wanted to post exactly as tested.) > And no, I haven't tested this at all. But it should make PTE_MASK have > (a) the right type ("pteval_t", not "long" - the latter is pure and utter > crap) > (b) the right value (proper mask, not a sign-extended long - again, the > latter is pure and utter crap) > > but for all I know there might be some broken code that depends on the > current incorrect and totally broken #defines, so this needs testing and > thinking about. > > It also causes these warnings on 32-bit PAE: > > AS arch/x86/kernel/head_32.o > arch/x86/kernel/head_32.S: Assembler messages: > arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; > integer 0 assumed > arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; > integer 0 assumed > > and I do not see why (the end result seems to be identical). > > Ingo, comments? > > Oh, and those #define's should be moved from <asm/page.h> to > <asm/pgtable.h>, I think. They have nothing to do with pages (despite the > name of "physical_page_mask", and really are meaningful only in the > context of some kind of page table entry. > > Linus > > --- > include/asm-x86/page.h | 5 +++-- > 1 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/include/asm-x86/page.h b/include/asm-x86/page.h > index b381f4a..34b4845 100644 > --- a/include/asm-x86/page.h > +++ b/include/asm-x86/page.h > @@ -10,8 +10,8 @@ > > #ifdef __KERNEL__ > > -#define PHYSICAL_PAGE_MASK (PAGE_MASK & __PHYSICAL_MASK) > -#define PTE_MASK (_AT(long, PHYSICAL_PAGE_MASK)) > +#define PHYSICAL_PAGE_MASK (__PHYSICAL_MASK & ~__PHYSICAL_LOW_BITS) > +#define PTE_MASK (_AT(pteval_t, PHYSICAL_PAGE_MASK)) > > #define PMD_PAGE_SIZE (_AC(1, UL) << PMD_SHIFT) > #define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1)) > @@ -24,6 +24,7 @@ > /* to align the pointer to the (next) page boundary */ > #define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK) > > +#define __PHYSICAL_LOW_BITS _AT(phys_addr_t, (PAGE_SIZE-1)) > #define __PHYSICAL_MASK _AT(phys_addr_t, (_AC(1,ULL) << > __PHYSICAL_MASK_SHIFT) - 1) > #define __VIRTUAL_MASK ((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - > 1) > > J Hugh Dickins wrote: >> And can we agree to never EVER use that PAGE_MASK thing (which was only >> ever meant to work on *addresses*) for any pte operations (including the >> definition of PTE_MASK)? Because PAGE_MASK is very much the word-size, and >> in 32-bit PAE, the page table entry is bigger. >> >> IOE, PTE_MASK should be a "pteval_t". And it should have absolutely >> *nothing* to do with PAGE_MASK. EVER. >> > > Yes, Jeremy makes it a pteval_t. (My builds and Ingo's builds succeed, > but I've not worked out how that goes down in assembly: there was an > _AT macro in there before, which you've kept too - Jeremy?) > I got rid of a bunch of _AT() uses because the constants aren't used in .S files anywhere. Also, I couldn't see how to represent a 64-bit constant in assembler, so I wasn't sure of their correctness (the as manual is irritatingly vague on the matter). > Yes, I'm highly resistant to taking untested patches here. The two-liner > I sent last night was about my fifth attempt to get it working, and I did > start off from a small PTE_MASK correction which didn't work at all. It > looked rather like yours, I guess I missed the __PHYSICAL_LOW_BITS part. > Jeremy's goes a lot further, he'll know the gotchas better. > __PHYSICAL_LOW_BITS is a bit more elegant than what I did there (the problem is getting a physaddr_t-width PAGE_MASK). But the formulation in my patch certainly works. J What do you mean "representing a 64-bit constant in assembly"? Why are you assuming that there is anything special to it? This looks like it was fixed by Jeremy's PTE_MASK fixes in 2.6.26-rc3-git2. |