after commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 i expirience some grafik related perfomance issues. I used glxgears for test. before this patch: 1281.005 FPS and after: 765.000 FPS latest tested commit a4083c9271e0a697278e089f2c0b9a95363ada0a still hase bad performance. I use Pentium D with 2GB RAM, Grafick: i945G, ICH7
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 25 Mar 2008 15:11:15 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10328 > > Summary: [regression] performance drop for glx > Product: Memory Management > Version: 2.5 > KernelVersion: 2.6.25-rc6 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > AssignedTo: akpm@osdl.org > ReportedBy: bug-track@fisher-privat.net > > > after commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 i expirience some > grafik > related perfomance issues. > > I used glxgears for test. > before this patch: 1281.005 FPS > and after: 765.000 FPS It nearly halved. > latest tested commit a4083c9271e0a697278e089f2c0b9a95363ada0a > still hase bad performance. > > I use Pentium D with 2GB RAM, Grafick: i945G, ICH7 > That's : commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 : Author: Siddha, Suresh B <suresh.b.siddha@intel.com> : Date: Wed Jan 30 13:33:43 2008 +0100 : : x86: set strong uncacheable where UC is really desired : : Also use _PAGE_PWT for all the mappings which need uncache mapping. : Instead of existing PAT2 which is UC- (and can be overwritten by MTRRs), : we now use PAT3 which is strong uncacheable. : : This makes it consistent with pgprot_noncached() : : Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> : Signed-off-by: Ingo Molnar <mingo@elte.hu> : Signed-off-by: Thomas Gleixner <tglx@linutronix.de> :
which graphics driver version is this? It seems the code is ASKING for uncached, but WANTING something else... that's a big bug/problem in principle, need to know details of who/what/where
(alternatively, the "good" performance was a correctness bug that really shouldn't have been done.... I'll need to get the graphics guys involved on that just need more details first)
On Tue, Mar 25, 2008 at 03:28:09PM -0700, Andrew Morton wrote: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 25 Mar 2008 15:11:15 -0700 (PDT) > bugme-daemon@bugzilla.kernel.org wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=10328 > > > > Summary: [regression] performance drop for glx > > > > after commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 i expirience some > grafik > > related perfomance issues. > > > > I used glxgears for test. > > before this patch: 1281.005 FPS > > and after: 765.000 FPS > > It nearly halved. > > > latest tested commit a4083c9271e0a697278e089f2c0b9a95363ada0a > > still hase bad performance. > > > > I use Pentium D with 2GB RAM, Grafick: i945G, ICH7 > > > > That's > > : commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 > : Author: Siddha, Suresh B <suresh.b.siddha@intel.com> > : Date: Wed Jan 30 13:33:43 2008 +0100 > : > : x86: set strong uncacheable where UC is really desired > : > : Also use _PAGE_PWT for all the mappings which need uncache mapping. > : Instead of existing PAT2 which is UC- (and can be overwritten by > MTRRs), > : we now use PAT3 which is strong uncacheable. > : > : This makes it consistent with pgprot_noncached() Alexey, Can you please try the appended patch? Andrew, can you please push the appended patch for 2.6.25? Thanks. --- fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with WC attribute. Recent changes in page attribute code made both ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This breaks the graphics performance, as the effective memory type is UC instead of expected WC. The correct way to fix this is to add ioremap_wc() (which uses UC- in the absence of PAT kernel support and WC with PAT) and change all the fb drivers to use this new ioremap_wc() API. We can take this correct and longer route for post 2.6.25. For now, revert back to the UC- behavior for ioremap/ioremap_nocache. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> --- diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 4afaba0..794895c 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -137,7 +137,11 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size, switch (mode) { case IOR_MODE_UNCACHED: default: - prot = PAGE_KERNEL_NOCACHE; + /* + * FIXME: we will use UC MINUS for now, as video fb drivers + * depend on it. Upcoming ioremap_wc() will fix this behavior. + */ + prot = PAGE_KERNEL_UC_MINUS; break; case IOR_MODE_CACHED: prot = PAGE_KERNEL; diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h index 174b877..9cf472a 100644 --- a/include/asm-x86/pgtable.h +++ b/include/asm-x86/pgtable.h @@ -85,6 +85,7 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC; #define __PAGE_KERNEL_RX (__PAGE_KERNEL_EXEC & ~_PAGE_RW) #define __PAGE_KERNEL_EXEC_NOCACHE (__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT) #define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT) +#define __PAGE_KERNEL_UC_MINUS (__PAGE_KERNEL | _PAGE_PCD) #define __PAGE_KERNEL_VSYSCALL (__PAGE_KERNEL_RX | _PAGE_USER) #define __PAGE_KERNEL_VSYSCALL_NOCACHE (__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT) #define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE) @@ -101,6 +102,7 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC; #define PAGE_KERNEL_EXEC MAKE_GLOBAL(__PAGE_KERNEL_EXEC) #define PAGE_KERNEL_RX MAKE_GLOBAL(__PAGE_KERNEL_RX) #define PAGE_KERNEL_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_NOCACHE) +#define PAGE_KERNEL_UC_MINUS MAKE_GLOBAL(__PAGE_KERNEL_UC_MINUS) #define PAGE_KERNEL_EXEC_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_EXEC_NOCACHE) #define PAGE_KERNEL_LARGE MAKE_GLOBAL(__PAGE_KERNEL_LARGE) #define PAGE_KERNEL_LARGE_EXEC MAKE_GLOBAL(__PAGE_KERNEL_LARGE_EXEC)
Suresh Siddha wrote: > On Tue, Mar 25, 2008 at 03:28:09PM -0700, Andrew Morton wrote: >> (switched to email. Please respond via emailed reply-to-all, not via the >> bugzilla web interface). >> >> On Tue, 25 Mar 2008 15:11:15 -0700 (PDT) >> bugme-daemon@bugzilla.kernel.org wrote: >> >>> http://bugzilla.kernel.org/show_bug.cgi?id=10328 >>> >>> Summary: [regression] performance drop for glx >>> >>> after commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 i expirience some >>> grafik >>> related perfomance issues. >>> >>> I used glxgears for test. >>> before this patch: 1281.005 FPS >>> and after: 765.000 FPS >> It nearly halved. >> >>> latest tested commit a4083c9271e0a697278e089f2c0b9a95363ada0a >>> still hase bad performance. >>> >>> I use Pentium D with 2GB RAM, Grafick: i945G, ICH7 >>> >> That's >> >> : commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 >> : Author: Siddha, Suresh B <suresh.b.siddha@intel.com> >> : Date: Wed Jan 30 13:33:43 2008 +0100 >> : >> : x86: set strong uncacheable where UC is really desired >> : >> : Also use _PAGE_PWT for all the mappings which need uncache mapping. >> : Instead of existing PAT2 which is UC- (and can be overwritten by >> MTRRs), >> : we now use PAT3 which is strong uncacheable. >> : >> : This makes it consistent with pgprot_noncached() > > Alexey, Can you please try the appended patch? > > Andrew, can you please push the appended patch for 2.6.25? Thanks. > --- > > fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with > WC attribute. Recent changes in page attribute code made both > ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This > breaks the graphics performance, as the effective memory type is UC instead > of expected WC. > > The correct way to fix this is to add ioremap_wc() (which uses UC- in the > absence of PAT kernel support and WC with PAT) and change all the > fb drivers to use this new ioremap_wc() API. > > We can take this correct and longer route for post 2.6.25. For now, > revert back to the UC- behavior for ioremap/ioremap_nocache. I would still like to add an ioremap_wc() even in 2.6.25; even if it's for now identical to ioremap_nocache(). Better get the right API in place as soon as possible
* Suresh Siddha <suresh.b.siddha@intel.com> wrote: > fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add > with WC attribute. Recent changes in page attribute code made both > ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). > This breaks the graphics performance, as the effective memory type is > UC instead of expected WC. > > The correct way to fix this is to add ioremap_wc() (which uses UC- in > the absence of PAT kernel support and WC with PAT) and change all the > fb drivers to use this new ioremap_wc() API. > > We can take this correct and longer route for post 2.6.25. For now, > revert back to the UC- behavior for ioremap/ioremap_nocache. thanks Suresh, applied. Ingo
This patch do not make any difference to me. I use xserver-xorg-video-intel 2:2.2.1-1ubuntu5
Created attachment 15442 [details] xorg.log
On Wed, Mar 26, 2008 at 06:29:11AM +0100, Ingo Molnar wrote: > > * Suresh Siddha <suresh.b.siddha@intel.com> wrote: > > > fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add > > with WC attribute. Recent changes in page attribute code made both > > ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). > > This breaks the graphics performance, as the effective memory type is > > UC instead of expected WC. > > > > The correct way to fix this is to add ioremap_wc() (which uses UC- in > > the absence of PAT kernel support and WC with PAT) and change all the > > fb drivers to use this new ioremap_wc() API. > > > > We can take this correct and longer route for post 2.6.25. For now, > > revert back to the UC- behavior for ioremap/ioremap_nocache. > > thanks Suresh, applied. Well, we need to take care of set_memory_uc() aswell, as the previous version didn't fix Alexey's issue. Alexey, can you please test and ack this, before Andrew/Ingo can push relevant bits to their trees. Thanks. --- fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with WC attribute. Recent changes in page attribute code made both ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This breaks the graphics performance, as the effective memory type is UC instead of expected WC. The correct way to fix this is to add ioremap_wc() (which uses UC- in the absence of PAT kernel support and WC with PAT) and change all the drivers to use this new ioremap_wc() API. We can take this correct and longer route for post 2.6.25. For now, revert back to the UC- behavior for ioremap/ioremap_nocache. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Alexey Fisher <bug-track@fisher-privat.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> --- diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 4afaba0..794895c 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -137,7 +137,11 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size, switch (mode) { case IOR_MODE_UNCACHED: default: - prot = PAGE_KERNEL_NOCACHE; + /* + * FIXME: we will use UC MINUS for now, as video fb drivers + * depend on it. Upcoming ioremap_wc() will fix this behavior. + */ + prot = PAGE_KERNEL_UC_MINUS; break; case IOR_MODE_CACHED: prot = PAGE_KERNEL; diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 14e48b5..7b79f6b 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -771,7 +771,7 @@ static inline int change_page_attr_clear(unsigned long addr, int numpages, int set_memory_uc(unsigned long addr, int numpages) { return change_page_attr_set(addr, numpages, - __pgprot(_PAGE_PCD | _PAGE_PWT)); + __pgprot(_PAGE_PCD)); } EXPORT_SYMBOL(set_memory_uc); diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h index 174b877..9cf472a 100644 --- a/include/asm-x86/pgtable.h +++ b/include/asm-x86/pgtable.h @@ -85,6 +85,7 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC; #define __PAGE_KERNEL_RX (__PAGE_KERNEL_EXEC & ~_PAGE_RW) #define __PAGE_KERNEL_EXEC_NOCACHE (__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT) #define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT) +#define __PAGE_KERNEL_UC_MINUS (__PAGE_KERNEL | _PAGE_PCD) #define __PAGE_KERNEL_VSYSCALL (__PAGE_KERNEL_RX | _PAGE_USER) #define __PAGE_KERNEL_VSYSCALL_NOCACHE (__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT) #define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE) @@ -101,6 +102,7 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC; #define PAGE_KERNEL_EXEC MAKE_GLOBAL(__PAGE_KERNEL_EXEC) #define PAGE_KERNEL_RX MAKE_GLOBAL(__PAGE_KERNEL_RX) #define PAGE_KERNEL_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_NOCACHE) +#define PAGE_KERNEL_UC_MINUS MAKE_GLOBAL(__PAGE_KERNEL_UC_MINUS) #define PAGE_KERNEL_EXEC_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_EXEC_NOCACHE) #define PAGE_KERNEL_LARGE MAKE_GLOBAL(__PAGE_KERNEL_LARGE) #define PAGE_KERNEL_LARGE_EXEC MAKE_GLOBAL(__PAGE_KERNEL_LARGE_EXEC)
Last patch working for me: glxgears 6135 frames in 5.0 seconds = 1226.967 FPS 6202 frames in 5.0 seconds = 1240.361 FPS
fixed by commit d546b67a940eb42a99f56b86c5cd8d47c8348c2a
Same issue with commit 2e5d9c857d4e6c9e7b7d8c8c86a68a7842d213d6 kernel 2.6.25-git1
commit 2e5d9c857d4e6c9e7b7d8c8c86a68a7842d213d6 Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> Date: Tue Mar 18 17:00:14 2008 -0700 x86: PAT infrastructure patch Sets up pat_init() infrastructure. PAT MSR has following setting. PAT |PCD ||PWT ||| 000 WB _PAGE_CACHE_WB 001 WC _PAGE_CACHE_WC 010 UC- _PAGE_CACHE_UC_MINUS 011 UC _PAGE_CACHE_UC We are effectively changing WT from boot time setting to WC. UC_MINUS is used to provide backward compatibility to existing /dev/mem users(X). reserve_memtype and free_memtype are new interfaces for maintaining alias-free mapping. It is currently implemented in a simple way with a linked list and not optimized. reserve and free tracks the effective memory type, as a result of PAT and MTRR setting rather than what is actually requested in PAT. pat_init piggy backs on mtrr_init as the rules for setting both pat and mtrr are same. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Are you sure this is the same bug? If not, I'd prefer to use another bug entry for tracking it as a regression from 2.6.25.
This issue have same sumptomps: Grafik perfomance drop ( glxgears ) from 1200 fps to 700 fps. And couset by patch wich enable PAT. I would say this is the same bug, but i can't sure. I'm not programmer, just theology student ;)
Am Dienstag, den 25.03.2008, 21:42 -0700 schrieb Arjan van de Ven: > Suresh Siddha wrote: > > On Tue, Mar 25, 2008 at 03:28:09PM -0700, Andrew Morton wrote: > >> (switched to email. Please respond via emailed reply-to-all, not via the > >> bugzilla web interface). > >> > >> On Tue, 25 Mar 2008 15:11:15 -0700 (PDT) > >> bugme-daemon@bugzilla.kernel.org wrote: > >> > >>> http://bugzilla.kernel.org/show_bug.cgi?id=10328 > >>> > >>> Summary: [regression] performance drop for glx > >>> > >>> after commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 i expirience some > grafik > >>> related perfomance issues. > >>> > >>> I used glxgears for test. > >>> before this patch: 1281.005 FPS > >>> and after: 765.000 FPS > >> It nearly halved. > >> > >>> latest tested commit a4083c9271e0a697278e089f2c0b9a95363ada0a > >>> still hase bad performance. > >>> > >>> I use Pentium D with 2GB RAM, Grafick: i945G, ICH7 > >>> > >> That's > >> > >> : commit 4138cc3418f5eaa7524ff8e927102863f1ba0ea5 > >> : Author: Siddha, Suresh B <suresh.b.siddha@intel.com> > >> : Date: Wed Jan 30 13:33:43 2008 +0100 > >> : > >> : x86: set strong uncacheable where UC is really desired > >> : > >> : Also use _PAGE_PWT for all the mappings which need uncache mapping. > >> : Instead of existing PAT2 which is UC- (and can be overwritten by > MTRRs), > >> : we now use PAT3 which is strong uncacheable. > >> : > >> : This makes it consistent with pgprot_noncached() > > > > Alexey, Can you please try the appended patch? > > > > Andrew, can you please push the appended patch for 2.6.25? Thanks. > > --- > > This issue come again with commit: commit 2e5d9c857d4e6c9e7b7d8c8c86a68a7842d213d6 Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> Date: Tue Mar 18 17:00:14 2008 -0700 x86: PAT infrastructure patch Sets up pat_init() infrastructure. PAT MSR has following setting. PAT |PCD ||PWT ||| 000 WB _PAGE_CACHE_WB 001 WC _PAGE_CACHE_WC 010 UC- _PAGE_CACHE_UC_MINUS 011 UC _PAGE_CACHE_UC We are effectively changing WT from boot time setting to WC. UC_MINUS is used to provide backward compatibility to existing /dev/mem users(X). reserve_memtype and free_memtype are new interfaces for maintaining alias-free mapping. It is currently implemented in a simple way with a linked list and not optimized. reserve and free tracks the effective memory type, as a result of PAT and MTRR setting rather than what is actually requested in PAT. pat_init piggy backs on mtrr_init as the rules for setting both pat and mtrr are same. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Fri, Apr 25, 2008 at 07:35:23AM +0200, Alexey Fisher wrote: > > >> On Tue, 25 Mar 2008 15:11:15 -0700 (PDT) > > >> bugme-daemon@bugzilla.kernel.org wrote: > > >> > > >>> http://bugzilla.kernel.org/show_bug.cgi?id=10328 > > >>> > > >>> Summary: [regression] performance drop for glx > > > > > > Alexey, Can you please try the appended patch? > > > > > > Andrew, can you please push the appended patch for 2.6.25? Thanks. > > > --- > > > > > > This issue come again with commit: > > commit 2e5d9c857d4e6c9e7b7d8c8c86a68a7842d213d6 > Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> > Date: Tue Mar 18 17:00:14 2008 -0700 > > x86: PAT infrastructure patch > > Sets up pat_init() infrastructure. > Alexey, Sorry. I missed posting a fix in the post PAT series for this issue. Can you please try the appended patch and Ack? copying X folks get their attention and their Ack on the patch and next steps outlined. Thanks. --- [patch] use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range() Use UC_MINUS for ioremap(), ioremap_nocache() instead of strong UC. Once all the X drivers move to ioremap_wc(), we can go back to strong UC semantics for ioremap() and ioremap_nocache(). To avoid attribute aliasing issues, pci_mmap_page_range() will also use UC_MINUS for default non write-combining mapping request. Next steps: a) change all the video drivers using ioremap() or ioremap_nocache() and adding WC MTTR using mttr_add() to ioremap_wc() b) for strict usage, we can go back to strong uc semantics for ioremap() and ioremap_nocache() after some grace period for completing step-a. c) user level X server needs to use the appropriate method for setting up WC mapping (like using resourceX_wc sysfs file instead of adding MTRR for WC and using /dev/mem or resourceX under /sys) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> --- diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 3a4baf9..c053977 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -176,11 +176,11 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size, /* * Do not fallback to certain memory types with certain * requested type: - * - request is uncached, return cannot be write-back - * - request is uncached, return cannot be write-combine + * - request is uc-, return cannot be write-back + * - request is uc-, return cannot be write-combine * - request is write-combine, return cannot be write-back */ - if ((prot_val == _PAGE_CACHE_UC && + if ((prot_val == _PAGE_CACHE_UC_MINUS && (new_prot_val == _PAGE_CACHE_WB || new_prot_val == _PAGE_CACHE_WC)) || (prot_val == _PAGE_CACHE_WC && @@ -201,6 +201,9 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size, default: prot = PAGE_KERNEL_NOCACHE; break; + case _PAGE_CACHE_UC_MINUS: + prot = PAGE_KERNEL_UC_MINUS; + break; case _PAGE_CACHE_WC: prot = PAGE_KERNEL_WC; break; @@ -255,7 +258,16 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size, */ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size) { - return __ioremap(phys_addr, size, _PAGE_CACHE_UC); + /* + * Ideally, this should be: + * pat_wc_enabled ? _PAGE_CACHE_UC : _PAGE_CACHE_UC_MINUS; + * + * Till we fix all X drivers to use ioremap_wc(), we will use + * UC MINUS. + */ + unsigned long val = _PAGE_CACHE_UC_MINUS; + + return __ioremap(phys_addr, size, val); } EXPORT_SYMBOL(ioremap_nocache); diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index c29ebd0..ad3202c 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -779,14 +779,20 @@ static inline int change_page_attr_clear(unsigned long addr, int numpages, int _set_memory_uc(unsigned long addr, int numpages) { + /* + * for now UC MINUS. see comments in ioremap_nocache() + */ return change_page_attr_set(addr, numpages, - __pgprot(_PAGE_CACHE_UC)); + __pgprot(_PAGE_CACHE_UC_MINUS)); } int set_memory_uc(unsigned long addr, int numpages) { + /* + * for now UC MINUS. see comments in ioremap_nocache() + */ if (reserve_memtype(addr, addr + numpages * PAGE_SIZE, - _PAGE_CACHE_UC, NULL)) + _PAGE_CACHE_UC_MINUS, NULL)) return -EINVAL; return _set_memory_uc(addr, numpages); diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c index 94f6c73..8af0f0b 100644 --- a/arch/x86/pci/i386.c +++ b/arch/x86/pci/i386.c @@ -301,6 +301,13 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma, prot = pgprot_val(vma->vm_page_prot); if (pat_wc_enabled && write_combine) prot |= _PAGE_CACHE_WC; + else if (pat_wc_enabled) + /* + * ioremap() and ioremap_nocache() defaults to UC MINUS for now. + * To avoid attribute conflicts, request UC MINUS here + * aswell. + */ + prot |= _PAGE_CACHE_UC_MINUS; else if (boot_cpu_data.x86 > 3) prot |= _PAGE_CACHE_UC; @@ -319,9 +326,8 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma, * - request is uncached, return cannot be write-combine * - request is write-combine, return cannot be write-back */ - if ((flags == _PAGE_CACHE_UC && - (new_flags == _PAGE_CACHE_WB || - new_flags == _PAGE_CACHE_WC)) || + if ((flags == _PAGE_CACHE_UC_MINUS && + (new_flags == _PAGE_CACHE_WB)) || (flags == _PAGE_CACHE_WC && new_flags == _PAGE_CACHE_WB)) { free_memtype(addr, addr+len);
Am Freitag, den 25.04.2008, 17:07 -0700 schrieb Suresh Siddha: > On Fri, Apr 25, 2008 at 07:35:23AM +0200, Alexey Fisher wrote: > > > >> On Tue, 25 Mar 2008 15:11:15 -0700 (PDT) > > > >> bugme-daemon@bugzilla.kernel.org wrote: > > > >> > > > >>> http://bugzilla.kernel.org/show_bug.cgi?id=10328 > > > >>> > > > >>> Summary: [regression] performance drop for glx > > > > > > > > Alexey, Can you please try the appended patch? > > > > > > > > Andrew, can you please push the appended patch for 2.6.25? Thanks. > > > > --- > > > > > > > > > > This issue come again with commit: > > > > commit 2e5d9c857d4e6c9e7b7d8c8c86a68a7842d213d6 > > Author: venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> > > Date: Tue Mar 18 17:00:14 2008 -0700 > > > > x86: PAT infrastructure patch > > > > Sets up pat_init() infrastructure. > > > > Alexey, Sorry. I missed posting a fix in the post PAT series for this issue. > Can you > please try the appended patch and Ack? > > copying X folks get their attention and their Ack on the patch and next steps > outlined. > > Thanks. > --- > > [patch] use UC minus for ioremap(), ioremap_nocache() and > pci_mmap_page_range() > > Use UC_MINUS for ioremap(), ioremap_nocache() instead of strong UC. > Once all the X drivers move to ioremap_wc(), we can go back to strong > UC semantics for ioremap() and ioremap_nocache(). > Thank you. This patch working. Ack ?
* Suresh Siddha <suresh.b.siddha@intel.com> wrote: > Alexey, Sorry. I missed posting a fix in the post PAT series for this > issue. Can you please try the appended patch and Ack? > > copying X folks get their attention and their Ack on the patch and > next steps outlined. i queued up the patch below - hand-merged to x86.git. Ingo -----------> Subject: x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range() From: Suresh Siddha <suresh.b.siddha@intel.com> Date: Fri, 25 Apr 2008 17:07:22 -0700 Use UC_MINUS for ioremap(), ioremap_nocache() instead of strong UC. Once all the X drivers move to ioremap_wc(), we can go back to strong UC semantics for ioremap() and ioremap_nocache(). To avoid attribute aliasing issues, pci_mmap_page_range() will also use UC_MINUS for default non write-combining mapping request. Next steps: a) change all the video drivers using ioremap() or ioremap_nocache() and adding WC MTTR using mttr_add() to ioremap_wc() b) for strict usage, we can go back to strong uc semantics for ioremap() and ioremap_nocache() after some grace period for completing step-a. c) user level X server needs to use the appropriate method for setting up WC mapping (like using resourceX_wc sysfs file instead of adding MTRR for WC and using /dev/mem or resourceX under /sys) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/mm/ioremap.c | 20 ++++++++++++++++---- arch/x86/mm/pageattr.c | 10 ++++++++-- arch/x86/pci/i386.c | 12 +++++++++--- 3 files changed, 33 insertions(+), 9 deletions(-) Index: linux-x86.q/arch/x86/mm/ioremap.c =================================================================== --- linux-x86.q.orig/arch/x86/mm/ioremap.c +++ linux-x86.q/arch/x86/mm/ioremap.c @@ -176,11 +176,11 @@ static void __iomem *__ioremap(resource_ /* * Do not fallback to certain memory types with certain * requested type: - * - request is uncached, return cannot be write-back - * - request is uncached, return cannot be write-combine + * - request is uc-, return cannot be write-back + * - request is uc-, return cannot be write-combine * - request is write-combine, return cannot be write-back */ - if ((prot_val == _PAGE_CACHE_UC && + if ((prot_val == _PAGE_CACHE_UC_MINUS && (new_prot_val == _PAGE_CACHE_WB || new_prot_val == _PAGE_CACHE_WC)) || (prot_val == _PAGE_CACHE_WC && @@ -201,6 +201,9 @@ static void __iomem *__ioremap(resource_ default: prot = PAGE_KERNEL_NOCACHE; break; + case _PAGE_CACHE_UC_MINUS: + prot = PAGE_KERNEL_UC_MINUS; + break; case _PAGE_CACHE_WC: prot = PAGE_KERNEL_WC; break; @@ -255,7 +258,16 @@ static void __iomem *__ioremap(resource_ */ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size) { - return __ioremap(phys_addr, size, _PAGE_CACHE_UC); + /* + * Ideally, this should be: + * pat_wc_enabled ? _PAGE_CACHE_UC : _PAGE_CACHE_UC_MINUS; + * + * Till we fix all X drivers to use ioremap_wc(), we will use + * UC MINUS. + */ + unsigned long val = _PAGE_CACHE_UC_MINUS; + + return __ioremap(phys_addr, size, val); } EXPORT_SYMBOL(ioremap_nocache); Index: linux-x86.q/arch/x86/mm/pageattr.c =================================================================== --- linux-x86.q.orig/arch/x86/mm/pageattr.c +++ linux-x86.q/arch/x86/mm/pageattr.c @@ -829,14 +829,20 @@ static inline int change_page_attr_clear int _set_memory_uc(unsigned long addr, int numpages) { + /* + * for now UC MINUS. see comments in ioremap_nocache() + */ return change_page_attr_set(&addr, numpages, - __pgprot(_PAGE_CACHE_UC), 0); + __pgprot(_PAGE_CACHE_UC_MINUS), 0); } int set_memory_uc(unsigned long addr, int numpages) { + /* + * for now UC MINUS. see comments in ioremap_nocache() + */ if (reserve_memtype(addr, addr + numpages * PAGE_SIZE, - _PAGE_CACHE_UC, NULL)) + _PAGE_CACHE_UC_MINUS, NULL)) return -EINVAL; return _set_memory_uc(addr, numpages); Index: linux-x86.q/arch/x86/pci/i386.c =================================================================== --- linux-x86.q.orig/arch/x86/pci/i386.c +++ linux-x86.q/arch/x86/pci/i386.c @@ -301,6 +301,13 @@ int pci_mmap_page_range(struct pci_dev * prot = pgprot_val(vma->vm_page_prot); if (pat_wc_enabled && write_combine) prot |= _PAGE_CACHE_WC; + else if (pat_wc_enabled) + /* + * ioremap() and ioremap_nocache() defaults to UC MINUS for now. + * To avoid attribute conflicts, request UC MINUS here + * aswell. + */ + prot |= _PAGE_CACHE_UC_MINUS; else if (boot_cpu_data.x86 > 3) prot |= _PAGE_CACHE_UC; @@ -319,9 +326,8 @@ int pci_mmap_page_range(struct pci_dev * * - request is uncached, return cannot be write-combine * - request is write-combine, return cannot be write-back */ - if ((flags == _PAGE_CACHE_UC && - (new_flags == _PAGE_CACHE_WB || - new_flags == _PAGE_CACHE_WC)) || + if ((flags == _PAGE_CACHE_UC_MINUS && + (new_flags == _PAGE_CACHE_WB)) || (flags == _PAGE_CACHE_WC && new_flags == _PAGE_CACHE_WB)) { free_memtype(addr, addr+len);