Latest working kernel version: Probably none
Earliest failing kernel version: Probably all
Distribution: Ubuntu
Hardware Environment: Samsung Q45 Dalia laptop
Software Environment: Ubuntu 8.04 LTS (hardy), Kernel 2.6.24-16-generic (32bit)
Problem Description:

When I upgraded my laptop from 1GB RAM to 4GB RAM the MTRRs got messed up so that X could no longer set up a write-combine region for the video memory, leading to a severe loss of performance.

I'll attach dmesg, dmidecode and contents of /proc/mtrr with 2GB (which also does not exhibit this bug) and 4GB of RAM. The gist of it is this:

/proc/mtrr with 2GB:
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x7f700000 (2039MB), size=   1MB: uncachable, count=1
reg02: base=0x7f800000 (2040MB), size=   8MB: uncachable, count=1
reg03: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
(last range added by X server)

/proc/mtrr with 4GB:
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg03: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
reg04: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1

The video memory is at 0xd0000000 (256MB). Note that this range is already included in reg00 and reg01, so the X server cannot set up a write-combining range. If I manually fix the ranges to look like this:

reg00: base=0xc0000000 (3072MB), size= 256MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg03: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
reg04: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1
reg05: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg06: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1

i.e. explicitly excluding 0xd0000000 (256MB) from both problematic ranges, then the X server can set up the write-combining range again:

reg00: base=0xc0000000 (3072MB), size= 256MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg03: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
reg04: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1
reg05: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg06: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg07: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
(last range added by X server)

I'm not sure who is responsible for the MTRRs, BIOS or kernel or both. In case of a broken BIOS, maybe the kernel can sanitize them anyway, if it knows where the video memory is located?

More info:
Ubuntu bug: https://bugs.launchpad.net/bugs/210780
Laptop info: https://wiki.ubuntu.com/LaptopTestingTeam/SamsungQ45Dalia

Let me know if I can test something or provide more info!
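For anyone who wants to check their own machine: the overlap described in the report can be found mechanically. What follows is only a minimal Python sketch, not part of any existing tool. The names `parse_mtrr` and `covering` are made up for illustration, and the regex assumes the /proc/mtrr line layouts quoted in this bug, with sizes reported in MB.

```python
import re

# Matches lines such as
#   reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
# and the later layout in which ", count=N" comes before the type.
# Assumption: sizes are reported in MB, as in the listings in this bug.
MTRR_LINE = re.compile(
    r"reg(\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)MB(?:, count=\d+)?: ([a-z-]+)"
)

MB = 1 << 20


def parse_mtrr(text):
    """Return a list of (reg, base, size_in_bytes, type) tuples."""
    return [
        (int(m.group(1)), int(m.group(2), 16), int(m.group(3)) * MB, m.group(4))
        for m in MTRR_LINE.finditer(text)
    ]


def covering(entries, start, size):
    """Return the entries whose range overlaps [start, start + size)."""
    end = start + size
    return [e for e in entries if e[1] < end and start < e[1] + e[2]]


# The 4GB listing from this report.
MTRR_4GB = """\
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg03: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
reg04: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1
"""

if __name__ == "__main__":
    # Which registers already claim the 256MB video aperture at 0xd0000000?
    for reg, base, size, kind in covering(parse_mtrr(MTRR_4GB), 0xD0000000, 256 * MB):
        print(f"reg{reg:02d} base={base:#x} size={size // MB}MB {kind}")
```

On a live system the text would come from reading /proc/mtrr instead of the embedded string. For the 4GB listing it singles out reg00 and reg01, the two ranges named in the report.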
Created attachment 15852 [details] dmesg with 2GB
Created attachment 15853 [details] dmesg with 4GB
Created attachment 15854 [details] /proc/mtrr with 2GB RAM
Created attachment 15855 [details] /proc/mtrr with 4GB RAM
Created attachment 15856 [details] dmidecode with 2GB
Created attachment 15857 [details] dmidecode with 4GB
Thanks. I recategorised this as platform-i386
Could you please check if the following patch works?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blobdiff;f=arch/x86/kernel/cpu/mtrr/main.c;h=b6e136f23d3d3219094bc9fdadaeaba048f01b96;hp=1e27b69a7a0eca1750e4c16dd2470e49ab706112;hb=20651af9ac60fd6e31360688ad44861a7d05256a;hpb=971a52d66a3e87d4d2f5d3455e62680447cdb8e9

It is apparently a problem caused by improper mtrr trimming.
AFAICT trimming was not even done in 2.6.24... In any case I can't apply the patch on top of my running kernel, so I went and compiled 2.6.25 which should include this patch. It did not change anything. /proc/mtrr looks just like it does with my Ubuntu kernel. I'll attach the new dmesg with 2.6.25.
Created attachment 15885 [details] dmesg with 4GB (vanilla 2.6.25)
Yinghai Lu wrote a patch that almost fixes this issue:
http://lkml.org/lkml/2008/4/28/52

/proc/mtrr and dmesg are attached.
Created attachment 15943 [details] /proc/mtrr with proposed fix and mtrr_chunk_size=1g
Created attachment 15944 [details] /proc/mtrr with proposed fix and WITHOUT mtrr_chunk_size=1g
Created attachment 15945 [details] dmesg with proposed fix and mtrr_chunk_size=1g (on top of vanilla 2.6.25)
Created attachment 15946 [details] dmesg with proposed fix and WITHOUT mtrr_chunk_size=1g (on top of vanilla 2.6.25)
New version of the patch seems to work! http://lkml.org/lkml/2008/4/28/52
Created attachment 15948 [details] /proc/mtrr with proposed fix (v2) and mtrr_chunk_size=1g
Created attachment 15949 [details] /proc/mtrr with proposed fix (v2) and WITHOUT mtrr_chunk_size=1g
Created attachment 15950 [details] dmesg with proposed fix (v2) and mtrr_chunk_size=1g (on top of vanilla 2.6.25)
Created attachment 15951 [details] dmesg with proposed fix (v2) and WITHOUT mtrr_chunk_size=1g (on top of vanilla 2.6.25)
For what it is worth at this late date, my userland program can adjust the MTRR settings to fix this problem. See http://lkml.org/lkml/2008/9/28/7

Here's what my program says:

$ ./mtrr-uncover --mtrrfile mtrr.sample.kbt10508 0xd0000000-0xdfffffff
Initial MTRR configuration:
1 0x000000000-0x0ffffffff write-back
3 0x0bf700000-0x0bf7fffff uncachable
4 0x0bf800000-0x0bfffffff uncachable
0 0x0c0000000-0x0ffffffff uncachable
2 0x100000000-0x13fffffff write-back
Final MTRR configuration:
1' 0x000000000-0x07fffffff write-back
50' 0x080000000-0x0bfffffff write-back
3 0x0bf700000-0x0bf7fffff uncachable
4 0x0bf800000-0x0bfffffff uncachable
2 0x100000000-0x13fffffff write-back
Commands for /proc/mtrr to make these changes:
disable=0
disable=1
base=0x000000000 size=0x080000000 type=write-back
base=0x080000000 size=0x040000000 type=write-back
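The command list at the end of that output uses the plain-text write interface of /proc/mtrr (`disable=N` and `base=... size=... type=...`). As a rough illustration only, and not how mtrr-uncover itself is implemented, the same four commands could be assembled and applied from Python as sketched here. The helper names are hypothetical, and `apply_plan` assumes root privileges and that each command goes in as its own write.

```python
def disable_cmd(reg):
    """Command string that disables one variable MTRR."""
    return f"disable={reg}"


def add_cmd(base, size, mtype):
    """Command string that adds a range of the given type."""
    return f"base={base:#x} size={size:#x} type={mtype}"


def build_plan(disable, add):
    """Disables first, then additions, in the order given."""
    return [disable_cmd(r) for r in disable] + [add_cmd(*a) for a in add]


def apply_plan(plan, path="/proc/mtrr"):
    """Write each command separately. Needs root; not called in this sketch."""
    for cmd in plan:
        with open(path, "w") as f:
            f.write(cmd + "\n")


# The changes listed in the mtrr-uncover output for this laptop.
PLAN = build_plan(
    disable=[0, 1],
    add=[
        (0x00000000, 0x80000000, "write-back"),
        (0x80000000, 0x40000000, "write-back"),
    ],
)

if __name__ == "__main__":
    # Dry run: only print what would be written.
    for cmd in PLAN:
        print(cmd)
```

Printing the plan first and applying it by hand is the safer choice, since a wrong MTRR layout can make the machine very slow until the next reboot.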
This bug should be fixed in v2.6.26 by these commits:

f5098d6: x86: mtrr cleanup for converting continuous to discrete layout v8 - fix
95ffa24: x86: mtrr cleanup for converting continuous to discrete layout, v8
I am still seeing this bug in kernel 2.6.32 (in Debian squeeze) on a Dell OptiPlex 755:

$ cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size=65536MB, count=1: write-back
reg01: base=0x07d600000 ( 2006MB), size=    2MB, count=1: uncachable
reg02: base=0x07d800000 ( 2008MB), size=    8MB, count=1: uncachable
reg03: base=0x07e000000 ( 2016MB), size=   32MB, count=1: uncachable
reg04: base=0x07d500000 ( 2005MB), size=    1MB, count=1: uncachable
reg05: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable

$ dmesg | grep -i mtrr
[    0.000000] Command line: initrd=initrd.img-2.6.32-5-amd64 root=/dev/sda1 ro quiet enable_mtrr_cleanup BOOT_IMAGE=vmlinuz-2.6.32-5-amd64
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000] MTRR variable ranges enabled:
[    0.000000] original variable MTRRs
[    0.000000] mtrr_cleanup: can not find optimal value
[    0.000000] please specify mtrr_gran_size/mtrr_chunk_size
[    0.000000] Kernel command line: initrd=initrd.img-2.6.32-5-amd64 root=/dev/sda1 ro quiet enable_mtrr_cleanup BOOT_IMAGE=vmlinuz-2.6.32-5-amd64
[   10.157993] mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[   10.157997] [drm] MTRR allocation failed. Graphics performance may suffer.
@ Laurent: That's not a bug. It reflects a limitation: there are only 8 MTRRs to play with. That's not enough to create MTRR settings that match the initial ones but without overlap.

The initial setting of MTRRs by the BIOS is pretty questionable. In particular, the first MTRR suggests that you have 64GB of RAM, with a few holes specified by the remaining MTRRs.

How much RAM do you actually have?

Do you actually know the memory layout? I think that you can find it in the output of dmesg. Look for something like:

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000cffb0000 (usable)
 BIOS-e820: 00000000cffb0000 - 00000000cffbe000 (ACPI data)
 BIOS-e820: 00000000cffbe000 - 00000000cfff0000 (ACPI NVS)
 BIOS-e820: 00000000cfff0000 - 00000000d0000000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 00000001b0000000 (usable)

What is your motherboard? BIOS version?

If there is a BIOS update, I would consider applying it in the hope that the MTRR settings are improved.

Even without a BIOS fix, there are workarounds.
Thanks for your help!

> How much RAM do you actually have?

Those machines have 2GB of RAM.

> Do you actually know the memory layout? I think that you can find it in the output of dmesg.

Here it is:

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000007d3ff800 (usable)
[    0.000000]  BIOS-e820: 000000007d3ff800 - 000000007d453c00 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007d453c00 - 000000007d455c00 (ACPI data)
[    0.000000]  BIOS-e820: 000000007d455c00 - 000000007e000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)

> What is your motherboard?

Dell OptiPlex 755.

> BIOS version?

Here is what lshw outputs:

pc-dg-039-01
    description: Mini Tower Computer
    product: OptiPlex 755
    vendor: Dell Inc.
    serial: xxx
    width: 64 bits
    capabilities: smbios-2.5 dmi-2.5 smp-1.4 smp vsyscall64 vsyscall32
    configuration: administrator_password=enabled boot=normal chassis=mini-tower cpus=2 power-on_password=enabled uuid=xxx
  *-core
       description: Motherboard
       product: 0GM819
       vendor: Dell Inc.
       physical id: 0
       serial: ..xxx.
     *-firmware
          description: BIOS
          vendor: Dell Inc.
          physical id: 0
          version: A09 (03/11/2008)
          size: 64KiB
          capacity: 4032KiB
          capabilities: pci pnp apm upgrade shadowing escd cdboot bootselect edd int13floppytoshiba int13floppy720 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification netboot

> If there is a BIOS update, I would consider applying it in the hope that the MTRR settings are improved.

I will try that...

> Even without a BIOS fix, there are workarounds.

Do you mean mtrr-uncover-2009august14.tgz? I was hoping that the enable_mtrr_cleanup kernel option would work. Besides, I did not find any doc about the mtrr_gran_size/mtrr_chunk_size kernel options.
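To answer the "how much RAM" question from the e820 map rather than from memory, the usable ranges can simply be added up. This is just a quick Python sketch under the assumption that the dmesg lines look like the ones pasted in this comment; `parse_e820`, `usable_bytes` and `top_of_usable` are illustrative names, not an existing tool.

```python
import re

# Matches "BIOS-e820: <start> - <end> (<type>)" with or without a
# leading dmesg timestamp. Assumes the layout pasted in this bug.
E820_LINE = re.compile(
    r"BIOS-e820:\s+([0-9a-fA-F]+)\s+-\s+([0-9a-fA-F]+)\s+\(([^)]+)\)"
)


def parse_e820(text):
    """Return a list of (start, end, type) tuples; end is exclusive."""
    return [(int(a, 16), int(b, 16), kind) for a, b, kind in E820_LINE.findall(text)]


def usable_bytes(ranges):
    """Total size of all ranges marked usable."""
    return sum(end - start for start, end, kind in ranges if kind == "usable")


def top_of_usable(ranges):
    """Highest end address of any usable range."""
    return max(end for start, end, kind in ranges if kind == "usable")


# The map from the OptiPlex 755 in this comment.
E820_OPTIPLEX = """\
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000007d3ff800 (usable)
[    0.000000]  BIOS-e820: 000000007d3ff800 - 000000007d453c00 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007d453c00 - 000000007d455c00 (ACPI data)
[    0.000000]  BIOS-e820: 000000007d455c00 - 000000007e000000 (reserved)
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
[    0.000000]  BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
"""

if __name__ == "__main__":
    ranges = parse_e820(E820_OPTIPLEX)
    print(f"usable: {usable_bytes(ranges) / (1 << 20):.1f} MB")
    print(f"top of usable RAM: {top_of_usable(ranges):#x}")
```

For this map the usable RAM ends at 0x7d3ff800, a little above 2004MB, which fits the 2GB actually installed and not the 64GB that reg00 claims.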
@ Laurent

Here's what I'd consider:

First choice: fix BIOS via update from Dell (if any)

Second choice: ignore the problem unless it has consequences. What effect does this have? Is X slower (most X video drivers no longer use MTRRs to change caching of the frame buffer)? Where is your video buffer? Perhaps 0x07e000000-0x07fffffff (32MB is a bit small for a video buffer).

Third choice: hand craft a sequence of echo >/proc/mtrr commands to fix up the MTRRs. mtrr-uncover can be used to show you some of the commands but you will have to do at least one command by hand: the one to fix MTRR reg00.

The correct initial value for MTRR reg00 would look like this:

reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back

These values (sorted by address) puzzle me. What are they about?

reg04: base=0x07d500000 ( 2005MB), size=    1MB, count=1: uncachable
reg01: base=0x07d600000 ( 2006MB), size=    2MB, count=1: uncachable
reg02: base=0x07d800000 ( 2008MB), size=    8MB, count=1: uncachable
reg03: base=0x07e000000 ( 2016MB), size=   32MB, count=1: uncachable

These form one contiguous range, but not a convenient power of 2 in size. This doesn't work well with the MTRR mechanism, which works with ranges that are a power of two in size, aligned on a multiple of their size. This uncached range is 43MB, so it needs to be represented using 4 MTRRs. To uncover it takes even more (more than 8!).

If you managed to round this area up to 64MB, it would be possible to uncover the MTRRs with the available MTRRs.
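The power-of-two constraint is easy to play with numerically. The sketch below is a plain greedy split of a range into naturally aligned power-of-two blocks. It is my own illustration with a made-up function name (`pow2_blocks`), not the algorithm used by the kernel's mtrr_cleanup or by mtrr-uncover, and it counts only non-overlapping blocks of one type.

```python
MB = 1 << 20


def pow2_blocks(start, size):
    """Greedily split [start, start + size) into (base, length) blocks.

    Each block length is a power of two and each base is a multiple of
    its block length, which is the shape a single MTRR can describe.
    """
    blocks = []
    end = start + size
    while start < end:
        # Largest power of two that still fits in what is left.
        fit = 1 << ((end - start).bit_length() - 1)
        # Largest power of two that start is aligned to (no limit at 0).
        align = (start & -start) if start else fit
        block = min(align, fit)
        blocks.append((start, block))
        start += block
    return blocks


if __name__ == "__main__":
    # The 43MB uncachable area from the OptiPlex 755 listing.
    for base, length in pow2_blocks(0x7D500000, 43 * MB):
        print(f"hole block: base={base:#x} size={length // MB}MB")
    # Write-back blocks needed below the hole, as is and rounded up to 64MB.
    print("blocks below 43MB hole:", len(pow2_blocks(0, 0x7D500000)))
    print("blocks below 64MB hole:", len(pow2_blocks(0, 0x7C000000)))
```

With the numbers from this thread, the 43MB area splits into the same four blocks the BIOS programmed (1MB, 2MB, 8MB, 32MB). Covering 0 to 0x7d500000 with write-back blocks alone takes eight registers by this count, which leaves none for a write-combining range, while rounding the hole up to 64MB (write-back up to 0x7c000000) brings it down to five.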