Bug 8883 - System performance is *very* slow with 8GB RAM on 64-bit Intel
Status: REJECTED INVALID
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Andi Kleen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-12 10:34 UTC by Richard Neill
Modified: 2007-09-02 15:18 UTC

See Also:
Kernel Version: 2.6.22
Subsystem:
Regression: ---
Bisected commit-id:


Description Richard Neill 2007-08-12 10:34:24 UTC
On a P35C motherboard (Gigabyte GA-P35C-DS3R), with a core-2 quad CPU (Q6600), and all 4 DIMMs fitted (total of 8GB), the system slows to a crawl. It runs approximately 100x slower than normal. Removing one of the DIMMs restores the expected performance. 

The test I used is:  
  for ((i=0;i<10000;i++)); do echo $i > /dev/null ;done
which takes about 20 seconds to run!
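
For anyone repeating this test, here is a minimal sketch of a timed version. The function name bench_loop is made up for illustration, and it uses a POSIX while loop in place of the bash-only for ((...)) form; the work per iteration is the same.

```shell
# Run N trivial shell iterations and print the elapsed time in whole seconds.
# bench_loop is a hypothetical helper name, not part of any tool.
bench_loop() {
    n=${1:-10000}
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i" > /dev/null
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Usage: bench_loop 10000
```

The resolution is only one second, which is enough to tell a 20 second run from a normal one.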

I can get normal performance by appending mem=8318M to the boot parameters, resulting in free -m reporting 7621 MB available. Higher than this, and it is very slow. Alternatively, removing one or more DIMMs will restore normal speed.

I originally reported this as a bug in Ubuntu; see here for more details:   
  https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/129172

I've observed the same effect on: 
  The Ubuntu Gutsy (Tribe 3) install CD.
  The latest kernel for Ubuntu Gutsy (x86_64, 2.6.22-9)
  The Gentoo x86_64 install disc (2007)
  The Mandriva 2007 (64-bit) install disc.

It does not occur if I use a 32-bit kernel (but obviously, I can only use 4GB then). I'm booting in single-user mode. The slowdown occurs as soon as the kernel has begun to load; however, GRUB runs just fine. Memtest sees no problems. Once booted into single-user mode (takes 20 minutes!), the system is stable - it's very slow, but everything still works.

All the diagnostics (dmesg, /var/log/kernel/messages etc) are attached to the Ubuntu bug above.  

Thanks very much for your help - please let me know if there's anything else I can do to test.
Comment 1 Andi Kleen 2007-08-16 13:28:49 UTC
Likely the MTRRs are misconfigured. BIOS bug.

You see slowdown in 32bit too if you start filling up the memory,
correct?

If you look in your handbook does it actually say they tested that much
memory? I bet it doesn't.

You need to ask Gigabyte for a BIOS update or use a different board
that actually supports that much memory.
Comment 2 Richard Neill 2007-08-17 10:24:22 UTC
Hi Andi,

Thanks for your message. The handbook does state explicitly that the board supports up to 8GB in 4 slots. I'm currently running the F1 BIOS (my attempts to update it have failed, but there is no mention of an MTRR fix in the release notes for the newer versions).

If I boot in 32-bit (Knoppix 5.01) with all 4 DIMMs installed, then it runs fine at full speed; however, if I boot the Fedora Core 7 install CD (32-bit), then it does not. This is **weird**. I don't know why it happens!

Here are the MTRRs (cat /proc/mtrr), with the kernel used, and the value of boot parameter mem=xxx (where specified), and the value reported by free -m.

UBUNTU GUTSY, 2.6.22-9 x86-64, mem=8200, free -m => 7505M, FAST:

reg00: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size=4096MB: write-back, count=1
reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg04: base=0xdff00000 (3583MB), size=   1MB: write-through, count=1


UBUNTU GUTSY, 2.6.22-9 x86-64, [mem= not specified], free -m => 8002M, SLOW:

reg00: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size=4096MB: write-back, count=1
reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg04: base=0xdff00000 (3583MB), size=   1MB: write-through, count=1


FEDORA CORE 7, x86-32. [mem= not specified]. free -m => 3546M, SLOW:

reg00: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size=4096MB: write-back, count=1
reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg04: base=0xdff00000 (3583MB), size=   1MB: write-through, count=1


KNOPPIX 5.01, x86-32. [mem= not specified]. free -m => 3546M, FAST:

reg00: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size=4096MB: write-back, count=1
reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg04: base=0xdff00000 (3583MB), size=   1MB: write-through, count=1


Are these correct? Is there any way to over-ride them manually if the BIOS is getting it wrong?

Also, if the BIOS is getting it wrong, wouldn't that break memtest86 too?

If it's useful, the manual is here:
http://www.gigabyte.com.tw/Support/Motherboard/BIOS_Model.aspx?ClassValue=Motherboard&ProductID=2551&ProductName=GA-P35C-DS3R

Thanks - Richard
Comment 3 Andi Kleen 2007-08-19 12:47:14 UTC
Then Gigabyte got it wrong anyways.

The full speed issue just depends on whether some code or important data
falls onto an uncached area.  Anything uncached is slow.

If you use enough memory in 32bit it'll likely be slow too;
just the different memory layout there saves you by chance.

To say if the MTRRs are correct you would need to compare them
to the e820 map that is printed by the kernel (grep e820 /var/log/boot.msg)
Anything reported there as "usable" needs to be covered by a write-back MTRR.

These MTRRs are set up by the BIOS, not Linux, so they should not change between
kernels.
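
The comparison described above starts from the "usable" lines of the e820 map. A minimal sketch for pulling them out of kernel log text follows; the function name usable_ranges is made up for illustration, and the pattern assumes the "BIOS-e820: START - END (usable)" line format quoted in this thread (other kernel versions may print the map differently).

```shell
# Read kernel log text on stdin and print "0xSTART 0xEND" for each e820
# range marked usable. usable_ranges is a hypothetical helper name.
# Assumes lines of the form: BIOS-e820: <hex start> - <hex end> (usable)
usable_ranges() {
    sed -n 's/.*BIOS-e820: *\([0-9a-fA-F]*\) - \([0-9a-fA-F]*\) (usable).*/0x\1 0x\2/p'
}

# Usage: dmesg | usable_ranges
```

Each range printed this way should fall inside a write-back entry in /proc/mtrr.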
Comment 4 Richard Neill 2007-08-19 19:29:12 UTC
Thanks - that's a very useful hint. Should I look at the result of 
  grep -r e820 /var/log/kern.log | grep usable 
  (which reports 42 entries (with mem=8200))
or
  dmesg | grep -r e820 | grep usable 
  (which only reports 3?)

If the kernel has got it wrong, is it possible to manually set the MTRRs? 

I'm now going to reboot the system into "slow" mode, and leave it for half an hour while it starts...

Thanks, Richard
Comment 5 Richard Neill 2007-08-19 20:12:12 UTC
In "slow" mode (Ubuntu 64-bit, with no "mem=XXX" specified at boot), I find:

1) rjn@ubuntu:~$ cat /proc/mtrr
reg00: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size=4096MB: write-back, count=1
reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg04: base=0xdff00000 (3583MB), size=   1MB: write-through, count=1

2) grep -r e820 /var/log/kern.log
 #matches NOTHING.

3) rjn@ubuntu:~$ dmesg | grep -r e820
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
[    0.000000]  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
[    0.000000]  BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
[    0.000000]  BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000220000000 (usable)

4) rjn@ubuntu:~$ dmesg | grep -r e820 | grep usable
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000220000000 (usable)

i.e.  Usable memory is from  0 -> 634kB, 1MB -> 3582MB, and 4096MB -> 8704MB (8.5GB).
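
The arithmetic behind those figures can be reproduced with shell integer math. This is a minimal sketch; the helper names hex_to_kib and hex_to_mib are made up for illustration, and both round down.

```shell
# Convert a hex byte address (with or without a 0x prefix) to kB or MB.
# hex_to_kib and hex_to_mib are hypothetical helper names.
hex_to_kib() { echo $(( 0x${1#0x} / 1024 )); }
hex_to_mib() { echo $(( 0x${1#0x} / 1048576 )); }

# Applied to the end addresses of the three usable ranges:
#   hex_to_kib 000000000009e800   prints 634
#   hex_to_mib 00000000dfee0000   prints 3582
#   hex_to_mib 0000000220000000   prints 8704
```

Note that the last range ends at 8704MB, so its top 512MB sits above the 8192MB line.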

5)Experiment - based on googling + inference, I should (maybe) be able to force the kernel to set writeback MTRRs everywhere? So, I tried:
 echo "base=0x00000000 size=0x9e800 type=write-back" > /proc/mtrr
 echo "base=0x00100000 size=0xdfde0000 type=write-back" > /proc/mtrr
 echo "base=0x100000000 size=0x120000000 type=write-back" > /proc/mtrr
Unfortunately, this doesn't work - I get the message:
  bash: echo: write error: Invalid argument

What next?  
Thanks - Richard
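
A possible explanation for the "Invalid argument" error in step 5, offered as an assumption rather than something confirmed in this thread: x86 variable-range MTRRs normally describe a region whose size is a power of two (at least 4kB) with the base aligned to that size, and none of the three ranges tried above has that shape. The sketch below checks a base/size pair against those assumed rules and splits a larger range into pieces that satisfy them. The function names mtrr_range_ok and split_range are made up for illustration.

```shell
# Succeed if (base, size) fits the assumed MTRR shape:
# size is a power of two >= 4096 and base is a multiple of size.
mtrr_range_ok() {
    base=$(( $1 ))
    size=$(( $2 ))
    [ "$size" -ge 4096 ] || return 1
    [ $(( size & (size - 1) )) -eq 0 ] || return 1
    [ $(( base % size )) -eq 0 ] || return 1
    return 0
}

# Split (base, size) into aligned power-of-two chunks, largest first
# at each step, and print one "base=... size=..." line per chunk.
split_range() {
    base=$(( $1 ))
    left=$(( $2 ))
    while [ "$left" -gt 0 ]; do
        chunk=1
        # Double the chunk while it still fits and base stays aligned to it.
        while [ $(( chunk * 2 )) -le "$left" ] && [ $(( base % (chunk * 2) )) -eq 0 ]; do
            chunk=$(( chunk * 2 ))
        done
        printf 'base=0x%x size=0x%x\n' "$base" "$chunk"
        base=$(( base + chunk ))
        left=$(( left - chunk ))
    done
}

# Usage: split_range 0x100000000 0x120000000
```

Under those assumptions, the third range (4096MB to 8704MB) splits into a 4096MB region at 0x100000000 and a 512MB region at 0x200000000. Whether the kernel accepts such writes on a given board is a separate question; Documentation/mtrr.txt is the reference for the /proc/mtrr interface.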
Comment 6 Andi Kleen 2007-08-20 07:45:50 UTC
Must be some Ubuntu problem when it matches nothing. Do they perhaps
boot with quiet? You don't need to boot slow, the e820 entries
should always be the same.

You should see something like

<6>BIOS-provided physical RAM map:
<6> BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
<6> BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
<6> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
<6> BIOS-e820: 0000000000100000 - 00000000ded38000 (usable)
<6> BIOS-e820: 00000000ded38000 - 00000000dee6d000 (ACPI NVS)
<6> BIOS-e820: 00000000dee6d000 - 00000000dfea2000 (usable)
<6> BIOS-e820: 00000000dfea2000 - 00000000dfee9000 (ACPI NVS)

For details on how to change the MTRRs see Documentation/mtrr.txt
If that doesn't work I recommend you contact Gigabyte support.

I'm closing the bug now because it's not a kernel issue from all evidence
so far, but a BIOS bug.
Comment 7 Richard Neill 2007-08-25 17:36:15 UTC
Thanks for your help. Yes, this is a BIOS bug. For reference, here's what's needed to fix it:

1)Get the latest BIOS update from Gigabyte. I used this one successfully:
 motherboard_bios_ga-p35c-ds3r_f4g_beta.exe

2)Extract the BIOS with wine (0.9.41 works fine). Gigabyte don't provide the md5sum, so, for info: 
  6f283d38e272ea433b4478f39a6cdb03  P35CDS3R.F4g

3)Copy the BIOS onto a floppy. [The manual claims that a USB key is also supported: the Q-flash utility loads the new image fine, but fails at the last step with "BIOS ID CHECK ERROR".] 

4)Flash BIOS, Load optimized defaults. Enjoy.

5)Once updated, free -m reports 8002 MB free (presumably, the rest is used by the kernel). The MTRRs are:

$ cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x100000000 (4096MB), size=4096MB: write-back, count=1
reg03: base=0x200000000 (8192MB), size= 512MB: write-back, count=1
reg04: base=0xdff00000 (3583MB), size=   1MB: write-through, count=1

And the performance is about 50% better than the best it was before :-)
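
Comparing the old and new MTRR listings against the e820 map from comment 5 shows what the update changed. The sketch below adds up how many bytes of one usable range are not covered by any write-back entry. The function name uncovered_bytes is made up for illustration, and the check is simplified: it ignores uncachable and write-through entries, which do not overlap the usable ranges in this case.

```shell
# usage: uncovered_bytes START END BASE:SIZE [BASE:SIZE ...]
# Prints the number of bytes in [START, END) that lie outside every
# BASE:SIZE range given. uncovered_bytes is a hypothetical helper name.
uncovered_bytes() {
    cur=$(( $1 ))
    end=$(( $2 ))
    shift 2
    gap=0
    while [ "$cur" -lt "$end" ]; do
        reach=$cur    # furthest address covered, starting from cur
        next=$end     # nearest range start beyond cur
        for r in "$@"; do
            b=$(( ${r%%:*} ))
            e=$(( b + ${r##*:} ))
            if [ "$b" -le "$cur" ] && [ "$e" -gt "$reach" ]; then
                reach=$e
            elif [ "$b" -gt "$cur" ] && [ "$b" -lt "$next" ]; then
                next=$b
            fi
        done
        if [ "$reach" -gt "$cur" ]; then
            cur=$reach
        else
            gap=$(( gap + next - cur ))
            cur=$next
        fi
    done
    echo "$gap"
}

# Last usable range (4096MB -> 8704MB) against the write-back entries:
#   original BIOS: uncovered_bytes 0x100000000 0x220000000 \
#       0x100000000:0x20000000 0x100000000:0x100000000 0x0:0x100000000
#     prints 536870912 (512MB)
#   updated BIOS:  uncovered_bytes 0x100000000 0x220000000 \
#       0x0:0x100000000 0x100000000:0x100000000 0x200000000:0x20000000
#     prints 0
```

With the original BIOS the top 512MB of the last usable range (8192MB to 8704MB) has no write-back entry, which fits the explanation in comment 3; the updated BIOS covers it with the entry at base 0x200000000.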


I'm sure Gigabyte have shipped lots of boards with the original broken BIOS - is there any way for the kernel to print some sort of warning when the MTRRs are wrong?

Thanks once again for your help - Richard
Comment 8 Andi Kleen 2007-09-01 10:19:00 UTC
Yes, there is a patch from Jesse Barnes in the works for this.
Unfortunately it didn't make .23 because it causes mysterious boot failures
on some machines.

It would just limit the memory to the fast memory though, and might lead
to significantly less memory being available. Fixing the BIOS is always better.


 
Comment 9 Richard Neill 2007-09-02 15:18:16 UTC
If it can be detected, that's most of the problem solved. 
A big fat warning in /var/log/messages would be a really helpful diagnostic. Then the user would be able to identify the problem and know what needs to be fixed.

Perhaps, even if the pending workaround cannot be applied yet, we can still have a warning message?
