Bug 6139

Summary: Kernel 2.6.* -686 system very slow on a system with 1GB of memory: caching bug
Product: Memory Management Reporter: Mark Sandler (sandler)
Component: Other Assignee: Zwane Mwaikambo (zwane)
Status: REJECTED WILL_NOT_FIX    
Severity: normal CC: bunk, protasnb, zwane
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.22-14 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg contents

Description Mark Sandler 2006-02-27 12:20:44 UTC
Most recent kernel where this bug did not occur:
Distribution: 2.4.x, and 2.6.x-386 
Hardware Environment: Asus M3N, 1GB of RAM
Software Environment: Ubuntu (breezy badger),  kernel 2.6.12-686

Problem Description:

After upgrading to a 2.6.x kernel on a
Debian/Ubuntu Linux system, the computer became very slow. And I
mean REALLY slow -- it was taking 50 minutes just to boot.

Reason

Apparently the problem is a (quite common) BIOS bug that
prevents the memory region from 1008MB to 1016MB from being cached, and
Linux uses that memory region quite actively. Given that BIOS updates
are not always available, I thought it would be nice if the kernel
fixed this automatically. Below are my ad hoc solution and some further
problems I encountered, which someone might find useful.

Ad hoc Solution:

Add mem=1008M to the kernel options. Now everything works just fine. But in my
case sound, PCMCIA and a few other things, including
ACPI, stopped working.

Ad hoc improvement

A detailed study of the lspci output shows that the sound card uses the
memory region from 1016MB to 1024MB... Apparently, when one specifies mem=1008M,
the kernel cannot access that region and <sigh> no sound...

The "hackish" solution is to turn off the mem_4G option in the
kernel. For that, one needs to recompile the kernel with the Mem4G option turned
off... Note that in that case roughly the top 100MB will be inaccessible to
programs, but I guess it doesn't really matter. Interestingly enough,
the sound card still continues to use the 1016-1024MB region, so once
one recompiles the kernel, s/he _has_ to remove the offending mem=1008M
from menu.lst....

I guess an alternative would be to try to reconfigure the sound card not to use the
region above 1008MB, but I have not tried that.


Steps to reproduce:
Find a computer with 1GB of memory and check whether the BIOS has a lapse in caching
coverage (cat /proc/mtrr); if not, try another computer.

Install kernel 2.6.12 with the Mem_4g option turned on. Reboot, go out for dinner,
come back, try to log in if it has already booted, try to execute some
processor-intensive script, see the difference.
Comment 1 Andrew Morton 2006-02-27 14:58:38 UTC
Could you attach the full `dmesg -s 1000000' output, please?  And
the contents of /proc/mtrr.

We seem to have more of these mtrr problems in 2.6 than we did in
2.4 and I continue to suspect that we broke something.

You'll probably find that you can fix things up in early init
by reading Documentation/mtrr.txt and playing with the setting.
Comment 2 Zwane Mwaikambo 2006-02-27 15:24:40 UTC
Could you test the latest -mm kernel and set the VMSPLIT_2G option, it should
fail in the very same way. Also do you have an onboard/builtin to motherboard
video card? If so, which "UMA" memory size options do you have in your BIOS?
Comment 3 Mark Sandler 2006-02-27 18:26:20 UTC
Here is /proc/mtrr:

reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
reg02: base=0x30000000 ( 768MB), size= 128MB: write-back, count=1
reg03: base=0x38000000 ( 896MB), size=  64MB: write-back, count=1
reg04: base=0x3c000000 ( 960MB), size=  32MB: write-back, count=1
reg05: base=0x3e000000 ( 992MB), size=  16MB: write-back, count=1
reg06: base=0x3f800000 (1016MB), size=   8MB: write-combining, count=1
reg07: base=0xf0000000 (3840MB), size= 128MB: write-combining, count=4
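
A minimal sketch of how the listing above could be checked for holes in MTRR coverage below the top of RAM. It assumes the /proc/mtrr line format exactly as pasted here (other kernel versions print the fields in a different order), and the helper names parse_mtrr and uncovered_gaps are made up for illustration. Ranges not covered by any entry typically fall back to the CPU's default memory type, which is often uncachable.

```python
import re

MB = 1 << 20

# /proc/mtrr text copied from the listing above.
SAMPLE = """\
reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg01: base=0x20000000 ( 512MB), size= 256MB: write-back, count=1
reg02: base=0x30000000 ( 768MB), size= 128MB: write-back, count=1
reg03: base=0x38000000 ( 896MB), size=  64MB: write-back, count=1
reg04: base=0x3c000000 ( 960MB), size=  32MB: write-back, count=1
reg05: base=0x3e000000 ( 992MB), size=  16MB: write-back, count=1
reg06: base=0x3f800000 (1016MB), size=   8MB: write-combining, count=1
reg07: base=0xf0000000 (3840MB), size= 128MB: write-combining, count=4
"""

# Assumption: matches only the format shown in this report.
LINE_RE = re.compile(
    r"reg\d+: base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)([KM])B: ([\w-]+), count=\d+"
)


def parse_mtrr(text):
    """Return a list of (start_byte, end_byte, cache_type) tuples."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        base = int(m.group(1), 16)
        unit = MB if m.group(3) == "M" else 1024
        size = int(m.group(2)) * unit
        regions.append((base, base + size, m.group(4)))
    return regions


def uncovered_gaps(regions, ram_top):
    """Return (start, end) byte ranges below ram_top that no MTRR entry covers."""
    gaps = []
    cursor = 0
    for start, end, _type in sorted(regions):
        if start >= ram_top:
            break
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < ram_top:
        gaps.append((cursor, ram_top))
    return gaps


if __name__ == "__main__":
    for start, end in uncovered_gaps(parse_mtrr(SAMPLE), 1024 * MB):
        # For the listing above this prints: 1008MB - 1016MB (0x3f000000-0x3f800000)
        print("%dMB - %dMB (0x%08x-0x%08x)" % (start // MB, end // MB, start, end))
```

On a live system the same functions could be fed open("/proc/mtrr").read() instead of SAMPLE, with ram_top set to the installed memory size.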

Yes, I have a standard Intel 82852/855GM card.
I will check the UMA size next time I reboot, but I believe it is 128MB.

Dmesg -- do you want me to send the contents of it with the old kernel and
mem_4g turned on? 

The current output is:
re is not ready [0x1][0x700300]
[4294707.128000] codec_semaphore: semaphore is not ready [0x1][0x700300]
... same thing repeated many-many times....
[4294707.390000] codec_semaphore: semaphore is not ready [0x1][0x700300]

[4294707.440000] intel8x0_measure_ac97_clock: measured 49311 usecs
[4294707.440000] intel8x0: clocking to 48000
[4294707.840000] codec_semaphore: semaphore is not ready [0x1][0x700300]
[4294707.840000] codec_write 0: semaphore is not ready for register 0x2
[4294707.862000] ieee80211_crypt: registered algorithm 'NULL'
[4294707.878000] ieee80211: 802.11 data/management/control stack, 1.0.3
[4294707.878000] ieee80211: Copyright (C) 2004-2005 Intel Corporation
<jketreno@linux.intel.com>
[4294707.950000] ipw2100: Intel(R) PRO/Wireless 2100 Network Driver, 1.1.2
[4294707.950000] ipw2100: Copyright(c) 2003-2005 Intel Corporation
[4294707.955000] ACPI: PCI Interrupt 0000:01:04.0[A] -> Link [LNKC] -> GSI 4
(level, low) -> IRQ 4
[4294707.955000] ipw2100: Detected Intel PRO/Wireless 2100 Network Connection
[4294708.378000] Linux Kernel Card Services
[4294708.378000]   options:  [pci] [cardbus] [pm]
[4294708.398000] ACPI: PCI Interrupt 0000:01:05.0[A] -> Link [LNKB] -> GSI 11
(level, low) -> IRQ 11
[4294708.398000] Yenta: CardBus bridge found at 0000:01:05.0 [1043:1744]
[4294708.518000] Yenta: ISA IRQ mask 0x0000, PCI irq 11
[4294708.518000] Socket status: 30000006
[4294708.596000] ohci1394: $Rev: 1250 $ Ben Collins <bcollins@debian.org>
[4294708.596000] ACPI: PCI Interrupt 0000:01:05.1[B] -> Link [LNKA] -> GSI 11
(level, low) -> IRQ 11
[4294708.650000] ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[11] 
MMIO=[fe8ff000-fe8ff7ff]  Max Packet=[2048]
[4294708.729000] e100: Intel(R) PRO/100 Network Driver, 3.4.8-k2-NAPI
[4294708.729000] e100: Copyright(c) 1999-2005 Intel Corporation
[4294708.730000] ACPI: PCI Interrupt Link [LNKE] enabled at IRQ 11
[4294708.730000] ACPI: PCI Interrupt 0000:01:08.0[A] -> Link [LNKE] -> GSI 11
(level, low) -> IRQ 11
[4294708.754000] e100: eth1: e100_probe: addr 0xfe8fe000, irq 11, MAC addr
00:0C:6E:AF:AE:4F
[4294709.887000] Real Time Clock Driver v1.12
[4294709.916000] ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00e0180003104ad3]
[4294710.022000] input: PC Speaker
[4294710.245000] hdc: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache
[4294710.245000] Uniform CD-ROM driver Revision: 3.20
[4294711.813000] NET: Registered protocol family 17
[4294715.425000] NET: Registered protocol family 10
[4294715.425000] Disabled Privacy Extensions on device c02d91c0(lo)
[4294715.425000] IPv6 over IPv4 tunneling driver
[4294716.507000] ACPI: AC Adapter [AC0] (off-line)
[4294716.534000] Asus Laptop ACPI Extras version 0.29
[4294716.537000]   M3N model detected, supported
[4294716.634000] ACPI: Battery Slot [BAT0] (battery present)
[4294716.636000] ACPI: Battery Slot [BAT1] (battery absent)
[4294716.652000] ACPI: Power Button (FF) [PWRF]
[4294716.652000] ACPI: Sleep Button (CM) [SLPB]
[4294716.652000] ACPI: Lid Switch [LID]
[4294716.805000] ACPI: Video Device [VGA] (multi-head: yes  rom: no  post: no)
[4294721.735000] [drm] Initialized drm 1.0.0 20040925
[4294721.749000] ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKA] -> GSI 11
(level, low) -> IRQ 11
[4294721.753000] [drm] Initialized i915 1.2.0 20040405 on minor 0: Intel
Corporation 82852/855GM Integrated Graphics Device
[4294721.771000] [drm] Initialized i915 1.2.0 20040405 on minor 1: Intel
Corporation 82852/855GM Integrated Graphics Device (#2)
[4294721.771000] mtrr: base(0xf0020000) is not aligned on a size(0x834000) boundary
[4294725.494000] apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
[4294725.494000] apm: overridden by ACPI.
[4294725.878000] eth0: no IPv6 routers present
[4294727.214000] speedstep_centrino: disagrees about version of symbol struct_module
[4294728.781000] Bluetooth: Core ver 2.7
[4294728.781000] NET: Registered protocol family 31
[4294728.781000] Bluetooth: HCI device and connection manager initialized
[4294728.781000] Bluetooth: HCI socket layer initialized
[4294728.818000] Bluetooth: L2CAP ver 2.7
[4294728.818000] Bluetooth: L2CAP socket layer initialized
[4294728.862000] Bluetooth: RFCOMM ver 1.5
[4294728.862000] Bluetooth: RFCOMM socket layer initialized
[4294728.862000] Bluetooth: RFCOMM TTY layer initialized
[4294757.464000] codec_semaphore: semaphore is not ready [0x1][0x700300]
[4294757.464000] codec_write 0: semaphore is not ready for register 0x2c
[4294757.698000] codec_semaphore: semaphore is not ready [0x1][0x700300]
[4294757.698000] codec_write 0: semaphore is not ready for register 0x2c
[4294757.943000] codec_semaphore: semaphore is not ready [0x1][0x700300]
[4294757.943000] codec_write 0: semaphore is not ready for register 0x2c
[4294840.644000] atkbd.c: Unknown key released (translated set 2, code 0xaa on
isa0060/serio0).
[4294840.644000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
[4295498.399000] atkbd.c: Unknown key released (translated set 2, code 0xaa on
isa0060/serio0).
[4295498.399000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
[4295498.607000] atkbd.c: Unknown key pressed (translated set 2, code 0xaa on
isa0060/serio0).
[4295498.607000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
[4295691.566000] atkbd.c: Unknown key released (translated set 2, code 0xaa on
isa0060/serio0).
[4295691.566000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
[4295691.693000] atkbd.c: Unknown key pressed (translated set 2, code 0xaa on
isa0060/serio0).
[4295691.693000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
[4295691.846000] atkbd.c: Unknown key released (translated set 2, code 0xaa on
isa0060/serio0).
[4295691.846000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
[4295691.998000] atkbd.c: Unknown key pressed (translated set 2, code 0xaa on
isa0060/serio0).
[4295691.998000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
[4295692.126000] atkbd.c: Unknown key released (translated set 2, code 0xaa on
isa0060/serio0).
[4295692.126000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
[4295692.278000] atkbd.c: Unknown key pressed (translated set 2, code 0xaa on
isa0060/serio0).
[4295692.278000] atkbd.c: Use 'setkeycodes e02a <keycode>' to make it known.
Comment 4 Zwane Mwaikambo 2006-02-27 22:35:35 UTC
I am fairly certain it's connected to the onboard video card memory mapping
types. Can you try a few different onboard video memory sizes and report what
occurs?
Comment 5 Zwane Mwaikambo 2006-02-28 08:11:38 UTC
Yes, please send dmesg with mem_4g, thanks.
Comment 6 Mark Sandler 2006-03-03 08:22:42 UTC
Created attachment 7504 [details]
dmesg contents

Here is the dmesg from a boot when the system came up very slowly. (Actually, under
Ubuntu it takes only 10 minutes to boot, normally <2; under Debian it was about 45...)
Comment 7 Mark Sandler 2006-03-03 08:24:05 UTC
I can't change the amount of video memory from the BIOS for some reason... Is
there any way I can do it once the OS loads?
Comment 8 Zwane Mwaikambo 2006-03-07 14:18:44 UTC
Not that I know of. That's unfortunate, as it would have helped narrow things
down. Have you also tried manually setting the 8MB region as write-back?
Comment 9 Natalie Protasevich 2007-07-03 18:23:05 UTC
Mark, does the problem still exist with latest kernel?
The dmesg output that you provided doesn't have e820 map unfortunately.
Is it possible to collect a whole boot trace (with serial console) and provide one for 2.6 and another one with an older version where it works fine? Also, /proc/mtrr for both cases?
Thanks.
Comment 10 Adrian Bunk 2007-09-18 11:27:39 UTC
Please reopen this bug if:
- it is still present with kernel 2.6.22 and
- you can provide the requested information.
Comment 11 Mark Sandler 2009-02-08 19:05:26 UTC
This bug is still present as of 2.6.22.14. I would be happy to provide any info needed.
Comment 12 Alan 2009-03-17 08:53:44 UTC
There isn't really much the kernel can do about this, as it simply doesn't know when the BIOS is misconfiguring the registers. You can, however, reconfigure them by hand via /proc/mtrr.
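
A hedged sketch of what that hand reconfiguration might look like for the 1008MB-1016MB hole from the Description. The "base=... size=... type=..." line is the form Documentation/mtrr.txt describes for writing to /proc/mtrr; the helper names mtrr_request and write_mtrr are hypothetical. Note that the listing in Comment 3 already shows reg00 through reg07 in use, so a register may first have to be freed (mtrr.txt describes "disable=N" for that) before a new region is accepted. This has not been tested on the reporter's hardware.

```python
def mtrr_request(base, size, mem_type):
    """Build the text line that creates a region when written to /proc/mtrr.

    The checks mirror the kind of complaint seen in the dmesg above
    ("base(...) is not aligned on a size(...) boundary").
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if base % size:
        raise ValueError("base must be aligned on a size boundary")
    return "base=0x%x size=0x%x type=%s" % (base, size, mem_type)


def write_mtrr(line, path="/proc/mtrr"):
    """Write one request line to /proc/mtrr (needs root; not called here)."""
    with open(path, "w") as f:
        f.write(line + "\n")


if __name__ == "__main__":
    # 1008MB = 0x3f000000, 8MB = 0x800000
    print(mtrr_request(0x3F000000, 0x800000, "write-back"))
```

Run as an ordinary user, this only prints the request line; passing that line to write_mtrr as root would be the equivalent of echoing it into /proc/mtrr.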

Closing as WILL_NOT_FIX therefore.