Bug 11953

Summary: Most current bios version causes major slowdown and crash - Asus m3n laptop
Product: Memory Management
Reporter: Tony White (tonywhite100)
Component: Other
Assignee: Andrew Morton (akpm)
Status: RESOLVED OBSOLETE
Severity: high
CC: alan, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27.4
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 56331
Attachments: dmseg, info, messages, lspci, dmidecode, acpidump

Description Tony White 2008-11-04 15:58:45 UTC
Latest working kernel version: 2.6.27.4
Earliest failing kernel version: 2.6.27.4
Distribution: Mandriva, Arch Linux
Hardware Environment: Asus m3n Laptop Model
Software Environment: i686
Problem Description: The system boots very slowly and then crashes after a BIOS upgrade.
With the old BIOS version I needed to use nolapic to boot at all.
But the system ran fine using the old BIOS and nolapic.
Using nolapic seems to make no difference with the new BIOS installed.
With the new BIOS version installed, the boot process crawls, and so does the userland stuff that starts after the kernel has finished.
Also, udev hangs when highmem (4GB) is compiled into the kernel.

Steps to reproduce: Boot a kernel on an Asus m3n laptop flashed with the most recent BIOS version (m3n0207a.rom, available from the Asus website), using a mainline Linux kernel with the following compiled into it:

High mem (4gb)
Local APIC Support on Uniprocessors
Symmetric multiprocessing Support (SMP)

Please note: This is a uniprocessor machine with a Pentium M processor.

I know for sure that I shouldn't use SMP support in the kernel, but should it crash?

As for everything else, is it safe to assume that in an ideal world I should not experience any issues?

I have completed a large number of kernel builds to find out what causes this bug on this machine, and it is definitely those three listed options.
I discovered this by process of elimination.

Without the above three things enabled, the kernel works perfectly, without any noticeable issue.
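
As an illustration of the elimination described above, here is a minimal sketch that reports which of the three suspect options are enabled in a kernel .config. The Kconfig symbol names (CONFIG_HIGHMEM4G, CONFIG_X86_UP_APIC, CONFIG_SMP) are an assumed mapping from the menu entries named in this report, and the sample config text is hypothetical, not taken from the attachments.

```python
# Assumed mapping from the menu entries in this report to Kconfig symbols.
SUSPECT_OPTIONS = {
    "CONFIG_HIGHMEM4G": "High Memory Support (4GB)",
    "CONFIG_X86_UP_APIC": "Local APIC support on uniprocessors",
    "CONFIG_SMP": "Symmetric multi-processing support",
}


def enabled_suspects(config_text):
    """Return the suspect symbols set to 'y' in the text of a .config file."""
    enabled = []
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set"; skip them.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name in SUSPECT_OPTIONS and value == "y":
            enabled.append(name)
    return enabled


# Hypothetical fragment resembling a build with all three options left out.
sample_config = """\
CONFIG_X86_32=y
# CONFIG_SMP is not set
# CONFIG_X86_UP_APIC is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
"""

print(enabled_suspects(sample_config))
```

To check a real build, the text of that build's .config file would be passed in place of sample_config.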

I am attaching some dmesg logs from the original build that failed here.
Comment 1 Tony White 2008-11-04 16:00:54 UTC
Created attachment 18673 [details]
dmseg
Comment 2 Tony White 2008-11-04 16:01:26 UTC
Created attachment 18674 [details]
info
Comment 3 Tony White 2008-11-04 16:01:40 UTC
Created attachment 18675 [details]
messages
Comment 4 Tony White 2008-11-04 16:05:26 UTC
Created attachment 18676 [details]
lspci
Comment 5 Tony White 2008-11-04 16:06:56 UTC
Created attachment 18677 [details]
dmidecode
Comment 6 Andrew Morton 2008-11-04 16:09:44 UTC
> Latest working kernel version:2.6.27.4
> Earliest failing kernel version:2.6.27.4

That doesn't make sense.  2.6.27.4 both failed and worked?

We're trying to find out if this is a regression.  If so, when
did it occur?
Comment 7 Tony White 2008-11-04 16:11:51 UTC
Created attachment 18678 [details]
acpidump
Comment 8 Tony White 2008-11-04 16:19:26 UTC
Sorry, 2.6.27.4 is the only kernel version that I have been able to test this new BIOS version with.

I flashed it last week and it has taken me a week to get a bootable system because of the bug and other things.

With the old BIOS, using nolapic, I have been able to boot flawlessly ever since I have had the machine.
That was probably 2.6.22 or somewhere around then.

Sorry if this is confusing and I did not make it clear.

How far back do you want me to try to see if it is a regression?
Comment 9 Tony White 2008-11-04 16:21:07 UTC
I have a working build now by omitting:

High mem (4gb)
Local APIC Support on Uniprocessors
Symmetric multiprocessing Support (SMP)

from the build.
Comment 10 Tony White 2008-11-04 16:33:58 UTC
Thinking back, it should be 2.6.27.3, because that's what I was running when I flashed the BIOS.
So there is no difference between 2.6.27.3 and 2.6.27.4.
Comment 11 Len Brown 2008-11-11 21:38:43 UTC
Just to clarify...
With this latest BIOS, you get the same failure no matter
what version of linux you run?

You mentioned that earlier you needed to boot with "nolapic",
and you stopped doing that because it seemed to be no longer
necessary.  When you start using "nolapic" again, does the
issue go away?  If yes, is "nolapictimer" sufficient to
work around the issue?

The messages attached to comment #1 and comment #2 are not
useful, for they do not go back to the start of boot.
Perhaps you can use dmesg -s 64000 and that will do it?
(If not, increase CONFIG_LOG_BUF_SHIFT.)
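
For reference, the kernel log buffer holds 2^CONFIG_LOG_BUF_SHIFT bytes, so the shift needed for a given buffer size can be worked out with a few lines. This is only an arithmetic sketch; the function names are illustrative and not part of any kernel tool.

```python
def log_buf_bytes(shift):
    """Size in bytes of a kernel log buffer built with CONFIG_LOG_BUF_SHIFT=shift."""
    return 1 << shift


def min_shift_for(size_bytes):
    """Smallest shift whose buffer holds at least size_bytes."""
    shift = 0
    while log_buf_bytes(shift) < size_bytes:
        shift += 1
    return shift


# The 64000 bytes requested via "dmesg -s 64000" fit in a 2**16 byte buffer.
shift = min_shift_for(64000)
print(shift, log_buf_bytes(shift))  # 16 65536
```

If the start of boot is still missing from the dmesg output, rebuilding with a larger shift (for example 17, giving 131072 bytes) doubles the space available.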

Can you describe in more detail what you mean by "crash"?
E.g., how about a screenshot or a backtrace?
Comment 12 Tony White 2008-11-15 07:14:18 UTC
Yes, every kernel I try that has the three kernel options listed above reproduces the same bug.

No. Using nolapic with the new BIOS makes no difference.
The bug occurs in exactly the same way now, with or without nolapic as a boot option.
nolapictimer also makes no difference when used with either version of the BIOS.

That's as much log as there was. I honestly don't know exactly how to increase the buffer, but I know what you are asking me to try.

When I say crash, in this instance I mean that the keyboard and mouse are unresponsive and I can do nothing to enable them.
I also mean that the system does nothing for a good long while, well over thirty minutes in this case.
The system froze after it finished starting the services it runs at every boot.

I don't have the ability to capture the screen of a crashed system unless I use a VM, and this bug is not reproducible there.
I don't know how to get a backtrace. How would I obtain one, please?
Comment 13 Shaohua 2008-11-25 18:50:41 UTC
does it work with 'acpi=off'?
Comment 14 Zhang Rui 2008-12-10 18:22:10 UTC
ping tony.
Comment 15 Tony White 2008-12-17 08:16:17 UTC
Sorry guys, I've been swamped. I will try with acpi=off, but if I remember correctly, that didn't work.

Please bear with me, I will post back asap.
Comment 16 Tony White 2008-12-17 08:27:14 UTC
The same result occurs: acpi=off does not allow the computer to boot.
The screen goes from the grub entry to a black screen with no output, and the machine appears frozen.
Holding down the power button is then required to power down the machine.

The latest build I have running is linux-2.6.27.8, and it will only boot if I omit:

High mem (4gb)
Symmetric multiprocessing Support (SMP)

from the build.
It now seems that Local APIC Support on Uniprocessors does not need to be omitted.
Comment 17 Shaohua 2009-03-03 21:22:55 UTC
Can you try the latest kernel? It has some idle/timer-related fixes, which might help your system.
Comment 18 Zhang Rui 2009-03-11 00:44:27 UTC
ping Tony,
does the problem still exist in the latest upstream kernel?
Is this problem related to comment #23 in bug #11785?
Comment 19 Tony White 2009-03-12 21:47:03 UTC
If I specify mem=1000M on the command line, it will boot, as pointed out in bug #11785, but only if I specify the exact amount of installed RAM. Without it, it fails in the same way: it boots very slowly.
I used 2.6.29-rc6 to test.

I know that this machine's maximum RAM capacity is 1000M (1GB) because I have read the manual for the machine and performed the RAM upgrade from 512M personally.
The machine ships with a minimum of 256M installed. 512M, 768M and 1000M total RAM were additional options offered by the manufacturer (Asus), if that helps at all.

Is there a way the kernel can maybe work around this memory bug?

I'm using the most recent available BIOS update for this machine.
Would I need to try to convince Asus to release an update and, if so, what data can I provide to prove this bug?

At least I know how to make it work now. ;)
Comment 20 Zhang Rui 2009-03-12 22:13:26 UTC
*** Bug 11785 has been marked as a duplicate of this bug. ***
Comment 21 Len Brown 2009-03-15 12:42:33 UTC
This issue doesn't appear to be specific to the ACPI sub-system.
Comment 22 Tony White 2009-07-02 09:51:55 UTC
This has got even worse since 2.6.29.x. Before (2.6.28.x), I could use mem=1000M or mem=1024M to get a highmem kernel to boot. Now, with the 2.6.29.x and 2.6.30 kernels, if I build a kernel with highmem enabled up to 4GB, the kernel will boot but it is very slow, about 20 times slower or more. That applies to everything: booting, running X, starting X applications, etc.
If I build a kernel without highmem, it boots at the correct speed, and that appears to allow the machine to boot using 2.6.30.

The problem essentially is that there was a workaround (specifying mem=) and now it no longer works.
Any distribution live CD I try, such as Fedora or Mandriva, that contains a kernel newer than the last 2.6.28.x kernel shows this problem, even with the mem= option. The machine boots really slowly and the specified memory amount appears to be ignored. For example, it takes over ten minutes to boot into X, or it does not boot at all.

Using a kernel that does not have highmem enabled is the only thing that works; however, that means that 129 MB of RAM "disappears":



Jun  6 06:30:46 m3n kernel: Warning only 895MB will be used.
Jun  6 06:30:46 m3n kernel: Use a HIGHMEM enabled kernel.
Jun  6 06:30:46 m3n kernel: kernel direct mapping tables up to 37fe9000 @ 10000-16000
Jun  6 06:30:46 m3n kernel: RAMDISK: 37c9a000 - 37fef703
Jun  6 06:30:46 m3n kernel: Allocated new RAMDISK: 005c0000 - 00915703
Jun  6 06:30:46 m3n kernel: Move RAMDISK from 0000000037c9a000 - 0000000037fef702 to 005c0000 - 00915702
Jun  6 06:30:46 m3n kernel: ACPI: RSDP 000F4B50, 0014 (r0 ACPIAM)
Jun  6 06:30:46 m3n kernel: ACPI: RSDT 3F740000, 002C (r1 A M I  OEMRSDT   3000416 MSFT       97)
Jun  6 06:30:46 m3n kernel: ACPI: FACP 3F740200, 0081 (r2 A M I  OEMFACP   3000416 MSFT       97)
Jun  6 06:30:46 m3n kernel: ACPI: DSDT 3F740300, 72DE (r1  0ABBD 0ABBD001        1 MSFT  2000001)
Jun  6 06:30:46 m3n kernel: ACPI: FACS 3F750000, 0040
Jun  6 06:30:46 m3n kernel: ACPI: OEMB 3F750040, 004D (r1 A M I  OEMBIOS   3000416 MSFT       97)
Jun  6 06:30:46 m3n kernel: 895MB LOWMEM available.
Jun  6 06:30:46 m3n kernel:   mapped low ram: 0 - 37fe9000
Jun  6 06:30:46 m3n kernel:   low ram: 00000000 - 37fe9000
Jun  6 06:30:46 m3n kernel:   bootmap 00012000 - 00019000
Jun  6 06:30:46 m3n kernel: (7 early reservations) ==> bootmem [0000000000 - 0037fe9000]
Jun  6 06:30:46 m3n kernel:   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
Jun  6 06:30:46 m3n kernel:   #1 [0000100000 - 00005bc3fc]    TEXT DATA BSS ==> [0000100000 - 00005bc3fc]
Jun  6 06:30:46 m3n kernel:   #2 [00005bd000 - 00005c0000]    INIT_PG_TABLE ==> [00005bd000 - 00005c0000]
Jun  6 06:30:46 m3n kernel:   #3 [000009fc00 - 0000100000]    BIOS reserved ==> [000009fc00 - 0000100000]
Jun  6 06:30:46 m3n kernel:   #4 [0000010000 - 0000012000]          PGTABLE ==> [0000010000 - 0000012000]
Jun  6 06:30:46 m3n kernel:   #5 [00005c0000 - 0000915703]      NEW RAMDISK ==> [00005c0000 - 0000915703]
Jun  6 06:30:46 m3n kernel:   #6 [0000012000 - 0000019000]          BOOTMAP ==> [0000012000 - 0000019000]
Jun  6 06:30:46 m3n kernel: Zone PFN ranges:
Jun  6 06:30:46 m3n kernel:   DMA      0x00000010 -> 0x00001000
Jun  6 06:30:46 m3n kernel:   Normal   0x00001000 -> 0x00037fe9
Jun  6 06:30:46 m3n kernel: Movable zone start PFN for each node
Jun  6 06:30:46 m3n kernel: early_node_map[2] active PFN ranges
Jun  6 06:30:46 m3n kernel:     0: 0x00000010 -> 0x0000009f
Jun  6 06:30:46 m3n kernel:     0: 0x00000100 -> 0x00037fe9
Jun  6 06:30:46 m3n kernel: On node 0 totalpages: 229240
Jun  6 06:30:46 m3n kernel: free_area_init_node: node 0, pgdat c050b820, node_mem_map c1000200
Jun  6 06:30:46 m3n kernel:   DMA zone: 32 pages used for memmap
Jun  6 06:30:46 m3n kernel:   DMA zone: 0 pages reserved
Jun  6 06:30:46 m3n kernel:   DMA zone: 3951 pages, LIFO batch:0
Jun  6 06:30:46 m3n kernel:   Normal zone: 1760 pages used for memmap
Jun  6 06:30:46 m3n kernel:   Normal zone: 223497 pages, LIFO batch:31
Jun  6 06:30:46 m3n kernel:   Movable zone: 0 pages used for memmap
Jun  6 06:30:46 m3n kernel: ACPI: PM-Timer IO Port: 0xe408



This m3n laptop is pretty much all Intel and Asus; there is nothing exotic here, and yet again there is nothing in any log that indicates the root cause of the problem. There is just a visibly extreme slowdown when booting a highmem-enabled kernel (if it even boots in the first place).
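
The figures in the log above can be cross-checked with some simple arithmetic. The sketch below assumes 4 KiB pages and 1024 MB of installed RAM (the amount that makes the 129 MB figure add up); the PFN ranges and the low-memory limit are copied from the log.

```python
MIB = 1024 * 1024

# (start_pfn, end_pfn) pairs from the "early_node_map[2] active PFN ranges" lines.
active_pfn_ranges = [(0x00000010, 0x0000009F), (0x00000100, 0x00037FE9)]

# Number of usable page frames: should match "On node 0 totalpages".
total_pages = sum(end - start for start, end in active_pfn_ranges)

# Top of low memory from "low ram: 00000000 - 37fe9000", rounded down to MiB.
lowmem_top = 0x37FE9000
lowmem_mib = lowmem_top // MIB

# Assumption: 1024 MB of RAM is installed in the machine.
installed_mib = 1024
missing_mib = installed_mib - lowmem_mib

print(total_pages)  # 229240
print(lowmem_mib)   # 895
print(missing_mib)  # 129
```

The page total matches the "totalpages: 229240" line, and the difference between 1024 MB and the 895 MB low-memory limit gives the 129 MB that a non-highmem kernel cannot use.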
Comment 23 Tony White 2009-11-08 18:56:39 UTC
Yeah, it's a memory problem. The only way to get it to boot a Linux kernel is to add mem=1001M to the command line, even though the machine has 1024M installed and memtest reports 1015M.
So guessing works...
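
As a rough aid for comparing the mem= values tried in this report, the sketch below converts them to bytes and compares them with the RSDT address printed in the log in comment 22 ("ACPI: RSDT 3F740000"). It is arithmetic only and does not establish a root cause; the helper name is illustrative, and only the M suffix is handled.

```python
MIB = 1024 * 1024

# Address of the ACPI RSDT as printed in the boot log in comment 22.
RSDT_ADDR = 0x3F740000


def mem_limit_bytes(mem_arg):
    """Convert a mem=<n>M value such as '1001M' to bytes (M suffix only)."""
    if not mem_arg.endswith("M"):
        raise ValueError("only the M suffix is handled in this sketch")
    return int(mem_arg[:-1]) * MIB


print(RSDT_ADDR / MIB)  # 1015.25, close to the 1015M that memtest reports

for arg in ("1000M", "1001M", "1024M"):
    side = "below" if mem_limit_bytes(arg) <= RSDT_ADDR else "above"
    print(f"mem={arg} ends {side} the RSDT address")
```

Under these numbers, mem=1000M and mem=1001M both end below the ACPI tables, while mem=1024M extends past them.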
Comment 24 Tony White 2009-11-17 15:03:12 UTC
I've found the data sheet for it, which seems to contain a lot of potentially useful data. I think the graphics chipset is causing this issue because, according to the data sheet, it does funny things with the memory.

http://www.intel.com/Assets/PDF/datasheet/252615.pdf

See section 5 or, more specifically, maybe 5.4.1: could the 15-MB-16-MB Window be causing it?

The only other thing I can find in that sheet that might be causing this is in section 5:
"It is the bios or system designer's responsibility to limit system memory population so that adequate PCI High BIOS and APIC space can be allocated."

There is also a detailed system address map in Figure 7.