Bug 11953
Summary: | Most current bios version causes major slowdown and crash - Asus m3n laptop | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Tony White (tonywhite100) |
Component: | Other | Assignee: | Andrew Morton (akpm) |
Status: | RESOLVED OBSOLETE | ||
Severity: | high | CC: | alan, rui.zhang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.27.4 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 56331 | ||
Attachments: | dmseg, info, messages, lspci, dmidecode, acpidump |
Description
Tony White
2008-11-04 15:58:45 UTC
Created attachment 18673 [details]
dmseg
Created attachment 18674 [details]
info
Created attachment 18675 [details]
messages
Created attachment 18676 [details]
lspci
Created attachment 18677 [details]
dmidecode
> Latest working kernel version:2.6.27.4
> Earliest failing kernel version:2.6.27.4
That doesn't make sense. 2.6.27.4 both failed and worked?
We're trying to find out if this is a regression. If so, when
did it occur?
Created attachment 18678 [details]
acpidump
Sorry, 2.6.27.4 is the only kernel version that I have been able to test this new BIOS version with. I flashed it last week and it has taken me a week to get a bootable system because of the bug and other things. With the old BIOS, using nolapic, I was able to boot flawlessly for as long as I have had the machine, probably since 2.6.22 or somewhere around then. Sorry if this is confusing and I did not make it clear. How far back do you want me to try to see if it is a regression?

I have a working build now by omitting the following from the build:
High mem (4gb)
Local APIC Support on Uniprocessors
Symmetric multiprocessing Support (SMP)

Thinking back, it should be 2.6.27.3, because that's what I was running when I flashed the BIOS. So the difference between 2.6.27.3 & 2.6.27.4 is none.

Just to clarify... With this latest BIOS, you get the same failure no matter what version of Linux you run?

You mentioned that earlier you needed to boot with "nolapic", and you stopped doing that because it seemed to be no longer necessary. When you start using "nolapic" again, does the issue go away? If yes, is "nolapictimer" sufficient to work around the issue?

The messages attached to comment #1 and comment #2 are not useful, for they do not go back to the start of boot. Perhaps you can use dmesg -s 64000 and that will do it? (If not, increase CONFIG_LOG_BUF_SHIFT.)

Can you describe more what you mean by "crash"? E.g. how about a screen shot or a backtrace?

Yes, every kernel I try that has the three kernel options listed above reproduces the same bug.

No. Using nolapic with the new BIOS makes no difference. The bug occurs in exactly the same way now with or without nolapic as a boot option. nolapictimer also makes no difference when used with either version of the BIOS.

That's as much log as there was. I honestly don't know exactly how to increase the buffer, but I know what you are asking me to try.
When I say crash, in this instance, I mean that the keyboard and mouse are unresponsive and I can do nothing to enable them. I also mean that the system does nothing for a good long while, well over thirty minutes in this case. The system froze after it finished starting the services it runs at every boot. I don't have the ability to screen print a crashed system unless I use a VM, and this bug is not reproducible there. I don't know how to get a backtrace. How would I obtain one, please?

Does it work with 'acpi=off'?

ping Tony.

Sorry guys, I've been swamped. I will try with acpi=off, but if I remember correctly, that didn't work. Please bear with me, I will post back asap.

The same result occurs: acpi=off does not allow the computer to boot. The screen goes from the grub entry to a black screen with no output and appears frozen. Holding down the power button is then required to power down the machine.

The latest build I have running is of linux-2.6.27.8 and will only boot if I omit the following from the build:
High mem (4gb)
Symmetric multiprocessing Support (SMP)
It seems now that Local APIC Support on Uniprocessors does not need to be omitted.

Can you try the latest kernel? The latest kernel has some idle/timer related fixes, which might help your system.

ping Tony, does the problem still exist in the latest upstream kernel?

Is this problem related to comment #23 in bug #11785?

If I specify mem=1000M at boot, it will boot, as pointed out in bug #11785, but only if I specify the exact amount of installed RAM. It fails without it in the same way: it boots very slowly. I used 2.6.29rc6 to test.

I know that this machine's maximum RAM capacity is 1000M (1GB) because I have read the manual for the machine and performed the RAM upgrade from 512M personally. The machine ships with a minimum of 256M installed. 512M, 768M & 1000M total RAM were additional options offered by the manufacturer (Asus), if that helps at all.

Is there a way the kernel can maybe work around this memory bug?
I'm using the most recent available BIOS update for this machine. Would I need to try to convince Asus to release an update, and if so, what data can I provide to prove this bug?

At least I know how to make it work now. ;)

*** Bug 11785 has been marked as a duplicate of this bug. ***

This issue doesn't appear to be specific to the ACPI sub-system.

This has got even worse since 2.6.29.x. Whereas before (2.6.28.x) I could use mem=1000M or mem=1024M to get a highmem kernel to boot, now with the 2.6.29.x and 2.6.30 kernels, if I build a kernel with highmem enabled up to 4GB, the kernel will boot but it is very slow, about 20 times slower or more. I mean booting, running X, starting X applications, etc. So everything. If I build a kernel without highmem it will boot at a correct speed, and that appears to allow the machine to boot using 2.6.30.

The problem essentially is that there was a solution (specifying mem=) but now there isn't. The workaround no longer works. Any distribution live CD I try, such as Fedora or Mandriva, which contains a kernel version greater than the last 2.6.28.x kernel displays this problem, even using the mem= line now. The machine will boot really slowly and the specified memory amount appears to be ignored. For example, it takes over ten minutes to boot into X, or does not boot at all.

Using a kernel that does not have highmem enabled is the only thing that works. However, that means that 129 MB of RAM "disappears":

Jun 6 06:30:46 m3n kernel: Warning only 895MB will be used.
Jun 6 06:30:46 m3n kernel: Use a HIGHMEM enabled kernel.
Jun 6 06:30:46 m3n kernel: kernel direct mapping tables up to 37fe9000 @ 10000-16000
Jun 6 06:30:46 m3n kernel: RAMDISK: 37c9a000 - 37fef703
Jun 6 06:30:46 m3n kernel: Allocated new RAMDISK: 005c0000 - 00915703
Jun 6 06:30:46 m3n kernel: Move RAMDISK from 0000000037c9a000 - 0000000037fef702 to 005c0000 - 00915702
Jun 6 06:30:46 m3n kernel: ACPI: RSDP 000F4B50, 0014 (r0 ACPIAM)
Jun 6 06:30:46 m3n kernel: ACPI: RSDT 3F740000, 002C (r1 A M I OEMRSDT 3000416 MSFT 97)
Jun 6 06:30:46 m3n kernel: ACPI: FACP 3F740200, 0081 (r2 A M I OEMFACP 3000416 MSFT 97)
Jun 6 06:30:46 m3n kernel: ACPI: DSDT 3F740300, 72DE (r1 0ABBD 0ABBD001 1 MSFT 2000001)
Jun 6 06:30:46 m3n kernel: ACPI: FACS 3F750000, 0040
Jun 6 06:30:46 m3n kernel: ACPI: OEMB 3F750040, 004D (r1 A M I OEMBIOS 3000416 MSFT 97)
Jun 6 06:30:46 m3n kernel: 895MB LOWMEM available.
Jun 6 06:30:46 m3n kernel: mapped low ram: 0 - 37fe9000
Jun 6 06:30:46 m3n kernel: low ram: 00000000 - 37fe9000
Jun 6 06:30:46 m3n kernel: bootmap 00012000 - 00019000
Jun 6 06:30:46 m3n kernel: (7 early reservations) ==> bootmem [0000000000 - 0037fe9000]
Jun 6 06:30:46 m3n kernel: #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
Jun 6 06:30:46 m3n kernel: #1 [0000100000 - 00005bc3fc] TEXT DATA BSS ==> [0000100000 - 00005bc3fc]
Jun 6 06:30:46 m3n kernel: #2 [00005bd000 - 00005c0000] INIT_PG_TABLE ==> [00005bd000 - 00005c0000]
Jun 6 06:30:46 m3n kernel: #3 [000009fc00 - 0000100000] BIOS reserved ==> [000009fc00 - 0000100000]
Jun 6 06:30:46 m3n kernel: #4 [0000010000 - 0000012000] PGTABLE ==> [0000010000 - 0000012000]
Jun 6 06:30:46 m3n kernel: #5 [00005c0000 - 0000915703] NEW RAMDISK ==> [00005c0000 - 0000915703]
Jun 6 06:30:46 m3n kernel: #6 [0000012000 - 0000019000] BOOTMAP ==> [0000012000 - 0000019000]
Jun 6 06:30:46 m3n kernel: Zone PFN ranges:
Jun 6 06:30:46 m3n kernel: DMA 0x00000010 -> 0x00001000
Jun 6 06:30:46 m3n kernel: Normal 0x00001000 -> 0x00037fe9
Jun 6 06:30:46 m3n kernel: Movable zone start PFN for each node
Jun 6 06:30:46 m3n kernel: early_node_map[2] active PFN ranges
Jun 6 06:30:46 m3n kernel: 0: 0x00000010 -> 0x0000009f
Jun 6 06:30:46 m3n kernel: 0: 0x00000100 -> 0x00037fe9
Jun 6 06:30:46 m3n kernel: On node 0 totalpages: 229240
Jun 6 06:30:46 m3n kernel: free_area_init_node: node 0, pgdat c050b820, node_mem_map c1000200
Jun 6 06:30:46 m3n kernel: DMA zone: 32 pages used for memmap
Jun 6 06:30:46 m3n kernel: DMA zone: 0 pages reserved
Jun 6 06:30:46 m3n kernel: DMA zone: 3951 pages, LIFO batch:0
Jun 6 06:30:46 m3n kernel: Normal zone: 1760 pages used for memmap
Jun 6 06:30:46 m3n kernel: Normal zone: 223497 pages, LIFO batch:31
Jun 6 06:30:46 m3n kernel: Movable zone: 0 pages used for memmap
Jun 6 06:30:46 m3n kernel: ACPI: PM-Timer IO Port: 0xe408

This m3n laptop is pretty much all Intel and Asus. There is nothing exotic here, and yet again there is nothing in any log that indicates the root cause of the problem, just a visibly extreme slowdown when booting a highmem enabled kernel (if it even boots in the first place).

Yeah, memory problem. The only way to get it to boot a Linux kernel is to add mem=1001M to the command line, even though the machine has 1024M installed and memtest says 1015M. So guessing works...

I've found the data sheet for the chipset, and it seems to contain lots of useful data. I think it's the graphics chipset that is causing this issue, because it's doing funny things with the memory, according to the data sheet.
http://www.intel.com/Assets/PDF/datasheet/252615.pdf
See section 5, or more specifically maybe 5.4.1: could the 15-MB-16-MB Window be causing it?
The only other thing I can read in that sheet that might be causing this is in section 5:
"It is the bios or system designer's responsibility to limit system memory population so that adequate PCI High BIOS and APIC space can be allocated."
However, there is a detailed system address map at figure 7.
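For anyone cross-checking the numbers in the kernel log quoted above, here is a minimal Python sketch of the arithmetic. It only restates figures already present in this report; the 4 KiB page size is an assumption (the usual value on 32-bit x86) and is not stated in the log, and the variable names are purely illustrative.

```python
# Figures copied from the log in this report. PAGE_SIZE is an assumption
# (typical for 32-bit x86); the log itself does not state it.
PAGE_SIZE = 4096
MIB = 1024 * 1024

low_ram_end = 0x37FE9000   # "mapped low ram: 0 - 37fe9000"
total_pages = 229240       # "On node 0 totalpages: 229240"
rsdt_addr = 0x3F740000     # "ACPI: RSDT 3F740000"
installed_mib = 1024       # installed RAM as stated in the comments

# Active PFN ranges from early_node_map; the end PFN is exclusive.
pfn_ranges = [(0x10, 0x9F), (0x100, 0x37FE9)]
pages_from_ranges = sum(end - start for start, end in pfn_ranges)

lowmem_mib = low_ram_end // MIB            # whole MiB of mapped low RAM
missing_mib = installed_mib - lowmem_mib   # RAM unused without HIGHMEM
rsdt_mib = rsdt_addr // MIB                # where the ACPI RSDT sits

print(lowmem_mib)          # 895, matches "895MB LOWMEM available"
print(missing_mib)         # 129, the RAM that "disappears"
print(pages_from_ranges)   # 229240, matches totalpages
print(rsdt_mib)            # 1015
```

The last value shows the ACPI tables starting at roughly 1015 MiB, which is the same figure memtest reports. Whether that boundary is related to the mem= values that happen to work is not established anywhere in this report.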