Latest working kernel version: 2.6.27.5
Earliest failing kernel version: 2.6.27.6
Distribution: Gentoo vanilla-sources
Hardware Environment: AMD
Software Environment: vmware-workstation 6.5
Problem Description: 2.6.27.6 vmware guest panics on boot with CONFIG_VMI=Y

Steps to reproduce: Compile 2.6.27.6 with CONFIG_VMI=Y and boot as guest in vmware-workstation 6.5. Kernel panics Int 14 CR2

Bisection between 2.6.27.5 and 2.6.27.6 gives

5c371b31be32033b0a4a993431484da8a2305369 is first bad commit
commit 5c371b31be32033b0a4a993431484da8a2305369
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Mon Sep 22 02:52:26 2008 -0700

    x86: fix CONFIG_X86_RESERVE_LOW_64K=y

    commit 2216d199b1430d1c0affb1498a9ebdbd9c0de439 upstream

    The bad_bios_dmi_table() quirk never triggered because we do DMI setup
    too late. Move it a bit earlier.

    Also change the CONFIG_X86_RESERVE_LOW_64K quirk to operate on the e820
    table directly instead of messing with early reservations - this handles
    overlaps (which do occur in this low range of RAM) more gracefully.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 b7b81ffb62eddf60c2d8545a61566f0d34c1b2a9 858d983687c53db5304015a245ee0c23f10c266d M arch

See http://bugs.gentoo.org/show_bug.cgi?id=249751
For completeness, here is the output from the VMI-enabled vmware guest as it crashes on boot:

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
BUG: Int 14: CR2 fbe00000
     EDI c05b1f98 ESI fbe00000 EBP 00a6e003 ESP c05b1f7c
     EBX c05b1f98 EDX 0000000e ECX 00000003 EAX fbe00000
     err 00000000 EIP c05db95c CS 00000062 flg 00010092
Stack: c00cc618 c00cc625 00000003 00000000 00000000 00000563 c05b1ff8 fbe00000
       fbe10000 fbe00000 c05dba7e c05b1ff8 c05b1ff8 00646513 00609000 c05bac50
       00000800 00099d00 c059a000 00a6e003 00000800 00099d00 c059a000 c05b66d2
I suspect that the regression is caused by calling "dmi_scan_machine" BEFORE the comment

 * NOTE: On x86-32, only from this point on, fixmaps are ready for use

in arch/x86/kernel/setup.c

See
http://lkml.org/lkml/2008/8/7/298
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=3a6ddd5f18405ca92e004416af8ed44b9c9783d7
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=5c371b31be32033b0a4a993431484da8a2305369

Maybe just moving those two "dmi_*" lines AFTER the comment would solve the problem.
can you check 2.6.28-rc7 etc? it seems there is one following up patch for xen guest...in mainline
(In reply to comment #3)
> can you check 2.6.28-rc7 etc?
> it seems there is one following up patch for xen guest...in mainline

Though we can of course test a lot of different kernels, it might be interesting to know which of the patches added to 2.6.28-rc7 might have solved the problem ... and which Xen patch you are referring to.

Thanks
Axel
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=996d332bda837c93350e7f0ef4b85b90e4eec73f is the patch being referred to.
Ah, I see. But this means it's in 2.6.27.8 already.
(In reply to comment #3)
> can you check 2.6.28-rc7 etc?
> it seems there is one following up patch for xen guest...in mainline

2.6.28-rc7 also panics with Int 14 CR2
Please use gdb to check what code is around c05db95c.
Created attachment 19170
System map of 2.6.27.6

I'm not sure how to use gdb to get that from a kernel; however, I have attached the System.map and crash screen (below) of the 2.6.27.6 kernel with "# CONFIG_X86_RESERVE_LOW_64K is not set".

From System.map:

c05d1b80 t dmi_present
c05d1c80 T dmi_scan_machine

So EIP c05d1ba0 looks like dmi_present+0x20.

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
BUG: Int 14: CR2 fbe00000
     EDI c05a5f70 ESI fbe00000 EBP c05a5f8c ESP c05a5f54
     EBX c05a5f70 EDX 0000000e ECX 00000003 EAX fbe00000
     err 00000000 EIP c05d1ba0 CS 00000062 flg 00010046
Stack: c05be709 00000163 80000000 00000000 00000001 000001ff 00000000 fbe00000
       fbe10000 fbe00000 c05a5fa4 c05d1cb7 00000001 c05a5fdc 00000000 c05a2f94
       c05a5fc8 c05aecc1 c04ee7e6 00099d00 c0597000 c05a5fc8 c04ee7e6 00099d00
VGA: Screenshot done.
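The System.map lookup done by hand above can be sketched in a few lines of Python. This is only an illustration: the helper names load_system_map and resolve are made up here, and the map excerpt contains just the two symbols quoted in the comment. Treating c05d1cb7 from the stack dump as a return address is an assumption, not something the dump states.

```python
import bisect


def load_system_map(text):
    """Parse System.map-style lines ("address type name") into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) for the nearest symbol at or below address."""
    starts = [addr for addr, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        raise ValueError("address is below the first symbol")
    start, name = symbols[index]
    return name, address - start


# Only the two symbols quoted in the comment; a real System.map is much longer.
SYSTEM_MAP_EXCERPT = """\
c05d1b80 t dmi_present
c05d1c80 T dmi_scan_machine
"""

symbols = load_system_map(SYSTEM_MAP_EXCERPT)

# EIP from the crash screen.
name, offset = resolve(symbols, 0xC05D1BA0)
print(f"{name}+{offset:#x}")

# A value from the stack dump that may be a return address (assumption).
name, offset = resolve(symbols, 0xC05D1CB7)
print(f"{name}+{offset:#x}")
```

With a full System.map the same lookup works for any address in the dump, although with only two symbols loaded every address above c05d1c80 resolves to dmi_scan_machine.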
After experimenting with gdb I got:

Dump of assembler code for function dmi_present:
0xc05d1b80 <dmi_present+0>:  push %ebp
0xc05d1b81 <dmi_present+1>:  mov %esp,%ebp
0xc05d1b83 <dmi_present+3>:  sub $0x28,%esp
0xc05d1b86 <dmi_present+6>:  mov %ebx,-0xc(%ebp)
0xc05d1b89 <dmi_present+9>:  mov %esi,-0x8(%ebp)
0xc05d1b8c <dmi_present+12>: mov %edi,-0x4(%ebp)
0xc05d1b8f <dmi_present+15>: call 0xc0104318 <mcount>
0xc05d1b94 <dmi_present+20>: lea -0x1c(%ebp),%ebx
0xc05d1b97 <dmi_present+23>: mov %eax,%esi
0xc05d1b99 <dmi_present+25>: mov $0x3,%ecx
0xc05d1b9e <dmi_present+30>: mov %ebx,%edi
0xc05d1ba0 <dmi_present+32>: rep movsl %ds:(%esi),%es:(%edi)
0xc05d1ba2 <dmi_present+34>: mov $0xf,%ecx
0xc05d1ba7 <dmi_present+39>: and $0x3,%ecx
0xc05d1baa <dmi_present+42>: je 0xc05d1bae <dmi_present+46>
0xc05d1bac <dmi_present+44>: rep movsb %ds:(%esi),%es:(%edi)
0xc05d1bae <dmi_present+46>: cld
0xc05d1baf <dmi_present+47>: mov $0xc0520bfc,%edi
0xc05d1bb4 <dmi_present+52>: mov $0x5,%ecx
0xc05d1bb9 <dmi_present+57>: mov %ebx,%esi
0xc05d1bbb <dmi_present+59>: repz cmpsb %es:(%edi),%ds:(%esi)
0xc05d1bbd <dmi_present+61>: je 0xc05d1bd3 <dmi_present+83>
0xc05d1bbf <dmi_present+63>: mov $0x1,%edx
0xc05d1bc4 <dmi_present+68>: mov -0xc(%ebp),%ebx
0xc05d1bc7 <dmi_present+71>: mov %edx,%eax
0xc05d1bc9 <dmi_present+73>: mov -0x8(%ebp),%esi
0xc05d1bcc <dmi_present+76>: mov -0x4(%ebp),%edi
0xc05d1bcf <dmi_present+79>: mov %ebp,%esp
0xc05d1bd1 <dmi_present+81>: pop %ebp
0xc05d1bd2 <dmi_present+82>: ret
0xc05d1bd3 <dmi_present+83>: mov %ebx,%eax
0xc05d1bd5 <dmi_present+85>: call 0xc05d1510 <dmi_checksum>
0xc05d1bda <dmi_present+90>: test %eax,%eax
0xc05d1bdc <dmi_present+92>: je 0xc05d1bbf <dmi_present+63>

Does that look right?
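For readers following the disassembly: the faulting instruction at dmi_present+32 is the first read from the source pointer (ESI equals CR2, fbe00000), so the crash happens while copying from the mapped table, before any comparison runs. A rough Python model of the visible part of the function is sketched below. The names dmi_anchor_present and make_anchor are hypothetical, the "_DMI_" signature is an assumption about the 5 bytes at 0xc0520bfc (the dump does not show them), and the zero-sum checksum rule is an assumption about dmi_checksum, whose body is not in the dump.

```python
# Copy length as it appears in the disassembly:
# "mov $0x3,%ecx; rep movsl" copies 3 dwords, then
# "mov $0xf,%ecx; and $0x3,%ecx; rep movsb" copies the remaining bytes.
DWORDS = 3
TAIL_BYTES = 0xF & 0x3
COPY_LENGTH = DWORDS * 4 + TAIL_BYTES

SIGNATURE = b"_DMI_"  # assumption: contents of the 5 bytes being compared


def dmi_checksum(buf):
    """Assumed rule: the copied bytes must sum to zero modulo 256."""
    return sum(buf[:COPY_LENGTH]) & 0xFF == 0


def dmi_anchor_present(memory, offset):
    """Model of the check: copy the anchor, compare 5 bytes, verify checksum."""
    buf = bytes(memory[offset:offset + COPY_LENGTH])
    if len(buf) != COPY_LENGTH:
        return False
    return buf[:len(SIGNATURE)] == SIGNATURE and dmi_checksum(buf)


def make_anchor():
    """Build a synthetic anchor whose last byte fixes up the checksum."""
    body = bytearray(SIGNATURE + bytes(COPY_LENGTH - len(SIGNATURE)))
    body[-1] = (-sum(body[:-1])) & 0xFF
    return bytes(body)


print(COPY_LENGTH)
print(dmi_anchor_present(make_anchor(), 0))
print(dmi_anchor_present(bytes(COPY_LENGTH), 0))
```

Note that the sketch returns True when an anchor is found, whereas the disassembled function returns 1 on the mismatch path; only the copy, compare and checksum sequence is modelled here.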
Created attachment 19184
Patch against 2.6.27.6 that moves those dmi_... lines after * NOTE ...

Now, that it seems that kernel crashes when calling "dmi_scan_machine" would you mind applying the attached patch against 2.6.27.6 and check, if this solves the problem?
dmi_scan_machine uses dmi_ioremap, which is early_ioremap. We can already use that after early_ioremap_init(), so we don't need to move it later.
(In reply to comment #11)
> Created an attachment (id=19184)
> Patch against 2.6.27.6 that moves those dmi_... lines after * NOTE ...
>
> Now, that it seems that kernel crashes when calling "dmi_scan_machine"
> would you mind applying the attached patch against 2.6.27.6 and
> check, if this solves the problem?

The patch resolves this issue. With the patch applied 2.6.27.6 boots to completion (logged into kde) without error.
Hmm, I have to admit that I have almost no understanding of (early) memory management on Linux. Just reading through http://lkml.org/lkml/2008/8/7/298 made me wonder what would happen if vmi_init() moves the so-called FIXMAP area down by 64MB.
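As a rough sanity check of that idea, the arithmetic can be worked through against the faulting address from the crash screens. This is a sketch under stated assumptions: 0xfffff000 is taken as the default top of the fixmap area on x86-32, and the 64MB figure is the one mentioned in the comment above, not a value read from the running guest.

```python
DEFAULT_FIXADDR_TOP = 0xFFFFF000   # assumption: default x86-32 fixmap top
RESERVED = 64 * 1024 * 1024        # 64MB, the figure quoted in the comment
CR2 = 0xFBE00000                   # faulting address from both crash screens
PAGE_SIZE = 4096

# Where the fixmap top would end up if it were moved down by 64MB.
moved_top = DEFAULT_FIXADDR_TOP - RESERVED

# Distance from the faulting address up to that relocated top.
gap = moved_top - CR2

print(f"relocated top: {moved_top:#x}")
print(f"gap below relocated top: {gap:#x} ({gap // PAGE_SIZE} pages)")
print(f"fault is below the default top by {(DEFAULT_FIXADDR_TOP - CR2) // (1024 * 1024)} MB")
```

The faulting address sits about 2MB below the relocated top, which is consistent with an early ioremap slot computed relative to a fixmap that has already been moved down. It does not by itself prove that this is what happened.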
Alok, Zach, can you help? thanks
This patch doesn't look right, moving dmi_scan_machine earlier fixes a real bug and should not be reverted. I believe the issue is an early_ioremap bug with VMI that was never exposed before.
(In reply to comment #16)
> This patch doesn't look right, moving dmi_scan_machine earlier fixes a real
> bug and should not be reverted. I believe the issue is an early_ioremap bug
> with VMI that was never exposed before.

Thanks for getting back on this bug. dmi_scan_machine used to be later in the code (later than my patch puts it), but it has been moved a few lines up, probably just to be BEFORE dmi_check_system. Actually I don't know, because I'm no expert in this early_ioremap stuff ...

BTW, there's another VMI bug that we (at Gentoo) haven't yet been able to bisect down to a single commit. Since it might be related, I would like to point you to http://bugs.gentoo.org/show_bug.cgi?id=250094

Anyway, any help you could give to resolve those bugs is greatly appreciated.

Thanks
It seems you are not alone ... http://lkml.org/lkml/2008/12/9/13
I wish someone had contacted me about this earlier... the fix is quite easy. I have two fixes actually, I'll send out both and see which is preferred, they both have different tradeoffs as far as risks for future breakages. I should test the fixes before making such broad statements as calling them fixes, however.
(In reply to comment #19)
> I wish someone had contacted me about this earlier... the fix is quite easy.
> I have two fixes actually, I'll send out both and see which is preferred,
> they both have different tradeoffs as far as risks for future breakages.
>
> I should test the fixes before making such broad statements as calling them
> fixes, however.

Great!
Created attachment 19262
Patch to fix boot time ioremap crash with VMI

I've sent the attached patch upstream for inclusion in 2.6.28.
Thanks. Great!
I do have exactly the same problem but I am not using VMware.
I am using a Dual QuadCore Opteron with a Tyan S5376 mainboard and the latest BIOS 3.04 (AMI BIOS). Any kernel later than 2.6.27.4 crashes immediately, so some of these x86 64k memory patches must have broken something. If I can assist in finding the problem, please let me know. Current 2.6.28.4 also crashes.
Enabling/disabling CONFIG_X86_RESERVE_LOW_64K has no effect.

# dmidecode 2.8
SMBIOS 2.5 present.
81 structures occupying 2869 bytes.
Table at 0x000FCD70.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 'V3.04 '
        Release Date: 12/25/2008
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 1024 kB
        Characteristics:
                PCI is supported
                PNP is supported
                APM is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                ESCD support is available
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                5.25"/1.2 MB floppy services are supported (int 13h)
                3.5"/720 KB floppy services are supported (int 13h)
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                CGA/mono video services are supported (int 10h)
                ACPI is supported
                USB legacy is supported
                LS-120 boot is supported
                ATAPI Zip drive boot is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
        BIOS Revision: 8.14

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: empty
        Product Name: empty
        Version: empty
        Serial Number: empty
        UUID: 00020003-0004-0005-0006-000700080009
        Wake-up Type: Power Switch
        SKU Number: To Be Filled By O.E.M.
        Family: Embedded
(In reply to comment #23)
> I do have exactly the same problem but i am not using VMware.

It's highly unlikely to be the same problem. This problem was only possible if using high address space reservation for VMI.

Any chance you can bisect this down to the bad commit?
On Tue, Feb 10, 2009 at 7:32 AM, <bugme-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12167
>
> ------- Comment #23 from jonas.frey@gmx.de 2009-02-10 07:32 -------
> I do have exactly the same problem but i am not using VMware.
> I am using a Dual QuadCore Opteron using a Tyan S5376 mainboard with latest
> Bios 3.04 (AMI Bios). Any kernel later than 2.6.27.4 crashes immediately so
> some of these x86 64k memory patches must have broken something. If i can
> assist on find the problem please let me know. Current 2.6.28.4 also crashes.
> Enabling/disabling CONFIG_X86_RESERVE_LOW_64K has no effect.

Can you post the boot log from before the breakage (2.6.27.4)? Please add debug to the kernel command line.

YH
(as discussed off bug with Zachary a while back I've now put together a test dmi robustness patch for corrupt dmi tables as promised)