Bug 10232

Summary: intel mtrr fixups apparently broke display and e1000 probe
Product: Platform Specific/Hardware
Reporter: Stephen Gran (steve)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: CLOSED CODE_FIX
Severity: normal
CC: bunk, jbarnes, yhlu.kernel
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc4
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9832
Attachments: fix trimming #2, more generic patch

Description Stephen Gran 2008-03-12 08:37:16 UTC
Latest working kernel version: 2.6.24.3
Earliest failing kernel version: 2.6.25-rc3
Distribution: Debian
Hardware Environment:
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82G965 Integrated Graphics Controller (rev 02)
00:03.0 Communication controller: Intel Corporation 82P965/G965 HECI Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Contoller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801H (ICH8 Family) 4 port SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801H (ICH8 Family) 2 port SATA IDE Controller (rev 02)
02:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6101 single-port PATA133 interface (rev b1)
06:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)

Software Environment:Debian unstable
Problem Description: Since the introduction of the mtrr fixup patches 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=99fc8d424bc5d803fe92cad56c068fe64e73747a

my machine no longer has a working display or network card.  The e1000 probe fails with error 5.  The display is less helpful: no error is reported, but nothing is visible on the screen.  As this machine doesn't have a working serial port at the moment, my steps are: reboot, wait, reboot again into the old kernel, and read the kernel log.  Not so easy to debug, sorry :)

e820 maps:
HIGHMEM4G=yes:
BIOS-e820: 0000000000000000 - 000000000008f000 (usable)
BIOS-e820: 000000000008f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cf561000 (usable)
BIOS-e820: 00000000cf561000 - 00000000cf56e000 (reserved)
BIOS-e820: 00000000cf56e000 - 00000000cf612000 (usable)
BIOS-e820: 00000000cf612000 - 00000000cf6e9000 (ACPI NVS)
BIOS-e820: 00000000cf6e9000 - 00000000cf6ec000 (usable)
BIOS-e820: 00000000cf6ec000 - 00000000cf6f1000 (ACPI data)
BIOS-e820: 00000000cf6f1000 - 00000000cf6f2000 (usable)
BIOS-e820: 00000000cf6f2000 - 00000000cf6ff000 (ACPI data)
BIOS-e820: 00000000cf6ff000 - 00000000cf700000 (usable)
BIOS-e820: 00000000cf700000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 000000012c000000 (usable)

HIGHMEM64G=yes:
BIOS-e820: 0000000000000000 - 000000000008f000 (usable)
BIOS-e820: 000000000008f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cf561000 (usable)
BIOS-e820: 00000000cf561000 - 00000000cf56e000 (reserved)
BIOS-e820: 00000000cf56e000 - 00000000cf612000 (usable)
BIOS-e820: 00000000cf612000 - 00000000cf6e9000 (ACPI NVS)
BIOS-e820: 00000000cf6e9000 - 00000000cf6ec000 (usable)
BIOS-e820: 00000000cf6ec000 - 00000000cf6f1000 (ACPI data)
BIOS-e820: 00000000cf6f1000 - 00000000cf6f2000 (usable)
BIOS-e820: 00000000cf6f2000 - 00000000cf6ff000 (ACPI data)
BIOS-e820: 00000000cf6ff000 - 00000000cf700000 (usable)
BIOS-e820: 00000000cf700000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 000000012c000000 (usable)

modified: 0000000000000000 - 000000000008f000 (usable)
modified: 000000000008f000 - 00000000000a0000 (reserved)
modified: 00000000000e0000 - 0000000000100000 (reserved)
modified: 0000000000100000 - 00000000cf561000 (usable)
modified: 00000000cf561000 - 00000000cf56e000 (reserved)
modified: 00000000cf56e000 - 00000000cf612000 (usable)
modified: 00000000cf612000 - 00000000cf6e9000 (ACPI NVS)
modified: 00000000cf6e9000 - 00000000cf6ec000 (usable)
modified: 00000000cf6ec000 - 00000000cf6f1000 (ACPI data)
modified: 00000000cf6f1000 - 00000000cf6f2000 (usable)
modified: 00000000cf6f2000 - 00000000cf6ff000 (ACPI data)
modified: 00000000cf6ff000 - 00000000cf700000 (usable)
modified: 00000000cf700000 - 000000012c000000 (reserved)

Steps to reproduce:
Comment 1 Jesse Barnes 2008-03-12 14:31:30 UTC
This may be due to the e820 fixup code Yinghai added.  Any ideas Yinghai?
Comment 2 Yinghai Lu 2008-03-12 18:29:06 UTC
The 32-bit trimming commit
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=093af8d7f0ba3c6be1485973508584ef081e9f93

and one follow-up fix
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=20651af9ac60fd6e31360688ad44861a7d05256a

should fix that.

It seems that
BIOS-e820: 00000000cf700000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 000000012c000000 (usable)

is changed to

modified: 00000000cf700000 - 000000012c000000 (reserved)

whereas we should get
BIOS-e820: 00000000cf700000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000fff00000 - 000000012c000000 (reserved)

Can you post your /proc/mtrr?
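
For readers following along, the expected result above can be reproduced with a small sketch. This is not the kernel's code: the function name `retype_and_merge` and the choice of the 4 GiB boundary as the trim start are assumptions made for illustration. It retypes usable ranges at or above the trim start as reserved and then merges adjacent ranges of the same type, which leaves the gap between d0000000 and fff00000 undescribed, as in the BIOS map.

```python
# Illustrative sketch only; not the kernel's implementation.
USABLE, RESERVED = "usable", "reserved"

def retype_and_merge(ranges, trim_start):
    """Mark usable memory at or above trim_start as reserved, then merge neighbours."""
    retyped = []
    for start, end, kind in ranges:
        if kind == USABLE and start < trim_start < end:
            # Split a usable range that straddles the trim point.
            retyped.append((start, trim_start, USABLE))
            retyped.append((trim_start, end, RESERVED))
        elif kind == USABLE and start >= trim_start:
            retyped.append((start, end, RESERVED))
        else:
            retyped.append((start, end, kind))

    merged = []
    for start, end, kind in retyped:
        # Merge only ranges that touch and share a type; gaps are preserved.
        if merged and merged[-1][1] == start and merged[-1][2] == kind:
            merged[-1] = (merged[-1][0], end, kind)
        else:
            merged.append((start, end, kind))
    return merged

# Tail of the BIOS-provided map from the report.
bios_tail = [
    (0xcf700000, 0xd0000000, RESERVED),
    (0xfff00000, 0x100000000, RESERVED),
    (0x100000000, 0x12c000000, USABLE),
]

# Assumption: trimming starts at the 4 GiB boundary.
result = retype_and_merge(bios_tail, 0x100000000)
for start, end, kind in result:
    print(f"{start:016x} - {end:016x} ({kind})")
```

With these inputs the sketch yields two reserved ranges, cf700000 - d0000000 and fff00000 - 12c000000, rather than a single range spanning cf700000 - 12c000000.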
Comment 3 Yinghai Lu 2008-03-14 23:17:50 UTC
Created attachment 15271
fix trimming #2

please test this patch
Comment 4 Stephen Gran 2008-03-15 06:07:57 UTC
(In reply to comment #2)
> can you send out your /proc/mtrr?

2.6.24
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
reg03: base=0xcf800000 (3320MB), size=   8MB: uncachable, count=1
reg04: base=0xcf700000 (3319MB), size=   1MB: uncachable, count=1
reg05: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1

2.6.25 (+your patch)
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
reg03: base=0xcf800000 (3320MB), size=   8MB: uncachable, count=1
reg04: base=0xcf700000 (3319MB), size=   1MB: uncachable, count=1
reg05: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
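
As a rough cross-check of the listing above (a sketch, not kernel code; `highest_write_back_end` is a hypothetical helper), the write-back registers end at 0xd0000000, so the usable e820 range 0000000100000000 - 000000012c000000 from the original report is not covered by any write-back MTRR. Its size works out to the 704MB that the kernel warning quoted in this report mentions.

```python
MB = 1 << 20

# (base, size in MB, type) copied from the /proc/mtrr listing above.
mtrrs = [
    (0x00000000, 2048, "write-back"),
    (0x80000000, 1024, "write-back"),
    (0xc0000000, 256, "write-back"),
    (0xcf800000, 8, "uncachable"),
    (0xcf700000, 1, "uncachable"),
    (0xd0000000, 256, "write-combining"),
]

def highest_write_back_end(entries):
    """Return the end address of the highest write-back MTRR range."""
    return max(base + size * MB
               for base, size, kind in entries
               if kind == "write-back")

# Bounds of the last usable e820 range in the original report.
ram_above_4g_start = 0x100000000
ram_above_4g_end = 0x12c000000

wb_end = highest_write_back_end(mtrrs)
uncovered_mb = (ram_above_4g_end - ram_above_4g_start) // MB

print(f"write-back coverage ends at {wb_end:#x}")
print(f"usable RAM above 4 GiB: {uncovered_mb}MB")
```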
Comment 5 Stephen Gran 2008-03-15 06:08:31 UTC
(In reply to comment #3)
> Created an attachment (id=15271)
> fix trimming #2
> 
> please test this patch

That fixes it.  You rock!

Thanks a lot.
Comment 6 Stephen Gran 2008-03-15 06:10:32 UTC
(In reply to comment #5)
> (In reply to comment #3)
> > Created an attachment (id=15271)
> > fix trimming #2
> > 
> > please test this patch
> 
> That fixes it.  You rock!

Oh, and just for reference in case it's helpful:

Mar 15 12:59:50 gashuffer kernel: BIOS-provided physical RAM map:
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 0000000000000000 - 000000000008f000 (usable)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 000000000008f000 - 00000000000a0000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 0000000000100000 - 00000000cf561000 (usable)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf561000 - 00000000cf56e000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf56e000 - 00000000cf612000 (usable)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf612000 - 00000000cf6e9000 (ACPI NVS)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf6e9000 - 00000000cf6ec000 (usable)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf6ec000 - 00000000cf6f1000 (ACPI data)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf6f1000 - 00000000cf6f2000 (usable)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf6f2000 - 00000000cf6ff000 (ACPI data)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf6ff000 - 00000000cf700000 (usable)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000cf700000 - 00000000d0000000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  BIOS-e820: 0000000100000000 - 000000012c000000 (usable)
Mar 15 12:59:50 gashuffer kernel: 3904MB HIGHMEM available.
Mar 15 12:59:50 gashuffer kernel: 896MB LOWMEM available.
Mar 15 12:59:50 gashuffer kernel: Scan SMP from c0000000 for 1024 bytes.
Mar 15 12:59:50 gashuffer kernel: Scan SMP from c009fc00 for 1024 bytes.
Mar 15 12:59:50 gashuffer kernel: Scan SMP from c00f0000 for 65536 bytes.
Mar 15 12:59:50 gashuffer kernel: found SMP MP-table at [c00fe200] 000fe200
Mar 15 12:59:50 gashuffer kernel: WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 704MB of RAM.
Mar 15 12:59:50 gashuffer kernel: ------------[ cut here ]------------
Mar 15 12:59:50 gashuffer kernel: WARNING: at arch/x86/kernel/cpu/mtrr/main.c:716 mtrr_trim_uncached_memory+0x17e/0x1e2()
Mar 15 12:59:50 gashuffer kernel: Modules linked in:
Mar 15 12:59:50 gashuffer kernel: Pid: 0, comm: swapper Not tainted 2.6.25-rc4 #1
Mar 15 12:59:50 gashuffer kernel:  [<c0120dd7>] warn_on_slowpath+0x40/0x4f
Mar 15 12:59:50 gashuffer kernel:  [<c02b57a9>] _spin_unlock_irqrestore+0xd/0x10
Mar 15 12:59:50 gashuffer kernel:  [<c0121535>] release_console_sem+0x185/0x19e
Mar 15 12:59:50 gashuffer kernel:  [<c012194a>] vprintk+0x2bf/0x2ed
Mar 15 12:59:50 gashuffer kernel:  [<c0381425>] __alloc_bootmem_core+0x11e/0x2b8
Mar 15 12:59:50 gashuffer kernel:  [<c010d743>] generic_get_mtrr+0x2e/0xdc
Mar 15 12:59:50 gashuffer kernel:  [<c012198c>] printk+0x14/0x18
Mar 15 12:59:50 gashuffer kernel:  [<c0376add>] mtrr_trim_uncached_memory+0x17e/0x1e2
Mar 15 12:59:50 gashuffer kernel:  [<c0376d4d>] mtrr_bp_init+0x20c/0x214
Mar 15 12:59:50 gashuffer kernel:  [<c0374ed2>] setup_arch+0x28e/0x3e8
Mar 15 12:59:50 gashuffer kernel:  [<c036f5f9>] start_kernel+0x65/0x33f
Mar 15 12:59:50 gashuffer kernel:  =======================
Mar 15 12:59:50 gashuffer kernel: ---[ end trace ca143223eefdc828 ]---
Mar 15 12:59:50 gashuffer kernel: update e820 for mtrr
Mar 15 12:59:50 gashuffer kernel: modified physical RAM map:
Mar 15 12:59:50 gashuffer kernel:  modified: 0000000000000000 - 000000000008f000 (usable)
Mar 15 12:59:50 gashuffer kernel:  modified: 000000000008f000 - 00000000000a0000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000000e0000 - 0000000000100000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  modified: 0000000000100000 - 00000000cf561000 (usable)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf561000 - 00000000cf56e000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf56e000 - 00000000cf612000 (usable)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf612000 - 00000000cf6e9000 (ACPI NVS)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf6e9000 - 00000000cf6ec000 (usable)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf6ec000 - 00000000cf6f1000 (ACPI data)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf6f1000 - 00000000cf6f2000 (usable)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf6f2000 - 00000000cf6ff000 (ACPI data)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf6ff000 - 00000000cf700000 (usable)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000cf700000 - 00000000d0000000 (reserved)
Mar 15 12:59:50 gashuffer kernel:  modified: 00000000fff00000 - 000000012c000000 (reserved)
Mar 15 12:59:50 gashuffer kernel: only 2423MB highmem pages available, ignoring highmem size of 3904MB.
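
The highmem figures in this log are consistent with simple arithmetic on the maps. The sketch below is only a back-of-the-envelope check, not how the kernel computes these values; `highmem_mb` is a hypothetical helper, and it assumes highmem is the top of usable RAM minus the 896MB of lowmem reported above.

```python
MB = 1 << 20
LOWMEM_MB = 896  # from the "896MB LOWMEM available" line

def highmem_mb(top_of_usable):
    """Highmem in MB, assuming it is everything above the lowmem limit."""
    return top_of_usable // MB - LOWMEM_MB

# Top of usable RAM in the BIOS map (end of the range above 4 GiB).
before = highmem_mb(0x12c000000)
# Top of usable RAM in the modified map (last usable range ends at cf700000).
after = highmem_mb(0xcf700000)

print(f"before trimming: {before}MB, after trimming: {after}MB")
```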
Comment 7 Rafael J. Wysocki 2008-03-16 05:32:51 UTC
Regressions list annotation:
Handled-By : Yinghai Lu <yhlu.kernel@gmail.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=15271&action=view
Comment 8 Yinghai Lu 2008-03-17 13:05:17 UTC
Created attachment 15316
more generic patch

This is more generic: it changes the type of the affected ranges instead of adding one big E820_RESERVED chunk.
Comment 9 Rafael J. Wysocki 2008-03-20 02:40:12 UTC
Regressions list annotation:
Patch : http://bugzilla.kernel.org/attachment.cgi?id=15316&action=view
Comment 10 Adrian Bunk 2008-03-21 10:39:05 UTC
fixed by commit 5dca6a1bb014875a17289fdaae8c31e0a3641c99