Bug 13042
Summary: | MTRR problems after 4GB RAM upgrade | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | higuita (higuita) |
Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
Status: | CLOSED OBSOLETE | ||
Severity: | normal | CC: | alan, hugh, marcus, mishu, sa, yinghai |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.29 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | lspci; dmesg.4G; dmesg.3G; my .config; config for 2.6.30-rc4; mtrr-uncover output; mtrr-uncover with debug; dmesg with enable_mtrr_cleanup mtrr_spare_reg_nr=3 debug; 4GB dmesg without fglrx; Photo of the kernel ooops when removing mtrr 1; 4GB dmesg - no extra modules; dmesg, notaint, single; mtrr for 4G, notaint, single |
Description
higuita
2009-04-07 23:46:58 UTC
Created attachment 20869 [details]
lspci
Created attachment 20870 [details]
dmesg.4G
Created attachment 20871 [details]
dmesg.3G
I had the same problem; setting CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1 fixed it for me. Any chance of having this set as the default? It was discussed here: http://lkml.org/lkml/2009/2/19/108
Created attachment 21339 [details]
my .config
I already have those flags enabled... maybe I have some incompatible option?
I'm attaching my config for kernel 2.6.29.3.
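To check whether the sanitizer options are really set in a given .config, a short script can help. This is a minimal sketch, not part of any kernel tooling: the helper names `mtrr_config` and `sanitizer_on_by_default` are hypothetical, and it only looks at the Kconfig symbols CONFIG_MTRR, CONFIG_MTRR_SANITIZER, CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT and CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT in the usual `SYMBOL=value` / `# SYMBOL is not set` form.

```python
import re

# Kconfig symbols relevant to the MTRR sanitizer in a 2.6.2x-era .config.
MTRR_SYMBOLS = (
    "CONFIG_MTRR",
    "CONFIG_MTRR_SANITIZER",
    "CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT",
    "CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT",
)

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")            # e.g. CONFIG_MTRR=y
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")  # e.g. # CONFIG_MTRR_SANITIZER is not set


def mtrr_config(text):
    """Map each MTRR symbol to its raw value, 'n' if explicitly unset, None if absent."""
    found = dict.fromkeys(MTRR_SYMBOLS)
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line) or _UNSET.match(line)
        if m and m.group(1) in found:
            found[m.group(1)] = m.group(2) if m.re is _SET else "n"
    return found


def sanitizer_on_by_default(text):
    """True if the sanitizer is built in and enabled without a boot option."""
    cfg = mtrr_config(text)
    return (cfg["CONFIG_MTRR_SANITIZER"] == "y"
            and cfg["CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT"] == "1")
```

For example, `print(mtrr_config(open("/usr/src/linux-2.6.29/.config").read()))`. A False result from `sanitizer_on_by_default` only means the cleanup is not on by default; booting with enable_mtrr_cleanup still requests it.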
I have updated mtrr-uncover to handle a gratuitous change to the format of entries in /proc/mtrr. My suspicion is that any kernel with this change also has the MTRR sanitizer code. Even so, there have been systems for which mtrr-uncover works and the MTRR sanitizer does not (at least not without options, which are documented opaquely).
ftp://ftp.cs.utoronto.ca/pub/hugh/mtrr-uncover-2009may13.tgz
Created attachment 21340 [details]
config for 2.6.30-rc4
I'm attaching the config I used for comparison.
I'm using an Asus P5Q-EM motherboard.
Created attachment 21349 [details]
mtrr-uncover output
In a quick comparison between the configs I don't see anything that might explain the different results; I will try a more careful comparison later.
I also tried the new mtrr-uncover... again, the system hard-locks if I try to change the MTRRs. I will attach the output with debug next, but anyway, it seems impossible for me to disable the original MTRR config after booting.
Finally, I also tried booting with "enable_mtrr_cleanup mtrr_spare_reg_nr=3 debug", but it didn't work; I always get the same MTRR config.
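Comparing /proc/mtrr snapshots between boots (3GB vs 4GB, or with and without enable_mtrr_cleanup) is easier with a parser that accepts both entry layouts mentioned earlier in the thread. This is a minimal sketch under stated assumptions: the two sample lines in the comments are reconstructed from memory of 2.6.28-era and 2.6.29-era kernels (the newer one prints 36-bit bases, as noted later in this thread) and may not match every kernel exactly; `parse_proc_mtrr`, `diff_snapshots` and `MtrrEntry` are hypothetical names.

```python
import re
from typing import NamedTuple


class MtrrEntry(NamedTuple):
    reg: int       # register number, e.g. 0 for "reg00"
    base: int      # physical base address in bytes
    size: int      # range size in bytes
    mem_type: str  # "write-back", "uncachable", "write-combining", ...
    count: int     # usage count reported by the kernel


_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# Assumed newer layout (count before type):
#   reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
_NEW = re.compile(r"^reg(\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
                  r"size=\s*(\d+)([KMG])B, count=(\d+): (.+)$")
# Assumed older layout (type before count):
#   reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
_OLD = re.compile(r"^reg(\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
                  r"size=\s*(\d+)([KMG])B: ([^,]+), count=(\d+)$")


def parse_proc_mtrr(text):
    """Parse /proc/mtrr text in either layout; unrecognised lines are skipped."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if m := _NEW.match(line):
            reg, base, size, unit, count, mem_type = m.groups()
        elif m := _OLD.match(line):
            reg, base, size, unit, mem_type, count = m.groups()
        else:
            continue
        entries.append(MtrrEntry(int(reg), int(base, 16),
                                 int(size) * _UNITS[unit], mem_type.strip(), int(count)))
    return entries


def diff_snapshots(before, after):
    """Ranges only in `before` and only in `after`, ignoring register numbers and counts."""
    def key(e):
        return (e.base, e.size, e.mem_type)
    a = {key(e) for e in parse_proc_mtrr(before)}
    b = {key(e) for e in parse_proc_mtrr(after)}
    return sorted(a - b), sorted(b - a)
```

Keying on (base, size, type) rather than register number means a reshuffled but equivalent layout compares as unchanged.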
Created attachment 21350 [details]
mtrr-uncover with debug
Created attachment 21351 [details]
dmesg with enable_mtrr_cleanup mtrr_spare_reg_nr=3 debug
As Sven's CPU seems to be an Intel (as in most of the posts I've seen where people were able to fix this) and mine is an AMD, maybe it's a CPU-related problem? Here is my cpuinfo (core 0 only; core 1 is the same):
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 35
model name : Dual Core AMD Opteron(tm) Processor 175
stepping : 2
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy
bogomips : 2003.89
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
I've never seen three MTRRs overlapping to cover a range. The first configuration in the original message has three MTRRs covering the range 0x0c0000000-0x0cfffffff (note: these are 36-bit numbers; ignore the leading 0 to see 32-bit numbers). The first message refers to this setup as "the mtrr with memory remapping enable". I take it that this refers to a BIOS setting.
It is legal but generally useless to have MTRRs nested three deep. In this case, the three MTRR types are write-back, uncachable, and write-combining. I find this very odd. In particular, uncachable trumps other types, so the write-combining range (MTRR 3) is useless since it is completely contained within the uncachable range (MTRR 2). (The rules are in section 10.11.4.1 "MTRR Precedences" of the "Intel® 64 and IA-32 Architectures Software Developer’s Manual", Volume 3A: "System Programming Guide", Part 1.)
Another odd thing is that I've never seen a BIOS initialize a range to write-combining. Generally, an X video driver will attempt to set up a write-combining region. I've also seen a rare InfiniBand driver do that. But never a BIOS.
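The precedence argument can be checked mechanically. The sketch below models only variable-range MTRRs as (base, size, type) triples and applies the overlap rules from the Intel manual section cited above: identical types give that type, UC wins over everything, WT wins over WB, and any other mix is undefined. It deliberately ignores fixed-range MTRRs and the enable bits; the function names are hypothetical, and the ranges used in the note below are an invented stand-in for a triple-nested WB/UC/WC layout, not the reporter's actual registers.

```python
UC, WC, WT, WB = "uncachable", "write-combining", "write-through", "write-back"


def effective_type(addr, ranges, default=UC):
    """Memory type for physical address `addr` given variable-range MTRRs.

    `ranges` is a list of (base, size, type) triples. Fixed-range MTRRs
    (below 1 MB) and the global MTRR enable bit are not modelled.
    Returns None where the manual leaves the result undefined.
    """
    hits = {t for base, size, t in ranges if base <= addr < base + size}
    if not hits:
        return default           # no MTRR matches: the MTRRdefType default applies
    if len(hits) == 1:
        return next(iter(hits))  # one type, possibly from several identical ranges
    if UC in hits:
        return UC                # UC overrides every other type
    if hits == {WT, WB}:
        return WT                # WT overrides WB
    return None                  # any other combination is undefined


def shadowed_by_uc(ranges):
    """Non-UC ranges lying entirely inside some UC range: their own type never takes effect."""
    uc_spans = [(b, b + s) for b, s, t in ranges if t == UC]
    return [(b, s, t) for b, s, t in ranges
            if t != UC and any(lo <= b and b + s <= hi for lo, hi in uc_spans)]
```

With the hypothetical layout `[(0, 1 << 32, WB), (0xC0000000, 1 << 30, UC), (0xC0000000, 256 << 20, WC)]`, `effective_type(0xC8000000, ...)` gives UC and `shadowed_by_uc(...)` returns only the WC range, which is the point made above about MTRR 3.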
Are you printing this before X gets started? Actually, the kernel generally doesn't allow creation of a write-combining region like this, so it is unlikely that driver code did this. I admit that I've not seen a very wide variety of MTRR settings, though I have seen a number of them as the author of mtrr-uncover.
It would be useful to know what causes the machine to lock up. You can issue the commands that mtrr-uncover proposes one at a time. Run it without --execute; the last thing it prints is a set of commands. Send each of them, one line at a time, to /proc/mtrr using echo. For example:
echo "disable=0" >| /proc/mtrr
After which command does the system hang?
[I'm not a kernel hacker and I don't know the mtrr cleanup code.] I have a machine running 64-bit Ubuntu 9.04 that needs mtrr-fixup. The kernel is Ubuntu's 2.6.28, not the 2.6.29 that you are running, so the cleanup code might have changed. On my system, with enable_mtrr_cleanup, the cleanup starts chattering in dmesg very early, and it chatters a lot. I see nothing in your dmesg output. From my recollection of the cleanup code (very early on), it made some assumptions about the environment, checked them, and if they were not satisfied, it didn't do any cleanup. It might not even have logged anything to this effect. One example I remember is that the default memory type must be UC (as set in a model-specific register). It might be worth reading the code to see whether any such assumptions are not satisfied in your environment. Suspicion: the code might not be willing to deal with triple-nested ranges. You might wish to drop in a few printk calls; an early enough printk could tell you whether the cleanup code is even compiled in.
Thanks for the help, D. Hugh.
AFAIK, those MTRRs are not changed by anything; I have done all the tests in a console, without X running (after all, fglrx locks up if I try to start it with this MTRR config)...
I may have had the fglrx module loaded already, but if I'm not wrong, the MTRRs are exactly the same when I don't have it loaded (I will confirm tomorrow with a few tests).
I already tried running the commands to change the MTRRs, and if my memory serves, it hard-locks when removing reg 0 and there is a kernel oops when I remove reg 1 or reg 2, not sure which one... again, I will confirm this tomorrow.
I'm now running 2.6.29, but I have had this problem at least since 2.6.27, when I added more RAM... There is no difference between kernel versions, and I also don't see any special output during boot. Diffing the 3GB dmesg and the 4GB one, I get this:
$ diff dmesg.3G.1 dmesg.4G.1
2c2
< Command line: BOOT_IMAGE=2.6.29.1-k8_64 ro root=302 resume=/dev/sda6 memory_corruption_check=1
---
> Command line: auto BOOT_IMAGE=2.6.29.1-k8_64 ro root=302 resume=/dev/sda6 memory_corruption_check=1 mtrr_chunk=256M mtrr_gran_size=256M
15a16
> BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
18c19
< last_pfn = 0xbffb0 max_arch_pfn = 0x100000000
---
> last_pfn = 0x140000 max_arch_pfn = 0x100000000
19a21
> last_pfn = 0xbffb0 max_arch_pfn = 0x100000000
30a33
> modified: 0000000100000000 - 0000000140000000 (usable)
35a39,42
> init_memory_mapping: 0000000100000000-0000000140000000
> 0100000000 - 0140000000 page 2M
> kernel direct mapping tables up to 140000000 @ 13000-19000
> last_map_addr: 140000000 end: 140000000
45c52
< (5 early reservations) ==> bootmem [0000000000 - 00bffb0000]
---
> (6 early reservations) ==> bootmem [0000000000 - 0140000000]
51c58,59
< [ffffe20000000000-ffffe200029fffff] PMD -> [ffff880001200000-ffff880003bfffff] on node 0
---
> #5 [0000013000 - 0000014000] PGTABLE ==> [0000013000 - 0000014000]
> [ffffe20000000000-ffffe200045fffff] PMD -> [ffff880028200000-ffff88002c7fffff] on node 0
55c63
< Normal 0x00100000 -> 0x00100000
---
> Normal 0x00100000 -> 0x00140000
57c65
< early_node_map[2] active PFN ranges
---
> early_node_map[3] active PFN ranges
60c68,69
< On node 0 totalpages: 786239
---
> 0: 0x00100000 -> 0x00140000
> On node 0 totalpages: 1048383
62,65c71,77
< DMA zone: 1782 pages reserved
< DMA zone: 2145 pages, LIFO batch:0
< DMA32 zone: 10695 pages used for memmap
< DMA32 zone: 771561 pages, LIFO batch:31
---
> DMA zone: 1783 pages reserved
> DMA zone: 2144 pages, LIFO batch:0
> DMA32 zone: 14280 pages used for memmap
> DMA32 zone: 767976 pages, LIFO batch:31
> Normal zone: 3584 pages used for memmap
> Normal zone: 258560 pages, LIFO batch:31
> Looks like a VIA chipset. Disabling IOMMU. Override with iommu=allowed
82a95,99
> PM: Registered nosave memory: 00000000bffb0000 - 00000000bffc0000
> PM: Registered nosave memory: 00000000bffc0000 - 00000000bfff0000
> PM: Registered nosave memory: 00000000bfff0000 - 00000000c0000000
> PM: Registered nosave memory: 00000000c0000000 - 00000000ff780000
> PM: Registered nosave memory: 00000000ff780000 - 0000000100000000
86,87c103,104
< Built 1 zonelists in Zone order, mobility grouping on. Total pages: 773706
< Kernel command line: BOOT_IMAGE=2.6.29.1-k8_64 ro root=302 resume=/dev/sda6 memory_corruption_check=1
---
> Built 1 zonelists in Zone order, mobility grouping on. Total pages: 1028680
> Kernel command line: auto BOOT_IMAGE=2.6.29.1-k8_64 ro root=302 resume=/dev/sda6 memory_corruption_check=1 mtrr_chunk=256M mtrr_gran_size=256M
93c110
< Detected 2202.736 MHz processor.
---
> Detected 2202.636 MHz processor.
98,103c115,118
< Checking aperture...
< AGP bridge at 00:00:00
< Aperture from AGP @ c0000000 old size 32 MB
< Aperture from AGP @ c0000000 size 256 MB (APSIZE f00)
< Node 0: aperture @ c0000000 size 256 MB
< Memory: 3088348k/3145408k available (3751k kernel code, 452k absent, 56164k reserved, 1851k data, 372k init)
---
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> Placing 64MB software IO TLB between ffff880020000000 - ffff880024000000
> software IO TLB at phys 0x20000000 - 0x24000000
> Memory: 4041812k/5242880k available (3751k kernel code, 1049348k absent, 150816k reserved, 1851k data, 372k init)
105c120
< Calibrating delay loop (skipped), value calculated using timer frequency.. 4407.03 BogoMIPS (lpj=7342453)
---
> Calibrating delay loop (skipped), value calculated using timer frequency.. 4407.82 BogoMIPS (lpj=7342120)
127c142
< Total of 2 processors activated (8814.96 BogoMIPS).
---
> Total of 2 processors activated (8814.75 BogoMIPS).
140a156
> TOM2: 0000000140000000 aka 5120M
144c160,161
< bus: 00 index 2 mmio: [c0000000, fcffffffff]
---
> bus: 00 index 2 mmio: [c0000000, ffffffff]
> bus: 00 index 3 mmio: [140000000, fcffffffff]
233d249
< pci 0000:00:00.0: BAR 0: can't allocate resource
251,254d266
< pnp 00:0d: mem resource (0x0-0x9ffff) overlaps 0000:00:00.0 BAR 0 (0x0-0xfffffff), disabling
< pnp 00:0d: mem resource (0xc0000-0xdffff) overlaps 0000:00:00.0 BAR 0 (0x0-0xfffffff), disabling
< pnp 00:0d: mem resource (0xe0000-0xfffff) overlaps 0000:00:00.0 BAR 0 (0x0-0xfffffff), disabling
< pnp 00:0d: mem resource (0x100000-0xbffeffff) overlaps 0000:00:00.0 BAR 0 (0x0-0xfffffff), disabling
265a278,281
> system 00:0d: iomem range 0x0-0x9ffff could not be reserved
> system 00:0d: iomem range 0xc0000-0xdffff has been reserved
> system 00:0d: iomem range 0xe0000-0xfffff could not be reserved
> system 00:0d: iomem range 0x100000-0xbffeffff could not be reserved
289c305
< msgmni has been set to 6032
---
> msgmni has been set to 7895
332d347
< isa bounce pool size: 16 pages
442,444d456
< ReiserFS: hda2: Using r5 hash to sort names
< VFS: Mounted root (reiserfs filesystem) readonly on device 3:2.
< Freeing unused kernel memory: 372k freed
451c463,473
< Linux video capture interface: v2.00
---
> generic-usb 0003:0463:FFFF.0002: hidraw1: USB HID v1.00 Device [MGE UPS SYSTEMS ellipse] on usb-0000:00:10.1-2/input0
> ReiserFS: hda2: replayed 215 transactions in 12 seconds
> ReiserFS: hda2: Using r5 hash to sort names
> VFS: Mounted root (reiserfs filesystem) readonly on device 3:2.
> Freeing unused kernel memory: 372k freed
> ohci1394 0000:00:07.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[16] MMIO=[fb700000-fb7007ff] Max Packet=[2048] IR/IT contexts=[4/8]
> skge 0000:00:0a.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> skge 0000:00:0a.0: PCI: Disallowing DAC for device
> skge 1.13 addr 0xfbb00000 irq 17 chip Yukon-Lite rev 9
> skge eth0: addr 00:11:d8:82:b1:0e
460,465c482
< ohci1394 0000:00:07.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
< ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[16] MMIO=[fb700000-fb7007ff] Max Packet=[2048] IR/IT contexts=[4/8]
< skge 0000:00:0a.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
< skge 0000:00:0a.0: PCI: Disallowing DAC for device
< skge 1.13 addr 0xfbb00000 irq 17 chip Yukon-Lite rev 9
< skge eth0: addr 00:11:d8:82:b1:0e
---
> Linux video capture interface: v2.00
481a499,500
> [fglrx] Maximum main memory to use for locked dma buffers: 3786 MBytes.
> [fglrx] vendor: 1002 device: 9586 count: 1
483a503,506
> [fglrx] ioport: bar 1, base 0xe000, size: 0x100
> pci 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> [fglrx] Kernel PAT support detected, disabling driver built-in PAT support
> [fglrx] module loaded - fglrx 8.59.2 [Mar 13 2009] with 1 minors
491,496d513
< [fglrx] Maximum main memory to use for locked dma buffers: 2870 MBytes.
< [fglrx] vendor: 1002 device: 9586 count: 1
< [fglrx] ioport: bar 1, base 0xe000, size: 0x100
< pci 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
< [fglrx] Kernel PAT support detected, disabling driver built-in PAT support
< [fglrx] module loaded - fglrx 8.59.2 [Mar 13 2009] with 1 minors
499a517,518
> ReiserFS: hda2: Removing [43 750871 0x0 SD]..done
> ReiserFS: hda2: There were 1 uncompleted unlinks/truncates. Completed
505d523
< generic-usb 0003:0463:FFFF.0002: hidraw1: USB HID v1.00 Device [MGE UPS SYSTEMS ellipse] on usb-0000:00:10.1-2/input0
530c548
< Clocksource tsc unstable (delta = -274461584 ns)
---
> Clocksource tsc unstable (delta = -273951755 ns)
602d619
< ip_tables: (C) 2000-2006 Netfilter Core Team
604a622
> ip_tables: (C) 2000-2006 Netfilter Core Team
617d634
< hdc: UDMA/33 mode selected
618a636
> hdc: UDMA/33 mode selected
As you can see, there are no MTRR references; that is why I suspect that the MTRR cleanup isn't being run on my machine. I opened the bug as an x86_64 problem, but now it looks like it is related to my hardware.
I'm not a programmer (just shell scripts and a little Perl), but if you give me an example of a printk call, I can try to randomly insert some in the /usr/src/linux-2.6.29/arch/x86/kernel/cpu/mtrr/ files using my almost non-existent C knowledge :)
Tomorrow I will have more free time to boot the machine several times and do more tests. In the meantime, should I nag Asus for a BIOS fix, or post on LKML to alert people about this problem? Thanks
This dmesg diff was interesting to me. Do notice that fglrx is running. Please try without it. First of all, it is closed source. Second, it could well muck with MTRRs (although it mentions using PAT). In the 4G case, the AGP aperture was not mentioned. I don't know why or what that means. The AGP aperture is used to access memory on the video board.
As far as advising you on how to add printk calls, I don't feel that I can devote the time needed.
I'd need to download the kernel, find the MTRR cleanup code, try to understand it, and then write up what I had figured out. But here's a start. The relevant code seems to be here: http://lxr.linux.no/linux+v2.6.29/arch/x86/kernel/cpu/mtrr/main.c
Line 1364 is the start of the C function that cleans up MTRRs. You will see that line 1377 is an "if" that causes the routine to quit, without any diagnostic, if the default MTRR type is not uncachable. You can see an example of printk on line 1394. The KERN_DEBUG part specifies what logging level is needed for the message to show up in the log. If you change it to KERN_WARNING, you are more likely to see the message. I don't know what your kernel loglevel is, but it is likely set to suppress KERN_DEBUG messages.
Can you try tip/master? http://people.redhat.com/mingo/tip.git/readme.txt
Also, please boot with "debug show_msr=1".
OK, let's start with the tests... Without the fglrx module in the system, the MTRRs are exactly the same after booting. Without starting X:
echo "disable=0" >| /proc/mtrr locks the machine; not even the magic SysRq keys work.
disable=2 removes the entry and the system still works.
disable=3 removes the entry and the system still works.
disable=1 (not needed by mtrr-uncover) generates a kernel oops; I can't do anything, but I can ping the machine and I see new iptables messages on the console. I will attach a screenshot of the oops (should I open a new bug?).
I'm compiling the tip tree right now.
Created attachment 21381 [details]
4GB dmesg without fglrx
Created attachment 21382 [details]
Photo of the kernel ooops when removing mtrr 1
I don't think that this is possible, but one explanation would be if the default type is NOT uncachable.
Disabling 3 first should have no effect. As I mentioned in #12, it doesn't do anything. Once you do that, the setup appears normal.
Disabling only 0 should only slow the machine down. A lot! Guess: performance might drop almost to that of a PC/XT (the last PC that had no cache). Still, I imagine this would be fast enough to get to the next step in "mtrr-uncover --execute".
Disabling only 2 could do bad things. Or maybe not. It depends on what's in that memory range: some device registers are likely mapped into that range and should not be cached.
Disabling only 1 would slow down any access to memory above address 4G. I don't know what Linux would choose to place there.
Still, because enable_mtrr_cleanup isn't running, I think that there is something odd about your setup. It might just be the triple nesting, but it might be something else.
Reading the dmesg in #18, I'd suggest not running vboxdrv and gShield. Try ditching as much as possible.
Created attachment 21633 [details]
4GB dmesg - no extra modules
Here is the latest dmesg (now with kernel 2.6.29.4), without fglrx and vbox.
gShield is just a firewall/iptables script, but I disabled it anyway.
I tried the tip git tree, but I use reiserfs on the root filesystem, and during these weeks it always crashed while mounting the root partition; the kernel oops shows a crash in xattr (or similar) in reiserfs.
I forgot... of course, the MTRRs are still the same, and the machine still locks up when I try to disable entry 0.
For what it is worth, the MTRRdefType register says that the default type is UC, so that seems right:
MSR000002ff: 0000000000000c00
This latest run does not include enable_mtrr_cleanup.
Created attachment 21861 [details]
dmesg, notaint, single
New dmesg, this time with the missing option and booting in single-user mode, to exclude even more external interference.
Again, disable=0 locks up the system.
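The one-command-at-a-time procedure suggested earlier can be scripted so that, if the machine hangs, there is a durable record of which write caused it. This is a hypothetical helper, not part of mtrr-uncover: each command is appended to a log file and fsync'ed before it is written to /proc/mtrr in its own open/write/close, mirroring `echo "disable=N" >| /proc/mtrr`. It needs root, and as this thread shows, some of these writes can hard-lock the machine.

```python
import os


def apply_one_at_a_time(commands, target="/proc/mtrr", log_path="mtrr-steps.log",
                        confirm=lambda cmd: input(f"send {cmd!r}? [y/N] ").strip().lower() == "y"):
    """Write each command to `target` in its own write, logging it durably first.

    If the machine locks up, the last line of `log_path` names the command that
    was being sent. Stops at the first command the user declines.
    Returns the list of commands actually written.
    """
    sent = []
    with open(log_path, "a") as log:
        for cmd in commands:
            if not confirm(cmd):
                break
            log.write(cmd + "\n")
            log.flush()
            os.fsync(log.fileno())        # push the log line to disk before the risky write
            print("sending:", cmd, flush=True)
            with open(target, "w") as f:  # one open/write/close per command, like echo
                f.write(cmd + "\n")
            sent.append(cmd)
    return sent
```

For example, `apply_one_at_a_time(["disable=3", "disable=2"])` run as root from a console, with the command strings copied from what mtrr-uncover prints when run without --execute.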
Created attachment 21862 [details]
mtrr for 4G, notaint, single
The MTRRs didn't change, but here they are for completeness.
I will try to test 2.6.30 when I have more time.
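The MTRRdefType value quoted earlier (MSR000002ff: 0000000000000c00) can be decoded by hand. Per the Intel manual, bits 7:0 of IA32_MTRR_DEF_TYPE hold the default memory type, bit 10 (FE) enables the fixed-range MTRRs and bit 11 (E) enables MTRRs at all. A minimal sketch with hypothetical helper names; the input line format is simply the one quoted in this thread.

```python
# Memory-type encodings from the Intel SDM; other values are reserved.
MTRR_TYPES = {0: "UC (uncachable)", 1: "WC (write-combining)", 4: "WT (write-through)",
              5: "WP (write-protect)", 6: "WB (write-back)"}


def parse_msr_line(line):
    """Split a line like 'MSR000002ff: 0000000000000c00' into (msr_number, value)."""
    name, _, value = line.partition(":")
    return int(name.strip()[len("MSR"):], 16), int(value.strip(), 16)


def decode_mtrr_def_type(value):
    """Decode the fields of IA32_MTRR_DEF_TYPE (MSR 0x2ff)."""
    type_bits = value & 0xFF
    return {
        "default_type": MTRR_TYPES.get(type_bits, f"reserved ({type_bits:#x})"),
        "fixed_enabled": bool(value & (1 << 10)),  # FE: fixed-range MTRRs enabled
        "mtrrs_enabled": bool(value & (1 << 11)),  # E: MTRRs enabled at all
    }
```

Decoding 0xc00 gives default type UC with both FE and E set, which agrees with the reading above, so the "default type must be UC" check mentioned earlier does not look like what stops the cleanup here.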
It isn't clear from your message, but the attachment in #26 is a snapshot of /proc/mtrr. If you can set its MIME type to text, it would be easier to look at. No big deal.
I note that a write-combining MTRR is still present. Since I assume that X isn't running, I find this quite surprising (as I mentioned earlier).
Looking at the dmesg output in #25, the string "mtrr" only appears in a kernel parameter. It sure looks to me as if the mtrr-cleanup code isn't running. Why? Dropping in a few printk calls might help to narrow down what is going on.
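The observation that "mtrr" appears only in a kernel parameter can be checked on any saved dmesg. The sketch below (hypothetical helper names) separates command-line lines from every other line that mentions mtrr. It does not know the wording of the cleanup code's own messages, and messages logged at KERN_DEBUG may be missing from the log unless the kernel was booted with debug, as discussed above.

```python
def mtrr_mentions(dmesg_text):
    """Split dmesg lines containing 'mtrr' (any case) into command-line lines and the rest."""
    cmdline, other = [], []
    for line in dmesg_text.splitlines():
        lowered = line.lower()
        if "mtrr" not in lowered:
            continue
        # Matches both "Command line:" and "Kernel command line:" as seen in this thread.
        (cmdline if "command line:" in lowered else other).append(line)
    return cmdline, other


def cleanup_seems_silent(dmesg_text):
    """True when 'mtrr' appears only on command-line lines, i.e. nothing else mentions MTRRs."""
    cmdline, other = mtrr_mentions(dmesg_text)
    return bool(cmdline) and not other
```

For example, `cleanup_seems_silent(open("dmesg.4G").read())` on the attached logs; a True result is consistent with the cleanup code never running, but on its own it does not explain why.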