Bug 23542
Description
Bjorn Helgaas
2010-11-22 17:12:27 UTC
Created attachment 37892 [details]
/proc/iomem diff (-good, +bad)
Here's the interesting bit:
fee01000-ffffffff : PCI Bus 0000:00
+ ff900000-ffafffff : PCI Bus 0000:03
+ ffb00000-ffcfffff : PCI Bus 0000:02
+ ffd00000-ffefffff : PCI Bus 0000:01
fffa0000-fffa6fff : reserved
+ fffff000-ffffffff : Intel Flush Page
Putting the "Intel Flush Page" at the very last page before 4G is not
going to work. That page contains the ROM processor restart vector,
so I don't think accesses to it will go to the PCI bus.
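For illustration only (not part of any attachment here): a minimal Python sketch of the check implied above. It assumes the standard x86 reset vector address 0xFFFFFFF0 and uses a hypothetical helper, `parse_iomem_line`, to read "start-end : name" lines as they appear in /proc/iomem.

```python
# The x86 processor fetches its first instruction after reset from this address.
RESET_VECTOR = 0xFFFFFFF0


def parse_iomem_line(line):
    """Split one /proc/iomem line into (start, end, name); end is inclusive.

    Hypothetical helper for this sketch, not a kernel or library API.
    """
    span, name = line.strip().split(" : ", 1)
    start, end = (int(part, 16) for part in span.split("-"))
    return start, end, name


def covers_reset_vector(line):
    """True if the range on this line contains the reset vector address."""
    start, end, _ = parse_iomem_line(line)
    return start <= RESET_VECTOR <= end


# Two of the entries added in the "bad" listing above.
bad_entries = [
    "ff900000-ffafffff : PCI Bus 0000:03",
    "fffff000-ffffffff : Intel Flush Page",
]
flagged = [parse_iomem_line(e)[2] for e in bad_entries if covers_reset_vector(e)]
print(flagged)
```

Only the Intel Flush Page entry lands on the page holding the reset vector; the bridge windows sit lower in the same host bridge window.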
Created attachment 37902 [details]
patch to avoid PCI allocations in 1M below 4GB
Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com>
Created attachment 37972 [details]
v2 patch to avoid PCI allocations in 1M below 4GB
The previous version would make host bridge windows above 4GB completely
unusable.
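A rough sketch of the idea behind the v2 fix, written in Python rather than taken from the attached patch: trim only the part of a range that falls in the 1MB just below 4GB, and leave anything at or above 4GB alone. The function name `usable_pieces` and the example window above 4GB are assumptions made for this illustration.

```python
FOUR_GB = 0x1_0000_0000
AVOID_START = FOUR_GB - 0x10_0000  # 0xfff00000: start of the 1MB below 4GB


def usable_pieces(start, end):
    """Return the inclusive sub-ranges of [start, end] outside the avoided 1MB.

    Hypothetical helper for this sketch. A range entirely above 4GB comes
    back unchanged, which is the property the first patch version lacked.
    """
    pieces = []
    if start < AVOID_START:
        # Keep whatever lies below the avoided region.
        pieces.append((start, min(end, AVOID_START - 1)))
    if end >= FOUR_GB:
        # Keep whatever lies at or above 4GB.
        pieces.append((max(start, FOUR_GB), end))
    return pieces


# Host bridge window from this machine: the top 1MB is trimmed off.
print([(hex(s), hex(e)) for s, e in usable_pieces(0xFEE01000, 0xFFFFFFFF)])

# Made-up window above 4GB: it must stay fully usable.
print([(hex(s), hex(e)) for s, e in usable_pieces(0x2_0000_0000, 0x2_FFFF_FFFF)])
```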
References: https://patchwork.kernel.org/patch/365092/

New information: there is an ACPI INT0800 device that is using the
0xff000000-0xffffffff range:
[mjg59@2530p 00:03]$ cat id
INT0800
[mjg59@2530p 00:03]$ cat resources
state = active
mem 0xff000000-0xffffffff
That's enough to tell us we shouldn't put any PCI devices in that range.
(It's still sort of dubious that the PNP0A08:00 window:
PNP0A08:00: host bridge window [mem 0xfee01000-0xffffffff]
overlaps the INT0800 range, but Windows must handle that, so we'll have to
manage it somehow, too.)
Unfortunately, Linux ignores the resources used by ACPI devices (except
PNP0C01 and PNP0C02 motherboard devices), so the INT0800 device didn't
save us in this case. This is a Linux defect, but it's too complicated to
fix right now.

I'm still getting this error on my 2530p with 2.6.36.1-10.fc15.x86_64,
which contains the above patch; I've got 4GB of RAM in this machine.
Before the fixes for this, the graphics would corrupt and a reboot would
occur within 30 seconds of logging into the desktop. With the 2010-11-23
patch it happens about 5 minutes after doing so. mjg59 says he only has
2GB of RAM, which could be why he's not seeing the problem anymore but I
still am.

Thanks a lot for testing this, Dan. Can you attach your
2.6.36.1-10.fc15.x86_64 dmesg please? I just want to double-check that
the patch is doing what I expected.
If it's not too much trouble, you might also try it with the 0xfff00000
changed to 0xffe00000, since it sounds like that might be safer for
really old machines. But I doubt that 0xffe00000 will make a difference;
we might just have to bite the bullet and figure out how to avoid all
ACPI devices.

mjg59 asked me for /proc/iomem diffs.
'bad' == 2.6.36.1-10.fc15.x86_64
'good' == 2.6.36-4.fc15 with CRS patch reverted
[dcbw@dcbw ~]$ diff -u badiomem.txt goodiomem.txt
--- badiomem.txt 2010-12-03 09:55:39.668107911 -0600
+++ goodiomem.txt 2010-12-03 09:53:06.616013000 -0600
@@ -4,9 +4,9 @@
 000a0000-000bffff : PCI Bus 0000:00
 000ef000-000fffff : reserved
 00100000-b4f2efff : System RAM
- 01000000-0146d3bc : Kernel code
- 0146d3bd-01b7c61f : Kernel data
- 01c6f000-01e3cf07 : Kernel bss
+ 01000000-0146c33c : Kernel code
+ 0146c33d-01b7c59f : Kernel data
+ 01c6f000-01e3cf47 : Kernel bss
 b4f2f000-b4f30fff : reserved
 b4f31000-b5d6ffff : System RAM
 b5d70000-b5d7ffff : ACPI Non-volatile Storage
@@ -48,6 +48,10 @@
 d4828000-d4828fff : 0000:00:19.0
 d4828000-d4828fff : e1000e
 d4829000-d4829fff : 0000:00:03.3
+ d482a000-d482afff : Intel Flush Page
+ d4900000-d4afffff : PCI Bus 0000:01
+ d4b00000-d4cfffff : PCI Bus 0000:02
+ d4d00000-d4efffff : PCI Bus 0000:03
 e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
 e0000000-efffffff : reserved
 e0000000-efffffff : pnp 00:01
@@ -68,9 +72,5 @@
 fee00000-fee00fff : Local APIC
 fee00000-fee00fff : reserved
 fee01000-ffffffff : PCI Bus 0000:00
- ff800000-ff9fffff : PCI Bus 0000:03
- ffa00000-ffbfffff : PCI Bus 0000:02
- ffc00000-ffdfffff : PCI Bus 0000:01
- ffe7f000-ffe7ffff : Intel Flush Page
 ffe80000-ffffffff : reserved
 100000000-13bffffff : System RAM
Created attachment 38952 [details]
dmesg diff (-good, +bad)
Created attachment 38962 [details]
full dmesg from bad kernel 2.6.36.1-10.fc15
Let me know if there's anything more you need, thanks!
Created attachment 39142 [details]
/proc/iomem from a good kernel (2.6.36-4.fc15)
Created attachment 39262 [details]
patch to avoid allocating PNP resources
Linus thought it was ugly to hack up pcibios_align_resource() to avoid
regions, and as usual, he's right. Here's another try that takes a little
different approach -- instead of avoiding the hardcoded 2MB range just
below 4GB, this avoids all PNP device resources. The INT0800 device
should be enough for this particular issue.
This is still ugly in that the PNP devices should be directly in the
iomem_resource tree like PCI devices are, but we can't do that quite yet.
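To make the "avoid all PNP device resources" idea concrete, here is a small Python sketch (not the attached patch, and not how the kernel's resource tree is actually implemented) that subtracts reserved ranges from a host bridge window. The function name `subtract_reserved` is made up for this example; the two ranges are the PNP0A08:00 window and the INT0800 resource quoted earlier in this bug.

```python
def subtract_reserved(window, reserved):
    """Return the inclusive sub-ranges of window not covered by any reserved range.

    Hypothetical helper for this sketch; all ranges are (start, end) tuples
    with an inclusive end, like /proc/iomem entries.
    """
    free = [window]
    for r_start, r_end in reserved:
        next_free = []
        for start, end in free:
            if r_end < start or r_start > end:
                # No overlap: keep the free range as it is.
                next_free.append((start, end))
                continue
            if start < r_start:
                # Part of the free range lies below the reserved range.
                next_free.append((start, r_start - 1))
            if r_end < end:
                # Part of the free range lies above the reserved range.
                next_free.append((r_end + 1, end))
        free = next_free
    return free


host_bridge_window = (0xFEE01000, 0xFFFFFFFF)  # PNP0A08:00 window
int0800 = (0xFF000000, 0xFFFFFFFF)             # INT0800 device resource

available = subtract_reserved(host_bridge_window, [int0800])
print([(hex(s), hex(e)) for s, e in available])
```

With only the INT0800 range removed, what remains of this window is 0xfee01000-0xfeffffff, so nothing can be placed in the top 16MB below 4GB.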
Ok, I'll try to backport this to current Fedora kernels (2.6.36.1 + the
CRS patch + the "never allocate pci from the last 1M below 4G" patch) and
see where I get.

Just to be clear, my current proposed patches do not include a "never
allocate pci from the last 1M below 4G" patch. The patch in comment 14
avoids PNP resources, which should be enough to solve the 2530p problem
because there's an INT0800 device that claims 0xff000000-0xffffffff.
Sorry for the backporting mess; there are many patches between 2.6.36.1
and now. If you want to point me at the patches you're applying to
2.6.36.1, I'd be happy to give you my opinions.

Yeah, I found the "never allocate from last 1M" patch never got upstream,
but it was applied to Fedora 2.6.36 kernels to fix the 2530p issue
originally (and it does for mjg59's machine with only 2G). I haven't yet
booted the kernel I built, but I'll attach the patches that are applied.
In the end I think we do want to backport this to 2.6.36 and send to
stable@ if we can.
Created attachment 39622 [details]
Fedora 2.6.36 kernel patch fixing original 2530p issue
Created attachment 39632 [details]
Fedora 2.6.36 kernel CRS fixes patch that caused the problem originally
Created attachment 39642 [details]
My attempt to backport your PNP avoidance patch to the Fedora 2.6.36.2 kernel on top of the two above patches
Hmm, looking at my backport this morning I missed a bit in
pcibios_align_resource() that should have gotten moved to
arch_remove_reservations(). Other than that, comments on my backport
attempt?

I think you should drop the patch from comment 18. I don't plan to push
that upstream. If we decide we do need to ignore that last 1M or 2M, I'll
put that code in arch_remove_reservations() instead of in
pcibios_align_resource().
You mentioned the code that moved from pcibios_align_resource() to
arch_remove_reservations(); that change *is* in your comment 20 patch.
But of course, that patch will change a little bit if you drop the
comment 18 patch.

So I've run at least an hour with the patches posted above and had no
issues so far. Without the recent PNP resource patch things would fail
within 10 minutes. I'll keep running with this kernel for a day or two
before we declare unconditional success :) Thanks! I'll let you know.

Still working fine after 6 hours, no issues so far. I think it's fixed.

Which patches have you tested? Just so we don't lose track.
Created attachment 40142 [details]
v3 patch to avoid top 1M below 4G
It's too late in the .37 cycle to consider the previous patch.
Here's the patch Linus proposed, an alternate way to avoid that
last 1MB. Can anybody test it?
Created attachment 40292 [details]
SeaBIOS/qemu test patch
This is just for documentation. Booting Windows 7 on qemu
with this SeaBIOS patch shows that Windows ignored the last
1MB of memory below 4GB, even if it's not mentioned in the
E820 table or any ACPI device _CRS method.
Created attachment 40462 [details]
avoid E820 regions
Boy, I hope we converge on a solution here soon...
The current proposal for .37 is to revert the top-down allocation
patches and add this additional patch.
If you're based on 2.6.36, you would want to add these patches, which
went in after .36 and will stay in .37:
a1862e3 resources: handle overflow when aligning start of available area
6909ba1 resources: ensure callback doesn't allocate outside available space
5d6b1fa resources: factor out resource_clip() to simplify find_resource()
a9cea01 resources: add a default alignf to simplify find_resource()
and add the patch I'm attaching here.