Bug 7917
Summary: | "PCI: Failed to allocate mem resource" for PCI E-to-PCI Bridge | ||
---|---|---|---|
Product: | Drivers | Reporter: | richlv |
Component: | PCI | Assignee: | Jesse Barnes (jbarnes) |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | normal | CC: | akpm, alan, garyhade, greg, protasnb, stephan.klein |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.19.2 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
dmesg output
output of lspci -vvxxx
/proc/iomem
Output of dmesg after bootup.
Output of lspci on 2.6.24-1
dmesg output after bootup with the 2.6.24-16 ubuntu generic kernel |
Description
richlv
2007-02-01 02:14:51 UTC
Rich, Can you please provide a boot trace? Does the system boot up? And have you tried latest kernel (2.6.22-rc7)? Thanks.
Created attachment 11976 [details]
dmesg output
i guess 'boot trace' is dmesg output right ? if so, one from 2.6.20.3 is attached (which is the latest running on the machine).
yes, machine boots up successfully.
Created attachment 12198 [details]
output of lspci -vvxxx
attaching output of 'lspci -vvxxx' and contents of /proc/iomem, as requested by email.
Created attachment 12199 [details]
/proc/iomem
forgot to add : both of these are when running 2.6.22.1
It _looks_ like the BIOS may have configured both devices 00:1c.0 PCI bridge (bridging to bus #6) and 01:00.0 PCI bridge (bridging to bus #2) to have Memory behind bridge: dd200000-dd2fffff and the first bridge (00:1c) gets it, and then the second bridge (01:00) quite reasonably gets a resource allocation error.
Now, the thing is, that the 00:1c device is *not* a bridge for the 01:xx bus that the 01:00.0 bridge is on (it's a bridge to the 06:xx bus), so that BIOS allocation really wasn't right, afaik. The 01:00.0 PCI bridge is actually behind the 00:02.0 PCI bridge.
That's quite confusing, and it appears to be made much worse by the fact that those 00:xx bridges are probably transparent PCI bridges (which is quite normal for Intel core bridges) even though they say "normal decode".
So I think the BIOS made a mess of it, and Linux complains a bit, but everything is likely to work. It _is_ confusing, but I don't think we can fix it up any better without re-programming all the bridges (which is actually pretty hard, and likely to fail more often than fix anything up, since PCI bridges almost always have hidden regions that they bridge etc).
IOW, the only thing I can think of is to remove the warning message, but at the same time, that warning message actually can be very useful for the cases where the end result really _is_ so messed up that something breaks.
Rich - it looks like everything actually works well, no? All the devices do actually get resource allocations, both the MegaRAID controllers behind that confusing bus #2, _and_ the e1000 ethernet behind bus #6. So I would suggest we close this as "apparently confused BIOS, but Linux works".
yes, the system appears to be working just fine (though we haven't tried populating all slots or other things). would it make sense to report this to the bios vendor ?
Greg, could you please consider removing that message, or rephrasing it in some manner so that it is less alarming?
The message is now gone in 2.6.23.
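The conflict described in the analysis above (two bridges that are not parent and child both claiming dd200000-dd2fffff) can be checked mechanically. The following is only a minimal sketch: the helper names `parse_window`, `ancestors` and `conflicting_windows` are hypothetical, the topology table is typed in by hand from the comments in this report, and the only input format assumed is the `Memory behind bridge:` line that `lspci -vv` prints.

```python
import re
from itertools import combinations

# Matches lspci -vv lines such as "Memory behind bridge: dd200000-dd2fffff".
WINDOW_RE = re.compile(r"Memory behind bridge:\s*([0-9a-f]+)-([0-9a-f]+)", re.I)


def parse_window(line):
    """Return (start, end) of a bridge memory window, or None if absent."""
    m = WINDOW_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def ancestors(dev, parent_of):
    """Collect every upstream bridge of dev by walking the parent table."""
    seen = set()
    while parent_of.get(dev) is not None and parent_of[dev] not in seen:
        dev = parent_of[dev]
        seen.add(dev)
    return seen


def conflicting_windows(windows, parent_of):
    """List bridge pairs whose windows overlap although neither is upstream
    of the other. A child window nested in its parent's window is normal."""
    conflicts = []
    for (a, (a_lo, a_hi)), (b, (b_lo, b_hi)) in combinations(sorted(windows.items()), 2):
        overlap = a_lo <= b_hi and b_lo <= a_hi
        related = a in ancestors(b, parent_of) or b in ancestors(a, parent_of)
        if overlap and not related:
            conflicts.append((a, b))
    return conflicts


# Topology as described in this report: 01:00.0 sits behind 00:02.0,
# while 00:1c.0 is a separate root-bus bridge leading to bus #6.
parent_of = {"00:1c.0": None, "00:02.0": None, "01:00.0": "00:02.0"}
windows = {
    "00:1c.0": parse_window("Memory behind bridge: dd200000-dd2fffff"),
    "01:00.0": parse_window("Memory behind bridge: dd200000-dd2fffff"),
}
print(conflicting_windows(windows, parent_of))
```

The ancestor check matters because a downstream bridge is expected to have its window inside the window of the bridge above it; only overlaps between unrelated bridges point at a firmware assignment problem like the one discussed here.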
Closing this bug.
I don't want to break your fun, but the messages are still (again?) visible in 2.6.24. This was observed with the development version of Ubuntu (Hardy) using package version 2.6.24-1-generic (on i386). See the corresponding Launchpad entry (https://bugs.launchpad.net/ubuntu/+source/linux-meta/+bug/159241) and the output of lspci and dmesg right after bootup (that I will attach shortly). Unfortunately I can't reopen this bug, maybe someone else can.
Created attachment 14020 [details]
Output of dmesg after bootup.
Created attachment 14021 [details]
Output of lspci on 2.6.24-1
Greg, the message is still there in arch/x86/pci/acpi.c. Since there is a theoretical chance that a resource conflict might have real consequences (which never happened in my experience, so far), how about making this printk to be of debug level?
Downstream lowered the printk to warning. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159241
See also the commits that actually lowered the messages:
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=e8b305b993f52c3d628f0c60ed9fd229533b64bb
http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=913b8ab610f9ef2b141ddcf89cc9afd8ef7ea99d
Thanks for the pointer, Stephan. I'll see if we can't do something smarter here, otherwise we'll clean up the warnings.
You're welcome. Cleaning up the warnings would be the quick fix for this and prevent a lot of people from complaining. But it won't repair the problem itself. The thing is that this allocation thing takes up roughly 10 seconds before the booting process can continue. Just look at this bootchart (http://img126.imageshack.us/img126/7234/hardy200804011ek6.png). There are about 10 seconds before busybox actually starts. This produces a very awkward silence (if the warnings are "hidden").
Are you sure it's the resource allocation taking so long? Can you try booting with 'initcall_debug'?
Created attachment 16010 [details]
dmesg output after bootup with the 2.6.24-16 ubuntu generic kernel
I've created a log right after bootup with initcall_debug set, as you requested. Please let me know if you need anything else.
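For readers who want to scan such a log themselves, here is a rough sketch that picks out slow initcalls. It assumes the "initcall <address> ran for <N> msecs: <symbol>" line format seen in the dmesg attached to this report; other kernel versions may print a different format, so the pattern would need adjusting. The function name `slow_initcalls` and the 1000 ms default threshold are arbitrary choices, not anything the kernel defines.

```python
import re

# Line format assumed from this report's dmesg, for example:
#   [ 43.162989] initcall 0xc021ef50 ran for 7627 msecs: pci_init+0x0/0x30()
RAN_RE = re.compile(r"initcall\s+(0x[0-9a-f]+)\s+ran for\s+(\d+)\s+msecs:\s+(\S+)")


def slow_initcalls(log_text, threshold_ms=1000):
    """Return (symbol, msecs) pairs at or above threshold_ms, slowest first."""
    hits = []
    for m in RAN_RE.finditer(log_text):
        msecs = int(m.group(2))
        if msecs >= threshold_ms:
            hits.append((m.group(3), msecs))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python slow_initcalls.py
    for symbol, msecs in slow_initcalls(sys.stdin.read()):
        print(f"{msecs:>8} ms  {symbol}")
```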
Wow, yeah look at that, pci_init is taking >7s... that's bad.
[ 35.169739] Calling initcall 0xc021ef50: pci_init+0x0/0x30()
[ 43.162816] 0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
[ 43.162952] Boot video device is 0000:01:00.0
[ 43.162968] PCI: Firmware left 0000:08:08.0 e100 interrupts enabled, disabling
[ 43.162981] initcall 0xc021ef50: pci_init+0x0/0x30() returned 0.
[ 43.162989] initcall 0xc021ef50 ran for 7627 msecs: pci_init+0x0/0x30()
I noticed that when I boot grub into Fedora Core 8 with kernel-2.6.25.6-27.fc8, I don't generate that pci-allocation message, however, when I boot into my custom kernel on the same machine, tried 2.6.25.6 and 2.6.25.8, I do get the allocation failure message. IMHO, that suggests something in the config, not the hardware or the kernel code. Now all I have to do is diff the config files to find the config line that generates the pci mem allocation failure. Fedora seems to use a config with all of the modules enabled, so the search will take a while. At least I've narrowed it down to a compile option.
2.6.25.10 on the original system for this report still shows the message.
henry, aren't there also some patches to that fedora kernel that could impact the message ?
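The config comparison proposed above can be narrowed with a small script instead of a raw textual diff, which tends to be noisy because of option ordering and comments. This is a sketch only: `load_config` and `diff_configs` are hypothetical helper names, the two sample configs are made up for illustration and are not taken from any machine in this report, and an option missing from a file is treated the same as "is not set", which is a simplification.

```python
def load_config(text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    suffix = " is not set"
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    return {
        name: (a.get(name, "n"), b.get(name, "n"))
        for name in sorted(set(a) | set(b))
        if a.get(name, "n") != b.get(name, "n")
    }


# Illustrative input only; real use would read the two .config files from disk.
distro = "CONFIG_PCI=y\nCONFIG_HOTPLUG_PCI=m\n"
custom = "CONFIG_PCI=y\n# CONFIG_HOTPLUG_PCI is not set\n"

for name, (left, right) in diff_configs(load_config(distro), load_config(custom)).items():
    print(f"{name}: distro={left} custom={right}")
```

As the last comment points out, distribution kernels may also carry patches, so a config difference found this way is a candidate to test rather than proof of the cause.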