Most recent kernel where this bug did not occur: 2.6.13.3
Distribution: Debian
Hardware Environment: HP NX6125 laptop (AMD Mobile Sempron with ATI Radeon XPress 200M chipset)
Problem Description: PCI registers for K8 host bridge are not listed in lspci, nor in sysfs. Under 2.6.13.3, they were listed as:
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
Steps to reproduce: Run lspci.
Created attachment 6467 [details] dmesg output under 2.6.14
Created attachment 6468 [details] lspci output under 2.6.14
Created attachment 6469 [details] lspci output under 2.6.13.3
Created attachment 6470 [details] dmesg output under 2.6.13.3
Created attachment 6471 [details] kernel .config file (2.6.14)
Andi, any idea what could cause this?
It's related to the new resource allocation code from Linus et al.:
PCI: Cannot allocate resource region 7 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 8 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 9 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 7 of bridge 0000:00:05.0
PCI: Cannot allocate resource region 8 of bridge 0000:00:05.0
PCI: Cannot allocate resource region 9 of bridge 0000:00:05.0
But it works on other K8 systems, so it must be a bad BIOS interaction. The error message doesn't print out where the region actually is (that is probably a bug and needs to be fixed), so it is hard to tell what is actually wrong. It's a bit screwy that it completely hides the bridge in that case; that should probably be changed too. Looking at the code, it should only hide the resource, but maybe something down in lspci ignores bridges with no resources. Unfortunately there isn't a command line option to go back to the old behaviour to test the theory.
Please excuse me if these are irrelevant thoughts, but:
- The K8 bridge is on 0000:00:18.x, but the error messages are about 0000:00:04.0 and 0000:00:05.0. Can they be related?
- Both 2.6.14 and 2.6.13 (on which the problem doesn't occur) complain about resource regions 7, 8, 9 of 0000:00:04.0 and 0000:00:05.0.
- When diffing both dmesg outputs, the only seemingly relevant differences are "PCI: Using MMCONFIG" instead of "PCI: Using configuration type 1" and different addresses for some devices.
I think you're right and I got confused. The integrated host bridge cannot be accessed using mmconfig. There is a special table that is supposed to tell which busses are accessible. In 2.6.13 we had a workaround of just disabling mmconfig always on AMD platforms, but 2.6.14 is supposed to read that table and figure out which busses can use mmconfig and which ones cannot. So either that code in Linux is broken or your BIOS reports it incorrectly. Can you supply acpidmp output?
Created attachment 6543 [details] acpidump output
I don't see anything wrong with that MCFG entry, do you Andi?
I think it
So, this host bridge is definitely not accessible through mmconfig. However, the only MCFG entry claims that all busses (0x00 to 0xff) are mapped from 0xe0000000 to 0xefffffff. This seems wrong, since not all devices in bus 0 can be accessed this way. So I tried to change the accessible busses range to (0x01..0xff) by hacking the MCFG loading code, but this didn't solve the problem. Since bus 0 is no longer referenced in MCFG, Linux is supposed to fall back to the old port-based method to access this bus, right? But I could not find this fallback code. get_base_addr in mmconfig.c walks through the MCFG looking for the right bus, but when the search fails, it assumes the bus belongs to the first (and usually one and only) MCFG entry? I'm quite confused, maybe I got something wrong...
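To make the address range in the comment above concrete, here is a minimal userspace sketch of how a memory-mapped configuration address is derived from an MCFG entry. The struct and function names are hypothetical (they are not the kernel's), and the sketch assumes the entry's base address corresponds to bus 0, with 1 MiB of window per bus, 32 KiB per device and 4 KiB per function.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative subset of one MCFG allocation entry (hypothetical names). */
struct mcfg_entry {
    uint64_t base;      /* physical base of the config window (assumed: bus 0) */
    uint8_t  start_bus; /* first bus the entry claims to cover */
    uint8_t  end_bus;   /* last bus the entry claims to cover */
};

/* Size in bytes of the window described by the entry: 1 MiB per bus. */
static uint64_t mcfg_window_size(const struct mcfg_entry *e)
{
    return ((uint64_t)(e->end_bus - e->start_bus) + 1) << 20;
}

/* Physical address of config register `reg` of bus:dev.fn inside the window. */
static uint64_t mcfg_reg_addr(const struct mcfg_entry *e, unsigned bus,
                              unsigned dev, unsigned fn, unsigned reg)
{
    assert(bus >= e->start_bus && bus <= e->end_bus);
    assert(dev < 32 && fn < 8 && reg < 4096);
    return e->base + (((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
                      ((uint64_t)fn << 12) | reg);
}
```

With the single entry from this machine (base 0xe0000000, busses 0x00 to 0xff), the window is 0x10000000 bytes long, so it ends at 0xefffffff as reported, and the DRAM controller at 0000:00:18.2 would nominally sit at 0xe00c2000, even though the hardware does not actually respond there.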
Yes, exactly, this is the problem. First, the MCFG is wrong on these boxes, and then the mmconfig code is missing a fallback (among some other problems). I have a patch, but it needs a bit more testing before I can push it out.
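The missing fallback can be sketched roughly as follows. This is not the attached patch, only an illustration of the idea under stated assumptions: all names are hypothetical, and the per-slot bitmap for bus 0 is one assumed way of recording devices that were found not to respond through mmconfig. The lookup returns no entry when a bus is not covered, and the caller then uses the port-based type 1 method instead of assuming the first MCFG entry applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of one MCFG allocation entry (hypothetical names). */
struct mcfg_entry {
    uint64_t base;
    uint8_t  start_bus;
    uint8_t  end_bus;
};

enum cfg_method { CFG_MMCONFIG, CFG_TYPE1 };

/* Return the entry covering `bus`, or NULL if no entry covers it. */
static const struct mcfg_entry *mcfg_find(const struct mcfg_entry *tbl,
                                          size_t n, unsigned bus)
{
    for (size_t i = 0; i < n; i++)
        if (bus >= tbl[i].start_bus && bus <= tbl[i].end_bus)
            return &tbl[i];
    return NULL; /* do not silently fall back to tbl[0] */
}

/*
 * Choose the access method for bus:dev.
 * `bus0_unreachable` is an assumed bitmap: bit d set means slot d on bus 0
 * did not answer via mmconfig and must be accessed with type 1 cycles.
 */
static enum cfg_method pick_method(const struct mcfg_entry *tbl, size_t n,
                                   uint32_t bus0_unreachable,
                                   unsigned bus, unsigned dev)
{
    assert(dev < 32);
    if (mcfg_find(tbl, n, bus) == NULL)
        return CFG_TYPE1;
    if (bus == 0 && (bus0_unreachable & (UINT32_C(1) << dev)))
        return CFG_TYPE1;
    return CFG_MMCONFIG;
}
```

In this sketch, an entry hacked to cover only busses 0x01 to 0xff sends every bus 0 access to type 1, while an unmodified entry covering all busses still routes slot 0x18 to type 1 once its bit is set in the bitmap.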
Created attachment 6756 [details] implement mmconfig fallback
These two patches together should fix it.
Created attachment 6757 [details] Detect unreachable busses and handle them
Please report if these patches work for you guys. Thanks.
Thanks for the patch. Some changes were needed to make it work for i386 (locking...). It works fine for me, at least: all K8 devices/functions are correctly detected, and I can read and write the PCI registers of the K8 memory controller.
Created attachment 6760 [details] Fixes for i386
To be applied with the two other patches. Probably needs checking and testing...
Created attachment 6766 [details] Revised fix for both i386 and x86-64
- Calling a printk while holding a spinlock probably wasn't a good idea...
- In both i386 and x86-64 versions, the code seemed to be using a pointer to an array instead of the array itself (I wonder how it did work?...)
I don't think your locking changes are needed because the PCI initialization always runs single threaded.
Calling printk with spinlock is perfectly legal. What makes you think otherwise? I fixed the remaining problems on i386 and sent it off to Linus. Can't close the bug - Greg please close.
I tried Andi
> Calling printk with spinlock is perfectly legal. What makes you
> think otherwise?

I don't really know the internals of printk, so in doubt I thought it was better not to take the risk of a potential deadlock... (anyway, shouldn't there be as little code as possible in any critical section?) But since there is no need for locking at all, it's all right :) Thanks.
printk prevents reentrance of itself (or rather of the parts of itself that might require locking, which is the console output etc.), which eliminates most possible deadlocks. The only spinlocks that must not be held while calling printk are the scheduler locks, because printk may need to take them to wake up klogd.
Thanks for the explanation. I would never have thought reporting bugs was that instructive :)
This should be fixed in 2.6.15, right? If not, please reopen.