Kernel Bug Tracker – Bug 5550
AMD K8 host bridge not detected
Last modified: 2006-01-04 14:41:36 UTC
Most recent kernel where this bug did not occur: 2.6.13
Hardware Environment: HP NX6125 laptop (AMD Mobile Sempron with ATI Radeon
XPress 200M chipset)
The PCI registers for the K8 host bridge are listed neither by lspci nor in sysfs.
Under 2.6.13, they were listed as:
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Steps to reproduce:
Created attachment 6467 [details]
dmesg output under 2.6.14
Created attachment 6468 [details]
lspci output under 2.6.14
Created attachment 6469 [details]
lspci output under 2.6.13
Created attachment 6470 [details]
dmesg output under 2.6.13
Created attachment 6471 [details]
kernel .config file (2.6.14)
Andi, any idea what could cause this?
It's related to the new resource allocation code from Linus et al.:
PCI: Cannot allocate resource region 7 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 8 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 9 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 7 of bridge 0000:00:05.0
PCI: Cannot allocate resource region 8 of bridge 0000:00:05.0
PCI: Cannot allocate resource region 9 of bridge 0000:00:05.0
But it works on other K8 systems, so it must be a bad BIOS interaction.
The error message doesn't print out where the region actually is
(that is probably a bug and needs to be fixed), so it is hard to tell
what is actually wrong. It's a bit screwy that it completely hides the bridge
in that case; that should probably be changed too.
Looking at the code, it should actually only hide the resource, but maybe
something down in lspci ignores bridges with no resource.
Unfortunately there isn't a command line option to go back to the old behaviour
to test the theory.
Please excuse me if these are irrelevant thoughts, but:
- The K8 bridge is on 0000:00:18.x, but the error messages are about 0000:00:04.0
and 0000:00:05.0. Can they be related?
- Both 2.6.14 and 2.6.13 (on which the problem doesn't occur) complain about
resource regions 7,8,9 of 0000:00:04.0 and 0000:00:05.0.
- When diffing both dmesg outputs, the only seemingly relevant differences are
"PCI: Using MMCONFIG" instead of "PCI: Using configuration type 1" and different
addresses for some devices.
I think you're right and I got confused.
The integrated host bridge cannot be accessed using mmconfig. There is a
special table that is supposed to tell which busses are accessible.
In 2.6.13 we had a workaround of just always disabling mmconfig on AMD
platforms, but 2.6.14 is supposed to read that table and figure out
which busses can use mmconfig and which ones cannot.
So either that code in Linux is broken or your BIOS reports it incorrectly.
Can you supply acpidmp output?
Created attachment 6543 [details]
I don't see anything wrong with that MCFG entry, do you Andi?
I think it
So, this host bridge is definitely not accessible through mmconfig.
However, the only MCFG entry claims that all busses (0x00 to 0xff) are mapped
from 0xe0000000 to 0xefffffff. This seems wrong, since not all devices in bus 0
can be accessed this way.
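The size of the range quoted above is at least self-consistent, which can be checked with a little arithmetic. The following is only a minimal sketch, assuming the standard PCI Express enhanced configuration layout (1 MiB per bus, 4 KiB per function); the function names are made up for illustration and are not taken from the kernel source.

```c
#include <stdint.h>

/* Each bus occupies 1 MiB of MMCONFIG space:
 * 32 devices x 8 functions x 4 KiB of config space per function. */
#define MMCFG_BYTES_PER_BUS (1u << 20)

/* Size of the window needed to cover busses start_bus..end_bus inclusive. */
static uint64_t mmcfg_window_size(unsigned start_bus, unsigned end_bus)
{
    return (uint64_t)(end_bus - start_bus + 1) * MMCFG_BYTES_PER_BUS;
}

/* Memory address of one config register inside the window. */
static uint64_t mmcfg_reg_addr(uint64_t base, unsigned bus, unsigned dev,
                               unsigned fn, unsigned reg)
{
    return base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15)
                + ((uint64_t)fn << 12) + (reg & 0xfffu);
}
```

With busses 0x00 to 0xff the window is 0x10000000 bytes, so a base of 0xe0000000 ends at 0xefffffff as the MCFG entry says. Under the same layout, function 0000:00:18.0 would sit at 0xe00c0000, which is exactly the address that does not reach the integrated host bridge.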
So I tried to change the accessible busses range to (0x01..0xff) by hacking the
MCFG loading code, but this didn't solve the problem.
Since bus 0 is no longer referenced in MCFG, Linux is supposed to fall back to
the old port-based method to access this bus, right?
But I could not find this fallback code.
get_base_addr in mmconfig.c walks through the MCFG looking for the right bus,
but when the search fails, it assumes the bus belongs to the first (and usually
one and only) MCFG entry?
I'm quite confused, maybe I got something wrong...
Yes, this is exactly the problem. First, the MCFG is wrong on these
boxes, and second, the mmconfig code is missing a fallback
(among some other problems).
I have a patch, but it needs a bit more testing before I can push it out.
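The missing fallback discussed above can be illustrated roughly as follows. This is a simplified sketch and not the attached patches: the structure, the function names and the "return 0 when no entry matches" convention are all assumptions made for the example. The point is only that a failed lookup should select the port-based type 1 method instead of silently reusing the first MCFG entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one MCFG entry. */
struct mcfg_entry {
    uint64_t base;      /* start of the MMCONFIG window */
    uint8_t  start_bus; /* first bus covered by this entry */
    uint8_t  end_bus;   /* last bus covered by this entry */
};

enum cfg_method { CFG_MMCONFIG, CFG_TYPE1 };

/* Return the MMCONFIG base for a bus, or 0 if no entry covers it. */
static uint64_t lookup_base(const struct mcfg_entry *tbl, size_t n,
                            unsigned bus)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (bus >= tbl[i].start_bus && bus <= tbl[i].end_bus)
            return tbl[i].base;
    return 0; /* do not guess: the caller has to fall back */
}

/* Pick the access method for a bus; uncovered busses use type 1. */
static enum cfg_method pick_method(const struct mcfg_entry *tbl, size_t n,
                                   unsigned bus)
{
    return lookup_base(tbl, n, bus) ? CFG_MMCONFIG : CFG_TYPE1;
}

/* Value written to port 0xCF8 for a configuration type 1 access. */
static uint32_t type1_addr(unsigned bus, unsigned dev, unsigned fn,
                           unsigned reg)
{
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8)
                       | (reg & 0xfcu);
}
```

With a table hacked to cover only busses 0x01 to 0xff, as tried above, a lookup for bus 0 finds nothing and the sketch selects type 1, so 0000:00:18.0 would be addressed as 0x8000c000 through port 0xCF8.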
Created attachment 6756 [details]
implement mmconfig fallback
These two patches together should fix it.
Created attachment 6757 [details]
Detect unreachable busses and handle them
Please report if these patches work for you guys. Thanks.
Thanks for the patch.
Some changes were needed to make it work for i386 (locking...)
Works fine for me, at least... All K8 devices/functions are correctly detected,
and I can read and write the PCI registers of the K8 memory controller.
Created attachment 6760 [details]
Fixes for i386
To be applied with the two other patches.
Probably needs checking and testing...
Created attachment 6766 [details]
Revised fix for both i386 and x86-64
- Calling a printk while holding a spinlock probably wasn't a good idea...
- In both the i386 and x86-64 versions, the code seemed to be using a pointer to
an array instead of the array itself (I wonder how it worked at all...)
I don't think your locking changes are needed because the PCI initialization
always runs single threaded.
Calling printk with spinlock is perfectly legal. What makes you think otherwise?
I fixed the remaining problems on i386 and sent it off to Linus.
Can't close the bug - Greg please close.
I tried Andi
> Calling printk with spinlock is perfectly legal. What makes you
> think otherwise?
I don't really know the internals of printk, so when in doubt I thought it was
better not to take the risk of a potential deadlock... (anyway, shouldn't there
be as little code as possible in any critical section?)
But since there is no need for locking at all, it's all right :)
printk prevents reentrance of itself (or rather of the parts of itself
that might require locking, such as the console output), which eliminates
most possible deadlocks.
The only spinlocks that must not be held while calling printk are the scheduler
locks, because printk may need to take them to wake up klogd.
Thanks for the explanation.
I would never have thought reporting bugs was that instructive :)
This should be fixed in 2.6.15, right?
If not, please reopen.