Bug 5550

Summary: AMD K8 host bridge not detected
Product: Drivers
Reporter: Sylvain Collange (poubelle.scollange)
Component: PCI
Assignee: Greg Kroah-Hartman (greg)
Status: RESOLVED CODE_FIX
Severity: normal
CC: akpm, andi-bz, bertro_simul
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.14
Subsystem:
Regression: ---
Bisected commit-id:
Bug Depends on:
Bug Blocks: 5829
Attachments: dmesg output under 2.6.14
lspci output under 2.6.14
lspci output under 2.6.13.3
dmesg output under 2.6.13.3
kernel .config file (2.6.14)
acpidump output
implement mmconfig fallback
Detect unreachable busses and handle them
Fixes for i386
Revised fix for both i386 and x86-64

Description Sylvain Collange 2005-11-04 04:25:17 UTC
Most recent kernel where this bug did not occur: 2.6.13.3

Distribution: Debian

Hardware Environment: HP NX6125 laptop (AMD Mobile Sempron with ATI Radeon
XPress 200M chipset)

Problem Description:
PCI registers for K8 host bridge are not listed in lspci, nor in sysfs.

Under 2.6.13.3, they were listed as:
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Address Map
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
DRAM Controller
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Miscellaneous Control

Steps to reproduce:
Run lspci.
Comment 1 Sylvain Collange 2005-11-04 04:26:24 UTC
Created attachment 6467 [details]
dmesg output under 2.6.14
Comment 2 Sylvain Collange 2005-11-04 04:27:38 UTC
Created attachment 6468 [details]
lspci output under 2.6.14
Comment 3 Sylvain Collange 2005-11-04 04:28:20 UTC
Created attachment 6469 [details]
lspci output under 2.6.13.3
Comment 4 Sylvain Collange 2005-11-04 04:32:08 UTC
Created attachment 6470 [details]
dmesg output under 2.6.13.3
Comment 5 Sylvain Collange 2005-11-04 04:37:28 UTC
Created attachment 6471 [details]
kernel .config file (2.6.14)
Comment 6 Andrew Morton 2005-11-11 01:19:35 UTC
Andi, any idea what could cause this?
Comment 7 Andi Kleen 2005-11-11 05:34:56 UTC
It's related to the new resource allocation code from Linus et al.:

PCI: Cannot allocate resource region 7 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 8 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 9 of bridge 0000:00:04.0
PCI: Cannot allocate resource region 7 of bridge 0000:00:05.0
PCI: Cannot allocate resource region 8 of bridge 0000:00:05.0
PCI: Cannot allocate resource region 9 of bridge 0000:00:05.0

But it works on other K8 systems, so it must be a bad BIOS interaction.
The error message doesn't print out where the region actually is
(that is probably a bug and needs to be fixed), so it is hard to tell
what is actually wrong. It's a bit screwy that it completely hides the bridge
then - that should probably be changed too.
Looking at the code, it should only hide the resource, but maybe
something down in lspci ignores bridges with no resources.


Unfortunately there isn't a command line option to go back to the old behaviour
to test the theory.
Comment 8 Sylvain Collange 2005-11-11 09:00:51 UTC
Please excuse me if these are irrelevant thoughts, but:

- K8 bridge is on 0000:00:18.x, but error messages are about 0000:00:04.0 and
0000:00:05.0. Can they be related?

- Both 2.6.14 and 2.6.13 (on which the problem doesn't occur) complain about
resource regions 7,8,9 of 0000:00:04.0 and 0000:00:05.0.

- When diffing both dmesg outputs, the only seemingly relevant differences are
"PCI: Using MMCONFIG" instead of "PCI: Using configuration type 1" and different
addresses for some devices.
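
For readers unfamiliar with the two access methods named in those dmesg lines, here is a minimal sketch of how each one addresses a register. It is an illustration only, not kernel code: the function names are hypothetical, and the bit layouts are the conventional ones for the port-based type 1 mechanism (address written to I/O port 0xCF8) and for a memory-mapped config window (4 KiB per function).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value written to I/O port 0xCF8 for a
 * "configuration type 1" access. Bit 31 enables the cycle and the
 * register offset is rounded down to a dword boundary. */
static uint32_t type1_config_address(unsigned bus, unsigned dev,
                                     unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFCu);
}

/* Hypothetical helper: byte offset of the same register inside a
 * memory-mapped (MMCONFIG) window, where every function owns 4 KiB. */
static uint32_t mmconfig_offset(unsigned bus, unsigned dev,
                                unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return (bus << 20) | (dev << 15) | (fn << 12) | reg;
}
```

For the DRAM controller function at 0000:00:18.2, register 0, the first helper yields 0x8000C200 and the second yields offset 0xC2000 from the window base.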
Comment 9 Andi Kleen 2005-11-11 09:04:53 UTC
I think you're right and I got confused.

The integrated host bridge cannot be accessed using mmconfig. There is a 
special table that is supposed to tell which busses are accessible.

In 2.6.13 we had a workaround of always disabling mmconfig on AMD
platforms, but 2.6.14 is supposed to read that table and figure out
which busses can use mmconfig and which ones cannot.

So either that code in Linux is broken or your BIOS reports it incorrectly.

Can you supply acpidmp output?


Comment 10 Sylvain Collange 2005-11-11 09:27:50 UTC
Created attachment 6543 [details]
acpidump output
Comment 11 Greg Kroah-Hartman 2005-11-14 21:54:57 UTC
I don't see anything wrong with that MCFG entry, do you Andi?
Comment 12 Bertro Simul 2005-12-03 03:04:31 UTC
I think it
Comment 13 Sylvain Collange 2005-12-03 06:25:13 UTC
So, this host bridge is definitely not accessible through mmconfig.

However, the only MCFG entry claims that all busses (0x00 to 0xff) are mapped
from 0xe0000000 to 0xefffffff. This seems wrong, since not all devices in bus 0
can be accessed this way.
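
As a quick check of the numbers above, the following sketch computes the address range that such an entry decodes. The struct is a hypothetical, simplified stand-in (not the kernel's MCFG structure), and it assumes the base address refers to bus 0, with 1 MiB of config space per bus (32 devices * 8 functions * 4 KiB).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of one MCFG entry. */
struct mcfg_entry {
    uint64_t base;      /* assumed to be the address of bus 0 */
    uint8_t  start_bus;
    uint8_t  end_bus;
};

/* First byte decoded by the entry: 1 MiB per bus below start_bus is skipped. */
static uint64_t mcfg_first_addr(const struct mcfg_entry *e)
{
    return e->base + ((uint64_t)e->start_bus << 20);
}

/* Last byte decoded by the entry. */
static uint64_t mcfg_last_addr(const struct mcfg_entry *e)
{
    assert(e->end_bus >= e->start_bus);
    return e->base + (((uint64_t)e->end_bus + 1) << 20) - 1;
}
```

With base 0xe0000000 and busses 0x00 to 0xff this gives 0xe0000000 through 0xefffffff, so the size of the range is consistent with the bus range. The arithmetic is not the problem; the claim that every device on bus 0 is reachable through the window is.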

So I tried to change the accessible busses range to (0x01..0xff) by hacking the
MCFG loading code, but this didn't solve the problem.

Since bus 0 is no longer referenced in MCFG, Linux is supposed to fall back to
the old port-based method to access this bus, right?

But I could not find this fallback code.
get_base_addr in mmconfig.c walks through the MCFG looking for the right bus,
but when the search fails, it assumes the bus belongs to the first (and usually
one and only) MCFG entry?

I'm quite confused, maybe I got something wrong...
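
To make the question concrete, here is a minimal sketch of a lookup that refuses to guess when no entry covers the bus, so that the caller can fall back to the port-based method. This is not the code in mmconfig.c and not any patch attached to this bug; all names and types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one MCFG entry. */
struct mcfg_entry {
    uint64_t base;
    uint8_t  start_bus;
    uint8_t  end_bus;
};

enum cfg_method { CFG_MMCONFIG, CFG_TYPE1 };

/* Return the window base for `bus`, or 0 when no entry covers it.
 * Unlike the behaviour described above, a failed search does not
 * silently return the first entry. */
static uint64_t lookup_base(const struct mcfg_entry *tbl, size_t n,
                            unsigned bus)
{
    for (size_t i = 0; i < n; i++)
        if (bus >= tbl[i].start_bus && bus <= tbl[i].end_bus)
            return tbl[i].base;
    return 0;
}

/* Pick the access method: memory-mapped if covered, port-based otherwise. */
static enum cfg_method choose_method(const struct mcfg_entry *tbl, size_t n,
                                     unsigned bus)
{
    return lookup_base(tbl, n, bus) ? CFG_MMCONFIG : CFG_TYPE1;
}
```

With a table restricted to busses 0x01..0xff, as in the experiment above, bus 0 would then go through the type 1 path and every other bus through the memory-mapped window.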
Comment 14 Andi Kleen 2005-12-03 09:09:08 UTC
Yes, this is exactly the problem. First, the MCFG is wrong on these
boxes, and second, the mmconfig code is missing a fallback
(among some other problems).

I have a patch, but it needs a bit more testing before I can push it out.
Comment 15 Andi Kleen 2005-12-03 20:35:31 UTC
Created attachment 6756 [details]
implement mmconfig fallback

These two patches together should fix it.
Comment 16 Andi Kleen 2005-12-03 20:36:15 UTC
Created attachment 6757 [details]
Detect unreachable busses and handle them
Comment 17 Andi Kleen 2005-12-03 20:36:57 UTC
Please report if these patches work for you guys. Thanks.
Comment 18 Sylvain Collange 2005-12-04 06:21:55 UTC
Thanks for the patch.

Some changes were needed to make it work for i386 (locking...)

Works fine for me, at least... All K8 devices/functions are correctly detected,
and I can read and write the PCI registers of the K8 memory controller.
Comment 19 Sylvain Collange 2005-12-04 06:26:40 UTC
Created attachment 6760 [details]
Fixes for i386

To be applied with the two other patches.
Probably needs checking and testing...
Comment 20 Sylvain Collange 2005-12-04 08:49:01 UTC
Created attachment 6766 [details]
Revised fix for both i386 and x86-64

- Calling printk while holding a spinlock probably wasn't a good idea...
- In both the i386 and x86-64 versions, the code seemed to be using a pointer to
an array instead of the array itself (I wonder how it ever worked...)
Comment 21 Andi Kleen 2005-12-04 09:40:59 UTC
I don't think your locking changes are needed because the PCI initialization
always runs single threaded.
Comment 22 Andi Kleen 2005-12-04 10:26:32 UTC
Calling printk while holding a spinlock is perfectly legal. What makes you think
otherwise?

I fixed the remaining problems on i386 and sent it off to Linus.

Can't close the bug - Greg please close.
Comment 23 Bertro Simul 2005-12-04 11:10:45 UTC
I tried Andi
Comment 24 Sylvain Collange 2005-12-04 11:45:37 UTC
> Calling printk with spinlock is perfectly legal. What makes you
> think otherwise?

I don't really know the internals of printk, so when in doubt I thought it was
better not to risk a potential deadlock... (anyway, shouldn't there be as
little code as possible in any critical section?)

But since there is no need for locking at all, it's all right :)

Thanks.
Comment 25 Andi Kleen 2005-12-04 12:08:56 UTC
printk prevents reentrance of itself (or rather of the parts of itself
that might require locking which is the console output etc.), which eliminates
most possible deadlocks. 

The only spinlocks that must not be held while calling printk are the scheduler
locks, because printk may need to take them to wake up klogd.
Comment 26 Sylvain Collange 2005-12-04 13:06:25 UTC
Thanks for the explanation.
I would never have thought reporting bugs was that instructive :)
Comment 27 Greg Kroah-Hartman 2006-01-04 14:41:36 UTC
This should be fixed in 2.6.15, right?

If not, please reopen.