Bug 8595

Summary: Problem with PCI resource assignment and device discovery with some buggy BIOSes
Product: Drivers
Reporter: Martin Drab (martin.drab)
Component: PCI
Assignee: Alan (alan)
Status: REJECTED INVALID    
Severity: normal    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.21
Subsystem:
Regression: ---
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7883    
Attachments: Output of the dmidecode for the ASUS A8Js
dmesg output on kernel with the quick and dirty patch
dmesg output on kernel without the quick and dirty patch
lspci -vvxxx on kernel with the quick and dirty patch
lspci -vvxxx on kernel without the quick and dirty patch

Description Martin Drab 2007-06-06 15:46:09 UTC
Most recent kernel where this bug did *NOT* occur:
Distribution:
Hardware Environment:
Software Environment:
Problem Description:

There is a problem with the ASUS A8Js notebook. Its buggy BIOS allocates
resources for all the PCI devices but forgets to program them into the ICH7
PCI-Express Port 4 bridge (PCI device 0000:00:1c.3), which leaves this bridge
inaccessible while the kernel detects PCI devices during boot. As a result, none
of the devices attached to that bridge are detected. Later in the boot process
the kernel does eventually assign some resources to the bridge, but by then it
is far too late for the devices beyond it, so they are never discovered. (I
guess this late assignment was intended, and it is probably fine for non-bridge
PCI devices, but it is not enough in this case.)

This bug is very closely related to bug #7883, where you can find a lot of
attachments with information and an ugly hack to get around it, so I will not
attach the same information to this bug (unless you insist ;).

The BIOS apparently does allocate the resources, since there are evident gaps in
the resources allocated to the other devices, and these gaps exactly fit what
was supposed to be assigned to that bridge. Because the BIOS always assigns the
resources to the same addresses, I've created a patch that takes these addresses
and hardwires them into the bridge before PCI device detection begins. However,
this solution is highly dependent on the BIOS version, since each version
assigns different addresses. These are the two quick and dirty patches attached
to bug #7883: one is for BIOS version 210, the other for BIOS version 211. ASUS
promised me 4 months ago that they would fix the problem in the next BIOS
release, but nothing has happened so far and I doubt anything ever will.
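
To illustrate the gap observation above, here is a minimal sketch in Python (a
toy model, not kernel code). The function name find_gaps and all addresses are
made up for illustration; they are not taken from the A8Js or from the
attachments. It lists the holes left between the ranges that were assigned to
the other devices.

```python
def find_gaps(assigned, lo, hi):
    """Return the unassigned [start, end) ranges inside [lo, hi).

    `assigned` is a list of (start, end) tuples with exclusive ends.
    """
    gaps = []
    cursor = lo
    for start, end in sorted(assigned):
        # Anything between the cursor and the next assigned range is a hole.
        if start > cursor and cursor < hi:
            gaps.append((cursor, min(start, hi)))
        cursor = max(cursor, end)
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


# Hypothetical MMIO assignments of the devices the BIOS did program.
assigned = [
    (0xFE000000, 0xFE100000),
    (0xFE200000, 0xFE300000),
    (0xFE300000, 0xFE400000),
]

gaps = find_gaps(assigned, 0xFE000000, 0xFE400000)
for start, end in gaps:
    print(f"gap: {start:#010x}-{end - 1:#010x}")
```

In this made-up layout the single 1 MB hole is the kind of gap that would match
a bridge window the BIOS reserved but never wrote into the bridge.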

I think it would be nice to solve this problem in a much cleaner and more
general way than hardwiring manually observed addresses that depend on the BIOS
version. A much more universal solution (that would also work for other hardware
with other possibly buggy BIOSes) would be to have the kernel check all the
top-level PCI devices for unassigned PCI resources _BEFORE_ the regular process
of discovering the PCI devices begins (a sort of prepass). If it finds any
unassigned resources, it would allocate new ones and assign them to those
devices, again _BEFORE_ the regular PCI device discovery routines run.
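
A rough sketch of the proposed prepass, as a toy model in Python rather than
kernel code. Everything here is an assumption made for illustration: the Device
class, the first_fit/prepass/discover helpers, the first-fit policy, the sizes
and the addresses. Only the device names 00:1c.3 and 04:00.0 come from this
report. The model treats a bridge without a window as hiding its children, which
is a simplification of what real discovery does.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # [start, end), end exclusive


@dataclass
class Device:
    name: str
    size: int                       # address space needed (power of two assumed)
    window: Optional[Range] = None  # None models "BIOS left this unassigned"
    children: List["Device"] = field(default_factory=list)


def first_fit(used, size, lo, hi):
    """Lowest size-aligned start in [lo, hi) that overlaps nothing in `used`."""
    start = -(-lo // size) * size  # round lo up to a multiple of size
    while start + size <= hi:
        clash = next((u for u in used if start < u[1] and u[0] < start + size), None)
        if clash is None:
            return start
        start = -(-clash[1] // size) * size  # skip past the clashing range
    return None


def prepass(devices, lo, hi):
    """Assign a window to each device that lacks one; return the names fixed."""
    used = [d.window for d in devices if d.window is not None]
    fixed = []
    for d in devices:
        if d.window is None:
            start = first_fit(used, d.size, lo, hi)
            if start is None:
                continue  # no room left; leave the device unassigned
            d.window = (start, start + d.size)
            used.append(d.window)
            fixed.append(d.name)
    return fixed


def discover(devices):
    """Toy discovery: children are visible only if their parent has a window."""
    found = []
    for d in devices:
        found.append(d.name)
        if d.window is not None:
            found.extend(discover(d.children))
    return found


top_level = [
    Device("00:1b.0", 0x4000, (0xFE000000, 0xFE004000)),
    Device("00:1c.3", 0x100000, None,
           [Device("04:00.0", 0x2000, (0xFE100000, 0xFE102000))]),
    Device("00:1f.2", 0x100000, (0xFE200000, 0xFE300000)),
]

before = discover(top_level)                       # 04:00.0 is missing here
fixed = prepass(top_level, 0xFE000000, 0xFE400000)
after = discover(top_level)                        # 04:00.0 is now reachable
print(before, fixed, after, sep="\n")
```

Running the prepass before discovery is the whole point: in this model the same
discover() call finds the device behind the bridge only once the bridge has a
window.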

This idea can be extended even further. The check for unassigned PCI resources
can be done at each PCI bridge, for the devices directly beyond it, before
discovering the devices under that bridge. So this would be done for every PCI
bridge on each level before discovering the devices beyond it. That would also
effectively replace the routines in the kernel that currently try to do this at
a later stage of booting.
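
The per-bridge extension can be sketched the same way, again as a toy model in
Python and not kernel code. The Node class, the assign_missing and scan
helpers, the node names, the sizes and the addresses are all hypothetical. The
idea shown is only the ordering: fix up missing windows on one level, then
descend into each bridge using that bridge's own window as the pool for its
children.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    size: int                                   # power of two assumed
    window: Optional[Tuple[int, int]] = None    # [start, end), None = unassigned
    children: List["Node"] = field(default_factory=list)


def assign_missing(nodes, lo, hi):
    """Give each node without a window the lowest free aligned slot in [lo, hi)."""
    used = [n.window for n in nodes if n.window]
    for n in nodes:
        if n.window:
            continue
        start = -(-lo // n.size) * n.size  # round lo up to a multiple of size
        while start + n.size <= hi:
            clash = [u for u in used if start < u[1] and u[0] < start + n.size]
            if not clash:
                n.window = (start, start + n.size)
                used.append(n.window)
                break
            # Jump past the furthest clashing range and re-align.
            start = -(-max(u[1] for u in clash) // n.size) * n.size


def scan(nodes, lo, hi):
    """Check this level for missing windows, then descend bridge by bridge."""
    assign_missing(nodes, lo, hi)
    found = []
    for n in nodes:
        found.append(n.name)
        if n.window and n.children:
            # Children are allocated from inside the parent's window.
            found.extend(scan(n.children, *n.window))
    return found


leaf = Node("leaf", 0x1000)
bridge_b = Node("bridge-B", 0x10000, None, [leaf])
bridge_a = Node("bridge-A", 0x100000, None, [bridge_b])

found = scan([bridge_a], 0xFE000000, 0xFE400000)
print(found)
```

Because each level is checked just before the scan descends into it, a bridge
nested behind another unassigned bridge still ends up reachable in this model.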

At first I thought I would try to implement it, but to tell the truth I got a
bit lost in all the PCI boot-up and discovery routines, so I'd rather leave this
to someone who knows the routines better than I do. But I can help by testing on
this piece of hardware (together with the buggy BIOS) that I have available.

Steps to reproduce:
Comment 1 Martin Drab 2007-06-06 16:05:14 UTC
Created attachment 11695 [details]
Output of the dmidecode for the ASUS A8Js
Comment 2 Martin Drab 2007-06-06 16:13:53 UTC
Created attachment 11696 [details]
dmesg output on kernel with the quick and dirty patch
Comment 3 Martin Drab 2007-06-06 16:14:48 UTC
Created attachment 11697 [details]
dmesg output on kernel without the quick and dirty patch
Comment 4 Martin Drab 2007-06-06 16:15:56 UTC
Created attachment 11698 [details]
lspci -vvxxx on kernel with the quick and dirty patch
Comment 5 Martin Drab 2007-06-06 16:22:12 UTC
Created attachment 11699 [details]
lspci -vvxxx on kernel without the quick and dirty patch

Hmm, this is odd. Now it seems that the 04:00.0 device beyond the bridge
actually does get discovered even without the patch. Strange. Why would I have
been making and using those quick and dirty patches if everything had worked?
Is it possible I've just been under the wrong impression that it was not
working? I'd better get a disk to attach to that SATA controller and test it
all again thoroughly before anyone seriously tries to do something about this
bug. Hold on, I'll be back. ;)
Comment 6 Martin Drab 2007-06-07 13:16:50 UTC
OK, I did some testing and it seems that I really was wrong. It works even
without the patches with the current PCI code. So I'm closing this bug as
invalid. Sorry for bothering. (Though it doesn't change the fact that the ASUS
BIOS is buggy. If anyone at ASUS ever reads this, please fix it anyway.)