Bug 15960
Description
Clemens Ladisch
2010-05-11 15:09:38 UTC
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 11 May 2010 15:09:41 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=15960
>
>            Summary: I/O port range not assigned, BIOS allocation gets lost
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.34-rc1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: PCI
>         AssignedTo: drivers_pci@kernel-bugs.osdl.org
>         ReportedBy: clemens@ladisch.de
>                 CC: yinghai@kernel.org
>         Regression: Yes
>
>
> (bug report originally from
> <https://bugtrack.alsa-project.org/alsa-bug/view.php?id=4982>:)
>
> After upgrading kernel from 2.6.32 to 2.6.34-rc4 my Xonar DX stopped working.
> Kernel log has:
>
> [kernel] AV200 0000:09:04.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [kernel] invalid PCI I/O range
> [kernel] AV200 0000:09:04.0: PCI INT A disabled
> [kernel] ALSA device list:
> [kernel] No soundcards found.
>
> According to git bisect the culprit is (from 2.6.34-rc1):
>
> commit 977d17bb1749517b353874ccdc9b85abc7a58c2a
> Author: Yinghai Lu <yinghai@kernel.org>
> Date: Fri Jan 22 01:02:24 2010 -0800
>     PCI: update bridge resources to get more big ranges in PCI assign unssigned
>
>
> lkml thread ("[regression, bisected] Xonar DX invalid PCI I/O range since
> 977d17bb174"):
> http://lkml.org/lkml/2010/4/19/20

Guys? It's getting awfully close to 2.6.34 - should we just revert it?

Clemens, have you confirmed that reverting
977d17bb1749517b353874ccdc9b85abc7a58c2a from current mainline fixes things?

First-Bad-Commit : 977d17bb1749517b353874ccdc9b85abc7a58c2a

On Wed, 12 May 2010 14:49:07 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> Guys? It's getting awfully close to 2.6.34 - should we just revert it?
>
> Clemens, have you confirmed that reverting
> 977d17bb1749517b353874ccdc9b85abc7a58c2a from current mainline fixes
> things?

I haven't heard back from Yinghai on a related bug, so reverting is
definitely a possibility. But we do have a patch to work around the
issue as well; it needs to be fixed though since "pci=try=" doesn't
make much sense ("pci=realloc=" or similar would be better). Either
way though it's risky; I'm afraid of breaking a now working setup by
reverting the patch or changing the default.

Yinghai, maybe there's an easy way to avoid reassigning everything if
the result isn't much better (i.e. devices are still missing BARs)?
That might avoid situations like this...

Reply-To: yinghai.lu@oracle.com

On 05/12/2010 03:05 PM, Jesse Barnes wrote:

> I haven't heard back from Yinghai on a related bug, so reverting is
> definitely a possibility. But we do have a patch to work around the
> issue as well; it needs to be fixed though since "pci=try=" doesn't
> make much sense ("pci=realloc=" or similar would be better). Either
> way though it's risky; I'm afraid of breaking a now working setup by
> reverting the patch or changing the default.
>
> Yinghai, maybe there's an easy way to avoid reassigning everything if
> the result isn't much better (i.e. devices are still missing BARs)?
> That might avoid situations like this...
>

pci=realloc looks like clear all bridge bar, and assign new one.

YH

On Wed, 12 May 2010, Jesse Barnes wrote:
>
> I haven't heard back from Yinghai on a related bug, so reverting is
> definitely a possibility.

Let's just revert the thing, and never bring it back. It was a mistake,
just admit it. No sane setup has this problem, and the whole "let's
reassign everything" is so fragile that it should _never_ have been done
by default.

We already had an issue with the "reassign everything" code deciding to do
it just because a ROM resource didn't fit. We fixed that one, iirc, but it
shows how fragile the whole thing was.

So my suggestion:

 - revert it (but I agree with Andrew that we should get confirmation that
   reverting it makes things work)

 - REMEMBER that this doesn't work, and NEVER EVER do it again

 - if somebody wants to reassign all the PCI resources, make that be a
   special thing that requires actual user interaction.

> I'm afraid of breaking a now working setup by reverting the patch or
> changing the default.

What "now working setup?" It didn't work in 2.6.33 either. The rule about
regressions is that IT DOES NOT MATTER HOW MANY THINGS YOU FIX! If it
breaks something that used to work, it's buggy, and needs to be reverted.

It's that simple. Something that got fixed by that patch simply DOES NOT
MATTER. It's not an argument for not reverting. The only thing that
matters is that it broke something, and thus needs to be reverted or
fixed, and looking at the code, we can be pretty sure that 'fixed' is
likely not even an option.

Linus

On Wed, 2010-05-12 at 16:20 -0700, Jesse Barnes wrote:
> Ok, that's fine. Let's pull it once we get confirmation that the
> revert works.
>
> Clemens or Peter, can you make sure reverting
> 977d17bb1749517b353874ccdc9b85abc7a58c2a gets your sound device back?
>
> Thanks,
Yup, reverting from current mainline fixes things for me.
Cheers,
Peter
On Thu, 13 May 2010, Peter Henriksson wrote:
>
> Yup, reverting from current mainline fixes things for me.
Ok. Jesse, do you want to do it and push it to me (or put another way: "Do
you have other things pending?") or should I just do the revert?
Linus
On Wed, 12 May 2010 15:49:45 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, 12 May 2010, Jesse Barnes wrote:
> >
> > I haven't heard back from Yinghai on a related bug, so reverting is
> > definitely a possibility.
>
> Let's just revert the thing, and never bring it back. It was a mistake,
> just admit it. No sane setup has this problem, and the whole "let's
> reassign everything" is so fragile that it should _never_ have been done
> by default.
>
> We already had an issue with the "reassign everything" code deciding to do
> it just because a ROM resource didn't fit. We fixed that one, iirc, but it
> shows how fragile the whole thing was.

Right.

> So my suggestion:
>
>  - revert it (but I agree with Andrew that we should get confirmation that
>    reverting it makes things work)
>
>  - REMEMBER that this doesn't work, and NEVER EVER do it again
>
>  - if somebody wants to reassign all the PCI resources, make that be a
>    special thing that requires actual user interaction.

Reassigning everything was a bad idea, but I wouldn't go so far as to
say "never reassign" either. In some cases it's unavoidable whether
due to firmware bugs or just more devices in the system than the
firmware could handle. But we obviously need a more focused approach
to handle those situations; doing it all at once like this wreaks too
much havoc.

> > I'm afraid of breaking a now working setup by reverting the patch or
> > changing the default.
>
> What "now workign setup?" It didn't work in 2.6.33 either. The rule about
> regressions is that IT DOES NOT MATTER HOW MANY THINGS YOU FIX! If it
> breaks something that used to work, it's buggy, and needs to be reverted.
>
> It's that simple. Something that got fixed by that patch simply DOES NOT
> MATTER. It's not an argument for not reverting. The only thing that
> matters is that it broke something, and thus needs to be reverted or
> fixed, and looking at the code, we can be pretty sure that 'fixed' is
> likely not even an option.

Ok, that's fine. Let's pull it once we get confirmation that the
revert works.

Clemens or Peter, can you make sure reverting
977d17bb1749517b353874ccdc9b85abc7a58c2a gets your sound device back?

Thanks,

On Wed, 12 May 2010 17:06:47 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, 13 May 2010, Peter Henriksson wrote:
> >
> > Yup, reverting from current mainline fixes things for me.
>
> Ok. Jesse, do you want to do it and push it to me (or put another way: "Do
> you have other things pending?") or should I just do the revert?

Nope, nothing else pending atm, please just revert.

Thanks,

Fixed by commit 769d9968e42c995eaaf61ac5583d998f32e0769a.

I have two SRIOV adapters, Mellanox ConnectX2 and ServerEngine be3; both fail MMIO allocation for their SRIOV BARs. A git bisect leads to this reverted patch :( . Of course this is on a platform where the BIOS is unaware of SRIOV BARs. Hence the OS is forced to assign the resources.

Any chance this patch can be brought back again, and the specific issue with Xonar DX be debugged and fixed? I am willing to volunteer.

Thanks,
RP

On 06/23/2010 09:39 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #14 from Ram Pai <linuxram@us.ibm.com> 2010-06-23 16:39:06 ---
> Any chance this patch can be brought back again, and the specific issue with
> Xonar DX be debugged and fixed. I am willing to volunteer.
>

three options
1. don't do io port second try
2. use pci=try=2
3. restore bridge resources if multiple tries fail.

Yinghai

Peter,

Do you still have your box with the Xonar sound card? If so, can I get the output of your dmesg on the latest mainline kernel? I have an inkling that your kernel, even now, prints messages related to memory resource assignment failures. Fortunately it does not affect your sound card.

RP

Created attachment 34162 [details]
kernel log with mainline
Ram,
Kernel log with current mainline.
Thanks Peter,

As suspected, I am seeing the following messages in your log.

Oct 19 20:43:13 [kernel] pci 0000:04:00.0: BAR 8: can't assign mem (size 0xc00000)
Oct 19 20:43:13 [kernel] pci 0000:05:01.0: BAR 8: can't assign mem (size 0x200000)
Oct 19 20:43:13 [kernel] pci 0000:05:01.0: BAR 9: can't assign mem pref (size 0x200000)
Oct 19 20:43:13 [kernel] pci 0000:05:02.0: BAR 8: can't assign mem (size 0x400000)
Oct 19 20:43:13 [kernel] pci 0000:05:03.0: BAR 8: can't assign mem (size 0x400000)
Oct 19 20:43:13 [kernel] pci 0000:05:02.0: BAR 7: can't assign io (size 0x1000)
Oct 19 20:43:13 [kernel] pci 0000:05:03.0: BAR 7: can't assign io (size 0x1000)

Ok. let me post a proposed test patch later today and see if that cures these messages.

Created attachment 34192 [details]
patch with a proposed fix, and some debug messages in case the fix fails
Peter,
Attached is a patch that might fix the reason behind the failure
messages on your setup. Can you run this patch against the latest mainline kernel along with the debug command line option? If the problem is fixed, you should no longer see the error messages, and your Xonar device should still function properly. In any case please send me the output of your dmesg. I have added some debug messages to the patch that might give me some insight in case the problem persists.
thanks a bunch,
RP
Created attachment 34242 [details]
log with the pci_xonar_debug patch
Attaching log with the patch applied. The Xonar is still working.
Regards,
Peter
Created attachment 34272 [details]
patch to identify if the issue is hotplug related.

Ok. I still see the same error messages that are captured in comment #18.

On further analysis, it certainly looks like an issue where the BIOS has not allocated enough space for hotplug bridges. The OS attempts to allocate some minimal space, 4k i/o and 2M mem window, to each of those hotplug bridges and fails.

However, to prove the above theory, can you try the attached patch? It goes on top of the earlier patch. If possible run the kernel with debug option and provide me the dmesg output.

Thanks, I hope this will be the last iteration towards identifying the root cause of the problem.

I get a kernel oops with that patch on top. Perhaps the below warnings are related? I'm not a programmer so I don't know what to make of it.

  CC      drivers/pci/setup-bus.o
drivers/pci/setup-bus.c: In function ‘pbus_size_io’:
drivers/pci/setup-bus.c:417: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘resource_size_t’
drivers/pci/setup-bus.c:437: warning: passing argument 2 of ‘dev_printk’ from incompatible pointer type
include/linux/device.h:646: note: expected ‘const struct device *’ but argument is of type ‘struct pci_dev *’
drivers/pci/setup-bus.c:437: warning: too few arguments for format
drivers/pci/setup-bus.c: In function ‘pbus_size_mem’:
drivers/pci/setup-bus.c:491: warning: format ‘%x’ expects type ‘unsigned int’, but argument 4 has type ‘resource_size_t’
drivers/pci/setup-bus.c:518: warning: passing argument 2 of ‘dev_printk’ from incompatible pointer type
include/linux/device.h:646: note: expected ‘const struct device *’ but argument is of type ‘struct pci_dev *’
drivers/pci/setup-bus.c:518: warning: too few arguments for format

Created attachment 34292 [details]
patch to identify if the issue is hotplug related.
Sorry Peter. Fixed all those compiler warning messages. Try this new patch, instead of the last one. thankx.
RP
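For readers wondering about the `%x` versus `resource_size_t` warnings quoted above: they are a common class of format mismatch. The following is only a userspace sketch, not the content of the attached patch. It assumes a 64-bit `resource_size_t` (in the kernel the width depends on configuration), and `format_size` is a hypothetical helper name. The idea is to cast to `unsigned long long` and print with `%llx`, so the format stays correct whatever the underlying width is.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Assumption: stand-in for the kernel's resource_size_t, which may be
 * 32 or 64 bits wide depending on the configuration. */
typedef uint64_t resource_size_t;

/*
 * Hypothetical helper: format a size for a log message.
 * Casting to unsigned long long and using %llx keeps the format string
 * correct for any width of resource_size_t; a bare %x is only right
 * for unsigned int.
 */
int format_size(char *buf, size_t len, resource_size_t size)
{
	return snprintf(buf, len, "size 0x%llx", (unsigned long long)size);
}
```

The other warning in the quoted output says that `dev_printk` expects a `struct device *`; for a `struct pci_dev *dev` that is normally `&dev->dev`. Whether the "too few arguments for format" warnings were what caused the oops Peter saw is not stated in this thread.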
Created attachment 34312 [details]
Updated log
Thanks. Works better now. Log attached.
Ok. As suspected, those are hotplug bridges to which the BIOS has not allocated any memory resource or io ports since there are no devices behind them. However the OS tries to allocate the minimum amount. Unfortunately since there are not enough resources available to satisfy all the hot-plug bridges, the allocation fails.

Jesse: How is this supposed to work? Should we ignore pre-allocation to hotplug bridges if there are not enough resources available?

Yes, I think we should. Bridges with nothing behind them at boot should have the lowest priority when it comes to allocating resources.

Ok. Jesse, the plan of action is to fix this bug, and bring back Yinghai's patch? If yes. I will provide the patches. But this will be viewed as yet another band-aid by Linus, for sure. Linus are you listening? Anyway I will continue the discussion on lkml.

RP

On 10/21/2010 10:44 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15960
>
> --- Comment #27 from Ram Pai <linuxram@us.ibm.com> 2010-10-21 17:44:26 ---
> Ok. Jesse, the plan of action is to fix this bug, and bring back Yinghai's
> patch?
>

i have updated version local.

[PATCH] pci: update bridge resources to get more big ranges in PCI assign unssigned

BIOS separates IO ranges between several IOHs, and on some slots, BIOS assigns
resources to a bridge, but stops assigning resources to the device under that
bridge, because the device needs a big resource.

So:
1. allocate resources and record the failed device resources
2. clear the BIOS assigned resources of the parent bridge of failing device
3. go back and call pci assign unassigned
4. if it still fails, go up the tree, clear more bridges. and try again

Use pci=realloc=mask to control it with io or mmio.
 1: io
 2: mmio
 3: io and mmio
Default is 0: disable all of reallocation

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 Documentation/kernel-parameters.txt |    4 +
 drivers/pci/pci.c                   |    7 ++
 drivers/pci/pci.h                   |    2
 drivers/pci/setup-bus.c             |  115 +++++++++++++++++++++++++++++++++++-
 4 files changed, 125 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -36,6 +36,8 @@ struct resource_list_x {
 	unsigned long flags;
 };
 
+unsigned long pci_realloc_mask = IORESOURCE_MEM;
+
 static void add_to_failed_list(struct resource_list_x *head,
 				 struct pci_dev *dev, struct resource *res)
 {
@@ -102,7 +104,8 @@ static void __assign_resources_sorted(st
 		res = list->res;
 		idx = res - &list->dev->resource[0];
 		if (pci_assign_resource(list->dev, idx)) {
-			if (fail_head && !pci_is_root_bus(list->dev->bus))
+			if (fail_head && !pci_is_root_bus(list->dev->bus) &&
+			    (res->flags & pci_realloc_mask))
 				add_to_failed_list(fail_head, list->dev, res);
 			res->start = 0;
 			res->end = 0;
@@ -830,11 +833,63 @@ static void pci_bus_dump_resources(struc
 	}
 }
 
+static int __init pci_bus_get_depth(struct pci_bus *bus)
+{
+	int depth = 0;
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int ret;
+		struct pci_bus *b = dev->subordinate;
+		if (!b)
+			continue;
+
+		ret = pci_bus_get_depth(b);
+		if (ret + 1 > depth)
+			depth = ret + 1;
+	}
+
+	return depth;
+}
+static int __init pci_get_max_depth(void)
+{
+	int depth = 0;
+	struct pci_bus *bus;
+
+	list_for_each_entry(bus, &pci_root_buses, node) {
+		int ret;
+
+		ret = pci_bus_get_depth(bus);
+		if (ret > depth)
+			depth = ret;
+	}
+
+	return depth;
+}
+
+/*
+ * first try will not touch pci bridge res
+ * second and later try will clear small leaf bridge res
+ */
 void __init
 pci_assign_unassigned_resources(void)
 {
 	struct pci_bus *bus;
+	int tried_times = 0;
+	enum release_type rel_type = leaf_only;
+	struct resource_list_x head, *list;
+	unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
+				  IORESOURCE_PREFETCH;
+	unsigned long failed_type;
+	int pci_try_num;
+	int max_depth = pci_get_max_depth();
+
+	head.next = NULL;
+	pci_try_num = max_depth + 1;
+	printk(KERN_DEBUG "PCI: max bus depth: %d\n", max_depth);
+
+again:
 
 	/* Depth first, calculate sizes and alignments of all
 	   subordinate buses. */
 	list_for_each_entry(bus, &pci_root_buses, node) {
@@ -842,10 +897,64 @@ pci_assign_unassigned_resources(void)
 	}
 	/* Depth last, allocate resources and update the hardware. */
 	list_for_each_entry(bus, &pci_root_buses, node) {
-		pci_bus_assign_resources(bus);
-		pci_enable_bridges(bus);
+		__pci_bus_assign_resources(bus, &head);
+	}
+	tried_times++;
+
+	/* any device complain? */
+	if (!head.next)
+		goto enable_and_dump;
+	failed_type = 0;
+	for (list = head.next; list;) {
+		failed_type |= list->flags;
+		list = list->next;
+	}
+
+	/* if reach the limit, don't want to try more */
+	failed_type &= type_mask;
+	if (tried_times >= pci_try_num) {
+		free_failed_list(&head);
+		goto enable_and_dump;
 	}
 
+	printk(KERN_DEBUG "PCI: No. %d try to assign unassigned res\n",
+			 tried_times + 1);
+
+	/* third times and later will not check if it is leaf */
+	if ((tried_times + 1) > 2)
+		rel_type = whole_subtree;
+
+	/*
+	 * Try to release leaf bridge's resources that doesn't fit resource of
+	 * child device under that bridge
+	 */
+	for (list = head.next; list;) {
+		bus = list->dev->bus;
+		pci_bus_release_bridge_resources(bus, list->flags & type_mask,
+						  rel_type);
+		list = list->next;
+	}
+	/* restore size and flags */
+	for (list = head.next; list;) {
+		struct resource *res = list->res;
+
+		res->start = list->start;
+		res->end = list->end;
+		res->flags = list->flags;
+		if (list->dev->subordinate)
+			res->flags = 0;
+
+		list = list->next;
+	}
+	free_failed_list(&head);
+
+	goto again;
+
+enable_and_dump:
+	/* Depth last, update the hardware. */
+	list_for_each_entry(bus, &pci_root_buses, node)
+		pci_enable_bridges(bus);
+
 	/* dump the resource on buses */
 	list_for_each_entry(bus, &pci_root_buses, node) {
 		pci_bus_dump_resources(bus);
Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -1932,6 +1932,10 @@ and is between 256 and 4096 characters.
 				for broken drivers that don't call it.
 		skip_isa_align	[X86] do not align io start addr, so can
 				handle more pci cards
+		realloc=n	set mask for reallocating the pci bridge resource
+				1: io port
+				2: mem io : default
+				3: io port and mem io
 		firmware	[ARM] Do not re-enumerate the bus but instead
 				just use the configuration from the
 				bootloader. This is currently used on
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -2997,6 +2997,13 @@ static int __init pci_setup(char *str)
 				pci_no_aer();
 			} else if (!strcmp(str, "nodomains")) {
 				pci_no_domains();
+			} else if (!strncmp(str, "realloc=", 8)) {
+				int realloc = memparse(str + 8, &str);
+				if (realloc > 0) {
+					/* IORESOURCE_IO is bit 8, IORESOURCE_MEM is bit 9*/
+					pci_realloc_mask = realloc << 8;
+					pci_realloc_mask &= IORESOURCE_IO | IORESOURCE_MEM;
+				}
 			} else if (!strncmp(str, "cbiosize=", 9)) {
 				pci_cardbus_io_size = memparse(str + 9, &str);
 			} else if (!strncmp(str, "cbmemsize=", 10)) {
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -224,6 +224,8 @@ static inline int pci_ari_enabled(struct
 	return bus->self && bus->self->ari_enabled;
 }
 
+extern unsigned long pci_realloc_mask;
+
 #ifdef CONFIG_PCI_QUIRKS
 extern int pci_is_reassigndev(struct pci_dev *dev);
 resource_size_t pci_specified_resource_alignment(struct pci_dev *dev);

Created attachment 43502 [details]
patch with a proposed fix, and some debug messages in case the fix fails
Peter,
Can you please try the attached patch and provide the output of dmesg?
This patch has a proposed fix to your issue.
Hope you still have your setup to test this out,
thanks,
RP
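As background to the retry logic in Yinghai's patch earlier in this thread: the number of allocation attempts there is bounded by the deepest nesting of bridges (`pci_try_num = max_depth + 1`), so each retry can release bridge windows one level further up the tree. The following is only a simplified userspace sketch of that depth calculation. `struct toy_bus`, `toy_bus_depth` and `toy_try_limit` are hypothetical names, and the real code walks `struct pci_bus` and `struct pci_dev` lists instead of an array of children.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct pci_bus:
 * each bus only knows the child buses behind the bridges on it. */
struct toy_bus {
	struct toy_bus **children;
	size_t nr_children;
};

/* Depth of the deepest chain of bridges below this bus (0 if none). */
int toy_bus_depth(const struct toy_bus *bus)
{
	int depth = 0;
	size_t i;

	for (i = 0; i < bus->nr_children; i++) {
		int ret = toy_bus_depth(bus->children[i]);

		if (ret + 1 > depth)
			depth = ret + 1;
	}
	return depth;
}

/* Mirrors "pci_try_num = max_depth + 1" from the patch: one more
 * attempt than the maximum bridge depth. */
int toy_try_limit(const struct toy_bus *root)
{
	return toy_bus_depth(root) + 1;
}
```

With a root bus that has a bridge leading to a bus with one further bridge, the depth is 2 and the sketch allows three attempts.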
Created attachment 43652 [details]
dmesg output with last patch
Latest patch appears to work fine.
Created attachment 43672 [details]
patch with a proposed fix, and some debug messages in case the fix fails
Peter,
thanx for the dmesg. Your testing exposed a small problem with my patch.
Jan 15 00:15:55 darwin kernel: pci 0000:05:02.0: BAR 7: can't assign io (size 0x0)
Jan 15 00:15:55 darwin kernel: pci 0000:05:03.0: BAR 7: can't assign io (size 0x0)
I have fixed the issue. Hopefully this time we won't see any of those failure messages anymore. Once we reach that state, we will have to apply Yinghai's patch on top and confirm that your machine does not regress.
For now, I request you to apply my new patch and provide me the dmesg output.
thanks,
RP
Created attachment 43682 [details]
New log
Created attachment 43862 [details]
patch with a proposed fix, and some debug messages in case the fix fails
Peter,
Ok. it all looks good. Now we have to apply Yinghai's patch and check for regression. If your device continues to operate successfully, we are good.
BTW: I have tweaked my patch to take care of one other small issue.
Apply my patch first.
I will attach Yinghai's patch. Apply that one next.
thanks in advance for your help,
RP
Created attachment 43872 [details]
yinghai's patch. Allocate unallocated resources.
Yinghai,
I have modified your patch to just do the allocation retries. The realloc_mask code can be a different patch, and I think we don't need that code right now.
RP
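For context on the realloc_mask code mentioned here: in Yinghai's patch earlier in this thread, the numeric `pci=realloc=` argument (1: io, 2: mmio, 3: both) is shifted left by 8 so that it lines up with the `IORESOURCE_IO` and `IORESOURCE_MEM` flag bits. A minimal userspace sketch of that mapping follows. `realloc_arg_to_mask` is a hypothetical helper name, and unlike the patch, which leaves the default mask untouched for non-positive values, this sketch returns 0 in that case.

```c
/* Flag values as defined in the kernel's include/linux/ioport.h
 * (bit 8 and bit 9, as the comment in the patch also notes). */
#define IORESOURCE_IO	0x00000100UL
#define IORESOURCE_MEM	0x00000200UL

/*
 * Hypothetical helper: map the numeric pci=realloc= argument to
 * resource type flags. Bits outside io/mem are masked off, so an
 * argument such as 4 selects nothing.
 */
unsigned long realloc_arg_to_mask(int n)
{
	if (n <= 0)
		return 0;	/* assumption: the patch keeps its default here */
	return ((unsigned long)n << 8) & (IORESOURCE_IO | IORESOURCE_MEM);
}
```

So an argument of 1 selects only I/O port windows, 2 only memory windows, and 3 both.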
Created attachment 43892 [details]
Log with both patches applied
I should have mentioned. The Xonar DX appears to be working as it should.

Thanks a lot! The dmesg log also confirms that all devices got the necessary resources they needed. I think we are good. Unless I get pushed again in one other direction I hope to not trouble you with more testing :)

thanks,
RP

A patch referencing this bug report has been merged in v2.6.38-8876-g036a982:

commit c8adf9a3e873eddaaec11ac410a99ef6b9656938
Author: Ram Pai <linuxram@us.ibm.com>
Date:   Mon Feb 14 17:43:20 2011 -0800

    PCI: pre-allocate additional resources to devices only after successful allocation of essential resources.

A patch referencing a commit referencing this bug report has been merged in v2.6.39:

commit 93d2175d3d31f11ba04fcfa0e9a496a1b4bc8b34
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri May 13 18:06:17 2011 -0700

    PCI: Clear bridge resource flags if requested size is 0

A patch referencing a commit referencing this bug report has been merged in Linux v3.1-rc3:

commit be768912a49b10b68e96fbd8fa3cab0adfbd3091
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Mon Jul 25 13:08:38 2011 -0700

    PCI: honor child buses add_size in hot plug configuration

Clemens,

Can you check if our effects breaks your setup?

please check:
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci2

Thanks

A patch referencing this bug report has been merged in Linux v3.4-rc1:

commit 0c5be0cb0edfe3b5c4b62eac68aa2aa15ec681af
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Thu Feb 23 19:23:29 2012 -0800

    PCI: Retry on IORESOURCE_IO type allocations
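The idea discussed in this thread, that empty hotplug bridges should have the lowest priority and that additional resources should be handed out only after the essential ones are satisfied, can be illustrated with a toy two-pass allocator. This is a sketch under stated assumptions and not the kernel implementation: `struct toy_req` and `toy_allocate` are hypothetical names, and a single flat window of `avail` bytes stands in for a bridge window, ignoring alignment entirely.

```c
#include <stddef.h>

/* Hypothetical request: 'need' must be satisfied; 'extra' is optional
 * headroom, e.g. for a hotplug bridge with nothing behind it yet. */
struct toy_req {
	unsigned long need;
	unsigned long extra;
	unsigned long granted;
};

/*
 * Two-pass allocation from a window of 'avail' bytes.
 * Pass 1 grants every essential size and fails if they do not all fit.
 * Pass 2 hands out optional headroom while space remains; running out
 * of space here is not treated as an error.
 * Returns 0 on success, -1 if the essential sizes do not fit (in which
 * case the 'granted' fields are left partially filled).
 */
int toy_allocate(struct toy_req *reqs, size_t n, unsigned long avail)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (reqs[i].need > avail)
			return -1;
		reqs[i].granted = reqs[i].need;
		avail -= reqs[i].need;
	}
	for (i = 0; i < n; i++) {
		if (reqs[i].extra <= avail) {
			reqs[i].granted += reqs[i].extra;
			avail -= reqs[i].extra;
		}
	}
	return 0;
}
```

With one device that needs 0x1000 of I/O space, two empty hotplug bridges that would each like 0x1000, and 0x2000 available, the device and the first bridge are served and the second bridge simply gets nothing, instead of the whole allocation being reported as failed.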