Created attachment 26439 [details]
OOPS dmesg

I am on x86_64 with the latest (v2.6.34) kernel. When I set CONFIG_SND_HDA_INTEL=y, it hangs at an early stage in boot with a kernel oops. When I use CONFIG_SND_HDA_INTEL=m, the machine will boot, and I get the dmesg (below).

I have bisected down to one commit that causes the problem:

commit 3e3da00c01d050307e753fb7b3e84aefc16da0d0
x86/pci: AMD one chain system to use pci read out res

dmesg attached
Created attachment 26444 [details]
boot log pci=earlydump

Boot log with pci=earlydump from a (working/patched) 2.6.34 kernel.
> >>>> looks like your system have a very sick BIOS,
> >>>>
> >>>> system have two HT chains.
> >>>>
> >>>> PCI: Probing PCI hardware (bus 00)
> >>>> PCI: Discovered primary peer bus 80 [IRQ]
> >>>>
> >>>> rt to non-coherent only set one link:
> >>>> node 0 link 0: io port [1000, ffffff]
> >>>> TOM: 0000000080000000 aka 2048M
> >>>> node 0 link 0: mmio [e0000000, efffffff]
> >>>> node 0 link 0: mmio [a0000, bffff]
> >>>> node 0 link 0: mmio [80000000, ffffffff]
> >>>> bus: [00, ff] on node 0 link 0
> >>
> >> ah, that 80:01.0 is standalone device, the system still only have one HT
> >> chain.
> >> that is CRAZY that they can sell those poor designed chips.
> >>
> >> actually 3e3da00c is fixing another bug with one HT chain.
> >>
> >> We have two options:
> >> 1. revert that 3e3da00c
> >> 2. or use quirks to black out system with VIA chipset.

This is voodoo kernel development, and I don't think we should do it. Can you explain the cause of Graham's oops?

All I can see is that we discovered a host bridge window of [mem 0x80000000-0xfcffffffff] to bus 00, we did *not* find a bridge leading to bus 80, we found a device on bus 80 that is inside the window forwarded to bus 00, so we moved that device outside the window:

  bus: 00 index 1 [mem 0x80000000-0xfcffffffff]
  pci 0000:80:01.0: reg 10: [mem 0xfebfc000-0xfebfffff 64bit]
  pci 0000:80:01.0: address space collision: [mem 0xfebfc000-0xfebfffff 64bit] conflicts with PCI Bus #00 [mem 0x80000000-0xfcffffffff]
  pci 0000:80:01.0: BAR 0: set to [mem 0xfd00000000-0xfd00003fff 64bit]

I have no idea why this led to a page fault at ffffc90000078000:

  BUG: unable to handle kernel paging request at ffffc90000078000
  IP: [<ffffffffa0018d11>] azx_probe+0x3a2/0xa6a [snd_hda_intel]

It looks to me like amd_bus.c just failed to discover the host bridge to bus 80. If the BIOS can program the chipset to work that way, we should be able to figure that out, too.

Graham, I think your "pci=earlydump" log is missing the KERN_DEBUG output.
It would be interesting to see that for the patched kernel so we can compare it with 2.6.34.

Bjorn
Created attachment 26467 [details]
boot log pci=earlydump (with DEBUG messages)

Boot log with debug messages (v2.6.34 + patch).
One interesting thing here is that this machine supports ACPI, but the kernel is built without ACPI support, which forces us to rely on native host bridge drivers. Maybe this requires some VIA-specific support; I don't have any of the information that would be needed to add that.

Graham also tried it with ACPI in the kernel. Plain 2.6.34 oopses the same way as the initial report, but with "pci=use_crs", things work correctly. I asked him to attach the logs and DMI information so we can debug any Linux issues in that area and add a quirk to automatically enable "pci=use_crs" if necessary.
Created attachment 26487 [details]
dmesg ACPI (oops)

dmesg from boot with ACPI support (with oops).
Created attachment 26488 [details]
dmesg ACPI use_crs (works)

dmesg from boot of v2.6.34 (unpatched) with ACPI support and pci=use_crs (successful).
Created attachment 26489 [details] dmidecode
Created attachment 26508 [details]
acpi quirk use_crs

My own humble attempt at a quirk patch for the ACPI case, setting use_crs. (Works for me.)
I think the basic problem is that Yinghai's patch broke your system, and this is a regression between 2.6.33 and 2.6.34. We could use a quirk like yours (which looks fine, BTW) to cover up this regression, but I don't like that approach because other machines are probably affected by the same issue, and we'd have to find and fix them one-by-one. I think it'd be better to figure out the problem with 3e3da00c01d and fix or revert it. I said earlier that I wasn't in favor of just reverting it, and I still don't like that option because it will likely break something. But Yinghai didn't supply any details about the system that 3e3da00c01d fixed, so I don't know how to fix things so both that system and yours work. I assume that 2.6.34 with 3e3da00c01d reverted will work fine even without "pci=use_crs". Can you try that and attach the dmesg log?
Created attachment 26638 [details]
Boot log revert 3e3da00c01d (with acpi, without pci=use_crs)

This is the log with ACPI enabled and 3e3da00c01d reverted. (It also works if ACPI is not built into the kernel.)
Created attachment 26639 [details]
debug patch

Thanks, Graham.

2.6.34 fills in the pci_bus 0000:00 resources based on what amd_bus.c read from the chipset, so it has [mem 0x80000000-0xfcffffffff] rather than the default of iomem_resource ([mem 0x00000000-0xffffffffffffffff]). With 3e3da00c01d reverted, we leave the pci_bus 0000:00 resources at the default.

I don't know why we see the collision in 2.6.34. It's almost as if we tried to allocate the 80:01.0 [mem 0xfebfc000-0xfebfffff 64bit] region from the bus 0000:00 resource ([mem 0x80000000-0xfcffffffff]), not from the bus 0000:80 resource ([mem 0x00000000-0xffffffffffffffff]), but I don't see how that would happen. When we find the PNP0A08:00 host bridge, we give it the default iomem_resource, so we should be allocating from that.

Can you please try 2.6.34 + this debug patch and again attach the dmesg log?
Created attachment 26641 [details] dmesg (with debug patch)
Oh, I see my mistake. The pci_bus 0000:00 resource ends at 0xfc_ffff_ffff, not 0xfcff_ffff, so it *does* conflict with the 80:01.0 resource.

Skimming through the BIOS and Kernel Developer's Guide for AMD Family 10h Processors, especially sec 3.4, I can see the connection with amd_bus.c. However, this AMD routing configuration is at quite a low level -- it deals with AMD nodes, links, sublinks, etc. What we need for PCI routing is much more abstract -- ranges of PCI bus numbers, I/O ports, and MMIO addresses. I'm not convinced it's safe to simply identify the low-level AMD node/link information with the higher-level PCI topology.

In this case, it seems that the PCI bus 80 information is buried somewhere in the VIA chipset, which is downstream from the AMD stuff, so amd_bus.c knows nothing about it. But I don't know that this indicates a *defect* in the VIA chipset; it's just that from the CPU's point of view, the bus range [00, ff] and the MMIO range [mem 0x80000000-0xfcffffffff] are connected and all go to the same place. If VIA wants to split those ranges up internally into two separate PCI host bridges, that sounds like a legitimate thing to do. We have no native way to learn about that split, so we're stuck with the incomplete information we get from amd_bus.c. Unfortunately, that is actually incorrect: we think [mem 0x80000000-0xfcffffffff] all goes directly to PCI bus 00, when in fact part of it is carved out for PCI bus 80, which isn't reachable from bus 00.
Created attachment 26695 [details] lspci -v
[If you haven't been following this bug, the report is at [3].]

Here's a theory. I'm not an expert in HyperTransport, so maybe somebody who knows HyperTransport and/or VIA chipsets can validate or refute it.

This is based on the _HyperTransport I/O Link Specification_, rev 3.10b [1], and the _BIOS and Kernel Developer's Guide (BKDG) for AMD Family 10h Processors_ [2].

In a nutshell, I think the problem is that amd_bus.c treats a HyperTransport (HT) host bridge as though it were a PCI host bridge. In particular, when an HT chain contains more than one PCI host bridge, the HT host bridge apertures encompass all the PCI host bridges, but amd_bus.c mistakenly assigns all those resources to one PCI host bridge.

From a software point of view, HyperTransport is similar but not identical to PCI. It is possible to make native HyperTransport peripheral devices, but PCI devices must be attached via a HyperTransport-to-PCI bridge [1, sec 4.1].

A PCI host bridge has a platform-specific non-PCI connection, e.g., a front-side bus, on the primary (upstream) side and a PCI bus on the secondary (downstream) side. Note that in the HyperTransport spec, "host bridge" refers to the interface from the host, e.g., CPU cores, to a HyperTransport chain. This HyperTransport host bridge has a HyperTransport link on the secondary side, *not* a PCI bus. A HyperTransport-to-PCI bridge is one kind of PCI host bridge, because the primary side is HyperTransport and the secondary side is PCI.

Graham's machine contains one HT host bridge leading to an HT chain, and it has PCI devices on buses 00, 02, 03, 06, and 80. In addition, the HT host bridge configuration registers appear at device 18 (hex) in bus 00 configuration space, though they are not actually PCI functions. PCI buses 02, 03, and 06 are reachable from bus 00 via the PCI-to-PCI bridges at 00:03.3, 00:03.2, and 00:02.0, respectively.
However, there are no PCI-to-PCI bridges that lead to bus 00 or bus 80, so the HT chain must contain two separate PCI host bridges that lead to them.

Now, here's the problem: amd_bus.c reads the HT host bridge configuration and learns that it routes buses 00-ff and the related address space, including the following range, down the HT chain at node 0, link 0:

  [mem 0x80000000-0xfcffffffff]

That makes sense, because both PCI host bridges are on that HT chain, so the HT host bridge has to forward all that address space. The problem is that amd_bus.c assumes there's only one PCI host bridge on the chain, so it assigns *all* that address space to PCI bus 00. This doesn't work because parts of that address space belong to bus 80, not bus 00, and we can't reach bus 80 from PCI bus 00.

In particular, we know that at least the following address space is routed to bus 80, because the 80:01.0 device does work at this address, which is in the middle of the range we found above:

  [mem 0xfebfc000-0xfebfffff]

(Note that we can reach bus 80 from the HT chain, but the HT chain is outside the PCI domain, even though some of the HT registers appear in PCI bus 00 config space. We need a second PCI host bridge from the HT chain to PCI bus 80.)

The HT spec does suggest that an HT/PCI host bridge should implement a HyperTransport Bridge Header [1, sec 7.4]. This header would make the HT/PCI host bridge look just like a PCI-to-PCI bridge, with the usual primary/secondary/subordinate bus numbers, memory, prefetchable memory, and I/O port apertures, etc. If all the HT/PCI host bridges on a chain were implemented this way, I think it probably would work to pretend the HT host bridge is a PCI host bridge.

But this sort of implementation is apparently not universal. The VIA chipset in Graham's machine doesn't do it that way, and the Serverworks HT-2100 chipset in the HP DL785 doesn't either.
[1] http://www.hypertransport.org/docs/twgdocs/HTC20051222-0046-0033_changes.pdf
[2] http://support.amd.com/us/Embedded_TechDocs/31116-Public-GH-BKDG_3-28_5-28-09.pdf
[3] https://bugzilla.kernel.org/show_bug.cgi?id=16007
On Fri, Jun 11, 2010 at 2:49 PM, Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> [If you haven't been following this bug, the report is at [3].]
>
> Here's a theory. I'm not an expert in HyperTransport, so maybe somebody
> who knows HyperTransport and/or VIA chipsets can validate or refute it.
>
> This is based on the _HyperTransport I/O Link Specification_, rev 3.10b [1],
> and the _BIOS and Kernel Developer's Guide (BKDG) for AMD Family 10h
> Processors_ [2].
>
> In a nutshell, I think the problem is that amd_bus.c treats a
> HyperTransport (HT) host bridge as though it were a PCI host bridge. In
> particular, when an HT chain contains more than one PCI host bridge, the
> HT host bridge apertures encompass all the PCI host bridges, but
> amd_bus.c mistakenly assigns all those resources to one PCI host bridge.

I don't think so. That system only has one HT chain.

May 19 23:20:33 ocham kernel: pci 0000:00:18.1 config space:
May 19 23:20:33 ocham kernel: 00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
May 19 23:20:33 ocham kernel: 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 40: 03 00 00 00 00 00 7f 00 00 00 00 00 01 00 00 00
May 19 23:20:33 ocham kernel: 50: 00 00 00 00 02 00 00 00 00 00 00 00 03 00 00 00
May 19 23:20:33 ocham kernel: 60: 00 00 00 00 04 00 00 00 00 00 00 00 05 00 00 00
May 19 23:20:33 ocham kernel: 70: 00 00 00 00 06 00 00 00 00 00 00 00 07 00 00 00
May 19 23:20:33 ocham kernel: 80: 03 00 e0 00 80 ff ef 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: b0: 03 0a 00 00 00 0b 00 00 03 00 80 00 00 ff ff 00
May 19 23:20:33 ocham kernel: c0: 13 10 00 00 00 f0 ff 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: e0: 03 00 00 ff 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The (0xe4) = ff 00 00 03 means it will route all PCI operations to node0, link0.

That chip from VIA has a design problem that produces one orphan device:

May 19 23:20:33 ocham kernel: pci 0000:80:01.0 config space:
May 19 23:20:33 ocham kernel: 00: 06 11 88 32 06 00 10 00 10 00 03 04 10 00 00 00
May 19 23:20:33 ocham kernel: 10: 04 c0 bf fe 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 88 08
May 19 23:20:33 ocham kernel: 30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00
May 19 23:20:33 ocham kernel: 40: 00 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 50: 01 60 42 c8 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 60: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 70: 10 00 91 00 00 00 00 00 00 00 30 00 00 00 00 00
May 19 23:20:33 ocham kernel: 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 19 23:20:33 ocham kernel: f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

YH
Reply-To: yinghai.lu@oracle.com

Please check if this one works around the problem.

Thanks

Yinghai Lu

[PATCH] x86, pci: handle fallout pci devices with peer root bus

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/pci/bus_numa.c |    4 +++-
 kernel/resource.c       |    2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/pci/bus_numa.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/bus_numa.c
+++ linux-2.6/arch/x86/pci/bus_numa.c
@@ -22,7 +22,8 @@ void x86_pci_root_bus_res_quirks(struct
 		return;
 
 	for (i = 0; i < pci_root_num; i++) {
-		if (pci_root_info[i].bus_min == b->number)
+		if (pci_root_info[i].bus_min <= b->number &&
+		    pci_root_info[i].bus_max >= b->number)
 			break;
 	}
 
@@ -37,6 +38,7 @@ void x86_pci_root_bus_res_quirks(struct
 	for (j = 0; j < info->res_num; j++) {
 		struct resource *res;
 		struct resource *root;
+		struct resource *tmp;
 
 		res = &info->res[j];
 		pci_bus_add_resource(b, res, 0);
Index: linux-2.6/kernel/resource.c
===================================================================
--- linux-2.6.orig/kernel/resource.c
+++ linux-2.6/kernel/resource.c
@@ -451,7 +451,7 @@ static struct resource * __insert_resour
 	if (!first)
 		return first;
 
-	if (first == parent)
+	if (first == parent || first == new)
 		return first;
 
 	if ((first->start > new->start) || (first->end < new->end))
Yinghai - .34 + your patch without use_crs works fine on my system. Instead of just bus 00, both bus 00 & bus 80 are added by bus_numa, which prevents the conflict when adding the 80:01.0 resource.
Bjorn - here is the extra bridge info with pci=use_crs:

ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f])
ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff])
Handled-By : Yinghai Lu <yinghai@kernel.org> Patch : https://patchwork.kernel.org/patch/105662/
On Friday, June 11, 2010 05:06:49 pm Yinghai Lu wrote:
>
> please check if this one workaround the problem
>
> Thanks
>
> Yinghai Lu
>
> [PATCH] x86, pci: handle fallout pci devices with peer root bus
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

This patch apparently does cover up the problem, but it fails on so many levels:

  - incomprehensible summary
  - no changelog
  - no bugzilla pointer
  - unrelated junk in patch ("tmp")
  - completely unexplained change to generic resource.c
  - no indication that we understand the root cause

> ---
> arch/x86/pci/bus_numa.c |    4 +++-
> kernel/resource.c       |    2 +-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/pci/bus_numa.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/bus_numa.c
> +++ linux-2.6/arch/x86/pci/bus_numa.c
> @@ -22,7 +22,8 @@ void x86_pci_root_bus_res_quirks(struct
>  		return;
>
>  	for (i = 0; i < pci_root_num; i++) {
> -		if (pci_root_info[i].bus_min == b->number)
> +		if (pci_root_info[i].bus_min <= b->number &&
> +		    pci_root_info[i].bus_max >= b->number)
>  			break;
>  	}
>
> @@ -37,6 +38,7 @@ void x86_pci_root_bus_res_quirks(struct
>  	for (j = 0; j < info->res_num; j++) {
>  		struct resource *res;
>  		struct resource *root;
> +		struct resource *tmp;
>
>  		res = &info->res[j];
>  		pci_bus_add_resource(b, res, 0);
> Index: linux-2.6/kernel/resource.c
> ===================================================================
> --- linux-2.6.orig/kernel/resource.c
> +++ linux-2.6/kernel/resource.c
> @@ -451,7 +451,7 @@ static struct resource * __insert_resour
>  	if (!first)
>  		return first;
>
> -	if (first == parent)
> +	if (first == parent || first == new)
>  		return first;
>
>  	if ((first->start > new->start) || (first->end < new->end))
>
Created attachment 26771 [details]
experiment with routing to bus 00 and bus 80

amd_bus.c finds this:

  bus: [00, ff] on node 0 link 0
  bus: 00 index 1 [mem 0x80000000-0xfcffffffff]
  pci_bus 0000:00: resource 5 [mem 0x80000000-0xfcffffffff]

ACPI reports these PCI host bridges and windows:

  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f])
  pci_root PNP0A03:00: host bridge window [mem 0x80000000-0xff37ffff]
  ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff])
  pci_root PNP0A08:00: host bridge window [mem 0xfebfc000-0xfebfffff]

I think Yinghai's patch assigns the entire [mem 0x80000000-0xfcffffffff] range to both PCI bus 00 and PCI bus 80, meaning that devices on either bus can use available ranges there. I think we should see these bus resources:

  pci_bus 0000:00: resource 5 [mem 0x80000000-0xfcffffffff]
  pci_bus 0000:80: resource 5 [mem 0x80000000-0xfcffffffff]

So as an experiment, let's move the 80:01.0 audio device outside the PNP0A08:00 aperture, and move the 00:10.4 USB device into the PNP0A08:00 aperture, i.e.,

  pci 0000:00:10.4: reg 10: [mem 0xfbdfbc00-0xfbdfbcff]        --> move to 0xfebfc000
  pci 0000:80:01.0: reg 10: [mem 0xfebfc000-0xfebfffff 64bit]  --> move to 0xfeb00000

If the bus resources assigned by amd_bus.c are correct, both devices should work at their new locations. (If they don't work, I expect the driver will complain, so you probably don't have to worry about actually testing anything.)

This patch should be applied on top of 2.6.34 + this Yinghai patch:
https://bugzilla.kernel.org/show_bug.cgi?id=16007#c17

Boot normally, without "pci=use_crs".
Created attachment 26804 [details]
dmesg move experiment

.34 + Yinghai's patch + Bjorn's experimental patch. Seems to be OK.
I think the best long-term fix is to always enable "pci=use_crs", regardless of the BIOS date (currently we only do it for 2008 and newer). System designers and BIOS writers expect the OS to pay attention to that information, and indications are that Windows does use it, so I think we will ultimately be better off if we use the expected, best-tested path. However, we have at least one known Linux issue (bug #16228) when _CRS is enabled, so I'm hesitant to enable it unconditionally at least until that is resolved.

In the short term, I think we should apply Graham's quirk from comment #8, which enables pci=use_crs just for his system.

Here's my response to Yinghai's patches. ACPI gives us these resources:

  pci_root PNP0A03:00: host bridge window [mem 0x80000000-0xff37ffff]  (bus 00)
  pci_root PNP0A08:00: host bridge window [mem 0xfebfc000-0xfebfffff]  (bus 80)

Yinghai's patch (comment #17, with a v2 posted to the list but not in the bugzilla) gives us these resources:

  pci_bus 0000:00: resource 5 [mem 0x80000000-0xfcffffffff]
  pci_bus 0000:80: resource 5 [mem 0x80000000-0xfcffffffff]

I think it's just a bad idea to assign the same range to both buses, especially when the BIOS is telling us what we should be using. I also think it's a mistake to mess with the resource code to deal with this specific case. A change like that makes resource.c hard to understand and maintain in the future.
I found a Windows XP disk, so I installed it into a separate partition and booted. I can confirm that Windows shows bus 80 with mem = 0xfebfc000-0xfebfffff.
Handled-By : Bjorn Helgaas <bjorn.helgaas@hp.com>