Latest working kernel version: commit 5eb7f9fa847b8ab6e4864bfb8cb45f370844a47c (somewhere after 2.6.25-rc7)
Earliest failing kernel version: commit 12c22d6ef299ccf0955e5756eb57d90d7577ac68 (somewhere after 2.6.25-rc7)
Distribution: Gentoo Linux
Hardware Environment: x86_64 OKI ANIMA 3300 laptop
Software Environment: Kernel.
Problem Description:

The boot hangs after printing that MSI signaling has been enabled on a pair of PCI bridges. Sometimes the console gets garbled when the hang happens, sometimes not.

I bisected the problem to this specific commit:

commit 12c22d6ef299ccf0955e5756eb57d90d7577ac68
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Mar 26 11:22:40 2008 -0700

    Revert "PCI: remove transparent bridge sizing"

    This reverts commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f, which caused
    various interesting problems for people, including wrong resource
    allocations.

    See for example bugzilla entry "2.6.25-rc2: ohci1394 problem (MMIO
    broken)" at

        http://bugzilla.kernel.org/show_bug.cgi?id=10080

    [...]

I applied a reversion of that commit onto the mainline head (2.6.26-rc9) and it now boots flawlessly (rt61pci still doesn't work reliably, but I found the system stable when using a PCMCIA ath5k card).

Steps to reproduce: Boot a bad version and it hangs; boot a good version and it works.
I don't know about the other interesting problems caused by the commit reverted in the problematic commit. I know I need it un-reverted to boot my machine. Perhaps some kind of quirk is needed.
Created attachment 16878: Detailed lspci output
I've attached the detailed output of lspci on my machine running some 2.6.24 kernel. I added debug messages to see which was the transparent bridge that required the patch, and this is the specific device info:

00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2) (prog-if 01 [Subtractive decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Bus: primary=00, secondary=04, subordinate=08, sec-latency=64
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: c3000000-c30fffff
	Prefetchable memory behind bridge: fff00000-000fffff
	Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr+ DiscTmrStat- DiscTmrSERREn-
	Capabilities: [b8] Subsystem: Gammagraphx, Inc. Device 0000
	Capabilities: [8c] HyperTransport: MSI Mapping Enable- Fixed-
		Mapping Address Base: 00000000fee00000
00: de 10 6f 02 07 01 b0 00 a2 01 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 04 08 40 f0 00 80 02
20: 00 c3 00 c3 f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 b8 00 00 00 00 00 00 00 ff 00 04 02
40: 00 00 03 00 01 00 02 00 05 00 00 00 00 00 44 00
50: 00 00 fe 3f 00 00 00 00 ff 1f ff 1f 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 a8
90: 00 00 e0 fe 00 00 00 00 00 00 00 00 00 00 00 00
a0: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 ff ff 00 00 0d 8c 00 00 00 00 00 00
c0: 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
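For anyone who wants to cross-check the "behind bridge" windows against the raw config dump, here is a minimal sketch that is not part of the original report. It assumes the standard PCI-to-PCI bridge (type 1) header layout; the helper names (parse_dump, io_window, mem_window) are hypothetical, and only the three dump rows holding the window registers are copied from the lspci output above.

```python
# Sketch only: decode the bridge forwarding windows from the raw config bytes.
# Register offsets assume the standard PCI-to-PCI bridge (type 1) header.
CONFIG_DUMP = """
10: 00 00 00 00 00 00 00 00 00 04 08 40 f0 00 80 02
20: 00 c3 00 c3 f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 b8 00 00 00 00 00 00 00 ff 00 04 02
"""


def parse_dump(text):
    """Turn 'offset: byte byte ...' rows into an {offset: value} dict."""
    cfg = {}
    for line in text.strip().splitlines():
        offset, data = line.split(":")
        row_base = int(offset, 16)
        for i, byte in enumerate(data.split()):
            cfg[row_base + i] = int(byte, 16)
    return cfg


def read16(cfg, off):
    """Little-endian 16-bit read."""
    return cfg[off] | (cfg[off + 1] << 8)


def io_window(cfg):
    """I/O base (0x1c) and limit (0x1d), 4 KiB granularity."""
    base = (cfg[0x1C] & 0xF0) << 8
    limit = ((cfg[0x1D] & 0xF0) << 8) | 0xFFF
    if (cfg[0x1C] & 0x0F) == 0x01:  # 32-bit I/O: add the upper 16 bits
        base |= read16(cfg, 0x30) << 16
        limit |= read16(cfg, 0x32) << 16
    return base, limit


def mem_window(cfg, base_off):
    """Memory (0x20) or prefetchable (0x24) base/limit, 1 MiB granularity."""
    base = (read16(cfg, base_off) & 0xFFF0) << 16
    limit = ((read16(cfg, base_off + 2) & 0xFFF0) << 16) | 0xFFFFF
    return base, limit


cfg = parse_dump(CONFIG_DUMP)
windows = {
    "I/O": io_window(cfg),
    "Memory": mem_window(cfg, 0x20),
    "Prefetchable": mem_window(cfg, 0x24),
}
for name, (base, limit) in windows.items():
    # A window whose base is above its limit forwards nothing.
    state = "enabled" if base <= limit else "disabled (base > limit)"
    print(f"{name}: {base:08x}-{limit:08x} {state}")
```

Under these assumptions the I/O and prefetchable windows come out with base above limit, i.e. disabled, which matches the 0000f000-00000fff and fff00000-000fffff values lspci prints.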
From Juan Jesus's email (http://lkml.org/lkml/2008/11/12/60):

After reading a little bit on how PCI and PCI-to-PCI bridges work, and how they're handled in Linux (http://tldp.org/LDP/tlk/dd/pci.html), I now know that ranges in the bridge (either I/O, mmio or prefetchable mem) are simply disabled when start > end, and that's the original configuration that the BIOS enforces when the bridge is not sized by Linux.

After inserting kprintf()'s, I see that the hang actually happens while the positive decoding of the I/O range in the bridge is being activated in pci_setup_bridge(), sometime in between the writes to the I/O base/limit registers of the bridge; I don't remember exactly which was the last pci_write_config_dword() that allowed the next kprintf() to succeed. I'll look at it tonight again, but I suspect that the final enabling write (the one that updates the PCI_IO_BASE_UPPER16 register with its final value) was the one hanging the machine.

The I/O range being activated was the one in the range 0x1000-0x1fff, apparently correctly sized to accommodate the two I/O ranges (0x1000-0x10ff, 0x1400-0x14ff) assigned to the CardBus bridge on the secondary bus.

One theory is that my system has something actually mapped to that I/O range in the root PCI bus. When only subtractive decoding is in place (the I/O range isn't activated), access to the secondary bus behind the PCI-to-PCI bridge is done when the transaction isn't claimed by any device in the root bus, after what the PCI docs describe as a 4-cycle timeout. When the I/O range is activated, that range is positively decoded by the bridge, which tries to claim the transaction before the timeout. Perhaps two devices (the bridge and the unknown device on the root bus) conflict when claiming the same transaction?

Another possibility could be that activating the I/O range disables the negative decoding in the secondary-to-primary sense of the bridge for that I/O range. Perhaps some device behind the bridge depends on being able to forward transactions to the primary bus on that I/O range, but it's disallowed after the range is configured. For me this seems rather unlikely, because of the nature of the devices behind the bridge.

I'll look at it more closely again, and I will test whether commenting out the I/O range sizing (leaving the other ranges to be sized) is enough to allow the system to run. If so, is there any way to use a system-specific quirk in order to remove the PCI-to-PCI bridge I/O range from being sized/activated?

Best regards,
Juan Jesus.
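To make the window arithmetic in the email above concrete, here is a small illustrative sketch, not taken from the kernel. It assumes the 4 KiB granularity of PCI-to-PCI bridge I/O windows; covering_io_window and positively_decoded are hypothetical names used only for this example.

```python
# Sketch only: how two small CardBus ranges turn into one bridge I/O window,
# and which port accesses the bridge then claims by positive decode.
BRIDGE_IO_GRANULARITY = 0x1000  # bridge I/O windows are 4 KiB aligned


def covering_io_window(child_ranges, granularity=BRIDGE_IO_GRANULARITY):
    """Smallest aligned window containing every (start, end) child range."""
    lowest = min(start for start, _ in child_ranges)
    highest = max(end for _, end in child_ranges)
    base = lowest & ~(granularity - 1)   # round down to the granularity
    limit = highest | (granularity - 1)  # round up to the end of the block
    return base, limit


def positively_decoded(port, window):
    """True if the bridge claims 'port' itself; a base > limit window is off."""
    base, limit = window
    return base <= limit and base <= port <= limit


cardbus_ranges = [(0x1000, 0x10FF), (0x1400, 0x14FF)]
sized = covering_io_window(cardbus_ranges)
print(f"sized window: {sized[0]:#06x}-{sized[1]:#06x}")

bios_window = (0xF000, 0x0FFF)  # base > limit, as left by the BIOS
for port in (0x1000, 0x1400, 0x2000):
    print(f"port {port:#06x}: BIOS setup positive={positively_decoded(port, bios_window)}, "
          f"sized setup positive={positively_decoded(port, sized)}")
```

With the BIOS values nothing is positively decoded, so accesses only reach the secondary bus through subtractive decode; after sizing, every port in 0x1000-0x1fff is claimed by the bridge, which is where the suspected conflict would arise.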
Created attachment 18961: Disassembled DSDT contents
From my e-mail: http://permalink.gmane.org/gmane.linux.kernel.pci/1991

From: GARCIA DE SORIA LUCENA, JUAN JESUS <juanj.g_soria <at> grupobbva.com>
Subject: PCI bus conflict hang: how to avoid the allocation of an I/O range.
Newsgroups: gmane.linux.kernel.pci
Date: 2008-11-17 14:05:22 GMT

After commit

commit 12c22d6ef299ccf0955e5756eb57d90d7577ac68
Author: Linus Torvalds <torvalds <at> linux-foundation.org>
Date:   Wed Mar 26 11:22:40 2008 -0700

    Revert "PCI: remove transparent bridge sizing"

my laptop began hanging when booting, and I filed http://bugzilla.kernel.org/show_bug.cgi?id=11054.

I had to disable the sizing of transparent bridges until, after a conversation in the kernel mailing list, I think I've found the root of the problem.

A CardBus bridge is on the secondary bus of a transparent bridge. By default it gets assigned two I/O ranges: 0x1000-0x10ff and 0x1400-0x14ff, which translates to the transparent bridge positively forwarding the range 0x1000-0x1fff. There are no more I/O resources allocated behind the transparent PCI to PCI bridge.

I suspect there's "something" (some device unknown by the kernel) decoding I/O accesses in the primary PCI bus, in the 0x1000-0x1fff range. This device must be causing bus conflicts with the range allocated to the PCI to PCI bridge. Not sizing the transparent bridge wouldn't configure any I/O range in it for positive decoding, thus avoiding the conflict.

The system hangs when the bridge register for the IO base/limit (lower 16 bits, since it's 16 bit only) gets written to with the value 0x????0101.

If I force the range to be allocated above 0x4000, everything works flawlessly. I've been able to do so by two means:

1. Changing the definition of PCIBIOS_MIN_IO in arch/x86/include/asm/pci.h from 0x1000 to 0x4000. This forces the CardBus ranges to be allocated above the problematic area, making the bridge forward 0x4000-0x4fff I/O addresses.

   BTW, PCIBIOS_MIN_CARDBUS_IO is defined to be 0x4000 in the same header, but it's only used in drivers/pcmcia/yenta_socket.c, apparently not when assigning resources to the CardBus bridge in the functions pci_setup_cardbus() or pci_bus_size_cardbus() in drivers/pci/setup-bus.c. I suppose that making the CardBus bridge I/O range allocation respect the defined PCIBIOS_MIN_CARDBUS_IO limit would fix my issue, but I don't know whether that's "the right fix" (TM).

2. I've managed to boot a stock Ubuntu Intrepid Ibex x86_64 kernel by supplying the parameter "pci=cbiosize=8k" on the grub command line. It doesn't work with a smaller size. With 8k the CardBus bridge I/O ranges are big enough that they have to be allocated above the "problem area" because of natural alignment restrictions.

So far I've got what I really wanted (to be able to use my laptop with modern distributions without having to recompile each kernel version), although to do so I'm depending on the fact that a kernel parameter intended for a different use will alter I/O range alignment one PCI to PCI bridge away.

I write to ask whether the definition of PCIBIOS_MIN_CARDBUS_IO was indeed intended to affect my case too (in which case what is happening is the result of a kernel bug that should be fixed) or not. And if it's not a bug, I'd like to know if there exists any reliable way to pre-allocate a given I/O range (0x1000-0x1fff in my case) so that it won't be assigned to PCI busses/devices (without the need to recompile every kernel version).
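The two workarounds in the email above can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the kernel's resource allocator: it assumes a range is placed at the first naturally aligned address at or above the minimum I/O address, and first_fit, align_up and overlaps are hypothetical helper names. It only shows whether the first range would fall inside 0x1000-0x1fff, not the address the kernel actually picks.

```python
# Sketch only: a simplified first-fit model of where a CardBus I/O range lands
# for different minimum addresses and range sizes. The real allocator in
# drivers/pci/setup-bus.c is more involved.
PROBLEM_AREA = (0x1000, 0x1FFF)  # range suspected of conflicting on the root bus


def align_up(addr, size):
    """Round addr up to a multiple of size (size must be a power of two)."""
    return (addr + size - 1) & ~(size - 1)


def first_fit(min_io, size):
    """First naturally aligned (start, end) range at or above min_io."""
    start = align_up(min_io, size)
    return start, start + size - 1


def overlaps(a, b):
    """True if the inclusive ranges a and b share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


cases = [
    ("min 0x1000, 256-byte range", 0x1000, 0x100),
    ("min 0x1000, 4k range", 0x1000, 0x1000),
    ("min 0x1000, 8k range (pci=cbiosize=8k)", 0x1000, 0x2000),
    ("min 0x4000, 256-byte range (PCIBIOS_MIN_IO raised)", 0x4000, 0x100),
]
for label, min_io, size in cases:
    placed = first_fit(min_io, size)
    verdict = "overlaps" if overlaps(placed, PROBLEM_AREA) else "avoids"
    print(f"{label}: {placed[0]:#06x}-{placed[1]:#06x} {verdict} the problem area")
```

In this model, only the 8k size and the raised minimum push the range out of 0x1000-0x1fff, which is consistent with the observation that smaller cbiosize values do not help.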
Moving the full version info here to clean up the formatting of the lists: 2.6.25 + 12c22d6ef299ccf0955e5756eb57d90d7577ac68 up to mainline
Closing as obsolete. Please re-open and update the kernel version to a recent one if the problem is still seen.