Bug 6337
Summary: | commit b408cbc704352eccee301e1103b23203ba1c3a0e kills Cardbus bridge interrupts | ||
---|---|---|---|
Product: | Drivers | Reporter: | Larry Finger (Larry.Finger) |
Component: | PCI | Assignee: | Linus Torvalds (torvalds) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | akpm, greg |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.17-rc1 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 5829 | ||
Attachments: |
dmesg output _with_ b408cbc7
dmesg output _without_ b408cbc7 lspci -vvxxx output _with_ b408cbc7 lspci -vvxxx output _without_ b408cbc7 Trial patch to test dmesg output with commit b408cbc7 and the trial patch of 4/09 dmesg, lspci and fbset Another patch to test Revised version of second patch |
Description
Larry Finger
2006-04-05 20:12:30 UTC
(CCing gregkh, who signs the commit) Yeah, I signed off on the commit, but it was Linus that wrote the patch. Reassigning to him. Larry, to help debug this, can you do the following: - with and without that patch, do a lspci -vvxxx and attach both ouputs (clearly saying which is which) to the bugzilla entry - same deal, with full "dmesg" output for both working and nonworking case. The latter preferably with CONFIG_PCI_DEBUG enabled _and_ with DEBUG enabled manually at the top of arch/i386/pci/pci.h, which gives a lot more output. Linus Created attachment 7820 [details]
dmesg output _with_ b408cbc7
Created attachment 7821 [details]
dmesg output _without_ b408cbc7
Created attachment 7822 [details]
lspci -vvxxx output _with_ b408cbc7
Created attachment 7823 [details]
lspci -vvxxx output _without_ b408cbc7
Linus, The outputs of dmesg and lspci -vvxxx with and without patch b408cbc7 have been attached. Each was run with CONFIG_PCI_DEBUG=y, and the #undef DEBUG at the top of arch/i386/pci/pci.h changed to #define DEBUG. If any further output will be of help, please let me know. Thanks for the help. Larry On Sat, 8 Apr 2006, bugme-daemon@bugzilla.kernel.org wrote: > > The outputs of dmesg and lspci -vvxxx with and without patch b408cbc7 have been > attached. Each was run with CONFIG_PCI_DEBUG=y, and the #undef DEBUG at the top > of arch/i386/pci/pci.h changed to #define DEBUG. > > If any further output will be of help, please let me know. Thanks, it's all clear. The BIOS has set up the cardbus controller registers at a bad area, and the change in commit b408cbc7.. basically assumes that any PCI register setup by the BIOS is good, and in this case it clearly just isn't. The whole reason for that commit is that the BIOS has set up the e820 "reserved region" to overlap with a _valid_ PCI allocation, but in your case, a e820 reserved region overlaps with an _invalid_ PCI allocation, and the fact that the kernel now accepts it (because it was the right thing to do on another machine) is wrong on your machine. I suspect we will need to just revert that change. Can you try this trivial patch which should effectively do exactly that (but without undoing the cleanups that were done in b408cbc7..) This will break the fbcon setup that caused the original patch ;/ Linus Created attachment 7826 [details]
Trial patch to test
Created attachment 7827 [details]
dmesg output with commit b408cbc7 and the trial patch of 4/09
The trial patch fixes the problem. It was run with commit b408cbc7 and DEBUG
set in arch/i386/pci/pci.h.
Larry
Linus Torvalds wrote:
>
> On Sat, 8 Apr 2006, bugme-daemon@bugzilla.kernel.org wrote:
>
> Thanks, it's all clear. The BIOS has set up the cardbus controller
> registers at a bad area, and the change in commit b408cbc7.. basically
> assumes that any PCI register setup by the BIOS is good, and in this case
> it clearly just isn't.
>
> The whole reason for that commit is that the BIOS has set up the e820
> "reserved region" to overlap with a _valid_ PCI allocation, but in your
> case, a e820 reserved region overlaps with an _invalid_ PCI allocation,
> and the fact that the kernel now accepts it (because it was the right
> thing to do on another machine) is wrong on your machine.
>
> I suspect we will need to just revert that change. Can you try this
> trivial patch which should effectively do exactly that (but without
> undoing the cleanups that were done in b408cbc7..)
>
> This will break the fbcon setup that caused the original patch ;/
The trial patch fixes the problem on my machine. I used "git checkout -f" to reset the source to
that of the git repository for 2.6.17-rc1, then did the new patch, followed by setting DEBUG in
arch/i386/pci/pci.h.
As before, I am willing to provide any further output, or test any more patches.
Larry
Created attachment 7828 [details]
dmesg, lspci and fbset
Framebuffer wont work without mem=.
Can you guys test this trivial one-liner? In arch/i386/pci/common.c, just change the line that says subsys_initcall(pcibios_init); into arch_initcall(pcibios_init); to move the "pcibios_init()" code sequence up. In particular, it should mean that we do the bus resource traversal early, but leave the actual device resources for later. That might make everybody happy: it means that the bus traversal will be done before the e820 resources have been set up, and that we trust the PCI _bus_ information more than we trust the BIOS tables. But when individual devices are scanned, we'll have scanned the bus and the e820 information first. This is inevitably a balance of "whom do you trust?" Do we trust the BIOS "reserved memory" tables over the actual device setup? We have two cases where one is "more correct" than the other, but sadly, it's a different one in those two cases. We could also have an even more involved setup: - we trust the "real RAM" e820 table information the most (because we already very fundamentally rely on that: it's what we use to initialize our regular free memory setup) - then we trust PCI bus scanning - then we trust "reserved RAM" etc information e820 table info - and finally, we trust regular PCI device resource scanning I'll send out a patch that tries to do all of this, and see how that works out.. Linus Created attachment 7829 [details]
Another patch to test
This goes on top of the previou patch, but INSTEAD of the
change I described manually (the subsys_initcall->arch_initcall
one).
It sets up resource allocation to trust the "real RAM"
resources more than anything else.
The second patch to test has some offsets in arch/i386/kernel/setup.c and generates an error as follows: finger@larrylap:~/linux-2.6> patch -p1 < linus_patch2 patching file arch/i386/kernel/efi.c patching file arch/i386/kernel/setup.c Hunk #1 succeeded at 1281 (offset -30 lines). Hunk #2 succeeded at 1329 (offset -30 lines). Hunk #3 succeeded at 1342 (offset -30 lines). patching file arch/ia64/kernel/efi.c patching file arch/ia64/kernel/setup.c patching file include/linux/efi.h finger@larrylap:~/linux-2.6> make CHK include/linux/version.h CC init/main.o CHK include/linux/compile.h LD init/built-in.o CHK usr/initramfs_list CC arch/i386/kernel/time.o CC arch/i386/kernel/setup.o arch/i386/kernel/setup.c:1355: error: On Sun, 9 Apr 2006, bugme-daemon@bugzilla.kernel.org wrote: > > The second patch to test has some offsets in arch/i386/kernel/setup.c and > generates an error as follows: It should apply cleanly against the current -git, but I think you can ignore the offsets against your older kernel too. > arch_initcall(request_standard_resources_early); > fs_initcall(request_standard_resources_late); > > Clearly, the arch_initcall and the fs_initcall belong in a function, but which one? Actually, no. They should be outside. They generate magic inline assembly that adds those two functions to the list of initcalls, and they should _not_ be inside a function, but at the top level. So the thing you cut-and-pasted looked fine. Linus Created attachment 7830 [details]
Revised version of second patch
The second patch works with 2 changes: (1) the routine is
request_standard_resources, not request_standard_resources_early, and (2)
request_standard_resources_late needs to return a value (presumably 0). The
revised diff is attached.
BTW, both patches are probably needed for the stable series 2.6.16.y.
Larry
Larry, is this fixed now? bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=6337 > > > > > > ------- Additional Comments From akpm@osdl.org 2006-06-01 21:45 ------- > Larry, is this fixed now? > Yes. The code changes that Linus made fixed it completely. I just marked the bug closed. Sorry that I didn't do that earlier. Larry On Fri, 2 Jun 2006, bugme-daemon@bugzilla.kernel.org wrote: > > > Larry, is this fixed now? > > > Yes. The code changes that Linus made fixed it completely. I just marked > the bug closed. Sorry that I didn't do that earlier. Did I merge the proposed fix too? Remind me. I have the attention span of a rodent on crack. Linus bugme-daemon@bugzilla.kernel.org wrote: > Did I merge the proposed fix too? > Remind me. I have the attention span of a rodent on crack. > > Linus > I'm not surprised given all that you do. IIRC (not a certainty), you revised the patch that had broken my machine. I think the new version does things in a different order. In any case, my computer works OK and it doesn't seem that anyone else has complained. Larry |