Bug 5235
Summary: | PCI devices not working correctly on 2.6.13+, b44 network card causes freeze | ||
---|---|---|---|
Product: | Drivers | Reporter: | Pavol Gono (Palo.Gono) |
Component: | PCI | Assignee: | Greg Kroah-Hartman (greg) |
Status: | REJECTED DOCUMENTED | ||
Severity: | normal | CC: | acpi-bugzilla, akpm, aleksey_gorelov, bjorn.helgaas |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.13.1 | Subsystem: | |
Regression: | --- | Bisected commit-id: |
Description
Pavol Gono
2005-09-12 17:46:55 UTC
We dinked with the PCI windowing a bit post-2.6.13. Could you please test
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.13-git9.gz (against 2.6.13)

2.6.13-git9-x03 kernel has very similar problems on my hardware:
- I don't hear the sound card while playing mp3
- after connecting the network cable to the b44 network card, the system freezes
PCI assignment and the outputs of lspci are exactly the same as with 2.6.13.1.
Maybe these differences in /proc will be useful for you too (the same as with the diff of 2.6.12.6 and 2.6.13.1):

diff -urU5 2.6.12.6-x02/proc/iomem 2.6.13-git9-x03/proc/iomem
--- 2.6.12.6-x02/proc/iomem 2005-09-13 00:56:35.611577384 +0200
+++ 2.6.13-git9-x03/proc/iomem 2005-09-13 07:31:54.306981080 +0200
@@ -2,20 +2,23 @@
 0009fc00-0009ffff : reserved
 000a0000-000bffff : Video RAM area
 000c0000-000cebff : Video ROM
 000f0000-000fffff : System ROM
 00100000-1fffbfff : System RAM
-  00100000-002e7173 : Kernel code
-  002e7174-003946ff : Kernel data
+  00100000-002dc2f9 : Kernel code
+  002dc2fa-00352ea7 : Kernel data
 1fffc000-1fffefff : ACPI Tables
 1ffff000-1fffffff : ACPI Non-volatile Storage
+20000000-20003fff : 0000:00:0a.0
 f1000000-f10000ff : 0000:00:10.3
 f1800000-f1801fff : 0000:00:09.0
   f1800000-f1801fff : b44
 f2000000-f3dfffff : PCI Bus #01
   f2000000-f2ffffff : 0000:01:00.0
+f3ef0000-f3ef3fff : 0000:00:09.0
 f3f00000-f7ffffff : PCI Bus #01
+  f3fe0000-f3ffffff : 0000:01:00.0
   f4000000-f7ffffff : 0000:01:00.0
 f8000000-fbffffff : 0000:00:00.0
 fec00000-fec00fff : reserved
 fee00000-fee00fff : reserved
 ffff0000-ffffffff : reserved
diff -urU5 2.6.12.6-x02/proc/ioports 2.6.13-git9-x03/proc/ioports
--- 2.6.12.6-x02/proc/ioports 2005-09-13 00:56:35.611577384 +0200
+++ 2.6.13-git9-x03/proc/ioports 2005-09-13 07:31:54.306981080 +0200
@@ -26,5 +26,7 @@
   d000-d01f : uhci_hcd
 d400-d47f : 0000:00:0d.0
   d400-d47f : ALS4000
 d800-d81f : 0000:00:0a.0
   d800-d81f : ne2k-pci
+e400-e47f : 0000:00:11.0
+e800-e80f : 0000:00:11.0

bugme-daemon@kernel-bugs.osdl.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=5235
>
> ------- Additional Comments From pavol_gono@yahoo.com 2005-09-12 22:47 -------
> 2.6.13-git9-x03 kernel has very similar problems on my hardware

OK, thanks. So we haven't fixed it yet.

Linus, Ivan, could you please take a look at http://bugzilla.kernel.org/show_bug.cgi?id=5235 ?

especially this:

- I/O behind bridge: 0000e000-0000dfff
+ I/O behind bridge: 0000f000-00000fff

a size of -1...

Reply-To: ink@jurassic.park.msu.ru

On Mon, Sep 12, 2005 at 10:59:33PM -0700, Andrew Morton wrote:
> especially this:
>
> - I/O behind bridge: 0000e000-0000dfff
> + I/O behind bridge: 0000f000-00000fff
>
> a size of -1...

This is perfectly fine - there are no IO ports behind the
AGP bridge (video card has only MMIO registers), so IO window
gets disabled.

I'd rather blame supposedly wrong IRQ assignment:

-PCI: Enabling device 0000:00:0a.0 (0000 -> 0001)
-PCI: Found IRQ 11 for device 0000:00:0a.0
-PCI: Sharing IRQ 11 with 0000:00:0d.0
-PCI: Sharing IRQ 11 with 0000:01:00.0
-eth1: RealTek RTL-8029 found at 0xd800, IRQ 11, 52:54:AB:4D:BC:C1.
+PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
+PCI: setting IRQ 9 as level-triggered
+PCI: Assigned IRQ 9 for device 0000:00:0a.0
+PCI: Sharing IRQ 9 with 0000:00:0d.0
+eth1: RealTek RTL-8029 found at 0xd800, IRQ 9, 52:54:AB:4D:BC:C1.

Ivan.

Ivan Kokshaysky <ink@jurassic.park.msu.ru> wrote:
>
> On Mon, Sep 12, 2005 at 10:59:33PM -0700, Andrew Morton wrote:
> > especially this:
> >
> > - I/O behind bridge: 0000e000-0000dfff
> > + I/O behind bridge: 0000f000-00000fff
> >
> > a size of -1...
>
> This is perfectly fine - there are no IO ports behind the
> AGP bridge (video card has only MMIO registers), so IO window
> gets disabled.
>
> I'd rather blame supposedly wrong IRQ assignment:
>
> -PCI: Enabling device 0000:00:0a.0 (0000 -> 0001)
> -PCI: Found IRQ 11 for device 0000:00:0a.0
> -PCI: Sharing IRQ 11 with 0000:00:0d.0
> -PCI: Sharing IRQ 11 with 0000:01:00.0
> -eth1: RealTek RTL-8029 found at 0xd800, IRQ 11, 52:54:AB:4D:BC:C1.
> +PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
> +PCI: setting IRQ 9 as level-triggered
> +PCI: Assigned IRQ 9 for device 0000:00:0a.0
> +PCI: Sharing IRQ 9 with 0000:00:0d.0
> +eth1: RealTek RTL-8029 found at 0xd800, IRQ 9, 52:54:AB:4D:BC:C1.
>

Crap. Len, could you please take a look and if this is an ACPI problem, set the bug ownership appropriately?

> Also these commandline parameters didn't help:
> pci=usepirqmask
> (empty)
> noapic nolapic apm=off noacpi pci=noacpi,usepirqmask
"noacpi" does nothing.
Does booting with "acpi=off" make any difference?
If yes, does booting with "acpi=noirq" or "pnpacpi=off"
make the same difference?
It would be ideal if you could attach the complete
dmesg files rather than just the diff.
I was not able to add attachments via "Create a New Attachment", so I put them at http://decef.elf.stuba.sk/~pg20207/bug_5235_2.6.13.tar.bz2
The archive contains the dmesgs, .configs, lspci outputs and /proc files from 2.6.12.6, 2.6.13.1 and 2.6.13-git9.
I'll try "acpi=off" after 10 hours, currently I am only remotely connected to my machine.

> I'll try "acpi=off" after 10 hours
But from the dmesg logs, it looks like CONFIG_ACPI isn't even
turned on.
I'm a dunce and haven't figured out a good way to browse per-file
revision history. But if I *could* do that, I'd start looking
at arch/i386/pci/irq.c.
> But from the dmesg logs, it looks like CONFIG_ACPI isn't even
> turned on.
Of course, I didn't realise this... I use only APM for powering down the machine.
I tried "acpi=off" and, logically, there was no real change in the kernel's behaviour.
I noticed another thing: when I left the machine in the frozen state (2.6.13-git9)
for about 2 hours and then rebooted it, it was not able to boot ("DISK FAILURE"
or a similar message). I then switched the machine off for a few minutes, and after
that it worked normally again (2.6.12.6). I don't know whether this is related
to the kernel problem, but I don't remember any disk failures in the past.
Any ideas how to narrow the problem down? I now have some time for experiments.
> Any ideas how to narrow the problem down? I now have some time for experiments.
Somebody probably has better ideas, but since I don't know how
non-ACPI IRQ routing works, my approach would be the brute-force
one of sprinkling printks through pcibios_lookup_irq() and related
things, and try to figure out what changed between 2.6.12.6 and
2.6.13. And maybe booting with "debug apic=debug" would cause
more useful output.
How about enabling ACPI for your system? That might fix the routing issue...

> How about enabling ACPI for your system? That might fix the routing issue...
This solved my problem, now network and sound cards work correctly on 2.6.13.1
Changes in .config:
CONFIG_ACPI=y
CONFIG_ACPI_DEBUG=y
CONFIG_APM is not set
CONFIG_PCI_DEBUG=y
Kernel command line: pci=usepirqmask debug apic=debug
Details: http://decef.elf.stuba.sk/~pg20207/bug_5235_with_acpi.tar.bz2

Great, I'll mark this as fixed now...

bugme-daemon@kernel-bugs.osdl.org wrote:
>
> I'm a dunce and haven't figured out a good way to browse per-file
> revision history.

http://www.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree

> Great, I'll mark this as fixed now...
Well, but... it *used* to work without ACPI. And now it doesn't.
Is that an acceptable regression? How do we recognize and deal with
future, similar bug reports?
I'd feel better if we understood exactly what changed in the non-ACPI
case, and we made a conscious decision that it was unavoidable.
Hopefully bugzilla won't crash this time :)
After narrowing this bug down in the code, it seems the problem is hardware specific. In the non-ACPI case, kernel 2.6.12-git5 was working correctly, but 2.6.12-git6 was not. I found the small diff which brought the problems:

diff -Naurp linux-x03/arch/i386/pci/irq.c linux6/arch/i386/pci/irq.c
--- linux-x03/arch/i386/pci/irq.c 2005-06-17 21:48:29.000000000 +0200
+++ linux6/arch/i386/pci/irq.c 2005-09-18 22:46:25.619910392 +0200
@@ -227,6 +227,24 @@ static int pirq_via_set(struct pci_dev *
 }
 
 /*
+ * The VIA pirq rules are nibble-based, like ALI,
+ * but without the ugly irq number munging.
+ * However, for 82C586, nibble map is different .
+ */
+static int pirq_via586_get(struct pci_dev *router, struct pci_dev *dev, int pirq)
+{
+	static unsigned int pirqmap[4] = { 3, 2, 5, 1 };
+	return read_config_nybble(router, 0x55, pirqmap[pirq-1]);
+}
+
+static int pirq_via586_set(struct pci_dev *router, struct pci_dev *dev, int pirq, int irq)
+{
+	static unsigned int pirqmap[4] = { 3, 2, 5, 1 };
+	write_config_nybble(router, 0x55, pirqmap[pirq-1], irq);
+	return 1;
+}
+
+/*
  * ITE 8330G pirq rules are nibble-based
  * FIXME: pirqmap may be { 1, 0, 3, 2 },
  * 2+3 are both mapped to irq 9 on my system
@@ -512,6 +530,10 @@ static __init int via_router_probe(struc
 	switch(device)
 	{
 		case PCI_DEVICE_ID_VIA_82C586_0:
+			r->name = "VIA";
+			r->get = pirq_via586_get;
+			r->set = pirq_via586_set;
+			return 1;
 		case PCI_DEVICE_ID_VIA_82C596:
 		case PCI_DEVICE_ID_VIA_82C686:
 		case PCI_DEVICE_ID_VIA_8231:

After this patch the usual problems appeared - sound and network cards broken.
My motherboard is Asus A7V333-X, Northbridge VIA KT333, Southbridge VIA VT8235.
Logs, .configs and /proc files are in http://decef.elf.stuba.sk/~pg20207/bug_5235_via.tar.bz2 if someone wants to see details.

You might ping Aleksey_Gorelov@Phoenix.com; according to the log, the patch was added by him. He should know about your issue.
A similar problem has been reported before here: http://groups.google.com/group/linux.kernel/browse_thread/thread/def4ca19dbc3cd4/5cffbf349f2c87a4?tvc=2&q=Aleksey+Gorelov&hl=en#5cffbf349f2c87a4
and was related to a bug in the BIOS reporting the 82C686 router as compatible with the 586. I suspect the BIOS on this board has a similar issue: it reports the VT8235 router as compatible with the 586 one, which is obviously not true. The patch from the link above has already been incorporated in both the 2.6 and 2.4 series, but might not work in this particular case. Can you please try something like this (patch against 2.6.14-rc2):

--- linux-2.6.14-rc2/arch/i386/pci/irq_old.c 2005-09-21 16:08:17.000000000 -0700
+++ linux-2.6.14-rc2/arch/i386/pci/irq.c 2005-09-21 16:13:15.000000000 -0700
@@ -552,10 +552,27 @@
 {
 	/* FIXME: We should move some of the quirk fixup stuff here */
 
-	if (router->device == PCI_DEVICE_ID_VIA_82C686 &&
-			device == PCI_DEVICE_ID_VIA_82C586_0) {
-		/* Asus k7m bios wrongly reports 82C686A as 586-compatible */
-		device = PCI_DEVICE_ID_VIA_82C686;
+	/*
+	 * work arounds for some buggy BIOSes
+	 */
+	if (device == PCI_DEVICE_ID_VIA_82C586_0) {
+		switch(router->device)
+		{
+		case PCI_DEVICE_ID_VIA_82C686:
+			/*
+			 * Asus k7m bios wrongly reports 82C686A
+			 * as 586-compatible
+			 */
+			device = PCI_DEVICE_ID_VIA_82C686;
+			break;
+		case PCI_DEVICE_ID_VIA_8235:
+			/**
+			 * Asus a7v-x bios wrongly reports 8235
+			 * as 586-compatible
+			 */
+			device = PCI_DEVICE_ID_VIA_8235;
+			break;
+		}
 	}
 
 	switch(device)
@@ -568,6 +585,7 @@
 		case PCI_DEVICE_ID_VIA_82C596:
 		case PCI_DEVICE_ID_VIA_82C686:
 		case PCI_DEVICE_ID_VIA_8231:
+		case PCI_DEVICE_ID_VIA_8235:
 		/* FIXME: add new ones for 8233/5 */
 			r->name = "VIA";
 			r->get = pirq_via_get;

> Can you please try something like this (patch against 2.6.14-rc2):
This patch worked fine for me:
original 2.6.14-rc2 - the same problems with net/sound as before
2.6.14-rc2-x01 (with your patch) - without problems
Details: http://decef.elf.stuba.sk/~pg20207/bug_5235_via_solved.tar.bz2