Bug 2243
Description
Stian Jordet
2004-03-03 18:35:08 UTC
Created attachment 2279 [details]
dmesg with acpi=off
Dmesg and interrupts are from one pc (running 2.6.4-rc1-mm1, but same symptoms
with stock 2.6.4-rc1). Can attach from the other pc as well, if that's needed.
Created attachment 2280 [details]
interrupts with acpi=off
Created attachment 2281 [details]
dmesg with acpi
Created attachment 2282 [details]
interrupts with acpi
Still happens with 2.5.1-rc1-bk4. When I cat /proc/interrupts it seems that the interrupts for the uhci_hcd controllers are steadily increasing, even without any usb devices attached. But I've been thinking, since I'm the only one with this problem, can it be a configuration error some place?
I just found out that on my ASUS motherboard (which also has this problem, although it is not the one from the dmesg I have posted), I get the same interrupts on eth0 (interrupt 19, shared with a tv-card and a pcmcia-adapter, which have stopped working as well) as I do on irq 11 (uhci_hcd). So, if I have heavy network traffic, irq 11 and usb will be disabled very fast. I also noticed this message in the logs from my tv-card:
saa7134[0]/irq[10,418175]: r=0x20 s=0x00 PE
saa7134[0]/irq: looping -- clearing enable bits
None of these problems occur without acpi.

Jordet, would you please try the patch 2395 in Bug 2366? I want to know if there is any difference. Thanks.

No difference. I just downloaded 2.6.2 again, and verified that I have no problem then. I noticed though, that I still get interrupts increasing on the usb irq when it increases on eth0. I will try the later kernels as well, to find out exactly when it broke.
Summary: With 2.6.2 usb, acpi and the cardbus adapter all work perfectly. With 2.6.3, there is only trouble. On two different computers.

with the patch, do you still see
IRQ9 -> 0:9-> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11-> 0:11
in your dmesg?
Can you attach the dmesg and interrupts from when it works?
>When this was working, usb had irq 5 on both computers, now it has 11
Did you change the BIOS option?
Sorry for this late reply, I have been quite busy lately. As I told you, I have the exact same problem on two quite different pc's. But both have VIA chipsets, Apollo Pro 133 and Apollo Pro 266. Anyway, most of what I have written here was about the one with a Rioworks motherboard. I have now started using the other one (with ASUS motherboard) regularly, so I have been testing with that one instead, so please disregard what I have written earlier.
First, no, I don't see
IRQ9 -> 0:9-> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11-> 0:11
anymore with the patch. I also found out that everything worked fine with 2.6.3, but broke with 2.6.4-rc1. I said my pcmcia adapter also ceased working, but it's working again with 2.6.5, so that's no problem. I will attach dmesg and /proc/interrupts from both 2.6.3 and 2.6.4-rc1, and hope you will figure it out.
The problem that irq 9 is steadily increasing at the same rate as irq 19 is there with 2.6.3 as well, but it doesn't cause any "disabling interrupt" message. I still get the
saa7134[0]/irq[10,-254395]: r=0x20 s=0x00 PE
saa7134[0]/irq: looping -- clearing enable bits
message in dmesg.
Since none of these boards have ever worked with acpi=off without using noapic, I will not attach acpi=off dmesg and interrupts. If you want acpi=off and noapic dmesg and interrupts, please tell me.
Happy easter to all of you :)
Created attachment 2547 [details]
dmesg-2.6.3
Created attachment 2548 [details]
interrupts-2.6.3
Created attachment 2549 [details]
dmesg-2.6.4-rc1
Created attachment 2550 [details]
interrupts-2.6.4-rc1
Created attachment 2551 [details]
Windows 2000 irq-assignments
I forgot to tell that a 100% reproducible way to disable irq 11 is to copy a
couple of GB to a nfs-share. This always disables irq 11 with 2.6.4-rc1, and
never with 2.6.3.
Created attachment 2552 [details]
Windows 2000 irq-assignments
Sorry, the last one was bogus :(
Stian,
Can you re-test both systems with 2.6.5? We've fixed a couple of things there.
I must admit I'm a bit turned around by the multiple systems and multiple symptoms that you've mentioned in this bug. Maybe I'm working too late ;-)
thanks,
-Len

First, I'm really sorry I suck so much at writing good bug reports. And I'm shocked that I have this problem on two systems, while I haven't seen anyone else have the same problem. Perhaps I should file two different bugs, one for each system? Even though the cause and symptoms seem to be the same.
I have retested with 2.6.6-rc1, and both systems still disable the usb interrupt (9 on one of them, 11 on the other) when there is heavy network traffic. This is because both the usb irq and the eth0 irq increase by the same number of interrupts during network traffic. Is there any information I could provide that would help track this down? I will soon ship one of those damn things to you, so you can try yourself :P

This has got to be the "VIA quirk". Probably users of older VIA systems are running with ACPI off. It is fixed in legacy mode; we should be able to fix it in ACPI mode too.

*** Bug 2528 has been marked as a duplicate of this bug. ***

Quick testing shows that on one of my systems, everything seems to work fine with pci=noacpi. Then eth0 and uhci_hcd share the same interrupt. This system had the "Using VIA IRQ router" or something message at boot. The other system won't boot with pci=noacpi (and never has), so I can't test. And in case I haven't told it earlier: with 2.6.3 everything works fine, but the interrupts are increasing on both usb and eth0 even then, it just didn't disable the interrupt then. I really hope you'll figure it out :)

And I see from http://bugzilla.kernel.org/show_bug.cgi?id=2528 that this guy has made just the same observations as me, with 2.6.3 working, and none later :) Good! Btw. can the reason my computer doesn't boot with pci=noacpi have something to do with it?
Is the patch still in progress? I assume it will be posted as an attachment to this bug.

is there a dmesg of the properly functioning system in APIC mode someplace?
seems with pci=noacpi, these boxes run in PIC mode, so that doesn't help.
re: quirks
please attach the lspci -v
please boot 2.6.5 with the debug patch in bug 1681 -- perhaps it will give us some visibility into when/if quirks are being called.

Hi Len,
Kinda hard to tell what debug patch you're talking about. Obviously it wasn't from the bug you referred to.
Anyway, the system with the Asus motherboard has never booted with apic enabled. I have to boot it with either acpi on, or noapic. This has started to annoy me lately, because most modern linux installers enable apic on smp boards by default. When I disable acpi I get these lines at bootup:
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
And then it just hangs forever. Should I file a separate bug for that? As I said, this is starting to bother me more lately.
Anyway, the Rioworks board boots fine with IO-APIC, and I get the same irq assignments as in Windows, but usb doesn't work. But I'll attach dmesg and interrupts from acpi=off boot.
I just installed Linux on my girlfriend's pc, and I observed the same behaviour there. She has a single cpu system with VIA Apollo KLE133 chipset. Would you like some information from that computer as well, or do you have enough?
Created attachment 2701 [details]
interrupts with acpi
Created attachment 2702 [details]
dmesg with acpi
Created attachment 2703 [details]
Interrupts without acpi
Created attachment 2704 [details]
dmesg without acpi
Created attachment 2705 [details]
irq's in Windows XP on same motherboard
oops, sorry 'bout the typo.
please try the debug patch from bug 1581 plus the debug patch from bug 1689 (at the same time) and attach the dmesg & interrupts from ACPI+IOAPIC mode.
thanks,
-Len
Created attachment 2718 [details]
dmesg with acpi and both debug patches
Created attachment 2719 [details]
interrupts with acpi and both debug patches
Created attachment 2720 [details]
interrupts with apic
Created attachment 2721 [details]
dmesg with apic and both debug patches
Created attachment 2722 [details]
lspci -vv
Created attachment 2723 [details]
dmesg with apic and both debug patches
Ok, I'm stupid
Created attachment 2724 [details]
And one more time.. *blushing* - dmesg with apic and both debug patches
I guess it's needless to say, but I tested this patch:
ftp://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/2.6.6/acpi-20040326-2.6.6.diff.gz
and the problem still exists, just as before. Was the dmesg I posted a couple of weeks ago good, or did I miss something?
Created attachment 2845 [details]
2.6.6-rc1 dmesg
As of kernel 2.6.6-rc1 the problem still exists. I attached the dmesg output.
2.6.6-rc1 dmesg is too short, and now out-dated anyway.
Please build 2.6.6 + patch from bug 2665 (or simply 2.6.6-mm2) with CONFIG_LOG_BUF_SHIFT=16 and run dmesg -s64000

re: IRQ9 increasing with IRQ19
I expect this "tying" is due to broken ACPI GPEs that are firing on ACPI activity.
We should have a fix for that shortly.

Re: 2.6.3 worked, but later doesn't.
Can you attach the 2.6.3 dmesg and /proc/interrupts that matches the recent 2.6 output -- looks like your earlier attachments were for a different motherboard.

debug output shows that PCI Links look good, and quirks are being called when we expect them to be called.
lspci-vv confirms: 0000:00:07.2 USB Interrupt: pin A
ACPI case keeps USB on IRQ11:
v: 1106 d: 3038 n: VIA Technologies, Inc. USB
PCI: Calling quirk c04d2d10 for 0000:00:07.2
PCI: Calling quirk c04d2bd0 for 0000:00:07.2
PCI: Via IRQ fixup for 0000:00:07.2, re-programming 11
...
uhci_hcd 0000:00:07.2: irq 11, io base 0000a400
uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 4
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
...
Freeing unused kernel memory: 412k freed
usb 4-2: new full speed USB device using address 2
usb 4-2: config 1 has an invalid descriptor of length 24
usb 4-2: can't read configurations, error -22
usb 4-2: new full speed USB device using address 3
usb 4-2: config 1 has an invalid descriptor of length 24
usb 4-2: can't read configurations, error -22

-------------
pci=noacpi says USB is on IRQ 11 and quirk reprograms to 3, but then device claims 19 = 0x13?
v: 1106 d: 3038 n: VIA Technologies, Inc. USB
PCI: Calling quirk c04d2d10 for 0000:00:07.2
PCI: Calling quirk c04d2bd0 for 0000:00:07.2
PCI: Via IRQ fixup for 0000:00:07.2, from 11 to 3

uhci_hcd 0000:00:07.2: irq 19, io base 0000a400
uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 4
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
...
Freeing unused kernel memory: 412k freed
usb 4-2: new full speed USB device using address 2
EXT3 FS on hdb1, internal journal
usb 4-2: control timeout on ep0out
uhci_hcd 0000:00:07.2: Unlink after no-IRQ? Different ACPI or APIC settings may help.

Q: is USB working properly in the pci=noacpi case?

>re: IRQ9 increasing with IRQ19
>I expect this "tying" is due to broken ACPI GPEs that are firing on ACPI
>activity.
>We should have a fix for that shortly.
I'm quite sure that this is the entire problem, so I'm really looking forward to the fix :)

>Re: 2.6.3 worked, but later doesn't.
>Can you attach the 2.6.3 dmest and /proc/interrupts that
>mathes the recent 2.6 output -- looks like your earlier attachments
>were for a different motherboard.
I will in a flash, sorry about reporting about two different motherboards :( And sorry for messing everything up so badly.

>debug output shows that PCI Links look good, and quirks
>are being called when we expect them to be called.
>lspci-vv confirms: 0000:00:07.2 USB Interrupt: pin A
>ACPI case keeps USB on IRQ11:
>v: 1106 d: 3038 n: VIA Technologies, Inc. USB
>PCI: Calling quirk c04d2d10 for 0000:00:07.2
>PCI: Calling quirk c04d2bd0 for 0000:00:07.2
>PCI: Via IRQ fixup for 0000:00:07.2, re-programming 11
>...
>uhci_hcd 0000:00:07.2: irq 11, io base 0000a400
>uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 4
>hub 4-0:1.0: USB hub found
>hub 4-0:1.0: 2 ports detected
>...
>Freeing unused kernel memory: 412k freed
>usb 4-2: new full speed USB device using address 2
>usb 4-2: config 1 has an invalid descriptor of length 24
>usb 4-2: can't read configurations, error -22
>usb 4-2: new full speed USB device using address 3
>usb 4-2: config 1 has an invalid descriptor of length 24
>usb 4-2: can't read configurations, error -22
>
>-------------
>pci=noacpi says USB is on IRQ 11 and quirk reprograms to 3,
>but then device claims 19 = 0x13?
What does this mean?

>v: 1106 d: 3038 n: VIA Technologies, Inc. USB
>PCI: Calling quirk c04d2d10 for 0000:00:07.2
>PCI: Calling quirk c04d2bd0 for 0000:00:07.2
>PCI: Via IRQ fixup for 0000:00:07.2, from 11 to 3
>
>uhci_hcd 0000:00:07.2: irq 19, io base 0000a400
>uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 4
>hub 4-0:1.0: USB hub found
>hub 4-0:1.0: 2 ports detected
>...
>Freeing unused kernel memory: 412k freed
>usb 4-2: new full speed USB device using address 2
>EXT3 FS on hdb1, internal journal
>usb 4-2: control timeout on ep0out
>uhci_hcd 0000:00:07.2: Unlink after no-IRQ? Different ACPI or APIC settings
>may help.
>Q: is USB working properly in the pci=noacpi case?
No. It's working fine with ACPI (at least before 2.6.4. It's working now as well, just for a short while) and with noapic, but not with pci=noacpi. It never has.
I'm quite sure the DSDT of this motherboard is good, since you have seen it before, but if you want, I can attach it again?
Created attachment 2870 [details]
dmesg from 2.6.3 with acpi
This is with acpi, tell me if you need pci=noapic, or noapic.
Created attachment 2871 [details]
interrupts with acpi from 2.6.3
I have the same symptoms with the VIA KT266A chipset and 2.6.6 (Debian, optimized for K7).

i don't suppose if you boot 2.6.7 + the patch in bug 2574 it makes any difference?

Nope. On one of the computers the irq for usb changed, but else no difference. :(

No, it didn't help. IRQ 12 was disabled after 500000 interrupts, as shown below. (Both from 2.6.7-rc1, with patch from bug 2574.) In 2.6.6, booting with "noapic" keeps the problem from occurring.
/proc/interrupts:
           CPU0
  0:   42234694    IO-APIC-edge  timer
  1:       2403    IO-APIC-edge  i8042
  4:          6    IO-APIC-edge  serial
  7:          0    IO-APIC-edge  parport0
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     500000   IO-APIC-level  uhci_hcd, uhci_hcd, uhci_hcd
 14:      90616    IO-APIC-edge  ide0
 15:          2    IO-APIC-edge  ide1
 16:    3650815   IO-APIC-level  nvidia
 17:     104950   IO-APIC-level  CS46XX
 18:    3593249   IO-APIC-level  nvidia
 19:     741081   IO-APIC-level  eth0
Excerpted from dmesg:
irq 12: nobody cared!
 [<c010834a>] __report_bad_irq+0x2a/0x90
 [<c010843c>] note_interrupt+0x6c/0xa0
 [<c0108711>] do_IRQ+0x121/0x130
 [<c01069f4>] common_interrupt+0x18/0x20
 [<c0104053>] default_idle+0x23/0x30
 [<c01040bc>] cpu_idle+0x2c/0x40
 [<c0320671>] start_kernel+0x1a1/0x1e0
 [<c0320370>] unknown_bootoption+0x0/0x120
handlers:
[<f0b9e300>] (usb_hcd_irq+0x0/0x70 [usbcore])
[<f0b9e300>] (usb_hcd_irq+0x0/0x70 [usbcore])
[<f0b9e300>] (usb_hcd_irq+0x0/0x70 [usbcore])
Disabling IRQ #12
NMI:          0
LOC:   42234807
ERR:          0
MIS:          0
Created attachment 2973 [details]
dmesg from 2.6.7-rc1, patched, after IRQ disable
ACPI: PCI interrupt 0000:00:11.1[A]: no GSI
ACPI: PCI interrupt 0000:00:11.2[D] -> GSI 12 (level, low) -> IRQ 12
ACPI: PCI interrupt 0000:00:11.3[D] -> GSI 12 (level, low) -> IRQ 12
ACPI: PCI interrupt 0000:00:11.4[D] -> GSI 12 (level, low) -> IRQ 12
The unknown GSI should be okay, since it is IDE:
VP_IDE: VIA vt8233 (rev 00) IDE UDMA100 controller on pci0000:00:11.1
(unless, of course, this interacts with the VIA quirks and is the cause of the actual failure ;-)
Maybe we should hack that one to 14 and see if the quirk then fixes the problem. Alternatively, maybe we should experiment with disabling the VIA quirks...

Tom,
Can you reproduce this failure w/o the binary nvidia driver loaded?

Created attachment 2985 [details]
dmesg 2.6.7-rc1 with patch
Here you have my dmesg as well, if that helps.
As I have said earlier, I am quite confident that the problem is that the irq
count is increasing on usb when there is network traffic...
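The observation that the usb irq count rises together with network traffic can be checked by sampling /proc/interrupts twice and comparing the per-IRQ deltas. The following is a minimal sketch, assuming the /proc/interrupts layout shown in this report (a header row of CPU columns, then one row per IRQ); the function names are made up for illustration.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: total_count} from /proc/interrupts-style text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 [CPU1 ...]
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        total = 0
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/device columns
            total += int(field)
        counts[label.strip()] = total
    return counts


def irq_deltas(before, after):
    """Per-IRQ increase between two parsed samples."""
    return {irq: after[irq] - before.get(irq, 0) for irq in after}


def sample_deltas(interval_s=5.0, path="/proc/interrupts"):
    """Read the file twice, interval_s seconds apart, and return the deltas."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return irq_deltas(before, after)
```

Running `sample_deltas()` once while idle and once during a large NFS copy, then comparing the uhci_hcd and eth0 rows, would show whether the two counts move together.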
The same failure occurred in the same kernel as before (2.6.7-rc1+patch from bug 2574) without the nvidia module loaded:
           CPU0
  0:   37591351    IO-APIC-edge  timer
  1:       9708    IO-APIC-edge  i8042
  4:         14    IO-APIC-edge  serial
  7:          0    IO-APIC-edge  parport0
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     800000   IO-APIC-level  uhci_hcd, uhci_hcd, uhci_hcd
 14:      98036    IO-APIC-edge  ide0
 15:          2    IO-APIC-edge  ide1
 17:     499838   IO-APIC-level  CS46XX
 19:    1315618   IO-APIC-level  eth0
NMI:          0
LOC:   37591442
ERR:          0
MIS:          0
For what it's worth, I also observe the count for IRQ 12 increasing more during heavy network traffic than light, even though eth0 is on a different IRQ.
Created attachment 2990 [details]
dmesg from above failure
I don't understand why the USB devices are getting interrupts in IOAPIC mode when the NIC is active. Apparently these devices share an interrupt when in PIC mode, and some sort of link between them is not being broken when we're in IOAPIC mode?
I think the USB interrupts are "hard-coded" to be on certain IRQs, but I may be wrong. Can you attach (or in your case Stian, point me to) the acpidmp output associated with the system under test?
But if the USB interrupt that is getting nailed is under the control of a PCI Interrupt Link Device, then we should be able to move it off that IRQ with "acpi_irq_isa=12", say to make IRQ12 less attractive. If that is possible, it will be interesting to see if the problem follows the device to the new IRQ.
Created attachment 2991 [details]
2.6 debug patch
Please test this 2.6 debug patch in ACPI/APIC mode, and let me know
what differences you notice in dmesg, /proc/interrupts, or behaviour.
It is somewhat of a stab in the dark -- it disables all the PCI quirks.
Just out of curiosity, it also prints out the PIC in case something strange
goes on there.
Created attachment 2997 [details]
dmesg with last debug patch
Using acpi_irq_isa=11 did not change the usb interrupt from 11. Neither did I notice any difference with the debug patch, no interrupts changed. Attaching dmesg anyway.
The dsdt is from bug #1164, where you helped me get a new BIOS from Rioworks :) Thanks. Anyway, I'll attach the dsdt too. Sure hope something can help you understand what's going on here, really annoying. Seems like this problem exists on all Via Apollo chipsets, have seen reports of it many places, on Debian mailing lists among others.
Created attachment 2998 [details]
dsdt
Stian,
So USB interrupts on IRQ11 are still incrementing with ethernet activity?
Created attachment 2999 [details]
dmesg after IRQ failure, 2.6.7-rc1+above patch
The failure still occurs. I didn't notice any significant differences other
than it failed at an IRQ count of a different multiple of 100000.
CPU0
0: 11729247 IO-APIC-edge timer
1: 25308 IO-APIC-edge i8042
4: 6 IO-APIC-edge serial
7: 0 IO-APIC-edge parport0
8: 4 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
12: 200000 IO-APIC-level uhci_hcd, uhci_hcd, uhci_hcd
14: 22075 IO-APIC-edge ide0
15: 56 IO-APIC-edge ide1
17: 122143 IO-APIC-level CS46XX
19: 3455620 IO-APIC-level eth0
NMI: 0
LOC: 11729221
ERR: 0
MIS: 0
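The failures reported in this bug all land on exact multiples of 100000 (200000, 500000, 600000, 800000). That is what one would expect if the kernel's spurious-interrupt check evaluates the line once per window of 100000 interrupts and disables it when nearly all of them went unclaimed. The sketch below is a toy model of that idea, not kernel code; the window size and the 99900 threshold are my recollection of the 2.6-era `note_interrupt` logic and should be treated as assumptions.

```python
WINDOW = 100_000           # assumed: interrupts per evaluation window
UNHANDLED_LIMIT = 99_900   # assumed: unclaimed interrupts that trip the check


class IrqLine:
    """Toy model of the 'nobody cared' accounting for one IRQ line."""

    def __init__(self):
        self.total = 0        # what /proc/interrupts would show
        self.in_window = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        if self.disabled:
            return
        self.total += 1
        self.in_window += 1
        if not handled:
            self.unhandled += 1
        if self.in_window < WINDOW:
            return
        # End of window: decide, then start a fresh window.
        if self.unhandled > UNHANDLED_LIMIT:
            self.disabled = True   # "Disabling IRQ #n"
        self.in_window = 0
        self.unhandled = 0


def feed(line, count, handled):
    """Deliver `count` interrupts that are all claimed or all unclaimed."""
    for _ in range(count):
        line.note_interrupt(handled)
```

In this model a line that sees some genuine, claimed interrupts inside a window survives that window, which would explain why the count at failure differs from run to run while always being a multiple of the window size.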
Created attachment 3000 [details]
acpidmp output
Here is the acpidmp output from the machine in question. It was taken in
IOAPIC mode, during the same session as the failure documented above.
Also, using acpi_irq_isa=12 didn't change the IRQ assignments for me either.

Len: yes, still incrementing usb irq's with network activity...

tom's system:
ACPI: PCI interrupt 0000:00:11.2[D] -> GSI 12 (level, low) -> IRQ 12
ACPI: PCI interrupt 0000:00:11.3[D] -> GSI 12 (level, low) -> IRQ 12
ACPI: PCI interrupt 0000:00:11.4[D] -> GSI 12 (level, low) -> IRQ 12
The three USB controllers are all on the same pin, and in APIC mode that pin is hard-coded to IRQ 12:
Package (0x04) { 0x0011FFFF, 0x03, 0x00, 0x0C },
This explains why you can't move it with acpi_irq_isa=12.

I just tested kernel 2.4.26 and 2.6.3. I still have incrementing usb irq's with network activity. But they don't cause the usb irq to be disabled with these kernels. That started happening with 2.6.4-rc1.

Created attachment 3033 [details]
2.6 test patch for uhci_irq to not mask USBSTS_HCH, ala 2.6.3
Looks like 2.6.3 and 2.4.26 uhci_irq interprets the USBSTS_HCH (HC halted)
status bit to mean that it got a valid interrupt, but starting in 2.6.4 it
ignores this bit. Probably if you apply this 1-line patch, it will revert
latest 2.6 to the 2.6.3 behaviour.
If this is the case, then it means USB is taking interrupts when only the
USBSTS_HCH bit is set. I expect the interrupts are not actually coming from
USB and are spurious, so tricking USB into claiming them is actually just
treating the symptom and hiding the root cause. You should be able to get the
same effect w/o any patch by booting with "noirqdebug".
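The difference between the two behaviours described above comes down to one bit test on the UHCI status register. The sketch below models it in Python rather than reproducing the driver: `USBSTS_HCH = 0x0020` is bit 5 (HCHalted) of the UHCI USBSTS register, while the two function names and the exact form of each check are assumptions made for illustration.

```python
USBSTS_USBINT = 0x0001  # bit 0: USB interrupt (transfer completed)
USBSTS_HCH = 0x0020     # bit 5: host controller halted


def claims_irq_old(status):
    """2.6.3-style behaviour as described: any set status bit counts."""
    return status != 0


def claims_irq_new(status):
    """2.6.4-style behaviour as described: HCH on its own is ignored."""
    return (status & ~USBSTS_HCH) != 0


# A halted controller with nothing else pending, hit by a stray interrupt:
stray = USBSTS_HCH
print(claims_irq_old(stray))  # claimed, so the spurious check never trips
print(claims_irq_new(stray))  # not claimed, so it is counted as unhandled
```

Under this model the older kernels absorb the stray interrupts silently, and the newer ones expose them to the "nobody cared" check, which matches the symptom starting with 2.6.4-rc1.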
Indeed, that patch "fixes" it :) And so does noirqdebug, which is a very fine workaround for me, at least :) So, I'm actually quite happy now. Are you going to try to find out why there are spurious usb-interrupts, or is the workaround good enough? Just curious :) Thanks for your as always hard work on linux acpi :)

In IOAPIC mode, USB spurious interrupts correlate with, but are not equal to, Ethernet interrupts.
No proof that there is an ACPI bug -- unless we get MPS to set up the IOAPIC and don't see the problem there.
We've run the system with and without the known VIA quirks but it didn't make any difference.
USB changes caused the spurious interrupts to be noticed. However, that doesn't necessarily implicate USB -- particularly since the spurious interrupts seem somehow related to Ethernet interrupts on a different interrupt line.
Seems like manual "noirqdebug" is all we've got right now. Maybe there are some VIA specific quirks in this chipset that we've yet to be told about...

Created attachment 3041 [details]
interrupts without acpi with apic.
Just FYI, I'll attach interrupts in IOAPIC mode with acpi=off. Here usb and eth
are on the same interrupt... USB doesn't work as it should. It works, but it is
dog slow. Don't know if this tells you something. Anyway, I'll accept that I
have to use noirqdebug, good enough for me :)
Part of my dmesg:
usb 1-1: new full speed USB device using address 2
scsi0 : SCSI emulation for USB Mass Storage devices
Vendor: USB MASS Model: STORAGE DEVICE Rev: 0.10
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 31680 512-byte hdwr sectors (16 MB)
sda: assuming Write Enabled
sda: assuming drive cache: write through
sda: sda1
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
USB Mass Storage device found at 2
Created attachment 3043 [details]
Disable PIRQ for suspended UHCI USB controllers
Stian, try applying this patch. It will disable the PIRQ signal on your
suspended UHCI USB controllers, thus (hopefully) preventing them from
generating any interrupt requests when only the USBSTS_HCH bit is set. If you
still see excess interrupts, that will be a pretty clear indication that they
aren't coming from the USB controller.
Note: This is meant only for testing! The PIRQ signals don't ever get
re-enabled, so make sure all your USB devices are plugged in before the UHCI
driver is loaded. Or if you can, for an even better test, unplug all your USB
devices and see if the error still occurs.
Alan, no difference even with this patch. I disconnected all the usb-devices before testing. But thanks for looking into this bug :) Btw. I used the patch without the patch Len provided earlier, I guess that was what I was supposed to do.

With Alan's patch by itself, if USB still gets shut down, or with Alan's patch + my USB hack, if USB interrupts continue, then that is proof that the USB hardware is not causing the interrupts. I guess this isn't a big surprise since we'd see the USB interrupt rate vary with ethernet activity, but it is a good double check.

Re: acpi=off testing
Please repeat the acpi=off test but include the "2.6 debug patch" above which will disable the VIA quirks. It isn't at all clear that those quirks are correct for IOAPIC mode because the PCI config space interrupt register can't even hold the value 19... (19 = 0x13; 0x13 & 0xF = 0x3)

In any case, the ACPI vs MPS IOAPIC interrupts show
ACPI:
 11:       1048          1   IO-APIC-level  uhci_hcd
 19:        883          1   IO-APIC-level  eth0
MPS:
 19:        997          1   IO-APIC-level  uhci_hcd, eth0
Assuming the MPS mode can be made to work, this suggests that UHCI and eth0 really do share an interrupt wire?

Double checking the ACPI-mode info for this board...
ACPI: PCI interrupt 0000:00:07.2[D] -> GSI 11 (level, low) -> IRQ 11
uhci_hcd 0000:00:07.2: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
uhci_hcd 0000:00:07.2: irq 11, io base 0000a400
uhci_hcd 0000:00:07.2: new USB bus registered, assigned bus number 1
DSDT:
Package (0x04) { 0x0007FFFF, 0x03, 0x00, 0x0B },
device 7 pin D is hard-coded to IRQ11 in APIC mode, no question about that.
in PIC mode it uses LNKD:
Package (0x04) { 0x0007FFFF, 0x03, \_SB.PCI0.LNKD, 0x00 },
It should not matter, but it also happens to be initialized by the BIOS to IRQ11, and we keep it there:
ACPI: PCI Interrupt Link [LNKD] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
So it is clear that the BIOS is telling ACPI to put UHCI on IRQ11. Unclear why MPS puts it on IRQ19, and if that is correct or incorrect.
I don't suppose you've got the Windows IOAPIC assignments for this board? The previous windows IRQ attachment seems to be for your other board since it has all the USB on IRQ 9 with ACPI.
Created attachment 3044 [details]
2.6.7 debug patch moving IRQ11 to IRQ19
Stian, Here's a debug patch to apply on top of vanilla 2.6.7.
If the DSDT says a device uses IRQ11, it overrides the DSDT and returns 19.
This should make ACPI interrupts come out exactly like MPS interrupts.
As in the acpi=off case, you may need the "2.6 debug patch" above to
disable the VIA quirks to get USB working on an IRQ > 15.
If this patch (with or without the 2.6 debug patch) works, then it suggests
that the DSDT is incorrect in putting uhci on IRQ11 instead of IRQ19.
However, since we also get USB interrupts on IRQ11 when eth0 is actually
pulling on IRQ19, it also means that this chip-set or mother board still
has some physical connection between INTIN11 and INTIN19. ie.
the PIC-mode IRQ router tying these two devices to IRQ11 still
seems to be enabled, even though the system is in APIC mode.
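The DSDT routing entries quoted in this report can be read mechanically. An ACPI _PRT entry is a four-element package {Address, Pin, Source, SourceIndex}: the high word of Address is the PCI device number, Pin 0..3 means INTA..INTD, and a Source of 0 means the pin is hard-wired to the global interrupt given by SourceIndex. The sketch below decodes the entries from this report and also reproduces the "19 = 0x13; 0x13 & 0xF = 0x3" arithmetic; the function names are hypothetical, and the 4-bit mask is taken from the comment above rather than from any code.

```python
def decode_prt_entry(address, pin, source, source_index):
    """Return (pci_device_number, pin_letter, routing_description)."""
    device = (address >> 16) & 0xFFFF
    pin_letter = "ABCD"[pin]
    if source == 0:
        routing = "GSI %d" % source_index   # hard-wired, APIC-mode style
    else:
        routing = "link %s" % source        # routed via a PCI Interrupt Link
    return device, pin_letter, routing


def low_nibble(irq):
    """What survives if only 4 bits of the IRQ number are kept."""
    return irq & 0xF


# Entries quoted in this report:
print(decode_prt_entry(0x0007FFFF, 0x03, 0, 0x0B))                    # device 7, pin D
print(decode_prt_entry(0x0011FFFF, 0x03, 0, 0x0C))                    # device 0x11, pin D
print(decode_prt_entry(0x0007FFFF, 0x03, r"\_SB.PCI0.LNKD", 0x00))    # PIC-mode entry
print(hex(19), low_nibble(19))
```

Decoded this way, the APIC-mode entries pin device 7 to GSI 11 and device 0x11 to GSI 12 with no link device involved, which is why a link-oriented option such as acpi_irq_isa cannot move them.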
Created attachment 3045 [details]
2.6.7 debug patch disabling all PCI Interrupt Links
Stian, Tom
Please try this 2.6.7 debug patch (all by itself) in ACPI/IOAPIC mode.
It will disable all the PCI Interrupt Links on your system.
They're unused anyway in IOAPIC mode on these systems,
so it should do no harm. Please note differences in before/after
dmesg, /proc/interrupts, and USB or ethernet functionality.
If this works, then perhaps Linux needs to take the extra step
to disable unused PCI Interrupt Link Devices. This would be
surprising because:
1. BIOS initializes them to enabled, Linux doesn't enable them
2. _STA.enabled seems to mean nothing on many systems.
But if the IRQ11->IRQ19 patch works, it suggests that there
is some physical connection between IRQ19 and IRQ11 wires,
and perhaps this patch will sever that connection.
Created attachment 3053 [details]
dmesg with debug patch disabling all PCI Interrupt Links
Len, your last patch fixed everything. Now usb won't get interrupts from eth0
activity, while both usb and eth0 works perfectly. No change in
/proc/interrupts. I have attached dmesg for your reading pleasure :)
Will try the other patch in acpi=off mode in a moment, but this oneliner gave
me hope that this issue eventually will get fixed :) Btw. I guess no one is
shocked that VIA bioses are buggy...
Created attachment 3054 [details]
IRQ's in Windows XP
Hmm. Ok, even with the debug patch, usb does not work reliably in ioapic mode without acpi. It's still dog slow. With the "2.6.7 debug patch moving IRQ11 to IRQ19", I get the same behaviour with acpi as I did in ioapic mode -> usb on irq 19 and dog slow. So the only thing working so far is "2.6.7 debug patch disabling all PCI Interrupt Links". Hope this helps :)

Stian,
Thanks for confirming that winxp also puts USB on IRQ11. This is consistent with the analysis above, and is further confirmation that MPS is simply broken on this machine when it puts USB on IRQ19.
Thanks for confirming that hacking ACPI to put USB onto IRQ19 like MPS didn't work. I expect it is dog slow because the USB interrupts are being ignored on IRQ11, and the USB driver only gets interrupts when they are caused by ethernet activity. MPS tables are apparently incorrect on this system, but we're not going to worry about that...
Thanks for confirming that disabling the PIC-mode PCI Interrupt Links fixed all the problems.
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
Stian: disabling Link
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 11 12 14 15) *0
Stian: disabling Link
ACPI: PCI Interrupt Link [LNKB] (IRQs 1 3 4 5 6 7 10 11 12 14 15) *0
Stian: disabling Link
ACPI: PCI Interrupt Link [LNKC] (IRQs 1 3 4 5 6 7 10 11 12 14 15) *0, disabled.
Stian: disabling Link
ACPI: PCI Interrupt Link [LNKD] (IRQs 1 3 4 5 6 7 10 11 12 14 15) *0
_DIS on the links caused _CRS to not return a valid IRQ. Curious that it didn't cause _STA to return !enabled, like on LNKC which was disabled by the BIOS.
We're going to need to experiment with this a bit to come up with a fix that works everywhere.

Created attachment 3056 [details]
2.6.7 debug patch ignoring all PCI Interrupt Links
Stian, Tom,
Please try this patch all by itself in ACPI/IOAPIC mode on vanilla 2.6.7.
It will ignore all PCI Interrupt Link Devices and invoke no methods on them.
The goal is to discover if this is sufficient, or if we need to go
and explicitly _DIS all the links that we're not using, like in the previous
patch.
That last patch did not fix it, I was back to increasing interrupts on irq 11 when there was network activity. Is there a problem with the first patch? Does it break any systems?
I have a couple of questions (just to enlighten myself, not to question your conclusions :) You say that the MPS is incorrect on this system. Is that because it says usb is on irq 19? Isn't that what's causing the interrupts on irq 11 to increase when there is network activity on irq 19? Since several different motherboards have the same problem, do they all have broken MPS? And last, what OS should I try to check whether USB won't work because of a broken MPS table? Windows NT doesn't support ACPI, but it doesn't support usb, either. Windows 98? Windows 95?
Hope you understand my questions, I'm not very good at expressing my thoughts :(

Hmm. On my other system this patch did not work. I don't want to confuse you, so I will not attach any logs or stuff unless you want me to. Anyway, if I get one system fixed, I'll be very happy :) Note to self: Never buy a VIA system again.

Hmm. The other system:
Stian: disabling Link
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0
(...)
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
00:00:11[B] -> 2-10 -> IRQ 10 level low
(...)
PCI: Via IRQ fixup for 0000:00:11.2, from 9 to 10
PCI: Via IRQ fixup for 0000:00:11.3, from 9 to 10
PCI: Via IRQ fixup for 0000:00:11.4, from 9 to 10
This is _with_ the patch. So it seems this system enables PCI Interrupt links even with the patch. Oh well, enough confusion :)

Stian,
re: the "other" system (does it have a name)? its dmesg are in comment #11
yes, dmesg shows that it uses its PCI Interrupt Links even in APIC mode. So the patch to disable all links, and the patch to ignore all links aren't really set up for that situation. Let's stick to the primary debug system and when we get that working we can test on the other system.

Stian, Tom,
Any news with the patch in comment #79?
Len, you seem to have ignored my comment #80 where I stated that the patch in comment #79 did _not_ fix the problem...

Created attachment 3078 [details]
dmesg from 2.6.7-rc2+comment #79, after IRQ disable
I can also confirm that the patch in comment #79 did not fix the problem. eth0 on IRQ 19, uhci_hcd on IRQ 12, and the latter got disabled after 600000 interrupts on IRQ 12.

Thanks for the test result Tom.
Stian, oops, looks like i read comment #81 and #82, and skipped right over #80. man, this bug report is getting long!
yes, the patch in comment #74 will fix systems such as yours where the links are unused in IOAPIC mode, but will break pretty much the rest of the planet -- including your 2nd machine.
The debug patch in comment #79 to ignore all the links was to determine if a simpler patch would work, and you both proved it will not.
It is now clear that we need to _DIS the unused links, but the hard part is discovering that they're un-used. We may instead need to capture the _CRS for all links, _DIS all links, and then _SRS the links in use to re-enable them.
comment #72 explains why it is certain that USB is on IRQ11 in IOAPIC mode. I believe that the fact that MPS puts USB on IRQ19 is a bug -- this is based on the assumption that the system is designed so that MPS and ACPI modes should give the same routing; plus the observation that it doesn't work in MPS mode up on IRQ19. (if you unplug your ethernet in that configuration, I expect your USB will get worse still -- depending on how the USB HA handles life w/o any interrupts.)
I think what is happening here is that the chipset ties eth0 and USB together on IRQ11 in PIC mode with a PIRQ router (LNKB). In IOAPIC mode, USB is on IRQ11 and eth0 is on IRQ19. However, unless the PIRQ router is disabled, it is still picking up interrupt signals from eth0 and pulling on IRQ11.
I had not expected we'd need to explicitly disable unused PCI Interrupt Links, but these systems prove that is necessary.
BTW. Questions are good for all, keep 'em coming.

Created attachment 3082 [details]
dmesg from 2.6.7-rc2+comment #74
FWIW, the patch in comment #74 also decoupled eth0 interrupts from uhci_hcd interrupts on my system.

Created attachment 3142 [details]
proposed final patch
Please test this proposed final patch.
It should apply w/ small offset to both 2.4 and 2.6.
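The report does not say how the proposed patch works internally. The ordering floated earlier in the thread (capture _CRS for all links, _DIS all links, then _SRS only the links in use) can at least be illustrated with a toy model. Everything below is hypothetical: the `Link` class stands in for ACPI method calls on a PCI Interrupt Link Device, and the IRQ values are made up for the example.

```python
class Link:
    """Toy stand-in for a PCI Interrupt Link Device (not a kernel API)."""

    def __init__(self, name, irq):
        self.name = name
        self.irq = irq          # 0 models "no valid IRQ / disabled"

    def crs(self):              # models _CRS: current setting
        return self.irq

    def dis(self):              # models _DIS: disable the link
        self.irq = 0

    def srs(self, irq):         # models _SRS: program (and enable) the link
        self.irq = irq


def disable_unused_links(links, in_use):
    """Capture, disable everything, then restore only the links in use."""
    saved = {link.name: link.crs() for link in links}   # 1. capture _CRS
    for link in links:                                   # 2. _DIS all links
        link.dis()
    for link in links:                                   # 3. _SRS links in use
        if link.name in in_use:
            link.srs(saved[link.name])
    return saved


links = [Link("LNKA", 10), Link("LNKB", 11), Link("LNKC", 0), Link("LNKD", 11)]
disable_unused_links(links, in_use={"LNKB"})
print([(link.name, link.irq) for link in links])
```

The appeal of this ordering is that it never has to decide up front which links are unused: everything is switched off, and only links that some device actually routes through are switched back on.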
Yep, this fixes the first system :D Thanks. Should I file a separate bug for the other one, which exhibits exactly the same behaviour, except that this patch doesn't fix it?

Thanks for verifying that system #1 is fixed.
Re: system #2. yes, if this fix plus the fix in bug 2574 (contained in the -mm tree) doesn't address that system, then please open a new bug.

Even with both patches, I still get the same behaviour. Made a new bug, bug#2874, about the other system. Thanks for your kind help so far, and sorry for making new bugs all the time...

The latest patch (comment #89) works for me. Thanks very much!

I'm seeing this too, with Toshiba Satellite 2800-500, kernel-2.6.6-1.435.2.3. See
http://groups.google.com/groups?q=IRQ+issues,+(nobody+cared,+disabled),+not+USB&hl=en&lr=&ie=UTF-8&selm=2gIc7-6pd-19%40gated-at.bofh.it&rnum=1
for a description. The problem remains the same with 2.6.7-mm7. Neither full acpi nor "noacpi" works. With "noacpi acpi=off", it works. I'll attach the full acpi dmesg.
Created attachment 3341 [details]
2.6.7-mm7 dmesg with the acpi irq oops
Uhh, I seem to have confused the (non-existent) "noacpi" argument with the (correct) "pci=noacpi". Using "pci=noacpi acpi=noirq" it seems to work.

ACPI: Subsystem revision 20040615
ACPI: IRQ9 SCI: Level Trigger.
ACPI-0169: *** Error: No object was returned from [\_SB_.LNKA._STA] (Node cffee5e0), AE_NOT_EXIST
Ville, your system has a completely different problem, please open a new bug.

I opened a new one (bug #3096) - I hope you have time to look at it at some point...

this fix shipped in 2.4.27 and 2.6.8.1
closing.