This has been going on for a while (since mid-2005, at least), and it hasn't magically fixed itself, so I'm finally filing a bug.

Distribution: Fedora Core 5

Hardware Environment: MSI Master2 FAR based dual-Opteron system.

00:00.0 Host bridge: VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge (rev 01)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:05.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
00:05.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
00:0b.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705 Gigabit Ethernet (rev 03)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R200 QH [Radeon 8500] (rev 80)

Problem Description: Randomly (but fairly often), I stop getting interrupts from my SATA controller or my UHCI controllers or both -- the counters in /proc/interrupts stop increasing and I get libata timeout errors or my USB HID mouse movement gets erratic (because the driver is polling instead of waiting for interrupts). The devices still work -- IO to the disk will (eventually) complete and my mouse or gamepad or whatever still works (but not very well). If I remove and then reload the driver modules (sata_via or uhci_hcd) or unbind/bind the devices in sysfs, the problem immediately goes away.

Steps to reproduce: It's random, so there aren't any. I think GL usage may be a factor, because if I disable all my GL screensavers and don't otherwise use GL, the incidence goes down (but doesn't stop altogether). Likewise, network activity may also be a factor, because this almost always occurs when I'm downloading something to the SATA disk with BitTorrent (although that may be just when I'm more likely to notice it). Of course, it'll also happen when neither GL nor network activity is going on, so I have no idea what is actually going on.
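For reference, the unbind/bind workaround mentioned in the problem description could be scripted roughly as follows. This is only a sketch under assumptions: the rebind helper and its sysfs_root parameter are made up for illustration, the PCI address in the usage note assumes domain 0000 for the SATA controller listed at 00:0f.0, and writing to the real sysfs files requires root.

```python
import os


def rebind(driver, device, sysfs_root="/sys"):
    """Detach a PCI device from its driver, then attach it again.

    driver is the driver directory name under /sys/bus/pci/drivers
    (for example "sata_via"); device is the full PCI address.
    sysfs_root is a hypothetical parameter that exists only so the
    function can be exercised against a scratch directory.
    """
    driver_dir = os.path.join(sysfs_root, "bus", "pci", "drivers", driver)
    # Order matters: the device has to be unbound before it can be bound.
    for action in ("unbind", "bind"):
        with open(os.path.join(driver_dir, action), "w") as f:
            f.write(device)
```

Assumed usage for the SATA controller above would be rebind("sata_via", "0000:00:0f.0"), run as root. Like reloading the module, this only clears the symptom and says nothing about the cause.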
Please make sure you are using the latest BIOS. Please try acpi=off to check whether this is an ACPI-related issue.
I have the latest non-beta BIOS for this system. I can't reproduce this with acpi=off. (I can't reproduce it with noapic, either.) However, I'm not able to reproduce it reliably even with ACPI on, and when I boot the system with acpi=off my UHCI controllers and my graphics card don't get interrupts (the error is something similar to "kernel: PCI: No IRQ known for interrupt pin A of device 0000:01:00.0. Probably buggy MP table." for each of the devices), so the difference in configuration may be a contributing factor.
> This has been going on for a while (since mid-2005, at least), and it hasn't
> magically fixed itself, so I'm finally filing a bug

Does it work before?
If you remove USB support in kernel, can you see interrupt lost on SATA?
> Does it work before?

As far as I know, yes, but I didn't own any SATA disks and the SATA controller was disabled in the BIOS. I also don't remember if there was a time period when I was using SATA and this wasn't happening.

> If you remove USB support in kernel, can you see interrupt lost on SATA?

I tried temporarily blacklisting the uhci-hcd and ehci-hcd modules (which prevents them from ever being loaded), and wasn't able to reproduce the error, but, once again, this problem can't be reproduced on demand.

I also wrote a little SystemTap script which directly calls unmask_IO_APIC_irq with the SATA controller's IRQ number. If I run this script when I'm not getting any interrupts from the SATA controller, I start getting interrupts again. This suggests to me that the interrupt may be disabled at the IOAPIC level without the kernel's knowledge.

I also wrote another SystemTap script which records stack traces every time unmask_IO_APIC_irq or mask_IO_APIC_irq is called, in order to determine if the IOAPIC is actually disabling the IRQ without the kernel's knowledge. However, whenever I leave this script running, I've never been able to reproduce the problem. This suggests to me that there may be IOAPIC access timing issues involved.
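One way to tell when the interrupts have stopped, without instrumenting the IOAPIC functions, is to compare two snapshots of /proc/interrupts. The following is a rough sketch, not one of the SystemTap scripts described above: the function names are made up for illustration, and the handler names to watch (for example "libata" or "uhci_hcd") are assumptions about what the descriptions in /proc/interrupts contain on a given kernel.

```python
def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counters; rows such as
        # ERR have fewer counters than there are CPUs.
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def stalled_irqs(before, after, names):
    """Return IRQ labels whose description mentions one of `names`
    and whose total count did not change between the two snapshots."""
    stalled = []
    for irq, (count, description) in after.items():
        if not any(name in description for name in names):
            continue
        old = before.get(irq)
        if old is not None and old[0] == count:
            stalled.append(irq)
    return stalled
```

To use it, read /proc/interrupts twice a few seconds apart while the disk or mouse is busy, parse both snapshots, and pass them to stalled_irqs. An idle device also shows an unchanged counter, so the result only means something while the device should be generating interrupts.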
> VT8237 PCI bridge

Please try this patch: http://lkml.org/lkml/diff/2006/9/7/235/1
and see this bug: http://bugme.osdl.org/show_bug.cgi?id=6419
This problem predates the following commit:

commit 75cf7456dd87335f574dcd53c4ae616a2ad71a11
Author: Chris Wedgwood <cw@f00f.org>
Date: Tue Apr 18 23:57:09 2006 -0700
Subject: [PATCH] PCI quirk: VIA IRQ fixup should only run for VIA southbridges

Prior to this commit, quirk_via_irq was running on my machine but not doing anything (no "PCI: VIA IRQ fixup for %s, from %d to %d\n" message was printed at boot.) Were I to apply the suggested patch, it still wouldn't do anything (the conditions for its activation have gotten more restrictive, not less), so I'm not going to waste my time configuring and building a kernel to test it.
(In reply to comment #6)
> Prior to this commit, quirk_via_irq was running on my machine but not doing
> anything (no "PCI: VIA IRQ fixup for %s, from %d to %d\n" message was printed
> at boot.)

Before this patch (http://lkml.org/lkml/2006/4/19/16), any_VIA_PCI was quirked, so yours was too. Yes, the "PCI: VIA IRQ fixup for %s, from %d to %d\n" message was printed at boot.

You can just apply the patch, run make bzImage, install it (that is, cp arch/boot/bzImage over /boot/vmlinuz-2.6.17), and reboot; you don't need to recompile everything.

Last thing: can you attach your dmesg and the output of cat /proc/interrupts?
Created attachment 9039
dmesg from boot

My dmesg, as requested.

Created attachment 9040
/proc/interrupts

/proc/interrupts, as requested.
Looks like I spoke too soon -- the quirk is getting applied to my system. I remember doing something to change whether or not that quirk gets run (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=190309 ), but I don't remember if what I did is similar in result to this new patch. Looks like I'll actually have to configure and build a kernel. I haven't had to do that for months. *sigh*
OK, that patch didn't help at all, which is consistent with my earlier experiments with that VIA PCI quirk.
OK, please let me see your dmesg with the patch applied.
Created attachment 9063
dmesg diff, post patch
OK, I need another report. Edit the X configuration (vi /etc/X11/xorg.conf) and comment out this line:

# Load "dri"

Then see whether the system boots into X without problems. Let's see if it is a problem with via_agp. Please use the post-patch kernel for this.
This system doesn't use VIA AGP.
I saw this:

177: 4143175 0 IO-APIC-level EMU10K1, eth0, radeon@pci:0000:01:00.0

So what hardware does this system have? Could you attach the output of "lspci -vvv" and "lsmod | grep via", please?
Created attachment 9080
lspci -vvv output
[nicholas@entropy ~]$ lsmod | grep via
i2c_viapro 43353 0
i2c_core 60993 3 w83627hf,i2c_isa,i2c_viapro
sata_via 42821 1
libata 113113 1 sata_via
Can you reproduce this issue with 2.6.21-rc5 or later?
Can you reproduce this issue with nmi_watchdog=0?
Please reopen this bug if:
- it is still present with kernel 2.6.21 and
- you can provide the requested information.