Distribution: Debian (sarge)

Hardware Environment: Dual Opteron 240, Rioworks HDAMA mainboard, 2GB memory, 2x160GB ATA on the onboard controller, 4x160GB SATA on a PDC20318, 2x80GB ATA on a PDC20269 (RAID-0, md). lspci says:
---
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
01:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
01:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
01:04.0 Ethernet controller: 3Com Corporation 3c905 100BaseTX [Boomerang]
01:05.0 Unknown mass storage controller: Promise Technology, Inc. 20269 (rev 02)
01:06.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
02:02.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702 Gigabit Ethernet (rev 02)
02:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702 Gigabit Ethernet (rev 02)
02:05.0 RAID bus controller: Promise Technology, Inc.: Unknown device 3319 (rev 02)
---

Software Environment: This is a news reader box, standard Debian packages plus INN2. Stock 2.6.1 kernel, 32-bit mode.

Problem Description: When running expireover, the following occurs (in nearly every run, though not necessarily at the same point in this 6-hour task):
---
Jan 18 06:47:08 nnrp kernel: PDC202XX: Secondary channel reset.
Jan 18 06:47:08 nnrp kernel: hdg: DMA interrupt recovery
Jan 18 06:47:08 nnrp kernel: hdg: lost interrupt
Jan 18 06:47:28 nnrp kernel: hde: dma_timer_expiry: dma status == 0x24
Jan 18 06:47:28 nnrp kernel: hdg: dma_timer_expiry: dma status == 0x24
Jan 18 06:47:38 nnrp kernel: PDC202XX: Primary channel reset.
Jan 18 06:47:38 nnrp kernel: hde: DMA interrupt recovery
Jan 18 06:47:38 nnrp kernel: hde: lost interrupt
[rinse, repeat, ad infinitum]
---
This is the aforementioned Promise 20269; the first halves of the two attached 80GB drives form a RAID-0 for the overview data. After this, the interrupt count for that card no longer increases, a reboot takes hours to take effect once issued, etc.
News overview is a fairly I/O-intensive task even when expireover is not running. Note the interrupt counts before an expire run (IRQ 17 is the card in question):
---
           CPU0       CPU1
  0:    6001788         17    IO-APIC-edge  timer
  1:          8          1    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 14:     108464          1    IO-APIC-edge  ide0
 15:     110926          1    IO-APIC-edge  ide1
 17:     207723          9   IO-APIC-level  ide2, ide3
 25:          0          0   IO-APIC-level  libata
 26:     184958          1   IO-APIC-level  libata
 27:    5038624    5401352   IO-APIC-level  eth0, eth1
NMI:          0          0
LOC:    6001337    6001556
ERR:          0
MIS:          8
---
If it survives an expire run, its accumulated interrupt count exceeds that of all the other IDE subsystems combined.

The kernel was recompiled several times to rule out potential culprits like ACPI, and the card was moved from a 64-bit/66MHz slot (where it shared an interrupt with the inactive/unused onboard SATA controller) to a plain old standard PCI slot, etc. Nothing seems to make a difference. The current kernel was compiled non-preemptible, but I don't expect it to survive the night either.

This machine was tested for about 2 weeks in parallel with the old newsreader box, and this problem did not occur. Murphy, of course, waited until it went into production. Possible reasons it survived back then would be a smaller overview database (it will keep growing for another 2 months or so) and running 2.6.0 at that time. Given the security implications, going back to 2.6.0 is not really an option, though.

Steps to reproduce: run expireover on this box. ;) Alas, similar loads (for example running makehistory to rebuild just the overview database) have failed to trigger this problem. So far.
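To quantify the "exceeds all other IDE subsystems combined" comparison, or to confirm that the card's counter has stopped moving, one can sample /proc/interrupts and compare per-IRQ totals. The Python sketch below is a hypothetical helper; the names parse_interrupts, ide_comparison and counter_stalled are illustrative, not existing tools. It assumes the 2.6-era column layout shown above and, as an assumption, treats any IRQ whose device list mentions "ide" or "libata" as an IDE subsystem.

```python
# Hypothetical helper for /proc/interrupts snapshots (not part of any kernel
# or INN tooling). Assumes the 2.6-era layout:
#   "<irq>: <one count per CPU> <controller type> <device list>"

def parse_interrupts(text):
    """Return {irq: (total_count_over_all_cpus, device_list)} for numeric IRQs."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip the NMI, LOC, ERR and MIS summary rows
        fields = rest.split()
        counts = [int(f) for f in fields[:ncpus]]
        # fields[ncpus] is the controller type; the rest name the devices
        devices = " ".join(fields[ncpus + 1:])
        table[int(label)] = (sum(counts), devices)
    return table


def ide_comparison(snapshot, irq):
    """Return (total for irq, combined total of all other ide/libata IRQs)."""
    table = parse_interrupts(snapshot)
    target = table[irq][0]
    others = sum(total for n, (total, devs) in table.items()
                 if n != irq and ("ide" in devs or "libata" in devs))
    return target, others


def counter_stalled(before, after, irq):
    """True if the IRQ's total did not grow between two snapshots."""
    return parse_interrupts(after)[irq][0] <= parse_interrupts(before)[irq][0]


# The pre-expire snapshot from this report.
SNAPSHOT = """
           CPU0       CPU1
  0:    6001788         17    IO-APIC-edge  timer
  1:          8          1    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 14:     108464          1    IO-APIC-edge  ide0
 15:     110926          1    IO-APIC-edge  ide1
 17:     207723          9   IO-APIC-level  ide2, ide3
 25:          0          0   IO-APIC-level  libata
 26:     184958          1   IO-APIC-level  libata
 27:    5038624    5401352   IO-APIC-level  eth0, eth1
NMI:          0          0
LOC:    6001337    6001556
ERR:          0
MIS:          8
"""

if __name__ == "__main__":
    print(ide_comparison(SNAPSHOT, 17))
```

On the snapshot above, IRQ 17 totals 207732 against 404351 for ide0, ide1 and the two libata lines together. On a live box one would pass open("/proc/interrupts").read() taken once before and once a few minutes after the "lost interrupt" messages to counter_stalled to flag the stuck counter.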
Is this problem still present in kernel 2.6.12.2?
I'm assuming this issue is already fixed. Please reopen this bug if it's still present in recent 2.6 kernels.