Bug 1933 - PDC202XX DMA lost interrupt under high I/O load
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: IDE
Hardware: i386 Linux
Importance: P2 high
Assignee: Bartlomiej Zolnierkiewicz
Reported: 2004-01-22 04:28 UTC by Christian Balzer
Modified: 2006-01-31 03:52 UTC

Kernel Version: 2.6.1

Description Christian Balzer 2004-01-22 04:28:56 UTC
Distribution: Debian (sarge) 
Hardware Environment: Dual Opteron 240, Rioworks HDAMA mainboard, 2GB mem, 
2x160GB ATA to onboard controller, 4x160GB SATA to PDC20318, 2x80GB ATA 
to PDC20269 (RAID-0, md). 
lspci says: 
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07) 
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05) 
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03) 
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02) 
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05) 
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) 
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01) 
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) 
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01) 
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge 
01:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b) 
01:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b) 
01:04.0 Ethernet controller: 3Com Corporation 3c905 100BaseTX [Boomerang] 
01:05.0 Unknown mass storage controller: Promise Technology, Inc. 20269 (rev 02) 
01:06.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) 
02:02.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02) 
02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702 Gigabit Ethernet (rev 02) 
02:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702 Gigabit Ethernet (rev 02) 
02:05.0 RAID bus controller: Promise Technology, Inc.: Unknown device 3319 (rev 02) 
 
Software Environment: This is a news reader box, standard Debian packages  
plus INN2. 2.6.1 stock kernel, 32bit mode. 
 
Problem Description: When running expireover the following occurs (in nearly 
all instances, though not necessarily at the same point in this 6-hour 
task): 
--- 
Jan 18 06:47:08 nnrp kernel: PDC202XX: Secondary channel reset. 
Jan 18 06:47:08 nnrp kernel: hdg: DMA interrupt recovery 
Jan 18 06:47:08 nnrp kernel: hdg: lost interrupt 
Jan 18 06:47:28 nnrp kernel: hde: dma_timer_expiry: dma status == 0x24 
Jan 18 06:47:28 nnrp kernel: hdg: dma_timer_expiry: dma status == 0x24 
Jan 18 06:47:38 nnrp kernel: PDC202XX: Primary channel reset. 
Jan 18 06:47:38 nnrp kernel: hde: DMA interrupt recovery 
Jan 18 06:47:38 nnrp kernel: hde: lost interrupt 
[rinse, repeat, ad infinitum] 
--- 
This is the aforementioned Promise 20269; the first halves of the 2 attached 
80GB drives form a RAID-0 for the overview data. After this, the interrupt 
count for that card no longer increases, a reboot takes hours to take 
effect once issued, etc. News overview is a fairly I/O-intensive task even 
when expireover is not running; see the interrupt counts before an expire 
run (int 17 is the card in question): 
--- 
           CPU0       CPU1 
  0:    6001788         17    IO-APIC-edge  timer 
  1:          8          1    IO-APIC-edge  i8042 
  2:          0          0          XT-PIC  cascade 
  8:          0          1    IO-APIC-edge  rtc 
 14:     108464          1    IO-APIC-edge  ide0 
 15:     110926          1    IO-APIC-edge  ide1 
 17:     207723          9   IO-APIC-level  ide2, ide3 
 25:          0          0   IO-APIC-level  libata 
 26:     184958          1   IO-APIC-level  libata 
 27:    5038624    5401352   IO-APIC-level  eth0, eth1 
NMI:          0          0 
LOC:    6001337    6001556 
ERR:          0 
MIS:          8 
--- 
If it survives an expire run, its accumulated interrupt count exceeds 
that of all the other IDE subsystems combined.  
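
The stalled counter described above (the count for int 17 no longer increasing) could be spotted by diffing two snapshots of /proc/interrupts. The following is only a minimal sketch: the function names `parse_interrupts` and `stalled_irqs` are made up for illustration, and the sample text is a shortened copy of the table in this report.

```python
# Hypothetical helpers, not part of any kernel tool.
SAMPLE = """\
           CPU0       CPU1
  0:    6001788         17    IO-APIC-edge  timer
 14:     108464          1    IO-APIC-edge  ide0
 17:     207723          9   IO-APIC-level  ide2, ide3
ERR:          0
"""


def parse_interrupts(text):
    """Return {irq_label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Rows such as ERR carry fewer counters than there are CPUs.
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def stalled_irqs(before, after):
    """Labels that were active in `before` but did not advance in `after`."""
    return sorted(
        irq for irq, n in before.items()
        if n > 0 and after.get(irq, 0) <= n
    )
```

On a live system the two snapshots would come from reading /proc/interrupts twice, some seconds apart; an entry for 17 in the result while the box is under load would match the hang described here.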
The kernel was recompiled several times to exclude potential culprits like  
APCI, the card was moved from a 64bit/66MHz slot (where it shared an  
interrupt with the inactive/unused onboard SATA controller) to a plain old  
standard PCI slot, etc. Nothing seems to make a difference. The current 
kernel was compiled non-preemptible, but I don't expect it to survive the 
night either. This machine was tested for about 2 weeks in parallel with 
the old newsreader box and this problem did not occur. Murphy waited 
for it to go into production, of course. Possible causes for its 
survival back then would be a smaller overview database (it will keep  
growing for some 2 more months) and running 2.6.0 at that time. Considering 
the security implications, going back to 2.6.0 is not really an option, though. 
 
Steps to reproduce: run expireover on this box. ;) 
Alas, similar loads (for example running makehistory to rebuild just 
the overview database) have failed to cause this problem. So far.
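
The `dma status == 0x24` value in the log above can be broken into individual bits. This sketch assumes the logged byte is the standard bus-master IDE status register (SFF-8038i layout); whether the PDC202XX driver reports exactly that register is an assumption, and the bit descriptions and the name `decode_bm_status` are illustrative only.

```python
# Assumed layout: standard bus-master IDE status register (SFF-8038i).
BM_STATUS_BITS = {
    0x01: "DMA active",
    0x02: "DMA error",
    0x04: "interrupt raised",
    0x20: "drive 0 DMA capable",
    0x40: "drive 1 DMA capable",
    0x80: "simplex only",
}


def decode_bm_status(status):
    """Return the descriptions of all bits set in a bus-master status byte."""
    return [name for bit, name in BM_STATUS_BITS.items() if status & bit]


print(decode_bm_status(0x24))
```

Under that assumption, 0x24 has the interrupt bit set while the active and error bits are clear, which would be consistent with the controller having signalled completion without the interrupt being serviced.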
Comment 1 Adrian Bunk 2005-07-04 18:58:40 UTC
Is this problem still present in kernel 2.6.12.2?
Comment 2 Adrian Bunk 2006-01-31 03:52:46 UTC
I'm assuming this issue is already fixed.

Please reopen this bug if it's still present in recent 2.6 kernels.
