Most recent kernel where this bug did not occur: none
Distribution: Fedora Core 5
Hardware Environment: Intel 440BX chipset mobo + PIII + PDC20267
Software Environment:
Problem Description: With the libata pdc202xx_old driver, as soon as the boot process starts accessing the file systems on the 20267, many ATA bus errors occur. It's not until EH reduces speed to PIO4 that the errors stop. With the IDE PDC202XX_OLD driver there aren't any problems.
Steps to reproduce: Just boot.
Created attachment 13478 [details] dmesg from 2.6.24-rc2 booting with IDE PDC202XX_OLD Added boot log from IDE PDC202XX_OLD (working).
Created attachment 13479 [details] dmesg from 2.6.24-rc2 booting with libata pata_pdc202xx_old Added boot log from using libata pata_pdc202xx_old (broken).
I have this problem with an AMD 768 SMP system, and have joined the OpenSuSE Bugzilla report https://bugzilla.novell.com/show_bug.cgi?id=335505 Workaround: the brokenmodules=pata_pdc202xx_old boot parameter. That system is currently available for testing.
Does SuSE have CONFIG_PDC202XX_BURST set?
Wow!!! I guess I gotta download Fedora now... Nice surprise Alan, thank you :) Anyway ...
rob@oak:~> zcat /proc/config.gz | grep BURST
CONFIG_PDC202XX_BURST=y
# CONFIG_ATM_ENI_TUNE_BURST is not set
I can download the src RPM and build a kernel with that turned off, or an -ac kernel from somewhere if you prefer.
I think the problem is the reverse:
- The old IDE driver supports burst mode and it's enabled in vendor kernels
- The libata one does not, and this seems to cause timeouts for some users
I'll push some patches upstream to enable burst mode in the new driver and see what happens with that change.
If you'd like a test done, that dual Athlon box is idle, as it's had a number of clean OS installs that I'm happy to repeat if things go belly up. Without the Promise controller, I've had to swap hardware around.
Possible patch attached
Created attachment 13714 [details] Patch to enable burst mode
Looks pretty good so far! I now have 2.6.24-rc3-mm1-pdcburst, bootable both with burst-mode libata and with brokenmodules=pata_pdc202xx_old, so I can compare the libata SCSI IDE module chain against the old IDE drivers. Both boots of 2.6.24-rc3-mm1-pdcburst show an error logged:
Nov 24 15:31:40 oak kernel: end_request: I/O error, dev sda, sector 14989671
Nov 24 15:31:40 oak kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
This occurs in bursts on booting; I'll append the logfile. bonnie++ is running and I'm getting peaks of 21,500+ Blk_wrtn as shown by iostat(8). So I could benchmark (bonnie++ and dbench are installed) the 3 kernels: 2.6.22.12-0.1 old IDE, 2.6.24-rc3-mm1-pdcburst old IDE, and 2.6.24-rc3-mm1-pdcburst pata_pdc202xx (AC fix). Maybe some other test would be more useful to you? I shall try to use the machine heavily next week and see if I can break something.
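For anyone wanting to count how often these errors occur per device across a boot log, here is a minimal sketch in Python. It assumes the log uses exactly the "end_request: I/O error, dev ..., sector ..." wording quoted above; the function names are illustrative only and not part of any kernel tooling.

```python
import re
from collections import Counter

# The two log lines quoted in the comment above.
SAMPLE_LOG = """\
Nov 24 15:31:40 oak kernel: end_request: I/O error, dev sda, sector 14989671
Nov 24 15:31:40 oak kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
"""

# Matches the block-layer message format seen in the quoted log.
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def find_io_errors(log_text):
    """Return a list of (device, sector) tuples, one per I/O error line."""
    return [(m.group(1), int(m.group(2))) for m in IO_ERROR_RE.finditer(log_text)]


def errors_per_device(log_text):
    """Count I/O error lines per block device."""
    return Counter(dev for dev, _sector in find_io_errors(log_text))


if __name__ == "__main__":
    for dev, count in errors_per_device(SAMPLE_LOG).items():
        print(dev, count)
```

Feeding it the full /var/log/messages text from each kernel would give a rough per-device error count to compare between the old IDE and libata boots.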
Created attachment 13733 [details] Logs documenting boot 2.6.24-rc3-mm1 + pdcburst patch This is the first boot using brokenmodules=pata_pdc202xx_old, there's a new error message that crops up, and booting seems to "stutter" at this point. This may be just due to extra debug info, but it needs to be eliminated before release or end users will complain. Removing the "brokenmodules" and loading pata_pdc202xx_old via initrd, results in very similar logs (not submitted as duplicate). I suspect this isn't of much interest, apart from documenting tested hardware combo. Thanks for the efforts, and the box remains available for tests.
The burst of messages on booting is coming, as far as I can tell, from a different SCSI midlayer bug in -rc3-mm (it's an -rc so still has other bugs to knock out). I don't think it's related to the pdc202xx stuff but obviously it needs fixing anyway.
Created attachment 13742 [details] Boot Log with PDC burst There are "soft resetting link" errors using pata_pdc202xx_old, similar to those logged in the OpenSuSE Bugzilla. This occurs when swap is activated, just as before, and the ATA mode is reduced, though IIRC it went from UDMA to PIO mode with the SuSE 2.6.22.12.0-1 kernel. AFAIK both disks and cable are good; I have re-checked (smartctl -a) the errors on 'sdb / hda' and they remain the same at '39'. What I'll do is go back to "brokenmodules=pata_pdc202xx_old" but stay with 2.6.24-rc3-mm1-pdcburst and see if I can provoke this error using the old ATA driver, or whether it continues at ATA100 speed. I guess I'd better check for "silent" data corruption; maybe the old ATA driver doesn't do as much checking?
Created attachment 13744 [details] Boot Log with 2nd disk on Mobo controller The error messages on swap activation go away. The disk is seen as sdb again and hdparm reports ATA mode udma5: same disk, same cable, just attached to the 2nd IDE channel, replacing the DVD/CDR drive. It seems the problem remains with the controller; I have done a lot to try to eliminate the possibility of hardware issues being the cause. A slight anomaly is "hdparm -tT /dev/sd<X>": the 1st drive, of the exact same make, claims 30MB/s, and the 2nd generally 20MB/s.
Robert, can you please post the following?
1. The result of "lspci -nnvvvxxx" w/ harddisk attached to the pdc controller and pata_pdc202xx_old loaded before any error occurs.
2. The result of "lspci -nnvvvxxx" w/ harddisk attached to the pdc controller and IDE pdc202xx_old driver loaded.
Thanks.
Created attachment 13776 [details] libata 140116 2 pata_pdc202xx_old,pata_amd output /sbin/lspci -nnvvvxxx
Created attachment 13777 [details] libata 140116 1 pata_amd output /sbin/lspci -nnvvvxxx Command output as requested.
Sorry, can you do them as root? Otherwise it doesn't dump 0x40->0xFF, as those registers aren't accessible to normal users.
Comment on attachment 13777 [details] libata 140116 1 pata_amd output /sbin/lspci -nnvvvxxx Wed Nov 28 15:51:18 GMT 2007 pata_amd 13316 8 libata 140116 1 pata_amd Filesystem 1K-blocks Used Available Use% Mounted on /dev/hda9 4806048 290528 4271384 7% /work 00:00.0 Host bridge [Class 0600]: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller [1022:700c] (rev 11) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 32 Region 0: Memory at e8000000 (32-bit, prefetchable) [size=128M] Region 1: Memory at f7000000 (32-bit, prefetchable) [size=4K] Region 2: I/O ports at ec00 [disabled] [size=4] Capabilities: [a0] AGP version 2.0 Status: RQ=16 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4 Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1 00: 22 10 0c 70 06 00 30 22 11 00 00 06 00 20 00 00 10: 08 00 00 e8 08 00 00 f7 01 ec 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00 40: 00 00 00 00 19 1b 00 00 02 0e 00 00 26 00 00 00 50: 60 59 69 00 4a 8c 01 fe 05 00 22 da 00 00 00 00 60: bd 0c b3 85 1b 36 e2 5e bd 0c b3 85 1b 36 e2 5e 70: 00 06 04 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 01 01 0f 00 97 10 83 00 73 03 f0 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 a0: 02 00 20 00 17 02 00 0f 01 03 00 00 05 00 01 00 b0: 00 00 00 00 8a 00 01 b8 8f ff 04 c5 00 00 00 00 c0: 85 1f 00 00 00 00 00 00 85 1f 00 20 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00:01.0 PCI bridge [Class 0604]: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge [1022:700d] (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ 
FastB2B- Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 Bus: primary=00, secondary=01, subordinate=01, sec-latency=32 I/O behind bridge: 0000f000-00000fff Memory behind bridge: f2000000-f4ffffff Prefetchable memory behind bridge: f0000000-f1ffffff Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA+ MAbort- >Reset- FastB2B- 00: 22 10 0d 70 07 01 20 02 00 00 04 06 00 20 01 00 10: 00 00 00 00 00 00 00 00 00 01 01 20 f1 01 20 22 20: 00 f2 f0 f4 00 f0 f0 f1 00 00 00 00 00 00 00 00 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0e 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00:07.0 ISA bridge [Class 0601]: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA [1022:7440] (rev 05) Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 00: 22 10 40 74 0f 00 20 02 05 00 01 06 00 00 80 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40: 00 20 07 00 01 00 00 00 2b ff 00 81 00 04 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 de 00 00 00 
00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00:07.1 IDE interface [Class 0101]: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE [1022:7441] (rev 04) (prog-if 8a [Master SecP PriP]) Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE [1022:7441] Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [disabled] [size=8] Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [disabled] [size=1] Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [disabled] [size=8] Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [disabled] [size=1] Region 4: I/O ports at e000 [size=16] 00: 22 10 41 74 05 00 00 02 04 8a 01 01 00 20 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 01 e0 00 00 00 00 00 00 00 00 00 00 22 10 41 74 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40: 23 fc 03 00 00 00 00 00 99 99 99 20 2a 00 a8 20 50: 03 03 03 c6 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 22 10 41 74 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00:07.3 Bridge [Class 0680]: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI [1022:7443] (rev 03) Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI [1022:7443] Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- 00: 22 10 43 74 00 00 80 02 03 00 80 06 00 20 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 22 10 43 74 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40: 80 b1 09 47 00 00 00 00 aa 0c 50 00 00 00 00 00 50: 01 80 00 00 0f 00 00 00 01 06 00 00 00 00 00 00 60: 00 00 80 06 1f 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 22 10 43 74 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 29 b4 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00:07.5 Multimedia audio controller [Class 0401]: Advanced Micro Devices [AMD] AMD-768 [Opus] Audio [1022:7445] (rev 03) Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 Interrupt: pin B routed to IRQ 18 Region 0: I/O ports at e400 [size=256] Region 1: I/O ports at e800 [size=64] 00: 22 10 45 74 05 00 00 02 03 00 01 04 00 20 00 00 10: 01 e4 00 00 01 e8 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 00 00 00 00 00 00 00 00 05 02 00 00 40: 00 50 19 37 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00:10.0 PCI bridge [Class 0604]: Advanced Micro Devices [AMD] AMD-768 [Opus] PCI [1022:7448] (rev 05) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 32 Bus: primary=00, secondary=02, subordinate=02, sec-latency=32 I/O behind bridge: 0000c000-0000dfff Memory behind bridge: f5000000-f6ffffff Prefetchable memory behind bridge: fff00000-000fffff Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- 00: 22 10 48 74 07 01 20 22 05 00 04 06 00 20 01 00 10: 00 00 00 00 00 00 00 00 00 02 02 20 c0 d0 00 02 20: 00 f5 f0 f6 f0 ff 00 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 06 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 01:05.0 VGA compatible controller [Class 0300]: Matrox Graphics, Inc. MGA G550 AGP [102b:2527] (rev 01) (prog-if 00 [VGA]) Subsystem: Matrox Graphics, Inc. Millennium G550 Dual Head DDR 32Mb [102b:0f84] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 64 (4000ns min, 8000ns max), Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 18 Region 0: Memory at f0000000 (32-bit, prefetchable) [size=32M] Region 1: Memory at f2000000 (32-bit, non-prefetchable) [size=16K] Region 2: Memory at f3000000 (32-bit, non-prefetchable) [size=8M] [virtual] Expansion ROM at f2020000 [disabled] [size=128K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [f0] AGP version 2.0 Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3- Rate=x1,x2,x4 Command: RQ=16 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1 00: 2b 10 27 25 07 00 90 02 01 00 00 03 08 40 00 00 10: 08 00 00 f0 00 00 00 f2 00 00 00 f3 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 2b 10 84 0f 30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 10 20 40: 20 15 4a 40 08 3c 00 00 00 00 00 00 00 00 00 00 50: 00 ac 00 00 09 a4 90 00 04 0a 00 80 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 01 f0 22 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 02 00 20 00 07 02 00 1f 01 03 00 0f 00 00 00 00 02:00.0 USB Controller 
[Class 0c03]: Advanced Micro Devices [AMD] AMD-768 [Opus] USB [1022:7449] (rev 07) (prog-if 10 [OHCI]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (20000ns max), Cache Line Size: 32 bytes Interrupt: pin D routed to IRQ 17 Region 0: Memory at f6041000 (32-bit, non-prefetchable) [size=4K] 00: 22 10 49 74 07 00 80 02 07 10 03 0c 08 20 00 00 10: 00 10 04 f6 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 04 00 50 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02:04.0 Mass storage controller [Class 0180]: Promise Technology, Inc. PDC20267 (FastTrak100/Ultra100) [105a:4d30] (rev 02) Subsystem: Promise Technology, Inc. 
Ultra100 [105a:4d33] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 Interrupt: pin A routed to IRQ 16 Region 0: I/O ports at c000 [size=8] Region 1: I/O ports at c400 [size=4] Region 2: I/O ports at c800 [size=8] Region 3: I/O ports at cc00 [size=4] Region 4: I/O ports at d000 [size=64] Region 5: Memory at f6000000 (32-bit, non-prefetchable) [size=128K] [virtual] Expansion ROM at f5000000 [disabled] [size=64K] Capabilities: [58] Power Management version 1 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00: 5a 10 30 4d 07 00 10 02 02 00 80 01 00 20 00 00 10: 01 c0 00 00 01 c4 00 00 01 c8 00 00 01 cc 00 00 20: 01 d0 00 00 00 00 00 f6 00 00 00 00 5a 10 33 4d 30: 00 00 00 00 58 00 00 00 00 00 00 00 0b 01 00 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: ce 33 00 00 00 00 00 00 01 00 01 00 00 00 00 00 60: f1 24 41 00 c4 f3 4f 00 04 f3 4f 00 04 f3 4f 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02:09.0 Ethernet controller [Class 0200]: Intel Corporation 82557/8/9 Ethernet Pro 100 [8086:1229] (rev 10) Subsystem: IBM Ethernet Pro/100 S [1014:0207] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (2000ns min, 14000ns max), Cache Line Size: 32 bytes Interrupt: pin A routed 
to IRQ 18 Region 0: Memory at f6040000 (32-bit, non-prefetchable) [size=4K] Region 1: I/O ports at d400 [size=64] Region 2: Memory at f6020000 (32-bit, non-prefetchable) [size=128K] [virtual] Expansion ROM at f5010000 [disabled] [size=64K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=2 PME- 00: 86 80 29 12 07 00 90 02 10 00 00 02 08 20 00 00 10: 00 00 04 f6 01 d4 00 00 00 00 02 f6 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 07 02 30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 08 38 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 22 fe e0: 00 40 00 4b 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
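As a cross-check when comparing dumps like the one above, the hex rows can be decoded by hand. The sketch below, in Python, parses the first four rows of the 02:04.0 (PDC20267) entry and pulls out a few fields using the standard PCI configuration header offsets (vendor ID at 0x00, device ID at 0x02, revision at 0x08, class at 0x0a, capabilities pointer at 0x34, subsystem IDs at 0x2c/0x2e, interrupt pin at 0x3d). The helper names are made up for illustration. It does not attempt to interpret the Promise-specific registers from 0x40 upward, which are the ones of real interest here.

```python
# First four rows of the 02:04.0 (PDC20267) dump, copied from the output above.
PDC_DUMP = """\
00: 5a 10 30 4d 07 00 10 02 02 00 80 01 00 20 00 00
10: 01 c0 00 00 01 c4 00 00 01 c8 00 00 01 cc 00 00
20: 01 d0 00 00 00 00 00 f6 00 00 00 00 5a 10 33 4d
30: 00 00 00 00 58 00 00 00 00 00 00 00 0b 01 00 00
"""


def parse_hexdump(text):
    """Turn 'lspci -xxx' style rows ('10: 01 c0 ...') into a bytes object."""
    data = bytearray()
    for line in text.splitlines():
        if not line.strip():
            continue
        offset_str, _, rest = line.partition(":")
        if int(offset_str, 16) != len(data):
            raise ValueError("rows are not contiguous at offset " + offset_str)
        data.extend(int(byte, 16) for byte in rest.split())
    return bytes(data)


def le16(data, offset):
    """Read a little-endian 16-bit value, as PCI config space stores them."""
    return int.from_bytes(data[offset:offset + 2], "little")


def decode_header(data):
    """Extract a few fields of the standard PCI type 0 header."""
    return {
        "vendor_id": le16(data, 0x00),
        "device_id": le16(data, 0x02),
        "revision": data[0x08],
        "class_code": le16(data, 0x0A),
        "subsystem_vendor_id": le16(data, 0x2C),
        "subsystem_id": le16(data, 0x2E),
        "capabilities_ptr": data[0x34],
        "interrupt_pin": data[0x3D],  # 1 = pin A
    }


if __name__ == "__main__":
    for name, value in decode_header(parse_hexdump(PDC_DUMP)).items():
        print(f"{name}: {value:#06x}")
```

The decoded values should agree with the text lspci prints for the same device ([105a:4d30], subsystem [105a:4d33], Class 0180, rev 02), which is a quick way to confirm two dumps really come from the same card before diffing the vendor-specific registers.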
Created attachment 13780 [details] Root, libata pata_pdc202xx_old lspci -nnvvvxxx Apologies, attempt to edit previous attachment backfired. Will check email again soon, in case there's more info to provide.
I was about to upgrade FC5 box to F8 but found this bug, thankfully, first. My box has a PDC20268 controller as part of a RAID6 array. The "workaround" brokenmodules=pata_pdc202xx_old: does that "fix" the bug enough to have the drives work semi-properly or does that just disable the card and the drives on it completely? I cannot risk 2TB RAID6 corruption. Guess I'm not upgrading to F8 until this bug is fixed and F8 takes up the latest kernel.
To clarify the workaround from an end user perspective: the brokenmodules=pata_pdc202xx_old workaround provides full IDE device functionality, using the traditional /dev/hdX block devices of the pre-libata IDE drivers, so they are fully functional until bitrot breaks them. Recently I have used the machine as a day-to-day workstation. However, my recent attempt at installing Fedora 8 failed on this machine. Booting both the Live CD and the post-installation kernel, I gained the impression that the "legacy" modules replaced by pata_*, like pdc202xx_old, were not compiled for the kernel.
rob@oak:/fedora/boot> fgrep PDC202 !$
fgrep PDC202 config-2.6.23.1-42.fc8
CONFIG_PATA_PDC2027X=m
In OpenSuSE 10.3, using the default OpenSuSE kernel, the machine functions as I expect, as does brokenmodules with the kernel.org kernel I tried.
rob@oak:/usr/src> uname -a
Linux oak 2.6.22.13-0.3-default #1 SMP 2007/11/19 15:02:58 UTC i686 athlon i386 GNU/Linux
rob@oak:/usr/src> zcat /proc/config.gz | fgrep PDC
CONFIG_BLK_DEV_PDC202XX_OLD=m
CONFIG_PDC202XX_BURST=y
CONFIG_BLK_DEV_PDC202XX_NEW=m
CONFIG_PDC_ADMA=m
CONFIG_PATA_PDC_OLD=m
CONFIG_PATA_PDC2027X=m
If you decide to compile your own kernel, deselecting the pata_pdc202xx modules and selecting the pdc202xx modules, I would expect that to work, but it is untested by any distro QA.
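To check a kernel config for which of the two driver stacks is available, a small Python sketch such as the following may help. It assumes, based on the option names quoted in this thread, that CONFIG_BLK_DEV_PDC202XX_OLD and CONFIG_PDC202XX_BURST belong to the old IDE driver and that CONFIG_PATA_PDC_OLD builds the libata pata_pdc202xx_old module; the function names are hypothetical.

```python
import gzip

# Config lines as quoted above from the OpenSuSE 10.3 kernel.
SAMPLE_CONFIG = """\
CONFIG_BLK_DEV_PDC202XX_OLD=m
CONFIG_PDC202XX_BURST=y
CONFIG_BLK_DEV_PDC202XX_NEW=m
CONFIG_PDC_ADMA=m
CONFIG_PATA_PDC_OLD=m
CONFIG_PATA_PDC2027X=m
"""

NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text):
    """Map option name to 'y', 'm', 'n' or a literal value."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # '# CONFIG_FOO is not set' means the option is disabled.
            options[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def pdc_driver_summary(options):
    """Report the state of the old IDE and libata Promise drivers."""
    return {
        "old_ide_driver": options.get("CONFIG_BLK_DEV_PDC202XX_OLD", "n"),
        "old_ide_burst": options.get("CONFIG_PDC202XX_BURST", "n"),
        "libata_driver": options.get("CONFIG_PATA_PDC_OLD", "n"),
    }


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config; only present if the kernel exports it."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


if __name__ == "__main__":
    print(pdc_driver_summary(parse_kconfig(SAMPLE_CONFIG)))
```

An absent option is treated the same as "is not set", which matches the Fedora 8 config quoted above, where only CONFIG_PATA_PDC2027X shows up in the grep.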
I hope to have this all pinned down in detail in the new year - I've had trouble acquiring the right ancient hardware to test it but now have a card on the way
(In reply to comment #23)
> I hope to have this all pinned down in detail in the new year - I've had
> trouble acquiring the right ancient hardware to test it but now have a card on
> the way

Probably related: see http://bugzilla.kernel.org/show_bug.cgi?id=9474#c38 attachment and related replies. The attachment (LONG!) includes lspci output as well as syslog events.
Has there been any progress on this front? Thanks.
The brokenmodules=pata_pdc202xx_old workaround I used before doesn't seem to work anymore; see https://bugzilla.novell.com/show_bug.cgi?id=457037 So this bug has become more serious: the errors cause fsck failures and a fall back to PIO mode in the 2.6.27.7-4-default kernel distributed with OpenSUSE 11.1rc1.
The basic problem I have is that I cannot duplicate this with the various combinations of hardware I have tried. It all just works for me, although with the very occasional lost IRQ and recovery, as seen with both the old IDE and current libata drivers.
So is there any value in me trying to find a factor that makes it less reproducible, as I have a 100% rate of falling back to PIO mode? I could try a number of things:
1) Moving the PDC202xx card into another box suitable for PATA disks, with an Intel chipset
2) Trying a different disk model with the AMD768 MSI mainboard + PDC20267 controller
Do you actually need physical access to the hardware involved? If remote ssh login, with the box dedicated for a while, would suffice, I can offer that.
Both of those would be really useful if you have time to do them and might (fingers crossed) narrow things down enormously. I'm not sure physical access without a bus analyser and similar weapons helps, but logs from trying those two things would probably be very enlightening. Also interesting would be to know how it behaves booted with libata.dma=0 (beyond the obvious "slowly") - does it fail from the start if we are only doing PIO?
I tried the libata.dma=0 boot parameter. Though hdparm -i lied and claimed udma5, hdparm -tT showed all disks operating at only about 2.4MB/s. Copying files with tar into dd of=/dev/null showed similarly poor throughput when interrupted, on both disks. Whilst the system operated smoothly and without any errors, it was just too slow without UDMA. I am working on a workaround in https://bugzilla.novell.com/show_bug.cgi?id=457037 to try to prevent pata_pdc202xx_old being loaded for now. Then I'll also try to test his HPT366 pata_hpt366 patches https://bugzilla.novell.com/show_bug.cgi?id=361259 After that I'll experiment with changing disks and moving the controller. Once OS-11.1 has shipped, Alan can have the box to himself for a while with a clean distro install if it'll help.
Ok, that's very useful. That means we are looking at a DMA side problem and all the underlying stuff seems happy. I will go and further review the DMA side logic.
Compiled the kernel myself with CONFIG_PATA_PDC_OLD unset; see https://bugzilla.novell.com/show_bug.cgi?id=457037
oak:~ # zcat /proc/config.gz |grep PDC
CONFIG_BLK_DEV_PDC202XX_OLD=m
CONFIG_BLK_DEV_PDC202XX_NEW=m
CONFIG_PDC_ADMA=m
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_PDC2027X is not set
I got interesting results with hdparm -tT: /dev/sda was much slower (and inconsistent, with speed varying by 2x) than /dev/hda, despite identical disks. Earlier, when testing PIO, I got the same crummy 2.4 MB/s on both. Presumably this shows something is likely wrong with the libata/pata_amd combo now, if I can't reproduce the issue with the 2.6.22 based kernel of the OS 10.3 install.
Created attachment 19240 [details] First test patch - remove theoretically un-needed extra speed set
Created attachment 19241 [details] Second patch - log BURST value to check that and also reload 66/33 bit
Just wondering what configuration differences there could be. My PDC22065 BIOS claims to be Ultra 100 BIOS 2.0 Build 17. I have a vague recollection of some firmware update that required a widely available commercial OS, which I had expunged; as the controller 'just worked' with Linux, I have a feeling I never bothered doing it. So is there any way that the pdc202xx_old driver is doing a contra-documentation fix-up, or a workaround that got incanted at the time, similar to how the HPT366 IDE and libata/PATA drivers were slightly different? LVM's giving me jip at the moment, and I can't get the VolumeGroup up in 11.1 anymore, probably due to something missing or in the wrong order in the initrds used to boot. There can't be anything really wrong, as it still works fine in 10.3, but figuring out exactly what I screwed up is going to have to wait for tomorrow. Once I've got 11.1 back, even if it's with an old-fashioned monolithic kernel, I'll give these patches a whirl. Sorry for the delay.
Moving the PDC202 controller to a PIII, i820 box, I could not initially reproduce the problem with a Maxtor 20GB disk. Then, booting with one of the Seagate ST360015A 60GB disks, I have the same issue as in the AMD-768 box. So that explains the reproducibility problem.
Created attachment 19432 [details] Boot log with test patch a1 (not to be applied) My controller's being replaced by a newer Promise Ultra 100 TX2, which doesn't have this fault, and Tejun Heo wants my Promise Ultra 100; hopefully he'll find disks which expose the problem. So I thought I'd boot with those patches (despite the "do not apply" tag) and post the logs, in case some of the info would help.
Thanks - the do not apply tag is so if I ever leave them in my git tree and send them to Jeff by mistake he realises I've done something daft ;)
I'm compiling up with the 2nd patch. If you want any additional commands run with the -a1 version, it would be most convenient if you told me soon, before I reboot and install the -a2 kernel, as I may have to scrub -a1 to install -a2's modules.
Created attachment 19433 [details] Boot log with test patch a2 (not to be applied) Still have the kernels with patches applied. I'll keep it as is, for a few days before sending the controller off to Tejun Heo. So if there's anything else to try I can for now.
Hi, What is the status of this bug? Since the old working drivers are now considered deprecated, this really needs to be fixed. On my system, Tyan Tiger LE MB with integrated PDC 20267, none of the fallbacks work. I get as far as replaying the journal on my root fs, then the errors start, then the kernel panics because it can't mount the root fs. Currently trying kernel 2.6.33.
Can you attach a trace of the errors you see? I'm not sure of the status of the bug and I don't know if Tejun found anything useful when he got the card. There have been some further 2026x fixes over time but nothing post 2.6.33.
Since the kernel panics because it can't mount root, and the errors scroll off the screen too fast to manually copy them, providing a trace is difficult. I'm open to suggestions as to how to get a trace... It's taken a week of constant rebooting to get enough information to get this bug report to show up in a google search. Essentially, the controller is found, the partitions on the disk are found, the root partition's fs (reiserfs in my case) is properly determined, then there is an unreported (or maybe it scrolls off the screen before I can see it) error that makes the kernel try to replay the journal; there's no journal replay in dmesg with a kernel using the old driver. After the journal replay message I start getting errors. The errors I get vary depending on my kernel config. Sometimes I get SCSI checksum errors, sometimes DMA write timeouts, and the most consistent are the ATA bus errors which fall back to lower speeds, but in all cases the kernel eventually gives up, the journal replay fails, and the kernel panics.