Bug 9337

Summary: pata_pdc202xx_old excessive ATA bus errors
Product: IO/Storage
Reporter: Mikael Pettersson (mikpelinux)
Component: Serial ATA
Assignee: Alan (alan)
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, hendrik.groeneveld, kernelbugs, lkmlist, pyridin, rob.opensuse.linux, yaneti
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24-rc2
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg from 2.6.24-rc2 booting with IDE PDC202XX_OLD
dmesg from 2.6.24-rc2 booting with libata pata_pdc202xx_old
Patch to enable burst mode
Logs documenting boot 2.6.24-rc3-mm1 + pdcburst patch
Boot Log with PDC burst
Boot Log with 2nd disk on Mobo controller
libata 140116 2 pata_pdc202xx_old,pata_amd output /sbin/lspci -nnvvvxxx
libata 140116 1 pata_amd output /sbin/lspci -nnvvvxxx
Root, libata pata_pdc202xx_old lspci -nnvvvxxx
First test patch - remove theoretically un-needed extra speed set
Second patch - log BURST value to check that and also reload 66/33 bit
Boot log with test patch a1 (not to be applied)
Boot log with test patch a2 (not to be applied)

Description Mikael Pettersson 2007-11-09 02:15:15 UTC
Most recent kernel where this bug did not occur: none
Distribution: Fedora Core 5
Hardware Environment: Intel 440BX chipset mobo + PIII + PDC20267
Software Environment:
Problem Description:
With the libata pdc202xx_old driver, as soon as the boot process starts accessing
the file systems on the 20267, many ATA bus errors occur. It's not until EH reduces speed to PIO4 that the errors stop.
With the IDE PDC202XX_OLD driver there aren't any problems.

Steps to reproduce: Just boot.
Comment 1 Mikael Pettersson 2007-11-09 02:17:36 UTC
Created attachment 13478 [details]
dmesg from 2.6.24-rc2 booting with IDE PDC202XX_OLD

Added boot log from IDE PDC202XX_OLD (working).
Comment 2 Mikael Pettersson 2007-11-09 02:19:17 UTC
Created attachment 13479 [details]
dmesg from 2.6.24-rc2 booting with libata pata_pdc202xx_old

Added boot log from using libata pata_pdc202xx_old (broken).
Comment 3 Robert Davies 2007-11-22 06:31:50 UTC
I have this problem on an AMD 768 SMP system, and have joined the OpenSuSE Bugzilla report https://bugzilla.novell.com/show_bug.cgi?id=335505

Workaround: the brokenmodules=pata_pdc202xx_old boot parameter.

That system is currently available for testing.
Comment 4 Alan 2007-11-22 07:30:53 UTC
Does SuSE have CONFIG_PDC202XX_BURST set ?
Comment 5 Robert Davies 2007-11-22 20:16:39 UTC
Wow!!!  I guess I gotta download Fedora now...   Nice surprise Alan Thank you :)

Anyway ...

rob@oak:~> zcat /proc/config.gz | grep BURST
CONFIG_PDC202XX_BURST=y
# CONFIG_ATM_ENI_TUNE_BURST is not set

I can download the src RPM and build a kernel with that turned off, or an -ac kernel from somewhere if you prefer.
Comment 6 Alan 2007-11-23 03:19:13 UTC
I think the problem is the reverse:
- The old IDE driver supports burst mode and it's enabled in vendor kernels
- The libata one does not and this seems to cause timeouts for some users

I'll push some patches upstream to enable burst mode in the new driver and see what happens with that change.
Comment 7 Robert Davies 2007-11-23 03:32:40 UTC
If you'd like a test done, that dual Athlon box is idle, as it's had a number of clean OS installs that I'm happy to repeat if things go belly up.  Without the Promise controller, I've had to swap hardware around.
Comment 8 Alan 2007-11-23 07:03:22 UTC
Possible patch attached
Comment 9 Alan 2007-11-23 07:05:41 UTC
Created attachment 13714 [details]
Patch to enable burst mode
Comment 10 Robert Davies 2007-11-24 09:19:10 UTC
Looks pretty good so far!  I now have 2.6.24-rc3-mm1-pdcburst, bootable both with burst-mode libata and with brokenmodules=pata_pdc202xx_old, to compare the libata SCSI/IDE module chain against the old IDE drivers.

Both 2.6.24-rc3-mm1-pdcburst boots show an error logged:

Nov 24 15:31:40 oak kernel: end_request: I/O error, dev sda, sector 14989671
Nov 24 15:31:40 oak kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK

This occurs in bursts on booting, I'll append logfile.
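
For comparing one boot against another, the end_request lines can be tallied per device. What follows is only a rough sketch in Python: the name count_io_errors is made up for illustration, and the pattern assumes the exact message format of the log line quoted above.

```python
import re
from collections import Counter

# Assumed format, copied from the log excerpt in this comment:
#   "end_request: I/O error, dev sda, sector 14989671"
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def count_io_errors(log_text):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for match in IO_ERROR.finditer(log_text):
        counts[match.group(1)] += 1
    return counts
```

Feeding it the contents of a saved boot log for each kernel would give one number per disk to compare, which may be easier than eyeballing the bursts.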

bonnie++ is running and I'm getting peaks of 21,500+ Blk_wrtn as shown by iostat(8).

So I could benchmark (bonnie++, dbench are installed) the 3 kernels 2.6.22.12-0.1 old IDE, 2.6.24-rc3-mm1-pdcburst old IDE and 2.6.24-rc3-mm1-pdcburst pata_pdc202xx (AC fix)

Maybe some other test would be more useful to you?  I shall try and use the machine heavily next week, and see if I can break something.
Comment 11 Robert Davies 2007-11-24 09:31:16 UTC
Created attachment 13733 [details]
Logs documenting boot 2.6.24-rc3-mm1 + pdcburst patch

This is the first boot using brokenmodules=pata_pdc202xx_old, there's a new error message that crops up, and booting seems to "stutter" at this point.  This may be just due to extra debug info, but it needs to be eliminated before release or end users will complain.

Removing the "brokenmodules" and loading pata_pdc202xx_old via initrd, results in very similar logs (not submitted as duplicate).  I suspect this isn't of much interest, apart from documenting tested hardware combo.

Thanks for the efforts, and the box remains available for tests.
Comment 12 Alan 2007-11-24 15:42:45 UTC
The burst of messages on booting is coming, as far as I can tell, from a different scsi midlayer bug in -rc3-mm (it's an -rc so it still has other bugs to knock out). I don't think it's related to the pdc202xx stuff, but obviously it needs fixing anyway.
Comment 13 Robert Davies 2007-11-25 05:23:58 UTC
Created attachment 13742 [details]
Boot Log with PDC burst

There are "soft resetting link" errors using pata_pdc202xx_old, similar to those logged in the Open SuSE Bugzilla.

This occurs when swap is activated, just as before, and the ATA mode is reduced, though IIRC it went from UDMA to PIO mode with the SuSE 2.6.22.12-0.1 kernel.

AFAIK both disks and the cable are good; I have re-checked the error count (smartctl -a) on 'sdb / hda' and it remains the same at '39'.

What I'll do is go back to brokenmodules=pata_pdc202xx_old but stay with 2.6.24-rc3-mm1-pdcburst, and see if I can provoke this error using the old ATA driver, or whether it continues at ATA100 speed.

I guess I had better check for "silent" data corruption; maybe the old ATA driver doesn't do as much checking?
Comment 14 Robert Davies 2007-11-25 09:05:11 UTC
Created attachment 13744 [details]
Boot Log with 2nd disk on Mobo controller

The error messages on swap activation go away.  The disk is seen as sdb again and hdparm reports ATA mode udma5: same disk, same cable, just attached to the 2nd motherboard IDE channel in place of the DVD/CDR drive.  It seems the problem lies with the controller; I have done a lot to try to eliminate the possibility of a hardware fault being the cause.

Slight anomaly is the "hdparm -tT /dev/sd<X>", the 1st drive exact same make claims 30MB/s, and 2nd generally 20MB/s.
Comment 15 Tejun Heo 2007-11-28 00:20:54 UTC
Robert, can you please post the following?

1. The result of "lspci -nnvvvxxx" w/ harddisk attached to the pdc controller and pata_pdc202xx_old loaded before any error occurs.

2. The result of "lspci -nnvvvxxx" w/ harddisk attached to the pdc controller and IDE pdc202xx_old driver loaded.

Thanks.
Comment 16 Robert Davies 2007-11-28 02:51:10 UTC
Created attachment 13776 [details]
libata                140116  2 pata_pdc202xx_old,pata_amd output /sbin/lspci -nnvvvxxx
Comment 17 Robert Davies 2007-11-28 02:56:58 UTC
Created attachment 13777 [details]
libata                140116  1 pata_amd output /sbin/lspci -nnvvvxxx

Command output as requested.
Comment 18 Alan 2007-11-28 05:18:46 UTC
Sorry, can you run them as root? Otherwise lspci doesn't dump 0x40->0xFF, as those registers aren't accessible to normal users.
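
A quick way to tell whether a saved dump is usable is to check that its hex rows run all the way to 0xFF, and then to compare the libata and IDE dumps byte by byte. The Python below is a hedged sketch of that idea, not part of any kernel or pciutils tooling: the function names are invented for illustration, and it assumes each input holds the hex rows of a single device (for instance the Promise controller at 02:04.0, as printed by lspci -s 02:04.0 -xxx).

```python
import re

# One hex row as printed by "lspci -x" / "lspci -xxx", e.g. "40: 00 00 ...".
HEX_ROW = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2}){1,16})\s*$")


def parse_config_dump(text):
    """Return {offset: byte} for every hex row found in a one-device dump."""
    regs = {}
    for line in text.splitlines():
        match = HEX_ROW.match(line.strip())
        if not match:
            continue  # header, capability and blank lines are skipped
        base = int(match.group(1), 16)
        for i, byte in enumerate(match.group(2).split()):
            regs[base + i] = int(byte, 16)
    return regs


def has_device_specific_region(regs):
    """True if the dump covers every offset from 0x40 to 0xFF."""
    return all(offset in regs for offset in range(0x40, 0x100))


def diff_registers(a, b):
    """Return {offset: (a_byte, b_byte)} where two dumps disagree."""
    shared = sorted(a.keys() & b.keys())
    return {off: (a[off], b[off]) for off in shared if a[off] != b[off]}
```

If has_device_specific_region() comes back False, the dump was most likely taken without root and only the standard header is present, so any diff against the other driver's dump would miss the device-specific registers.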
Comment 19 Robert Davies 2007-11-28 07:53:34 UTC
Comment on attachment 13777 [details]
libata                140116  1 pata_amd output /sbin/lspci -nnvvvxxx

Wed Nov 28 15:51:18 GMT 2007

pata_amd               13316  8 
libata                140116  1 pata_amd

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda9              4806048    290528   4271384   7% /work

00:00.0 Host bridge [Class 0600]: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller [1022:700c] (rev 11)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32
	Region 0: Memory at e8000000 (32-bit, prefetchable) [size=128M]
	Region 1: Memory at f7000000 (32-bit, prefetchable) [size=4K]
	Region 2: I/O ports at ec00 [disabled] [size=4]
	Capabilities: [a0] AGP version 2.0
		Status: RQ=16 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4
		Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
00: 22 10 0c 70 06 00 30 22 11 00 00 06 00 20 00 00
10: 08 00 00 e8 08 00 00 f7 01 ec 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 19 1b 00 00 02 0e 00 00 26 00 00 00
50: 60 59 69 00 4a 8c 01 fe 05 00 22 da 00 00 00 00
60: bd 0c b3 85 1b 36 e2 5e bd 0c b3 85 1b 36 e2 5e
70: 00 06 04 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 01 0f 00 97 10 83 00 73 03 f0 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40
a0: 02 00 20 00 17 02 00 0f 01 03 00 00 05 00 01 00
b0: 00 00 00 00 8a 00 01 b8 8f ff 04 c5 00 00 00 00
c0: 85 1f 00 00 00 00 00 00 85 1f 00 20 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:01.0 PCI bridge [Class 0604]: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge [1022:700d] (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: f2000000-f4ffffff
	Prefetchable memory behind bridge: f0000000-f1ffffff
	Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA+ VGA+ MAbort- >Reset- FastB2B-
00: 22 10 0d 70 07 01 20 02 00 00 04 06 00 20 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 20 f1 01 20 22
20: 00 f2 f0 f4 00 f0 f0 f1 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0e 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:07.0 ISA bridge [Class 0601]: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA [1022:7440] (rev 05)
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
00: 22 10 40 74 0f 00 20 02 05 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 20 07 00 01 00 00 00 2b ff 00 81 00 04 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 de 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:07.1 IDE interface [Class 0101]: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE [1022:7441] (rev 04) (prog-if 8a [Master SecP PriP])
	Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE [1022:7441]
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [disabled] [size=8]
	Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [disabled] [size=1]
	Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [disabled] [size=8]
	Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [disabled] [size=1]
	Region 4: I/O ports at e000 [size=16]
00: 22 10 41 74 05 00 00 02 04 8a 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e0 00 00 00 00 00 00 00 00 00 00 22 10 41 74
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 23 fc 03 00 00 00 00 00 99 99 99 20 2a 00 a8 20
50: 03 03 03 c6 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 22 10 41 74 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:07.3 Bridge [Class 0680]: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI [1022:7443] (rev 03)
	Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI [1022:7443]
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00: 22 10 43 74 00 00 80 02 03 00 80 06 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 22 10 43 74
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 80 b1 09 47 00 00 00 00 aa 0c 50 00 00 00 00 00
50: 01 80 00 00 0f 00 00 00 01 06 00 00 00 00 00 00
60: 00 00 80 06 1f 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 22 10 43 74
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 29 b4 08 00 00 00 00 00 00 00 00 00 00 00 00 00

00:07.5 Multimedia audio controller [Class 0401]: Advanced Micro Devices [AMD] AMD-768 [Opus] Audio [1022:7445] (rev 03)
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Interrupt: pin B routed to IRQ 18
	Region 0: I/O ports at e400 [size=256]
	Region 1: I/O ports at e800 [size=64]
00: 22 10 45 74 05 00 00 02 03 00 01 04 00 20 00 00
10: 01 e4 00 00 01 e8 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 05 02 00 00
40: 00 50 19 37 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:10.0 PCI bridge [Class 0604]: Advanced Micro Devices [AMD] AMD-768 [Opus] PCI [1022:7448] (rev 05) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=32
	I/O behind bridge: 0000c000-0000dfff
	Memory behind bridge: f5000000-f6ffffff
	Prefetchable memory behind bridge: fff00000-000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
00: 22 10 48 74 07 01 20 22 05 00 04 06 00 20 01 00
10: 00 00 00 00 00 00 00 00 00 02 02 20 c0 d0 00 02
20: 00 f5 f0 f6 f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 06 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

01:05.0 VGA compatible controller [Class 0300]: Matrox Graphics, Inc. MGA G550 AGP [102b:2527] (rev 01) (prog-if 00 [VGA])
	Subsystem: Matrox Graphics, Inc. Millennium G550 Dual Head DDR 32Mb [102b:0f84]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (4000ns min, 8000ns max), Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at f0000000 (32-bit, prefetchable) [size=32M]
	Region 1: Memory at f2000000 (32-bit, non-prefetchable) [size=16K]
	Region 2: Memory at f3000000 (32-bit, non-prefetchable) [size=8M]
	[virtual] Expansion ROM at f2020000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [f0] AGP version 2.0
		Status: RQ=32 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3- Rate=x1,x2,x4
		Command: RQ=16 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x1
00: 2b 10 27 25 07 00 90 02 01 00 00 03 08 40 00 00
10: 08 00 00 f0 00 00 00 f2 00 00 00 f3 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 2b 10 84 0f
30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 10 20
40: 20 15 4a 40 08 3c 00 00 00 00 00 00 00 00 00 00
50: 00 ac 00 00 09 a4 90 00 04 0a 00 80 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 01 f0 22 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 02 00 20 00 07 02 00 1f 01 03 00 0f 00 00 00 00

02:00.0 USB Controller [Class 0c03]: Advanced Micro Devices [AMD] AMD-768 [Opus] USB [1022:7449] (rev 07) (prog-if 10 [OHCI])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (20000ns max), Cache Line Size: 32 bytes
	Interrupt: pin D routed to IRQ 17
	Region 0: Memory at f6041000 (32-bit, non-prefetchable) [size=4K]
00: 22 10 49 74 07 00 80 02 07 10 03 0c 08 20 00 00
10: 00 10 04 f6 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 04 00 50
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

02:04.0 Mass storage controller [Class 0180]: Promise Technology, Inc. PDC20267 (FastTrak100/Ultra100) [105a:4d30] (rev 02)
	Subsystem: Promise Technology, Inc. Ultra100 [105a:4d33]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32
	Interrupt: pin A routed to IRQ 16
	Region 0: I/O ports at c000 [size=8]
	Region 1: I/O ports at c400 [size=4]
	Region 2: I/O ports at c800 [size=8]
	Region 3: I/O ports at cc00 [size=4]
	Region 4: I/O ports at d000 [size=64]
	Region 5: Memory at f6000000 (32-bit, non-prefetchable) [size=128K]
	[virtual] Expansion ROM at f5000000 [disabled] [size=64K]
	Capabilities: [58] Power Management version 1
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 5a 10 30 4d 07 00 10 02 02 00 80 01 00 20 00 00
10: 01 c0 00 00 01 c4 00 00 01 c8 00 00 01 cc 00 00
20: 01 d0 00 00 00 00 00 f6 00 00 00 00 5a 10 33 4d
30: 00 00 00 00 58 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: ce 33 00 00 00 00 00 00 01 00 01 00 00 00 00 00
60: f1 24 41 00 c4 f3 4f 00 04 f3 4f 00 04 f3 4f 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

02:09.0 Ethernet controller [Class 0200]: Intel Corporation 82557/8/9 Ethernet Pro 100 [8086:1229] (rev 10)
	Subsystem: IBM Ethernet Pro/100 S [1014:0207]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (2000ns min, 14000ns max), Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at f6040000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at d400 [size=64]
	Region 2: Memory at f6020000 (32-bit, non-prefetchable) [size=128K]
	[virtual] Expansion ROM at f5010000 [disabled] [size=64K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00: 86 80 29 12 07 00 90 02 10 00 00 02 08 20 00 00
10: 00 00 04 f6 01 d4 00 00 00 00 02 f6 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 07 02
30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 08 38
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 22 fe
e0: 00 40 00 4b 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Comment 20 Robert Davies 2007-11-28 08:01:22 UTC
Created attachment 13780 [details]
Root, libata  pata_pdc202xx_old lspci -nnvvvxxx

Apologies, attempt to edit previous attachment backfired.

Will check email again soon, in case there's more info to provide.
Comment 21 Trevor Cordes 2007-12-13 18:09:44 UTC
I was about to upgrade my FC5 box to F8 but found this bug, thankfully, first.  My box has a PDC20268 controller as part of a RAID6 array.

The "workaround" brokenmodules=pata_pdc202xx_old: does that "fix" the bug enough to have the drives work semi-properly or does that just disable the card and the drives on it completely?  I cannot risk 2TB RAID6 corruption.

Guess I'm not upgrading to F8 until this bug is fixed and F8 takes up the latest kernel.
Comment 22 Robert Davies 2007-12-14 00:28:11 UTC
To clarify the workaround from an end user perspective.

The brokenmodules=pata_pdc202xx_old workaround provides full IDE device functionality, using the traditional /dev/hdX block devices of the pre-libata IDE drivers, so the drives are fully functional until bitrot breaks those drivers.  Recently I have used the machine as a day to day workstation.

However my recent attempt at installing Fedora 8 failed on this machine.  Booting both Live CD, and post-installation kernel, I gained the impression that the "legacy" modules replaced by pata_* like pdc202xx_old were not compiled for the kernel.

rob@oak:/fedora/boot> fgrep PDC202 !$
fgrep PDC202 config-2.6.23.1-42.fc8
CONFIG_PATA_PDC2027X=m

In OpenSuSE 10.3, using the OpenSuSE default kernel, the machine functions as I expect; as does brokenmodules with the kernel.org kernel I tried.

rob@oak:/usr/src> uname -a
Linux oak 2.6.22.13-0.3-default #1 SMP 2007/11/19 15:02:58 UTC i686 athlon i386 GNU/Linux
rob@oak:/usr/src> zcat /proc/config.gz | fgrep PDC
CONFIG_BLK_DEV_PDC202XX_OLD=m
CONFIG_PDC202XX_BURST=y
CONFIG_BLK_DEV_PDC202XX_NEW=m
CONFIG_PDC_ADMA=m
CONFIG_PATA_PDC_OLD=m
CONFIG_PATA_PDC2027X=m

If you decide to compile your own kernel, deselecting pata_pdc202xx and selecting the pdc202xx modules, I would expect that to work, but it is untested by any distro QA.
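
Before booting a self-built kernel it may be worth confirming that the config really has the old IDE driver on and the libata one off. The following Python is a minimal sketch of such a check, assuming the option names shown in the zcat output above (CONFIG_BLK_DEV_PDC202XX_OLD for the IDE driver, CONFIG_PATA_PDC_OLD for the libata one); the helper names are hypothetical.

```python
def parse_kconfig(text):
    """Map option name -> value ('y', 'm', ...), or None for '# X is not set'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def uses_legacy_pdc_driver_only(options):
    """True if the old IDE driver is built and the libata driver is not."""
    legacy = options.get("CONFIG_BLK_DEV_PDC202XX_OLD") in ("y", "m")
    libata = options.get("CONFIG_PATA_PDC_OLD") in ("y", "m")
    return legacy and not libata
```

Run against the OpenSuSE config quoted above, this should report False, since both drivers are built as modules there and the boot parameter is what keeps the libata one from loading.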
Comment 23 Alan 2007-12-14 10:37:50 UTC
I hope to have this all pinned down in detail in the new year - I've had trouble acquiring the right ancient hardware to test it but now have a card on the way
Comment 24 jl-icase 2008-01-07 21:01:44 UTC
(In reply to comment #23)
> I hope to have this all pinned down in detail in the new year - I've had
> trouble acquiring the right ancient hardware to test it but now have a card
> on the way

Probably related:  See http://bugzilla.kernel.org/show_bug.cgi?id=9474#c38 attachment and related replies.  The attachment (LONG!) includes lspci output as well as syslog events.
Comment 25 Tejun Heo 2008-06-23 23:04:23 UTC
Has there been any progress on this front?  Thanks.
Comment 26 Robert Davies 2008-12-06 11:22:08 UTC
The brokenmodules=pata_pdc202xx_old workaround I used before doesn't seem to work anymore; see https://bugzilla.novell.com/show_bug.cgi?id=457037

So this bug has become more serious: the errors cause fsck failures and a fallback to PIO mode in the 2.6.27.7-4-default kernel distributed with OpenSUSE 11.1rc1.
Comment 27 Alan 2008-12-06 14:57:23 UTC
The basic problem I have is that I cannot duplicate this with the various combinations of hardware I have tried. It all just works for me, although with the very occasional lost IRQ and recovery, as seen with both the old IDE and current libata drivers.
Comment 28 Robert Davies 2008-12-07 11:30:36 UTC
So is there any value in me trying to find a factor that makes it less reproducible? I have a 100% rate of falling back to PIO mode.  I could try a number of things:

1) Moving the PDC202xx card into another box suitable for PATA disk, with Intel Chipset
2) Trying different disk model with the AMD768 MSI mainboard + PDC20267 controller

Do you actually need physical access to the hardware involved?  If remote ssh login, with the box dedicated for a while sufficed, I can offer that. 
Comment 29 Alan 2008-12-07 12:03:24 UTC
Both of those would be really useful if you have time to do them and might (fingers crossed) narrow things down enormously.

I'm not sure physical access without a bus analyser and similar weapons helps but logs from trying those two things would probably be very enlightening.

Also interesting would be to know how it behaves booted with libata.dma=0 (beyond the obvious "slowly") - does it fail from the start if we are only doing PIO?
Comment 30 Robert Davies 2008-12-08 08:09:16 UTC
I tried the libata.dma=0 boot parameter. Though hdparm -i lied and claimed udma5, hdparm -tT showed all disks operating at only about 2.4MB/s.  Copying files with tar into dd of=/dev/null showed similarly poor throughput when interrupted, on both disks.  Whilst the system operated smoothly and without any errors, it was just too slow without UDMA.

I am working on a workaround in https://bugzilla.novell.com/show_bug.cgi?id=457037 to prevent pata_pdc202xx_old being loaded for now.  Then I will also try to test the HPT366 pata_hpt366 patches in https://bugzilla.novell.com/show_bug.cgi?id=361259

After that I'll experiment changing disks and moving the controller.  Once OS-11.1 has shipped, Alan can have the box to himself for a while with a clean distro install if it'll help.
Comment 31 Alan 2008-12-08 12:40:09 UTC
OK, that's very useful. That means we are looking at a DMA side problem and all the underlying stuff seems happy. I will go and further review the DMA side logic.
Comment 32 Robert Davies 2008-12-08 12:44:23 UTC
I compiled the kernel myself with CONFIG_PATA_PDC_OLD unset, see https://bugzilla.novell.com/show_bug.cgi?id=457037

oak:~ # zcat /proc/config.gz |grep PDC
CONFIG_BLK_DEV_PDC202XX_OLD=m
CONFIG_BLK_DEV_PDC202XX_NEW=m
CONFIG_PDC_ADMA=m
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_PDC2027X is not set

I got interesting results with hdparm -tT, the /dev/sda was much slower (and inconsistent speed varying 2x) than  /dev/hda, despite identical disks.  Earlier when testing PIO I got the same crummy 2.4 MB/s on both.

Presumably this shows something is likely wrong with the libata/pata_amd combo now, if I can't reproduce the issue with the 2.6.22 based kernel of the OS 10.3 install.
Comment 33 Alan 2008-12-10 09:48:23 UTC
Created attachment 19240 [details]
First test patch - remove theoretically un-needed extra speed set
Comment 34 Alan 2008-12-10 09:49:25 UTC
Created attachment 19241 [details]
Second patch - log BURST value to check that and also reload 66/33 bit
Comment 35 Robert Davies 2008-12-10 14:43:04 UTC
Just wondering what configuration differences there could be.  My PDC22065 BIOS claims to be Ultra 100 Bios 2.0 Build 17, I have a vague recollection coming back, about some firmware update, that required a widely available commercial OS, which I had expunged, as the controller 'just worked' with Linux I have a feeling I never bothered doing it.  So is there any way that the pdc202xx_old driver is doing a contra-documentation fix up, or work round that got incanted at the time, similar to how the HPT366 driver IDE and libata/PATA were slightly different?

LVM's giving me jip at the moment, and I can't get the VolumeGroup up in 11.1 anymore, probably due to something missing, or in wrong order in the initrd's used to boot.  There can't be anything really wrong, as it works fine in 10.3 still, but figuring out exactly what I screwed up is going to have to wait for tomorrow.  Once I've got 11.1 back, even if it's with an old fashion monolithic kernel, I'll give these patches a whirl, sorry for the delay.
Comment 36 Robert Davies 2008-12-15 23:31:13 UTC
Moving the PDC202xx controller to a PIII, i820 box, I could not initially reproduce the problem with a Maxtor 20GB disk.  Then, booting up with one of the Seagate ST360015A 60GB disks, I have the same issue as in the AMD-768 box.  So that explains the reproducibility problem.
Comment 37 Robert Davies 2008-12-22 10:00:25 UTC
Created attachment 19432 [details]
Boot log with test patch a1  (not to be applied)

My controller's being replaced by a newer Promise Ultra 100 TX2, without this fault, and Tejun Heo wants my Promise Ultra 100; hopefully he'll find disks which expose the problem.

So I thought I'd boot with those patches (despite the "do not apply" tag) and post the logs, in case some of the info would help.
Comment 38 Alan 2008-12-22 10:09:20 UTC
Thanks - the do not apply tag is so if I ever leave them in my git tree and send them to Jeff by mistake he realises I've done something daft ;)
Comment 39 Robert Davies 2008-12-22 10:16:20 UTC
I'm compiling up with the 2nd patch, if you want any additional commands run with the -a1 version, then would be most convenient if you tell me soon, before I reboot and install the -a2 kernel, as I may have to scrub -a1 to install -a2's modules.
Comment 40 Robert Davies 2008-12-22 12:08:29 UTC
Created attachment 19433 [details]
Boot log with test patch a2  (not to be applied)

Still have the kernels with patches applied.  I'll keep it as is, for a few days before sending the controller off to Tejun Heo.  So if there's anything else to try I can for now.
Comment 41 Hendrik 2010-03-26 19:15:34 UTC
Hi,

What is the status of this bug? Since the old working drivers are now considered deprecated, this really needs to be fixed. On my system, Tyan Tiger LE MB with integrated PDC 20267, none of the fallbacks work. I get as far as replaying the journal on my root fs, then the errors start, then the kernel panics because it can't mount the root fs. Currently trying kernel 2.6.33.
Comment 42 Alan 2010-03-26 20:46:51 UTC
Can you attach a trace of the errors you see? I'm not sure of the status of the bug and I don't know if Tejun found anything useful when he got the card.

There have been some further 2026x fixes over time but nothing post 2.6.33.
Comment 43 Hendrik 2010-03-27 00:14:14 UTC
Since the kernel panics because it can't mount root, and the errors scroll off the screen too fast to manually copy them, providing a trace is difficult. I'm open to suggestions as to how to get a trace...

It's taken a week of constant rebooting to get enough information to get this bug report to show up in a google search. Essentially, the controller is found, the partitions on the disk are found, the root partition's fs (reiserfs in my case) is properly determined, then there is an unreported (or maybe it scrolls off the screen before I can see it) error that makes the kernel try to replay the journal; there's no journal replay in dmesg with a kernel using the old driver. After the journal replay message I start getting errors. The errors I get vary depending on my kernel config. Sometimes I get SCSI checksum errors, sometimes DMA write timeouts, and the most consistent are the ATA bus errors which fall back to lower speeds, but in all cases the kernel eventually gives up, the journal replay fails, and the kernel panics.