Bug 9261 (hpt37x_UDMA-33) - (pata hpt374) Mishandling of port 3/4 special cases ?
Summary: (pata hpt374) Mishandling of port 3/4 special cases ?
Status: CLOSED CODE_FIX
Alias: hpt37x_UDMA-33
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Alan
Reported: 2007-10-29 11:03 UTC by Bjoern Olausson
Modified: 2008-01-11 08:21 UTC
Kernel Version: 2.6.24-rc5


Attachments
Proposed fixes for HPT37x problems (3.04 KB, patch)
2007-11-05 07:34 UTC, Alan
hpt_is_fixed (14.98 KB, text/plain)
2007-11-05 14:27 UTC, Bjoern Olausson
patched 2.6.23.1 dmesg (26.93 KB, text/plain)
2007-12-12 06:41 UTC, Bjoern Olausson
stock 2.6.24-rc5 dmesg (14.96 KB, application/octet-stream)
2007-12-12 06:41 UTC, Bjoern Olausson
dmesg rc1 (14.97 KB, text/plain)
2007-12-13 05:17 UTC, Bjoern Olausson
dmesg rc2 (14.96 KB, text/plain)
2007-12-13 05:18 UTC, Bjoern Olausson
dmesg rc3 (14.96 KB, application/octet-stream)
2007-12-13 05:18 UTC, Bjoern Olausson
dmesg rc4 (14.96 KB, application/octet-stream)
2007-12-13 05:18 UTC, Bjoern Olausson
pata_hpt37x: Patch segment to reverse (patch -R) (783 bytes, patch)
2007-12-17 06:54 UTC, Alan
Correct diff I hope patch -R drivers/ata/pata_hpt37x < a1 (1.17 KB, patch)
2007-12-17 06:56 UTC, Alan
demsg 2.6.24-rc5_patched (14.97 KB, application/octet-stream)
2007-12-17 08:24 UTC, Bjoern Olausson
Proposed fix (452 bytes, patch)
2007-12-17 11:42 UTC, Alan
dmesg 2.6.24-rc5_proposed-fix (14.97 KB, application/octet-stream)
2007-12-18 13:20 UTC, Bjoern Olausson

Description Bjoern Olausson 2007-10-29 11:03:40 UTC
Most recent kernel where this bug did not occur: -/-

Distribution:
Gentoo

Hardware Environment:

Software Environment:
System uname: 2.6.23 i686 Intel(R) Celeron(R) CPU 2.66GHz
Timestamp of tree: Sun, 28 Oct 2007 21:50:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [enabled]
app-shells/bash:     3.2_p17
dev-lang/python:     2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.4-r7
sys-apps/baselayout: 1.12.9-r2
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.22-r2
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=pentium4 -O2 -pipe -msse2 -mfpmath=sse -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"

Problem Description:

First of all: does this look like a hardware defect on the controller, or is it a bug? If you tend to think it's a hardware defect, please let me know and close this bug.

When I connect an HDD to a certain port (interface 3) of the HighPoint RocketRAID 454, the kernel shows the following:

ata5.00: ATA-7: Maxtor 5A250J0, RAM51VV0, max UDMA/133
ata5.00: 490234752 sectors, multi 16: LBA48 
ata5.01: ATA-7: HDT722525DLAT80, V44OA96A, max UDMA/133
ata5.01: 488397168 sectors, multi 16: LBA48 
ata5.00: limited to UDMA/33 due to 40-wire cable
ata5.01: limited to UDMA/33 due to 40-wire cable
Find mode for 12 reports A81F442
Find mode for 12 reports A81F442
Find mode for DMA 66 reports 120C8242
Find mode for DMA 66 reports 120C8242
ata5.00: configured for UDMA/33
ata5.01: failed to IDENTIFY (I/O error, err_mask=0x2)
ata5.01: revalidation failed (errno=-5)
ata5: failed to recover some devices, retrying in 5 secs
ata5.01: failed to IDENTIFY (I/O error, err_mask=0x2)
ata5.01: revalidation failed (errno=-5)
ata5.01: limiting speed to UDMA/33:PIO3
ata5: failed to recover some devices, retrying in 5 secs
ata5.01: failed to IDENTIFY (I/O error, err_mask=0x2)
ata5.01: revalidation failed (errno=-5)
ata5.01: disabled
ata5: failed to recover some devices, retrying in 5 secs
ata5.00: failed to IDENTIFY (I/O error, err_mask=0x40)
ata5.00: revalidation failed (errno=-5)
ata5: failed to recover some devices, retrying in 5 secs
Find mode for 12 reports A81F442
Find mode for DMA 66 reports 120C8242
ata5.00: configured for UDMA/33
ata5: EH pending after completion, repeating EH (cnt=4)


Connecting the cable to another port (interface 4) works fine:

ata6: PATA max UDMA/100 cmd 0x0001a400 ctl 0x0001a002 bmdma 0x00019808 irq 19
ata6.00: ATA-7: Maxtor 5A250J0, RAM51VV0, max UDMA/133
ata6.00: 490234752 sectors, multi 16: LBA48 
ata6.01: ATA-7: HDT722525DLAT80, V44OA96A, max UDMA/133
ata6.01: 488397168 sectors, multi 16: LBA48 
Find mode for 12 reports A81F442
Find mode for 12 reports A81F442
Find mode for DMA 69 reports 12848242
Find mode for DMA 69 reports 12848242
ata6.00: configured for UDMA/100
ata6.01: configured for UDMA/100

All other ports (1, 2, 4) work fine; only port 3 misbehaves.
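
To compare the two ports without reading the logs line by line, the libata messages quoted above can be reduced to a per-device summary. The following is a minimal sketch, assuming the message wording shown in this report; the function name `summarize` and the dictionary keys are illustrative and not part of any kernel tool.

```python
import re
from collections import defaultdict

# Patterns for the libata messages quoted in this report. They are applied
# with search(), so an optional leading dmesg timestamp does not prevent a match.
CONFIGURED = re.compile(r"(ata\d+\.\d+): configured for (\S+)")
IDENTIFY_FAIL = re.compile(r"(ata\d+\.\d+): failed to IDENTIFY")
CABLE_LIMIT = re.compile(r"(ata\d+\.\d+): limited to (\S+) due to 40-wire cable")


def summarize(dmesg_text):
    """Return {device: {"mode", "identify_failures", "cable_limited"}}."""
    devices = defaultdict(
        lambda: {"mode": None, "identify_failures": 0, "cable_limited": False}
    )
    for line in dmesg_text.splitlines():
        m = CONFIGURED.search(line)
        if m:
            # A device may be reconfigured after error handling; keep the last mode.
            devices[m.group(1)]["mode"] = m.group(2)
            continue
        m = IDENTIFY_FAIL.search(line)
        if m:
            devices[m.group(1)]["identify_failures"] += 1
            continue
        m = CABLE_LIMIT.search(line)
        if m:
            devices[m.group(1)]["cable_limited"] = True
    return dict(devices)
```

Fed the two excerpts above, such a summary would flag ata5.00 and ata5.01 as cable-limited with repeated IDENTIFY failures on ata5.01, while ata6.00 and ata6.01 come up at UDMA/100 with no failures.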

Steps to reproduce:
Connect a UDMA/100-capable cable to interface 3 of the controller and boot.
Reproducible: Always (with one or both disks)

Full dmesg output here:
http://olausson.name/temp/HPT374_failed
http://olausson.name/temp/HPT374_worked

Thanks and regards
Bjoern
Comment 1 Bjoern Olausson 2007-10-29 11:04:48 UTC
Here's the hw setup:

H/W path            Device     Class       Description
======================================================
                               system      P4V88
/0                             bus         P4V88
/0/0                           memory      64KB BIOS
/0/3                           processor   Intel(R) Celeron(R) CPU 2.66GHz
/0/3/4                         memory      16KB L1 cache
/0/3/5                         memory      256KB L2 cache
/0/3/6                         memory      L3 cache
/0/1                           memory      896MB System memory
/0/100                         bridge      PT880 Host Bridge
/0/100/1                       bridge      VT8237 PCI Bridge
/0/100/1/0                     display     NV43 [GeForce 6200]
/0/100/a            scsi2      storage     HPT374
/0/100/a/0          /dev/sda   disk        189GB Maxtor 6Y200P0
/0/100/a/0/1        /dev/sda1  volume      189GB Linux raid autodetect partition
/0/100/a/1          /dev/sdb   disk        189GB Maxtor 6Y200P0
/0/100/a/1/1        /dev/sdb1  volume      189GB Linux raid autodetect partition
/0/100/a/2          /dev/sdc   disk        233GB Maxtor 6L250R0
/0/100/a/2/1        /dev/sdc1  volume      232GB Linux raid autodetect partition
/0/100/a/3          /dev/sdd   disk        233GB Maxtor 6L250R0
/0/100/a/3/1        /dev/sdd1  volume      232GB Linux raid autodetect partition
/0/100/a.1          scsi5      storage     HPT374
/0/100/a.1/0.0.0    /dev/sde   disk        233GB Maxtor 5A250J0
/0/100/a.1/0.0.0/1  /dev/sde1  volume      232GB Linux raid autodetect partition
/0/100/a.1/0.1.0    /dev/sdf   disk        232GB HDT722525DLAT80
/0/100/a.1/0.1.0/1  /dev/sdf1  volume      232GB Linux raid autodetect partition
/0/100/b                       multimedia  SB Live! EMU10k1
/0/100/b.1                     input       SB Live! Game Port
/0/100/c            eth0       network     DGE-528T Gigabit Ethernet Adapter
/0/100/d            wifi0      network     AR5212 802.11abg NIC
/0/100/f                       storage     VIA VT6420 SATA RAID Controller
/0/100/f.1          scsi6      storage     VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master
/0/100/f.1/0.0.0    /dev/sdg   disk        152GB Maxtor 6Y160P0
/0/100/f.1/0.0.0/1  /dev/sdg1  volume      47MB Linux raid autodetect partition
/0/100/f.1/0.0.0/2  /dev/sdg2  volume      972MB Linux swap / Solaris partition
/0/100/f.1/0.0.0/3  /dev/sdg3  volume      151GB Linux raid autodetect partition
/0/100/f.1/0        /dev/sdh   disk        152GB Maxtor 6Y160P0
/0/100/f.1/0/1      /dev/sdh1  volume      47MB Linux raid autodetect partition
/0/100/f.1/0/2      /dev/sdh2  volume      972MB Linux swap / Solaris partition
/0/100/f.1/0/3      /dev/sdh3  volume      151GB Linux raid autodetect partition
/0/100/f.1/1                   disk        ROM-DRIVE-52MAX
/0/100/10                      bus         VT82xxxxx UHCI USB 1.1 Controller
/0/100/10/1         usb2       bus         UHCI Host Controller
/0/100/10.1                    bus         VT82xxxxx UHCI USB 1.1 Controller
/0/100/10.1/1       usb3       bus         UHCI Host Controller
/0/100/10.2                    bus         VT82xxxxx UHCI USB 1.1 Controller
/0/100/10.2/1       usb4       bus         UHCI Host Controller
/0/100/10.3                    bus         VT82xxxxx UHCI USB 1.1 Controller
/0/100/10.3/1       usb5       bus         UHCI Host Controller
/0/100/10.4                    bus         USB 2.0
/0/100/10.4/1       usb1       bus         EHCI Host Controller
/0/100/11                      bridge      VT8237 ISA bridge [KT600/K8T800/K8T890 South]
/0/100/11.5                    multimedia  VT8233/A/8235/8237 AC97 Audio Controller
/0/100/12           eth1       network     VT6102 [Rhine-II]
/0/101                         bridge      PT880 Host Bridge
/0/102                         bridge      PT880 Host Bridge
/0/103                         bridge      PT880 Host Bridge
/0/104                         bridge      PT880 Host Bridge
/0/105                         bridge      PT880 Host Bridge
/1                  dummy0     network     Ethernet interface
Comment 2 Alan 2007-11-02 10:33:51 UTC
Odd - it does look like one of your ports isn't properly wired (e.g. a loose pin), but it's always hard to be sure it's not a weird software bug.

It could also be the cable, if you are moving drives between cables rather than cables between ports?
Comment 3 Bjoern Olausson 2007-11-02 12:34:48 UTC
I guess it's the port of the controller. If I keep the cable and switch the port, everything works; if I switch the cable and keep the port, the bug appears.

So now there are two options: either the port of the controller is damaged, or there is a bug in the driver.

Any ideas how I can eliminate one of those two options?

One more thing I noticed: a Hitachi Deskstar disk and a Maxtor disk are attached. If I run Drive Fitness Test with the two disks connected to the damaged port, DFT hangs and does not recover. If I remove only the Hitachi disk, DFT detects all other drives.

Is there some way to test the controller port? Any ideas that would help rule out or prove the hardware defect are welcome.

regards
Bjoern
Comment 4 Alan 2007-11-05 07:32:00 UTC
I've been having a look over this. There are a small number of things that ports 3/4 do differently from ports 1/2. I've audited those and double-checked against the reference information, which is unfortunately a bit limited. From that I've got a patch you can try, which may help, hinder, or do nothing, but is worth trying I think.

Will attach it in a moment
Comment 5 Alan 2007-11-05 07:34:17 UTC
Created attachment 13400 [details]
Proposed fixes for HPT37x problems
Comment 6 Bjoern Olausson 2007-11-05 14:27:28 UTC
Created attachment 13406 [details]
hpt_is_fixed

I attached the two disks back to the "faulty" port and they are working again.

Compare the patched kernel output below with the output in the first post.

Thanks for the fix!

Maybe this fix will find its way into 2.6.23.x?
Currently I can't run 2.6.24 because the layer7 patches do not work on it yet.

ata5.00: ATA-7: Maxtor 5A250J0, RAM51VV0, max UDMA/133
ata5.00: 490234752 sectors, multi 16: LBA48 
ata5.01: ATA-7: HDT722525DLAT80, V44OA96A, max UDMA/133
ata5.01: 488397168 sectors, multi 16: LBA48 
Find mode for 12 reports A81F442
Find mode for 12 reports A81F442
Find mode for DMA 69 reports 12848242
Find mode for DMA 69 reports 12848242
ata5.00: configured for UDMA/100
ata5.01: configured for UDMA/100


I attached the full dmesg output.

Thanks
Bjoern
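
The visible difference between the two logs is that the "limited to UDMA/33 due to 40-wire cable" lines are gone and both drives reach UDMA/100. The capping rule behind those messages can be sketched as follows. This is a simplified illustration, not the driver's code: the function name `effective_udma` and its parameters are hypothetical, and the real kernel logic works on transfer-mode masks rather than on MB/s figures.

```python
def effective_udma(drive_max, host_max, cable_wires):
    """Return the highest usable UDMA speed (in MB/s) for one device.

    drive_max   -- fastest UDMA speed the drive advertises (e.g. 133)
    host_max    -- fastest UDMA speed the controller port supports (e.g. 100)
    cable_wires -- 40 or 80, as detected for the port (assumed input)
    """
    limit = min(drive_max, host_max)
    if cable_wires == 40:
        # A 40-wire cable is only rated up to UDMA/33.
        limit = min(limit, 33)
    return limit


# Values taken from the logs in this report: drives report "max UDMA/133",
# the port reports "PATA max UDMA/100".
print(effective_udma(133, 100, 80))  # cable correctly seen as 80-wire
print(effective_udma(133, 100, 40))  # cable seen as 40-wire
```

Under this rule, a port that wrongly reports a 40-wire cable would cap an otherwise UDMA/100-capable setup at UDMA/33, which matches what the unpatched kernel printed for port 3.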
Comment 7 Alan 2007-11-05 15:01:13 UTC
Will push into 2.6.24. It's not my call whether it ends up in 2.6.23.x, but it may well do if it shows no other problems.
Comment 8 Bjoern Olausson 2007-11-05 15:48:56 UTC
I'll try to make the patch work against 2.6.23.1 to see whether it causes any trouble...

thanks

Bjoern
Comment 9 Alan 2007-11-08 06:34:51 UTC
Pushed upstream along with another cable fix Sergei noticed was needed
Comment 10 Bjoern Olausson 2007-11-08 15:49:59 UTC
Thanks a lot!
Comment 11 Bjoern Olausson 2007-12-12 06:39:25 UTC
Sorry to reopen this bug: the patch works on 2.6.23.1, but 2.6.24-rc5 does not work.

I tried the latest kernel, 2.6.24-rc5, and got the following misbehavior:

In short, the kernel no longer shows two of the 250GB Maxtor drives.

After rebooting into the patched 2.6.23.1 kernel, everything works again.

See attached dmesg output for 2.6.24-rc5 and 2.6.23.1 kernels

regards
Bjoern
Comment 12 Bjoern Olausson 2007-12-12 06:41:18 UTC
Created attachment 13999 [details]
patched 2.6.23.1 dmesg

patched 2.6.23.1 dmesg
Comment 13 Bjoern Olausson 2007-12-12 06:41:50 UTC
Created attachment 14000 [details]
stock 2.6.24-rc5 dmesg

stock 2.6.24-rc5 dmesg
Comment 14 Alan 2007-12-12 17:19:39 UTC
Any chance you can build -rc2 and let me know if that works (or ideally which -rc breaks it)?
Comment 15 Bjoern Olausson 2007-12-13 02:08:33 UTC
Sure I can. I'll start with -rc1 and work my way up.

Results should be here within a few hours ;-)

regards
Bjoern
Comment 16 Bjoern Olausson 2007-12-13 05:17:57 UTC
Created attachment 14006 [details]
dmesg rc1

The failure appears between rc1 and rc2, and every later rc fails as well.

So rc1 is the last working one.
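
The procedure used here, booting each release candidate in order and noting where detection first fails, amounts to a linear search for the first bad version. A minimal sketch follows; `first_bad` and the `is_good` callback are illustrative names, and the good/bad results are the ones reported in this thread.

```python
def first_bad(versions, is_good):
    """Walk versions in order; return (last_good, first_bad).

    first_bad is None if every version passes; last_good is None if
    the very first version already fails.
    """
    last_good = None
    for version in versions:
        if is_good(version):
            last_good = version
        else:
            return last_good, version
    return last_good, None


# Results reported in this thread: rc1 works, rc2 through rc5 fail.
reported = {"rc1": True, "rc2": False, "rc3": False, "rc4": False, "rc5": False}
print(first_bad(["rc1", "rc2", "rc3", "rc4", "rc5"], reported.get))
```

With only five candidates a linear walk is cheap. Narrowing the range further, down to a single commit between rc1 and rc2, is the kind of job `git bisect` is meant for.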
Comment 17 Bjoern Olausson 2007-12-13 05:18:13 UTC
Created attachment 14007 [details]
dmesg rc2
Comment 18 Bjoern Olausson 2007-12-13 05:18:26 UTC
Created attachment 14008 [details]
dmesg rc3
Comment 19 Bjoern Olausson 2007-12-13 05:18:39 UTC
Created attachment 14009 [details]
dmesg rc4
Comment 20 Alan 2007-12-17 06:53:53 UTC
Can you try reversing the following patch segment and let me know if this fixes it ?
Comment 21 Alan 2007-12-17 06:54:52 UTC
Created attachment 14077 [details]
pata_hpt37x: Patch segment to reverse (patch -R)
Comment 22 Alan 2007-12-17 06:55:40 UTC
Comment on attachment 14077 [details]
pata_hpt37x: Patch segment to reverse (patch -R)

Wrong diff sorry
Comment 23 Alan 2007-12-17 06:56:42 UTC
Created attachment 14078 [details]
Correct diff I hope

patch -R drivers/ata/pata_hpt37x < a1
Comment 24 Bjoern Olausson 2007-12-17 08:24:26 UTC
Created attachment 14082 [details]
demsg 2.6.24-rc5_patched

Your patch did the trick.

All drives were found.

Thanks

16:42:08 [/usr/src/linux-2.6.24-rc5]
root@enterprise $ patch -R drivers/ata/pata_hpt37x.c < ../a1.txt
patching file drivers/ata/pata_hpt37x.c
Hunk #1 succeeded at 359 (offset -2 lines).
Comment 25 Alan 2007-12-17 11:40:21 UTC
Can you try the following change *instead*

This I think fixes the bug. If it does then I'll push that to Linus for 2.6.24 final, if not I'll push the backout you've tested.

Thanks a lot for the debugging
Comment 26 Alan 2007-12-17 11:42:32 UTC
Created attachment 14088 [details]
Proposed fix
Comment 27 Bjoern Olausson 2007-12-17 16:15:53 UTC
Will check it in a day or two.

Thanks for the help
Comment 28 Bjoern Olausson 2007-12-18 13:20:08 UTC
Created attachment 14108 [details]
dmesg 2.6.24-rc5_proposed-fix

Fix works. Thanks a lot!

I didn't stress test. I just dropped to /bin/bb, dumped dmesg and rebooted into 2.6.23.1.

Have a nice Christmas ;)

regards
Bjoern
Comment 29 Adrian Bunk 2008-01-11 08:21:27 UTC
Patch was applied as commit f941b168a4d7281bf49e166f2febc49470c0149f in Linus' tree.
