Bug 6657 - Domain validation failures on aic7xxx
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 high
Assignee: scsi_drivers-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-06-06 08:49 UTC by William Brodie-Tyrrell
Modified: 2007-02-13 23:27 UTC

See Also:
Kernel Version: 2.6.16
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg output (first part missing, sorry) (15.03 KB, text/plain)
2006-06-06 08:52 UTC, William Brodie-Tyrrell
/proc/scsi/aic7xxx/0 (2.99 KB, text/plain)
2006-06-06 08:54 UTC, William Brodie-Tyrrell
dmesg output from working 2.6.10 kernel. (15.07 KB, text/plain)
2006-06-17 06:29 UTC, William Brodie-Tyrrell
patch from James Bottomley for debug purposes (459 bytes, text/plain)
2006-06-17 06:31 UTC, William Brodie-Tyrrell
dmesg from 2.6.16 after applying James' patch (attached above) (15.04 KB, text/plain)
2006-06-17 06:32 UTC, William Brodie-Tyrrell

Description William Brodie-Tyrrell 2006-06-06 08:49:58 UTC
Most recent kernel where this bug did not occur: 2.6.10
Distribution: Gentoo
Hardware Environment: x86, AHA19160, WD18310
Software Environment:
Problem Description: 

In 2.6.14 and 2.6.16 I get domain validation errors on my WD18310 connected to a
19160, causing it to drop back to asynchronous transfers rather than 40 or 80 MHz wide.

Domain validation reports parity errors, write buffer failures, performs resets
and generally stuffs around for a few minutes before deciding it will allow async.

I have 2 other discs (Seagate SX118202LS) attached to the same chain which still
work.  My scsi system works perfectly in 2.6.10 with no data corruption.

I have a very similar issue with a dual SYM53C896 in a different machine: it
works in 2.6.10 but produces occasional noise in dmesg:
sym0:9:0:phase change 6-7 11@17cd5f84 resid=6.
On newer (>= 2.6.14) kernels, it completely fails to boot, giving the same sort
of parity errors I'm having with the aic7xxx driver.  When I checked at the time
(a while ago now), there was NO change in the relevant driver between kernel
versions.  The only changes are to the scsi architecture.

This leads me to believe the bug is not in the aic7xxx driver but the new scsi
domain validation code that was overhauled somewhere around 2.6.13-14.

Some of my error messages look a lot like those in #5268.

Steps to reproduce: Use kernel >=2.6.14 with aic7892 or 53C896.
Comment 1 William Brodie-Tyrrell 2006-06-06 08:52:46 UTC
Created attachment 8267
dmesg output (first part missing, sorry)

dmesg has truncated the early messages, sorry. You can see the tail end of the
final reset phase, which is about 10% of what it had to say for itself.
Comment 2 William Brodie-Tyrrell 2006-06-06 08:54:06 UTC
Created attachment 8268
/proc/scsi/aic7xxx/0
Comment 3 William Brodie-Tyrrell 2006-06-17 06:29:47 UTC
Created attachment 8325
dmesg output from working 2.6.10 kernel.
Comment 4 William Brodie-Tyrrell 2006-06-17 06:31:38 UTC
Created attachment 8326
patch from James Bottomley for debug purposes
Comment 5 William Brodie-Tyrrell 2006-06-17 06:32:30 UTC
Created attachment 8327
dmesg from 2.6.16 after applying James' patch (attached above)

Note the parity errors on data-in, even after skipping the write tests.
Comment 6 Adrian Bunk 2006-12-02 01:55:25 UTC
Is this issue still present in kernel 2.6.19?
Comment 7 Adrian Bunk 2007-02-13 23:27:40 UTC
Please reopen this bug if it's still present with kernel 2.6.20.
