Bug 2427

Summary: aic79xx occasionally loses track of SCBs
Product: SCSI Drivers Reporter: Herbert Xu (herbert)
Component: Other Assignee: Mike Anderson (andmike)
Status: REJECTED INSUFFICIENT_DATA    
Severity: normal CC: bunk, d.cohrs, hare
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.4 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Kernel log 1
Kernel log 2

Description Herbert Xu 2004-04-04 03:03:59 UTC
Distribution: Debian
Hardware Environment: i386
Problem Description:

The following is an extract of the report archived at
http://bugs.debian.org/241963.  It was submitted by David Cohrs.

Over the past few weeks, I've had problems with both an Adaptec AIC-29320
and an AIC-29320A card.  On the same system, an AIC-29160 card has no
problems.  Since I've had similar problems with two different cards
using the aic79xx driver, I think the problem is in the driver, not the
hardware.  I should note that I've experienced these problems not only
with kernel-image-2.6.4-1-686, but also with the 2.6.3 image and the
2.4.24 image.

I had the card plugged into PCI-32 slots on the Intel D865GBF motherboard.
I had 4 Maxtor ATLASU320_73_WLS Rev: B430 drives connected to the internal
LVD connector using an U320-rated cable (I tried replacing the cable and
terminator just to be sure; that had no effect either).  I also had an old
HP C5683A tape drive connected to the internal SE connector part of the
time.  My /etc/modules-2.6 contained:

sd_mod
ide-cd
ide-detect
aic79xx
uhci-hcd
md
intel-agp
pcspkr

I have RAID-5 configured on the 4 SCSI drives using the md driver.

Here's what the kernel reported for this card when it booted this morning,
as per kern.log:

<see attachment>

Occasionally, the driver seems to lose track of command completions.
Usually, after some retries, it recovers from this situation, although
afterward, as shown below, it often reduces the transfer speed of one or
more of the drives (I also noticed that, according to the driver output,
the "RTI" flag was no longer set after recovery completed).  In one case a
week ago, however, two or more commands queued for different drives timed
out, causing the md driver to mark the array as failed.  The array was, in
fact, OK: after rebooting, forcing the array to re-assemble, and running
fsck, I inspected the parts of the filesystem that were being accessed at
the time, and everything was fine.  I've never had problems accessing the
tape drive.

I'm including the driver output captured in kern.log from a failure this
afternoon.  The system operated fine for about 3 hours before this failure.
After it, several similar failures occurred at irregular intervals
(e.g. 8 minutes later, then again about 8 minutes after that).  The drive
involved varied.  At the time, I was putting load on the system by
tar'ing the contents of the filesystem on the RAID array to the tape
drive, although I've also had problems when not using the tape drive at
all (once, just running "ls" on a directory triggered the problem).  I'll
try to provide additional information if requested; however, I've stopped
using the AIC-29320A card (i.e. the drives have been moved to the
AIC-29160 card and the AIC-29320A has been removed from the system) and no
longer have access to the AIC-29320 card.  If I find someone I can borrow
a couple of SCSI drives from, I'll be able to test the AIC-29320A again.

<see attachment>
Comment 1 Herbert Xu 2004-04-04 03:04:43 UTC
Created attachment 2490
Kernel log 1
Comment 2 Herbert Xu 2004-04-04 03:05:08 UTC
Created attachment 2491
Kernel log 2
Comment 3 Adrian Bunk 2007-01-28 09:05:28 UTC
Is this issue still present with recent kernels?
Comment 4 Adrian Bunk 2007-03-06 15:12:49 UTC
Please reopen this bug if it's still present with kernel 2.6.20.