Bug 5268

Summary: aic79xx scsi driver causing tape drive not to work
Product: SCSI Drivers Reporter: James D Freels (freelsjd)
Component: OtherAssignee: Hannes Reinecke (hare)
Status: RESOLVED PATCH_ALREADY_AVAILABLE    
Severity: blocking CC: akpm, hare
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.22.1 Subsystem:
Regression: --- Bisected commit-id:
Attachments: error messages for aic79xxx scsi error on tape drive for kernel 2.6.14

Description James D Freels 2005-09-16 08:14:04 UTC
Most recent kernel where this bug did not occur: 2.6.12.6
Distribution: Debian/Stable (Sarge)
Hardware Environment: Dual-processor Xeon, 3.05 GHz
Software Environment: AMANDA backup
Problem Description: interactive manipulation at the console using amanda 
commands on the tape drive (such as amlabel, amcheck) cause the scsi chain to 
hang.  Here is the /var/log/messages log entry at the time of hang:

Sep 16 10:48:13 fea3 kernel: st0: Error with sense data: <6>st0: Current: sense 
key: Aborted Command
Sep 16 10:48:13 fea3 kernel:     Additional sense: Scsi parity error
Sep 16 10:48:13 fea3 kernel: st0: Can't set default compression.
Sep 16 10:48:13 fea3 kernel: (scsi1:A:5): 40.000MB/s transfers (20.000MHz, 
16bit)
Sep 16 10:48:13 fea3 kernel: st0: Error with sense data: <6>st0: Current: sense 
key: Aborted Command
Sep 16 10:48:13 fea3 kernel:     Additional sense: Scsi parity error
Sep 16 10:52:15 fea3 syslogd 1.4.1#17: restart.
S

Steps to reproduce:

Two Adaptec 7902 scsi cards are in the system and should be able to run at 320 
MB/.  The first card is connected to hard drives running at 320 MB/s.  The 
second is connected to a single tape drive only (all by itself) and running at 
80 MB/s.  This setup has been running flawlessly for years under the 2.4.x 
kernels and now the 2.6.x kernels.  Only at the 2.6.13.0 and 2.6.13.1 kernel has 
this problem started.

I simply login and run the AMANDA command amlabel (or any command that operates 
on the tape drive) and the system will hang.  Only interactive operations cause 
this to happen.  Cron jobs running amanda applications do not cause this to 
happen.  This has happened on both the 2.6.13 and 2.6.13.1 kernels.  I have 
experimented with two different settings on the pre-emptive kernel settings and 
it made no difference.  I think the pre-emptive settings did change between 2.6.
12.x and 2.6.13.x.
Comment 1 Andrew Morton 2005-09-16 11:50:42 UTC
Perhaps the problem is related to your keyboard driver?

What happens if you run those commands across an ethernet
session?  telnet/ssh?

Try adding `usb-handoff' to the kernel boot command line.
Comment 2 James D Freels 2005-09-16 11:59:42 UTC
I have miss-informed you.  When I stated that I issued the amanda
command "from the console", my normal mode of console access to this
machine is remotely via ssh using an xterm.
So, the command was issued over an ethernet session and not actually
directly at the console local to the machine in both instances of this
problem.

I will try the "usb-handoff" kernel boot option.

>From /usr/src/linux-2.6.13.1/Documentation/kernel-parameter.txt

usb-handoff     [HW] Enable early USB BIOS -> OS handoff

Out of curiousity as I might learn something, what might this do to
prevent a scsi-type error ?

On Fri, 2005-09-16 at 11:50 -0700, bugme-daemon@kernel-bugs.osdl.org
wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=5268
> 
> akpm@osdl.org changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |akpm@osdl.org
> 
> 
> 
> ------- Additional Comments From akpm@osdl.org  2005-09-16 11:50 -------
> Perhaps the problem is related to your keyboard driver?
> 
> What happens if you run those commands across an ethernet
> session?  telnet/ssh?
> 
> Try adding `usb-handoff' to the kernel boot command line.
> 
> ------- You are receiving this mail because: -------
> You reported the bug, or are watching the reporter.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
  <META NAME="GENERATOR" CONTENT="GtkHTML/3.2.5">
</HEAD>
<BODY>
I have miss-informed you.&nbsp; When I stated that I issued the amanda command &quot;from the console&quot;, my normal mode of console access to this machine is remotely via ssh using an xterm.<BR>
So, the command was issued over an ethernet session and not actually directly at the console local to the machine in both instances of this problem.<BR>
<BR>
I will try the &quot;usb-handoff&quot; kernel boot option.<BR>
<BR>
>From /usr/src/linux-2.6.13.1/Documentation/kernel-parameter.txt<BR>
<BR>
usb-handoff&nbsp;&nbsp;&nbsp;&nbsp; [HW] Enable early USB BIOS -&gt; OS handoff<BR>
<BR>
Out of curiousity as I might learn something, what might this do to prevent a scsi-type error ?<BR>
<BR>
On Fri, 2005-09-16 at 11:50 -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
<BLOCKQUOTE TYPE=CITE>
<PRE>
<FONT COLOR="#000000"><A HREF="http://bugzilla.kernel.org/show_bug.cgi?id=5268">http://bugzilla.kernel.org/show_bug.cgi?id=5268</A></FONT>

<FONT COLOR="#000000"><A HREF="mailto:akpm@osdl.org">akpm@osdl.org</A> changed:</FONT>

<FONT COLOR="#000000">           What    |Removed                     |Added</FONT>
<FONT COLOR="#000000">----------------------------------------------------------------------------</FONT>
<FONT COLOR="#000000">                 CC|                            <A HREF="mailto:|akpm@osdl.org">|akpm@osdl.org</A></FONT>



<FONT COLOR="#000000">------- Additional Comments From <A HREF="mailto:akpm@osdl.org">akpm@osdl.org</A>  2005-09-16 11:50 -------</FONT>
<FONT COLOR="#000000">Perhaps the problem is related to your keyboard driver?</FONT>

<FONT COLOR="#000000">What happens if you run those commands across an ethernet</FONT>
<FONT COLOR="#000000">session?  telnet/ssh?</FONT>

<FONT COLOR="#000000">Try adding `usb-handoff' to the kernel boot command line.</FONT>

<FONT COLOR="#000000">------- You are receiving this mail because: -------</FONT>
<FONT COLOR="#000000">You reported the bug, or are watching the reporter.</FONT>
</PRE>
</BLOCKQUOTE>
<TABLE CELLSPACING="0" CELLPADDING="0" WIDTH="100%">
<TR>
<TD>
-- <BR>
Freels, James D. &lt;<A HREF="mailto:freelsjd@ornl.gov">freelsjd@ornl.gov</A>&gt;<BR>
Oak Ridge National Laboratory
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
Comment 3 Andrew Morton 2005-09-16 12:03:01 UTC
Please trim the emails when responding.

In that case, usb-handoff probably won't help.

Comment 4 Andrew Morton 2005-09-17 13:31:33 UTC
(Add reporter and bugzilla to cc)

James Bottomley <James.Bottomley@SteelEye.com> wrote:
>
> On Fri, 2005-09-16 at 12:03 -0700, Andrew Morton wrote:
> > This one's a bit strange.
> > 
> > It's a post-2.6.12 regression.
> 
> Yes, strange to me too.  The changes that went in to aic79xx between
> 2.6.12 and 2.6.13 were tiny:
> 
>  aic79xx_osm.c |   18 +++++++++---------
>  aic79xx_osm.h |   17 -----------------
>  aic79xx_pci.c |    2 +-
>  3 files changed, 10 insertions(+), 27 deletions(-)
> 
> I've also attached them below, but I think they were
> 
> 1) #if -> #ifdef changes
> 2) removal of the duplicated ENDIAN macros
> 3) ahd_midlayer_entrypoint_lock/unlock -> ahd_lock/unlock
> 
> I can't see how anything in these could produce the shown behaviour.
> 
> James
> 
> diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c
> --- a/drivers/scsi/aic7xxx/aic79xx_osm.c
> +++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
> @@ -1505,23 +1505,23 @@ ahd_linux_dev_reset(Scsi_Cmnd *cmd)
>  	memset(recovery_cmd, 0, sizeof(struct scsi_cmnd));
>  	recovery_cmd->device = cmd->device;
>  	recovery_cmd->scsi_done = ahd_linux_dev_reset_complete;
> -#if AHD_DEBUG
> +#ifdef AHD_DEBUG
>  	if ((ahd_debug & AHD_SHOW_RECOVERY) != 0)
>  		printf("%s:%d:%d:%d: Device reset called for cmd %p\n",
>  		       ahd_name(ahd), cmd->device->channel, cmd->device->id,
>  		       cmd->device->lun, cmd);
>  #endif
> -	ahd_midlayer_entrypoint_lock(ahd, &s);
> +	ahd_lock(ahd, &s);
>  
>  	dev = ahd_linux_get_device(ahd, cmd->device->channel, cmd->device->id,
>  				   cmd->device->lun, /*alloc*/FALSE);
>  	if (dev == NULL) {
> -		ahd_midlayer_entrypoint_unlock(ahd, &s);
> +		ahd_unlock(ahd, &s);
>  		kfree(recovery_cmd);
>  		return (FAILED);
>  	}
>  	if ((scb = ahd_get_scb(ahd, AHD_NEVER_COL_IDX)) == NULL) {
> -		ahd_midlayer_entrypoint_unlock(ahd, &s);
> +		ahd_unlock(ahd, &s);
>  		kfree(recovery_cmd);
>  		return (FAILED);
>  	}
> @@ -1553,7 +1553,7 @@ ahd_linux_dev_reset(Scsi_Cmnd *cmd)
>  	ahd_queue_scb(ahd, scb);
>  
>  	scb->platform_data->flags |= AHD_SCB_UP_EH_SEM;
> -	spin_unlock_irq(&ahd->platform_data->spin_lock);
> +	ahd_unlock(ahd, &s);
>  	init_timer(&timer);
>  	timer.data = (u_long)scb;
>  	timer.expires = jiffies + (5 * HZ);
> @@ -1567,10 +1567,10 @@ ahd_linux_dev_reset(Scsi_Cmnd *cmd)
>  		printf("Timer Expired\n");
>  		retval = FAILED;
>  	}
> -	spin_lock_irq(&ahd->platform_data->spin_lock);
> +	ahd_lock(ahd, &s);
>  	ahd_schedule_runq(ahd);
>  	ahd_linux_run_complete_queue(ahd);
> -	ahd_midlayer_entrypoint_unlock(ahd, &s);
> +	ahd_unlock(ahd, &s);
>  	printf("%s: Device reset returning 0x%x\n", ahd_name(ahd), retval);
>  	return (retval);
>  }
> @@ -1591,11 +1591,11 @@ ahd_linux_bus_reset(Scsi_Cmnd *cmd)
>  		printf("%s: Bus reset called for cmd %p\n",
>  		       ahd_name(ahd), cmd);
>  #endif
> -	ahd_midlayer_entrypoint_lock(ahd, &s);
> +	ahd_lock(ahd, &s);
>  	found = ahd_reset_channel(ahd, cmd->device->channel + 'A',
>  				  /*initiate reset*/TRUE);
>  	ahd_linux_run_complete_queue(ahd);
> -	ahd_midlayer_entrypoint_unlock(ahd, &s);
> +	ahd_unlock(ahd, &s);
>  
>  	if (bootverbose)
>  		printf("%s: SCSI bus reset delivered. "
> diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.h b/drivers/scsi/aic7xxx/aic79xx_osm.h
> --- a/drivers/scsi/aic7xxx/aic79xx_osm.h
> +++ b/drivers/scsi/aic7xxx/aic79xx_osm.h
> @@ -112,23 +112,6 @@ typedef Scsi_Cmnd      *ahd_io_ctx_t;
>  #define ahd_le32toh(x)	le32_to_cpu(x)
>  #define ahd_le64toh(x)	le64_to_cpu(x)
>  
> -#ifndef LITTLE_ENDIAN
> -#define LITTLE_ENDIAN 1234
> -#endif
> -
> -#ifndef BIG_ENDIAN
> -#define BIG_ENDIAN 4321
> -#endif
> -
> -#ifndef BYTE_ORDER
> -#if defined(__BIG_ENDIAN)
> -#define BYTE_ORDER BIG_ENDIAN
> -#endif
> -#if defined(__LITTLE_ENDIAN)
> -#define BYTE_ORDER LITTLE_ENDIAN
> -#endif
> -#endif /* BYTE_ORDER */
> -
>  /************************* Configuration Data *********************************/
>  extern uint32_t aic79xx_allow_memio;
>  extern int aic79xx_detect_complete;
> diff --git a/drivers/scsi/aic7xxx/aic79xx_pci.c b/drivers/scsi/aic7xxx/aic79xx_pci.c
> --- a/drivers/scsi/aic7xxx/aic79xx_pci.c
> +++ b/drivers/scsi/aic7xxx/aic79xx_pci.c
> @@ -582,7 +582,7 @@ ahd_check_extport(struct ahd_softc *ahd)
>  		}
>  	}
>  
> -#if AHD_DEBUG
> +#ifdef AHD_DEBUG
>  	if (have_seeprom != 0
>  	 && (ahd_debug & AHD_DUMP_SEEPROM) != 0) {
>  		uint16_t *sc_data;

Comment 5 James D Freels 2005-11-05 09:58:16 UTC
Created attachment 6480 [details]
error messages for aic79xxx scsi error on tape drive for kernel 2.6.14

error messages for aic79xxx scsi error on tape drive for kernel 2.6.14
Comment 6 James D Freels 2005-11-05 10:01:33 UTC
the update from 2.6.13.x to 2.6.14 caused much improvement in this driver.  It 
no longer hangs the system when doing a tape driver operation.  However, now the 
tape drive just doesn't work at all.  It is found by the kernel, but operations 
give I/O error messages due to parity errors.  I have attached the log file 
output during the time of tape-drive failure.  No other errors are persent.
Comment 7 James D Freels 2005-12-30 09:57:26 UTC
I had a similar bug on another machine.  This machine is a dual-processor amd64.
  I have an Adaptec 29160N scsi card and using the aic7xxx (new one) driver.  
It will not work at all in 64-bit mode, but in chroot 32-bit mode it works 
sometimes, but fails about 50% of the time in a similar manner when writing to 
the tape (using amdump which calls taper).  Today, I tried the aic7xxx_old 
driver, and it corrected the problem.  

I believe the aic79xx and aic7xxx drivers, which I understand are supported by 
Adaptec, are having problems with tape drives.  See http://www.linuxtapecert.org
/ for a separate verification of this issue.
Comment 8 James D Freels 2006-01-05 07:05:03 UTC
I continue to have this bug with the new 2.6.15 kernel.  aic79xx driver on 320 
MB/s hard drives + slower tape drive.
Comment 9 James D Freels 2006-01-06 20:09:40 UTC
I retested with the 2.6.15 kernel today, and this bug is very-much still 
present.  I know the tape drive hardware is functional because everything works 
as it should under the 2.6.12.6 kernel.

I snooped around the Adaptec web pages tonight and discovered that the latest 
driver version released by Adaptec for the 2.6 kernels is 2.0.15 released 9/25/
05.  The version in the 2.6.15 kernel is printed as 1.3.11 released 7/11/03.  
This version/date can be confirmed in the Adaptec changelog.  So, indeed, the 
source code/driver in the 2.6.15 kernel source tree is nearly 2 years behind 
the official Adaptec drivers.  So, if we have hardware that is younger than this
 (which I may have in this case), this could explain the problem !!

I also took a look at the latest binary drivers available from Adaptec (which 
of course are available from Red Hat Enterprise) and they are for kernel 2.6.9.
So, it is just inconsistent all the way around !  

I sure would like to get this working because this is the only machine I have 
out of 7 machines that cannot run the 2.6.15 kernel !

Comment 10 James D Freels 2006-01-11 10:28:49 UTC
I performed one more additional test yesterday.  This machine actually has two 
Adaptec adapters in it.  The first is built into the mother board and is also a 
39320a capable of 320 MB/s.  It is on board a Tyan Thunder K8SD Pro MB.  This 
adapter was originally disabled by the vendor who supplied it (Monarch 
Computers, Atlanta, GA) because they could not get it to test out without error.
  So, they disabled it and supplied a separate pci add-on Adaptec 39320a card.  
As supplied, this is what I originally had when this bug showed up.

So, I thought, why not configure the system with the tape drive connected to 
this entirely separate adapter and see if this helps ? So, I did, and it made 
no difference.  The kernel found both adapters and all devices, but this error 
still shows up.

Then after I wrote up this bug report, I thought I would reconfigure everything 
back to the original configuration and then disable the on-board device as 
before (thinking, perhaps this really is a bad adapter, and with a new kernel 
driver, perhaps things will now actually work).  So, now all the scsi devices 
are connected to the pci adapter.  Both the on-board and the pci adapter have 
two separate wide channels capable of 16 devices each (grand total of 64 
devices would be possible if all were connected and enabled).  On channel 1 of 
the pci adapter, I have 3 scsi hard drives.  On channel 2, I have a single tape 
drive (that works fine under 2.6.12.6 of the kernel).  Typical arguments you 
hear about scsi device problems are remedied by separating the faster hard 
drives from the slower devices like tape drives or CD drives.  So, this 
argument cannot be made here since all are separate (and working in 2.6.12.6 ! )

So, with all these attempts are reconfiguration, the bug is still present in 2.
6.15.  One improvement is that it does not hang the system.  It is just that 
the tape drive does not work.

If there is anything I can do to help debug this problem, please let me know.  
I have read over on the AMANDA-users mailing list that I am not the only user 
with problems getting their tape drives to work with the Adaptec drivers.  I 
have an LSI adapter that I may use next.  If this does not get fixed soon, I 
may never use Adaptec again.  This makes no sense why they (Adaptec) would have 
a driver out that fails.
Comment 11 James D Freels 2006-01-17 13:00:24 UTC
There were a number of changes in this kernel version for the aic79xx driver.  
So, I thought I would give it a try.  Indeed, it did behave differently this 
time...it hung the machine.  I am including here the relevant scsi debug 
messages printed to /var/log/kern.log at the time of the failure.  All I did 
was write a label to the scsi tape.  Again, the hard drive access is not a 
problem, and I have the scsi tape connected by itself to the second channel of 
a dual channel Adaptec 39320a scsi card.

Jan 17 15:52:17 fea3 kernel: st0: Block limits 1 - 16777215 bytes.
Jan 17 15:52:17 fea3 kernel: program stinit is using a deprecated SCSI ioctl, 
please convert it to SG_IO
Jan 17 15:52:17 fea3 kernel: st0: Error with sense data: <6>st: Current: sense 
key: Aborted Command
Jan 17 15:52:17 fea3 kernel:     Additional sense: Scsi parity error
Jan 17 15:52:19 fea3 kernel: program smartctl is using a deprecated SCSI ioctl, 
please convert it to SG_IO
Jan 17 15:52:20 fea3 last message repeated 17 times
Jan 17 15:52:20 fea3 kernel: program smartd is using a deprecated SCSI ioctl, 
please convert it to SG_IO
Jan 17 15:52:21 fea3 last message repeated 44 times
Jan 17 15:55:05 fea3 kernel: st0: Error with sense data: <6>st: Current: sense 
key: Aborted Command
Jan 17 15:55:05 fea3 kernel:     Additional sense: Scsi parity error
Jan 17 15:55:05 fea3 kernel: st0: Can't set default compression.
Jan 17 15:55:05 fea3 kernel: st0: Error with sense data: <6>st: Current: sense 
key: Aborted Command
Jan 17 15:55:05 fea3 kernel:     Additional sense: Scsi parity error
Jan 17 15:55:15 fea3 kernel: st 1:0:5:0: Attempting to queue an ABORT message:
CDB: 0x1e 0x0 0x0 0x0 0x0 0x0
Jan 17 15:55:15 fea3 kernel: scsi1: At time of recovery, card was not paused
Jan 17 15:55:15 fea3 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins 
<<<<<<<<<<<<<<<<<
Jan 17 15:55:15 fea3 kernel: scsi1: Dumping Card State at program address 0x6 
Mode 0x33
Jan 17 15:55:15 fea3 kernel: Card was paused
Jan 17 15:55:15 fea3 kernel: HS_MAILBOX[0x0] INTCTL[0x80]:(SWTMINTMASK) 
SEQINTSTAT[0x0]
Jan 17 15:55:15 fea3 kernel: SAVED_MODE[0x11] DFFSTAT[0x33]:(CURRFIFO_NONE|FIFO0
FREE|FIFO1FREE)
Jan 17 15:55:15 fea3 kernel: SCSISIGI[0x0]:(P_DATAOUT) SCSIPHASE[0x0] SCSIBUS[0x
0]
Jan 17 15:55:15 fea3 kernel: LASTPHASE[0x1]:(P_DATAOUT|P_BUSFREE) SCSISEQ0[0x0]
Jan 17 15:55:15 fea3 kernel: SCSISEQ1[0x12]:(ENAUTOATNP|ENRSELI) SEQCTL0[0x0]
Jan 17 15:55:15 fea3 kernel: SEQINTCTL[0x0] SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_
IDENTIFIED)
Jan 17 15:55:15 fea3 kernel: SEQ_FLAGS2[0x4]:(SELECTOUT_QFROZEN) SSTAT0[0x0]
Jan 17 15:55:15 fea3 kernel: SSTAT1[0x8]:(BUSFREE) SSTAT2[0x0] SSTAT3[0x0] 
PERRDIAG[0xc0]:(HIPERR|HIZERO)
Jan 17 15:55:15 fea3 kernel: SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO)
Jan 17 15:55:15 fea3 kernel: LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0
[0x0]
Jan 17 15:55:15 fea3 kernel: LQOSTAT1[0x0] LQOSTAT2[0x0]
Jan 17 15:55:15 fea3 kernel:
Jan 17 15:55:15 fea3 kernel: SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0xffff 
CURRSCB 0x3 NEXTSCB 0x0
Jan 17 15:55:15 fea3 kernel: qinstart = 83 qinfifonext = 83
Jan 17 15:55:15 fea3 kernel: QINFIFO:
Jan 17 15:55:15 fea3 kernel: WAITING_TID_QUEUES:
Jan 17 15:55:15 fea3 kernel: Pending list:
Jan 17 15:55:15 fea3 kernel:   3 FIFO_USE[0x0] SCB_CONTROL[0x44]:(DISCONNECTED|
DISCENB)
Jan 17 15:55:15 fea3 kernel: SCB_SCSIID[0x57]
Jan 17 15:55:15 fea3 kernel: Total 1
Jan 17 15:55:15 fea3 kernel: Kernel Free SCB list: 2 1 0
Jan 17 15:55:15 fea3 kernel: Sequencer Complete DMA-inprog list:
Jan 17 15:55:15 fea3 kernel: Sequencer Complete list:
Jan 17 15:55:15 fea3 kernel: Sequencer DMA-Up and Complete list:
Jan 17 15:55:15 fea3 kernel: Sequencer On QFreeze and Complete list:
Jan 17 15:55:15 fea3 kernel:
Jan 17 15:55:15 fea3 kernel: scsi1: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
Jan 17 15:55:15 fea3 kernel: SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT|
ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS)
Jan 17 15:55:15 fea3 kernel: SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]:(FIFOEMP
|HDONE|PRELOAD_AVAIL)
Jan 17 15:55:15 fea3 kernel: SG_CACHE_SHADOW[0x2]:(LAST_SEG) SG_STATE[0x0] 
DFFSXFRCTL[0x0]
Jan 17 15:55:15 fea3 kernel: SOFFCNT[0x0] MDFFSTAT[0x5]:(FIFOFREE|DLZERO) SHADDR
 = 0x00, SHCNT = 0x0
Jan 17 15:55:15 fea3 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]:(SG_CACHE_
AVAIL)
Jan 17 15:55:15 fea3 kernel: scsi1: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3
Jan 17 15:55:15 fea3 kernel: SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT|
ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS)
Jan 17 15:55:15 fea3 kernel: SEQINTSRC[0x0] DFCNTRL[0x4]:(DIRECTION) DFSTATUS[0x
89]:(FIFOEMP|HDONE|PRELOAD_AVAIL)
Jan 17 15:55:15 fea3 kernel: SG_CACHE_SHADOW[0x2]:(LAST_SEG) SG_STATE[0x0] 
DFFSXFRCTL[0x0]
Jan 17 15:55:15 fea3 kernel: SOFFCNT[0x0] MDFFSTAT[0x5]:(FIFOFREE|DLZERO) SHADDR
 = 0x00, SHCNT = 0x0
Jan 17 15:55:15 fea3 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]:(SG_CACHE_
AVAIL)
Jan 17 15:55:15 fea3 kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0
x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Jan 17 15:55:15 fea3 kernel: scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE =
 0x52
Jan 17 15:55:15 fea3 kernel: scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
Jan 17 15:55:15 fea3 kernel:
Jan 17 15:55:15 fea3 kernel: SIMODE0[0xc]:(ENOVERRUN|ENIOERR)
Jan 17 15:55:15 fea3 kernel: CCSCBCTL[0x4]:(CCSCBDIR)
Jan 17 15:55:15 fea3 kernel: scsi1: REG0 == 0x3, SINDEX = 0x1ba, DINDEX = 0x1bc
Jan 17 15:55:15 fea3 kernel: scsi1: SCBPTR == 0x3, SCB_NEXT == 0xffc0, SCB_NEXT2
 == 0xff50
Jan 17 15:55:15 fea3 kernel: CDB 1e 0 0 0 0 0
Jan 17 15:55:15 fea3 kernel: STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Jan 17 15:55:15 fea3 kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends 
>>>>>>>>>>>>>>>>>>
Jan 17 15:55:15 fea3 kernel: st 1:0:5:0: BDR message in message buffer
Jan 17 15:55:15 fea3 kernel: scsi1: Recovery code sleeping
Jan 17 15:55:20 fea3 kernel: scsi1: Recovery code awake
Jan 17 15:55:20 fea3 kernel: scsi1: Timer Expired (active 1)
Jan 17 15:55:20 fea3 kernel: aic79xx_abort returns 0x2003
Jan 17 15:55:20 fea3 kernel: st 1:0:5:0: Attempting to queue a TARGET RESET 
message:CDB: 0x1e 0x0 0x0 0x0 0x0 0x0
Jan 17 15:55:20 fea3 kernel: aic79xx_dev_reset returns 0x2003
Jan 17 15:55:20 fea3 kernel: Recovery SCB completes

Comment 12 Hannes Reinecke 2006-01-31 03:03:17 UTC
I've posted some additional fixes for aic79xx on linux-scsi.
Especially the last one (marked '[PATCH] aic79xx: Fix timer handling') should
help here. Could you try them out?
Comment 13 James D Freels 2006-03-24 18:58:38 UTC
I checked for this bug again with the release of 2.6.16 and it continues to be 
present.  I simply tried to write to the tape and it hung the system solid.  It 
required a complete reset in order to boot.  As long as I do not use the tape 
drive on this system, I can use the newer kernels.  So, until this bug is 
fixed, or I replace the scsi card with one of a different type, I am no longer 
using the tape drive on this system.  

Comment 14 Hannes Reinecke 2006-10-23 23:46:45 UTC
Looks like the tape drive is not getting configured correctly via DV. I've made
some patches for aic79xx (which went in for 2.6.17) which allowed the DV to be
overridden by the BIOS settings.
So please test with a recent kernel and set the tape speed to something lower;
IIRC 10MB/s or 20MB/s should be okay.
Comment 15 James D Freels 2006-10-24 04:15:17 UTC
At present, I am out of town on travel.  When I return, I will try this.
I have not been using this tape drive all this time due to the bug.  I
am currently running 2.6.18.1 on that machine, so it will not take long
to test.

let you know...

On Mon, 2006-10-23 at 23:59 -0700, bugme-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=5268
> 
> hare@suse.de changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |hare@suse.de
>               Owner|andmike@us.ibm.com          |hare@suse.de
>              Status|NEW                         |ASSIGNED
> 
> 
> 
> ------- Additional Comments From hare@suse.de  2006-10-23 23:46 -------
> Looks like the tape drive is not getting configured correctly via DV. I've made
> some patches for aic79xx (which went in for 2.6.17) which allowed the DV to be
> overridden by the BIOS settings.
> So please test with a recent kernel and set the tape speed to something lower;
> IIRC 10MB/s or 20MB/s should be okay.
> 
> ------- You are receiving this mail because: -------
> You reported the bug, or are watching the reporter.
Comment 16 James D Freels 2006-10-28 14:00:59 UTC
I have repeated the test, and continue to get the error.  The machine
does not hang, but the tape drive shows errors.

Here is the output to the console:

fea3::/home/amanda/: amlabel -f fea fea17
rewinding, reading label, reading label: Input/output error
rewinding, writing label fea17, checking label
amlabel: reading label: Input/output error

I can I enforce a 10 MB/s or 20 MB/s limit on the tape drive over what
the driver allows for ?

Here is the output from cat /proc/scsi/scsi for the tape drive separate
channel scsi1

Host: scsi1 Channel: 00 Id: 05 Lun: 00
  Vendor: SEAGATE  Model: DAT    9SP40-000 Rev: 9100
  Type:   Sequential-Access                ANSI SCSI revision: 03

It looks like it is trying for 80 MB/s on this card: (output from
cat /proc/scsi/aic79xx/1/ ):

Adaptec AIC79xx driver version: 3.0
Adaptec 39320A Ultra320 SCSI adapter
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs
Allocated SCBs: 4, SG List Length: 128

Serial EEPROM:
0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8
0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8
0x09f4 0x0142 0x2807 0x0010 0xffff 0xffff 0xffff 0xffff
0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0x0430 0xb3f3

Target 0 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 1 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 2 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 3 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 4 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 5 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
        Goal: 80.000MB/s transfers (40.000MHz, 16bit)
        Curr: 80.000MB/s transfers (40.000MHz, 16bit)
        Channel A Target 5 Lun 0 Settings
                Commands Queued 201
                Commands Active 0
                Command Openings 1
                Max Tagged Openings 0
                Device Queue Frozen Count 0
Target 6 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 7 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 8 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 9 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 10 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 11 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 12 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 13 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 14 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)
Target 15 Negotiation Settings
        User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS,
16bit)

On Tue, 2006-10-24 at 07:27 -0400, Freels, James D. wrote:
> At present, I am out of town on travel.  When I return, I will try this.
> I have not been using this tape drive all this time due to the bug.  I
> am currently running 2.6.18.1 on that machine, so it will not take long
> to test.
> 
> let you know...
> 
> On Mon, 2006-10-23 at 23:59 -0700, bugme-daemon@bugzilla.kernel.org
> wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=5268
> > 
> > hare@suse.de changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |hare@suse.de
> >               Owner|andmike@us.ibm.com          |hare@suse.de
> >              Status|NEW                         |ASSIGNED
> > 
> > 
> > 
> > ------- Additional Comments From hare@suse.de  2006-10-23 23:46 -------
> > Looks like the tape drive is not getting configured correctly via DV. I've made
> > some patches for aic79xx (which went in for 2.6.17) which allowed the DV to be
> > overridden by the BIOS settings.
> > So please test with a recent kernel and set the tape speed to something lower;
> > IIRC 10MB/s or 20MB/s should be okay.
> > 
> > ------- You are receiving this mail because: -------
> > You reported the bug, or are watching the reporter.
Comment 17 James D Freels 2006-12-01 19:53:14 UTC
this bug is still present in the exact same way on both the 2.6.18.4 and 2.6.19 
kernels
Comment 18 Dylan Martin 2007-01-24 15:39:40 UTC
I have a similer problem, but not identical.  I will file a bug when I've done
enough research.  In the mean time, my notes might help if they are related:

http://www.seattlecentral.edu/cgi-bin/cgiwrap/dmartin/moin.cgi/Kernel
Comment 19 James D Freels 2007-07-12 07:19:08 UTC
This bug continues with the 2.6.22.1 kernel
Comment 20 James D Freels 2007-07-30 07:59:19 UTC
the patch for bug #8366 also fixed this bug