Most recent kernel where this bug did not occur: 2.6.12.6 Distribution: Debian/Stable (Sarge) Hardware Environment: Dual-processor Xeon, 3.05 GHz Software Environment: AMANDA backup Problem Description: interactive manipulation at the console using amanda commands on the tape drive (such as amlabel, amcheck) cause the scsi chain to hang. Here is the /var/log/messages log entry at the time of hang: Sep 16 10:48:13 fea3 kernel: st0: Error with sense data: <6>st0: Current: sense key: Aborted Command Sep 16 10:48:13 fea3 kernel: Additional sense: Scsi parity error Sep 16 10:48:13 fea3 kernel: st0: Can't set default compression. Sep 16 10:48:13 fea3 kernel: (scsi1:A:5): 40.000MB/s transfers (20.000MHz, 16bit) Sep 16 10:48:13 fea3 kernel: st0: Error with sense data: <6>st0: Current: sense key: Aborted Command Sep 16 10:48:13 fea3 kernel: Additional sense: Scsi parity error Sep 16 10:52:15 fea3 syslogd 1.4.1#17: restart. S Steps to reproduce: Two Adaptec 7902 scsi cards are in the system and should be able to run at 320 MB/. The first card is connected to hard drives running at 320 MB/s. The second is connected to a single tape drive only (all by itself) and running at 80 MB/s. This setup has been running flawlessly for years under the 2.4.x kernels and now the 2.6.x kernels. Only at the 2.6.13.0 and 2.6.13.1 kernel has this problem started. I simply login and run the AMANDA command amlabel (or any command that operates on the tape drive) and the system will hang. Only interactive operations cause this to happen. Cron jobs running amanda applications do not cause this to happen. This has happened on both the 2.6.13 and 2.6.13.1 kernels. I have experimented with two different settings on the pre-emptive kernel settings and it made no difference. I think the pre-emptive settings did change between 2.6. 12.x and 2.6.13.x.
Perhaps the problem is related to your keyboard driver? What happens if you run those commands across an ethernet session? telnet/ssh? Try adding `usb-handoff' to the kernel boot command line.
I have miss-informed you. When I stated that I issued the amanda command "from the console", my normal mode of console access to this machine is remotely via ssh using an xterm. So, the command was issued over an ethernet session and not actually directly at the console local to the machine in both instances of this problem. I will try the "usb-handoff" kernel boot option. >From /usr/src/linux-2.6.13.1/Documentation/kernel-parameter.txt usb-handoff [HW] Enable early USB BIOS -> OS handoff Out of curiousity as I might learn something, what might this do to prevent a scsi-type error ? On Fri, 2005-09-16 at 11:50 -0700, bugme-daemon@kernel-bugs.osdl.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=5268 > > akpm@osdl.org changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |akpm@osdl.org > > > > ------- Additional Comments From akpm@osdl.org 2005-09-16 11:50 ------- > Perhaps the problem is related to your keyboard driver? > > What happens if you run those commands across an ethernet > session? telnet/ssh? > > Try adding `usb-handoff' to the kernel boot command line. > > ------- You are receiving this mail because: ------- > You reported the bug, or are watching the reporter. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN"> <HTML> <HEAD> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8"> <META NAME="GENERATOR" CONTENT="GtkHTML/3.2.5"> </HEAD> <BODY> I have miss-informed you. When I stated that I issued the amanda command "from the console", my normal mode of console access to this machine is remotely via ssh using an xterm.<BR> So, the command was issued over an ethernet session and not actually directly at the console local to the machine in both instances of this problem.<BR> <BR> I will try the "usb-handoff" kernel boot option.<BR> <BR> >From /usr/src/linux-2.6.13.1/Documentation/kernel-parameter.txt<BR> <BR> usb-handoff [HW] Enable early USB BIOS -> OS handoff<BR> <BR> Out of curiousity as I might learn something, what might this do to prevent a scsi-type error ?<BR> <BR> On Fri, 2005-09-16 at 11:50 -0700, bugme-daemon@kernel-bugs.osdl.org wrote: <BLOCKQUOTE TYPE=CITE> <PRE> <FONT COLOR="#000000"><A HREF="http://bugzilla.kernel.org/show_bug.cgi?id=5268">http://bugzilla.kernel.org/show_bug.cgi?id=5268</A></FONT> <FONT COLOR="#000000"><A HREF="mailto:akpm@osdl.org">akpm@osdl.org</A> changed:</FONT> <FONT COLOR="#000000"> What |Removed |Added</FONT> <FONT COLOR="#000000">----------------------------------------------------------------------------</FONT> <FONT COLOR="#000000"> CC| <A HREF="mailto:|akpm@osdl.org">|akpm@osdl.org</A></FONT> <FONT COLOR="#000000">------- Additional Comments From <A HREF="mailto:akpm@osdl.org">akpm@osdl.org</A> 2005-09-16 11:50 -------</FONT> <FONT COLOR="#000000">Perhaps the problem is related to your keyboard driver?</FONT> <FONT COLOR="#000000">What happens if you run those commands across an ethernet</FONT> <FONT COLOR="#000000">session? telnet/ssh?</FONT> <FONT COLOR="#000000">Try adding `usb-handoff' to the kernel boot command line.</FONT> <FONT COLOR="#000000">------- You are receiving this mail because: -------</FONT> <FONT COLOR="#000000">You reported the bug, or are watching the reporter.</FONT> </PRE> </BLOCKQUOTE> <TABLE CELLSPACING="0" CELLPADDING="0" WIDTH="100%"> <TR> <TD> -- <BR> Freels, James D. <<A HREF="mailto:freelsjd@ornl.gov">freelsjd@ornl.gov</A>><BR> Oak Ridge National Laboratory </TD> </TR> </TABLE> </BODY> </HTML>
Please trim the emails when responding. In that case, usb-handoff probably won't help.
(Add reporter and bugzilla to cc) James Bottomley <James.Bottomley@SteelEye.com> wrote: > > On Fri, 2005-09-16 at 12:03 -0700, Andrew Morton wrote: > > This one's a bit strange. > > > > It's a post-2.6.12 regression. > > Yes, strange to me too. The changes that went in to aic79xx between > 2.6.12 and 2.6.13 were tiny: > > aic79xx_osm.c | 18 +++++++++--------- > aic79xx_osm.h | 17 ----------------- > aic79xx_pci.c | 2 +- > 3 files changed, 10 insertions(+), 27 deletions(-) > > I've also attached them below, but I think they were > > 1) #if -> #ifdef changes > 2) removal of the duplicated ENDIAN macros > 3) ahd_midlayer_entrypoint_lock/unlock -> ahd_lock/unlock > > I can't see how anything in these could produce the shown behaviour. > > James > > diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c > --- a/drivers/scsi/aic7xxx/aic79xx_osm.c > +++ b/drivers/scsi/aic7xxx/aic79xx_osm.c > @@ -1505,23 +1505,23 @@ ahd_linux_dev_reset(Scsi_Cmnd *cmd) > memset(recovery_cmd, 0, sizeof(struct scsi_cmnd)); > recovery_cmd->device = cmd->device; > recovery_cmd->scsi_done = ahd_linux_dev_reset_complete; > -#if AHD_DEBUG > +#ifdef AHD_DEBUG > if ((ahd_debug & AHD_SHOW_RECOVERY) != 0) > printf("%s:%d:%d:%d: Device reset called for cmd %p\n", > ahd_name(ahd), cmd->device->channel, cmd->device->id, > cmd->device->lun, cmd); > #endif > - ahd_midlayer_entrypoint_lock(ahd, &s); > + ahd_lock(ahd, &s); > > dev = ahd_linux_get_device(ahd, cmd->device->channel, cmd->device->id, > cmd->device->lun, /*alloc*/FALSE); > if (dev == NULL) { > - ahd_midlayer_entrypoint_unlock(ahd, &s); > + ahd_unlock(ahd, &s); > kfree(recovery_cmd); > return (FAILED); > } > if ((scb = ahd_get_scb(ahd, AHD_NEVER_COL_IDX)) == NULL) { > - ahd_midlayer_entrypoint_unlock(ahd, &s); > + ahd_unlock(ahd, &s); > kfree(recovery_cmd); > return (FAILED); > } > @@ -1553,7 +1553,7 @@ ahd_linux_dev_reset(Scsi_Cmnd *cmd) > ahd_queue_scb(ahd, scb); > > scb->platform_data->flags |= AHD_SCB_UP_EH_SEM; > - spin_unlock_irq(&ahd->platform_data->spin_lock); > + ahd_unlock(ahd, &s); > init_timer(&timer); > timer.data = (u_long)scb; > timer.expires = jiffies + (5 * HZ); > @@ -1567,10 +1567,10 @@ ahd_linux_dev_reset(Scsi_Cmnd *cmd) > printf("Timer Expired\n"); > retval = FAILED; > } > - spin_lock_irq(&ahd->platform_data->spin_lock); > + ahd_lock(ahd, &s); > ahd_schedule_runq(ahd); > ahd_linux_run_complete_queue(ahd); > - ahd_midlayer_entrypoint_unlock(ahd, &s); > + ahd_unlock(ahd, &s); > printf("%s: Device reset returning 0x%x\n", ahd_name(ahd), retval); > return (retval); > } > @@ -1591,11 +1591,11 @@ ahd_linux_bus_reset(Scsi_Cmnd *cmd) > printf("%s: Bus reset called for cmd %p\n", > ahd_name(ahd), cmd); > #endif > - ahd_midlayer_entrypoint_lock(ahd, &s); > + ahd_lock(ahd, &s); > found = ahd_reset_channel(ahd, cmd->device->channel + 'A', > /*initiate reset*/TRUE); > ahd_linux_run_complete_queue(ahd); > - ahd_midlayer_entrypoint_unlock(ahd, &s); > + ahd_unlock(ahd, &s); > > if (bootverbose) > printf("%s: SCSI bus reset delivered. " > diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.h b/drivers/scsi/aic7xxx/aic79xx_osm.h > --- a/drivers/scsi/aic7xxx/aic79xx_osm.h > +++ b/drivers/scsi/aic7xxx/aic79xx_osm.h > @@ -112,23 +112,6 @@ typedef Scsi_Cmnd *ahd_io_ctx_t; > #define ahd_le32toh(x) le32_to_cpu(x) > #define ahd_le64toh(x) le64_to_cpu(x) > > -#ifndef LITTLE_ENDIAN > -#define LITTLE_ENDIAN 1234 > -#endif > - > -#ifndef BIG_ENDIAN > -#define BIG_ENDIAN 4321 > -#endif > - > -#ifndef BYTE_ORDER > -#if defined(__BIG_ENDIAN) > -#define BYTE_ORDER BIG_ENDIAN > -#endif > -#if defined(__LITTLE_ENDIAN) > -#define BYTE_ORDER LITTLE_ENDIAN > -#endif > -#endif /* BYTE_ORDER */ > - > /************************* Configuration Data *********************************/ > extern uint32_t aic79xx_allow_memio; > extern int aic79xx_detect_complete; > diff --git a/drivers/scsi/aic7xxx/aic79xx_pci.c b/drivers/scsi/aic7xxx/aic79xx_pci.c > --- a/drivers/scsi/aic7xxx/aic79xx_pci.c > +++ b/drivers/scsi/aic7xxx/aic79xx_pci.c > @@ -582,7 +582,7 @@ ahd_check_extport(struct ahd_softc *ahd) > } > } > > -#if AHD_DEBUG > +#ifdef AHD_DEBUG > if (have_seeprom != 0 > && (ahd_debug & AHD_DUMP_SEEPROM) != 0) { > uint16_t *sc_data;
Created attachment 6480 [details] error messages for aic79xxx scsi error on tape drive for kernel 2.6.14 error messages for aic79xxx scsi error on tape drive for kernel 2.6.14
the update from 2.6.13.x to 2.6.14 caused much improvement in this driver. It no longer hangs the system when doing a tape driver operation. However, now the tape drive just doesn't work at all. It is found by the kernel, but operations give I/O error messages due to parity errors. I have attached the log file output during the time of tape-drive failure. No other errors are persent.
I had a similar bug on another machine. This machine is a dual-processor amd64. I have an Adaptec 29160N scsi card and using the aic7xxx (new one) driver. It will not work at all in 64-bit mode, but in chroot 32-bit mode it works sometimes, but fails about 50% of the time in a similar manner when writing to the tape (using amdump which calls taper). Today, I tried the aic7xxx_old driver, and it corrected the problem. I believe the aic79xx and aic7xxx drivers, which I understand are supported by Adaptec, are having problems with tape drives. See http://www.linuxtapecert.org / for a separate verification of this issue.
I continue to have this bug with the new 2.6.15 kernel. aic79xx driver on 320 MB/s hard drives + slower tape drive.
I retested with the 2.6.15 kernel today, and this bug is very-much still present. I know the tape drive hardware is functional because everything works as it should under the 2.6.12.6 kernel. I snooped around the Adaptec web pages tonight and discovered that the latest driver version released by Adaptec for the 2.6 kernels is 2.0.15 released 9/25/ 05. The version in the 2.6.15 kernel is printed as 1.3.11 released 7/11/03. This version/date can be confirmed in the Adaptec changelog. So, indeed, the source code/driver in the 2.6.15 kernel source tree is nearly 2 years behind the official Adaptec drivers. So, if we have hardware that is younger than this (which I may have in this case), this could explain the problem !! I also took a look at the latest binary drivers available from Adaptec (which of course are available from Red Hat Enterprise) and they are for kernel 2.6.9. So, it is just inconsistent all the way around ! I sure would like to get this working because this is the only machine I have out of 7 machines that cannot run the 2.6.15 kernel !
I performed one more additional test yesterday. This machine actually has two Adaptec adapters in it. The first is built into the mother board and is also a 39320a capable of 320 MB/s. It is on board a Tyan Thunder K8SD Pro MB. This adapter was originally disabled by the vendor who supplied it (Monarch Computers, Atlanta, GA) because they could not get it to test out without error. So, they disabled it and supplied a separate pci add-on Adaptec 39320a card. As supplied, this is what I originally had when this bug showed up. So, I thought, why not configure the system with the tape drive connected to this entirely separate adapter and see if this helps ? So, I did, and it made no difference. The kernel found both adapters and all devices, but this error still shows up. Then after I wrote up this bug report, I thought I would reconfigure everything back to the original configuration and then disable the on-board device as before (thinking, perhaps this really is a bad adapter, and with a new kernel driver, perhaps things will now actually work). So, now all the scsi devices are connected to the pci adapter. Both the on-board and the pci adapter have two separate wide channels capable of 16 devices each (grand total of 64 devices would be possible if all were connected and enabled). On channel 1 of the pci adapter, I have 3 scsi hard drives. On channel 2, I have a single tape drive (that works fine under 2.6.12.6 of the kernel). Typical arguments you hear about scsi device problems are remedied by separating the faster hard drives from the slower devices like tape drives or CD drives. So, this argument cannot be made here since all are separate (and working in 2.6.12.6 ! ) So, with all these attempts are reconfiguration, the bug is still present in 2. 6.15. One improvement is that it does not hang the system. It is just that the tape drive does not work. If there is anything I can do to help debug this problem, please let me know. I have read over on the AMANDA-users mailing list that I am not the only user with problems getting their tape drives to work with the Adaptec drivers. I have an LSI adapter that I may use next. If this does not get fixed soon, I may never use Adaptec again. This makes no sense why they (Adaptec) would have a driver out that fails.
There were a number of changes in this kernel version for the aic79xx driver. So, I thought I would give it a try. Indeed, it did behave differently this time...it hung the machine. I am including here the relevant scsi debug messages printed to /var/log/kern.log at the time of the failure. All I did was write a label to the scsi tape. Again, the hard drive access is not a problem, and I have the scsi tape connected by itself to the second channel of a dual channel Adaptec 39320a scsi card. Jan 17 15:52:17 fea3 kernel: st0: Block limits 1 - 16777215 bytes. Jan 17 15:52:17 fea3 kernel: program stinit is using a deprecated SCSI ioctl, please convert it to SG_IO Jan 17 15:52:17 fea3 kernel: st0: Error with sense data: <6>st: Current: sense key: Aborted Command Jan 17 15:52:17 fea3 kernel: Additional sense: Scsi parity error Jan 17 15:52:19 fea3 kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO Jan 17 15:52:20 fea3 last message repeated 17 times Jan 17 15:52:20 fea3 kernel: program smartd is using a deprecated SCSI ioctl, please convert it to SG_IO Jan 17 15:52:21 fea3 last message repeated 44 times Jan 17 15:55:05 fea3 kernel: st0: Error with sense data: <6>st: Current: sense key: Aborted Command Jan 17 15:55:05 fea3 kernel: Additional sense: Scsi parity error Jan 17 15:55:05 fea3 kernel: st0: Can't set default compression. Jan 17 15:55:05 fea3 kernel: st0: Error with sense data: <6>st: Current: sense key: Aborted Command Jan 17 15:55:05 fea3 kernel: Additional sense: Scsi parity error Jan 17 15:55:15 fea3 kernel: st 1:0:5:0: Attempting to queue an ABORT message: CDB: 0x1e 0x0 0x0 0x0 0x0 0x0 Jan 17 15:55:15 fea3 kernel: scsi1: At time of recovery, card was not paused Jan 17 15:55:15 fea3 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< Jan 17 15:55:15 fea3 kernel: scsi1: Dumping Card State at program address 0x6 Mode 0x33 Jan 17 15:55:15 fea3 kernel: Card was paused Jan 17 15:55:15 fea3 kernel: HS_MAILBOX[0x0] INTCTL[0x80]:(SWTMINTMASK) SEQINTSTAT[0x0] Jan 17 15:55:15 fea3 kernel: SAVED_MODE[0x11] DFFSTAT[0x33]:(CURRFIFO_NONE|FIFO0 FREE|FIFO1FREE) Jan 17 15:55:15 fea3 kernel: SCSISIGI[0x0]:(P_DATAOUT) SCSIPHASE[0x0] SCSIBUS[0x 0] Jan 17 15:55:15 fea3 kernel: LASTPHASE[0x1]:(P_DATAOUT|P_BUSFREE) SCSISEQ0[0x0] Jan 17 15:55:15 fea3 kernel: SCSISEQ1[0x12]:(ENAUTOATNP|ENRSELI) SEQCTL0[0x0] Jan 17 15:55:15 fea3 kernel: SEQINTCTL[0x0] SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_ IDENTIFIED) Jan 17 15:55:15 fea3 kernel: SEQ_FLAGS2[0x4]:(SELECTOUT_QFROZEN) SSTAT0[0x0] Jan 17 15:55:15 fea3 kernel: SSTAT1[0x8]:(BUSFREE) SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0xc0]:(HIPERR|HIZERO) Jan 17 15:55:15 fea3 kernel: SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO) Jan 17 15:55:15 fea3 kernel: LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0 [0x0] Jan 17 15:55:15 fea3 kernel: LQOSTAT1[0x0] LQOSTAT2[0x0] Jan 17 15:55:15 fea3 kernel: Jan 17 15:55:15 fea3 kernel: SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0xffff CURRSCB 0x3 NEXTSCB 0x0 Jan 17 15:55:15 fea3 kernel: qinstart = 83 qinfifonext = 83 Jan 17 15:55:15 fea3 kernel: QINFIFO: Jan 17 15:55:15 fea3 kernel: WAITING_TID_QUEUES: Jan 17 15:55:15 fea3 kernel: Pending list: Jan 17 15:55:15 fea3 kernel: 3 FIFO_USE[0x0] SCB_CONTROL[0x44]:(DISCONNECTED| DISCENB) Jan 17 15:55:15 fea3 kernel: SCB_SCSIID[0x57] Jan 17 15:55:15 fea3 kernel: Total 1 Jan 17 15:55:15 fea3 kernel: Kernel Free SCB list: 2 1 0 Jan 17 15:55:15 fea3 kernel: Sequencer Complete DMA-inprog list: Jan 17 15:55:15 fea3 kernel: Sequencer Complete list: Jan 17 15:55:15 fea3 kernel: Sequencer DMA-Up and Complete list: Jan 17 15:55:15 fea3 kernel: Sequencer On QFreeze and Complete list: Jan 17 15:55:15 fea3 kernel: Jan 17 15:55:15 fea3 kernel: scsi1: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0 Jan 17 15:55:15 fea3 kernel: SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT| ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS) Jan 17 15:55:15 fea3 kernel: SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]:(FIFOEMP |HDONE|PRELOAD_AVAIL) Jan 17 15:55:15 fea3 kernel: SG_CACHE_SHADOW[0x2]:(LAST_SEG) SG_STATE[0x0] DFFSXFRCTL[0x0] Jan 17 15:55:15 fea3 kernel: SOFFCNT[0x0] MDFFSTAT[0x5]:(FIFOFREE|DLZERO) SHADDR = 0x00, SHCNT = 0x0 Jan 17 15:55:15 fea3 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]:(SG_CACHE_ AVAIL) Jan 17 15:55:15 fea3 kernel: scsi1: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3 Jan 17 15:55:15 fea3 kernel: SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT| ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS) Jan 17 15:55:15 fea3 kernel: SEQINTSRC[0x0] DFCNTRL[0x4]:(DIRECTION) DFSTATUS[0x 89]:(FIFOEMP|HDONE|PRELOAD_AVAIL) Jan 17 15:55:15 fea3 kernel: SG_CACHE_SHADOW[0x2]:(LAST_SEG) SG_STATE[0x0] DFFSXFRCTL[0x0] Jan 17 15:55:15 fea3 kernel: SOFFCNT[0x0] MDFFSTAT[0x5]:(FIFOFREE|DLZERO) SHADDR = 0x00, SHCNT = 0x0 Jan 17 15:55:15 fea3 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]:(SG_CACHE_ AVAIL) Jan 17 15:55:15 fea3 kernel: LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0 x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 Jan 17 15:55:15 fea3 kernel: scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52 Jan 17 15:55:15 fea3 kernel: scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0 Jan 17 15:55:15 fea3 kernel: Jan 17 15:55:15 fea3 kernel: SIMODE0[0xc]:(ENOVERRUN|ENIOERR) Jan 17 15:55:15 fea3 kernel: CCSCBCTL[0x4]:(CCSCBDIR) Jan 17 15:55:15 fea3 kernel: scsi1: REG0 == 0x3, SINDEX = 0x1ba, DINDEX = 0x1bc Jan 17 15:55:15 fea3 kernel: scsi1: SCBPTR == 0x3, SCB_NEXT == 0xffc0, SCB_NEXT2 == 0xff50 Jan 17 15:55:15 fea3 kernel: CDB 1e 0 0 0 0 0 Jan 17 15:55:15 fea3 kernel: STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 Jan 17 15:55:15 fea3 kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> Jan 17 15:55:15 fea3 kernel: st 1:0:5:0: BDR message in message buffer Jan 17 15:55:15 fea3 kernel: scsi1: Recovery code sleeping Jan 17 15:55:20 fea3 kernel: scsi1: Recovery code awake Jan 17 15:55:20 fea3 kernel: scsi1: Timer Expired (active 1) Jan 17 15:55:20 fea3 kernel: aic79xx_abort returns 0x2003 Jan 17 15:55:20 fea3 kernel: st 1:0:5:0: Attempting to queue a TARGET RESET message:CDB: 0x1e 0x0 0x0 0x0 0x0 0x0 Jan 17 15:55:20 fea3 kernel: aic79xx_dev_reset returns 0x2003 Jan 17 15:55:20 fea3 kernel: Recovery SCB completes
I've posted some additional fixes for aic79xx on linux-scsi. Especially the last one (marked '[PATCH] aic79xx: Fix timer handling') should help here. Could you try them out?
I checked for this bug again with the release of 2.6.16 and it continues to be present. I simply tried to write to the tape and it hung the system solid. It required a complete reset in order to boot. As long as I do not use the tape drive on this system, I can use the newer kernels. So, until this bug is fixed, or I replace the scsi card with one of a different type, I am no longer using the tape drive on this system.
Looks like the tape drive is not getting configured correctly via DV. I've made some patches for aic79xx (which went in for 2.6.17) which allowed the DV to be overridden by the BIOS settings. So please test with a recent kernel and set the tape speed to something lower; IIRC 10MB/s or 20MB/s should be okay.
At present, I am out of town on travel. When I return, I will try this. I have not been using this tape drive all this time due to the bug. I am currently running 2.6.18.1 on that machine, so it will not take long to test. let you know... On Mon, 2006-10-23 at 23:59 -0700, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=5268 > > hare@suse.de changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |hare@suse.de > Owner|andmike@us.ibm.com |hare@suse.de > Status|NEW |ASSIGNED > > > > ------- Additional Comments From hare@suse.de 2006-10-23 23:46 ------- > Looks like the tape drive is not getting configured correctly via DV. I've made > some patches for aic79xx (which went in for 2.6.17) which allowed the DV to be > overridden by the BIOS settings. > So please test with a recent kernel and set the tape speed to something lower; > IIRC 10MB/s or 20MB/s should be okay. > > ------- You are receiving this mail because: ------- > You reported the bug, or are watching the reporter.
I have repeated the test, and continue to get the error. The machine does not hang, but the tape drive shows errors. Here is the output to the console: fea3::/home/amanda/: amlabel -f fea fea17 rewinding, reading label, reading label: Input/output error rewinding, writing label fea17, checking label amlabel: reading label: Input/output error I can I enforce a 10 MB/s or 20 MB/s limit on the tape drive over what the driver allows for ? Here is the output from cat /proc/scsi/scsi for the tape drive separate channel scsi1 Host: scsi1 Channel: 00 Id: 05 Lun: 00 Vendor: SEAGATE Model: DAT 9SP40-000 Rev: 9100 Type: Sequential-Access ANSI SCSI revision: 03 It looks like it is trying for 80 MB/s on this card: (output from cat /proc/scsi/aic79xx/1/ ): Adaptec AIC79xx driver version: 3.0 Adaptec 39320A Ultra320 SCSI adapter aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs Allocated SCBs: 4, SG List Length: 128 Serial EEPROM: 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x17c8 0x09f4 0x0142 0x2807 0x0010 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0x0430 0xb3f3 Target 0 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 1 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 2 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 3 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 4 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 5 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Goal: 80.000MB/s transfers (40.000MHz, 16bit) Curr: 80.000MB/s transfers (40.000MHz, 16bit) Channel A Target 5 Lun 0 Settings Commands Queued 201 Commands Active 0 Command Openings 1 Max Tagged Openings 0 Device Queue Frozen Count 0 Target 6 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 7 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 8 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 9 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 10 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 11 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 12 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 13 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 14 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) Target 15 Negotiation Settings User: 320.000MB/s transfers (160.000MHz RDSTRM|DT|IU|RTI|QAS, 16bit) On Tue, 2006-10-24 at 07:27 -0400, Freels, James D. wrote: > At present, I am out of town on travel. When I return, I will try this. > I have not been using this tape drive all this time due to the bug. I > am currently running 2.6.18.1 on that machine, so it will not take long > to test. > > let you know... > > On Mon, 2006-10-23 at 23:59 -0700, bugme-daemon@bugzilla.kernel.org > wrote: > > http://bugzilla.kernel.org/show_bug.cgi?id=5268 > > > > hare@suse.de changed: > > > > What |Removed |Added > > ---------------------------------------------------------------------------- > > CC| |hare@suse.de > > Owner|andmike@us.ibm.com |hare@suse.de > > Status|NEW |ASSIGNED > > > > > > > > ------- Additional Comments From hare@suse.de 2006-10-23 23:46 ------- > > Looks like the tape drive is not getting configured correctly via DV. I've made > > some patches for aic79xx (which went in for 2.6.17) which allowed the DV to be > > overridden by the BIOS settings. > > So please test with a recent kernel and set the tape speed to something lower; > > IIRC 10MB/s or 20MB/s should be okay. > > > > ------- You are receiving this mail because: ------- > > You reported the bug, or are watching the reporter.
this bug is still present in the exact same way on both the 2.6.18.4 and 2.6.19 kernels
I have a similer problem, but not identical. I will file a bug when I've done enough research. In the mean time, my notes might help if they are related: http://www.seattlecentral.edu/cgi-bin/cgiwrap/dmartin/moin.cgi/Kernel
This bug continues with the 2.6.22.1 kernel
the patch for bug #8366 also fixed this bug