Bug 5003 - Problem with symbios driver on recent -mm trees
Summary: Problem with symbios driver on recent -mm trees
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: i386 Linux
Importance: P2 normal
Assignee: Mike Anderson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-05 07:30 UTC by Martin J. Bligh
Modified: 2005-08-09 17:07 UTC

See Also:
Kernel Version: 2.6.13-rc4-mm1
Subsystem:
Regression: ---
Bisected commit-id:


Description Martin J. Bligh 2005-08-05 07:30:07 UTC
Most recent kernel where this bug did not occur: 2.6.13-rc5 (no mainline kernels
affected yet, just -mm)

If you look on http://test.kernel.org/, you'll see in the rightmost
column there's a yellow box under elm3b70 for 2.6.13-rc4-mm1, but
current mainline kernels are all green (ie no problems). That means
one test failed, in this case making an fs on the spare partition. 
Odd. I went digging ...

Looks like this got introduced between 2.6.12-mm1 and 2.6.12-mm2
Sorry, should've caught it earlier. I'll blame OLS or something.
This is an 8x power4 box running "bare metal" (ie not on top of
the hypervisor).

It seems /dev/sdc1 doesn't exist:
07/31/05-02:44:32 processing command: (5) 'fs --partition=1 --mkext2fs --mount -l /mnt/tmp'
n format
PARTITION='/dev/sdc1'
mke2fs 1.35 (28-Feb-2004)
mkfs.ext2: No such device or address while trying to determine filesystem size
07/31/05-02:44:32 fs: creating filesystem ext2 Failed rc = 1

Looking back at the bootlog (http://test.kernel.org/9609/debug/console.log),
I see it really not looking very happy (snapshot below).

A good bootlog is here for comparison:
(http://test.kernel.org/9445/debug/console.log)

sym0: <1010-66> rev 0x1 at pci 0001:01:01.0 irq 115
sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.1
 target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM       Model: IC35L036UCDY10-0  Rev: S25M
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:8: tagged command queuing enabled, command queue depth 16.
 target0:0:8: Beginning Domain Validation
 target0:0:8: asynchronous.
 target0:0:8: wide asynchronous.
 target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:8: Write Buffer failure 700ff
 target0:0:8: Domain Validation Disabing Information Units
 target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:8: Write Buffer failure 700ff
 target0:0:8: Domain Validation detected failure, dropping back
 target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:8: Ending Domain Validation
 target0:0:9: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM       Model: IC35L036UCDY10-0  Rev: S25M
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:9: tagged command queuing enabled, command queue depth 16.
 target0:0:9: Beginning Domain Validation
 target0:0:9: asynchronous.
 target0:0:9: wide asynchronous.
 target0:0:9: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:9: Write Buffer failure 700ff
 target0:0:9: Domain Validation Disabing Information Units
 target0:0:9: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:9: Write Buffer failure 700ff
 target0:0:9: Domain Validation detected failure, dropping back
 target0:0:9: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:9: Ending Domain Validation
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM       Model: IC35L036UCDY10-0  Rev: S25M
  Type:   Direct-Access                      ANSI SCSI revision: 03
 target0:0:10: tagged command queuing enabled, command queue depth 16.
 target0:0:10: Beginning Domain Validation
 target0:0:10: asynchronous.
 target0:0:10: wide asynchronous.
 target0:0:10: Domain Validation skipping write tests
 target0:0:10: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:10: Domain Validation Disabing Information Units
 target0:0:10: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
 target0:0:10: Domain Validation detected failure, dropping back
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: Ending Domain Validation
  Vendor: IBM       Model: HSBPM2   PU2SCSI  Rev: 0016
  Type:   Enclosure                          ANSI SCSI revision: 02
 target0:0:14: Beginning Domain Validation
 0:0:14:0: phase change 6-7 9@100503a8 resid=7.
 0:0:14:0: phase change 6-7 9@100503a8 resid=7.
 0:0:14:0: phase change 6-7 9@100503a8 resid=7.
 0:0:14:0: phase change 6-7 9@100503a8 resid=7.
 target0:0:14: Ending Domain Validation
  Vendor: IBM       Model: HSBPD4M  PU3SCSI  Rev: 0016
  Type:   Enclosure                          ANSI SCSI revision: 02
 target0:0:15: Beginning Domain Validation
 0:0:15:0: phase change 6-7 9@100503a8 resid=7.
 0:0:15:0: phase change 6-7 9@100503a8 resid=7.
 0:0:15:0: phase change 6-7 9@100503a8 resid=7.
 0:0:15:0: phase change 6-7 9@100503a8 resid=7.
 target0:0:15: Ending Domain Validation
sym1: <1010-66> rev 0x1 at pci 0001:01:01.1 irq 116
sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym1: SCSI BUS has been reset.
scsi1 : sym-2.2.1
sym2: <1010-66> rev 0x1 at pci 0001:41:01.0 irq 119
sym2: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym2: SCSI BUS has been reset.
scsi2 : sym-2.2.1
sym3: <1010-66> rev 0x1 at pci 0001:41:01.1 irq 120
sym3: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym3: SCSI BUS has been reset.
scsi3 : sym-2.2.1
st: Version 20050501, fixed bufsize 32768, s/g segs 256
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sda: drive cache: write through
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
Attached scsi disk sda at scsi0, channel 0, id 8, lun 0
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sdb: drive cache: write through
 sdb: sdb1 sdb2
Attached scsi disk sdb at scsi0, channel 0, id 9, lun 0
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
sdc: Unit Not Ready, sense:
: Current: sense key=0x0
    ASC=0x0 ASCQ=0x0
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device  not ready.
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device  not ready.
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device  not ready.
sdc : READ CAPACITY failed.
sdc : status=1, message=00, host=0, driver=08 
sd: Current: sense key=0x0
    ASC=0x0 ASCQ=0x0
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device  not ready.
sdc: asking for cache data failed
sdc: assuming drive cache: write through
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
sdc: Unit Not Ready, sense:
: Current: sense key=0x0
    ASC=0x0 ASCQ=0x0
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device  not ready.
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device  not ready.
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device  not ready.
sdc : READ CAPACITY failed.
sdc : status=1, message=00, host=0, driver=08 
sd: Current: sense key=0x0
    ASC=0x0 ASCQ=0x0
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device  not ready.
sdc: asking for cache data failed
sdc: assuming drive cache: write through
 sdc:<6> target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device sdc not ready.
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
 target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device sdc not ready.
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
 unable to read partition table
Comment 1 Martin J. Bligh 2005-08-05 07:33:33 UTC
Comment from Andrew:

------------------------------

sym2 works OK on my little test box with latest -mm lineup, and I have zero
patches which touch that driver.

James, could some of the scsi core rework have caused this?

----------------------------

I shall rudely cc jejb on this bug without asking him, in the light of the above
- sorry ;-)
Comment 2 Martin J. Bligh 2005-08-05 07:36:11 UTC

--James Bottomley <James.Bottomley@SteelEye.com> wrote (on Friday, August 05, 2005 09:24:52 -0500):

> On Thu, 2005-08-04 at 23:39 -0700, Andrew Morton wrote:
>> James, could some of the scsi core rework have caused this?
> 
> Well, I don't think so.  The error below:
> 
>> > sdc: Unit Not Ready, sense:
>> > : Current: sense key=0x0
>> >     ASC=0x0 ASCQ=0x0
>> >  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>> > Device  not ready.
>> >  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>> > Device  not ready.
>> >  target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>> > Device  not ready.
>> > sdc : READ CAPACITY failed.
>> > sdc : status=1, message=00, host=0, driver=08 
>> > sd: Current: sense key=0x0
>> >     ASC=0x0 ASCQ=0x0
> 
> Is coming from the disk not the symbios driver ... I think you have a
> disk failure.

How come it works on all mainline kernels, and not -mm then? ;-)
Did we fix an error path to detect failures, maybe?

M.

Comment 3 Anonymous Emailer 2005-08-05 08:01:47 UTC
Reply-To: James.Bottomley@SteelEye.com

On Fri, 2005-08-05 at 07:36 -0700, Martin J. Bligh wrote:
> How come it works on all mainline kernels, and not -mm then? ;-)
> Did we fix an error path to detect failures, maybe?

Well, OK, it might be something to do with your drives trying to
negotiate IU and QAS.  Support for this was added to the sym2 driver but
never verified (because no-one seemed to have drives that could do it).

The attached should stop the driver from negotiating these two
parameters, if you could try it (it will produce complaints about static
functions defined but not used, but you can ignore them).

James

diff --git a/drivers/scsi/sym53c8xx_2/sym_glue.c b/drivers/scsi/sym53c8xx_2/sym_glue.c
--- a/drivers/scsi/sym53c8xx_2/sym_glue.c
+++ b/drivers/scsi/sym53c8xx_2/sym_glue.c
@@ -2122,10 +2122,12 @@ static struct spi_function_template sym2
 	.show_width	= 1,
 	.set_dt		= sym2_set_dt,
 	.show_dt	= 1,
+#if 0
 	.set_iu		= sym2_set_iu,
 	.show_iu	= 1,
 	.set_qas	= sym2_set_qas,
 	.show_qas	= 1,
+#endif
 	.get_signalling	= sym2_get_signalling,
 };
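
The `#if 0` above relies on the transport layer only calling setters that the driver actually fills in. A minimal user-space sketch of that pattern follows. Every name in it (`xfer_template`, `demo_set_dt`, `demo_set_iu`, `negotiate`) is hypothetical and invented for illustration; it is not the real `spi_function_template` or the real domain validation code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a transport function template. */
struct xfer_template {
	void (*set_dt)(int *state, int on);
	void (*set_iu)(int *state, int on);	/* optional: may be left NULL */
};

/* Hypothetical setters: each one records its feature as a bit in *state. */
void demo_set_dt(int *state, int on)
{
	if (on)
		*state |= 1;
	else
		*state &= ~1;
}

void demo_set_iu(int *state, int on)
{
	if (on)
		*state |= 2;
	else
		*state &= ~2;
}

/* The caller only negotiates features for which a setter is provided. */
int negotiate(const struct xfer_template *t)
{
	int state = 0;

	if (t->set_dt)
		t->set_dt(&state, 1);
	if (t->set_iu)
		t->set_iu(&state, 1);
	return state;
}

const struct xfer_template with_iu = {
	.set_dt = demo_set_dt,
	.set_iu = demo_set_iu,
};

/* Same template with the IU setter compiled out, as in the patch above. */
const struct xfer_template without_iu = {
	.set_dt = demo_set_dt,
#if 0
	.set_iu = demo_set_iu,
#endif
};
```

Members omitted from a designated initializer are zero-initialized, so `without_iu.set_iu` is NULL and `negotiate()` skips it. In this sketch `demo_set_iu` is still referenced by `with_iu`; in the real patch the setters become unreferenced, which is why the message warns about static functions defined but not used.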
 


Comment 4 Martin J. Bligh 2005-08-08 21:41:14 UTC
> On Fri, 2005-08-05 at 07:36 -0700, Martin J. Bligh wrote:
>> How come it works on all mainline kernels, and not -mm then? ;-)
>> Did we fix an error path to detect failures, maybe?
> 
> Well, OK, it might be something to do with your drives trying to
> negotiate IU and QAS.  Support for this was added to the sym2 driver but
> never verified (because no-one seemed to have drives that could do it).
> 
> The attached should stop the driver from negotiating these two
> parameters, if you could try it (it will produce complaints about static
> functions defined but not used, but you can ignore them).

Nope, is the same as before with this patch ....

M.
 


Comment 5 Anonymous Emailer 2005-08-09 07:27:02 UTC
Reply-To: James.Bottomley@SteelEye.com

On Mon, 2005-08-08 at 21:41 -0700, Martin J. Bligh wrote:
> Nope, is the same as before with this patch ....

Dear novice bug reporter,

Thank you for taking the trouble to test this.  Unfortunately, without
any dmesg output, it's rather hard to tell what's going on here.  Would
you be so kind as to send this information so we can try to diagnose
what's going on.

Thanks,

James


Comment 6 Martin J. Bligh 2005-08-09 07:59:43 UTC

--James Bottomley <James.Bottomley@SteelEye.com> wrote (on Tuesday, August 09, 2005 09:26:44 -0500):

> On Mon, 2005-08-08 at 21:41 -0700, Martin J. Bligh wrote:
>> Nope, is the same as before with this patch ....
> 
> Dear novice bug reporter,
> 
> Thank you for taking the trouble to test this.  Unfortunately, without
> any dmesg output, it's rather hard to tell what's going on here.  Would
> you be so kind as to send this information so we can try to diagnose
> what's going on.

Dear novice test examiner,

It's in http://test.kernel.org with everything else ;-)
2.6.13-rc4-mm1+jejb_fix ... drills down to:

http://test.kernel.org/10080/debug/console.log

M.

Comment 7 Anonymous Emailer 2005-08-09 09:55:47 UTC
Reply-To: James.Bottomley@SteelEye.com

On Tue, 2005-08-09 at 07:59 -0700, Martin J. Bligh wrote:
> Dear novice test examiner,
> 
> It's in http://test.kernel.org with everything else ;-)
> 2.6.13-rc4-mm1+jejb_fix ... drills down to:
> 
> http://test.kernel.org/10080/debug/console.log

Well, OK, apparently some novice coder made an error converting from a
stack allocated buffer to a kmalloc'd one in the sense handling
routines.

I think this patch should fix it (or at least restore it to the level of
bugginess it had before).

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -342,12 +342,12 @@ int scsi_execute_req(struct scsi_device 
 		sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
 		if (!sense)
 			return DRIVER_ERROR << 24;
-		memset(sense, 0, sizeof(*sense));
+		memset(sense, 0, SCSI_SENSE_BUFFERSIZE);
 	}
 	result = scsi_execute(sdev, cmd, data_direction, buffer, bufflen,
 				  sense, timeout, retries, 0);
 	if (sshdr)
-		scsi_normalize_sense(sense, sizeof(*sense), sshdr);
+		scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, sshdr);
 
 	kfree(sense);
 	return result;
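
For anyone reading the diff above: `sizeof(*sense)` is the size of what `sense` points to, not the size of the allocation. The diff does not show the declaration of `sense`; assuming it is an `unsigned char *`, `sizeof(*sense)` is 1, so only the first byte was cleared and only a length of 1 was passed on to `scsi_normalize_sense()`. That would be consistent with the all-zero sense key/ASC/ASCQ in the failing log, though the thread does not confirm this explicitly. A minimal user-space sketch of the effect, where `SENSE_BUFFERSIZE` (with 96 as an assumed stand-in value) and `zeroed_bytes()` are hypothetical names for illustration only:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel constant; the value 96 is an assumption. */
#define SENSE_BUFFERSIZE 96

/*
 * Allocate a buffer, fill it with 0xff to simulate stale heap contents,
 * clear it either the buggy way or the fixed way, and return how many
 * bytes ended up zeroed. Returns 0 if the allocation fails.
 */
size_t zeroed_bytes(int use_fix)
{
	unsigned char *sense = malloc(SENSE_BUFFERSIZE);
	size_t i, count = 0;

	if (!sense)
		return 0;
	memset(sense, 0xff, SENSE_BUFFERSIZE);

	if (use_fix)
		memset(sense, 0, SENSE_BUFFERSIZE);	/* clears the whole buffer */
	else
		memset(sense, 0, sizeof(*sense));	/* sizeof(unsigned char) == 1 */

	for (i = 0; i < SENSE_BUFFERSIZE; i++)
		if (sense[i] == 0)
			count++;

	free(sense);
	return count;
}
```

With a stack array, `sizeof(sense)` would have given the full buffer size, which is presumably how the expression went wrong during the conversion to kmalloc. Once the buffer is a pointer, the size has to come from the same constant that was used for the allocation.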


Comment 8 Martin J. Bligh 2005-08-09 16:23:37 UTC
--On Tuesday, August 09, 2005 11:55:36 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote:

> On Tue, 2005-08-09 at 07:59 -0700, Martin J. Bligh wrote:
>> Dear novice test examiner,
>> 
>> It's in http://test.kernel.org with everything else ;-)
>> 2.6.13-rc4-mm1+jejb_fix ... drills down to:
>> 
>> http://test.kernel.org/10080/debug/console.log
> 
> Well, OK, apparently some novice coder made an error converting from a
> stack allocated buffer to a kmalloc'd one in the sense handling
> routines.
> 
> I think this patch should fix it (or at least restore it to the level of
> bugginess it had before).


Wheeeeeee! that fixed it. Thanks very much. Log is here if you want to
peek at it:


http://test.kernel.org/10431/debug/console.log

Triples all round!

M.
 
Comment 9 Martin J. Bligh 2005-08-09 17:07:52 UTC
Fixed! I owe James a triple.
