Most recent kernel where this bug did not occur: 2.6.13-rc5 (no mainline kernels affected yet, just -mm)

If you look on http://test.kernel.org/, you'll see in the rightmost column there's a yellow box under elm3b70 for 2.6.13-rc4-mm1, but current mainline kernels are all green (i.e. no problems). That means one test failed, in this case making an fs on the spare partition. Odd. I went digging ...

Looks like this got introduced between 2.6.12-mm1 and 2.6.12-mm2. Sorry, should've caught it earlier. I'll blame OLS or something.

This is an 8x power4 box running "bare metal" (i.e. not on top of the hypervisor).

Seems /dev/sdc1 doesn't exist:

07/31/05-02:44:32 processing command: (5) 'fs --partition=1 --mkext2fs --mount -l /mnt/tmp'
n format PARTITION='/dev/sdc1'
mke2fs 1.35 (28-Feb-2004)
mkfs.ext2: No such device or address while trying to determine filesystem size
07/31/05-02:44:32 fs: creating filesystem ext2 Failed rc = 1

Looking back at the bootlog (http://test.kernel.org/9609/debug/console.log), I see it really not looking very happy (snapshot below). Good bootlog is here for comparison: (http://test.kernel.org/9445/debug/console.log)

sym0: <1010-66> rev 0x1 at pci 0001:01:01.0 irq 115
sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.1
target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M
  Type: Direct-Access ANSI SCSI revision: 03
target0:0:8: tagged command queuing enabled, command queue depth 16.
target0:0:8: Beginning Domain Validation
target0:0:8: asynchronous.
target0:0:8: wide asynchronous.
target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
target0:0:8: Write Buffer failure 700ff
target0:0:8: Domain Validation Disabing Information Units
target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
target0:0:8: Write Buffer failure 700ff
target0:0:8: Domain Validation detected failure, dropping back
target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
target0:0:8: Ending Domain Validation
target0:0:9: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M
  Type: Direct-Access ANSI SCSI revision: 03
target0:0:9: tagged command queuing enabled, command queue depth 16.
target0:0:9: Beginning Domain Validation
target0:0:9: asynchronous.
target0:0:9: wide asynchronous.
target0:0:9: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
target0:0:9: Write Buffer failure 700ff
target0:0:9: Domain Validation Disabing Information Units
target0:0:9: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
target0:0:9: Write Buffer failure 700ff
target0:0:9: Domain Validation detected failure, dropping back
target0:0:9: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
target0:0:9: Ending Domain Validation
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
  Vendor: IBM Model: IC35L036UCDY10-0 Rev: S25M
  Type: Direct-Access ANSI SCSI revision: 03
target0:0:10: tagged command queuing enabled, command queue depth 16.
target0:0:10: Beginning Domain Validation
target0:0:10: asynchronous.
target0:0:10: wide asynchronous.
target0:0:10: Domain Validation skipping write tests
target0:0:10: FAST-80 WIDE SCSI 160.0 MB/s DT IU QAS (12.5 ns, offset 31)
sym0: unexpected disconnect
target0:0:10: Domain Validation Disabing Information Units
target0:0:10: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
sym0: unexpected disconnect
target0:0:10: Domain Validation detected failure, dropping back
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
target0:0:10: Ending Domain Validation
  Vendor: IBM Model: HSBPM2 PU2SCSI Rev: 0016
  Type: Enclosure ANSI SCSI revision: 02
target0:0:14: Beginning Domain Validation
0:0:14:0: phase change 6-7 9@100503a8 resid=7.
0:0:14:0: phase change 6-7 9@100503a8 resid=7.
0:0:14:0: phase change 6-7 9@100503a8 resid=7.
0:0:14:0: phase change 6-7 9@100503a8 resid=7.
target0:0:14: Ending Domain Validation
  Vendor: IBM Model: HSBPD4M PU3SCSI Rev: 0016
  Type: Enclosure ANSI SCSI revision: 02
target0:0:15: Beginning Domain Validation
0:0:15:0: phase change 6-7 9@100503a8 resid=7.
0:0:15:0: phase change 6-7 9@100503a8 resid=7.
0:0:15:0: phase change 6-7 9@100503a8 resid=7.
0:0:15:0: phase change 6-7 9@100503a8 resid=7.
target0:0:15: Ending Domain Validation
sym1: <1010-66> rev 0x1 at pci 0001:01:01.1 irq 116
sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym1: SCSI BUS has been reset.
scsi1 : sym-2.2.1
sym2: <1010-66> rev 0x1 at pci 0001:41:01.0 irq 119
sym2: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym2: SCSI BUS has been reset.
scsi2 : sym-2.2.1
sym3: <1010-66> rev 0x1 at pci 0001:41:01.1 irq 120
sym3: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym3: SCSI BUS has been reset.
scsi3 : sym-2.2.1
st: Version 20050501, fixed bufsize 32768, s/g segs 256
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sda: drive cache: write through
SCSI device sda: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
Attached scsi disk sda at scsi0, channel 0, id 8, lun 0
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 71096640 512-byte hdwr sectors (36401 MB)
SCSI device sdb: drive cache: write through
 sdb: sdb1 sdb2
Attached scsi disk sdb at scsi0, channel 0, id 9, lun 0
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
sdc: Unit Not Ready, sense:
: Current: sense key=0x0
    ASC=0x0 ASCQ=0x0
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device not ready.
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device not ready.
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device not ready.
sdc : READ CAPACITY failed.
sdc : status=1, message=00, host=0, driver=08
sd: Current: sense key=0x0
    ASC=0x0 ASCQ=0x0
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device not ready.
sdc: asking for cache data failed
sdc: assuming drive cache: write through
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
sdc: Unit Not Ready, sense:
: Current: sense key=0x0
    ASC=0x0 ASCQ=0x0
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device not ready.
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device not ready.
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device not ready.
sdc : READ CAPACITY failed.
sdc : status=1, message=00, host=0, driver=08
sd: Current: sense key=0x0
    ASC=0x0 ASCQ=0x0
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device not ready.
sdc: asking for cache data failed
sdc: assuming drive cache: write through
 sdc:<6> target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device sdc not ready.
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
Device sdc not ready.
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
 unable to read partition table
Comment from Andrew:
------------------------------
sym2 works OK on my little test box with latest -mm lineup, and I have zero patches which touch that driver.

James, could some of the scsi core rework have caused this?
----------------------------

I shall rudely cc jejb on this bug without asking him, in the light of the above - sorry ;-)
--James Bottomley <James.Bottomley@SteelEye.com> wrote (on Friday, August 05, 2005 09:24:52 -0500):

> On Thu, 2005-08-04 at 23:39 -0700, Andrew Morton wrote:
>> James, could some of the scsi core rework have caused this?
>
> Well, I don't think so. The error below:
>
>> > sdc: Unit Not Ready, sense:
>> > : Current: sense key=0x0
>> > ASC=0x0 ASCQ=0x0
>> > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>> > Device not ready.
>> > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>> > Device not ready.
>> > target0:0:10: FAST-40 WIDE SCSI 80.0 MB/s DT (25 ns, offset 31)
>> > Device not ready.
>> > sdc : READ CAPACITY failed.
>> > sdc : status=1, message=00, host=0, driver=08
>> > sd: Current: sense key=0x0
>> > ASC=0x0 ASCQ=0x0
>
> Is coming from the disk not the symbios driver ... I think you have a
> disk failure.

Howcome it works on all mainline kernels, and not -mm then? ;-)
Did we fix an error path to detect failures, maybe?

M.
Reply-To: James.Bottomley@SteelEye.com

On Fri, 2005-08-05 at 07:36 -0700, Martin J. Bligh wrote:
> Howcome it works on all mainline kernels, and not -mm then? ;-)
> Did we fix an error path to detect failures, maybe?

Well, OK, it might be something to do with your drives trying to
negotiate IU and QAS. Support for this was added to the sym2 driver but
never verified (because no-one seemed to have drives that could do it).

The attached should stop the driver from negotiating these two
parameters, if you could try it (it will produce complaints about static
functions defined but not used, but you can ignore them).

James

diff --git a/drivers/scsi/sym53c8xx_2/sym_glue.c b/drivers/scsi/sym53c8xx_2/sym_glue.c
--- a/drivers/scsi/sym53c8xx_2/sym_glue.c
+++ b/drivers/scsi/sym53c8xx_2/sym_glue.c
@@ -2122,10 +2122,12 @@ static struct spi_function_template sym2
 	.show_width = 1,
 	.set_dt = sym2_set_dt,
 	.show_dt = 1,
+#if 0
 	.set_iu = sym2_set_iu,
 	.show_iu = 1,
 	.set_qas = sym2_set_qas,
 	.show_qas = 1,
+#endif
 	.get_signalling = sym2_get_signalling,
 };
> On Fri, 2005-08-05 at 07:36 -0700, Martin J. Bligh wrote:
>> Howcome it works on all mainline kernels, and not -mm then? ;-)
>> Did we fix an error path to detect failures, maybe?
>
> Well, OK, it might be something to do with your drives trying to
> negotiate IU and QAS. Support for this was added to the sym2 driver but
> never verified (because no-one seemed to have drives that could do it).
>
> The attached should stop the driver from negotiating these two
> parameters, if you could try it (it will produce complaints about static
> functions defined but not used, but you can ignore them).

Nope, is the same as before with this patch ....

M.

> James
>
> diff --git a/drivers/scsi/sym53c8xx_2/sym_glue.c b/drivers/scsi/sym53c8xx_2/sym_glue.c
> --- a/drivers/scsi/sym53c8xx_2/sym_glue.c
> +++ b/drivers/scsi/sym53c8xx_2/sym_glue.c
> @@ -2122,10 +2122,12 @@ static struct spi_function_template sym2
>  	.show_width = 1,
>  	.set_dt = sym2_set_dt,
>  	.show_dt = 1,
> +#if 0
>  	.set_iu = sym2_set_iu,
>  	.show_iu = 1,
>  	.set_qas = sym2_set_qas,
>  	.show_qas = 1,
> +#endif
>  	.get_signalling = sym2_get_signalling,
>  };
Reply-To: James.Bottomley@SteelEye.com

On Mon, 2005-08-08 at 21:41 -0700, Martin J. Bligh wrote:
> Nope, is the same as before with this patch ....

Dear novice bug reporter,

Thank you for taking the trouble to test this. Unfortunately, without
any dmesg output, it's rather hard to tell what's going on here. Would
you be so kind as to send this information so we can try to diagnose
what's going on.

Thanks,

James
--James Bottomley <James.Bottomley@SteelEye.com> wrote (on Tuesday, August 09, 2005 09:26:44 -0500):

> On Mon, 2005-08-08 at 21:41 -0700, Martin J. Bligh wrote:
>> Nope, is the same as before with this patch ....
>
> Dear novice bug reporter,
>
> Thank you for taking the trouble to test this. Unfortunately, without
> any dmesg output, it's rather hard to tell what's going on here. Would
> you be so kind as to send this information so we can try to diagnose
> what's going on.

Dear novice test examiner,

It's in http://test.kernel.org with everything else ;-)
2.6.13-rc4-mm1+jejb_fix ... drills down to:

http://test.kernel.org/10080/debug/console.log

M.
Reply-To: James.Bottomley@SteelEye.com

On Tue, 2005-08-09 at 07:59 -0700, Martin J. Bligh wrote:
> Dear novice test examiner,
>
> It's in http://test.kernel.org with everything else ;-)
> 2.6.13-rc4-mm1+jejb_fix ... drills down to:
>
> http://test.kernel.org/10080/debug/console.log

Well, OK, apparently some novice coder made an error converting from a
stack allocated buffer to a kmalloc'd one in the sense handling
routines.

I think this patch should fix it (or at least restore it to the level of
bugginess it had before).

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -342,12 +342,12 @@ int scsi_execute_req(struct scsi_device
 		sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
 		if (!sense)
 			return DRIVER_ERROR << 24;
-		memset(sense, 0, sizeof(*sense));
+		memset(sense, 0, SCSI_SENSE_BUFFERSIZE);
 	}
 	result = scsi_execute(sdev, cmd, data_direction, buffer, bufflen,
 			      sense, timeout, retries, 0);
 	if (sshdr)
-		scsi_normalize_sense(sense, sizeof(*sense), sshdr);
+		scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, sshdr);
 
 	kfree(sense);
 	return result;
--On Tuesday, August 09, 2005 11:55:36 -0500 James Bottomley <James.Bottomley@SteelEye.com> wrote:

> On Tue, 2005-08-09 at 07:59 -0700, Martin J. Bligh wrote:
>> Dear novice test examiner,
>>
>> It's in http://test.kernel.org with everything else ;-)
>> 2.6.13-rc4-mm1+jejb_fix ... drills down to:
>>
>> http://test.kernel.org/10080/debug/console.log
>
> Well, OK, apparently some novice coder made an error converting from a
> stack allocated buffer to a kmalloc'd one in the sense handling
> routines.
>
> I think this patch should fix it (or at least restore it to the level of
> bugginess it had before).

Wheeeeeee! that fixed it. Thanks very much. Log is here if you want to peek at it:

http://test.kernel.org/10431/debug/console.log

Triples all round!

M.

> James
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -342,12 +342,12 @@ int scsi_execute_req(struct scsi_device
>  		sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
>  		if (!sense)
>  			return DRIVER_ERROR << 24;
> -		memset(sense, 0, sizeof(*sense));
> +		memset(sense, 0, SCSI_SENSE_BUFFERSIZE);
>  	}
>  	result = scsi_execute(sdev, cmd, data_direction, buffer, bufflen,
>  			      sense, timeout, retries, 0);
>  	if (sshdr)
> -		scsi_normalize_sense(sense, sizeof(*sense), sshdr);
> +		scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, sshdr);
>  
>  	kfree(sense);
>  	return result;
Fixed! I owe James a triple.