Bug 13399 - kernel crash SONY DVD-ROM with cd
kernel crash SONY DVD-ROM with cd
Status: CLOSED CODE_FIX
Product: IO/Storage
Classification: Unclassified
Component: IDE
All Linux
: P1 normal
Assigned To: io_ide@kernel-bugs.osdl.org
:
: 13619 13696 (view as bug list)
Depends on:
Blocks: 13070
  Show dependency treegraph
 
Reported: 2009-05-28 16:54 UTC by Hans de Bruin
Modified: 2011-02-06 15:46 UTC (History)
7 users (show)

See Also:
Kernel Version: 2.6.31
Tree: Mainline
Regression: Yes


Attachments
dump of crash at boottime (18.17 KB, text/plain)
2009-05-28 16:54 UTC, Hans de Bruin
Details
.config (56.08 KB, application/octet-stream)
2009-05-28 17:07 UTC, Hans de Bruin
Details
new dmesg (18.17 KB, text/plain)
2009-06-05 13:51 UTC, Hans de Bruin
Details
ide-cd.dsm (124.66 KB, application/octet-stream)
2009-06-05 13:52 UTC, Hans de Bruin
Details
ide-cd.s (79.78 KB, application/octet-stream)
2009-06-05 13:52 UTC, Hans de Bruin
Details
debug patch (1.88 KB, patch)
2009-06-06 19:35 UTC, Borislav Petkov
Details | Diff
debug1 dmesg (20.45 KB, text/plain)
2009-06-08 16:05 UTC, Hans de Bruin
Details
debug1 ide-cd.dsm (141.89 KB, application/octet-stream)
2009-06-08 16:05 UTC, Hans de Bruin
Details
debug1 ide-cd.s (92.52 KB, application/octet-stream)
2009-06-08 16:06 UTC, Hans de Bruin
Details
debug1 ide-io.dsm (48.07 KB, application/octet-stream)
2009-06-08 16:06 UTC, Hans de Bruin
Details
debug1 ide-io.s (35.84 KB, application/octet-stream)
2009-06-08 16:07 UTC, Hans de Bruin
Details
debug1 dmesg (25.71 KB, text/plain)
2009-06-09 17:12 UTC, Hans de Bruin
Details
debug2 dmesg (comment 25) (426.47 KB, text/plain)
2009-06-15 16:20 UTC, Hans de Bruin
Details
tentative fix (1.77 KB, patch)
2009-06-16 06:29 UTC, Borislav Petkov
Details | Diff
Screen photo of dvd-rom crash (401.74 KB, image/jpeg)
2009-06-19 04:31 UTC, Stan Senuta Jr.
Details
My Mandrivia-2009.1 /var/log/dmesg (boot.log) (15.06 KB, text/plain)
2009-06-19 04:35 UTC, Stan Senuta Jr.
Details
ide-cd: prevent null pointer deref via cdrom_newpc_intr (1.56 KB, patch)
2009-06-22 08:04 UTC, Borislav Petkov
Details | Diff
image of my kernel-2.6.30 ide-cd.c file (121.06 KB, image/png)
2009-06-22 15:23 UTC, Stan Senuta Jr.
Details
ide-cd full debug (773 bytes, patch)
2009-09-11 05:45 UTC, Borislav Petkov
Details | Diff
dmesg (161.15 KB, text/plain)
2009-09-11 15:04 UTC, David Woodhouse
Details
testfix1 (477 bytes, patch)
2009-09-17 06:39 UTC, Borislav Petkov
Details | Diff
dmesg (63.63 KB, text/plain)
2009-09-26 06:09 UTC, David Woodhouse
Details

Description Hans de Bruin 2009-05-28 16:54:37 UTC
Created attachment 21602 [details]
dump of crash at boottime

After placing a cd in the dvdplayer the kernel crashes. Both at boot as at runtime.
Comment 1 Hans de Bruin 2009-05-28 17:07:26 UTC
Created attachment 21603 [details]
.config
Comment 2 Borislav Petkov 2009-05-29 07:25:28 UTC
Hi,

please do on the same unchanged kernel sources that produce the crash:

objdump -d drivers/ide/ide-cd.o > ide-cd.dsm
make drivers/ide/ide-cd.s

and send me .dsm and .s files.

Thanks,
Boris.
Comment 3 Hans de Bruin 2009-06-04 17:23:31 UTC
Boris, 

Did you reseive the files I sent a couple of day ago?

-- 
Hans
Comment 4 Borislav Petkov 2009-06-05 07:05:43 UTC
Hi Hans,

no, I don't see them in my mailbox. Can you please upload them to bugzilla instead ("Add an attachment" above) for everyone to see?

Thanks,
Boris.
Comment 5 Hans de Bruin 2009-06-05 13:51:20 UTC
Created attachment 21764 [details]
new dmesg
Comment 6 Hans de Bruin 2009-06-05 13:52:07 UTC
Created attachment 21765 [details]
ide-cd.dsm
Comment 7 Hans de Bruin 2009-06-05 13:52:45 UTC
Created attachment 21766 [details]
ide-cd.s
Comment 8 Borislav Petkov 2009-06-06 19:35:25 UTC
Created attachment 21779 [details]
debug patch
Comment 9 Borislav Petkov 2009-06-06 19:37:25 UTC
Hi,

can you apply the attached debug patch and send me the whole dmesg output again. Also, do

objdump -d drivers/ide/ide-cd.o > ide-cd.dsm
make drivers/ide/ide-cd.s

objdump -d drivers/ide/ide-io.o > ide-io.dsm
make drivers/ide/ide-io.s

and send me the .dsm and .s files.

Thanks,
Boris.
Comment 10 Hans de Bruin 2009-06-08 16:05:01 UTC
Created attachment 21811 [details]
debug1 dmesg
Comment 11 Hans de Bruin 2009-06-08 16:05:37 UTC
Created attachment 21812 [details]
debug1 ide-cd.dsm
Comment 12 Hans de Bruin 2009-06-08 16:06:03 UTC
Created attachment 21813 [details]
debug1 ide-cd.s
Comment 13 Hans de Bruin 2009-06-08 16:06:39 UTC
Created attachment 21814 [details]
debug1 ide-io.dsm
Comment 14 Hans de Bruin 2009-06-08 16:07:06 UTC
Created attachment 21815 [details]
debug1 ide-io.s
Comment 15 Borislav Petkov 2009-06-09 05:31:44 UTC
Hi,

thanks for testing. Looks like your debug1 dmesg is incomplete. Please
try catching the the complete dmesg. For that, try enlarging the kernel
printk buffer with the command line param 'log_buf_len=10M' in case the
dmesg looks truncated. Although, it doesn't seem so - it's more a 'dmesg
-s<bufsize>' thing. Experiment with that in order to dump the whole
buffer contents if the printk buffer is big enough but the dmesg tool
itself is using a smaller one.

Thanks,
Boris.
Comment 16 Hans de Bruin 2009-06-09 14:22:39 UTC
Maybe dmesg is not a nice description of the files. I had to capture the dump over the com port. wil log_buff_len help in this case?

Hans
Comment 17 Borislav Petkov 2009-06-09 15:57:55 UTC
Sorry for misleading you - I saw the 'console=uart..' part in the
command line only after hitting send. In that case, dmesg is not the
thing we need at all. Just add 'log_buf_len=10M' to your kernel command
line, boot with it and make sure the output caught over the serial
console is complete or at least it has caught the complete OOPS. You
could also add 'console=tty0' to the kernel command line so that you can
monitor the oopsing machine output and compare it with what has gone
over through the com port.

Thanks,
Boris.
Comment 18 Hans de Bruin 2009-06-09 17:12:18 UTC
Created attachment 21831 [details]
debug1 dmesg
Comment 19 Bartlomiej Zolnierkiewicz 2009-06-10 11:14:02 UTC
On Tuesday 09 June 2009 19:12:18 bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13399
> 
> 
> 
> 
> 
> --- Comment #18 from Hans de Bruin <bruinjm@xs4all.nl>  2009-06-09 17:12:18 ---
> Created an attachment (id=21831)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=21831)
> debug1 dmesg

Hans, thanks for the debug data!

ide-cd: ide_cd_queue_pc: cmd[0]: 0x51, write: 0x0, timeout: 1750, cmd_flags: 0x8000
ide-cd: ide_cd_do_request: cmd: 0x51, block: 18446744073709551615
ide_cd_do_request: dev hda: type=a, flags=108a640
  sector 18446744073709551615, nr/cnr 0/0
  bio (null), biotail (null), buffer (null), data ffff88011b849ba0, len 2
ide-cd: cdrom_do_block_pc: rq->cmd[0]: 0x51, rq->cmd_type: 0xa
ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0
ide-cd: cdrom_newpc_intr: DRQ: stat: 0x58, thislen: 2
ide-cd: ide_cd_check_ireason: ireason: 0x2, rw: 0x0
ide-cd: cdrom_newpc_intr: data transfer, rq->cmd_type: 0xa, ireason: 0x2
ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0
ide-cd: cdrom_newpc_intr: DRQ: stat: 0x50, thislen: 0
ide-cd: ide_cd_request_sense_fixup: rq->cmd[0]: 0x51
#3
ide-cd: ide_cd_queue_pc: cmd[0]: 0x51, write: 0x0, timeout: 1750, cmd_flags: 0x8000
ide-cd: ide_cd_do_request: cmd: 0x51, block: 18446744073709551615
ide_cd_do_request: dev hda: type=a, flags=108a640
  sector 18446744073709551615, nr/cnr 0/0
  bio (null), biotail (null), buffer (null), data ffff88011b849ba0, len 32
ide-cd: cdrom_do_block_pc: rq->cmd[0]: 0x51, rq->cmd_type: 0xa
ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0
ide-cd: cdrom_newpc_intr: DRQ: stat: 0x58, thislen: 30
ide-cd: ide_cd_check_ireason: ireason: 0x2, rw: 0x0
ide-cd: cdrom_newpc_intr: data transfer, rq->cmd_type: 0xa, ireason: 0x2
ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0
ide-cd: cdrom_newpc_intr: DRQ: stat: 0x50, thislen: 2
ide-cd: ide_cd_request_sense_fixup: rq->cmd[0]: 0x51
#3
BUG: unable to handle kernel NULL pointer dereference at 0000000000000048

The above command sequence corresponds to requests generated by
drivers/cdrom/cdrom.c::cdrom_get_disc_info()

	/* set up command and get the disc info */
	init_cdrom_command(&cgc, di, sizeof(*di), CGC_DATA_READ);
	cgc.cmd[0] = GPCMD_READ_DISC_INFO;
	cgc.cmd[8] = cgc.buflen = 2;
	cgc.quiet = 1;

	if ((ret = cdo->generic_packet(cdi, &cgc)))
		return ret;

	/* not all drives have the same disc_info length, so requeue
	 * packet with the length the drive tells us it can supply
	 */
	buflen = be16_to_cpu(di->disc_information_length) +
		     sizeof(di->disc_information_length);

	if (buflen > sizeof(disc_information))
		buflen = sizeof(disc_information);

	cgc.cmd[8] = cgc.buflen = buflen;
	if ((ret = cdo->generic_packet(cdi, &cgc)))
		return ret;

and we are hitting BUG because of the limitation of block layer, only
rq->bio requests can be partially completed:

int __blk_end_request(struct request *rq, int error, unsigned int nr_bytes)
{
	if (rq->bio && __end_that_request_first(rq, error, nr_bytes))
		return 1;

	add_disk_randomness(rq->rq_disk);

	end_that_request_last(rq, error);

	return 0;
}

so on error we complete the request fully instead of partially and later
we try to complete the already freed request.

Borislav, there are actually two issues here to investigate:

* seems like we need to check for rq->bio before calling ide_cd_error_cmd()
  to make the regression go away (IOW we shouldn't don't do partial "good"
  completions for !rq->bio requests)

* potentially revise our strategy of handling GPCMD_READ_DISC_INFO packet
  command returning less data than requested (in this particular case we ask
  for 32 bytes, device gives us only 30 bytes but returns good status)
Comment 20 Borislav Petkov 2009-06-13 16:29:05 UTC
Hi Bart,

thanks for analyzing this.

I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr:

                } else if (!blk_pc_request(rq)) {
                        ide_cd_request_sense_fixup(drive, cmd);
                        /* complain if we still have data left to transfer */
                        uptodate = cmd->nleft ? 0 : 1;
                        if (uptodate == 0)
                                rq->cmd_flags |= REQ_FAILED;
                }
                goto out_end;
        }

so, in our case ide_cd_error_cmd() kills the rq prematurely and that's
why ide_complete_rq() oopses later. And this is caused by uptodate ==
0. Now, here's how the ATA spec (d1532v1r4b-ATA-ATAPI-7) describes the
semantics of clearing of the DRQ bit by the drive:

"
5.14.5.5 DRQ (Data request)

...

The DRQ bit shall be cleared to zero by the device:
         1) when the last word of the data transfer occurs;
         2) when the last word of the command packet transfer occurs for a PACKET command.
"

now there's a subtlety here wrt to what am I to do as an IRQ handler
when my drive clears the DRQ bit: do I _drain_ the last bytes remaining
(in our case 2) or do I fail the rq straightaway. I'm pretty sure
cmd->nleft is 2 in our case so I think that it might be only right to
drain the device first, i.e. do

	uptodate = (cmd->nleft - thislen) ? 0 : 1;

and then later ide_pio_bytes() before completing the rq properly. Hmm?

And yes, this is against spec since the following sentence states that
the data can be drained only "... via DMA mode if DMARQ and DMACK- are
asserted and BSY is set to one." but we should give it a try...

Ideas? Opinions?
Comment 21 Bartlomiej Zolnierkiewicz 2009-06-13 16:54:36 UTC
On Saturday 13 June 2009 18:29:07 bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13399
> 
> 
> 
> 
> 
> --- Comment #20 from Borislav Petkov <bbpetkov@yahoo.de>  2009-06-13 16:29:05 ---
> Hi Bart,
> 
> thanks for analyzing this.
> 
> I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr:
> 
>                 } else if (!blk_pc_request(rq)) {
>                         ide_cd_request_sense_fixup(drive, cmd);
>                         /* complain if we still have data left to transfer */
>                         uptodate = cmd->nleft ? 0 : 1;
>                         if (uptodate == 0)
>                                 rq->cmd_flags |= REQ_FAILED;
>                 }
>                 goto out_end;
>         }
> 
> so, in our case ide_cd_error_cmd() kills the rq prematurely and that's
> why ide_complete_rq() oopses later. And this is caused by uptodate ==

Nope, it is block layer that kills it prematurely.

There are two issues here:

* OOPS (*the*regression* which should be taken care of first)
  (cause: unexpected interaction between ide/block)

* handling of non-fully completed requests with "good" status
  (cause: stupid hardware)

> 0. Now, here's how the ATA spec (d1532v1r4b-ATA-ATAPI-7) describes the
> semantics of clearing of the DRQ bit by the drive:
> 
> "
> 5.14.5.5 DRQ (Data request)
> 
> ...
> 
> The DRQ bit shall be cleared to zero by the device:
>          1) when the last word of the data transfer occurs;
>          2) when the last word of the command packet transfer occurs for a
> PACKET command.
> "
> 
> now there's a subtlety here wrt to what am I to do as an IRQ handler
> when my drive clears the DRQ bit: do I _drain_ the last bytes remaining
> (in our case 2) or do I fail the rq straightaway. I'm pretty sure
> cmd->nleft is 2 in our case so I think that it might be only right to
> drain the device first, i.e. do
> 
>     uptodate = (cmd->nleft - thislen) ? 0 : 1;
> 
> and then later ide_pio_bytes() before completing the rq properly. Hmm?

We should drain only if the device really wants to transfer more data...

cmd->nleft is our concept, thislen reflects the hardware state.

> And yes, this is against spec since the following sentence states that
> the data can be drained only "... via DMA mode if DMARQ and DMACK- are
> asserted and BSY is set to one." but we should give it a try...
> 
> Ideas? Opinions?

Yes.  Please stop reading this stupid ATAPI spec, it is near to worthless
when it comes to the real hardware implementations/bugs..
Comment 22 Anonymous Emailer 2009-06-14 10:06:13 UTC
Reply-To: petkovbb@googlemail.com

Hi,

On Sat, Jun 13, 2009 at 06:59:53PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > --- Comment #20 from Borislav Petkov <bbpetkov@yahoo.de>  2009-06-13 16:29:05 ---
> > Hi Bart,
> > 
> > thanks for analyzing this.
> > 
> > I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr:
> > 
> >                 } else if (!blk_pc_request(rq)) {
> >                         ide_cd_request_sense_fixup(drive, cmd);
> >                         /* complain if we still have data left to transfer */
> >                         uptodate = cmd->nleft ? 0 : 1;
> >                         if (uptodate == 0)
> >                                 rq->cmd_flags |= REQ_FAILED;
> >                 }
> >                 goto out_end;
> >         }
> > 
> > so, in our case ide_cd_error_cmd() kills the rq prematurely and that's
> > why ide_complete_rq() oopses later. And this is caused by uptodate ==
> 
> Nope, it is block layer that kills it prematurely.

ok, I still need to understand the whole code flow properly so please
bare with me.

I got misled by the __blk_end_request() thing: am I right to assume that
you were using it to give a better example where it is more clear that
the block layer really kills rq->bio-less requests?

Because we don't hit __blk_end_request from ide_cd_error_cmd() (or
ide_complete_rq() too, for that matter) - we do rather:

ide_cd_error_cmd() does ide_complete_rq(), which ends up doing
blk_end_request(), then blk_end_io() and the rq->bio thing is checked
in end_that_request_data(). Which is actually the same thing but done
slightly differently.

> There are two issues here:
>
> * OOPS (*the*regression* which should be taken care of first (cause:
> unexpected interaction between ide/block)

I'm thinking something like

if (uptodate == 0 && !(OK_STAT(stat, 0, BAD_STAT)))
	ide_cd_error_cmd(drive, cmd);

> * handling of non-fully completed requests with "good" status (cause:
> stupid hardware)

The non-intrusive solution, IMHO, would be to add another quirk flag for
such a device (SONY DVD-ROM DDU1615) and do not complete the rq in that
case, aka no partial completion for packet commands to that device. I'm
wondering what else is broken with it especially if you're requiring
bigger buffers like the capacities page.

So, how about an ide_cd_complete_rq() helper which hides such special
cases and is called as an indirection at the end of the irq handler?
Comment 23 Bartlomiej Zolnierkiewicz 2009-06-14 12:27:19 UTC
On Sunday 14 June 2009 12:06:06 Borislav Petkov wrote:
> Hi,
> 
> On Sat, Jun 13, 2009 at 06:59:53PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > --- Comment #20 from Borislav Petkov <bbpetkov@yahoo.de>  2009-06-13 16:29:05 ---
> > > Hi Bart,
> > > 
> > > thanks for analyzing this.
> > > 
> > > I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr:
> > > 
> > >                 } else if (!blk_pc_request(rq)) {
> > >                         ide_cd_request_sense_fixup(drive, cmd);
> > >                         /* complain if we still have data left to transfer */
> > >                         uptodate = cmd->nleft ? 0 : 1;
> > >                         if (uptodate == 0)
> > >                                 rq->cmd_flags |= REQ_FAILED;
> > >                 }
> > >                 goto out_end;
> > >         }
> > > 
> > > so, in our case ide_cd_error_cmd() kills the rq prematurely and that's
> > > why ide_complete_rq() oopses later. And this is caused by uptodate ==
> > 
> > Nope, it is block layer that kills it prematurely.
> 
> ok, I still need to understand the whole code flow properly so please
> bare with me.
> 
> I got misled by the __blk_end_request() thing: am I right to assume that
> you were using it to give a better example where it is more clear that
> the block layer really kills rq->bio-less requests?
> 
> Because we don't hit __blk_end_request from ide_cd_error_cmd() (or
> ide_complete_rq() too, for that matter) - we do rather:
> 
> ide_cd_error_cmd() does ide_complete_rq(), which ends up doing
> blk_end_request(), then blk_end_io() and the rq->bio thing is checked
> in end_that_request_data(). Which is actually the same thing but done
> slightly differently.

Yes, you're seeing block layer updates from 2 days ago.

> > There are two issues here:
> >
> > * OOPS (*the*regression* which should be taken care of first (cause:
> > unexpected interaction between ide/block)
> 
> I'm thinking something like
> 
> if (uptodate == 0 && !(OK_STAT(stat, 0, BAD_STAT)))
> 	ide_cd_error_cmd(drive, cmd);

Those are two separate issues, please stop mixing them.

AFAICS ide-cd has been always failing !uptodate requests so the latter
issue is nothing new.  Which means that it is really not the right time
to be scratching our heads about it while the former problem has been
seriously affecting people for weeks now and also managed to slip into
the final 2.6.30 release..

> > * handling of non-fully completed requests with "good" status (cause:
> > stupid hardware)
> 
> The non-intrusive solution, IMHO, would be to add another quirk flag for
> such a device (SONY DVD-ROM DDU1615) and do not complete the rq in that
> case, aka no partial completion for packet commands to that device. I'm
> wondering what else is broken with it especially if you're requiring
> bigger buffers like the capacities page.
> 
> So, how about an ide_cd_complete_rq() helper which hides such special
> cases and is called as an indirection at the end of the irq handler?

I can't tell much without seeing the concept with more details.

IOW please send patches, that is the best way to discuss things.
Comment 24 Anonymous Emailer 2009-06-14 13:02:45 UTC
Reply-To: petkovbb@googlemail.com

Hi,

On Sun, Jun 14, 2009 at 02:32:10PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Sunday 14 June 2009 12:06:06 Borislav Petkov wrote:
> > Hi,
> > 
> > On Sat, Jun 13, 2009 at 06:59:53PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > --- Comment #20 from Borislav Petkov <bbpetkov@yahoo.de>  2009-06-13 16:29:05 ---
> > > > Hi Bart,
> > > > 
> > > > thanks for analyzing this.
> > > > 
> > > > I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr:
> > > > 
> > > >                 } else if (!blk_pc_request(rq)) {
> > > >                         ide_cd_request_sense_fixup(drive, cmd);
> > > >                         /* complain if we still have data left to transfer */
> > > >                         uptodate = cmd->nleft ? 0 : 1;
> > > >                         if (uptodate == 0)
> > > >                                 rq->cmd_flags |= REQ_FAILED;
> > > >                 }
> > > >                 goto out_end;
> > > >         }
> > > > 
> > > > so, in our case ide_cd_error_cmd() kills the rq prematurely and that's
> > > > why ide_complete_rq() oopses later. And this is caused by uptodate ==
> > > 
> > > Nope, it is block layer that kills it prematurely.
> > 
> > ok, I still need to understand the whole code flow properly so please
> > bare with me.
> > 
> > I got misled by the __blk_end_request() thing: am I right to assume that
> > you were using it to give a better example where it is more clear that
> > the block layer really kills rq->bio-less requests?
> > 
> > Because we don't hit __blk_end_request from ide_cd_error_cmd() (or
> > ide_complete_rq() too, for that matter) - we do rather:
> > 
> > ide_cd_error_cmd() does ide_complete_rq(), which ends up doing
> > blk_end_request(), then blk_end_io() and the rq->bio thing is checked
> > in end_that_request_data(). Which is actually the same thing but done
> > slightly differently.
> 
> Yes, you're seeing block layer updates from 2 days ago.

Nope, I'm staring at .30-rc7 sources. I guess those updates are a bit
older.

> > > There are two issues here:
> > >
> > > * OOPS (*the*regression* which should be taken care of first (cause:
> > > unexpected interaction between ide/block)
> > 
> > I'm thinking something like
> > 
> > if (uptodate == 0 && !(OK_STAT(stat, 0, BAD_STAT)))
> > 	ide_cd_error_cmd(drive, cmd);
> 
> Those are two separate issues, please stop mixing them.
> 
> AFAICS ide-cd has been always failing !uptodate requests so the latter
> issue is nothing new.  Which means that it is really not the right time
> to be scratching our heads about it while the former problem has been
> seriously affecting people for weeks now and also managed to slip into
> the final 2.6.30 release..

Ok, let's do that:

@Hans: can you apply the following ontop of the debugging patch and test and
send us the whole dmesg, thanks.

--
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 6f728da..219c303 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -764,7 +764,7 @@ out_end:
 				rq->errors = -EIO;
 		}
 
-		if (uptodate == 0)
+		if (uptodate == 0 && rq->bio)
 			ide_cd_error_cmd(drive, cmd);
 
 		/* make sure it's fully ended */

--

> > > * handling of non-fully completed requests with "good" status (cause:
> > > stupid hardware)
> > 
> > The non-intrusive solution, IMHO, would be to add another quirk flag for
> > such a device (SONY DVD-ROM DDU1615) and do not complete the rq in that
> > case, aka no partial completion for packet commands to that device. I'm
> > wondering what else is broken with it especially if you're requiring
> > bigger buffers like the capacities page.
> > 
> > So, how about an ide_cd_complete_rq() helper which hides such special
> > cases and is called as an indirection at the end of the irq handler?
> 
> I can't tell much without seeing the concept with more details.
>
> IOW please send patches, that is the best way to discuss things.

Ok.
Comment 25 Anonymous Emailer 2009-06-15 06:28:02 UTC
Reply-To: petkovbb@googlemail.com

Hi,

here's a yet another analysis:


issue GPCMD_READ_DISC_INFO

ide-cd: ide_cd_queue_pc: cmd[0]: 0x51, write: 0x0, timeout: 1750, cmd_flags: 0x8000
ide-cd: ide_cd_do_request: cmd: 0x51, block: 18446744073709551615
ide_cd_do_request: dev hda: type=a, flags=108a640
  sector 18446744073709551615, nr/cnr 0/0
  bio (null), biotail (null), buffer (null), data ffff88011b849ba0, len 32
ide-cd: cdrom_do_block_pc: rq->cmd[0]: 0x51, rq->cmd_type: 0xa
ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0

DRQ set, read 30 bytes

ide-cd: cdrom_newpc_intr: DRQ: stat: 0x58, thislen: 30
ide-cd: ide_cd_check_ireason: ireason: 0x2, rw: 0x0
ide-cd: cdrom_newpc_intr: data transfer, rq->cmd_type: 0xa, ireason: 0x2

xfer 30 bytes and rearm IRQ handler with a timeout

ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0

second IRQ, DRQ is cleared, read 2 bytes

ide-cd: cdrom_newpc_intr: DRQ: stat: 0x50, thislen: 2
ide-cd: ide_cd_request_sense_fixup: rq->cmd[0]: 0x51
#3

here we do "goto out_end" instead of xferring the remaining 2 bytes
out of the drive. So how about something like the workaround below, I
haven't tested it on the real drive. Wrt to the ide_cd_error_cmd(), I
guess the proper fix is to check the rq->bio thing because the rq is
going to be ended anyways.

@Hans: can you please test applying it ontop of the debugging patch and
sending the output again, thanks.


---

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 6f728da..716a576 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -680,10 +680,19 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 			}
 		} else if (!blk_pc_request(rq)) {
 			ide_cd_request_sense_fixup(drive, cmd);
+
 			/* complain if we still have data left to transfer */
 			uptodate = cmd->nleft ? 0 : 1;
-			if (uptodate == 0)
-				rq->cmd_flags |= REQ_FAILED;
+			if (uptodate == 0) {
+				if (drive->atapi_flags & IDE_AFLAG_FRAG_CMD) {
+					ide_pio_bytes(drive, cmd, write,
+						      cmd->nleft);
+					uptodate = (cmd->nleft - thislen) ? 0
+									  : 1;
+				}
+				if (!uptodate)
+					rq->cmd_flags |= REQ_FAILED;
+			}
 		}
 		pr_err("#3\n");
 		goto out_end;
@@ -764,7 +773,7 @@ out_end:
 				rq->errors = -EIO;
 		}
 
-		if (uptodate == 0)
+		if (uptodate == 0 && rq->bio)
 			ide_cd_error_cmd(drive, cmd);
 
 		/* make sure it's fully ended */
@@ -1554,6 +1563,11 @@ static const struct cd_list_entry ide_cd_quirks_list[] = {
 	{ "Optiarc DVD RW AD-7200A", NULL,   IDE_AFLAG_PLAY_AUDIO_OK	     },
 	{ "Optiarc DVD RW AD-7543A", NULL,   IDE_AFLAG_NO_AUTOCLOSE	     },
 	{ "TEAC CD-ROM CD-224E",     NULL,   IDE_AFLAG_NO_AUTOCLOSE	     },
+	/*
+	 * some drives clear the DRQ bit on the last word of a packet command
+	 * and therefore need some special handling in the IRQ handler
+	 */
+	{ "SONY DVD-ROM DDU1615",    NULL,   IDE_AFLAG_FRAG_CMD		     },
 	{ NULL, NULL, 0 }
 };
 
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 9fed365..e18b455 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -456,6 +456,9 @@ enum {
 	IDE_AFLAG_MEDIUM_PRESENT	= (1 << 23),
 
 	IDE_AFLAG_NO_AUTOCLOSE		= (1 << 24),
+
+	/* ide-cd quirk */
+	IDE_AFLAG_FRAG_CMD		= (1 << 25),
 };
 
 /* device flags */
Comment 26 Hans de Bruin 2009-06-15 16:20:41 UTC
Created attachment 21925 [details]
debug2 dmesg (comment 25)
Comment 27 Hans de Bruin 2009-06-15 16:23:34 UTC
cutting and pasting from bugzilla somehow didn't provide a clean diff file, so I had to cut and past directly into the code. Have fun with the extensive dmesg :)

Hans
Comment 28 Borislav Petkov 2009-06-16 06:28:27 UTC
Hi,

ok, seems like you can bootup normally. I see even some read commands
which means that the drive might be working ok but you'll have to do
some more testing.

We need to see whether there are still any glitches. Just remove the
debugging patch and apply the fix. By the way, I've uploaded it to
bugzilla so that you don't have to copy-paste. Just do some audio
playing/dvd video/data cdrom accesses/whatever else use case comes to
your mind and collect the whole bootlog again and send it on, please.

Thanks,
Boris.
Comment 29 Borislav Petkov 2009-06-16 06:29:39 UTC
Created attachment 21930 [details]
tentative fix
Comment 30 Hans de Bruin 2009-06-16 15:45:59 UTC
hda: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
hda: media error (bad sector): error=0x30 { LastFailedSense=0x03 }
hda: possibly failed opcode: 0xa0
ATAPI device hda:
  Error: Medium error -- (Sense key=0x03)
  (reserved error code) -- (asc=0x02, ascq=0x00)
  The failed "Read 10" packet command was: 
  "28 00 00 00 76 78 00 00 40 00 00 00 00 00 00 00 "
end_request: I/O error, dev hda, sector 121312
Buffer I/O error on device hda, logical block 30328
Buffer I/O error on device hda, logical block 30329
Buffer I/O error on device hda, logical block 30330
Buffer I/O error on device hda, logical block 30331
Buffer I/O error on device hda, logical block 30332
Buffer I/O error on device hda, logical block 30333
Buffer I/O error on device hda, logical block 30334
Buffer I/O error on device hda, logical block 30335
Buffer I/O error on device hda, logical block 30336
Buffer I/O error on device hda, logical block 30337

Just kidding a CD and a whiteboard marker apparently don't mix :). I can boot whith a CD, create iso's from CD's and DVD's and copy at file level. Since this is my at work pc I dont have musical CD's at hand.

Where did this problem start? I could still use cd's in 2.6.29 but in a slackware newsgroup I found this threat: http://groups.google.com/group/alt.os.linux.slackware/browse_thread/thread/1add287bea50b0bc#
slack. Slack current uses 2.6.29.4

-- 
Hans
Comment 31 Borislav Petkov 2009-06-18 08:19:14 UTC
Hi,

(In reply to comment #30)
> hda: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
> hda: media error (bad sector): error=0x30 { LastFailedSense=0x03 }
> hda: possibly failed opcode: 0xa0
> ATAPI device hda:
>   Error: Medium error -- (Sense key=0x03)
>   (reserved error code) -- (asc=0x02, ascq=0x00)
>   The failed "Read 10" packet command was: 
>   "28 00 00 00 76 78 00 00 40 00 00 00 00 00 00 00 "
> end_request: I/O error, dev hda, sector 121312
> Buffer I/O error on device hda, logical block 30328
> Buffer I/O error on device hda, logical block 30329
> Buffer I/O error on device hda, logical block 30330
> Buffer I/O error on device hda, logical block 30331
> Buffer I/O error on device hda, logical block 30332
> Buffer I/O error on device hda, logical block 30333
> Buffer I/O error on device hda, logical block 30334
> Buffer I/O error on device hda, logical block 30335
> Buffer I/O error on device hda, logical block 30336
> Buffer I/O error on device hda, logical block 30337
> 
> Just kidding a CD and a whiteboard marker apparently don't mix :). 

You funny guy :)

> I can boot whith a CD, create iso's from CD's and DVD's and copy at
> file level. Since this is my at work pc I dont have musical CD's at
> hand.

Thanks for testing.

> Where did this problem start? I could still use cd's in
> 2.6.29 but in a slackware newsgroup I found this threat:
> http://groups.google.com/group/alt.os.linux.slackware/browse_thread/th
> read/1add287bea50b0bc# slack. Slack current uses 2.6.29.4

The fix might need backporting. If you like, you could test latest
stable kernels (2.6.29.5, maybe even 2.6.28.10) and send me the bootlogs
of those so that we could verify that the fix is needed. That'll be of
great help :).

Thanks,
Boris.
Comment 32 Stan Senuta Jr. 2009-06-18 12:12:57 UTC
I am also experiencing system lockups whenever I insert a formated dvd or cd into my "hdd" ro drive. A blank dics doesn't cause the system to freeze. This happens when running X windows & running just init 3 on Mandriva-2009.1 with the EXT4 filesystem & kernel-2.6.30. 

I have another dvd combo rw drive labeled "hdc" in this same "ASUS" box  & it works ok. Here is the only error output I was able to see when accessing drive hdd:

  "kernel panic not syncing Fatal inception interrupt"
  "pid 1289 comm: modprobe tainted: G   D  2.6.30 #1"

Also, I still have kernel-2.6.29 installed in this box & it handles both the "hdc" & "hdd" cd drives without problem.

I hope this helps.
Comment 33 Borislav Petkov 2009-06-18 13:36:37 UTC
(In reply to comment #32)
> I am also experiencing system lockups whenever I insert a formated dvd
> or cd into my "hdd" ro drive. A blank dics doesn't cause the system to
> freeze. This happens when running X windows & running just init 3 on
> Mandriva-2009.1 with the EXT4 filesystem & kernel-2.6.30.

Is that a vanilla kernel or a distro-patched one? Please supply the
whole boot log of the machine. Also, can you catch any oopses over
serial console or netconsole, that'll help us a lot.

> I have another dvd combo rw drive labeled "hdc" in this same "ASUS" box  & it
> works ok. Here is the only error output I was able to see when accessing drive
> hdd:
> 
>   "kernel panic not syncing Fatal inception interrupt"
>   "pid 1289 comm: modprobe tainted: G   D  2.6.30 #1"

If you've copied the line correctly, the D shows that this is part of a
follow-up OOPS. However, the first one is actually the interesting one.
Alternatively to the serial or netconsole methods above, you could try
to make a photo of your screen with any camera and send me that. Please
try to catch the whole OOPS, though.

Thanks,
Boris.
Comment 34 Hans de Bruin 2009-06-18 16:22:25 UTC
There are no problems with 2.6.29.5, for me that is.
Comment 35 Stan Senuta Jr. 2009-06-19 04:31:04 UTC
Created attachment 22003 [details]
Screen photo of dvd-rom crash
Comment 36 Stan Senuta Jr. 2009-06-19 04:35:33 UTC
Created attachment 22004 [details]
My Mandrivia-2009.1 /var/log/dmesg (boot.log)
Comment 37 Stan Senuta Jr. 2009-06-19 04:37:25 UTC
I compiled kernel-2.6.30 from the tarball using the .config file from my kernel-2.6.29 & the "Make olddconfig" option. 
Kernel-2.6.29 was also compiled from source. I don't like Mandriva kernels because they require you to use an initrd.

The only new configuration option I added to kernel-2.6.30  was to keep the default journal_data=ordered. I did this
because I also run mandriva-2008.0 EXT3 on this "ASUS" box & It would only boot to ro failsafe mode unless I did this. 
I discovered that w/o journal_data=ordered, the rootfs is 1st mounted ro with journal_data=writeback & then couldn't 
get "re-mounted" rw with journal_data=ordered because changing the journal at re-mount isn't permitted by the kernel.

Another oddity I ran into was that kernel-2.6.30  built under Mandriva-2009.1 EXT4 was not usable under an EXT3 filesystem. 
I had to compile kernel-2.6.30 for EXT3 Mandriva-2008.0 seperately or no modules would load.  

I am attaching a photo (crash.jpeg) of my init 3 screen after I inserted a dvd-rw disc formated iso9660  & it crashed. 
Please note, I didn't even attempt to mount this disc. The DVD_ROM drive "HDD" just crashes the system when a formated disc 
or movie disc is detected.

I also am attaching a copy of my dmesg file because /var/log/boot.log is an empty file on my system. I hope that will help.

______________________________________________________________________________________________
Here is my partition setup:

Disk /dev/hda: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier: 0x06a2e5a5

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1       58920    29695648+  83  Linux
/dev/hda2           58921      118855    30207240    5  Extended
/dev/hda3          118856      155061    18247824   83  Linux
/dev/hda5           58921       59935      511528+  82  Linux swap / Solaris
/dev/hda6           59936      118855    29695648+  83  Linux

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x6fb3f3d0

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1           7       56196   83  Linux
/dev/sda2               8        7841    62926605   83  Linux
/dev/sda3            7842       15936    65023087+   5  Extended
/dev/sda4           15937       30401   116190112+  83  Linux
/dev/sda5            7842        8103     2104483+  82  Linux swap / Solaris
/dev/sda6            8104       15936    62918541   83  Linux

[root@localhost stan]# df -T
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda2     ext4     60G  4.2G   52G   8% /         
/dev/sda1     ext3     54M   13M   39M  24% /boot     
/dev/sda6     ext4     60G  3.1G   53G   6% /usr      
/dev/hdc   iso9660    2.4G  2.4G     0 100% /mnt/misc
Comment 38 Borislav Petkov 2009-06-22 06:37:43 UTC
(In reply to comment #34)
> There are no problems with 2.6.29.5, for me that is.

Thanks for testing.
Comment 39 Borislav Petkov 2009-06-22 08:03:36 UTC
Hi Stan,

can you try the attached patch?

Thanks,
Boris.
Comment 40 Borislav Petkov 2009-06-22 08:04:08 UTC
Created attachment 22042 [details]
ide-cd: prevent null pointer deref via cdrom_newpc_intr
Comment 41 Stan Senuta Jr. 2009-06-22 15:23:48 UTC
Created attachment 22046 [details]
image of my kernel-2.6.30 ide-cd.c file
Comment 42 Stan Senuta Jr. 2009-06-22 15:25:49 UTC
Please explain how to apply patch. I downloaded 0001-ide-cd-prevent-null-pointer-deref-via-cdrom_newpc_in.patch & saved existing linux-2.6.30 to linux-2.6.30-orig but now what? 

Also, I noted that your patch-ide-cd/diff-attachment.cgi.html shows lines 667-673 are to be changed in the ide-cd.c file, ..but my kernel-2.6.30 ide-cd.c file has that data on lines 758-764. Will that make the patch fail?

I have attached a snapshot1.png file showing my kernel-2.6.30 ide-cd.c file.

Sorry, but I know very little about applying patches. I do have the /usr/bin/patch program though if that is what's needed.
Comment 43 Borislav Petkov 2009-06-22 15:49:03 UTC
(In reply to comment #42)
> Please explain how to apply patch. I downloaded
> 0001-ide-cd-prevent-null-pointer-deref-via-cdrom_newpc_in.patch & saved
> existing linux-2.6.30 to linux-2.6.30-orig but now what? 
> 
> Also, I noted that your patch-ide-cd/diff-attachment.cgi.html shows lines
> 667-673 are to be changed in the ide-cd.c file, ..but my kernel-2.6.30 ide-cd.c
> file has that data on lines 758-764. Will that make the patch fail?
> 
> I have attached a snapshot1.png file showing my kernel-2.6.30 ide-cd.c file.
> 
> Sorry, but I know very little about applying patches. I do have the
> /usr/bin/patch program though if that is what's needed.

Ah, let's do the thing by foot then since it is easier:

Open drivers/ide/ide-cd.c and locate the following lines (as you've done
already in the screenshot):

756                 } else {
757                         if (uptodate <= 0 && rq->errors == 0)
758                                 rq->errors = -EIO;
759                 }
760
761                 if (uptodate == 0)
762                         ide_cd_error_cmd(drive, cmd);
763
764                 /* make sure it's fully ended */

and change the 761th line from

761                 if (uptodate == 0)

to

761                 if (uptodate == 0 && rq->bio)

and then recompile and install your kernel as you normally do. Then boot
test it to see whether it works. That's it :).

Thanks,
Boris.
Comment 44 Stan Senuta Jr. 2009-06-23 04:07:35 UTC
I am pleased to report that your kernel-2.6.30 patch (0001-ide-cd-prevent-null-pointer-deref-via-cdrom_newpc_in.patch) 
fixed the crash problem when accessing my /dev/hdd dvd-rom drive. I was able to mount a dvd-rw disc & here is the proof:

[root@localhost stan]# mount -t iso9660 /dev/hdd /mnt/misc
mount: block device /dev/hdd is write-protected, mounting read-only
[root@localhost stan]# df -T
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda2     ext4     60G  4.3G   52G   8% /
/dev/sda1     ext3     54M   16M   36M  31% /boot
/dev/sda6     ext4     60G  3.1G   53G   6% /usr
/dev/hdd   iso9660    400K  400K     0 100% /mnt/misc

[root@localhost stan]# mount -t iso9660 /dev/hdc /mnt/cddrive
[root@localhost stan]# df -T
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda2     ext4     60G  4.3G   52G   8% /
/dev/sda1     ext3     54M   16M   36M  31% /boot
/dev/sda6     ext4     60G  3.1G   53G   6% /usr
/dev/hdd   iso9660    400K  400K     0 100% /mnt/misc
/dev/hdc   iso9660    350K  350K     0 100% /mnt/cddrive

I edited the kernel-2.6.30 "ide-cd.c" file as instructed to apply the patch & then ran the following 
commands to re-compile my kernel:

      make mrproper
      cp .config-save .config
      make oldconfig
      make
      make modules_install
 
My  "new"  /lib/modules/2.6.30/drivers/ide/ide-cd_mod.ko showed file size = 35133 bytes.
The "orig" /lib/modules/2.6.30/drivers/ide/ide-cd_mod.ko showed file size = 35125 bytes.

I also inserted a commercial video movie disc & a cd-rw disc in problem drive /dev/hdd.
Everything seems too work fine. Thank you.
Comment 45 Borislav Petkov 2009-06-23 05:37:01 UTC
Sounds cool, thanks for reporting and testing for us. Closing.

Boris.
Comment 46 Kai Wasserbäch 2009-06-25 10:02:47 UTC
*** Bug 13619 has been marked as a duplicate of this bug. ***
Comment 47 Bernardo Dal Seno 2009-07-03 12:09:49 UTC
*** Bug 13696 has been marked as a duplicate of this bug. ***
Comment 48 David Woodhouse 2009-09-10 12:03:04 UTC
I'm seeing a crash which looks very similar to this, in 2.6.31.
Comment 49 David Woodhouse 2009-09-10 12:23:24 UTC
http://david.woodhou.se/cimg2777.jpg

(gdb) list *cdrom_newpc_intr + 0x72b
0xe56 is in cdrom_newpc_intr (drivers/ide/ide-cd.c:692).
687					rq->resid_len += cmd->last_xfer_len;
688			}
689	
690			ide_complete_rq(drive, uptodate ? 0 : -EIO, blk_rq_bytes(rq));
691	
692			if (sense && rc == 2)
693				ide_error(drive, "request sense failure", stat);
694		}
695		return ide_stopped;
696	}

(gdb) list *ide_complete_rq+0x1f
0xffffffff812f0cd6 is in ide_complete_rq (drivers/ide/ide-io.c:125).
120	
121		/*
122		 * if failfast is set on a request, override number of sectors
123		 * and complete the whole request right now
124		 */
125		if (blk_noretry_request(rq) && error <= 0)
126			nr_bytes = blk_rq_sectors(rq) << 9;
127	
128		rc = ide_end_rq(drive, rq, error, nr_bytes);
129		if (rc == 0)
(gdb) list *ide_complete_rq+0x37
0xffffffff812f0cee is in ide_complete_rq (drivers/ide/ide-io.c:129).
124		 */
125		if (blk_noretry_request(rq) && error <= 0)
126			nr_bytes = blk_rq_sectors(rq) << 9;
127	
128		rc = ide_end_rq(drive, rq, error, nr_bytes);
129		if (rc == 0)
130			hwif->rq = NULL;
131	
132		return rc;
133	}
Comment 50 David Woodhouse 2009-09-10 13:19:47 UTC
More useful backtrace at http://david.woodhou.se/cimg2780.jpg

It looks like hwif->rq is NULL when ide_complete_rq() runs.  And then it oopses when looking at rq->cmd_flags.

If I put in a quick hack to avoid calling id_complete_rq() if !drive->hwif->rq, then it no longer oopses. I get this:

[ 1091.428321] would oops. rq->bio (null)
[ 1091.428323] hdb: request sense failure: status=0x51 { DriveReady SeekComplete Error }
[ 1091.428329] hdb: request sense failure: error=0x20 { LastFailedSense=0x02 }
[ 1091.428332] hdb: possibly failed opcode: 0xa0
Comment 51 Borislav Petkov 2009-09-11 05:45:41 UTC
Created attachment 23062 [details]
ide-cd full debug
Comment 52 Borislav Petkov 2009-09-11 05:49:29 UTC
Hi David,

dang, seems we haven't fixed it properly, 39c58f37a10198054c656c28202fb1e6d22fd505 was supposed to do that.

Can you please apply http://bugzilla.kernel.org/attachment.cgi?id=23062 and get me the whole dmesg.

Thanks,
Boris.
Comment 53 David Woodhouse 2009-09-11 15:04:21 UTC
Created attachment 23073 [details]
dmesg

That patch still lets drive->debug_mask get overridden from the module parameter, shortly after you manually set it to 0xffffffff. With that fixed, here's the output.
Comment 54 Borislav Petkov 2009-09-17 06:39:17 UTC
Created attachment 23110 [details]
testfix1
Comment 55 Borislav Petkov 2009-09-17 06:46:57 UTC
(In reply to comment #53)

Hi David,

sorry for the delay but I was very busy at work.

> Created an attachment (id=23073) [details]
> dmesg
> 
> That patch still lets drive->debug_mask get overridden from the module
> parameter, shortly after you manually set it to 0xffffffff. With that fixed,
> here's the output.

Yeah, this _was_ work in progress and is not the final ide stage we
aimed at but since we're not allowed to hack on it anymore, we'll have
to live with that, sorry.

Ok, if I'm not mistaken, it seems that your drive gets a bunch
of ioctls fired at it and it cannot exec even one command
successfully and slaps a sense request after each one. At oops
time, the sense request itself fails and it gets completed twice
therefore the hwif->rq being NULL. Can you try the testfix1
(http://bugzilla.kernel.org/attachment.cgi?id=23110) above to see
whether this fixes it?

Thanks.
Comment 56 David Woodhouse 2009-09-17 06:57:37 UTC
Thanks. I'm just about to leave for the Plumbers conference; it'll be a few days before I manage to test the patch.
Comment 57 David Woodhouse 2009-09-26 06:09:57 UTC
Created attachment 23183 [details]
dmesg

It's still not happy...
Comment 58 Borislav Petkov 2009-09-27 16:19:26 UTC
Ok, my fix addresses at least the sense handling case, however, your
drive is still unresponsive and fails almost every command. I'm assuming
the drive has been functional at one point in time; if so, can you
please bisect the kernel - it might give us more insight into the failure.

Thanks,
Boris.
Comment 59 David Woodhouse 2009-09-27 16:52:53 UTC
To be honest, I don't know that it ever worked. I've always used libata on this machine and using drivers/ide for my test build of 2.6.31 wasn't intentional. When I get home next week, I'll attempt to find a good kernel and then bisect.
Comment 60 Alan 2010-01-19 21:49:57 UTC
Ping ?
Comment 61 David Woodhouse 2010-01-19 22:52:30 UTC
When I get home next week, I'll (try to remember to) build another test kernel with drivers/ide, and see what happens. Poke me again if I forget again; sorry.
Comment 62 Florian Mickler 2010-11-30 08:54:10 UTC
*poke*
Comment 63 Florian Mickler 2011-02-06 15:45:13 UTC
I'm closing this as fixed, as the problem of the original reporter and the second problem of the next one both got fixed. 

If you come up with anything new just shout. 


merged in v2.6.31-rc1:
commit 39c58f37a10198054c656c28202fb1e6d22fd505
Author: Rainer Weikusat <rweikusat@mssgmbh.com>
Date:   Thu Jun 18 17:04:00 2009 +0200

    ide-cd: prevent null pointer deref via cdrom_newpc_intr


merged in v2.6.31-rc2:
commit 9c72ebef5aabf3532469d602a9d87beceea268b1
Author: Borislav Petkov <petkovbb@gmail.com>
Date:   Fri Jun 26 11:22:37 2009 -0700

    ide-cd: handle fragmented packet commands gracefully

Note You need to log in before you can comment on or make changes to this bug.