Created attachment 21602 [details] dump of crash at boottime After placing a cd in the dvdplayer the kernel crashes. Both at boot as at runtime.
Created attachment 21603 [details] .config
Hi, please do on the same unchanged kernel sources that produce the crash: objdump -d drivers/ide/ide-cd.o > ide-cd.dsm make drivers/ide/ide-cd.s and send me .dsm and .s files. Thanks, Boris.
Boris, Did you reseive the files I sent a couple of day ago? -- Hans
Hi Hans, no, I don't see them in my mailbox. Can you please upload them to bugzilla instead ("Add an attachment" above) for everyone to see? Thanks, Boris.
Created attachment 21764 [details] new dmesg
Created attachment 21765 [details] ide-cd.dsm
Created attachment 21766 [details] ide-cd.s
Created attachment 21779 [details] debug patch
Hi, can you apply the attached debug patch and send me the whole dmesg output again. Also, do objdump -d drivers/ide/ide-cd.o > ide-cd.dsm make drivers/ide/ide-cd.s objdump -d drivers/ide/ide-io.o > ide-io.dsm make drivers/ide/ide-io.s and send me the .dsm and .s files. Thanks, Boris.
Created attachment 21811 [details] debug1 dmesg
Created attachment 21812 [details] debug1 ide-cd.dsm
Created attachment 21813 [details] debug1 ide-cd.s
Created attachment 21814 [details] debug1 ide-io.dsm
Created attachment 21815 [details] debug1 ide-io.s
Hi, thanks for testing. Looks like your debug1 dmesg is incomplete. Please try catching the the complete dmesg. For that, try enlarging the kernel printk buffer with the command line param 'log_buf_len=10M' in case the dmesg looks truncated. Although, it doesn't seem so - it's more a 'dmesg -s<bufsize>' thing. Experiment with that in order to dump the whole buffer contents if the printk buffer is big enough but the dmesg tool itself is using a smaller one. Thanks, Boris.
Maybe dmesg is not a nice description of the files. I had to capture the dump over the com port. wil log_buff_len help in this case? Hans
Sorry for misleading you - I saw the 'console=uart..' part in the command line only after hitting send. In that case, dmesg is not the thing we need at all. Just add 'log_buf_len=10M' to your kernel command line, boot with it and make sure the output caught over the serial console is complete or at least it has caught the complete OOPS. You could also add 'console=tty0' to the kernel command line so that you can monitor the oopsing machine output and compare it with what has gone over through the com port. Thanks, Boris.
Created attachment 21831 [details] debug1 dmesg
On Tuesday 09 June 2009 19:12:18 bugzilla-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=13399 > > > > > > --- Comment #18 from Hans de Bruin <bruinjm@xs4all.nl> 2009-06-09 17:12:18 > --- > Created an attachment (id=21831) > --> (http://bugzilla.kernel.org/attachment.cgi?id=21831) > debug1 dmesg Hans, thanks for the debug data! ide-cd: ide_cd_queue_pc: cmd[0]: 0x51, write: 0x0, timeout: 1750, cmd_flags: 0x8000 ide-cd: ide_cd_do_request: cmd: 0x51, block: 18446744073709551615 ide_cd_do_request: dev hda: type=a, flags=108a640 sector 18446744073709551615, nr/cnr 0/0 bio (null), biotail (null), buffer (null), data ffff88011b849ba0, len 2 ide-cd: cdrom_do_block_pc: rq->cmd[0]: 0x51, rq->cmd_type: 0xa ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0 ide-cd: cdrom_newpc_intr: DRQ: stat: 0x58, thislen: 2 ide-cd: ide_cd_check_ireason: ireason: 0x2, rw: 0x0 ide-cd: cdrom_newpc_intr: data transfer, rq->cmd_type: 0xa, ireason: 0x2 ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0 ide-cd: cdrom_newpc_intr: DRQ: stat: 0x50, thislen: 0 ide-cd: ide_cd_request_sense_fixup: rq->cmd[0]: 0x51 #3 ide-cd: ide_cd_queue_pc: cmd[0]: 0x51, write: 0x0, timeout: 1750, cmd_flags: 0x8000 ide-cd: ide_cd_do_request: cmd: 0x51, block: 18446744073709551615 ide_cd_do_request: dev hda: type=a, flags=108a640 sector 18446744073709551615, nr/cnr 0/0 bio (null), biotail (null), buffer (null), data ffff88011b849ba0, len 32 ide-cd: cdrom_do_block_pc: rq->cmd[0]: 0x51, rq->cmd_type: 0xa ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0 ide-cd: cdrom_newpc_intr: DRQ: stat: 0x58, thislen: 30 ide-cd: ide_cd_check_ireason: ireason: 0x2, rw: 0x0 ide-cd: cdrom_newpc_intr: data transfer, rq->cmd_type: 0xa, ireason: 0x2 ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0 ide-cd: cdrom_newpc_intr: DRQ: stat: 0x50, thislen: 2 ide-cd: ide_cd_request_sense_fixup: rq->cmd[0]: 0x51 #3 BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 The above command sequence corresponds to requests generated by drivers/cdrom/cdrom.c::cdrom_get_disc_info() /* set up command and get the disc info */ init_cdrom_command(&cgc, di, sizeof(*di), CGC_DATA_READ); cgc.cmd[0] = GPCMD_READ_DISC_INFO; cgc.cmd[8] = cgc.buflen = 2; cgc.quiet = 1; if ((ret = cdo->generic_packet(cdi, &cgc))) return ret; /* not all drives have the same disc_info length, so requeue * packet with the length the drive tells us it can supply */ buflen = be16_to_cpu(di->disc_information_length) + sizeof(di->disc_information_length); if (buflen > sizeof(disc_information)) buflen = sizeof(disc_information); cgc.cmd[8] = cgc.buflen = buflen; if ((ret = cdo->generic_packet(cdi, &cgc))) return ret; and we are hitting BUG because of the limitation of block layer, only rq->bio requests can be partially completed: int __blk_end_request(struct request *rq, int error, unsigned int nr_bytes) { if (rq->bio && __end_that_request_first(rq, error, nr_bytes)) return 1; add_disk_randomness(rq->rq_disk); end_that_request_last(rq, error); return 0; } so on error we complete the request fully instead of partially and later we try to complete the already freed request. Borislav, there are actually two issues here to investigate: * seems like we need to check for rq->bio before calling ide_cd_error_cmd() to make the regression go away (IOW we shouldn't don't do partial "good" completions for !rq->bio requests) * potentially revise our strategy of handling GPCMD_READ_DISC_INFO packet command returning less data than requested (in this particular case we ask for 32 bytes, device gives us only 30 bytes but returns good status)
Hi Bart, thanks for analyzing this. I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr: } else if (!blk_pc_request(rq)) { ide_cd_request_sense_fixup(drive, cmd); /* complain if we still have data left to transfer */ uptodate = cmd->nleft ? 0 : 1; if (uptodate == 0) rq->cmd_flags |= REQ_FAILED; } goto out_end; } so, in our case ide_cd_error_cmd() kills the rq prematurely and that's why ide_complete_rq() oopses later. And this is caused by uptodate == 0. Now, here's how the ATA spec (d1532v1r4b-ATA-ATAPI-7) describes the semantics of clearing of the DRQ bit by the drive: " 5.14.5.5 DRQ (Data request) ... The DRQ bit shall be cleared to zero by the device: 1) when the last word of the data transfer occurs; 2) when the last word of the command packet transfer occurs for a PACKET command. " now there's a subtlety here wrt to what am I to do as an IRQ handler when my drive clears the DRQ bit: do I _drain_ the last bytes remaining (in our case 2) or do I fail the rq straightaway. I'm pretty sure cmd->nleft is 2 in our case so I think that it might be only right to drain the device first, i.e. do uptodate = (cmd->nleft - thislen) ? 0 : 1; and then later ide_pio_bytes() before completing the rq properly. Hmm? And yes, this is against spec since the following sentence states that the data can be drained only "... via DMA mode if DMARQ and DMACK- are asserted and BSY is set to one." but we should give it a try... Ideas? Opinions?
On Saturday 13 June 2009 18:29:07 bugzilla-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=13399 > > > > > > --- Comment #20 from Borislav Petkov <bbpetkov@yahoo.de> 2009-06-13 16:29:05 > --- > Hi Bart, > > thanks for analyzing this. > > I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr: > > } else if (!blk_pc_request(rq)) { > ide_cd_request_sense_fixup(drive, cmd); > /* complain if we still have data left to transfer */ > uptodate = cmd->nleft ? 0 : 1; > if (uptodate == 0) > rq->cmd_flags |= REQ_FAILED; > } > goto out_end; > } > > so, in our case ide_cd_error_cmd() kills the rq prematurely and that's > why ide_complete_rq() oopses later. And this is caused by uptodate == Nope, it is block layer that kills it prematurely. There are two issues here: * OOPS (*the*regression* which should be taken care of first) (cause: unexpected interaction between ide/block) * handling of non-fully completed requests with "good" status (cause: stupid hardware) > 0. Now, here's how the ATA spec (d1532v1r4b-ATA-ATAPI-7) describes the > semantics of clearing of the DRQ bit by the drive: > > " > 5.14.5.5 DRQ (Data request) > > ... > > The DRQ bit shall be cleared to zero by the device: > 1) when the last word of the data transfer occurs; > 2) when the last word of the command packet transfer occurs for a > PACKET command. > " > > now there's a subtlety here wrt to what am I to do as an IRQ handler > when my drive clears the DRQ bit: do I _drain_ the last bytes remaining > (in our case 2) or do I fail the rq straightaway. I'm pretty sure > cmd->nleft is 2 in our case so I think that it might be only right to > drain the device first, i.e. do > > uptodate = (cmd->nleft - thislen) ? 0 : 1; > > and then later ide_pio_bytes() before completing the rq properly. Hmm? We should drain only if the device really wants to transfer more data... cmd->nleft is our concept, thislen reflects the hardware state. > And yes, this is against spec since the following sentence states that > the data can be drained only "... via DMA mode if DMARQ and DMACK- are > asserted and BSY is set to one." but we should give it a try... > > Ideas? Opinions? Yes. Please stop reading this stupid ATAPI spec, it is near to worthless when it comes to the real hardware implementations/bugs..
Reply-To: petkovbb@googlemail.com Hi, On Sat, Jun 13, 2009 at 06:59:53PM +0200, Bartlomiej Zolnierkiewicz wrote: > > --- Comment #20 from Borislav Petkov <bbpetkov@yahoo.de> 2009-06-13 > 16:29:05 --- > > Hi Bart, > > > > thanks for analyzing this. > > > > I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr: > > > > } else if (!blk_pc_request(rq)) { > > ide_cd_request_sense_fixup(drive, cmd); > > /* complain if we still have data left to transfer > */ > > uptodate = cmd->nleft ? 0 : 1; > > if (uptodate == 0) > > rq->cmd_flags |= REQ_FAILED; > > } > > goto out_end; > > } > > > > so, in our case ide_cd_error_cmd() kills the rq prematurely and that's > > why ide_complete_rq() oopses later. And this is caused by uptodate == > > Nope, it is block layer that kills it prematurely. ok, I still need to understand the whole code flow properly so please bare with me. I got misled by the __blk_end_request() thing: am I right to assume that you were using it to give a better example where it is more clear that the block layer really kills rq->bio-less requests? Because we don't hit __blk_end_request from ide_cd_error_cmd() (or ide_complete_rq() too, for that matter) - we do rather: ide_cd_error_cmd() does ide_complete_rq(), which ends up doing blk_end_request(), then blk_end_io() and the rq->bio thing is checked in end_that_request_data(). Which is actually the same thing but done slightly differently. > There are two issues here: > > * OOPS (*the*regression* which should be taken care of first (cause: > unexpected interaction between ide/block) I'm thinking something like if (uptodate == 0 && !(OK_STAT(stat, 0, BAD_STAT))) ide_cd_error_cmd(drive, cmd); > * handling of non-fully completed requests with "good" status (cause: > stupid hardware) The non-intrusive solution, IMHO, would be to add another quirk flag for such a device (SONY DVD-ROM DDU1615) and do not complete the rq in that case, aka no partial completion for packet commands to that device. I'm wondering what else is broken with it especially if you're requiring bigger buffers like the capacities page. So, how about an ide_cd_complete_rq() helper which hides such special cases and is called as an indirection at the end of the irq handler?
On Sunday 14 June 2009 12:06:06 Borislav Petkov wrote: > Hi, > > On Sat, Jun 13, 2009 at 06:59:53PM +0200, Bartlomiej Zolnierkiewicz wrote: > > > --- Comment #20 from Borislav Petkov <bbpetkov@yahoo.de> 2009-06-13 > 16:29:05 --- > > > Hi Bart, > > > > > > thanks for analyzing this. > > > > > > I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr: > > > > > > } else if (!blk_pc_request(rq)) { > > > ide_cd_request_sense_fixup(drive, cmd); > > > /* complain if we still have data left to > transfer */ > > > uptodate = cmd->nleft ? 0 : 1; > > > if (uptodate == 0) > > > rq->cmd_flags |= REQ_FAILED; > > > } > > > goto out_end; > > > } > > > > > > so, in our case ide_cd_error_cmd() kills the rq prematurely and that's > > > why ide_complete_rq() oopses later. And this is caused by uptodate == > > > > Nope, it is block layer that kills it prematurely. > > ok, I still need to understand the whole code flow properly so please > bare with me. > > I got misled by the __blk_end_request() thing: am I right to assume that > you were using it to give a better example where it is more clear that > the block layer really kills rq->bio-less requests? > > Because we don't hit __blk_end_request from ide_cd_error_cmd() (or > ide_complete_rq() too, for that matter) - we do rather: > > ide_cd_error_cmd() does ide_complete_rq(), which ends up doing > blk_end_request(), then blk_end_io() and the rq->bio thing is checked > in end_that_request_data(). Which is actually the same thing but done > slightly differently. Yes, you're seeing block layer updates from 2 days ago. > > There are two issues here: > > > > * OOPS (*the*regression* which should be taken care of first (cause: > > unexpected interaction between ide/block) > > I'm thinking something like > > if (uptodate == 0 && !(OK_STAT(stat, 0, BAD_STAT))) > ide_cd_error_cmd(drive, cmd); Those are two separate issues, please stop mixing them. AFAICS ide-cd has been always failing !uptodate requests so the latter issue is nothing new. Which means that it is really not the right time to be scratching our heads about it while the former problem has been seriously affecting people for weeks now and also managed to slip into the final 2.6.30 release.. > > * handling of non-fully completed requests with "good" status (cause: > > stupid hardware) > > The non-intrusive solution, IMHO, would be to add another quirk flag for > such a device (SONY DVD-ROM DDU1615) and do not complete the rq in that > case, aka no partial completion for packet commands to that device. I'm > wondering what else is broken with it especially if you're requiring > bigger buffers like the capacities page. > > So, how about an ide_cd_complete_rq() helper which hides such special > cases and is called as an indirection at the end of the irq handler? I can't tell much without seeing the concept with more details. IOW please send patches, that is the best way to discuss things.
Reply-To: petkovbb@googlemail.com Hi, On Sun, Jun 14, 2009 at 02:32:10PM +0200, Bartlomiej Zolnierkiewicz wrote: > On Sunday 14 June 2009 12:06:06 Borislav Petkov wrote: > > Hi, > > > > On Sat, Jun 13, 2009 at 06:59:53PM +0200, Bartlomiej Zolnierkiewicz wrote: > > > > --- Comment #20 from Borislav Petkov <bbpetkov@yahoo.de> 2009-06-13 > 16:29:05 --- > > > > Hi Bart, > > > > > > > > thanks for analyzing this. > > > > > > > > I'm staring at the ATA_DRQ == 0 part in cdrom_newpc_intr: > > > > > > > > } else if (!blk_pc_request(rq)) { > > > > ide_cd_request_sense_fixup(drive, cmd); > > > > /* complain if we still have data left to > transfer */ > > > > uptodate = cmd->nleft ? 0 : 1; > > > > if (uptodate == 0) > > > > rq->cmd_flags |= REQ_FAILED; > > > > } > > > > goto out_end; > > > > } > > > > > > > > so, in our case ide_cd_error_cmd() kills the rq prematurely and that's > > > > why ide_complete_rq() oopses later. And this is caused by uptodate == > > > > > > Nope, it is block layer that kills it prematurely. > > > > ok, I still need to understand the whole code flow properly so please > > bare with me. > > > > I got misled by the __blk_end_request() thing: am I right to assume that > > you were using it to give a better example where it is more clear that > > the block layer really kills rq->bio-less requests? > > > > Because we don't hit __blk_end_request from ide_cd_error_cmd() (or > > ide_complete_rq() too, for that matter) - we do rather: > > > > ide_cd_error_cmd() does ide_complete_rq(), which ends up doing > > blk_end_request(), then blk_end_io() and the rq->bio thing is checked > > in end_that_request_data(). Which is actually the same thing but done > > slightly differently. > > Yes, you're seeing block layer updates from 2 days ago. Nope, I'm staring at .30-rc7 sources. I guess those updates are a bit older. > > > There are two issues here: > > > > > > * OOPS (*the*regression* which should be taken care of first (cause: > > > unexpected interaction between ide/block) > > > > I'm thinking something like > > > > if (uptodate == 0 && !(OK_STAT(stat, 0, BAD_STAT))) > > ide_cd_error_cmd(drive, cmd); > > Those are two separate issues, please stop mixing them. > > AFAICS ide-cd has been always failing !uptodate requests so the latter > issue is nothing new. Which means that it is really not the right time > to be scratching our heads about it while the former problem has been > seriously affecting people for weeks now and also managed to slip into > the final 2.6.30 release.. Ok, let's do that: @Hans: can you apply the following ontop of the debugging patch and test and send us the whole dmesg, thanks. -- diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 6f728da..219c303 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -764,7 +764,7 @@ out_end: rq->errors = -EIO; } - if (uptodate == 0) + if (uptodate == 0 && rq->bio) ide_cd_error_cmd(drive, cmd); /* make sure it's fully ended */ -- > > > * handling of non-fully completed requests with "good" status (cause: > > > stupid hardware) > > > > The non-intrusive solution, IMHO, would be to add another quirk flag for > > such a device (SONY DVD-ROM DDU1615) and do not complete the rq in that > > case, aka no partial completion for packet commands to that device. I'm > > wondering what else is broken with it especially if you're requiring > > bigger buffers like the capacities page. > > > > So, how about an ide_cd_complete_rq() helper which hides such special > > cases and is called as an indirection at the end of the irq handler? > > I can't tell much without seeing the concept with more details. > > IOW please send patches, that is the best way to discuss things. Ok.
Reply-To: petkovbb@googlemail.com Hi, here's a yet another analysis: issue GPCMD_READ_DISC_INFO ide-cd: ide_cd_queue_pc: cmd[0]: 0x51, write: 0x0, timeout: 1750, cmd_flags: 0x8000 ide-cd: ide_cd_do_request: cmd: 0x51, block: 18446744073709551615 ide_cd_do_request: dev hda: type=a, flags=108a640 sector 18446744073709551615, nr/cnr 0/0 bio (null), biotail (null), buffer (null), data ffff88011b849ba0, len 32 ide-cd: cdrom_do_block_pc: rq->cmd[0]: 0x51, rq->cmd_type: 0xa ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0 DRQ set, read 30 bytes ide-cd: cdrom_newpc_intr: DRQ: stat: 0x58, thislen: 30 ide-cd: ide_cd_check_ireason: ireason: 0x2, rw: 0x0 ide-cd: cdrom_newpc_intr: data transfer, rq->cmd_type: 0xa, ireason: 0x2 xfer 30 bytes and rearm IRQ handler with a timeout ide-cd: cdrom_newpc_intr: cmd: 0x51, write: 0x0 second IRQ, DRQ is cleared, read 2 bytes ide-cd: cdrom_newpc_intr: DRQ: stat: 0x50, thislen: 2 ide-cd: ide_cd_request_sense_fixup: rq->cmd[0]: 0x51 #3 here we do "goto out_end" instead of xferring the remaining 2 bytes out of the drive. So how about something like the workaround below, I haven't tested it on the real drive. Wrt to the ide_cd_error_cmd(), I guess the proper fix is to check the rq->bio thing because the rq is going to be ended anyways. @Hans: can you please test applying it ontop of the debugging patch and sending the output again, thanks. --- diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 6f728da..716a576 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -680,10 +680,19 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive) } } else if (!blk_pc_request(rq)) { ide_cd_request_sense_fixup(drive, cmd); + /* complain if we still have data left to transfer */ uptodate = cmd->nleft ? 0 : 1; - if (uptodate == 0) - rq->cmd_flags |= REQ_FAILED; + if (uptodate == 0) { + if (drive->atapi_flags & IDE_AFLAG_FRAG_CMD) { + ide_pio_bytes(drive, cmd, write, + cmd->nleft); + uptodate = (cmd->nleft - thislen) ? 0 + : 1; + } + if (!uptodate) + rq->cmd_flags |= REQ_FAILED; + } } pr_err("#3\n"); goto out_end; @@ -764,7 +773,7 @@ out_end: rq->errors = -EIO; } - if (uptodate == 0) + if (uptodate == 0 && rq->bio) ide_cd_error_cmd(drive, cmd); /* make sure it's fully ended */ @@ -1554,6 +1563,11 @@ static const struct cd_list_entry ide_cd_quirks_list[] = { { "Optiarc DVD RW AD-7200A", NULL, IDE_AFLAG_PLAY_AUDIO_OK }, { "Optiarc DVD RW AD-7543A", NULL, IDE_AFLAG_NO_AUTOCLOSE }, { "TEAC CD-ROM CD-224E", NULL, IDE_AFLAG_NO_AUTOCLOSE }, + /* + * some drives clear the DRQ bit on the last word of a packet command + * and therefore need some special handling in the IRQ handler + */ + { "SONY DVD-ROM DDU1615", NULL, IDE_AFLAG_FRAG_CMD }, { NULL, NULL, 0 } }; diff --git a/include/linux/ide.h b/include/linux/ide.h index 9fed365..e18b455 100644 --- a/include/linux/ide.h +++ b/include/linux/ide.h @@ -456,6 +456,9 @@ enum { IDE_AFLAG_MEDIUM_PRESENT = (1 << 23), IDE_AFLAG_NO_AUTOCLOSE = (1 << 24), + + /* ide-cd quirk */ + IDE_AFLAG_FRAG_CMD = (1 << 25), }; /* device flags */
Created attachment 21925 [details] debug2 dmesg (comment 25)
cutting and pasting from bugzilla somehow didn't provide a clean diff file, so I had to cut and past directly into the code. Have fun with the extensive dmesg :) Hans
Hi, ok, seems like you can bootup normally. I see even some read commands which means that the drive might be working ok but you'll have to do some more testing. We need to see whether there are still any glitches. Just remove the debugging patch and apply the fix. By the way, I've uploaded it to bugzilla so that you don't have to copy-paste. Just do some audio playing/dvd video/data cdrom accesses/whatever else use case comes to your mind and collect the whole bootlog again and send it on, please. Thanks, Boris.
Created attachment 21930 [details] tentative fix
hda: media error (bad sector): status=0x51 { DriveReady SeekComplete Error } hda: media error (bad sector): error=0x30 { LastFailedSense=0x03 } hda: possibly failed opcode: 0xa0 ATAPI device hda: Error: Medium error -- (Sense key=0x03) (reserved error code) -- (asc=0x02, ascq=0x00) The failed "Read 10" packet command was: "28 00 00 00 76 78 00 00 40 00 00 00 00 00 00 00 " end_request: I/O error, dev hda, sector 121312 Buffer I/O error on device hda, logical block 30328 Buffer I/O error on device hda, logical block 30329 Buffer I/O error on device hda, logical block 30330 Buffer I/O error on device hda, logical block 30331 Buffer I/O error on device hda, logical block 30332 Buffer I/O error on device hda, logical block 30333 Buffer I/O error on device hda, logical block 30334 Buffer I/O error on device hda, logical block 30335 Buffer I/O error on device hda, logical block 30336 Buffer I/O error on device hda, logical block 30337 Just kidding a CD and a whiteboard marker apparently don't mix :). I can boot whith a CD, create iso's from CD's and DVD's and copy at file level. Since this is my at work pc I dont have musical CD's at hand. Where did this problem start? I could still use cd's in 2.6.29 but in a slackware newsgroup I found this threat: http://groups.google.com/group/alt.os.linux.slackware/browse_thread/thread/1add287bea50b0bc# slack. Slack current uses 2.6.29.4 -- Hans
Hi, (In reply to comment #30) > hda: media error (bad sector): status=0x51 { DriveReady SeekComplete Error } > hda: media error (bad sector): error=0x30 { LastFailedSense=0x03 } > hda: possibly failed opcode: 0xa0 > ATAPI device hda: > Error: Medium error -- (Sense key=0x03) > (reserved error code) -- (asc=0x02, ascq=0x00) > The failed "Read 10" packet command was: > "28 00 00 00 76 78 00 00 40 00 00 00 00 00 00 00 " > end_request: I/O error, dev hda, sector 121312 > Buffer I/O error on device hda, logical block 30328 > Buffer I/O error on device hda, logical block 30329 > Buffer I/O error on device hda, logical block 30330 > Buffer I/O error on device hda, logical block 30331 > Buffer I/O error on device hda, logical block 30332 > Buffer I/O error on device hda, logical block 30333 > Buffer I/O error on device hda, logical block 30334 > Buffer I/O error on device hda, logical block 30335 > Buffer I/O error on device hda, logical block 30336 > Buffer I/O error on device hda, logical block 30337 > > Just kidding a CD and a whiteboard marker apparently don't mix :). You funny guy :) > I can boot whith a CD, create iso's from CD's and DVD's and copy at > file level. Since this is my at work pc I dont have musical CD's at > hand. Thanks for testing. > Where did this problem start? I could still use cd's in > 2.6.29 but in a slackware newsgroup I found this threat: > http://groups.google.com/group/alt.os.linux.slackware/browse_thread/th > read/1add287bea50b0bc# slack. Slack current uses 2.6.29.4 The fix might need backporting. If you like, you could test latest stable kernels (2.6.29.5, maybe even 2.6.28.10) and send me the bootlogs of those so that we could verify that the fix is needed. That'll be of great help :). Thanks, Boris.
I am also experiencing system lockups whenever I insert a formated dvd or cd into my "hdd" ro drive. A blank dics doesn't cause the system to freeze. This happens when running X windows & running just init 3 on Mandriva-2009.1 with the EXT4 filesystem & kernel-2.6.30. I have another dvd combo rw drive labeled "hdc" in this same "ASUS" box & it works ok. Here is the only error output I was able to see when accessing drive hdd: "kernel panic not syncing Fatal inception interrupt" "pid 1289 comm: modprobe tainted: G D 2.6.30 #1" Also, I still have kernel-2.6.29 installed in this box & it handles both the "hdc" & "hdd" cd drives without problem. I hope this helps.
(In reply to comment #32) > I am also experiencing system lockups whenever I insert a formated dvd > or cd into my "hdd" ro drive. A blank dics doesn't cause the system to > freeze. This happens when running X windows & running just init 3 on > Mandriva-2009.1 with the EXT4 filesystem & kernel-2.6.30. Is that a vanilla kernel or a distro-patched one? Please supply the whole boot log of the machine. Also, can you catch any oopses over serial console or netconsole, that'll help us a lot. > I have another dvd combo rw drive labeled "hdc" in this same "ASUS" box & it > works ok. Here is the only error output I was able to see when accessing > drive > hdd: > > "kernel panic not syncing Fatal inception interrupt" > "pid 1289 comm: modprobe tainted: G D 2.6.30 #1" If you've copied the line correctly, the D shows that this is part of a follow-up OOPS. However, the first one is actually the interesting one. Alternatively to the serial or netconsole methods above, you could try to make a photo of your screen with any camera and send me that. Please try to catch the whole OOPS, though. Thanks, Boris.
There are no problems with 2.6.29.5, for me that is.
Created attachment 22003 [details] Screen photo of dvd-rom crash
Created attachment 22004 [details] My Mandrivia-2009.1 /var/log/dmesg (boot.log)
I compiled kernel-2.6.30 from the tarball using the .config file from my kernel-2.6.29 & the "Make olddconfig" option. Kernel-2.6.29 was also compiled from source. I don't like Mandriva kernels because they require you to use an initrd. The only new configuration option I added to kernel-2.6.30 was to keep the default journal_data=ordered. I did this because I also run mandriva-2008.0 EXT3 on this "ASUS" box & It would only boot to ro failsafe mode unless I did this. I discovered that w/o journal_data=ordered, the rootfs is 1st mounted ro with journal_data=writeback & then couldn't get "re-mounted" rw with journal_data=ordered because changing the journal at re-mount isn't permitted by the kernel. Another oddity I ran into was that kernel-2.6.30 built under Mandriva-2009.1 EXT4 was not usable under an EXT3 filesystem. I had to compile kernel-2.6.30 for EXT3 Mandriva-2008.0 seperately or no modules would load. I am attaching a photo (crash.jpeg) of my init 3 screen after I inserted a dvd-rw disc formated iso9660 & it crashed. Please note, I didn't even attempt to mount this disc. The DVD_ROM drive "HDD" just crashes the system when a formated disc or movie disc is detected. I also am attaching a copy of my dmesg file because /var/log/boot.log is an empty file on my system. I hope that will help. ______________________________________________________________________________________________ Here is my partition setup: Disk /dev/hda: 80.0 GB, 80026361856 bytes 16 heads, 63 sectors/track, 155061 cylinders Units = cylinders of 1008 * 512 = 516096 bytes Disk identifier: 0x06a2e5a5 Device Boot Start End Blocks Id System /dev/hda1 1 58920 29695648+ 83 Linux /dev/hda2 58921 118855 30207240 5 Extended /dev/hda3 118856 155061 18247824 83 Linux /dev/hda5 58921 59935 511528+ 82 Linux swap / Solaris /dev/hda6 59936 118855 29695648+ 83 Linux Disk /dev/sda: 250.0 GB, 250059350016 bytes 255 heads, 63 sectors/track, 30401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0x6fb3f3d0 Device Boot Start End Blocks Id System /dev/sda1 * 1 7 56196 83 Linux /dev/sda2 8 7841 62926605 83 Linux /dev/sda3 7842 15936 65023087+ 5 Extended /dev/sda4 15937 30401 116190112+ 83 Linux /dev/sda5 7842 8103 2104483+ 82 Linux swap / Solaris /dev/sda6 8104 15936 62918541 83 Linux [root@localhost stan]# df -T Filesystem Type Size Used Avail Use% Mounted on /dev/sda2 ext4 60G 4.2G 52G 8% / /dev/sda1 ext3 54M 13M 39M 24% /boot /dev/sda6 ext4 60G 3.1G 53G 6% /usr /dev/hdc iso9660 2.4G 2.4G 0 100% /mnt/misc
(In reply to comment #34) > There are no problems with 2.6.29.5, for me that is. Thanks for testing.
Hi Stan, can you try the attached patch? Thanks, Boris.
Created attachment 22042 [details] ide-cd: prevent null pointer deref via cdrom_newpc_intr
Created attachment 22046 [details] image of my kernel-2.6.30 ide-cd.c file
Please explain how to apply patch. I downloaded 0001-ide-cd-prevent-null-pointer-deref-via-cdrom_newpc_in.patch & saved existing linux-2.6.30 to linux-2.6.30-orig but now what? Also, I noted that your patch-ide-cd/diff-attachment.cgi.html shows lines 667-673 are to be changed in the ide-cd.c file, ..but my kernel-2.6.30 ide-cd.c file has that data on lines 758-764. Will that make the patch fail? I have attached a snapshot1.png file showing my kernel-2.6.30 ide-cd.c file. Sorry, but I know very little about applying patches. I do have the /usr/bin/patch program though if that is what's needed.
(In reply to comment #42) > Please explain how to apply patch. I downloaded > 0001-ide-cd-prevent-null-pointer-deref-via-cdrom_newpc_in.patch & saved > existing linux-2.6.30 to linux-2.6.30-orig but now what? > > Also, I noted that your patch-ide-cd/diff-attachment.cgi.html shows lines > 667-673 are to be changed in the ide-cd.c file, ..but my kernel-2.6.30 > ide-cd.c > file has that data on lines 758-764. Will that make the patch fail? > > I have attached a snapshot1.png file showing my kernel-2.6.30 ide-cd.c file. > > Sorry, but I know very little about applying patches. I do have the > /usr/bin/patch program though if that is what's needed. Ah, let's do the thing by foot then since it is easier: Open drivers/ide/ide-cd.c and locate the following lines (as you've done already in the screenshot): 756 } else { 757 if (uptodate <= 0 && rq->errors == 0) 758 rq->errors = -EIO; 759 } 760 761 if (uptodate == 0) 762 ide_cd_error_cmd(drive, cmd); 763 764 /* make sure it's fully ended */ and change the 761th line from 761 if (uptodate == 0) to 761 if (uptodate == 0 && rq->bio) and then recompile and install your kernel as you normally do. Then boot test it to see whether it works. That's it :). Thanks, Boris.
I am pleased to report that your kernel-2.6.30 patch (0001-ide-cd-prevent-null-pointer-deref-via-cdrom_newpc_in.patch) fixed the crash problem when accessing my /dev/hdd dvd-rom drive. I was able to mount a dvd-rw disc & here is the proof: [root@localhost stan]# mount -t iso9660 /dev/hdd /mnt/misc mount: block device /dev/hdd is write-protected, mounting read-only [root@localhost stan]# df -T Filesystem Type Size Used Avail Use% Mounted on /dev/sda2 ext4 60G 4.3G 52G 8% / /dev/sda1 ext3 54M 16M 36M 31% /boot /dev/sda6 ext4 60G 3.1G 53G 6% /usr /dev/hdd iso9660 400K 400K 0 100% /mnt/misc [root@localhost stan]# mount -t iso9660 /dev/hdc /mnt/cddrive [root@localhost stan]# df -T Filesystem Type Size Used Avail Use% Mounted on /dev/sda2 ext4 60G 4.3G 52G 8% / /dev/sda1 ext3 54M 16M 36M 31% /boot /dev/sda6 ext4 60G 3.1G 53G 6% /usr /dev/hdd iso9660 400K 400K 0 100% /mnt/misc /dev/hdc iso9660 350K 350K 0 100% /mnt/cddrive I edited the kernel-2.6.30 "ide-cd.c" file as instructed to apply the patch & then ran the following commands to re-compile my kernel: make mrproper cp .config-save .config make oldconfig make make modules_install My "new" /lib/modules/2.6.30/drivers/ide/ide-cd_mod.ko showed file size = 35133 bytes. The "orig" /lib/modules/2.6.30/drivers/ide/ide-cd_mod.ko showed file size = 35125 bytes. I also inserted a commercial video movie disc & a cd-rw disc in problem drive /dev/hdd. Everything seems too work fine. Thank you.
Sounds cool, thanks for reporting and testing for us. Closing. Boris.
*** Bug 13619 has been marked as a duplicate of this bug. ***
*** Bug 13696 has been marked as a duplicate of this bug. ***
I'm seeing a crash which looks very similar to this, in 2.6.31.
http://david.woodhou.se/cimg2777.jpg (gdb) list *cdrom_newpc_intr + 0x72b 0xe56 is in cdrom_newpc_intr (drivers/ide/ide-cd.c:692). 687 rq->resid_len += cmd->last_xfer_len; 688 } 689 690 ide_complete_rq(drive, uptodate ? 0 : -EIO, blk_rq_bytes(rq)); 691 692 if (sense && rc == 2) 693 ide_error(drive, "request sense failure", stat); 694 } 695 return ide_stopped; 696 } (gdb) list *ide_complete_rq+0x1f 0xffffffff812f0cd6 is in ide_complete_rq (drivers/ide/ide-io.c:125). 120 121 /* 122 * if failfast is set on a request, override number of sectors 123 * and complete the whole request right now 124 */ 125 if (blk_noretry_request(rq) && error <= 0) 126 nr_bytes = blk_rq_sectors(rq) << 9; 127 128 rc = ide_end_rq(drive, rq, error, nr_bytes); 129 if (rc == 0) (gdb) list *ide_complete_rq+0x37 0xffffffff812f0cee is in ide_complete_rq (drivers/ide/ide-io.c:129). 124 */ 125 if (blk_noretry_request(rq) && error <= 0) 126 nr_bytes = blk_rq_sectors(rq) << 9; 127 128 rc = ide_end_rq(drive, rq, error, nr_bytes); 129 if (rc == 0) 130 hwif->rq = NULL; 131 132 return rc; 133 }
More useful backtrace at http://david.woodhou.se/cimg2780.jpg It looks like hwif->rq is NULL when ide_complete_rq() runs. And then it oopses when looking at rq->cmd_flags. If I put in a quick hack to avoid calling id_complete_rq() if !drive->hwif->rq, then it no longer oopses. I get this: [ 1091.428321] would oops. rq->bio (null) [ 1091.428323] hdb: request sense failure: status=0x51 { DriveReady SeekComplete Error } [ 1091.428329] hdb: request sense failure: error=0x20 { LastFailedSense=0x02 } [ 1091.428332] hdb: possibly failed opcode: 0xa0
Created attachment 23062 [details] ide-cd full debug
Hi David, dang, seems we haven't fixed it properly, 39c58f37a10198054c656c28202fb1e6d22fd505 was supposed to do that. Can you please apply http://bugzilla.kernel.org/attachment.cgi?id=23062 and get me the whole dmesg. Thanks, Boris.
Created attachment 23073 [details] dmesg That patch still lets drive->debug_mask get overridden from the module parameter, shortly after you manually set it to 0xffffffff. With that fixed, here's the output.
Created attachment 23110 [details] testfix1
(In reply to comment #53) Hi David, sorry for the delay but I was very busy at work. > Created an attachment (id=23073) [details] > dmesg > > That patch still lets drive->debug_mask get overridden from the module > parameter, shortly after you manually set it to 0xffffffff. With that fixed, > here's the output. Yeah, this _was_ work in progress and is not the final ide stage we aimed at but since we're not allowed to hack on it anymore, we'll have to live with that, sorry. Ok, if I'm not mistaken, it seems that your drive gets a bunch of ioctls fired at it and it cannot exec even one command successfully and slaps a sense request after each one. At oops time, the sense request itself fails and it gets completed twice therefore the hwif->rq being NULL. Can you try the testfix1 (http://bugzilla.kernel.org/attachment.cgi?id=23110) above to see whether this fixes it? Thanks.
Thanks. I'm just about to leave for the Plumbers conference; it'll be a few days before I manage to test the patch.
Created attachment 23183 [details] dmesg It's still not happy...
Ok, my fix addresses at least the sense handling case, however, your drive is still unresponsive and fails almost every command. I'm assuming the drive has been functional at one point in time; if so, can you please bisect the kernel - it might give us more insight into the failure. Thanks, Boris.
To be honest, I don't know that it ever worked. I've always used libata on this machine and using drivers/ide for my test build of 2.6.31 wasn't intentional. When I get home next week, I'll attempt to find a good kernel and then bisect.
Ping ?
When I get home next week, I'll (try to remember to) build another test kernel with drivers/ide, and see what happens. Poke me again if I forget again; sorry.
*poke*
I'm closing this as fixed, as the problem of the original reporter and the second problem of the next one both got fixed. If you come up with anything new just shout. merged in v2.6.31-rc1: commit 39c58f37a10198054c656c28202fb1e6d22fd505 Author: Rainer Weikusat <rweikusat@mssgmbh.com> Date: Thu Jun 18 17:04:00 2009 +0200 ide-cd: prevent null pointer deref via cdrom_newpc_intr merged in v2.6.31-rc2: commit 9c72ebef5aabf3532469d602a9d87beceea268b1 Author: Borislav Petkov <petkovbb@gmail.com> Date: Fri Jun 26 11:22:37 2009 -0700 ide-cd: handle fragmented packet commands gracefully