Latest working kernel version: 2.6.25 Earliest failing kernel version: 2.6.26 Distribution: Debian Lenny (tested with vanilla kernel) Hardware Environment: i386 Problem Description: The Seagate STT20000A tapestreamer is no longer usable when modprobing the ide_tape module there's kernel BUG. It was reported on the lkml[1][2], I tested with 2.6.29-rc7 and the problem still remains. Attached the boot dmesg and the kernel BUG dmesg. Please let me know if more information is needed. [1]http://marc.info/?l=linux-kernel&m=122203193728465&w=2 [2]http://marc.info/?l=linux-kernel&m=122202406417545&w=2
Created attachment 20526 [details] dmesg after booting the system
Created attachment 20527 [details] dmesg after modprobe ide_tape
(In reply to comment #1) > Created an attachment (id=20526) [details] > dmesg after booting the system (In reply to comment #2) > Created an attachment (id=20527) [details] > dmesg after modprobe ide_tape Please set the MIME type of these attachments to text/plain.
(In reply to comment #3) > (In reply to comment #1) > > Created an attachment (id=20526) [details] [details] > > dmesg after booting the system > > (In reply to comment #2) > > Created an attachment (id=20527) [details] [details] > > dmesg after modprobe ide_tape > > Please set the MIME type of these attachments to text/plain. > Sorry I assumed the autodetection of bugzilla would set the proper type. Changed it now.
Hi, please do objdump -d drivers/ide/ide-tape.o > ide-tape.dsm make drivers/ide/ide-tape.s and send me .dsm and .s files. Thanks.
Created attachment 20587 [details] ide-tape.dsm
Created attachment 20588 [details] ide-tape.s
Created attachment 20666 [details] debug patch
Hi, can you apply the debug patch attached and send me the modprobe output again. The patch is against stock 2.6.29 but it should apply fine against 2.6.29-rc7. Thanks, Boris.
Created attachment 20746 [details] 2.6.29-rc7 modprobe output with debug patch
Created attachment 20753 [details] temporary fix
Hi Mark, here's a fix. Please do test and let me know whether it fixes your bug. Thanks, Boris.
Created attachment 20810 [details] modprobe with patch I just tested with only the "temporary fix" patch patch applied and it didn't help. Attached the dmesg output of modprobe.
Hi, sorry, the patch was wrong. How about that one? Thanks, Boris.
Created attachment 20822 [details] tmp fix 2
Created attachment 20826 [details] modprobe with 'tmp fix 2' patch I just tried the 'tmp fix 2' patch and the modprobe still fails, but with a different backtrace and error.
Hi, yeah, the driver is quite broken but no worries, we'll fix it :). So, on to the next thing: If my analysis isn't completely wrong, we get a divide-by-zero error because tape->buffer_size is somehow 0. I'd like to see what capabilities your drive reports so please apply the following combo patch ontop of the "tmp fix 2"-patch and please send me the _whole_ dmesg this time. Thanks, Boris.
Created attachment 20873 [details] dump drive caps and dbg rqs
Created attachment 20892 [details] dmesg with patches 'tmp fix 2' + 'dump drive caps and dbg rqs' As requested the complete dmesg output of booting and modprobe with the 'tmp fix 2' + 'dump drive caps and dbg rqs' patches applied.
Hmm, seems like the buffer is quite botched or the drive replies with some mode parameter headers I cannot correlate. Ok, let's now enable full debugging - please apply the following patch ontop of the other two you have already applied on your tree. This time you should expect more output so you should add 'log_buf_len=10M ignore_loglevel' to the kernel command line to make sure you catch it all. And as always, please send me the whole output. Thanks, Boris.
Created attachment 20917 [details] dump whole caps buffer
Comment on attachment 20917 [details] dump whole caps buffer diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c index e9d042d..4ef3b7e 100644 --- a/drivers/ide/ide-atapi.c +++ b/drivers/ide/ide-atapi.c @@ -8,6 +8,8 @@ #include <linux/ide.h> #include <scsi/scsi.h> +#define DEBUG 1 + #ifdef DEBUG #define debug_log(fmt, args...) \ printk(KERN_INFO "ide: " fmt, ## args) diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c index aed280f..70e7876 100644 --- a/drivers/ide/ide-tape.c +++ b/drivers/ide/ide-tape.c @@ -59,7 +59,7 @@ enum { }; /* define to see debug info */ -#define IDETAPE_DEBUG_LOG 0 +#define IDETAPE_DEBUG_LOG 1 #if IDETAPE_DEBUG_LOG #define debug_log(lvl, fmt, args...) \ @@ -720,9 +720,11 @@ static void idetape_create_mode_sense_cmd(struct ide_atapi_pc *pc, u8 page_code) if (page_code == IDETAPE_BLOCK_DESCRIPTOR) pc->req_xfer = 12; else if (page_code == IDETAPE_CAPABILITIES_PAGE) - pc->req_xfer = 24; + pc->req_xfer = IDE_PC_BUFFER_SIZE; else pc->req_xfer = 50; + + memset(pc->pc_buf, 0xf, IDE_PC_BUFFER_SIZE); } static ide_startstop_t idetape_media_access_finished(ide_drive_t *drive) @@ -2082,7 +2084,7 @@ static void idetape_get_mode_sense_results(ide_drive_t *drive) struct ide_atapi_pc pc; u8 *caps; u8 speed, max_speed; - int i; + unsigned int i; idetape_create_mode_sense_cmd(&pc, IDETAPE_CAPABILITIES_PAGE); if (ide_queue_pc_tail(drive, tape->disk, &pc)) { @@ -2097,8 +2099,11 @@ static void idetape_get_mode_sense_results(ide_drive_t *drive) caps = pc.buf + 4 + pc.buf[3]; printk(KERN_ERR "caps: "); - for (i = 0; i < 24; i++) + for (i = 0; i < IDE_PC_BUFFER_SIZE; i++) { printk(KERN_CONT " %x", pc.buf[i]); + if ((i & 0xf) == 0xf) + printk(KERN_CONT "\n"); + } printk(KERN_CONT "\n"); /* convert to host order and save for later use */ @@ -2207,6 +2212,7 @@ static void idetape_setup(ide_drive_t *drive, idetape_tape_t *tape, int minor) drive->pc_callback = ide_tape_callback; drive->pc_update_buffers = idetape_update_buffers; drive->pc_io_buffers = ide_tape_io_buffers; + drive->debug_mask = 0xffffffff; spin_lock_init(&tape->lock);
Created attachment 20918 [details] dump whole caps buffer
(In reply to comment #20) > Hmm, seems like the buffer is quite botched or the drive replies with some > mode > parameter headers I cannot correlate. If you mean this line: [ 42.272638] caps: 70 0 6 0 0 0 0 a 0 0 0 0 29 0 0 0 0 0 0 0 0 0 0 0 this is the extended sense data with its characteristic 1st byte of 0x70, 8-byte header with sense key UNIT ATTENTION, 10 bytes of additinal length and ASC of 0x29 meaning "reset hs occured".
Hi Mark, can you please try 2.6.30 to see whether the problem still persists? Thanks, Boris.
Hi Boris, I didn't test before since I thought the fixes would be in 2.6.31-rc1. I just tested 2.6.30 and the drive still causes an OOPS. I'll test again after 2.6.31-rc1 is released and if it still crashes I'll post a full dmesg and OOPS report. Regards, Mark
Hi Mark, can you send the OOPS with 2.6.30? Thanks, Boris.
I tried to get the OOPS but the system hung and I wasn't able to capture it. I just tested with 2.6.31-rc3 which didn't OOPS, but files written to the streamer were truncated when read. I hope to find some more time this weekend to do some more testing and see whether there's a pattern in the truncating. Just wanted to let you know to see whether it rings a bell.
I just did some more testing by writing files of different sizes to the streamer and reading them back. With the 2.6.24-debian kernel the files had the same size before and after writing. With 2.6.31-rc3 they all return smaller as written. I did three runs on 2.6.31-rc3 which all produced the same results. The output of the attached script was: 0 30.copy 0 32.copy 30720 30 32768 32 55296 60.copy 55296 64.copy 61440 60 65536 64 110592 120.copy 110592 128.copy 122880 120 131072 128 221184 240.copy 221184 256.copy 245760 240 262144 256 442368 480.copy 491520 480 497664 512.copy 524288 512 940032 960.copy 983040 960 995328 1024.copy 1048576 1024
Created attachment 22400 [details] test script to test reading and writing data to the streamer
Hi, please send me that script and let me try to reproduce it here. Thanks, Boris.
Ah, too much in a hurry, as always. You've sent it already. Nevermind.
Created attachment 22722 [details] disable DMA for RW commands
Hi Mark. please try this patch to see whether it fixes your reported problem: http://bugzilla.kernel.org/attachment.cgi?id=22722. Thanks.
Hi Boris, I tried your patch and it makes no difference, the files are still truncated at the same length. Regards, Mark
Created attachment 22782 [details] ide-tape enable debug for whole driver
Hi Mark, ok, we have to wheel out the heavy guns. In ide-next I've got an ide-tape debugging patch for the whole driver. Get the tree at git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-next-2.6.git and apply the patch at http://bugzilla.kernel.org/attachment.cgi?id=22782 and catch the _whole_ output while doing only one filesize, let's say: cat /tmp/32 >/dev/ht0 cat /dev/ht0 > /tmp/32.copy Try setting log_buf_len=10M kernel parameter in case the output gets truncated since it is quite a lot. Thanks.
Created attachment 22933 [details] dmesg after running the test I ran the test with kernel 2.6.31-rc8, pulling linux-next and your patch. Attached the dmesg output after running it with the 32Kb file (32.copy was still zero bytes).
Hi Mark, I am curious just for completeness of the record about how recent kernels behave here. (I don't expect they'll be any different since there haven't been any real fixes since v2.6.32-rc1 as far as I can tell.) In particular, can you still use the tape streamer without crashing or lockups, and does it still truncate files when you try to copy them with dd? Also, can you bisect[1] to find the particular change that introduced the bug? (That would make it easier to find a fix, for example by backing that change out.) Thanks, Jonathan [1] http://git-htmldocs.googlecode.com/git/git-bisect-lk2009.html