Bug 219419 - Can not query tape status - MTIOCGET fails with EIO
Summary: Can not query tape status - MTIOCGET fails with EIO
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-10-22 19:39 UTC by John Meneghini
Modified: 2024-12-11 22:10 UTC (History)
4 users (show)

See Also:
Kernel Version: v6.6-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
st-mtiocget-pos_unknown-1.diff (1.75 KB, application/octet-stream)
2024-10-25 15:37 UTC, Kai Mäkisara
Details
st-mtiocget-pos_unknown-2.diff (2.04 KB, application/octet-stream)
2024-10-28 17:29 UTC, Kai Mäkisara
Details
Failing test logs (12.09 KB, text/plain)
2024-10-30 19:51 UTC, John Meneghini
Details
Failing scsi_debug test logs (10.91 KB, text/plain)
2024-10-30 19:52 UTC, John Meneghini
Details
Simple patch to clear was_reset (458 bytes, patch)
2024-10-30 20:33 UTC, John Meneghini
Details | Diff
Passing test logs (12.56 KB, text/plain)
2024-10-31 01:12 UTC, John Meneghini
Details
Passing verbose test logs (273.67 KB, text/plain)
2024-10-31 01:15 UTC, John Meneghini
Details
Passing scsi_debug test logs (12.20 KB, text/plain)
2024-10-31 01:19 UTC, John Meneghini
Details
Latest test logs created with upstream patches (7.88 KB, text/plain)
2024-11-05 18:22 UTC, John Meneghini
Details
tape_reset_load test results. (8.00 KB, text/plain)
2024-11-05 19:03 UTC, John Meneghini
Details
new-media.diff (548 bytes, application/octet-stream)
2024-11-05 19:52 UTC, Kai Mäkisara
Details
new-media.diff test_tape_status.log (7.48 KB, text/plain)
2024-11-06 03:15 UTC, John Meneghini
Details
final passing test results with new-media (22.89 KB, text/plain)
2024-11-06 05:40 UTC, John Meneghini
Details
More reset patches - test results (56.49 KB, text/plain)
2024-11-18 20:26 UTC, John Meneghini
Details
tape_reset_debug_sg.sh test results (44.79 KB, text/plain)
2024-12-11 21:32 UTC, John Meneghini
Details
run_tests.sh test results. (357.32 KB, text/plain)
2024-12-11 22:10 UTC, John Meneghini
Details

Description John Meneghini 2024-10-22 19:39:50 UTC
This issue continues the work to address the bug reported in:

 https://bugzilla.redhat.com/show_bug.cgi?id=2210024 

This problem occurred when an unexpected power/on reset resulted in an undetected position lost condition which caused lost data as the st driver wrote a FM at the beginning of the tape. This happened to a customer several times while using an Amazon tape gateway. A connection loss condition in the fabric resulted in the gateway tape device resetting itself and rewinding the tape. The st driver swallowed the Unit Attention and wrote a file mark at BOT. 

We reproduced this problem in our lab by connecting an LT04 tape device with iSCSI though a LIO target server and then resetting/power cycling the tape drive while writes were in progress.

This issue was addressed by commit 9604eea5bd3a ("scsi: st: Add third party poweron reset handling") This commit added code which detected the unexpected a power on/reset Unit Attention to st_chk_results(). This correctly set pos_unknown in the driver and prevented further access to the device until the tape is re-positioned or rewound.

However, the problem with this changes is that it is catching POR Unit Attentions that are expected as well as those that are unexpected. st_chk_results() sets pos_unknown every time a POR UA is detected. This is confusing as some tape drives can set a POR UA when the system reboots, the driver reloads, or a tape is reloaded.

This results in an unexpected EIO response to MTIOCGET.
Comment 1 Kai Mäkisara 2024-10-23 18:32:42 UTC
The strategy to handle device resets in st is conservative because it is used for archives and backups: if there is any indication of possible bus reset, make the user retry. So, I think that any POR UA should set pos_unknown, unless it can be proved false in all possible cases.

The EIO response to MTIOCGET results from calling flush_buffer(). But, looking at the code, it seems to be called twice. First, pos_unknown is tested and flush_buffer() only if pos_unknown is false. If it is true, some commands, including MTIOCGET, can continue. But, later there is an unconditional call to flush_buffer(). I have to look at the history to see why there are two calls. I don't have enough time for that tonight, but I will continue later.
Comment 2 Kai Mäkisara 2024-10-25 15:37:43 UTC
Created attachment 307074 [details]
st-mtiocget-pos_unknown-1.diff

The first flush_buffer() was for only MTIOCTOP.

The attached patch (against current torvalds/linux.git) should enable MTIOCGET also after device reset. The received data should be correct as far as it can (which is not much). (The modified st compiles, but I can't test it because I don't have access to any tape hardware any more.)

One thing puzzles me though: is MTIOCGET failing after device reset a bug or a feature? Is someone using the fact that it fails to gain some information?
Comment 3 John Meneghini 2024-10-28 16:44:30 UTC
OK. I will test out this patch.

Here's an example of how not everything works as expected after sg_reset.

When I unload the tape and then reset the tape drive, I am unable to to issue the load command.

[root@to-be-determined ~]# mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (50000):
 DR_OPEN IM_REP_EN
[root@to-be-determined ~]# mt -f /dev/nst0 load
/dev/nst0: Input/output error
Comment 4 Kai Mäkisara 2024-10-28 17:28:28 UTC
Well, status succeeded ;-)

Load is failing probably because it is not in the list of allowed commands after reset
(as it should be). The attached patch adds fix to this (the second item starting from
line 3619).

(Now the patch already includes two bug fixes, in addition to allowing MTIOCGET
after reset.)
Comment 5 Kai Mäkisara 2024-10-28 17:29:05 UTC
Created attachment 307083 [details]
st-mtiocget-pos_unknown-2.diff

The new patch.
Comment 6 John Meneghini 2024-10-30 19:48:24 UTC
OK, good news.  Your patch fixed pretty much all of the problems.

I'm attaching some before and after log files from my test runs.

The tests I ran can be seen here:

 https://github.com/johnmeneghini/tape_tests

Please take a quick look at these tests/logs and let me know if the tests are valid and you agree the results are good.

Thanks.
Comment 7 John Meneghini 2024-10-30 19:51:55 UTC
Created attachment 307093 [details]
Failing test logs

Test log showing failures with the tape_reset_test before the proposed patch.

Tests run are from: https://github.com/johnmeneghini/tape_tests
Comment 8 John Meneghini 2024-10-30 19:52:54 UTC
Created attachment 307094 [details]
Failing scsi_debug test logs

Test log showing failures with the tape_reset_debug test before the proposed patch.

Tests run are from: https://github.com/johnmeneghini/tape_tests
Comment 9 John Meneghini 2024-10-30 20:29:28 UTC
mt -f $DEV status
SCSI 2 tape drive:
File number=1, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (81010000):
 EOF ONLINE IM_REP_EN
dd if=$DEV count=1024 of=/dev/null
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.101394 s, 5.2 MB/s
mt -f $DEV status
SCSI 2 tape drive:
File number=1, block number=1024, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (1010000):
 ONLINE IM_REP_EN
mt -f $DEV eod
mt -f $DEV status
/dev/nst0: Input/output error

Here's an example of why we need to clear was_reset every time we clear pos_unkown.  

See my attached patch.

[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] st_flush: 1386: pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Number of r/w requests 1024, dio used in 1024, pages 1024.
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] flush_buffer: 844: pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] check_tape: 1064: pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Block limits 1 - 16777215 bytes.
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] check_tape: 1265: CHKRES_READY pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] flush_buffer: 844: pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] st_flush: 1386: pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] flush_buffer: 844: pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] check_tape: 1064: pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Block limits 1 - 16777215 bytes.
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] check_tape: 1265: CHKRES_READY pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] flush_buffer: 844: pos_unknown 0x0 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:35 2024] st 6:0:0:0: [st0] Spacing tape forward over 8388607 filemarks.

[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] Error: 402, cmd: 11 1 7f ff ff 0
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] Sense Key : Blank Check [current] 
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] Add. Sense: End-of-data detected

^^^ this check condition at eod re-asserts pos_unknown.

[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] st_chk_result: 424: pos_unknown 0x1 was_reset 0x1 ready 0x0, result 1026
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] st_flush: 1386: pos_unknown 0x1 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] flush_buffer: 844: pos_unknown 0x1 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] check_tape: 1064: pos_unknown 0x1 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] Block limits 1 - 16777215 bytes.
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] check_tape: 1265: CHKRES_READY pos_unknown 0x1 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] flush_buffer: 844: pos_unknown 0x1 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] st_flush: 1386: pos_unknown 0x1 was_reset 0x1 ready 0x0
[Wed Oct 30 16:19:48 2024] st 6:0:0:0: [st0] flush_buffer: 844: pos_unknown 0x1 was_reset 0x1 ready 0x0
Comment 10 John Meneghini 2024-10-30 20:33:55 UTC
Created attachment 307095 [details]
Simple patch to clear was_reset

> git diff HEAD^^ HEAD^ > clear_was_reset.patch
 
> git show HEAD^
commit c5d061be5eab53d9c7ef13260d41bd39dcf1e70b
Author: John Meneghini <jmeneghi@redhat.com>
Date:   Wed Oct 23 15:55:05 2024 -0400

    scsi: st: clear was_reset when CHKRES_NEW_SESSION
    
    Be sure to clear was_reset when ever we clear pos_unknown. If we don't
    then the code in st_chk_result() will turn on pos_unknown again.
    
            STp->pos_unknown |= STp->device->was_reset;
    
    This results in confusion as future commands fail unexpectedly.
    
    Signed-off-by: John Meneghini <jmeneghi@redhat.com>

diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index 9c0f9779768e..e9d1cb6c8a86 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -1083,6 +1083,7 @@ static int check_tape(struct scsi_tape *STp, struct file *filp)
 
        if (retval == CHKRES_NEW_SESSION) {
                STp->pos_unknown = 0;
+               STp->device->was_reset = 0;
                STp->partition = STp->new_partition = 0;
                if (STp->can_partitions)
                        STp->nbr_partitions = 1; /* This guess will be updated later
Comment 11 John Meneghini 2024-10-31 01:12:55 UTC
Created attachment 307097 [details]
Passing test logs

Test logs collected from a test run with the patches from:

 https://lore.kernel.org/linux-scsi/20241031010032.117296-1-jmeneghi@redhat.com/T/#ma155e0b7bd87bb081ec94644a9689c8c494c879b

Note there are some failures here which are unexpected.

> grep "^---" tape_reset_test-after.log
---TEST FAILED--- with status 0
---TEST FAILED--- with status 0
---TEST FAILED--- with status 0
---TEST FAILED--- with status 0
---TEST FAILED--- with status 0
---TEST FAILED--- with status 0


# Reset the device with tape at EOD
#
$PWD/tape_reset.sh $SDEV 1 &
sleep 3
sleeping 1 seconds   
sg_reset --target /dev/sg1

#
# These commands should fail
#
dd if=/dev/random count=1001024 of=$DEV && echo "---TEST FAILED--- with status $?"
1001024+0 records in
1001024+0 records out
512524288 bytes (513 MB, 489 MiB) copied, 97.2296 s, 5.3 MB/s
---TEST FAILED--- with status 0
mt -f $DEV weof 1 && echo "---TEST FAILED--- with status $?"
---TEST FAILED--- with status 0
mt -f $DEV wset 1 && echo "---TEST FAILED--- with status $?"
/dev/nst0: Input/output error
mt: The tape is write-protected.
dd if=$DEV count=1024 of=/dev/null && echo "---TEST FAILED--- with status $?"
dd: error reading '/dev/nst0': Input/output error

See the verbose logs for more information.
Comment 12 John Meneghini 2024-10-31 01:15:08 UTC
Created attachment 307098 [details]
Passing verbose test logs

Same test run with dmesg.log
Comment 13 John Meneghini 2024-10-31 01:19:28 UTC
Created attachment 307099 [details]
Passing scsi_debug test logs

The tape_reset_debug test log run with the patches from:

https://lore.kernel.org/linux-scsi/20241031010032.117296-1-jmeneghi@redhat.com/
Comment 14 John Meneghini 2024-11-05 18:22:30 UTC
Created attachment 307146 [details]
Latest test logs created with upstream patches

Thanks for the patches!

I've made some improvement to the tests. This is test log of the new test_reset_status.sh test script run against your newest patches set upstream at:

https://lore.kernel.org/linux-scsi/20241104112623.2675-1-Kai.Makisara@kolumbus.fi/

I'd like to ask some questions about these test results.

The first error is found in the test sequence which resets the tape device while the file handle is closed while the tape is at EOD.

mt -f /dev/nst0 eod

mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=2, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (9010000):
 EOD ONLINE IM_REP_EN

^^^^^^^^^^^^^^^^^^^^^
Two files have been written to the tape and the tape is positioned at EOD.

sg_reset --target /dev/sg1

mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN
--- mt -f /dev/nst0 status TEST FAILED--- with status 0

^^^^^^^^^^^^^^^^^^^^
Shouldn't this status command return EIO from st_open?  It looks like it has
cleared the pos_unknown statement set by the UA found in check_tape/test_ready.

mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (41010000):

dd if=/dev/random count=1001024 of=/dev/nst0
1001024+0 records in
1001024+0 records out
512524288 bytes (513 MB, 489 MiB) copied, 95.6305 s, 5.4 MB/s
--- dd if=/dev/random count=1001024 of=/dev/nst0  TEST FAILED--- with status 0

mt -f /dev/nst0 weof 1
--- mt -f /dev/nst0 weof 1  TEST FAILED--- with status 0

^^^^^^^^^^^^^^^^
Here's the worry... we are allowing writes to the tape w/out requiring a tape reposition after reset.
Comment 15 John Meneghini 2024-11-05 19:03:05 UTC
Created attachment 307147 [details]
tape_reset_load test results.

This log is taken from the tape_reset_load.sh test.  This verifies that the "mt load" now works after reset.
Comment 16 Kai Mäkisara 2024-11-05 19:52:10 UTC
Created attachment 307148 [details]
new-media.diff

> On 5. Nov 2024, at 20.22, bugzilla-daemon@kernel.org wrote:
> 
...
> I've made some improvement to the tests. This is test log of the new
> test_reset_status.sh test script run against your newest patches set upstream
> at:
> 
>
> https://lore.kernel.org/linux-scsi/20241104112623.2675-1-Kai.Makisara@kolumbus.fi/
> 
> I'd like to ask some questions about these test results.
> 
> The first error is found in the test sequence which resets the tape device
> while the file handle is closed while the tape is at EOD.
> 
> mt -f /dev/nst0 eod
> 
> mt -f /dev/nst0 status
> SCSI 2 tape drive:
> File number=2, block number=-1, partition=0.
> Tape block size 0 bytes. Density code 0x46 (LTO-4).
> Soft error count since last status=0
> General status bits on (9010000):
> EOD ONLINE IM_REP_EN
> 
> ^^^^^^^^^^^^^^^^^^^^^
> Two files have been written to the tape and the tape is positioned at EOD.
> 
> sg_reset --target /dev/sg1
> 
> mt -f /dev/nst0 status
> SCSI 2 tape drive:
> File number=0, block number=0, partition=0.
> Tape block size 0 bytes. Density code 0x46 (LTO-4).
> Soft error count since last status=0
> General status bits on (41010000):
> BOT ONLINE IM_REP_EN
> --- mt -f /dev/nst0 status TEST FAILED--- with status 0
> 
> ^^^^^^^^^^^^^^^^^^^^
> Shouldn't this status command return EIO from st_open?  It looks like it has
> cleared the pos_unknown statement set by the UA found in
> check_tape/test_ready.
> 

I think this status should succeed, as it does. The patch just modifies the code so
that MTIOCGET is allowed also after reset. So, this 'mt status' is not different from
the next ones.


> mt -f /dev/nst0 status
> SCSI 2 tape drive:
> File number=0, block number=0, partition=0.
> Tape block size 0 bytes. Density code 0x46 (LTO-4).
> Soft error count since last status=0
> General status bits on (41010000):
> 
> dd if=/dev/random count=1001024 of=/dev/nst0
> 1001024+0 records in
> 1001024+0 records out
> 512524288 bytes (513 MB, 489 MiB) copied, 95.6305 s, 5.4 MB/s
> --- dd if=/dev/random count=1001024 of=/dev/nst0  TEST FAILED--- with status
> 0
> 
> mt -f /dev/nst0 weof 1
> --- mt -f /dev/nst0 weof 1  TEST FAILED--- with status 0
> 
> ^^^^^^^^^^^^^^^^
> Here's the worry... we are allowing writes to the tape w/out requiring a tape
> reposition after reset.

Yes, this is worrying. I have been developing a patch to get rid of of was_reset
and there I have found a problem that may explain this behavior: UA handling in
test_ready() sets new_session also for reset if it is seen before open(). New
session clears pos_unknown. If reset is seen when writing, then this does
not happen. Scsi error handling clears was_reset if it re-locks a locked door
and this may explain why pos_unknown is not set if reset occurs just before
'mt status'.

If my theory is correct, the attached patch may fix the problem.
Comment 17 John Meneghini 2024-11-06 01:02:00 UTC
(In reply to Kai Mäkisara from comment #16)
> > mt -f /dev/nst0 weof 1
> > --- mt -f /dev/nst0 weof 1  TEST FAILED--- with status 0
> > 
> > ^^^^^^^^^^^^^^^^
> > Here's the worry... we are allowing writes to the tape w/out requiring a
> tape
> > reposition after reset.
> 
> Yes, this is worrying. I have been developing a patch to get rid of of
> was_reset
> and there I have found a problem that may explain this behavior: UA handling
> in
> test_ready() sets new_session also for reset if it is seen before open(). New
> session clears pos_unknown. If reset is seen when writing, then this does
> not happen. Scsi error handling clears was_reset if it re-locks a locked door
> and this may explain why pos_unknown is not set if reset occurs just before
> 'mt status'.
> 
> If my theory is correct, the attached patch may fix the problem.

OK, I'm looking at this patch and I don't think it will help because setting
new_session will result in test_ready() clearing pos_unknown.  I'll test it out and see what it does.

                        if (scode == UNIT_ATTENTION) { /* New media? */
-                               new_session = 1;
+                               if (cmdstatp->sense_hdr.asc == 0x28) { /* New media */
+                                       new_session = 1;
+                                       DEBC_printk(STp, "New tape session.");
+                               }
Comment 18 John Meneghini 2024-11-06 03:15:56 UTC
Created attachment 307151 [details]
new-media.diff test_tape_status.log

OK testing with your new-media patch seems to work. Attached is the new log file.

Please add this patch to your series upstream.

Here are the sections of the test run with this patch (on top of your other patches) which matter:

mt -f /dev/nst0 eod

mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=2, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (9010000):
 EOD ONLINE IM_REP_EN
sg_reset --target /dev/sg1

mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (1010000):
 ONLINE IM_REP_EN

^^^^^^^^^^^^^^^^^^^^^^
mt status after reset works correctly, I've updated my test.

dd if=/dev/random count=1001024 of=/dev/nst0
dd: writing to '/dev/nst0': Input/output error
dd: closing output file '/dev/nst0': Input/output error

mt -f /dev/nst0 weof 1
/dev/nst0: Input/output error
mt: The tape is write-protected.

mt -f /dev/nst0 wset 1
/dev/nst0: Input/output error
mt: The tape is write-protected.

dd if=/dev/nst0 count=1024 of=/dev/null
dd: error reading '/dev/nst0': Input/output error
dd: closing input file '/dev/nst0': Input/output error

dd if=/dev/random count=1001024 of=/dev/nst0
dd: writing to '/dev/nst0': Input/output error
dd: closing output file '/dev/nst0': Input/output error
^^^^^^^^^^^^^^^^^^^^^^^

All reads and writes correctly fail after reset.

mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (1010000):
 ONLINE IM_REP_EN
mt -f /dev/nst0 rewind

mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN

dd if=/dev/nst0 count=1024 of=/dev/null
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 2.23899 s, 234 kB/s

^^^^^^^^^^^^^^^^^^^^^^
mt rewind clears the condition.
mt -f /dev/nst0 fsf 1

dd if=/dev/nst0 count=1024 of=/dev/null
1024+0 records in
1024+0 records out
Comment 19 John Meneghini 2024-11-06 05:40:31 UTC
Created attachment 307152 [details]
final passing test results with new-media

Using the new-media patch together with "[PATCH 0/2] scsi: st: Device reset patches" all tests are now passing.  Including the concatenated log file "tape_reset_tests.log"

See the updated test scripts in https://github.com/johnmeneghini/tape_tests

This issue can be closed as soon as these patches are merged upstream.

Thanks for all of your help, Kai.
Comment 20 John Meneghini 2024-11-18 20:26:48 UTC
Created attachment 307228 [details]
More reset patches - test results

Adding the passing test results with new patches from:

 https://lore.kernel.org/linux-scsi/20241115162003.3908-1-Kai.Makisara@kolumbus.fi/

linux(tape_test) > git logl -7
177c3f710de4 2024-11-15 1729629595  (HEAD -> tape_test, johnm/tape_test2) scsi: st: instrument the pos_unknown code [ John Meneghini / jmeneghi@redhat.com ]
7373d7d717a0 2024-11-15 1731687603  (tape_test_noinst) scsi: st: Restore some drive settings after reset [ John Meneghini / Kai.Makisara@kolumbus.fi ]
25be4d26ef58 2024-11-15 1731687602  scsi: st: Remove use of device->was_reset [ John Meneghini / Kai.Makisara@kolumbus.fi ]
c5e0a7687c6b 2024-11-15 1730887043  scsi: st: New session only when Unit Attention for new tape [ John Meneghini / Kai.Makisara@kolumbus.fi ]
1ab8b314a34b 2024-11-15 1730887042  scsi: st: Add MTIOCGET and MTLOAD to ioctls allowed after device reset [ John Meneghini / Kai.Makisara@kolumbus.fi ]
0013312311da 2024-11-15 1730887041  scsi: st: Don't modify unknown block number in MTIOCGET [ John Meneghini / Kai.Makisara@kolumbus.fi ]
2d5404caa8c7 2024-11-10 1731277175  (tag: v6.12-rc7, branch_v6.12-rc7) Linux 6.12-rc7 [ Linus Torvalds / torvalds@linux-foundation.org ]
Comment 21 John Meneghini 2024-12-11 21:32:10 UTC
Created attachment 307345 [details]
tape_reset_debug_sg.sh test results

scsi_debug tape emulation test results for latest upstream patches at:

https://lore.kernel.org/linux-scsi/20241125140301.3912-1-Kai.Makisara@kolumbus.fi/
Comment 22 John Meneghini 2024-12-11 22:10:24 UTC
Created attachment 307346 [details]
run_tests.sh test results.

run_tests.sh test results

These are the results for the hardware tape tests run with the latest upstream patches: 

https://lore.kernel.org/linux-scsi/20241125140301.3912-1-Kai.Makisara@kolumbus.fi/

Note You need to log in before you can comment on or make changes to this bug.