Bug 5195

Summary: scsi_eh not get released for USB storage device after failure or disconnect
Product: Drivers Reporter: Feng-sung Yang (fsyang_tw)
Component: USBAssignee: Alan Stern (stern)
Status: CLOSED CODE_FIX    
Severity: normal CC: greg
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.13-mm1 Subsystem:
Regression: --- Bisected commit-id:
Bug Depends on:    
Bug Blocks: 5089    
Attachments: /var/log/kernel/info
Fix errors in the SCSI core
info.050912.bz2
info.050912.bz2

Description Feng-sung Yang 2005-09-06 06:17:39 UTC
Most recent kernel where this bug did not occur: None
Distribution: Mandriva Linux release 2006.0 (Cooker) for i586
Hardware Environment: i686 Intel(R) Pentium(R) 4 CPU 3.00GHz
Software Environment:
Kernel tested: 2.6.13-mm1 with patch from
http://marc.theaimsgroup.com/?l=linux-usb-devel&m=112551468126219&w=2
Problem Description:

1. I have tried the -mm1 patch (plus the
http://marc.theaimsgroup.com/?l=linux-usb-devel&m=112551468126219&w=2 reset patch).
Only one card reader is used. I remove the card reader while it's
being read (dd if=/dev/sda of=/dev/null). Then the usb-storage driver is gone
with scsi_eh_4 left.

The following is the last trace messages:

Sep  6 17:00:34 fsyang kernel: usb-storage: Command READ_10 (10 bytes)
Sep  6 17:00:34 fsyang kernel: usb-storage:  28 00 00 01 30 40 00 00 40 00
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk Command S 0x43425355 T 0x4cf L
32768 F 128 Trg 0 LUN 0 CL 10
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 31
bytes
Sep  6 17:00:34 fsyang kernel: usb-storage: Status code 0; transferred 31/31
Sep  6 17:00:34 fsyang kernel: usb-storage: -- transfer complete
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk command transfer result=0
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_bulk_transfer_sglist: xfer
32768 bytes, 1 entries
Sep  6 17:00:34 fsyang kernel: usb-storage: Status code 0; transferred 32768/32768
Sep  6 17:00:34 fsyang kernel: usb-storage: -- transfer complete
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk data transfer result 0x0
Sep  6 17:00:34 fsyang kernel: usb-storage: Attempting to get CSW...
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 13
bytes
Sep  6 17:00:34 fsyang kernel: usb-storage: Status code 0; transferred 13/13
Sep  6 17:00:34 fsyang kernel: usb-storage: -- transfer complete
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk status result = 0
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk Status S 0x53425355 T 0x4cf R 0
Stat 0x0
Sep  6 17:00:34 fsyang kernel: usb-storage: scsi cmd done, result=0x0
Sep  6 17:00:34 fsyang kernel: usb-storage: *** thread sleeping.
Sep  6 17:00:34 fsyang kernel: usb-storage: queuecommand called
Sep  6 17:00:34 fsyang kernel: usb-storage: *** thread awakened.
Sep  6 17:00:34 fsyang kernel: usb-storage: Command READ_10 (10 bytes)
Sep  6 17:00:34 fsyang kernel: usb-storage:  28 00 00 01 30 80 00 00 40 00
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk Command S 0x43425355 T 0x4d0 L
32768 F 128 Trg 0 LUN 0 CL 10
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 31
bytes
Sep  6 17:00:34 fsyang kernel: usb-storage: Status code 0; transferred 31/31
Sep  6 17:00:34 fsyang kernel: usb-storage: -- transfer complete
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk command transfer result=0
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_bulk_transfer_sglist: xfer
32768 bytes, 1 entries
Sep  6 17:00:34 fsyang kernel: usb-storage: Status code 0; transferred 32768/32768
Sep  6 17:00:34 fsyang kernel: usb-storage: -- transfer complete
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk data transfer result 0x0
Sep  6 17:00:34 fsyang kernel: usb-storage: Attempting to get CSW...
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 13
bytes
Sep  6 17:00:34 fsyang kernel: usb-storage: Status code 0; transferred 13/13
Sep  6 17:00:34 fsyang kernel: usb-storage: -- transfer complete
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk status result = 0
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk Status S 0x53425355 T 0x4d0 R 0
Stat 0x0
Sep  6 17:00:34 fsyang kernel: usb-storage: scsi cmd done, result=0x0
Sep  6 17:00:34 fsyang kernel: usb-storage: *** thread sleeping.
Sep  6 17:00:34 fsyang kernel: usb-storage: queuecommand called
Sep  6 17:00:34 fsyang kernel: usb-storage: *** thread awakened.
Sep  6 17:00:34 fsyang kernel: usb-storage: Command READ_10 (10 bytes)
Sep  6 17:00:34 fsyang kernel: usb-storage:  28 00 00 01 30 c0 00 00 40 00
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk Command S 0x43425355 T 0x4d1 L
32768 F 128 Trg 0 LUN 0 CL 10
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 31
bytes
Sep  6 17:00:34 fsyang kernel: usb-storage: Status code 0; transferred 31/31
Sep  6 17:00:34 fsyang kernel: usb-storage: -- transfer complete
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk command transfer result=0
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_bulk_transfer_sglist: xfer
32768 bytes, 1 entries
Sep  6 17:00:34 fsyang kernel: usb-storage: Status code -71; transferred 26112/32768
Sep  6 17:00:34 fsyang kernel: usb-storage: -- unknown error
Sep  6 17:00:34 fsyang kernel: usb-storage: Bulk data transfer result 0x4
Sep  6 17:00:34 fsyang kernel: usb-storage: -- transport indicates error, resetting
Sep  6 17:00:34 fsyang kernel: usb 5-1: USB disconnect, address 17
Sep  6 17:00:34 fsyang kernel: usb-storage: storage_disconnect() called
Sep  6 17:00:34 fsyang kernel: usb-storage: usb_stor_stop_transport called
Sep  6 17:00:36 fsyang kernel: usb-storage: No reset during disconnect
Sep  6 17:00:36 fsyang kernel: usb-storage: usb_stor_Bulk_reset called
Sep  6 17:00:36 fsyang kernel: usb-storage: No reset during disconnect
Sep  6 17:00:36 fsyang kernel: usb-storage: scsi cmd done, result=0x70000
Sep  6 17:00:36 fsyang kernel: usb-storage: *** thread sleeping.
Sep  6 17:00:36 fsyang kernel: usb-storage: queuecommand called
Sep  6 17:00:36 fsyang kernel: usb-storage: Fail command during disconnect
Sep  6 17:00:36 fsyang kernel: SCSI error : <4 0 0 0> return code = 0x10000
Sep  6 17:00:36 fsyang kernel: usb-storage: queuecommand called
Sep  6 17:00:36 fsyang kernel: usb-storage: Fail command during disconnect
Sep  6 17:00:36 fsyang kernel: SCSI error : <4 0 0 0> return code = 0x10000
Sep  6 17:00:36 fsyang kernel: usb-storage: -- usb_stor_release_resources
Sep  6 17:00:36 fsyang kernel: usb-storage: -- sending exit command to thread
Sep  6 17:00:36 fsyang kernel: usb-storage: -- dissociate_dev
Sep  6 17:00:36 fsyang kernel: usb-storage: *** thread awakened.
Sep  6 17:00:36 fsyang kernel: usb-storage: -- exiting

2. When stressing (reading/writing) 7 card readers simultaneously, a similar
condition happens. (Actually 8 card readers are connected. But only 7 are found.
I am not sure whether it's a driver problem...) I think the signal quality on
USB bus seems really bad when multiple card readers are tested at the same time.
(The environment is the same as http://bugzilla.kernel.org/show_bug.cgi?id=5173)

Sometimes it can be found that some scsi_eh_XX is present without any
corresponding USB device. I think when doing error recovery for that USB device,
the scsi_eh_XX driver is accidentially replaced by another one with cancelling
the original one.

When one more orphan scsi_eh_XX is found, I try to stop the testing immediately.
I think it still takes two or more seconds to really stop the testing. The ps
result is as below:

~ 1012$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep scsi
23474 23474 TS       -   0  19   0  0.0 S    -                             
scsi_eh_119
25649 25649 TS       -   0  24   0  0.0 S    -                             
scsi_eh_125
25686 25686 TS       -   0  19   0  0.0 S    -                             
scsi_eh_126
25723 25723 TS       -   0  19   0  0.0 S    -                             
scsi_eh_127
25760 25760 TS       -   0  19   0  0.0 S    -                             
scsi_eh_128
25797 25797 TS       -   0  24   0  0.0 S    -                             
scsi_eh_129
25834 25834 TS       -   0  24   0  0.0 S    -                             
scsi_eh_130
25871 25871 TS       -   0  24   0  0.0 S    -                             
scsi_eh_131
27074 27074 TS       -   0  19   0  0.0 S    -                             
scsi_eh_132

scsi_eh_119 is caused by the previous test before this one since I was unable to
save the verbose debug information in time. So that I run the tests again to
cause the orphan scsi_eh_128. The device mappings is:

/sys/block 1004$ ls -l sd?/device
lrwxrwxrwx  1 root root 0  9&#26376;  6 20:59 sda/device ->
../../devices/pci0000:00/0000:00:1d.7/usb5/5-1/5-1.1/5-1.1:1.0/host125/target125:0:0/125:0:0:0/
lrwxrwxrwx  1 root root 0  9&#26376;  6 20:59 sdb/device ->
../../devices/pci0000:00/0000:00:1d.7/usb5/5-1/5-1.2/5-1.2:1.0/host126/target126:0:0/126:0:0:0/
lrwxrwxrwx  1 root root 0  9&#26376;  6 20:58 sdc/device ->
../../devices/pci0000:00/0000:00:1d.7/usb5/5-1/5-1.3/5-1.3:1.0/host127/target127:0:0/127:0:0:0/
lrwxrwxrwx  1 root root 0  9&#26376;  6 20:59 sdd/device ->
../../devices/pci0000:00/0000:00:1d.7/usb5/5-1/5-1.4/5-1.4:1.0/host132/target132:0:0/132:0:0:0/
lrwxrwxrwx  1 root root 0  9&#26376;  6 20:59 sde/device ->
../../devices/pci0000:00/0000:00:1d.7/usb5/5-2/5-2.1/5-2.1:1.0/host129/target129:0:0/129:0:0:0/
lrwxrwxrwx  1 root root 0  9&#26376;  6 20:59 sdf/device ->
../../devices/pci0000:00/0000:00:1d.7/usb5/5-2/5-2.2/5-2.2:1.0/host130/target130:0:0/130:0:0:0/
lrwxrwxrwx  1 root root 0  9&#26376;  6 20:59 sdg/device ->
../../devices/pci0000:00/0000:00:1d.7/usb5/5-2/5-2.3/5-2.3:1.0/host131/target131:0:0/131:0:0:0/

/sys/block 1006$ ls -l /proc/bus/usb/005
&#32317;&#35336; 0
-rw-rw----  1 fsyang usb 43  9&#26376;  6 20:28 001
-rw-rw----  1 fsyang usb 43  9&#26376;  6 20:28 002
-rw-rw----  1 fsyang usb 43  9&#26376;  6 20:28 003
-rw-rw-r--  1 root   usb 50  9&#26376;  6 20:56 016
-rw-rw-r--  1 root   usb 50  9&#26376;  6 20:56 017
-rw-rw-r--  1 root   usb 50  9&#26376;  6 20:56 018
-rw-rw-r--  1 root   usb 50  9&#26376;  6 20:56 020
-rw-rw-r--  1 root   usb 50  9&#26376;  6 20:56 021
-rw-rw-r--  1 root   usb 50  9&#26376;  6 20:56 022
-rw-rw-r--  1 root   usb 50  9&#26376;  6 20:59 023

/sys/block 1007$ lsusb
Bus 005 Device 023: ID 058f:6369 Alcor Micro Corp. 
Bus 005 Device 022: ID 058f:6369 Alcor Micro Corp. 
Bus 005 Device 021: ID 058f:6369 Alcor Micro Corp. 
Bus 005 Device 020: ID 058f:6369 Alcor Micro Corp. 
Bus 005 Device 018: ID 058f:6369 Alcor Micro Corp. 
Bus 005 Device 017: ID 058f:6369 Alcor Micro Corp. 
Bus 005 Device 016: ID 058f:6369 Alcor Micro Corp. 
Bus 005 Device 003: ID 05e3:0605 Genesys Logic, Inc. 
Bus 005 Device 002: ID 05e3:0605 Genesys Logic, Inc. 
Bus 005 Device 001: ID 0000:0000  
Bus 004 Device 001: ID 0000:0000  
Bus 003 Device 001: ID 0000:0000  
Bus 002 Device 001: ID 0000:0000  
Bus 001 Device 001: ID 0000:0000  

The final part of /var/log/kernel/info is attached as "info1.bz2". The head is
gone since I periodically cut this file when testing so that it can be kept at a
reasonable size.
Comment 1 Feng-sung Yang 2005-09-06 06:19:11 UTC
Created attachment 5916 [details]
/var/log/kernel/info
Comment 2 Alan Stern 2005-09-08 07:28:18 UTC
Created attachment 5931 [details]
Fix errors in the SCSI core

Attached is a patch for 2.6.13-mm1 that fixes a number of errors in the SCSI
core.  Among other things, it should get rid of all those extra error-handler
processes.  Keep all the other patches and apply this one in addition.
Comment 3 Feng-sung Yang 2005-09-12 03:46:29 UTC
Created attachment 5977 [details]
info.050912.bz2
Comment 4 Feng-sung Yang 2005-09-12 03:56:23 UTC
Created attachment 5978 [details]
info.050912.bz2
Comment 5 Feng-sung Yang 2005-09-12 04:00:14 UTC
Thanks a lot.
I have tried this patch and I haven't encounter any previously reported problems
yet.
The patches I used in this kernel is:

-mm1
http://marc.theaimsgroup.com/?l=linux-usb-devel&m=112551468126219&w=2
msleep(2000) in drivers/usb/storage/transport.c
"Fix errors in the SCSI core" patch

However, when I use -mm2 patch to replace -mm1. Things become odd. The
usb_storage and scsi_eh_X drivers won't get released after the USB device is
removed even when only one card reader is plugged-in and then unplugged. And,
after some stress testings for multiple card readers, the USB system hangs as
before with a lot of driver left. Since the difference of USB drivers between
these two patches are small so I am curious if -mm2 shouldn't be used.

The following is the ps result when and after the device is disconnected:

[root@fsyang ~]# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm
| grep usb
 3123  3123 TS       -   0  24   0  0.0 S    -                             
usb-storage
[root@fsyang ~]# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm
| grep scsi
 3120  3120 TS       -  -5  28   0  0.0 S<   -                             
scsi_eh_0

[root@fsyang ~]# # Unplugged......

[root@fsyang ~]# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm
| grep usb
 3123  3123 TS       -   0  24   0  0.0 D    kthread_stop                  
usb-storage
[root@fsyang ~]# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm
| grep scsi
 3120  3120 TS       -  -5  29   0  0.0 S<   -                             
scsi_eh_0


And the /var/log/kernel/info is attached as info.050912.bz2.

Comment 6 Alan Stern 2005-09-28 13:58:35 UTC
I think this has been fixed in the latest kernel.  Try using 2.6.14-rc2-git6.
Comment 7 Feng-sung Yang 2005-10-12 01:37:33 UTC
Hi, I have tried linux-2.6.14-rc4 dozens of times and this problem didn't happen
again. Thanks a lot.
Comment 8 Alan Stern 2005-10-12 07:35:12 UTC
Okay, I'll mark this bug CLOSED.