Bug 5237 - scsi_eh not get released by disconnecting the device when doing rescan
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: i386 Linux
Importance: P2 normal
Assignee: Alan Stern
URL:
Keywords:
Depends on:
Blocks: USB
Reported: 2005-09-12 21:11 UTC by Feng-sung Yang
Modified: 2005-11-23 11:23 UTC

See Also:
Kernel Version: 2.6.13-mm1, 2.6.14-rc4, 2.6.14
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
info.050913.bz2 (2.67 KB, application/octet-stream)
2005-09-12 21:12 UTC, Feng-sung Yang
/var/log/kernel/info (7.00 KB, application/octet-stream)
2005-10-11 02:31 UTC, Feng-sung Yang
info.expected (4.09 KB, application/octet-stream)
2005-10-20 00:40 UTC, Feng-sung Yang
info.hang (5.40 KB, application/octet-stream)
2005-10-20 00:41 UTC, Feng-sung Yang
rescan.py (1.17 KB, text/plain)
2005-10-20 00:41 UTC, Feng-sung Yang
kallsyms (155.00 KB, application/octet-stream)
2005-10-20 00:42 UTC, Feng-sung Yang
info.051101.bz2 (5.01 KB, application/octet-stream)
2005-10-31 18:42 UTC, Feng-sung Yang
kallsyms.051101.bz2 (159.20 KB, application/octet-stream)
2005-10-31 18:43 UTC, Feng-sung Yang
Fix race between sd_rescan and sd_remove (603 bytes, patch)
2005-11-02 14:23 UTC, Alan Stern

Description Feng-sung Yang 2005-09-12 21:11:25 UTC
Most recent kernel where this bug did not occur: 2.6.13-mm1
Distribution: Mandriva Linux release 2006.0 (Cooker) for i586
Hardware Environment: i686 Intel(R) Pentium(R) 4 CPU 3.00GHz
Software Environment:
Problem Description:
I am trying to detect whether the media has been removed from a USB card
reader. The steps are simple: check whether the corresponding /sys files still
exist, then write to the /sys rescan file to trigger the actual rescan. However,
there seems to be a race condition during the rescan, since the user may
manually disconnect the USB card reader at the same time. The write() to the
rescan file seems to block forever (although the program can still be killed
normally with CTRL-C), and the corresponding scsi_eh_XX thread is never
released. My original program is written in Python and is large, so I wrote a
small Perl program to emulate its behavior. Oddly, the Perl program does not
hang the way the Python one does, but at least it shows the rough steps. By the
way, kernel 2.6.11 seems okay...

The patches applied to this kernel are:

-mm1
http://marc.theaimsgroup.com/?l=linux-usb-devel&m=112551468126219&w=2
msleep(2000) in drivers/usb/storage/transport.c's Handle_Errors:
"Fix errors in the SCSI core" patch in
http://bugzilla.kernel.org/show_bug.cgi?id=5195


=====================================================================
#!/usr/bin/perl

use Time::HiRes qw(usleep);

$n = $ARGV[0];
$deviceDir = "/sys/devices/pci0000:00/0000:00:1d.7/usb5/5-1/5-1:1.0";
$rescanFile = "$deviceDir/host$n/target$n:0:0/$n:0:0:0/rescan";
$sizeFile = "$deviceDir/host$n/target$n:0:0/$n:0:0:0/block/size";

while (1) {
        if (! -d $deviceDir) {
                print "\"$deviceDir\" is gone!\n";
                last;
        }

        if (! -e $sizeFile) {
                print "\"$sizeFile\" is gone!\n";
                last;
        }

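        # Read the size from sysfs (bare backticks would try to execute the file)
        open (SIZE, "<$sizeFile") or last;
        $size = <SIZE>;
        close SIZE;
        chomp($size);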
        if ($size eq "0") {
                print "Size becomes 0!\n";
                last;
        }

        print "***open $rescanFile\n";
        open (RESCAN, ">$rescanFile") or die "Failed to open file\n";

        print "write 1 to rescan file\n";
        print RESCAN "1";

        print "try to close\n";
        close RESCAN;
        print "close done\n";

        usleep(100000); # 0.1 second
}

print "Finish!!!!!!";
=====================================================================
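
Because the report describes the write() to the rescan file blocking indefinitely, a caller may want to keep that write out of its own process. The following is a minimal sketch, not part of the original report: it runs the write in a child Python interpreter and gives up after a timeout. The function name `rescan_with_timeout`, the 5-second default, and the three result strings are assumptions chosen for illustration. It also assumes the blocked write can be killed, which the report suggests (CTRL-C works) but which this sketch cannot guarantee.

```python
import subprocess
import sys

# Child-process body: open the sysfs file given as argv[1] and write "1" to it.
_CHILD_SOURCE = (
    "import os, sys\n"
    "fd = os.open(sys.argv[1], os.O_WRONLY)\n"
    "os.write(fd, b'1')\n"
    "os.close(fd)\n"
)


def rescan_with_timeout(rescan_path, timeout=5.0):
    """Attempt one rescan; return "ok", "failed" or "timeout".

    The write happens in a separate interpreter, so a write() that never
    returns cannot stall the caller. subprocess.run() kills the child when
    the timeout expires and then raises TimeoutExpired.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD_SOURCE, rescan_path],
            stderr=subprocess.DEVNULL,  # hide the child's traceback on failure
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"
```

A polling loop like the Perl one above could call `rescan_with_timeout(path)` once per iteration and stop on anything other than "ok".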


PS result after the test:
~/mptester 1040$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep usb
~/mptester 1041$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep scsi
 7561  7561 TS       -   0  19   0  0.0 S    -                              scsi_eh_22

The /var/log/kernel/info is attached as info.050913.bz2.
Comment 1 Feng-sung Yang 2005-09-12 21:12:18 UTC
Created attachment 5983 [details]
info.050913.bz2
Comment 2 Feng-sung Yang 2005-09-12 22:25:20 UTC
Even when the Perl sample is translated into Python, the write() can still
finish (with "Segmentation fault"). A thread must be used for the write() to
hang, which in turn causes scsi_eh_XX to hang. The code is below:

=====================================================================
#!/usr/bin/python

import os
import sys
import time
import thread

n = int(sys.argv[1])

deviceDir = '/sys/devices/pci0000:00/0000:00:1d.7/usb5/5-1/5-1:1.0/'
rescanFile = deviceDir + 'host%d/target%d:0:0/%d:0:0:0/rescan' % ((n,) * 3)
sizeFile = deviceDir + 'host%d/target%d:0:0/%d:0:0:0/block/size' % ((n,) * 3)

bDone = False

def Run():
	global bDone
	while True:
		if not os.path.isdir(deviceDir):
			print '"%s" is gone!' % deviceDir
			break

		if not os.path.exists(sizeFile):
			print '"%s" is gone!' % sizeFile
			break

		size = open(sizeFile, 'r').read()
		if size == "0\n":
			print "Size becomes 0!"
			break

		print '***open ' + rescanFile
		try:
			fd = os.open(rescanFile, os.O_WRONLY)

			print "write 1 to rescan file"
			os.write(fd, "1")

			print 'try to close'
			os.close(fd)
			print 'close done'
		except (IOError, OSError):
			break

		time.sleep(0.1)	# 0.1 second

	# Let the main thread know the worker has finished.
	bDone = True

if __name__ == "__main__":

	thread.start_new_thread(Run, ())

	while not bDone:
		time.sleep(0.2)

	print 'Finish!!!!!!'

=====================================================================
Comment 3 Greg Kroah-Hartman 2005-10-05 10:34:12 UTC
I don't understand, is the kernel oopsing when you do this?

If so, please provide the oops message.
Comment 4 Feng-sung Yang 2005-10-11 02:30:51 UTC
Greg, sorry, I don't understand what you mean (my English is bad...), and sorry
for replying so late; it seems I was not notified. What I want is for the
storage driver to simply remove the /sys files for the device when I attempt a
rescan, without the write() call hanging.

When the card reader is removed while the rescan operation (the write() call)
is being retried continuously, the usb_storage driver disappears as expected.
The corresponding sys files in /sys/class/usb_device/, /sys/class/scsi_host/,
/sys/class/scsi_device/, and /sys/class/usb_device/ also disappear.
However, the scsi_eh_X thread is not released, and my write() stays pending
forever, even after my program is terminated with CTRL-C.

I have now tried linux-2.6.14-rc4 with the squashfs and unionfs patches. The
error condition still occurs. I reproduced the error several times, and the ps
result differs from what I reported previously:

~ 1000$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep usb
~ 1001$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep scsi
 3838  3838 TS       -  -5  28   0  0.0 S<   scsi_error_handler             scsi_eh_0
~ 1002$
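
The symptom shown by the ps output (a scsi_eh_N thread that survives after its host has vanished from sysfs) can also be checked programmatically. The following is a rough sketch, not something from the original report: it reads the command name from each /proc/<pid>/stat file and reports host numbers that have a scsi_eh_N thread but no /sys/class/scsi_host/hostN entry. The function names and the two root-path parameters are assumptions for illustration; the roots are parameters only so the logic can be exercised against a fake directory tree.

```python
import os
import re

_EH_NAME = re.compile(r"^scsi_eh_(\d+)$")


def comm_from_stat(stat_text):
    """Extract the command name from the contents of /proc/<pid>/stat."""
    # The name is the second field and is wrapped in parentheses.
    start = stat_text.index("(") + 1
    end = stat_text.rindex(")")
    return stat_text[start:end]


def orphaned_eh_threads(proc_root="/proc", host_root="/sys/class/scsi_host"):
    """Return sorted host numbers that have a scsi_eh_N thread but no hostN."""
    orphans = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                name = comm_from_stat(stat_file.read())
        except (OSError, ValueError):
            continue  # process exited, or the stat file was unreadable
        match = _EH_NAME.match(name)
        if match is None:
            continue
        host_dir = os.path.join(host_root, "host" + match.group(1))
        if not os.path.exists(host_dir):
            orphans.append(int(match.group(1)))
    return sorted(orphans)
```

On a healthy system `orphaned_eh_threads()` would be expected to return an empty list; in the state described in this comment it would be expected to include the stuck host number.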

By the way, I filled in the "Most recent kernel where this bug did not occur:"
field incorrectly. It should be "2.6.11", not "2.6.13-mm1".

The usb storage verbose message log "info.051011.bz2" is attached.
Comment 5 Feng-sung Yang 2005-10-11 02:31:41 UTC
Created attachment 6275 [details]
/var/log/kernel/info
Comment 6 Feng-sung Yang 2005-10-20 00:39:07 UTC
Hi,
This time I enabled SCSI logging to record all SCSI messages. Two logs are
attached. The first (info.expected) is for the non-threaded version: the rescan
operation fails and my program terminates as expected. The second (info.hang)
is for the threaded version, in which the write() hangs. When I press CTRL-C
after some delay, the log stays the same (no further messages are appended). I
have also attached my simple Python program (use the variable bUseThread to
enable or disable the thread) and the kernel symbol file. The problem seems to
be in the SCSI module....
Comment 7 Feng-sung Yang 2005-10-20 00:40:46 UTC
Created attachment 6339 [details]
info.expected
Comment 8 Feng-sung Yang 2005-10-20 00:41:17 UTC
Created attachment 6340 [details]
info.hang
Comment 9 Feng-sung Yang 2005-10-20 00:41:42 UTC
Created attachment 6341 [details]
rescan.py
Comment 10 Feng-sung Yang 2005-10-20 00:42:18 UTC
Created attachment 6342 [details]
kallsyms
Comment 11 Alan Stern 2005-10-31 12:39:50 UTC
Can you try this again using 2.6.14?
Comment 12 Feng-sung Yang 2005-10-31 18:42:49 UTC
Created attachment 6431 [details]
info.051101.bz2

Hi,
    The result is the same with 2.6.14 (with the unionfs and squashfs patches).
I have also uploaded the updated kernel log and kallsyms here.

[root@fsyang fsyang]# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep usb
[root@fsyang fsyang]# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep scsi
 4913  4913 TS       -  -5  28   0  0.0 S<   scsi_error_handler             scsi_eh_3

By the way, could anybody tell me why I am not notified when this bug is
modified? I received some notifications earlier, but now I receive none, so I
must check this bug daily... Thanks...
Comment 13 Feng-sung Yang 2005-10-31 18:43:18 UTC
Created attachment 6432 [details]
kallsyms.051101.bz2
Comment 14 Alan Stern 2005-11-01 13:20:57 UTC
I am able to reproduce the failure on my computer, and I'm working to fix it.

You should be receiving email notifications about updates to this bug.  Bugzilla
does send out a message automatically to the address listed as the Submitter.
Comment 15 Alan Stern 2005-11-02 14:23:27 UTC
Created attachment 6452 [details]
Fix race between sd_rescan and sd_remove

Try the attached patch.  It fixed the problem on my computer.
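
The attached patch is not reproduced in this thread, so the following is only a generic user-space illustration of the kind of race its title names (a rescan running concurrently with a removal), and it makes no claim about how the kernel fix works. The `Disk` class, its lock, and its `rescans` counter are hypothetical names invented for this sketch. The idea shown is that the "is the device still there?" check and the rescan work happen under the same lock that removal takes, so a rescan can never start on a device that has already been removed.

```python
import threading


class Disk:
    """Toy stand-in for a disk object shared by a rescan path and a remove path."""

    def __init__(self):
        self._lock = threading.Lock()
        self._removed = False
        self.rescans = 0  # counts rescans that were actually performed

    def rescan(self):
        """Perform a rescan only if the disk has not been removed.

        Returns True if the rescan ran, False if the disk was already gone.
        """
        with self._lock:
            if self._removed:
                return False
            self.rescans += 1  # placeholder for the real rescan work
            return True

    def remove(self):
        """Mark the disk as removed; later rescans become no-ops."""
        with self._lock:
            self._removed = True
```

Without the shared lock, a rescan could pass the check, lose the CPU to the removal, and then continue working on an object that is being torn down.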
Comment 16 Feng-sung Yang 2005-11-03 00:32:48 UTC
Hi,
    I have tried your patch and things worked as expected so far. Thanks a lot.
