Most recent kernel where this bug did not occur: 2.6.13-mm1
Distribution: Mandriva Linux release 2006.0 (Cooker) for i586
Hardware Environment: i686 Intel(R) Pentium(R) 4 CPU 3.00GHz
Software Environment:
Problem Description:

I am trying to detect when the media is removed from a USB card reader. The steps are simple: check whether the corresponding /sys files still exist, then write to the /sys rescan file to trigger the actual USB rescan. However, a race condition seems to occur during the rescan, since the user may manually disconnect the USB card reader at the same time. The write() to the rescan file appears to block forever (although the program can still be killed normally with CTRL-C), and the corresponding scsi_eh_XX thread is never released.

My program is written in Python and it is large, so I wrote a small Perl program to emulate its behavior. Oddly, the Perl program does not hang the way the Python one does, but at least it shows the steps roughly.

By the way, kernel 2.6.11 seems okay...

The patches I used in this kernel are:
-mm1
http://marc.theaimsgroup.com/?l=linux-usb-devel&m=112551468126219&w=2
msleep(2000) in drivers/usb/storage/transport.c's Handle_Errors:
"Fix errors in the SCSI core" patch in http://bugzilla.kernel.org/show_bug.cgi?id=5195

=====================================================================
#!/usr/bin/perl
use Time::HiRes qw(usleep);

# SCSI host number of the card reader, given on the command line
$n = $ARGV[0];
$deviceDir = "/sys/devices/pci0000:00/0000:00:1d.7/usb5/5-1/5-1:1.0";
$rescanFile = "$deviceDir/host$n/target$n:0:0/$n:0:0:0/rescan";
$sizeFile = "$deviceDir/host$n/target$n:0:0/$n:0:0:0/block/size";

while (1) {
    # Stop as soon as the device or its size file disappears
    if (! -d $deviceDir) {
        print "\"$deviceDir\" is gone!\n";
        last;
    }
    if (! -e $sizeFile) {
        print "\"$sizeFile\" is gone!\n";
        last;
    }

    # Read the block size (the sysfs file is read, not executed)
    $size = `cat $sizeFile`;
    chomp($size);
    if ($size eq "0") {
        print "Size becomes 0!\n";
        last;
    }

    # Trigger a rescan by writing "1" to the rescan file
    print "***open $rescanFile\n";
    open (RESCAN, ">$rescanFile") or die "Failed to open file\n";
    print "write 1 to rescan file\n";
    print RESCAN "1";
    print "try to close\n";
    close RESCAN;
    print "close done\n";

    usleep(100000); # 0.1 second
}
print "Finish!!!!!!\n";
=====================================================================

PS result after the test:

~/mptester 1040$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep usb
~/mptester 1041$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep scsi
 7561  7561 TS       -   0  19   0  0.0 S    -                              scsi_eh_22

The /var/log/kernel/info is attached as info.050913.bz2.
Created attachment 5983 [details] info.050913.bz2
It seems that even when the Perl sample code is translated into Python, the write() can still finish, with "Segmentation fault". A thread must be used for the write() to hang and cause scsi_eh_XX to hang as well. The code is below:

=====================================================================
#!/usr/bin/python
import os
import sys
import time
import thread

# SCSI host number of the card reader, given on the command line
n = int(sys.argv[1])
deviceDir = '/sys/devices/pci0000:00/0000:00:1d.7/usb5/5-1/5-1:1.0/'
rescanFile = deviceDir + 'host%d/target%d:0:0/%d:0:0:0/rescan' % ((n,) * 3)
sizeFile = deviceDir + 'host%d/target%d:0:0/%d:0:0:0/block/size' % ((n,) * 3)
bDone = False

def Run():
    global bDone
    while True:
        # Stop as soon as the device or its size file disappears
        if not os.path.isdir(deviceDir):
            print '"%s" is gone!' % deviceDir
            break
        if not os.path.exists(sizeFile):
            print '"%s" is gone!' % sizeFile
            break
        size = file(sizeFile, 'r').read()
        if size == "0\n":
            print "Size becomes 0!"
            break

        # Trigger a rescan by writing "1" to the rescan file
        print '***open ' + rescanFile
        try:
            fd = os.open(rescanFile, os.O_WRONLY)
            print "write 1 to rescan file"
            os.write(fd, "1")
            print 'try to close'
            os.close(fd)
            print 'close done'
        except (IOError, OSError):
            break
        time.sleep(0.1)  # 0.1 second
    bDone = True  # let the main thread stop waiting

if __name__ == "__main__":
    thread.start_new_thread(Run, ())
    while not bDone:
        time.sleep(0.2)
    print 'Finish!!!!!!'
=====================================================================
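For anyone trying to tell "write() returned" apart from "write() is still blocked" without watching the console, a minimal sketch follows. It is written for Python 3, not the Python 2 of the listing above, and the helper name write_with_timeout is hypothetical. It only reports whether the write came back within a timeout; it does nothing to unblock a write that is stuck in the kernel.

```python
import os
import threading

def write_with_timeout(path, data=b"1", timeout=2.0):
    """Write data to path in a worker thread.

    Returns True if the open/write/close sequence returned (successfully
    or with an OSError) within `timeout` seconds, False if it is still
    blocked when the timeout expires.
    """
    finished = threading.Event()

    def worker():
        try:
            fd = os.open(path, os.O_WRONLY)
            try:
                os.write(fd, data)
            finally:
                os.close(fd)
        except OSError:
            pass  # a failed write still counts as "returned"
        finally:
            finished.set()

    # Daemon thread: a write that never returns will not keep the
    # interpreter alive at exit.
    threading.Thread(target=worker, daemon=True).start()
    return finished.wait(timeout)
```

Under these assumptions, calling write_with_timeout(rescanFile) in place of the bare os.write() would return False in the hanging case described above, which makes the hang easy to log.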
I don't understand, is the kernel oopsing when you do this? If so, please provide the oops message.
Greg, sorry, I don't understand what you mean (my English is bad...), and sorry for replying so late; it seems I was not notified.

What I want is for the storage driver to simply remove the /sys files for the device when I try to rescan, without the write() call hanging. When the card reader is removed while the rescan operation (the write() call) is being retried continuously, the usb_storage driver disappears as expected. The corresponding sysfs files in /sys/class/usb_device/, /sys/class/scsi_host/, and /sys/class/scsi_device/ also disappear. However, the scsi_eh_X thread is not released, and my write() stays pending forever, even after my program is terminated with CTRL-C.

I have now tried linux-2.6.14-rc4 with the squashfs and unionfs patches. The error condition still happens. I reproduced the error several times, and the ps result is different from what I previously reported:

~ 1000$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep usb
~ 1001$ ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep scsi
 3838  3838 TS       -  -5  28   0  0.0 S<   scsi_error_handler             scsi_eh_0
~ 1002$

By the way, I filled in the "Most recent kernel where this bug did not occur:" field wrongly. It should be "2.6.11" instead of "2.6.13-mm1".

The verbose usb-storage log "info.051011.bz2" is attached.
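The check for a leftover error-handler thread can also be scripted instead of read off the ps output by eye. The following is a small Python 3 sketch; the function name find_scsi_eh_threads is hypothetical, and it assumes exactly the 11-column ps format used in this report (pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan,comm).

```python
import re

# Matches SCSI error-handler thread names such as "scsi_eh_22".
SCSI_EH_RE = re.compile(r"^scsi_eh_(\d+)$")

def find_scsi_eh_threads(ps_output):
    """Return a list of (pid, eh_number, wchan) tuples, one per scsi_eh_N line.

    Expects lines produced by:
      ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm
    Lines that do not have 11 fields or a scsi_eh_N command are skipped.
    """
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        match = SCSI_EH_RE.match(fields[-1])
        if match:
            # fields[-2] is the wchan column, fields[-1] the command name
            found.append((int(fields[0]), int(match.group(1)), fields[-2]))
    return found
```

Fed the two ps lines quoted in this report, it yields one entry per thread with its PID, the number after "scsi_eh_", and the wchan value, so a test script can assert that the list is empty once the reader has been unplugged.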
Created attachment 6275 [details] /var/log/kernel/info
Hi,

This time I enabled SCSI logging to log all SCSI messages. Two logs are attached. The first one (info.expected) is for the non-threaded version: the rescan operation fails and my program terminates as expected. The second one (info.hang) is for the threaded version, where the write() hangs. When I press CTRL-C after some delay, the log stays the same (no more messages are appended).

I have also attached my simple Python program (use the variable bUseThread to enable or disable the thread), as well as the kernel symbol file.

It seems the problem is in the SCSI module....
Created attachment 6339 [details] info.expected
Created attachment 6340 [details] info.hang
Created attachment 6341 [details] rescan.py
Created attachment 6342 [details] kallsyms
Can you try this again using 2.6.14?
Created attachment 6431 [details] info.051101.bz2

Hi,

The result is the same for 2.6.14 (with the unionfs and squashfs patches). I have also updated the kernel log and kallsyms here.

[root@fsyang fsyang]# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep usb
[root@fsyang fsyang]# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:30,comm | grep scsi
 4913  4913 TS       -  -5  28   0  0.0 S<   scsi_error_handler             scsi_eh_3

By the way, could anybody tell me why I am not notified when this bug is modified? I did receive some notifications before, but now I get none, so I have to check this bug daily...

Thanks...
Created attachment 6432 [details] kallsyms.051101.bz2
I am able to reproduce the failure on my computer, and I'm working to fix it. You should be receiving email notifications about updates to this bug. Bugzilla does send out a message automatically to the address listed as the Submitter.
Created attachment 6452 [details] Fix race between sd_rescan and sd_remove Try the attached patch. It fixed the problem on my computer.
Hi, I have tried your patch and things worked as expected so far. Thanks a lot.