Bug 18422 - pl2303 stalls blocking calls to open()
Summary: pl2303 stalls blocking calls to open()
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Greg Kroah-Hartman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-13 21:02 UTC by Søren Holm
Modified: 2010-09-27 06:39 UTC (History)
1 user (show)

See Also:
Kernel Version: 2.6.36
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Test-application that opens and closes /dev/ttyUSB0 forever (290 bytes, text/x-csrc)
2010-09-21 21:06 UTC, Søren Holm
Details
Debug output during device attach, open, and close. NOT working scenario (75.87 KB, text/x-log)
2010-09-25 16:21 UTC, Søren Holm
Details
Debug output during device attach, open, and close. WORKING scenario (9.97 KB, text/x-log)
2010-09-25 16:22 UTC, Søren Holm
Details

Description Søren Holm 2010-09-13 21:02:49 UTC
My USB-2-serial adapter has somewhere between Ubuntu lucid begun to have strange. The most apparent case is when trying to open the device /dev/ttyUSB0 with a simple open("/dev/ttyUSB0","rw"). That call will allways block and I never gets to actually read from the device.

A workaround for me is to open the device in a non-blocking maner. I use gtkterm. While opening it is now possible to simultaneusly call open("dev","rw") on the port and it suceeds. Now I close gtkterm while the blocking application is running (reading,writing etc.). Hereafter I can close the port and open it again without problems. The even more worrying thing is that unplugging the device, unloading pl2303, replugging the device does not make the problem come back. It works until next reboot.

I have not compiled a custom kernel to try to fix the issue (it is still a problem in 2.6.36-rc4 though), but I did use git see the changes to pl2303 since the current ubuntu lucid kernel. But since the problem persists over driver unload+device unplug I suspect that it might not be the driver. (NOTE: normal serial ports does not work as intended either, but that is problably another bug).

$ git log Ubuntu-2.6.32-25.43..Ubuntu-2.6.35-20.29 --oneline -- drivers/usb/serial/pl2303.*
a848012 USB: pl2303: New vendor and product id
5d78fcb USB: pl2303.h: checkpatch cleanups
d45cc8d USB: pl2303: use tty_insert_flip_string_fixed_flag
8b0127b USB: pl2303: use generic close
684c6e3 USB: pl2303: switch to generic write implementation
f08e07a USB: pl2303: switch to generic read implementation
7919c2f USB: pl2303: increase bulk-in buffer size to 256 byte
3efeaff USB: pl2303: increase the bulk-out buffer size to 256 byte
bd5afa9 usb-serial: Use tty_port version console instead of usb_serial_port
18344a1 USB: serial: pl2303: Hybrid reader Uniform HCR331
9a61d72 USB: pl2303: add AdLink ND-6530 USB IDs
124d255 USB: pl2303: remove unnecessary reset of usb_device in urbs
67b9946 USB: pl2303: initial TIOCGSERIAL support
1f87158 USB: remove references to port->port.count from the serial drivers
a108bfc USB: tty: Prune uses of tty_request_room in the USB layer
7d40d7e USB serial: make USB device id constant
Comment 1 Greg Kroah-Hartman 2010-09-13 21:10:35 UTC
On Mon, Sep 13, 2010 at 09:02:51PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
>     Kernel Version: 2.6.36

There is no 2.6.36 kernel released yet.

> My USB-2-serial adapter has somewhere between Ubuntu lucid begun to have
> strange. The most apparent case is when trying to open the device
> /dev/ttyUSB0
> with a simple open("/dev/ttyUSB0","rw"). That call will allways block and I
> never gets to actually read from the device.

What does 'strace' show here?

> I have not compiled a custom kernel to try to fix the issue (it is still a
> problem in 2.6.36-rc4 though), but I did use git see the changes to pl2303
> since the current ubuntu lucid kernel. But since the problem persists over
> driver unload+device unplug I suspect that it might not be the driver. (NOTE:
> normal serial ports does not work as intended either, but that is problably
> another bug).

Please use a kernel.org kernel, we can not help you out with any Ubuntu
kernel.  If you can reproduce the issue on 2.6.36-rc4 from kernel.org,
please email the strace output to linux-usb@vger.kernel.org and we can
take it from there.
Comment 2 Søren Holm 2010-09-14 06:48:25 UTC
Sorry Greg.....  it is of cause 2.6.36-rc4 I have tried out. The issue can be tested easy with "strace cat /dev/ttyUSB0"

It outputs this :

sgh@koontz:~$ strace cat /dev/ttyUSB0 
execve("/bin/cat", ["cat", "/dev/ttyUSB0"], [/* 43 vars */]) = 0
brk(0)                                  = 0x1697000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbcb0112000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=119130, ...}) = 0
mmap(NULL, 119130, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbcb00f4000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\356\1\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1572232, ...}) = 0
mmap(NULL, 3680296, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fbcafb71000
mprotect(0x7fbcafceb000, 2093056, PROT_NONE) = 0
mmap(0x7fbcafeea000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x179000) = 0x7fbcafeea000
mmap(0x7fbcafeef000, 18472, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fbcafeef000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbcb00f3000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbcb00f2000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbcb00f1000
arch_prctl(ARCH_SET_FS, 0x7fbcb00f2700) = 0
mprotect(0x7fbcafeea000, 16384, PROT_READ) = 0
mprotect(0x60a000, 4096, PROT_READ)     = 0
mprotect(0x7fbcb0114000, 4096, PROT_READ) = 0
munmap(0x7fbcb00f4000, 119130)          = 0
brk(0)                                  = 0x1697000
brk(0x16b8000)                          = 0x16b8000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=4214144, ...}) = 0
mmap(NULL, 4214144, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbcaf76c000
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 1), ...}) = 0
open("/dev/ttyUSB0", O_RDONLY

And it just stall somewhere in the open call.
Comment 3 Søren Holm 2010-09-21 21:06:53 UTC
Created attachment 30932 [details]
Test-application that opens and closes /dev/ttyUSB0 forever

The test usually stops at the first iteration. The desired result is output like this:

#1 Opening port ...done.
#2 Opening port ...done.                                                                                                               
#3 Opening port ...done.                                                                                                               
#4 Opening port ...done.
etc.
etc.
etc.
Comment 4 Søren Holm 2010-09-21 21:18:57 UTC
Ok. After trying different kernel version I have found a more concrete workaround.

Running pl2303_test normally gives the following output :

# ./pl2303_test 
#1 Opening port ...

The open call fails :(

But, doing the following gets the driver in a working state:

# rmmod pl2303
# rmmod usbserial
# modprobe pl2303

It is not enough to just reload pl2303 - usbserial has to be unloaded as well. Additionally the usb2serial adaptor must have been connected prior to doing this. It does not matter if it is connected while unloading the modules. The final modprobe pl2303 can be replaced by (re)connecting the usb2serial adaptor.

If I run pl2303_test now it gives me this output :

# ./pl2303_test                                                                                             
#1 Opening port ...done.
#2 Opening port ...done.                                                                                                               
#3 Opening port ...done.                                                                                                               
#4 Opening port ...done.
#5 Opening port ...done.
#6 Opening port ...done.
#7 Opening port ...done.
#8 Opening port ...done.
#9 Opening port ...done.
#10 Opening port ...done.
#11 Opening port ...done.
#12 Opening port ...done
etc.
etc.
etc.

The speed is around 3.5 iterations per second. That's sweet and the workaround work every time.

This workaround works for 2.6.35 and problably also for 2.6.36-rc4

I hope it helps tracking down this bug.
Comment 5 Alan Stern 2010-09-24 21:11:28 UTC
Søren, when you receive this message from Bugzilla, forward it to the linux-usb mailing list.  I think any further emails in that thread will be stored in the bug report.
Comment 6 Søren Holm 2010-09-25 16:21:41 UTC
Created attachment 31472 [details]
Debug output during device attach, open, and close. NOT working scenario
Comment 7 Søren Holm 2010-09-25 16:22:24 UTC
Created attachment 31482 [details]
Debug output during device attach, open, and close. WORKING scenario
Comment 8 Søren Holm 2010-09-25 16:26:44 UTC
I have now compiled pl2303 and usbserial in debugging mode. The previous two attachments contains the output during device connect, open and close.

The log are generated as follows.

1.  Rebooted into my custom kernel
2.  Attach the usb2serial adaptor
3.  Execute "cat /dev/ttyUSB0" (blocks on open)
4.  Kill the cat with CTRL-C
5.  rmmod pl2303
6.  rmmod usbserial
7.  Disconnect the usb2serial adaptor
8.  Attach the usb2serial adaptor
9.  Execute "cat /dev/ttyUSB0" (blocks on read)
10. Kill the cat with CTRL-C
Comment 9 Søren Holm 2010-09-25 23:35:14 UTC
I have modified pl2303_open(...) so it looks like this :

.....
	if (port->interrupt_in_urb)
		dbg("%s - port->interrupt_in_urb!=0\n", __func__);
	else
		dbg("%s - port->interrupt_in_urb==0 :(\n", __func__);
	dbg("%s - submitting interrupt urb\n", __func__);
	result = usb_submit_urb(port->interrupt_in_urb, GFP_KERNEL);
	dbg("%s - usb_submit_urb returned %d\n", __func__, result);
	if (result) {
		dev_err(&port->dev, "%s - failed submitting interrupt urb,"
			" error %d\n", __func__, result);
		pl2303_close(port);
		return -EPROTO;
	}
	dbg("%s - setting drain delay\n", __func__);
	port->port.drain_delay = 256;
	return 0;
}

After an open-call the log contains this :
Sep 26 01:28:17 phoenix kernel: [ 3113.002515] /home/sgh/git/linux-2.6/drivers/usb/serial/pl2303.c: pl2303_open - submitting read urb
Sep 26 01:28:17 phoenix kernel: [ 3113.002542] /home/sgh/git/linux-2.6/drivers/usb/serial/pl2303.c: pl2303_open - port->interrupt_in_urb!=0
Sep 26 01:28:17 phoenix kernel: [ 3113.002546] 
Sep 26 01:28:17 phoenix kernel: [ 3113.002551] /home/sgh/git/linux-2.6/drivers/usb/serial/pl2303.c: pl2303_open - submitting interrupt urb
Sep 26 01:28:17 phoenix kernel: [ 3113.002555] 
Sep 26 01:28:17 phoenix kernel: [ 3113.002571] /home/sgh/git/linux-2.6/drivers/usb/serial/pl2303.c: pl2303_open - usb_submit_urb returned 0
Sep 26 01:28:17 phoenix kernel: [ 3113.002575] 
Sep 26 01:28:17 phoenix kernel: [ 3113.002579] /home/sgh/git/linux-2.6/drivers/usb/serial/pl2303.c: pl2303_open - setting drain delay
Sep 26 01:28:17 phoenix kernel: [ 3113.002583] 
Sep 26 01:28:17 phoenix kernel: [ 3113.004451] /home/sgh/git/linux-2.6/drivers/usb/serial/pl2303.c: pl2303_read_int_callback (0)
Sep 26 01:28:17 phoenix kernel: [ 3113.004468] pl2303 ttyUSB0: pl2303_read_int_callback - length = 10, data = a1 20 00 00 00 00 02 00 00 02 
Sep 26 01:28:17 phoenix kernel: [ 3113.006015] /home/sgh/git/linux-2.6/drivers/usb/serial/pl2303.c: set_control_lines - value = 3, retval = 0

So it is not in pl2303_open the open syscall blocks. serial_open(...) does not contain anything interresting either. But since rmmod both usbserial pl23303 it might not have anything to do with pl2303. Removing just pl2303 does not help the problem. May some uninitialized stuff in usbserial?
Comment 10 Alan Stern 2010-09-26 03:26:14 UTC
On Sat, 25 Sep 2010 bugzilla-daemon@bugzilla.kernel.org wrote:

> So it is not in pl2303_open the open syscall blocks. serial_open(...) does
> not
> contain anything interresting either. But since rmmod both usbserial pl23303
> it
> might not have anything to do with pl2303. Removing just pl2303 does not help
> the problem. May some uninitialized stuff in usbserial?

No, I don't think the problem has anything to do with usbserial.  If
you compare the two debugging logs, you'll see that a lot of extra
stuff happens when you first plug the device in, compared to when you
reload the drivers.  The port gets opened a couple of times and the
baud rate gets set to several different values (grep for "baud set" in
the log).  Probably this is done by system-level software, such as udev
or NetworkManager.

The problem is most likely something really stupid and obvious.  For
example, maybe the system programs set the port to use hardware
handshaking.  Then since there's no handshaking signal, your "cat"  
blocks during open.  When you reload the drivers, the port doesn't get
set to use hardware handshaking, so the open completes normally.

That's why you shouldn't use programs like "cat" for testing serial 
ports.  I asked you before to try minicom; please do so.

Alan Stern
Comment 11 Søren Holm 2010-09-26 16:28:30 UTC
Thanks for helping out Alan & Greg. modem-manager messed up the ports. So it is not a kernel-think afterall.
Comment 12 Alan Stern 2010-09-27 01:08:05 UTC
On Sun, 26 Sep 2010 bugzilla-daemon@bugzilla.kernel.org wrote:

> Thanks for helping out Alan & Greg. modem-manager messed up the ports. So it
> is
> not a kernel-think afterall.

How did you figure out which program was causing the problem?

And why did the problem occur even in single-user mode?
Comment 13 Søren Holm 2010-09-27 06:39:28 UTC
That a good question :)

Single-user is apparently not as 'single' as I thought it was. Network-manager and friends where running.

First I tried killing udevd - it helped. Then I removed rules and killed running applications until it went away.

Note You need to log in before you can comment on or make changes to this bug.