Bug 6596

Summary: kernel bug at unloading pl2303
Product: Drivers
Reporter: Michael (auslands-kv)
Component: Serial
Assignee: Greg Kroah-Hartman (greg)
Status: RESOLVED CODE_FIX
Severity: blocking
CC: stern, zaitcev
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.16
Regression: ---
Bug Blocks: 5089
Attachments: Test fix #1 - wait until scheduled work completes in disconnect
             Test fix #2 (on top of #1) - move the wait where it belongs

Description Michael 2006-05-22 05:17:47 UTC
Most recent kernel where this bug did not occur:
Distribution: Kanotix
Hardware Environment: Thinkpad X31
Software Environment: GNU/Linux Unstable
Problem Description:

Attaching a serial-to-USB converter (pl2303) automatically loads the correct
drivers.

May 22 13:47:16 LaptopMB kernel: pl2303 1-1.2.1.2:1.0: pl2303 converter detected
May 22 13:47:16 LaptopMB kernel: usb 1-1.2.1.2: pl2303 converter now attached to ttyUSB1
May 22 13:47:16 LaptopMB kernel: usbcore: registered new driver pl2303
May 22 13:47:16 LaptopMB kernel: drivers/usb/serial/pl2303.c: Prolific PL2303 USB to serial adaptor driver

However, the driver reports an error and does not work with the device. On
every connection, syslog shows:

May 22 13:47:16 LaptopMB kernel: pl2303 ttyUSB1: pl2303_open - failed submitting interrupt urb, error -28

When detaching the device, a kernel bug appears:

May 22 13:47:22 LaptopMB kernel: usb 1-1.2.1: USB disconnect, address 6
May 22 13:47:22 LaptopMB kernel: usb 1-1.2.1.1: USB disconnect, address 7
May 22 13:47:22 LaptopMB kernel: usb 1-1.2.1.1.2: USB disconnect, address 9
May 22 13:47:22 LaptopMB kernel: drivers/usb/class/usblp.c: usblp0: removed
May 22 13:47:22 LaptopMB kernel: usb 1-1.2.1.2: USB disconnect, address 8
May 22 13:47:22 LaptopMB kernel: ------------[ cut here ]------------
May 22 13:47:22 LaptopMB kernel: kernel BUG at kernel/workqueue.c:109!
May 22 13:47:22 LaptopMB kernel: invalid opcode: 0000 [#1]
May 22 13:47:22 LaptopMB kernel: PREEMPT 
May 22 13:47:22 LaptopMB kernel: Modules linked in: pl2303 radeon drm rfcomm
l2cap bluetooth lp ftdi_sio usbserial thermal fan button battery ac usblp capifs
cryptoloop af_packet xt_tcpudp xt_state ip6table_filter ip6_tables ipv6
iptable_filter ip_tables x_tables ip_conntrack_tftp ip_conntrack_proto_sctp
ip_conntrack_pptp ip_conntrack_netlink ip_nat ip_conntrack_netbios_ns
ip_conntrack_irc ip_conntrack_ftp ip_conntrack_amanda ip_conntrack nfnetlink
ibm_acpi nvram speedstep_centrino freq_table processor eth1394 pcmcia tsdev e100
mii yenta_socket ohci1394 ipw2100 ieee80211 ieee80211_crypt ieee1394
rsrc_nonstatic pcmcia_core irtty_sir sir_dev irda crc_ccitt parport_pc
snd_intel8x0 parport snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss
snd_pcm 8250_pnp psmouse serio_raw snd_timer pcspkr snd soundcore snd_page_alloc
i2c_i801 hw_random shpchp pci_hotplug intel_agp agpgart 8250_pci 8250
serial_core evdev usbmouse usbhid usbkbd uhci_hcd ehci_hcd usbcore
May 22 13:47:22 LaptopMB kernel: CPU:    0
May 22 13:47:22 LaptopMB kernel: EIP:    0060:[queue_work+85/96]    Tainted: P      VLI
May 22 13:47:22 LaptopMB kernel: EFLAGS: 00210286   (2.6.16.16-kanotix-up-1 #1) 
May 22 13:47:22 LaptopMB kernel: EIP is at queue_work+0x55/0x60
May 22 13:47:22 LaptopMB kernel: eax: f519713c   ebx: 00000000   ecx: dffd33c0   edx: f5197138
May 22 13:47:22 LaptopMB kernel: esi: ecfdc340   edi: ecfdc340   ebp: dfb2e214   esp: c1d33e78
May 22 13:47:22 LaptopMB kernel: ds: 007b   es: 007b   ss: 0068
May 22 13:47:22 LaptopMB kernel: Process khubd (pid: 355, threadinfo=c1d32000 task=c1da5560)
May 22 13:47:22 LaptopMB kernel: Stack: <0>00000000 f930ab37 f5197000 dfb2e200 f93ac040 f93ac064 dfb2e214 f8e89106 
May 22 13:47:22 LaptopMB kernel:        dfb2e200 dfb2e27c dfb2e214 c02fe0ff dfb2e214 dfb2e214 dfb2e214 f8e9a060 
May 22 13:47:22 LaptopMB kernel:        c02fe308 dfb2e214 c02fdaa9 dfb2e214 dfb2e25c dfb2e214 f14b3858 00000000 
May 22 13:47:22 LaptopMB kernel: Call Trace:
May 22 13:47:22 LaptopMB kernel:  [pg0+953723703/1067967488] usb_serial_disconnect+0x47/0xb0 [usbserial]
May 22 13:47:22 LaptopMB kernel:  [pg0+948998406/1067967488] usb_unbind_interface+0x36/0x80 [usbcore]
May 22 13:47:22 LaptopMB kernel:  [__device_release_driver+79/112] __device_release_driver+0x4f/0x70
May 22 13:47:22 LaptopMB kernel:  [device_release_driver+24/48] device_release_driver+0x18/0x30
May 22 13:47:22 LaptopMB kernel:  [bus_remove_device+137/176] bus_remove_device+0x89/0xb0
May 22 13:47:22 LaptopMB kernel:  [device_del+57/112] device_del+0x39/0x70
May 22 13:47:22 LaptopMB kernel:  [pg0+948994406/1067967488] usb_disable_device+0xb6/0x110 [usbcore]
May 22 13:47:22 LaptopMB kernel:  [pg0+948969907/1067967488] usb_disconnect+0x83/0xf0 [usbcore]
May 22 13:47:22 LaptopMB kernel:  [pg0+948969889/1067967488] usb_disconnect+0x71/0xf0 [usbcore]
May 22 13:47:22 LaptopMB kernel:  [pg0+948979308/1067967488] hub_thread+0x3bc/0xc84 [usbcore]
May 22 13:47:22 LaptopMB kernel:  [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
May 22 13:47:22 LaptopMB kernel:  [kthread+139/192] kthread+0x8b/0xc0
May 22 13:47:22 LaptopMB kernel:  [pg0+948978352/1067967488] hub_thread+0x0/0xc84 [usbcore]
May 22 13:47:22 LaptopMB kernel:  [kthread+183/192] kthread+0xb7/0xc0
May 22 13:47:22 LaptopMB kernel:  [kthread+0/192] kthread+0x0/0xc0
May 22 13:47:22 LaptopMB kernel:  [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
May 22 13:47:22 LaptopMB kernel: Code: 14 8b 40 08 a8 08 75 1a 89 d8 5b c3 8d 42 04 3b 42 04 75 17 8b 01 bb 01 00 00 00 e8 d6 fa ff ff eb d1 e8 9f 1f 2e 00 89 d8 5b c3 <0f> 0b 6d 00 a1 dd 43 c0 eb df 90 89 c2 a1 78 de 54 c0 e9 94 ff 
May 22 13:47:22 LaptopMB kernel:  <6>note: khubd[355] exited with preempt_count 1

Steps to reproduce:
Difficult, I guess, as my desktop, running the same distro and the same kernel,
does not have this problem. I hope the syslog helps a bit.
One more detail: I also have another serial-to-USB device connected (not a
pl2303), so this one is the second. The first one uses the ftdi_sio driver and
has never shown any problems.
Comment 1 Alan Stern 2006-05-22 12:06:33 UTC
It might help if you didn't have USB hubs nested four deep.

Also, the converter would probably work better if you plugged it directly into
the computer or into a USB-1.1 hub which was attached to the computer.  The
ehci-hcd driver still has problems handling full-speed devices plugged into
high-speed hubs; one typical symptom of this is a -28 (-ENOSPC) error code.
Comment 2 Michael 2006-05-22 12:22:20 UTC
Well, you are definitely right. I took apart my whole USB setup and connected
the pl2303 directly to the USB port: it works, and there is no kernel bug. Good
to know!

However, for me this is not a very usable workaround, as I need this setup. I
have several USB devices that I need to switch between three computers. Some of
them are USB 2.0 devices.

So it
Comment 3 Alan Stern 2006-05-22 12:53:18 UTC
There are two issues here: the -28 error and the oops upon disconnect.  Probably
one triggers the other.

Anyway, the -28 error (caused by use of a full-speed device behind a high-speed
hub) is a known problem and people are working on it.  There's no reason to file
a bug report for it.

The oops is a real error and it should be fixed.  I don't know enough about the
usb-serial driver to be able to help, but other USB developers do.

It's not clear whether your use of nested hubs has anything to do with either
problem.  That is, what matters might be only the total number of hubs and not
whether they are nested.  Or whether the serial device is plugged into a hub vs.
plugged directly into the computer.
Comment 4 Pete Zaitcev 2006-05-22 13:03:43 UTC
See also:
 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180463
I keep thinking that I need to look closer, just need to find a moment.
Comment 5 Michael 2006-05-22 13:15:04 UTC
Oh yes, that
Comment 6 Alan Stern 2006-05-22 13:57:31 UTC
My guess is that the problem starts with the -28 error.  That causes the serial
device not to be registered in the first place.  Then when you unplug the pl2303
it tries to unregister the device, which causes the oops.

It should be easy enough to simulate this by hacking the driver.  Instead of
submitting the interrupt URB, make the driver think the submission failed and
then see what happens.
Comment 7 Pete Zaitcev 2006-05-22 21:37:07 UTC
Created attachment 8187 [details]
Test fix #1 - wait until scheduled work completes in disconnect
Comment 8 Pete Zaitcev 2006-05-22 21:38:36 UTC
Michael, please test the attached patch, thanks.
Comment 9 Michael 2006-05-23 00:20:45 UTC
Unbelievable! Just a few hours after my report there is already the first fix!!!!

Unfortunately, although I would like to help as much as possible, I am afraid
testing the patch is far beyond my capabilities. :(

I am (more or less) just an ordinary user. The kernel is a binary kernel from
the Kanotix project. I have never compiled a kernel, let alone patched one. I
might find out how to do this, but I'm afraid it will take quite a while (at
least much longer than it took to create the patch).

What I will do is contact one of the developers of the Kanotix project, ask him
to include this patch in his latest development kernel (2.6.17), and send me
the binary.

Thanks,

Michael
Comment 10 Michael 2006-05-24 01:39:24 UTC
These guys at Kanotix are really incredibly helpful!! Yesterday evening I asked
one of the developers if he could include the patch in his next development
kernel, and in the morning I already had the new kernel install binary in my
mailbox!!!

The patch works. No more kernel bug. Of course, the -28 error still occurs and
the adaptor still doesn
Comment 11 Pete Zaitcev 2006-05-24 10:59:14 UTC
Michael, thanks for testing. I'll forward the patch to Greg, so the Kanotix
people won't need to keep carrying it forward in future releases.
Comment 13 Pete Zaitcev 2006-06-21 20:43:57 UTC
Yes, however it may not be sufficient in all cases. That is because I failed
to notice that real devices are torn down elsewhere. I'll attach a follow-up
patch which I sent to Greg.
Comment 14 Pete Zaitcev 2006-06-21 20:46:24 UTC
Created attachment 8371 [details]
Test fix #2 (on top of #1) - move the wait where it belongs
Comment 15 Greg Kroah-Hartman 2006-07-17 11:08:43 UTC
This should all be fixed in 2.6.17 now.  If not, please reopen.