Latest working kernel version: 2.6.26, 2.6.26.5 Earliest failing kernel version: 2.6.27-rc1 Distribution: Fedora 8 Linux karlalex.palosanto.com 2.6.26-bisect-4-05350-g519f014 #4 PREEMPT Mon Oct 20 00:33:58 ECT 2008 i686 i686 i386 GNU/Linux Gnu C 4.1.2 Gnu make 3.81 binutils 2.17.50.0.18 util-linux 2.13.1 mount 2.13.1 module-init-tools 3.4 e2fsprogs 1.40.4 pcmciautils 014 quota-tools 3.14. PPP 2.4.4 Linux C Library 2.7 Dynamic linker (ldd) 2.7 Procps 3.2.7 Net-tools 1.60 Kbd 1.12 oprofile 0.9.3 Sh-utils 6.9 udev 118 wireless-tools 29 Modules Loaded usb_storage irnet ppp_generic slhc irtty_sir sir_dev ircomm_tty ircomm ks959_sir eeprom ipv6 bridge stp rfcomm bnep l2cap bluetooth it87 hwmon_vid fuse vfat fat dm_mirror dm_log dm_multipath scsi_dh dm_mod snd_seq_dummy snd_via82xx_modem snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_via82xx snd_ac97_codec snd_mixer_oss snd_mpu401 ac97_bus snd_mpu401_uart snd_rawmidi parport_pc snd_pcm rtc_cmos rtc_core parport pcspkr ns558 rtc_lib snd_timer via_ircc snd_seq_device gameport snd irda soundcore via_rhine snd_page_alloc floppy mii crc_ccitt i2c_viapro i2c_core button via_agp agpgart sr_mod sg cdrom ata_generic pata_via libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd Problem Description: The Kingsun KS-959 USB IrDA dongle support (ks959-sir) is broken in 2.6.27 - obexftp and similar programs can no longer connect to remote targets with this dongle. This is a regression: 2.6.26.5 works correctly with this very same dongle. This bug exists since at least 2.6.27-rc1. Steps to reproduce: Get KS-959 USB IrDA dongle and remote IrDA target (Sony Ericsson K300 used for tests) Connect dongle to any USB port Attempt to use obexftp to connect and list contents. Actual results: program times out attempting to connect. In addition, the following appears on the kernel log: [ 101.404383] usb 2-2: new low speed USB device using uhci_hcd and address 2 [ 101.584601] usb 2-2: configuration #1 chosen from 1 choice [ 101.734805] KingSun KS-959 IRDA/USB found at address 2, Vendor: 7d0, Product: 4959 [ 101.739778] ks959_sir: IrDA: Registered KingSun KS-959 device irda0 [ 101.741421] usbcore: registered new interface driver ks959-sir [ 102.020925] IrCOMM protocol (Dag Brattli) [ 102.292940] PPP generic driver version 2.4.2 [ 102.899711] irlap_change_speed(), setting speed to 9600 [ 120.425255] irlap_change_speed(), setting speed to 57600 [ 123.173934] IrLAP, no activity on link! [ 125.310542] irlmp_state_setup() WATCHDOG_TIMEOUT! [ 125.310606] irda_connect(), connect failed! [ 126.173864] IrLAP, no activity on link! [ 129.173805] IrLAP, no activity on link! [ 130.310478] irlmp_state_setup() WATCHDOG_TIMEOUT! [ 130.310543] irda_connect(), connect failed! [ 132.173742] IrLAP, no activity on link! [ 132.673731] irlap_change_speed(), setting speed to 9600 [ 132.673846] irda_connect(), connect failed! Expected results: program connects immediately, executes requested action. One workaround is to execute the following command before using any program to connect via the dongle: echo 9600 > /proc/sys/net/irda/max_baud_rate This allows the program to connect, but forces a lower transfer speed. Analysis so far: From what I have seen in the usbmon logs (attached), the bug is that the IrDA subsystem somehow forgets to flag a new speed change properly to the ks959-sir driver. In the 2.6.26.5 log, the control requests that switch the dongle speeds can be recognized as request-type 0x21, request 0x09, value 0x0200: ddb8b900 4231738535 S Co:2:003:0 s 21 09 0200 0001 0008 8 = 80250000 03000000 <--- switches to 9600 bps ... ddb8b900 4237269710 S Co:2:003:0 s 21 09 0200 0001 0008 8 = 00e10000 03000000 <--- switches to 57600 bps The new speed is also encoded in the transmitted IrDA packet, so that the remote side knows about the speed change. The control request to change speed to 57600 bps is absent in the 2.6.27 usbmon log, but the wrapped packet still encodes the speed change, so the remote side changes speed but the dongle remains at 9600 bps, and they get out of sync. This is consistent with the workaround that keeps the maximum speed at 9600 bps. I am doing a bisect right now, but my working theory is that irda_get_next_speed() is returning -1 instead of the actual next speed to program. Since the function includes a check for LAP_MAGIC, it is possible that a recent commit broke the LAP_MAGIC check. While filing this bug, I have noticed that my working theory is correct. Recent changes have broken the LAP_MAGIC check by 2.6.27-rc1. This is supported by tests performed by Vasily Khoruzhick, who gave alert of this bug at irda-users. Incomplete git bisection follows: [alex@karlalex linux-2.6-git]$ git-bisect log git-bisect start # good: [bce7f793daec3e65ec5c5705d2457b81fe7b5725] Linux 2.6.26 git-bisect good bce7f793daec3e65ec5c5705d2457b81fe7b5725 # bad: [6e86841d05f371b5b9b86ce76c02aaee83352298] Linux 2.6.27-rc1 git-bisect bad 6e86841d05f371b5b9b86ce76c02aaee83352298 # good: [d20b27478d6ccf7c4c8de4f09db2bdbaec82a6c0] V4L/DVB (8415): gspca: # Infinite loop in i2c_w() of etoms. git-bisect good d20b27478d6ccf7c4c8de4f09db2bdbaec82a6c0 # bad: [53baaaa9682c230410a057263d1ce2922f43ddc4] Merge # git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 git-bisect bad 53baaaa9682c230410a057263d1ce2922f43ddc4 # good: [925068dcdc746236264d1877d3d5df656e87882a] Merge branch 'davem-next' # of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 git-bisect good 925068dcdc746236264d1877d3d5df656e87882a # bad: [519f0141f1c42e2b8b59c7dea005cbf6095358e8] Merge branch 'for-linus' of # git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input git-bisect bad 519f0141f1c42e2b8b59c7dea005cbf6095358e8
Created attachment 18378 [details] usbmon trace with dongle inserted, shows speed command being correctly sent
Created attachment 18379 [details] usbmon trace with dongle inserted, 2.6.27, shows absent speed change command
It seems not only ks959 support is broken, but that complete irda subsystem is disfuntional in 2.6.27
Guys, could you try this patchset: http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc2/ It reserves an additional skbuff header for the irda_skb_cb. The latter should thus be safe accross dev_queue_xmit.
Works just fine on 2.6.28-rc2 with this patchset. Thank you!
(In reply to comment #5) > Works just fine on 2.6.28-rc2 with this patchset. Thank you! > No go, no go. The patchset still does not solve the problem in my setup. I tested with the kingsun-sir dongle and the ks959-sir dongles. With both of them, discovery seems to work fine, but the act of connecting with obexftp fails inmediately. Vasily, could you please describe your IRDA setup? What kind of dongle are you using? Do any strange messages appear? How did you test that the patchset actually works? Did you use obexftp/gnokii/other, or just ifconfig? Regarding the patchset, I think there might not be a guarantee that all the sk_buff structs allocated were actually allocated with the new functions.
Created attachment 18692 [details] dmesg output of affected kernel with proposed patchset Notice the strange LMP messages
> Vasily, could you please describe your IRDA setup? What kind of dongle are > you > using? Do any strange messages appear? How did you test that the patchset > actually works? Did you use obexftp/gnokii/other, or just ifconfig? I tested this code with ircp, and ftpobex. ftpobex connects to my nokia phone fines, and I could download an Image from there with ircp. You're right though, we could be missing some skbuff down the stack. I'll send an additional patch to check for the skbuff header validity, today or tomorrow.
Created attachment 18695 [details] skb headromm check
> You're right though, we could be missing some skbuff down the stack. > I'll send an additional patch to check for the skbuff header validity, today > or > tomorrow. I just added a patch that will simply write some error message whenever we dont have enough skb headroom. Alex, could you please apply this patch, and try again ?
2 Alex Villacis Lasso: I'm using ks959 dongle, I've used gnokii (with nokia 6020 phone) and kbeam (with 6020 and my pda) to test irda, both works fine.
Btw, I got some strange messages on dmesg too: irlap_change_speed(), setting speed to 9600 irlap_change_speed(), setting speed to 57600 irlmp_state_dtr(), Unknown event LM_LAP_CONNECT_CONFIRM on LSAP 0x11 irlmp_state_setup() WATCHDOG_TIMEOUT! irda_connect(), connect failed! irlmp_state_dtr(), Unknown event LM_LAP_CONNECT_CONFIRM on LSAP 0x11 irlap_change_speed(), setting speed to 9600 irda_connect(), connect failed! irlap_change_speed(), setting speed to 57600
(In reply to comment #12) > Btw, I got some strange messages on dmesg too: > > irlap_change_speed(), setting speed to 9600 > irlap_change_speed(), setting speed to 57600 > irlmp_state_dtr(), Unknown event LM_LAP_CONNECT_CONFIRM on LSAP 0x11 > irlmp_state_setup() WATCHDOG_TIMEOUT! > irda_connect(), connect failed! > irlmp_state_dtr(), Unknown event LM_LAP_CONNECT_CONFIRM on LSAP 0x11 > irlap_change_speed(), setting speed to 9600 > irda_connect(), connect failed! > irlap_change_speed(), setting speed to 57600 Weird...you confirm that you never had those messages with a working kernel (e.g. 2.6.26) ?
I never got "Uknown event" and "WATCHDOG_TIMEOUT/connection failed" on 2.6.26
(In reply to comment #14) > I never got "Uknown event" and "WATCHDOG_TIMEOUT/connection failed" on 2.6.26 Do you mind applying the kernel patch I just attached to this bug and give it another try ? Thanks.
With this patch I get a lot of ***** Not enough headroom, skb will be corrupted ***** messages on dmesg
Handled-By : Samuel Ortiz <samuel@sortiz.org> Notify-Also : Vasily <anarsoul@gmail.com>
Created attachment 18781 [details] [1/3] Create new field tx_extra in skbuff The previous patchsets attempt to fix the clobbering of the irda information by storing it within the data payload itself, as an additional header. For this, space has been allocated via skb_pull (?!) and later, skb_reserve(). The problem with this approach is that we do not have a guarantee that all skbuffs that are processed in the irda stack are actually allocated with functions that reserve the required space for the irda metadata, especially in the tx route. In addition, this approach mixes the payload data with metadata that should not be transmitted at all, which is a bit disorganized. This is the first of 3 patches that try a different approach. Instead of allocating an additional "header" within the data buffer itself, it introduces a new field within the skbuff, named tx_extra. This field should be used for passing data from the higher layers that is required for the drivers to transmit the packet correctly, and formalizes the previous usage of the cb field by the irda stack. The only issue I see is that every single skbuff carries an additional 32 bytes which are not put to any use in other stacks (for now). I was thinking about a pointer field to on-the-fly allocated data, but that means messing around with the skbuff allocation functions, the cloning functions (involving deciding how to behave on cloning), etc. This way is simpler to understand. This patch and the other two that follow fix the issue for me under 2.6.28-rc3.
Created attachment 18782 [details] [2/3] IRDA stack should call irda_get_skb_cb (tx_extra implementation) A rewrite of the original patch by Samuel Ortiz, for the tx_extra implementation. Does away with irda_alloc_skb which is irrelevant for this implementation.
Created attachment 18783 [details] [3/3] IRDA drivers should call irda_get_skb_cb The original patch by Samuel Ortiz, which applies without changes for the tx_extra implementation.
(In reply to comment #18) > Created an attachment (id=18781) [details] > [1/3] Create new field tx_extra in skbuff > > The previous patchsets attempt to fix the clobbering of the irda information > by > storing it within the data payload itself, as an additional header. For this, > space has been allocated via skb_pull (?!) and later, skb_reserve(). No, the additional space is allocated with alloc_skb(), see irda_alloc_skb() in: http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc2/0001-irda-Introduce-irda_alloc_skb.patch > The > problem with this approach is that we do not have a guarantee that all > skbuffs > that are processed in the irda stack are actually allocated with functions > that > reserve the required space for the irda metadata, especially in the tx route. > In addition, this approach mixes the payload data with metadata that should > not > be transmitted at all, which is a bit disorganized. > > This is the first of 3 patches that try a different approach. Instead of > allocating an additional "header" within the data buffer itself, it > introduces > a new field within the skbuff, named tx_extra. This field should be used for > passing data from the higher layers that is required for the drivers to > transmit the packet correctly, and formalizes the previous usage of the cb > field by the irda stack. The only issue I see is that every single skbuff > carries an additional 32 bytes which are not put to any use in other stacks > (for now). I was thinking about a pointer field to on-the-fly allocated data, > but that means messing around with the skbuff allocation functions, the > cloning > functions (involving deciding how to behave on cloning), etc. This way is > simpler to understand. > > This patch and the other two that follow fix the issue for me under > 2.6.28-rc3. >
Ok, so I managed to reproduce the case where I was getting insufficient head room with my patches. I think I fixed it now, and it would be very nice if Alex and Vasily could try my fixes. It's a serie of 8 patches, to be applied on top of any recent 2.8.28-rc kernel. They're here: http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc4/ along with a series file to apply them in the right order. If you guys have time to test those patches, please let me know how it works for you. Cheers, Samuel.
Patch : http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc4/
Works for me with latest patchset applied on the top of gentoo's 2.6.27-r2 kernel
The most recent patches from Samuel Ortiz do fix the problem. Tested with ks959-sir and kingsun-sir on a 2.6.27-rc4 kernel.
Right before submitting this patch set to Dave Miller, I realized that having skb->head point to irda_skb_cb is in theory a bad idea. For dev_alloc_skb() created skbs, packet schedulers could in theory overwrite skb->head. However, we reserve enough head room for both the packet schedulers and our data to fit. Thus, our irda_skb_cb should live right before skb->data, instead of at the beginning of the skb's headroom. The changes are minor, and in fact, only affect pacthes #1 and #8. It works for me (TM), but I'd appreciate to have a confirmation from either Alex ot Vasily. If that's the case, I'll submit this patchset upstream. Thanks for your help.
(In reply to comment #26) > Right before submitting this patch set to Dave Miller, I realized that having > skb->head point to irda_skb_cb is in theory a bad idea. > For dev_alloc_skb() created skbs, packet schedulers could in theory overwrite > skb->head. However, we reserve enough head room for both the packet > schedulers > and our data to fit. Thus, our irda_skb_cb should live right before > skb->data, > instead of at the beginning of the skb's headroom. > > The changes are minor, and in fact, only affect pacthes #1 and #8. It works > for > me (TM), but I'd appreciate to have a confirmation from either Alex ot > Vasily. > If that's the case, I'll submit this patchset upstream. > Thanks for your help. > Could you please indicate where can I get updated patches? As of 2008/11/24 11:41 GMT -5 the patches at the last URL are the exact same ones as previously tested.
> Could you please indicate where can I get updated patches? As of 2008/11/24 > 11:41 GMT -5 the patches at the last URL are the exact same ones as > previously > tested. Oops, forgot that, sorry. The patches are here: http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc6/ Thanks, Samuel.
(In reply to comment #28) > > Could you please indicate where can I get updated patches? As of 2008/11/24 > > 11:41 GMT -5 the patches at the last URL are the exact same ones as > previously > > tested. Tested with ks959-sir. I think I spoke too soon when saying that the patches fix the problem. They do fix the problem when using openobex to communicate with the phone. However, when using minicom with /dev/ircomm0, the WARN_ON again triggers. This happens both with the patches at http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc4/ and the ones at http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc6/ . I will attach the dmesg output next.
Created attachment 19065 [details] dmesg output after patches for rc4 and running minicom /dev/ircomm0
Created attachment 19066 [details] dmesg output after patches for rc6 and running minicom /dev/ircomm0
Last patchset doesn't work for me at all (applied on gentoo's gentoo-sources-2.6.27-r4)
> Tested with ks959-sir. I think I spoke too soon when saying that the patches > fix the problem. They do fix the problem when using openobex to communicate > with the phone. However, when using minicom with /dev/ircomm0, the WARN_ON > again triggers. This happens both with the patches at > http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc4/ > and the ones at > http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc6/ > . > I will attach the dmesg output next. Ok, with the patches here: http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc7/ I couldnt reproduce those warnings. Note that the patches should also apply against an -rc6. Would you mind giving them a try ? Cheers, Samuel.
(In reply to comment #33) > > Tested with ks959-sir. I think I spoke too soon when saying that the > patches > > fix the problem. They do fix the problem when using openobex to communicate > > with the phone. However, when using minicom with /dev/ircomm0, the WARN_ON > > again triggers. This happens both with the patches at > > > http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc4/ > > and the ones at > > > http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc6/ > . > > I will attach the dmesg output next. > Ok, with the patches here: > > http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc7/ > > I couldnt reproduce those warnings. Note that the patches should also apply > against an -rc6. > Would you mind giving them a try ? > > Cheers, > Samuel. > Tested with 2.6.28-rc7. There are no warnings anymore with obex_test from openobex package, as well as minicom /dev/ircomm0.
(In reply to comment #34) > (In reply to comment #33) > > > Tested with ks959-sir. I think I spoke too soon when saying that the > patches > > > fix the problem. They do fix the problem when using openobex to > communicate > > > with the phone. However, when using minicom with /dev/ircomm0, the > WARN_ON > > > again triggers. This happens both with the patches at > > > > http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc4/ > > > and the ones at > > > > http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc6/ > . > > > I will attach the dmesg output next. > > Ok, with the patches here: > > > > > http://www.kernel.org/pub/linux/kernel/people/sameo/patches/irda/2.6.28-rc7/ > > > > I couldnt reproduce those warnings. Note that the patches should also apply > > against an -rc6. > > Would you mind giving them a try ? > > > > Cheers, > > Samuel. > > > > Tested with 2.6.28-rc7. There are no warnings anymore with obex_test from > openobex package, as well as minicom /dev/ircomm0. Awesome, thanks a lot for testing. I'll submit these patches upstream tomorrow, you guys still have 24 hours to show the red light. Cheers, Samuel.
Just for the record, my latest patchset has not been accepted upstream. Instead, davem took a hckish fix in order to have a functional IrDA stack with 2.6.28: http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=69c30e1e7492192f882a3fc11888b320fde5206a