Most recent kernel where this bug did not occur: ------------------------------------------------ * 2.6.16.20 but also there when backporting ieee1394 from 2.6.18-rc7 to 2.6.16.20 Distribution: ------------------------------------------------ * Custom Debian Sarge Hardware Environment: ------------------------------------------------ * FirewireDevice: iRez StealthFire Firewire Camera * 0000:00:0c.0 FireWire (IEEE 1394): NEC Corporation uPD72874 IEEE1394 OHCI 1.1 3-port PHY-Link Ctrlr (rev 01) * CPU: i686 processor Software Environment: ------------------------------------------------ * motion (3.1.19) using vloopback to pipe video1394 to video4linux * vloopback kernel module 0.97-snap3 (nothing different than latest 1.0) * libdc1394 (1.0.0) to transform video1394 -> video4linux via vloopback. Using dc1394_vloopback application from the examples. Problem Description: ------------------------------------------------ Video camera works until getting a lot of "unsolicited response packet received - no tlabel match" messages in dmesg. Need to do a 'modprobe -r' to all ieee1394 device and then load them again for getting more frames. Steps to reproduce: ------------------------------------------------ * Attach video camera * Load firewire modules modprobe video1394 modprobe raw1394 * Load v4l pipes module modprobe vloopback pipes=16 dev_offset=30 * Start libdc1394 test application dc1394_vloopback --vloopback=/dev/video30 --video1394=/dev/video1394/0 * Watch something on /dev/video31 device (the output pipe of v4l) * Atfter 5-15 seconds you get the "no tlabel match" errors Extra Info: ------------------------------------------------ I've backported the ieee1394 directory from 2.6.18-rc7 do 2.6.16.20 and enabled debugging for ieee1394 and ohci1394 modules. I can see that the ohci1394 module sends two consecutive packets with different tlabel's (61 and 62 in the attachement that i send) but it doesn't wait for response from the 61 package, instead it expects the response from 62 package. See the attachment for more info.
Created attachment 9503 [details] Dmesg output when the error is first seen Dmesg output with debug enabled for ohci1394 and ieee1394 modules. Looks like "IntEvent: 00000080" is not treated correctly or something.
You wrote "Most recent kernel where this bug did not occur". --- I suppose you actually meant "kernel where this bug _did_ occur", didn't you? Side note: I am maintaining backports too. http://me.in-berlin.de/~s5r6/linux1394/updates/ You can get slightly newer stuff there but it certainly won't help you.
Yes, sorry, i read "Most recent kernel where this bug did occur"
Could you resend the log with "Packet sent to node..." lines too? BTW, the (asynchronous) transactions before it breaks are quadlet write requests (tcode 0) and write responses (tcode 2). The IntEvent 00000080 right before the breakage means "isochRx" = "Isochronous Receive DMA interrupt. Indicates that one or more isochronous receive contexts have generated an interrupt". The other IntEvents refer to asynchronous transmit and receive DMAs. The transactions which fail because of alleged tlabel mismatch are quadlet read requests (tcode 4) and read responses (tcode 6). Your conjecture that ieee1394 doesn't wait for a response to the request with tlabel 61 and instead expects the response to request with tlabel 62 is not necessarily true. These transactions are so-called split transactions and the tlabels are there to allow several split transactions to overlap. What happened here is that ieee1394 discarded the tlabel 61 for some reason before it got the response to the outstanding request with tlabel 61. The "Packet sent to node..." line after the request 61 was issued may perhaps contain a reason for this.
Hmm, actually the information in "Got reqTxComplete interrupt status=0x00008012" after the request 61 already says what I wanted to know. The request packet got through to the destination node and was accepted by it with ack_pending, i.e. the acknowledgement that it arrived and a response packet will come back later. Now the question is why ieee1394 threw the tlabel away before the response arrived. Could you attach the .config of this kernel?
Could you also recompile the kernel with "Kernel hacking"/ "Show timing information on printks", i.e. CONFIG_PRINTK_TIME=y? That way we get sub-second timestamps. Transactions time out after 100ms per default, and maybe the responding node really did respond too late. But before you do that, you could try the following trick. It is possible to stretch the transaction timeout at runtime. Write the value 1 as a quadlet to the local node to address 0xFFFFF0000018. This gives 1100ms timeout; likewise a 2 would give 2100ms timeout. You can do this with gscanbus or 1394commander. With gscanbus: Click on the icon of the local host and note the "Physical ID". Then use the menue item "Transaction"/ "Write Quadlet..." and fill in the physical ID as Destination ID, 0xFFFFF0000018 as Memory Offset, and 1 as Data, click OK. You can check whether it succeeded by reading from the same destination and offset with the quadlet read menue item. Also, if you read from 0xFFFFF000001c you should get 0x19000000 as result. This (result >> 19) * 125 is the subsecond part of the timeout, in microseconds. With 1394commander: Start it and you get into a command line interface. You can write to the local node with the command w . . 0xFFFFF0000018 4 1 which sends a quadlet (4 bytes) of the value 1 to the offset 0xFFFFF0000018 on the local node (.) at the local bus (.). You can read with r . . 0xFFFFF0000018 4
Does an increased transaction timeout, as described in comment #6, prevent the "unsolicited response"s?
Pleae reopen this bug if: - it is still present with kernel 2.6.20 and - you can provide the requested information.