Bug 7521

Summary: unsolicited response packet received - no tlabel match
Product: Drivers Reporter: Razvan Gavril (razvan.g)
Component: IEEE1394Assignee: Stefan Richter (stefanr)
Status: REJECTED INSUFFICIENT_DATA    
Severity: normal CC: bunk
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.16.20 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Dmesg output when the error is first seen

Description Razvan Gavril 2006-11-14 06:12:33 UTC
Most recent kernel where this bug did not occur:
------------------------------------------------
* 2.6.16.20 but also there when backporting ieee1394 from 2.6.18-rc7 to 2.6.16.20


Distribution:
------------------------------------------------
* Custom Debian Sarge


Hardware Environment:
------------------------------------------------
* FirewireDevice: iRez StealthFire Firewire Camera
* 0000:00:0c.0 FireWire (IEEE 1394): NEC Corporation uPD72874 IEEE1394 OHCI 1.1
3-port PHY-Link Ctrlr (rev 01)
* CPU: i686 processor 


Software Environment:
------------------------------------------------
* motion (3.1.19) using vloopback to pipe video1394 to video4linux
* vloopback kernel module 0.97-snap3 (nothing different than latest 1.0)
* libdc1394 (1.0.0) to transform video1394 -> video4linux via vloopback. Using
dc1394_vloopback application from the examples.


Problem Description:
------------------------------------------------
Video camera works until getting a lot of "unsolicited response packet received
- no tlabel match" messages in dmesg. Need to do a 'modprobe -r' to all ieee1394
device and then load them again for getting more frames.


Steps to reproduce:
------------------------------------------------
* Attach video camera
* Load firewire modules
modprobe video1394
modprobe raw1394
* Load v4l pipes module
modprobe vloopback pipes=16 dev_offset=30
* Start libdc1394 test application
dc1394_vloopback --vloopback=/dev/video30 --video1394=/dev/video1394/0
* Watch something on /dev/video31 device (the output pipe of v4l)
* Atfter 5-15 seconds you get the "no tlabel match" errors


Extra Info:
------------------------------------------------
I've backported the ieee1394 directory from 2.6.18-rc7 do 2.6.16.20 and enabled
debugging for ieee1394 and ohci1394 modules. I can see that the ohci1394 module
sends two consecutive packets with different tlabel's (61 and 62 in the
attachement that i send) but it doesn't wait for response from the 61 package,
instead it expects the response from 62 package. See the attachment for more info.
Comment 1 Razvan Gavril 2006-11-14 06:15:07 UTC
Created attachment 9503 [details]
Dmesg output when the error is first seen

Dmesg output with debug enabled for ohci1394 and ieee1394 modules. Looks like
"IntEvent: 00000080" is not treated correctly or something.
Comment 2 Stefan Richter 2006-11-14 07:35:41 UTC
You wrote "Most recent kernel where this bug did not occur". --- I suppose you
actually meant "kernel where this bug _did_ occur", didn't you?

Side note: I am maintaining backports too.
http://me.in-berlin.de/~s5r6/linux1394/updates/
You can get slightly newer stuff there but it certainly won't help you.
Comment 3 Razvan Gavril 2006-11-14 08:07:38 UTC
Yes, sorry, i read "Most recent kernel where this bug did occur"
Comment 4 Stefan Richter 2006-11-14 09:48:58 UTC
Could you resend the log with "Packet sent to node..." lines too?

BTW, the (asynchronous) transactions before it breaks are quadlet write requests
(tcode 0) and write responses (tcode 2).

The IntEvent 00000080 right before the breakage means "isochRx" = "Isochronous
Receive DMA interrupt. Indicates that one or more isochronous
receive contexts have generated an interrupt". The other IntEvents refer to
asynchronous transmit and receive DMAs.

The transactions which fail because of alleged tlabel mismatch are quadlet read
requests (tcode 4) and read responses (tcode 6).

Your conjecture that ieee1394 doesn't wait for a response to the request with
tlabel 61 and instead expects the response to request with tlabel 62 is not
necessarily true. These transactions are so-called split transactions and the
tlabels are there to allow several split transactions to overlap. What happened
here is that ieee1394 discarded the tlabel 61 for some reason before it got the
response to the outstanding request with tlabel 61. The "Packet sent to node..."
line after the request 61 was issued may perhaps contain a reason for this.
Comment 5 Stefan Richter 2006-11-14 09:59:09 UTC
Hmm, actually the information in "Got reqTxComplete interrupt status=0x00008012"
after the request 61 already says what I wanted to know. The request packet got
through to the destination node and was accepted by it with ack_pending, i.e.
the acknowledgement that it arrived and a response packet will come back later.

Now the question is why ieee1394 threw the tlabel away before the response
arrived. Could you attach the .config of this kernel?
Comment 6 Stefan Richter 2006-11-14 10:25:34 UTC
Could you also recompile the kernel with "Kernel hacking"/ "Show timing
information on printks", i.e. CONFIG_PRINTK_TIME=y? That way we get sub-second
timestamps. Transactions time out after 100ms per default, and maybe the
responding node really did respond too late.

But before you do that, you could try the following trick. It is possible to
stretch the transaction timeout at runtime. Write the value 1 as a quadlet to
the local node to address 0xFFFFF0000018. This gives 1100ms timeout; likewise a
2 would give 2100ms timeout. You can do this with gscanbus or 1394commander.

With gscanbus: Click on the icon of the local host and note the "Physical ID".
Then use the menue item "Transaction"/ "Write Quadlet..." and fill in the
physical ID as Destination ID, 0xFFFFF0000018 as Memory Offset, and 1 as Data,
click OK. You can check whether it succeeded by reading from the same
destination and offset with the quadlet read menue item. Also, if you read from
0xFFFFF000001c you should get 0x19000000 as result. This (result >> 19) * 125 is
the subsecond part of the timeout, in microseconds.

With 1394commander: Start it and you get into a command line interface. You can
write to the local node with the command
w . . 0xFFFFF0000018 4 1
which sends a quadlet (4 bytes) of the value 1 to the offset 0xFFFFF0000018 on
the local node (.) at the local bus (.). You can read with
r . . 0xFFFFF0000018 4
Comment 7 Stefan Richter 2007-02-02 07:02:53 UTC
Does an increased transaction timeout, as described in comment #6, prevent the
"unsolicited response"s?
Comment 8 Adrian Bunk 2007-03-10 01:04:21 UTC
Pleae reopen this bug if:
- it is still present with kernel 2.6.20 and
- you can provide the requested information.