(got the report via personal e-mail)

Hardware Environment: 2.0 GHz dual core PC, 1394b hardware
Software Environment: eth1394 between two Linux PCs, tested with netperf -n 2 -H ... (http://www.netperf.org/)
Problem Description:
Apr 25 10:51:24 localhost kernel: 00:1023
Apr 25 10:51:24 localhost kernel: eth1394: No more tlabels left while sending to node 0-0:1023

The reason is that eth1394 attempts to transmit more than 64 packets before the split transaction of the first packet has completed. AFAICT this may result in dropped IP packets.
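To illustrate the limit of 64 mentioned above: the transaction label is a 6-bit field, so at most 64 split transactions to one node can be outstanding at a time. The following is only a userspace sketch of such a pool, not the ieee1394 code; `struct tlabel_pool`, `tlabel_get()`, `tlabel_put()` and `count_failed_sends()` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TLABEL_COUNT 64	/* the tlabel field is 6 bits wide */

/* Hypothetical model of a per-node pool: bit i set means label i is in use. */
struct tlabel_pool {
	uint64_t in_use;
};

/* Returns a free label in 0..63, or -1 if the pool is exhausted. */
static int tlabel_get(struct tlabel_pool *p)
{
	for (int i = 0; i < TLABEL_COUNT; i++) {
		uint64_t bit = (uint64_t)1 << i;

		if (!(p->in_use & bit)) {
			p->in_use |= bit;
			return i;
		}
	}
	return -1;
}

/* Gives a label back to the pool once its transaction is finished. */
static void tlabel_put(struct tlabel_pool *p, int label)
{
	assert(label >= 0 && label < TLABEL_COUNT);
	p->in_use &= ~((uint64_t)1 << label);
}

/* Starts n transactions without finishing any; returns how many got no label. */
static int count_failed_sends(struct tlabel_pool *p, int n)
{
	int failed = 0;

	for (int i = 0; i < n; i++)
		if (tlabel_get(p) < 0)
			failed++;
	return failed;
}
```

With an empty pool, `count_failed_sends(&pool, 64)` reports no failure, while every further `tlabel_get()` fails until some label is returned with `tlabel_put()`. That failure is the situation in which eth1394 prints the "No more tlabels left" message.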
The root cause is most certainly the same as in bug 6948: Tlabels are consumed from softIRQ context but freed from kthread context (khpsbpkt). Enough transactions may already have completed, but the kthread is not woken up early enough to free their tlabels. The fix in bug 6948 is more of a workaround: It blocks further SCSI requests and initiates the last outstanding request (the first request which failed due to tlabel pool exhaustion) from workqueue context, which sleeps until a tlabel is available, i.e. until khpsbpkt has been woken up. Therefore the ultimate fix for both bugs would apparently be to convert khpsbpkt into tasklets.
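The mismatch between the two contexts can be shown with a small userspace model. This is a sketch under assumptions, not the actual ieee1394 data structures: `struct node_model` and the `model_*()` functions are hypothetical, and "worker" stands in for khpsbpkt. Completion only marks a label as finished; the label becomes usable again only when the worker runs.

```c
#include <assert.h>
#include <stdint.h>

#define TLABEL_COUNT 64

/* Hypothetical model, one bit per tlabel. */
struct node_model {
	uint64_t in_use;	/* handed out and not yet recycled */
	uint64_t completed;	/* transaction finished, waiting for the worker */
};

/* Sender side (softIRQ context in the driver): take a label or fail with -1. */
static int model_send(struct node_model *m)
{
	for (int i = 0; i < TLABEL_COUNT; i++) {
		uint64_t bit = (uint64_t)1 << i;

		if (!(m->in_use & bit)) {
			m->in_use |= bit;
			return i;
		}
	}
	return -1;
}

/* Transaction completed: the label is only marked, not freed yet. */
static void model_complete(struct node_model *m, int label)
{
	assert(label >= 0 && label < TLABEL_COUNT);
	m->completed |= (uint64_t)1 << label;
}

/* Worker side (stand-in for khpsbpkt): recycle all completed labels. */
static int model_worker_run(struct node_model *m)
{
	int recycled = 0;

	for (int i = 0; i < TLABEL_COUNT; i++)
		if ((m->completed >> i) & 1)
			recycled++;
	m->in_use &= ~m->completed;
	m->completed = 0;
	return recycled;
}
```

If 64 sends are followed by 64 completions but `model_worker_run()` has not run yet, the next `model_send()` still returns -1 although no transaction is outstanding anymore. After one worker run, all 64 labels are available again.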
Can also be reproduced using FTP. Tested: A 1.6 GHz single core Athlon running proftpd serves a file larger than 500 MB to a 2.0 GHz Core 2 Duo running KDE's ftp component. The file is received fully intact; speed was ~12 MByte/s over S400A. No errors were logged on the client, but eth1394 logged tlabel exhaustion ~250 times on the server.
Created attachment 11397
[PATCH 1/2] ieee1394: eth1394: remove bogus netif_wake_queue precondition

for a following fix
Created attachment 11398
[PATCH 2/2] ieee1394: eth1394: handle tlabel exhaustion

The patch halts the transmit queue if no tlabel is available to eth1394's ->hard_start_xmit(). A workqueue job is then scheduled to catch the moment when ieee1394 has recycled the next lot of tlabels.

(Before this, I also tried an ieee1394 API addition which lets ieee1394 recycle tlabels already in the context of the low-level driver's receive tasklet. This lowered the frequency of tlabel exhaustion somewhat but didn't address eth1394's inability to throttle the outgoing queue when necessary. It also made the transaction API even more complicated than it already is.)
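The throttling idea of patch 2/2 can be sketched as a userspace model. This is not the patch itself: `struct dev_model` and the `model_*()` functions are hypothetical, the flag `queue_stopped` stands in for what netif_stop_queue()/netif_wake_queue() do in a real driver, and `model_wake_job()` stands in for the workqueue job (which in the driver sleeps until tlabels are available instead of being polled).

```c
#include <assert.h>
#include <stdbool.h>

enum xmit_result { XMIT_OK, XMIT_BUSY };

/* Hypothetical model of the device state. */
struct dev_model {
	int free_tlabels;
	bool queue_stopped;	/* the stack must not call xmit while set */
	bool wake_job_pending;
	int sent;
};

/* Stand-in for ->hard_start_xmit(): halt the queue when out of tlabels. */
static enum xmit_result model_xmit(struct dev_model *d)
{
	assert(!d->queue_stopped);
	if (d->free_tlabels == 0) {
		d->queue_stopped = true;
		d->wake_job_pending = true;	/* "schedule" the wake job */
		return XMIT_BUSY;		/* the packet stays queued */
	}
	d->free_tlabels--;
	d->sent++;
	return XMIT_OK;
}

/* Stand-in for ieee1394 recycling a lot of n tlabels. */
static void model_recycle(struct dev_model *d, int n)
{
	d->free_tlabels += n;
}

/* Stand-in for the workqueue job: restart the queue once tlabels exist. */
static void model_wake_job(struct dev_model *d)
{
	if (!d->wake_job_pending || d->free_tlabels == 0)
		return;
	d->wake_job_pending = false;
	d->queue_stopped = false;
}

/* Pushes n packets through the model; returns how often the queue was halted. */
static int model_send_all(struct dev_model *d, int n, int recycle_batch)
{
	int halts = 0;

	assert(recycle_batch > 0);	/* otherwise the loop cannot make progress */
	while (n > 0) {
		if (d->queue_stopped) {
			model_recycle(d, recycle_batch);
			model_wake_job(d);
		} else if (model_xmit(d) == XMIT_OK) {
			n--;
		} else {
			halts++;
		}
	}
	return halts;
}
```

Starting with 64 free tlabels and recycling 64 at a time, sending 200 packets through `model_send_all()` halts the queue three times and still delivers all 200 packets. The point of the design is that a packet which finds no tlabel is held back and retried later rather than dropped.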
Created attachment 11405
[PATCH 2.6.18] ieee1394: eth1394: handle tlabel exhaustion

patch 1/2 + 2/2 combined and backported to Linux 2.6.18, only compile-tested
Created attachment 11772
ieee1394: fix to ether1394_tx in ether1394.c

additional patch, currently pending for mainline merge

This is necessary in addition to "ieee1394: eth1394: handle tlabel exhaustion" in order to get the re-queued packets after tlabel recovery into proper shape.
fix merged in 2.6.22-rcSomething