Bug 10306

Summary: eth1394 stops, keyboard hangs
Product: Drivers
Reporter: Stefan Richter (stefanr)
Component: IEEE1394
Assignee: Stefan Richter (stefanr)
Status: REJECTED WILL_NOT_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.22 - 2.6.25-rc
Tree: Mainline
Regression: Yes
Bug Depends on:
Bug Blocks: 10046

Description Stefan Richter 2008-03-22 03:46:15 UTC
Latest working kernel version: 2.6.21 (had other eth1394 bugs)
Earliest failing kernel version: 2.6.22
Software Environment: uniprocessor PREEMPT_NONE kernel

  - tlabel consumer eth1394 (IPv4 over FireWire) grabs lots of tlabels
    in soft IRQ context.
  - tlabel recycler khpsbpkt (a kthread of ieee1394) sleeps even though
    it could start putting tlabels back into the pool.
  - eth1394 can't get tlabels anymore, stops the transmit queue,
    schedules a workqueue job.
  - eth1394's workqueue job (run by the events kthread) tries to acquire
    a tlabel.  It does so in non-atomic context and hence sleeps in
    hpsb_get_tlabel() until the tlabel pool is nonempty again.  It would
    then wake up the eth1394 transmit queue again.
  - Normally, khpsbpkt would have been woken up by now and would have
    released a lot of now unused tlabels back into the pool again.
    However, on UP PREEMPT_NONE kernels, khpsbpkt continues to sleep.
    (khpsbpkt is woken up by the 1394 stack's lower level, running in
    IRQ context or perhaps in tasklet context.)
  - Since it doesn't get a tlabel, eth1394's workqueue job sleeps
    forever as well.

The result is that no other work items on the shared workqueue get serviced (notably, the keyboard hangs) and that the eth1394 connection breaks down.

CONFIG_PREEMPT=y avoids the problem.
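The dependency chain above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not ieee1394 code. The model has three parts:

- A single shared worker runs queued jobs strictly in FIFO order.
- An eth1394-like job needs one tlabel.
- A keyboard-like job, queued behind it, needs none.

The `recycler_runs` flag simply stipulates whether the khpsbpkt-like recycler gets to run. It does not model *why* khpsbpkt stays asleep on UP PREEMPT_NONE. All names (`Job`, `run_shared_workqueue`, the job names) are hypothetical. `NUM_TLABELS = 64` reflects the 6-bit transaction label field of IEEE 1394. The model ignores how the ieee1394 stack actually organizes its tlabel pool.

```python
"""Toy model of the tlabel starvation in this report (hypothetical; not ieee1394 code)."""
from collections import deque
from dataclasses import dataclass

NUM_TLABELS = 64  # IEEE 1394 transaction labels are 6 bits wide


@dataclass
class Job:
    name: str
    needs_tlabel: bool  # True for the eth1394-like "wake the transmit queue" job


def run_shared_workqueue(recycler_runs: bool, max_ticks: int = 10) -> list[str]:
    """Return the names of the jobs the single shared worker completed.

    recycler_runs is an assumption of the model: whether the khpsbpkt-like
    recycler ever gets CPU time to return finished tlabels to the pool.
    """
    free = 0                  # the eth1394-like consumer already grabbed every tlabel
    unrecycled = NUM_TLABELS  # those transactions finished, but labels are not yet returned
    queue = deque([Job("eth1394_wake_queue", True), Job("keyboard", False)])
    done: list[str] = []

    for _ in range(max_ticks):
        if recycler_runs and unrecycled:
            # Recycler puts the finished tlabels back into the pool.
            free, unrecycled = free + unrecycled, 0
        # The shared worker runs queued jobs strictly in FIFO order.
        while queue:
            job = queue[0]
            if job.needs_tlabel:
                if free == 0:
                    # Sleeps waiting for a tlabel; every job queued behind it is stuck too.
                    break
                free -= 1
            done.append(queue.popleft().name)
    return done


print(run_shared_workqueue(recycler_runs=True))
print(run_shared_workqueue(recycler_runs=False))
```

The two runs behave differently:

- With the recycler running, both jobs complete.
- Without it, the tlabel-waiting job stays at the head of the queue, and the keyboard job behind it never runs. This is the head-of-line blocking on the shared workqueue described above.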

Reported by andrey.aleksandrovich at googlemail
http://thread.gmane.org/gmane.linux.kernel.firewire.user/3144
"eth1394, Connection between PC and Laptop disrupts"

Caused by:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7a97bc03e089d1a75dc533f0fe69ec8dac672916
"ieee1394: eth1394: handle tlabel exhaustion"
Comment 1 Stefan Richter 2008-03-24 14:41:42 UTC
Tested with ftp between a Core2Duo i686 SMP PREEMPT machine and a Core2Duo x86-64 *UP* PREEMPT_NONE machine; I wasn't able to reproduce the bug this way.  I need to try a different test PC, and maybe scp instead of ftp.
Comment 2 Stefan Richter 2008-12-13 03:13:51 UTC
This should eventually be fixed by providing an IP over 1394 implementation in the new firewire driver stack.