Bug 10306 - eth1394 stops, keyboard hangs
Summary: eth1394 stops, keyboard hangs
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: All Linux
Importance: P1 normal
Assignee: Stefan Richter
URL:
Keywords:
Depends on:
Blocks: 10046
Reported: 2008-03-22 03:46 UTC by Stefan Richter
Modified: 2008-12-13 03:13 UTC

See Also:
Kernel Version: 2.6.22 - 2.6.25-rc
Subsystem:
Regression: Yes
Bisected commit-id:


Description Stefan Richter 2008-03-22 03:46:15 UTC
Latest working kernel version: 2.6.21 (had other eth1394 bugs)
Earliest failing kernel version: 2.6.22
Software Environment: uniprocessor PREEMPT_NONE kernel

  - tlabel consumer eth1394 (IPv4 over FireWire) grabs lots of tlabels
    in soft IRQ context.
  - tlabel recycler khpsbpkt (a kthread of ieee1394) sleeps even though
    it could start putting tlabels back into the pool.
  - eth1394 can't get tlabels anymore, stops the transmit queue,
    schedules a workqueue job.
  - eth1394's workqueue job (run by the events kthread) tries to acquire
    a tlabel.  It does so in non-atomic context and hence sleeps in
    hpsb_get_tlabel() until the tlabel pool is nonempty again.  It would
    then wake up the eth1394 transmit queue again.
  - Normally, khpsbpkt would have been woken up by now and would have
    released a lot of now unused tlabels back into the pool again.
    However, on UP preempt_none kernels, khpsbpkt continues to sleep.
    (The 1394 stack's lower level, running in IRQ context or perhaps
    tasklet context, wakes up khpsbpkt.)
  - Since it doesn't get a tlabel, eth1394's workqueue job sleeps
    forever as well.
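
The sequence above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the ieee1394 code: the function `simulate`, the flag `recycler_runs`, and the two queues are hypothetical names, every transaction is assumed to complete immediately, and the recycler either runs on every tick or never. The pool size of 64 follows from the 6-bit tlabel field of IEEE 1394.

```python
from collections import deque

# IEEE 1394 transaction labels are 6 bits wide, so a pool holds 64 of them.
NUM_TLABELS = 64


def simulate(recycler_runs, ticks=1000):
    """Return the tick at which the waiting job gets a tlabel, or None.

    recycler_runs is a hypothetical flag standing in for "khpsbpkt gets
    to run"; it is an assumption of this sketch, not a kernel setting.
    """
    pool = deque(range(NUM_TLABELS))  # free tlabels
    completed = deque()               # used tlabels waiting to be recycled

    # Consumer (eth1394 transmit path): grab every tlabel in one burst.
    # Assumption: each transaction completes at once, so its tlabel is
    # immediately ready for the recycler.
    while pool:
        completed.append(pool.popleft())

    # The workqueue job now waits for the pool to become nonempty.
    for tick in range(ticks):
        if recycler_runs and completed:
            # Recycler (khpsbpkt) puts one tlabel back into the pool.
            pool.append(completed.popleft())
        if pool:
            pool.popleft()  # the waiting job finally gets its tlabel
            return tick
    return None  # the job never got a tlabel within the simulated time


if __name__ == "__main__":
    print("recycler runs:  ", simulate(recycler_runs=True))
    print("recycler sleeps:", simulate(recycler_runs=False))
```

With the recycler running, the waiting job is served on the first tick. With the recycler asleep, the job never obtains a tlabel, which corresponds to the stuck shared workqueue described here.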

The result is that no other work items of the shared workqueue can be serviced (notably, the keyboard is stuck) and that the eth1394 connection breaks down.

CONFIG_PREEMPT=y avoids the problem.

Reported by andrey.aleksandrovich at googlemail
http://thread.gmane.org/gmane.linux.kernel.firewire.user/3144
"eth1394, Connection between PC and Laptop disrupts"

Caused by:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7a97bc03e089d1a75dc533f0fe69ec8dac672916
"ieee1394: eth1394: handle tlabel exhaustion"
Comment 1 Stefan Richter 2008-03-24 14:41:42 UTC
Tested with ftp between a Core2Duo i686 SMP PREEMPT machine and a Core2Duo x86-64 *UP* PREEMPT_NONE machine; I wasn't able to reproduce the bug this way.  I need to try a different test PC, and maybe scp instead of ftp.
Comment 2 Stefan Richter 2008-12-13 03:13:51 UTC
This should be fixed eventually by providing an IP over 1394 implementation in the new firewire driver stack.
