Kernels which have the respective warning enabled in mdelay() [http://marc.theaimsgroup.com/?l=linux1394-devel&m=113963277029128] show a stack dump like this:

    Badness in ohci_hw_csr_reg at drivers/ieee1394/ohci1394.c:3154
     [<d09ad084>] ohci_hw_csr_reg+0x69/0x82 [ohci1394]
     [<d0a0ad33>] host_reset+0x64/0x1ea [ieee1394]
     [<d0a0a74b>] highlevel_host_reset+0x27/0x34 [ieee1394]
     [<d09ac0b5>] ohci_irq_handler+0x8b7/0x90d [ohci1394]
     [<c0107fe7>] handle_IRQ_event+0x25/0x4f
     [<c0108921>] do_IRQ+0x11c/0x242
     [<c0106434>] common_interrupt+0x18/0x20
     [<c0124a04>] __do_softirq+0x2c/0x79
     [<c010946b>] do_softirq+0x3a/0x41

The code leading to it is csr.c::host_reset, which uses {ohci_}hw_csr_reg to allocate channel 31 if the local node is IRM. Red Hat and/or Fedora Core kernels and recent -mm kernels feature this warning; however, the offending ohci1394 code is of course present in all kernels and always has been.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144201
http://marc.theaimsgroup.com/?t=113701363700001
http://marc.theaimsgroup.com/?t=113960304800002
Another code path which runs into "badness" is this block in ohci_irq_handler():

    if (ohci->check_busreset) {
        ...
        /* [...] This mainly effects [sic] nForce2. */
        if (loop_count > 10000) {
            ohci_devctl(host, RESET_BUS, LONG_RESET);
            ...
        }
        ...
    }
Another trace from bugzilla.redhat.com:

    Badness in get_phy_reg at drivers/ieee1394/ohci1394.c:238 (Not tainted)
     [<f89c110e>] get_phy_reg+0x10e/0x113 [ohci1394]
     [<f89c2272>] ohci_devctl+0x41f/0x5f7 [ohci1394]
     [<c01038ba>] common_interrupt+0x1a/0x20
     [<c011098e>] delay_pmtmr+0xb/0x13
     [<f89c3e08>] ohci_irq_handler+0x5e2/0x7a9 [ohci1394]
     [<c013cccd>] handle_IRQ_event+0x2e/0x5a
     [<c013cd77>] __do_IRQ+0x7e/0xd7
     [<c0104f3a>] do_IRQ+0x4a/0x82
     =======================
     [<c01038ba>] common_interrupt+0x1a/0x20
     [<c0316c3a>] _spin_unlock_irqrestore+0xa/0xc
     [<f89c225d>] ohci_devctl+0x40a/0x5f7 [ohci1394]
     [<f8a41a03>] csr1212_fill_cache+0xdf/0x106 [ieee1394]
     [<f8a41b50>] csr1212_generate_csr_image+0x126/0x249 [ieee1394]
     [<f8a38135>] hpsb_reset_bus+0x20/0x26 [ieee1394]
     [<c012beef>] worker_thread+0x182/0x22a
     [<f8a3a22c>] delayed_reset_bus+0x0/0xc0 [ieee1394]
     [<c0119a67>] default_wake_function+0x0/0xc
     [<c012bd6d>] worker_thread+0x0/0x22a
     [<c012f8eb>] kthread+0x87/0x8b
     [<c012f864>] kthread+0x0/0x8b
     [<c0101309>] kernel_thread_helper+0x5/0xb
Here are forward ports of the mdelay() warning patch mentioned in http://marc.theaimsgroup.com/?l=linux1394-devel&m=113963277029128 : http://me.in-berlin.de/~s5r6/linux1394/work-in-progress/debug-warn-if-we-sleep-in-an-irq-for-a-long-time/
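To illustrate the kind of check such a debugging patch performs, here is a minimal userspace model in C. It is only a sketch and not the code of the linked patch: sim_in_irq, sim_warnings, checked_mdelay(), poll_register() and sim_irq_handler() are all hypothetical names invented for this example, and the delay itself is not actually executed. The model shows why a polling helper that delays between attempts is harmless when called from process context but triggers the warning as soon as an interrupt handler calls it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only; every name here is hypothetical, not kernel API. */

static bool sim_in_irq;           /* stands in for "are we in IRQ context?" */
static unsigned int sim_warnings; /* number of warnings emitted so far */

/* Model of a busy-wait delay that complains when used in IRQ context. */
static void checked_mdelay(unsigned int msecs, const char *caller)
{
    assert(caller != NULL);
    if (sim_in_irq && msecs > 0) {
        sim_warnings++;
        fprintf(stderr, "warning: %s busy-waits %u ms in IRQ context\n",
                caller, msecs);
    }
    /* A real delay would spin here; the model returns immediately. */
}

/* Model of a register poll that delays 1 ms between attempts.
 * The "register" becomes ready on attempt number ready_after. */
static int poll_register(unsigned int ready_after, unsigned int max_tries)
{
    for (unsigned int i = 0; i < max_tries; i++) {
        if (i >= ready_after)
            return 0;           /* register is ready */
        checked_mdelay(1, "poll_register");
    }
    return -1;                  /* gave up */
}

/* Model of an interrupt handler that calls the polling helper directly. */
static int sim_irq_handler(unsigned int ready_after)
{
    sim_in_irq = true;
    int ret = poll_register(ready_after, 1000);
    sim_in_irq = false;
    return ret;
}
```

In this model, calling poll_register() directly never warns, while sim_irq_handler() produces one warning per delay that the poll has to perform. This mirrors the traces in this report, where the warning fires below ohci_irq_handler().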
I have not yet been able to reproduce this, neither on an x86 UP PC with 2 different FireWire cards nor on an x86 SMP PC with 3 other FireWire cards.
When I ran "modprobe ohci1394 && sleep 1 && modprobe -r ohci1394" in a loop, I got this trace _once_ during more than 100 executions of the loop body:

    Jan 8 21:24:24 shuttle kernel: BUG: at drivers/ieee1394/ohci1394.c:233 get_phy_reg()
    Jan 8 21:24:24 shuttle kernel: [show_trace_log_lvl+26/48] show_trace_log_lvl+0x1a/0x30
    Jan 8 21:24:24 shuttle kernel: [<c010314a>] show_trace_log_lvl+0x1a/0x30
    Jan 8 21:24:24 shuttle kernel: [show_trace+18/32] show_trace+0x12/0x20
    Jan 8 21:24:24 shuttle kernel: [<c0103172>] show_trace+0x12/0x20
    Jan 8 21:24:24 shuttle kernel: [dump_stack+22/32] dump_stack+0x16/0x20
    Jan 8 21:24:24 shuttle kernel: [<c0103286>] dump_stack+0x16/0x20
    Jan 8 21:24:24 shuttle kernel: [pg0+946286745/1067766784] get_phy_reg+0x99/0x130 [ohci1394]
    Jan 8 21:24:24 shuttle kernel: [<f8c24099>] get_phy_reg+0x99/0x130 [ohci1394]
    Jan 8 21:24:24 shuttle kernel: [pg0+946287215/1067766784] set_phy_reg_mask+0x1f/0x40 [ohci1394]
    Jan 8 21:24:24 shuttle kernel: [<f8c2426f>] set_phy_reg_mask+0x1f/0x40 [ohci1394]
    Jan 8 21:24:24 shuttle kernel: [pg0+946287441/1067766784] handle_selfid+0xc1/0x160 [ohci1394]
    Jan 8 21:24:24 shuttle kernel: [<f8c24351>] handle_selfid+0xc1/0x160 [ohci1394]
    Jan 8 21:24:24 shuttle kernel: [pg0+946298804/1067766784] ohci_irq_handler+0x564/0x770 [ohci1394]
    Jan 8 21:24:24 shuttle kernel: [<f8c26fb4>] ohci_irq_handler+0x564/0x770 [ohci1394]

This is of course not a helpful way of recreating the circumstances of this bug.
This bug is a candidate for WILL_NOT_FIX, since the alternative drivers from Kristian Høgsberg do it right and might eventually replace the ieee1394 drivers in mainline. (There are still some fundamental problems to be solved in the alternative stack. Once these have been addressed, this bug could be rejected as WILL_NOT_FIX.)