Bug 6070

Summary: Badness in ohci_hw_csr_reg at drivers/ieee1394/ohci1394.c
Product: Drivers
Reporter: Stefan Richter (stefanr)
Component: IEEE1394
Assignee: Stefan Richter (stefanr)
Status: REJECTED WILL_NOT_FIX    
Severity: normal    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: all
Subsystem:
Regression: ---
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 10046    

Description Stefan Richter 2006-02-14 09:20:27 UTC
Kernels that have the corresponding warning enabled in mdelay()
[http://marc.theaimsgroup.com/?l=linux1394-devel&m=113963277029128] show a
stack dump like this:

Badness in ohci_hw_csr_reg at drivers/ieee1394/ohci1394.c:3154
 [<d09ad084>] ohci_hw_csr_reg+0x69/0x82 [ohci1394]
 [<d0a0ad33>] host_reset+0x64/0x1ea [ieee1394]
 [<d0a0a74b>] highlevel_host_reset+0x27/0x34 [ieee1394]
 [<d09ac0b5>] ohci_irq_handler+0x8b7/0x90d [ohci1394]
 [<c0107fe7>] handle_IRQ_event+0x25/0x4f
 [<c0108921>] do_IRQ+0x11c/0x242
 [<c0106434>] common_interrupt+0x18/0x20
 [<c0124a04>] __do_softirq+0x2c/0x79
 [<c010946b>] do_softirq+0x3a/0x41

The code leading to it is csr.c::host_reset using {ohci_}hw_csr_reg to allocate
channel 31 if the local node is IRM.
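
As a rough illustration of that call chain, here is a minimal userspace sketch. It is not the driver's code: every name in it (model_in_irq, model_mdelay, model_hw_csr_reg, model_host_reset, model_irq_handler) is hypothetical, and the idea that the register access delays once per unsuccessful poll is an assumption made for the model. It only shows why a delay reached from the interrupt handler trips the warning, while the same call from process context does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* All names below are hypothetical; this is a userspace model, not driver code. */

bool model_in_irq = false;    /* stands in for the kernel's interrupt-context state */
int model_badness_count = 0;  /* number of times the modelled warning fired */

/* Models an mdelay() that warns when it is called from interrupt context. */
void model_mdelay(unsigned int msecs, const char *caller)
{
        (void)msecs; /* the model does not actually wait */
        if (model_in_irq) {
                model_badness_count++;
                fprintf(stderr, "Badness in %s (delay in interrupt context)\n", caller);
        }
}

/*
 * Models a register access that polls for completion, delaying between polls.
 * polls_until_done is an assumed parameter: how many polls fail before the
 * simulated hardware reports completion. Returns the number of delays issued.
 */
int model_hw_csr_reg(int polls_until_done)
{
        int delays = 0;

        assert(polls_until_done >= 0);
        while (delays < polls_until_done) {
                model_mdelay(1, "model_hw_csr_reg");
                delays++;
        }
        return delays;
}

/* Models the reset hook: only an IRM node touches the register. */
void model_host_reset(bool local_node_is_irm, int polls_until_done)
{
        if (local_node_is_irm)
                model_hw_csr_reg(polls_until_done);
}

/* Models the interrupt handler invoking the reset hook in IRQ context. */
void model_irq_handler(bool local_node_is_irm, int polls_until_done)
{
        model_in_irq = true;
        model_host_reset(local_node_is_irm, polls_until_done);
        model_in_irq = false;
}
```

In this model, calling model_irq_handler(true, 2) raises the warning once per delay, whereas model_host_reset(true, 2) called directly (process context) or model_irq_handler(false, 2) (local node is not IRM) stays silent.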

Red Hat and/or Fedora Core kernels and recent -mm kernels feature this warning;
however, the offending ohci1394 code is of course present in all kernels and always has been.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144201
http://marc.theaimsgroup.com/?t=113701363700001
http://marc.theaimsgroup.com/?t=113960304800002
Comment 1 Stefan Richter 2006-06-10 00:54:30 UTC
Another code path which runs into "badness" is this block in ohci_irq_handler():

if (ohci->check_busreset) {
        ...
        /* [...] This mainly effects [sic] nForce2. */
        if (loop_count > 10000) {
                ohci_devctl(host, RESET_BUS, LONG_RESET);
                ...
        }
        ...
}
Comment 2 Stefan Richter 2006-06-20 05:15:46 UTC
Another trace, from bugzilla.redhat.com:

Badness in get_phy_reg at drivers/ieee1394/ohci1394.c:238 (Not tainted)
 [<f89c110e>] get_phy_reg+0x10e/0x113 [ohci1394]
 [<f89c2272>] ohci_devctl+0x41f/0x5f7 [ohci1394]
 [<c01038ba>] common_interrupt+0x1a/0x20
 [<c011098e>] delay_pmtmr+0xb/0x13
 [<f89c3e08>] ohci_irq_handler+0x5e2/0x7a9 [ohci1394]
 [<c013cccd>] handle_IRQ_event+0x2e/0x5a
 [<c013cd77>] __do_IRQ+0x7e/0xd7
 [<c0104f3a>] do_IRQ+0x4a/0x82
 =======================
 [<c01038ba>] common_interrupt+0x1a/0x20
 [<c0316c3a>] _spin_unlock_irqrestore+0xa/0xc
 [<f89c225d>] ohci_devctl+0x40a/0x5f7 [ohci1394]
 [<f8a41a03>] csr1212_fill_cache+0xdf/0x106 [ieee1394]
 [<f8a41b50>] csr1212_generate_csr_image+0x126/0x249 [ieee1394]
 [<f8a38135>] hpsb_reset_bus+0x20/0x26 [ieee1394]
 [<c012beef>] worker_thread+0x182/0x22a
 [<f8a3a22c>] delayed_reset_bus+0x0/0xc0 [ieee1394]
 [<c0119a67>] default_wake_function+0x0/0xc
 [<c012bd6d>] worker_thread+0x0/0x22a
 [<c012f8eb>] kthread+0x87/0x8b
 [<c012f864>] kthread+0x0/0x8b
 [<c0101309>] kernel_thread_helper+0x5/0xb
Comment 4 Stefan Richter 2007-01-06 08:19:50 UTC
I have not yet been able to reproduce this, either on an x86 UP PC with 2
different FireWire cards or on an x86 SMP PC with 3 other FireWire cards.
Comment 5 Stefan Richter 2007-01-08 12:31:22 UTC
When I ran "modprobe ohci1394 && sleep 1 && modprobe -r ohci1394" in a loop, I
got this trace _once_ during more than 100 executions of the loop body:

Jan  8 21:24:24 shuttle kernel: BUG: at drivers/ieee1394/ohci1394.c:233 get_phy_reg()
Jan  8 21:24:24 shuttle kernel:  [show_trace_log_lvl+26/48] show_trace_log_lvl+0x1a/0x30
Jan  8 21:24:24 shuttle kernel:  [<c010314a>] show_trace_log_lvl+0x1a/0x30
Jan  8 21:24:24 shuttle kernel:  [show_trace+18/32] show_trace+0x12/0x20
Jan  8 21:24:24 shuttle kernel:  [<c0103172>] show_trace+0x12/0x20
Jan  8 21:24:24 shuttle kernel:  [dump_stack+22/32] dump_stack+0x16/0x20
Jan  8 21:24:24 shuttle kernel:  [<c0103286>] dump_stack+0x16/0x20
Jan  8 21:24:24 shuttle kernel:  [pg0+946286745/1067766784] get_phy_reg+0x99/0x130 [ohci1394]
Jan  8 21:24:24 shuttle kernel:  [<f8c24099>] get_phy_reg+0x99/0x130 [ohci1394]
Jan  8 21:24:24 shuttle kernel:  [pg0+946287215/1067766784] set_phy_reg_mask+0x1f/0x40 [ohci1394]
Jan  8 21:24:24 shuttle kernel:  [<f8c2426f>] set_phy_reg_mask+0x1f/0x40 [ohci1394]
Jan  8 21:24:24 shuttle kernel:  [pg0+946287441/1067766784] handle_selfid+0xc1/0x160 [ohci1394]
Jan  8 21:24:24 shuttle kernel:  [<f8c24351>] handle_selfid+0xc1/0x160 [ohci1394]
Jan  8 21:24:24 shuttle kernel:  [pg0+946298804/1067766784] ohci_irq_handler+0x564/0x770 [ohci1394]
Jan  8 21:24:24 shuttle kernel:  [<f8c26fb4>] ohci_irq_handler+0x564/0x770 [ohci1394]

This is of course not a helpful way of recreating the circumstances of this bug.
Comment 6 Stefan Richter 2007-11-08 05:33:47 UTC
This bug is a candidate for WILL_NOT_FIX, since the alternative drivers from
Kristian Høgsberg do it right and might replace the ieee1394 drivers in mainline
eventually.

(There are still some fundamental problems to be solved in the alternative stack.  Once these are addressed, this bug can be rejected as WILL_NOT_FIX.)