Bug 8646 - fw-ohci and ohci1394: panic in softirq, below smp_apic_timer_interrupt
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: All Linux
Importance: P1 high
Assignee: drivers_ieee1394
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-06-17 14:04 UTC by Stefan Richter
Modified: 2008-02-24 10:52 UTC

See Also:
Kernel Version: all
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
screenshot (713.62 KB, image/jpeg)
2007-06-17 14:07 UTC, Stefan Richter
screenshot (119.07 KB, image/jpeg)
2007-08-19 09:34 UTC, Stefan Richter
kill tasklets in fw-ohci::pci_remove (1.17 KB, patch)
2007-08-20 14:22 UTC, Stefan Richter
screenshot (109.16 KB, image/jpeg)
2007-08-20 14:32 UTC, Stefan Richter
firewire: fix unloading of fw-ohci while devices are attached (948 bytes, patch)
2007-08-20 16:25 UTC, Stefan Richter

Description Stefan Richter 2007-06-17 14:04:35 UTC
Most recent kernel where this bug did not occur: unknown
Hardware Environment: SMP i586 and SMP x86-64
Software Environment: Linux 2.6.x

If an SBP-2 device is attached and sbp2 or firewire-sbp2 is loaded, any of the following commands

# modprobe -r sbp2 && modprobe -r ohci1394
# modprobe -r firewire-ohci
# modprobe -r firewire-sbp2 firewire-ohci
# modprobe -r firewire-sbp2 && sleep 0 && modprobe -r firewire-ohci

will lead to a panic with a trace similar to this:
general protection fault [...]
Pid: 0, comm: swapper [...]
run_timer_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
mwait_idle
apic_timer_interrupt
[...]

This happened on two different i945GM based boards with Core 2 Duo, with 32-bit and 64-bit kernels.  The last time I tried this with ohci1394/sbp2 was a while ago.  I have now seen it happen with the new drivers too.

The same trace also happened repeatedly on an older kernel in a totally different context, without FireWire drivers loaded:  It could be triggered by "make -j" in the kernel source tree, i.e. by spawning something with many subthreads.  For now, I don't have a spare machine with enough RAM available to test this again with a recent kernel.  I will try to make the machine where I saw it available again.

I.e. the bug may lie entirely outside the old and new FireWire drivers.

*Not* affected are:
  - The same machines with "modprobe -r firewire-sbp2 && sleep 0.1 && modprobe -r firewire-ohci",
  - A VIA KM-266/AMD Athlon single-CPU PC, even when running an SMP kernel.

I will attach a sample screenshot in a follow-up entry.
Comment 1 Stefan Richter 2007-06-17 14:07:30 UTC
Created attachment 11773
screenshot
Comment 2 Stefan Richter 2007-07-28 04:51:59 UTC
An old Pentium MMX notebook also crashes this way but does not print a panic message.  Tested with a UP PREEMPT kernel.
Comment 3 Stefan Richter 2007-08-19 09:34:21 UTC
Created attachment 12445
screenshot

panic on 2.6.23-rc3 x86-64
triggered by modprobe -r firewire-ohci shortly after modprobe firewire-ohci,
firewire-sbp2 was not loaded
Comment 4 Stefan Richter 2007-08-20 14:22:16 UTC
Created attachment 12461
kill tasklets in fw-ohci::pci_remove

It seems this patch doesn't help.  See next screenshot.
Comment 5 Stefan Richter 2007-08-20 14:32:27 UTC
Created attachment 12462
screenshot

panic, with patch id=12461 applied, triggered by modprobe -r firewire-ohci with an SBP-2 disk attached and firewire-sbp2 loaded

The only difference, probably an insignificant one, is that __update_rq_clock (a new function in 2.6.23-rc3) appears in the trace between __do_softirq and run_timer_softirq.
Comment 6 Stefan Richter 2007-08-20 16:25:54 UTC
Created attachment 12463
firewire: fix unloading of fw-ohci while devices are attached

Fixes modprobe -r firewire-ohci in presence of an SBP-2 device.
Comment 7 Stefan Richter 2007-08-20 16:35:36 UTC
Still to do:

  - Check whether patch attachment 12469 also fixed the bug per comment #3.
    We can't tell for sure as long as bug 8906 is unfixed.

  - Fix the bug for the old ieee1394 stack.
Comment 8 Stefan Richter 2007-08-20 16:36:41 UTC
(I meant attachment 12463 of course.)
Comment 9 Stefan Richter 2007-09-18 14:59:11 UTC
Attachment 12463 has been merged in Linux 2.6.23-rc4.
Comment 10 Stefan Richter 2008-02-24 10:52:04 UTC
Re comment #7:
Fixes for bug 8906 have been posted, and the bug per comment #3 no longer occurs.  I.e. only the old ieee1394 stack _may_ still be affected.

I am closing this bug now; if anybody still encounters this bug with the ieee1394 stack, please reopen and rename this bug.
