Bug 8906 - Some kind of Oops removing firewire_ohci module
Summary: Some kind of Oops removing firewire_ohci module
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: All Linux
Importance: P1 normal
Assignee: Stefan Richter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-19 08:23 UTC by Bruce Duncan
Modified: 2008-03-11 05:34 UTC

See Also:
Kernel Version: 2.6.23-rc3-hrt2
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
invalid patch: raise refcount of card datastructure when scheduling work (3.42 KB, patch)
2007-11-25 13:24 UTC, Stefan Richter
invalid patch: raise refcount of card datastructure when scheduling work (3.42 KB, patch)
2007-11-25 13:24 UTC, Stefan Richter

Description Bruce Duncan 2007-08-19 08:23:59 UTC
Most recent kernel where this bug did not occur: 2.6.22
Distribution: Debian unstable
Hardware Environment: ULI laptop
Software Environment: latest -rc with hrtimers patch
Problem Description: modprobe -r firewire_ohci gives some kind of oops which stops my keyboard working

Steps to reproduce: simply remove the firewire_ohci module. I don't have any firewire devices to plug in.

However, expecting it to oops again, I tried rmmod'ing firewire_ohci and then firewire_core (its only dependency). It didn't oops.

I've made it oops about 3 times now. Once, my keyboard just kept repeating the 'enter' key. The other times it simply stopped responding. Music keeps playing and I can still use the magic SysRq keys.

The oops is:
======= CUT ========
ACPI: PCI interrupt for device 0000:00:10.2 disabled
firewire_ohci: Removed fw-ohci device.
PGD 203067 PUD 207063 PMD 36f1d067 PTE 0
CPU 0
Modules linked in: rc80211_simple snd_hda_intel i2c_ali1535 i2c_ali15x3 snd_pcm snd_page_alloc mac80211 ehci_hcd ohci_hcd
Pid: 5, comm: events/0 Not tainted 2.6.23-rc3-hrt2 #2
RIP: 0010:[<ffffffff8806a560>]  [<ffffffff8806a560>]
RSP: 0018:ffff810037f93eb8  EFLAGS: 00010247
RAX: ffff810037fc7f80 RBX: ffff810037fc7f80 RCX: ffff8100367dd680
RDX: ffff810037fc7f80 RSI: ffff810037f93ee0 RDI: ffff8100367dd678
RBP: ffffffff8806a560 R08: ffff810037f92000 R09: 0000000000000001
R10: 000000000040c3a9 R11: 0000000000000246 R12: ffffffff8023cf00
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  00002af78a6d76e0(0000) GS:ffffffff80677000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff8806a560 CR3: 0000000022092000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/0 (pid: 5, threadinfo ffff810037f92000, task ffff810037f90000)
Stack:  ffffffff8023c62f 0000000000000000 ffff810037fc7f90 ffff810037fc7f80
 ffffffff8023cfc3 0000000000000000 ffff810037f90000 ffffffff80240250
 ffff810037f93ef8 ffff810037f93ef8 0000000000000001 00000000fffffffc
Call Trace:
 [<ffffffff8023c62f>] run_workqueue+0x6f/0xf0
 [<ffffffff8023cfc3>] worker_thread+0xc3/0x130
 [<ffffffff80240250>] autoremove_wake_function+0x0/0x30
 [<ffffffff8023cf00>] worker_thread+0x0/0x130
 [<ffffffff8023fe2b>] kthread+0x4b/0x80
 [<ffffffff8020c2a8>] child_rip+0xa/0x12
 [<ffffffff8023fde0>] kthread+0x0/0x80
 [<ffffffff8020c29e>] child_rip+0x0/0x12


Code:  Bad RIP value.
 RSP <ffff810037f93eb8>
SysRq : Emergency Sync
Emergency Sync complete
SysRq : Emergency Remount R/O
======= CUT ======

Bruce
Comment 1 Bruce Duncan 2007-08-19 08:27:18 UTC
Oh yes, perhaps I should mention that this -rc3 is from the git tree at rt2x00.serialmonkey.com, which has development drivers for my wireless card. They are in the rt2500pci, mac80211 etc. modules. But it doesn't seem to matter whether I unload these before firewire_ohci or not.

I think I also neglected to mention that this is an AMD64 machine.

Bruce
Comment 2 Stefan Richter 2007-08-19 09:02:46 UTC
Does the -hrt patch turn tasklets into workqueue jobs?
If yes, then this is certainly a duplicate of bug 8646.

I ran
# while modprobe -r firewire-ohci; do sleep $P; modprobe firewire-ohci || break; sleep $P; done
on vanilla 2.6.23-rc3 (x86-64, plus some recent firewire patches).  With P=2 or P=1, no problem.  With P=.2, bug 8646 happened.  I will add the respective screenshot there shortly.
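
The reload loop above can also be written as a small reusable function. This is only a sketch of the same idea, not the exact command that was run: the MODPROBE override and the cycle limit are assumptions added here so that the loop can be dry-run and always terminates.

```shell
#!/bin/sh
# Unload/load stress loop modeled on the one-liner above.
# Assumptions (not part of the original command):
#   MODPROBE  - hypothetical override for the modprobe binary (e.g. "true" for a dry run)
#   third arg - upper bound on cycles; the original loop ran until a command failed

reload_loop() {
    module=$1        # module to cycle, e.g. firewire-ohci
    pause=$2         # seconds to sleep between steps (the P of the original loop)
    max=${3:-10}     # maximum number of unload/load cycles
    modprobe_cmd=${MODPROBE:-modprobe}
    cycles=0
    while [ "$cycles" -lt "$max" ]; do
        # Stop as soon as unloading or loading fails, like the original loop.
        $modprobe_cmd -r "$module" || break
        sleep "$pause"
        $modprobe_cmd "$module" || break
        sleep "$pause"
        cycles=$((cycles + 1))
    done
    echo "completed $cycles unload/load cycles"
}
```

Running `reload_loop firewire-ohci .2` as root would roughly correspond to the P=.2 case. Fractional pauses rely on a sleep implementation that accepts them, such as GNU sleep.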
Comment 3 Stefan Richter 2007-08-20 13:52:58 UTC
I can easily reproduce the bug on 2.6.23-rc3 (with unrelated firewire patches):

# modprobe firewire-ohci; sleep .1; modprobe -r firewire-ohci

Aug 20 22:27:51 mini ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 19 (level, low) -> IRQ 19
Aug 20 22:27:51 mini firewire_ohci: Added fw-ohci device 0000:03:03.0, OHCI version 1.0
Aug 20 22:27:51 mini firewire_ohci: failed to set phy reg bits.
Aug 20 22:27:51 mini ACPI: PCI interrupt for device 0000:03:03.0 disabled
Aug 20 22:27:51 mini firewire_ohci: Removed fw-ohci device.
Aug 20 22:27:51 mini Unable to handle kernel paging request at ffffffff8800b117 RIP: 
Aug 20 22:27:51 mini [<ffffffff8800b117>]
Aug 20 22:27:51 mini PGD 203067 PUD 207063 PMD 1d3a0067 PTE 0
Aug 20 22:27:51 mini Oops: 0010 [1] PREEMPT SMP 
Aug 20 22:27:51 mini CPU 0 
Aug 20 22:27:51 mini Modules linked in: nfs lockd sunrpc i915 drm applesmc led_class coretemp hwmon eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rtc snd_hda_intel snd_pcm snd_timer snd snd_page_alloc thermal processor button sky2 i2c_i801 sg
Aug 20 22:27:51 mini Pid: 9, comm: events/0 Not tainted 2.6.23-rc3 #4
Aug 20 22:27:51 mini RIP: 0010:[<ffffffff8800b117>]  [<ffffffff8800b117>]
Aug 20 22:27:51 mini RSP: 0018:ffff81001e07feb8  EFLAGS: 00010247
Aug 20 22:27:51 mini RAX: ffff81001e0a9b40 RBX: ffff8100093054f8 RCX: 0000000000000003
Aug 20 22:27:51 mini RDX: ffffffff8023e36c RSI: 0000000000000001 RDI: ffff8100093054f0
Aug 20 22:27:51 mini RBP: ffff81001e0a9b40 R08: 0000000000000001 R09: ffffffff8023e309
Aug 20 22:27:51 mini R10: 000000000057a460 R11: ffffffff803f23a1 R12: ffff8100093054f0
Aug 20 22:27:51 mini R13: ffffffff8800b117 R14: ffffffff80561200 R15: 0000000000000000
Aug 20 22:27:51 mini FS:  0000000000000000(0000) GS:ffffffff80529000(0000) knlGS:0000000000000000
Aug 20 22:27:51 mini CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 20 22:27:51 mini CR2: ffffffff8800b117 CR3: 000000001d397000 CR4: 00000000000006e0
Aug 20 22:27:51 mini DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 20 22:27:51 mini DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 20 22:27:51 mini Process events/0 (pid: 9, threadinfo ffff81001e07e000, task ffff81001e03b100)
Aug 20 22:27:51 mini Stack:  ffffffff8023e389 ffff81001e01fcf0 ffff81001e0a9b40 ffffffff8023eda4
Aug 20 22:27:51 mini ffff81001e01fcf0 ffffffffffffffff ffffffff8023ee81 0000000000000000
Aug 20 22:27:51 mini ffff81001e03b100 ffffffff80241d35 ffff81001e07ff08 ffff81001e07ff08
Aug 20 22:27:51 mini Call Trace:
Aug 20 22:27:51 mini [<ffffffff8023e389>] run_workqueue+0x92/0x15e
Aug 20 22:27:51 mini [<ffffffff8023eda4>] worker_thread+0x0/0xe7
Aug 20 22:27:51 mini [<ffffffff8023ee81>] worker_thread+0xdd/0xe7
Aug 20 22:27:51 mini [<ffffffff80241d35>] autoremove_wake_function+0x0/0x2e
Aug 20 22:27:51 mini [<ffffffff80241c36>] kthread+0x47/0x75
Aug 20 22:27:51 mini [<ffffffff8020c578>] child_rip+0xa/0x12
Aug 20 22:27:51 mini [<ffffffff80241aad>] kthreadd+0x118/0x13d
Aug 20 22:27:51 mini [<ffffffff80241bef>] kthread+0x0/0x75
Aug 20 22:27:51 mini [<ffffffff8020c56e>] child_rip+0x0/0x12
Aug 20 22:27:51 mini 
Aug 20 22:27:51 mini 
Aug 20 22:27:51 mini Code:  Bad RIP value.
Aug 20 22:27:51 mini RIP  [<ffffffff8800b117>]
Aug 20 22:27:51 mini RSP <ffff81001e07feb8>
Aug 20 22:27:51 mini CR2: ffffffff8800b117
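
In this trace, as in the one from the original description, the faulting address in CR2 equals RIP and the kernel prints "Code:  Bad RIP value.", which suggests that the CPU faulted while fetching an instruction at an address that is not mapped, rather than while accessing data. The following sketch checks a saved oops log for that pattern. The function name is hypothetical, and the parsing assumes the 2.6.23-era x86-64 oops layout shown in this report.

```shell
#!/bin/sh
# Reads an oops log on stdin and succeeds if the first reported RIP equals
# the first reported CR2, i.e. the fault address is the instruction pointer.
# oops_is_fetch_fault is a hypothetical helper name, not an existing tool.

oops_is_fetch_fault() {
    log=$(cat)
    # "RIP: 0010:[<ffffffff8800b117>] ..." -> ffffffff8800b117
    rip=$(printf '%s\n' "$log" |
        sed -n 's/.*RIP: [0-9a-f]*:\[<\([0-9a-f]*\)>].*/\1/p' | head -n 1)
    # "CR2: ffffffff8800b117 CR3: ..." -> ffffffff8800b117
    cr2=$(printf '%s\n' "$log" |
        sed -n 's/.*CR2: \([0-9a-f]*\).*/\1/p' | head -n 1)
    [ -n "$rip" ] && [ "$rip" = "$cr2" ]
}
```

For example, `oops_is_fetch_fault < oops.txt && echo "fault at RIP"` would report the pattern for a log saved to the (assumed) file oops.txt. A match is consistent with a work item being run after its code was unloaded, but it does not prove that on its own.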
Comment 4 Stefan Richter 2007-08-20 13:54:16 UTC
I.e. forget my comment #2.
Comment 5 Stefan Richter 2007-08-21 09:55:39 UTC
Note on "progress": I tried adding
        cancel_rearming_delayed_work(&card->work);
at the top of fw-card.c::fw_core_remove_card(), plus the patch in http://marc.info/?l=linux1394-devel&m=118765115403632. Didn't help.
Comment 6 Stefan Richter 2007-11-25 09:55:04 UTC
The bug still exists in 2.6.24-rc3 (plus latest firewire development patches).
Comment 7 Stefan Richter 2007-11-25 13:24:51 UTC
Created attachment 13747
invalid patch: raise refcount of card datastructure when scheduling work

I attach this patch only for documentation purposes.  This patch does *not* fix the bug.  Maybe the workqueue jobs which are scheduled for devices (rather than the workqueue job for the card) cause the bug.
Comment 8 Stefan Richter 2007-11-25 13:24:57 UTC
Created attachment 13748
invalid patch: raise refcount of card datastructure when scheduling work

I attach this patch only for documentation purposes.  This patch does *not* fix the bug.  Maybe the workqueue jobs which are scheduled for devices (rather than the workqueue job for the card) cause the bug.
Comment 9 Stefan Richter 2008-02-24 10:46:39 UTC
Fixes posted:
http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11617

Also available in patchkit v646 and later at
http://me.in-berlin.de/~s5r6/linux1394/updates/
Comment 10 Stefan Richter 2008-03-11 05:34:24 UTC
The relevant patches of the patch series which comment #9 refers to have been merged in Linux 2.6.25-rc4.
