Bug 8906
Summary: | Some kind of Oops removing firewire_ohci module | ||
---|---|---|---|
Product: | Drivers | Reporter: | Bruce Duncan (bwduncan) |
Component: | IEEE1394 | Assignee: | Stefan Richter (stefanr) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | stefanr |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.23-rc3-hrt2 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | invalid patch: raise refcount of card datastructure when scheduling work (attachment 13747); invalid patch: raise refcount of card datastructure when scheduling work (attachment 13748) | ||
Description
Bruce Duncan
2007-08-19 08:23:59 UTC
Oh yes, perhaps I should mention that this -rc3 is from the git tree at rt2x00.serialmonkey.com, which has development drivers for my wireless card. They are in the rt2500pci, mac80211 etc. modules. But it doesn't seem to matter whether I unload these before firewire_ohci or not. I think I also neglected to mention that this is an AMD64 machine.

Bruce

Does the -hrt patch turn tasklets into workqueue jobs? If yes, then this is certainly a duplicate of bug 8646.

I ran
# while modprobe -r firewire-ohci; do sleep $P; modprobe firewire-ohci || break; sleep $P; done
on vanilla 2.6.23-rc3 (x86-64, plus some recent firewire patches). With P=2 or P=1, no problem. With P=.2, bug 8646 happened. I will add the respective screenshot there shortly.

I can easily reproduce the bug on 2.6.23-rc3 (with unrelated firewire patches):
# modprobe firewire-ohci; sleep .1; modprobe -r firewire-ohci

Aug 20 22:27:51 mini ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 19 (level, low) -> IRQ 19
Aug 20 22:27:51 mini firewire_ohci: Added fw-ohci device 0000:03:03.0, OHCI version 1.0
Aug 20 22:27:51 mini firewire_ohci: failed to set phy reg bits.
Aug 20 22:27:51 mini ACPI: PCI interrupt for device 0000:03:03.0 disabled
Aug 20 22:27:51 mini firewire_ohci: Removed fw-ohci device.
Aug 20 22:27:51 mini Unable to handle kernel paging request at ffffffff8800b117 RIP:
Aug 20 22:27:51 mini [<ffffffff8800b117>]
Aug 20 22:27:51 mini PGD 203067 PUD 207063 PMD 1d3a0067 PTE 0
Aug 20 22:27:51 mini Oops: 0010 [1] PREEMPT SMP
Aug 20 22:27:51 mini CPU 0
Aug 20 22:27:51 mini Modules linked in: nfs lockd sunrpc i915 drm applesmc led_class coretemp hwmon eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rtc snd_hda_intel snd_pcm snd_timer snd snd_page_alloc thermal processor button sky2 i2c_i801 sg
Aug 20 22:27:51 mini Pid: 9, comm: events/0 Not tainted 2.6.23-rc3 #4
Aug 20 22:27:51 mini RIP: 0010:[<ffffffff8800b117>] [<ffffffff8800b117>]
Aug 20 22:27:51 mini RSP: 0018:ffff81001e07feb8 EFLAGS: 00010247
Aug 20 22:27:51 mini RAX: ffff81001e0a9b40 RBX: ffff8100093054f8 RCX: 0000000000000003
Aug 20 22:27:51 mini RDX: ffffffff8023e36c RSI: 0000000000000001 RDI: ffff8100093054f0
Aug 20 22:27:51 mini RBP: ffff81001e0a9b40 R08: 0000000000000001 R09: ffffffff8023e309
Aug 20 22:27:51 mini R10: 000000000057a460 R11: ffffffff803f23a1 R12: ffff8100093054f0
Aug 20 22:27:51 mini R13: ffffffff8800b117 R14: ffffffff80561200 R15: 0000000000000000
Aug 20 22:27:51 mini FS: 0000000000000000(0000) GS:ffffffff80529000(0000) knlGS:0000000000000000
Aug 20 22:27:51 mini CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 20 22:27:51 mini CR2: ffffffff8800b117 CR3: 000000001d397000 CR4: 00000000000006e0
Aug 20 22:27:51 mini DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 20 22:27:51 mini DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 20 22:27:51 mini Process events/0 (pid: 9, threadinfo ffff81001e07e000, task ffff81001e03b100)
Aug 20 22:27:51 mini Stack: ffffffff8023e389 ffff81001e01fcf0 ffff81001e0a9b40 ffffffff8023eda4
Aug 20 22:27:51 mini ffff81001e01fcf0 ffffffffffffffff ffffffff8023ee81 0000000000000000
Aug 20 22:27:51 mini ffff81001e03b100 ffffffff80241d35 ffff81001e07ff08 ffff81001e07ff08
Aug 20 22:27:51 mini Call Trace:
Aug 20 22:27:51 mini [<ffffffff8023e389>] run_workqueue+0x92/0x15e
Aug 20 22:27:51 mini [<ffffffff8023eda4>] worker_thread+0x0/0xe7
Aug 20 22:27:51 mini [<ffffffff8023ee81>] worker_thread+0xdd/0xe7
Aug 20 22:27:51 mini [<ffffffff80241d35>] autoremove_wake_function+0x0/0x2e
Aug 20 22:27:51 mini [<ffffffff80241c36>] kthread+0x47/0x75
Aug 20 22:27:51 mini [<ffffffff8020c578>] child_rip+0xa/0x12
Aug 20 22:27:51 mini [<ffffffff80241aad>] kthreadd+0x118/0x13d
Aug 20 22:27:51 mini [<ffffffff80241bef>] kthread+0x0/0x75
Aug 20 22:27:51 mini [<ffffffff8020c56e>] child_rip+0x0/0x12
Aug 20 22:27:51 mini
Aug 20 22:27:51 mini
Aug 20 22:27:51 mini Code: Bad RIP value.
Aug 20 22:27:51 mini RIP [<ffffffff8800b117>]
Aug 20 22:27:51 mini RSP <ffff81001e07feb8>
Aug 20 22:27:51 mini CR2: ffffffff8800b117

I.e. forget my comment #2.

Note on "progress": I tried adding
cancel_rearming_delayed_work(&card->work);
at the top of fw-card.c::fw_core_remove_card(), plus the patch in http://marc.info/?l=linux1394-devel&m=118765115403632. Didn't help.

The bug still exists in 2.6.24-rc3 (plus latest firewire development patches).

Created attachment 13747
invalid patch: raise refcount of card datastructure when scheduling work
I attach this patch only for documentation purposes. This patch does *not* fix the bug. Maybe the workqueue jobs which are scheduled for devices (rather than the workqueue job for the card) cause the bug.
Created attachment 13748
invalid patch: raise refcount of card datastructure when scheduling work
I attach this patch only for documentation purposes. This patch does *not* fix the bug. Maybe the workqueue jobs which are scheduled for devices (rather than the workqueue job for the card) cause the bug.
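For anyone retesting, the load/unload loop quoted in the comments above can be wrapped in a small function. This is only a sketch: the function name `cycle_module`, the `MODPROBE` override, and the fixed round count are assumptions added here for illustration, not part of the original report. The original loop ran until a command failed; this variant stops after a given number of rounds so it cannot spin forever.

```shell
# Hypothetical helper: repeatedly unload and reload a module with a pause
# between steps, mirroring the loop from the comments above.
# Set MODPROBE to another command (or a shell function) for a dry run.
cycle_module() {
    module="$1"   # module name, e.g. firewire-ohci
    pause="$2"    # seconds to sleep between steps, e.g. .2
    rounds="$3"   # number of unload/load cycles to attempt
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Stop at the first failing unload or load, as the original loop did.
        "${MODPROBE:-modprobe}" -r "$module" || return 1
        sleep "$pause"
        "${MODPROBE:-modprobe}" "$module" || return 1
        sleep "$pause"
        i=$((i + 1))
    done
}
```

Run as root, `cycle_module firewire-ohci .2 50` would correspond to the P=.2 case described above. Note that a successful reproduction oopses the kernel, so this belongs on a test machine only.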
Fixes posted: http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11617
Also available in patchkit v646 and later at http://me.in-berlin.de/~s5r6/linux1394/updates/

The relevant patches of the patch series which comment #9 refers to have been merged in Linux 2.6.25-rc4.
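In the oops quoted in this report, RIP and CR2 carry the same address (ffffffff8800b117) and the kernel prints "Code: Bad RIP value." That pattern means the fault happened while fetching the instruction itself, which would be consistent with a workqueue job being run after the module's code had been unloaded. The following sketch checks a saved oops for that pattern; the function name `rip_matches_cr2` is hypothetical and the parsing assumes the x86-64 oops layout shown above.

```shell
# Hypothetical helper (not part of any kernel tooling): read an oops from
# stdin and succeed if the faulting address (CR2) equals RIP.
rip_matches_cr2() {
    log=$(cat)
    # First bracketed address that follows "RIP" on the same line.
    rip=$(printf '%s\n' "$log" |
        sed -n 's/.*RIP[^[]*\[<\([0-9a-f]*\)>\].*/\1/p' | head -n 1)
    # First CR2 value in the register dump.
    cr2=$(printf '%s\n' "$log" |
        sed -n 's/.*CR2: \([0-9a-f]*\).*/\1/p' | head -n 1)
    # Fail if no RIP was found; otherwise compare the two addresses.
    [ -n "$rip" ] && [ "$rip" = "$cr2" ]
}
```

For example, `rip_matches_cr2 < oops.txt && echo "fault on instruction fetch"` (with `oops.txt` being a saved copy of the log) would flag this kind of crash.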