Most recent kernel where this bug did not occur: 2.6.22
Distribution: Debian unstable
Hardware Environment: ULI laptop
Software Environment: latest -rc with hrtimers patch
Problem Description: modprobe -r firewire_ohci gives some kind of oops which stops my keyboard working

Steps to reproduce: simply remove the firewire_ohci module. I don't have any firewire devices to plug in.

However, expecting it to oops again, I tried rmmod'ing firewire_ohci and then firewire_core (its only dependency). It didn't oops.

I've made it oops about 3 times now. Once, my keyboard just kept repeating the 'enter' key. Each other time it just stops responding. Music keeps playing and I can still use the magic SysRq keys.

The oops is:

======= CUT ========
ACPI: PCI interrupt for device 0000:00:10.2 disabled
firewire_ohci: Removed fw-ohci device.
PGD 203067 PUD 207063 PMD 36f1d067 PTE 0
CPU 0
Modules linked in: rc80211_simple snd_hda_intel i2c_ali1535 i2c_ali15x3 snd_pcm snd_page_alloc mac80211 ehci_hcd ohci_hcd
Pid: 5, comm: events/0 Not tainted 2.6.23-rc3-hrt2 #2
RIP: 0010:[<ffffffff8806a560>] [<ffffffff8806a560>]
RSP: 0018:ffff810037f93eb8 EFLAGS: 00010247
RAX: ffff810037fc7f80 RBX: ffff810037fc7f80 RCX: ffff8100367dd680
RDX: ffff810037fc7f80 RSI: ffff810037f93ee0 RDI: ffff8100367dd678
RBP: ffffffff8806a560 R08: ffff810037f92000 R09: 0000000000000001
R10: 000000000040c3a9 R11: 0000000000000246 R12: ffffffff8023cf00
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS: 00002af78a6d76e0(0000) GS:ffffffff80677000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff8806a560 CR3: 0000000022092000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/0 (pid: 5, threadinfo ffff810037f92000, task ffff810037f90000)
Stack: ffffffff8023c62f 0000000000000000 ffff810037fc7f90 ffff810037fc7f80
 ffffffff8023cfc3 0000000000000000 ffff810037f90000 ffffffff80240250
 ffff810037f93ef8 ffff810037f93ef8 0000000000000001 00000000fffffffc
Call Trace:
 [<ffffffff8023c62f>] run_workqueue+0x6f/0xf0
 [<ffffffff8023cfc3>] worker_thread+0xc3/0x130
 [<ffffffff80240250>] autoremove_wake_function+0x0/0x30
 [<ffffffff8023cf00>] worker_thread+0x0/0x130
 [<ffffffff8023fe2b>] kthread+0x4b/0x80
 [<ffffffff8020c2a8>] child_rip+0xa/0x12
 [<ffffffff8023fde0>] kthread+0x0/0x80
 [<ffffffff8020c29e>] child_rip+0x0/0x12

Code: Bad RIP value.
 RSP <ffff810037f93eb8>
SysRq : Emergency Sync
Emergency Sync complete
SysRq : Emergency Remount R/O
======= CUT ======

Bruce
Oh yes, perhaps I should mention that this -rc3 is from the git tree at rt2x00.serialmonkey.com, which has development drivers for my wireless card. They are in the rt2500pci, mac80211 etc. modules. But it doesn't seem to matter whether I unload these before firewire_ohci or not.

I think I also neglected to mention that this is an AMD64 machine.

Bruce
Does the -hrt patch turn tasklets into workqueue jobs? If yes, then this is certainly a duplicate of bug 8646.

I ran

    # while modprobe -r firewire-ohci; do sleep $P; modprobe firewire-ohci || break; sleep $P; done

on vanilla 2.6.23-rc3 (x86-64, plus some recent firewire patches). With P=2 or P=1, no problem. With P=.2, bug 8646 happened. I will add the respective screenshot there shortly.
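For anyone who wants to repeat the unload/load test above with several delays, the loop could be wrapped roughly as follows. This is only a sketch: the function name reload_cycle, the MODPROBE override and the cycle limit are made up for illustration and are not part of the original test. Fractional delays such as .2 assume a sleep that accepts them (GNU coreutils does).

```shell
#!/bin/sh
# Sketch only: reload_cycle and the MODPROBE override are hypothetical
# helpers for illustration, not part of any firewire tooling.
# MODPROBE can be pointed at a stub command for a dry run.
MODPROBE="${MODPROBE:-modprobe}"

# reload_cycle MODULE DELAY COUNT
# Unload and reload MODULE up to COUNT times, sleeping DELAY seconds
# after each step. Stops with status 1 as soon as a modprobe call fails.
reload_cycle() {
    module=$1
    delay=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        $MODPROBE -r "$module" || return 1
        sleep "$delay"
        $MODPROBE "$module" || return 1
        sleep "$delay"
        i=$((i + 1))
    done
    return 0
}
```

Run as root after sourcing the file, for example: for P in 2 1 .2; do reload_cycle firewire-ohci "$P" 20 || break; done. If the kernel oopses, the loop will of course not report it by itself; watch the console or the kernel log.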
I can easily reproduce the bug on 2.6.23-rc3 (with unrelated firewire patches):

# modprobe firewire-ohci; sleep .1; modprobe -r firewire-ohci

Aug 20 22:27:51 mini ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 19 (level, low) -> IRQ 19
Aug 20 22:27:51 mini firewire_ohci: Added fw-ohci device 0000:03:03.0, OHCI version 1.0
Aug 20 22:27:51 mini firewire_ohci: failed to set phy reg bits.
Aug 20 22:27:51 mini ACPI: PCI interrupt for device 0000:03:03.0 disabled
Aug 20 22:27:51 mini firewire_ohci: Removed fw-ohci device.
Aug 20 22:27:51 mini Unable to handle kernel paging request at ffffffff8800b117 RIP:
Aug 20 22:27:51 mini [<ffffffff8800b117>]
Aug 20 22:27:51 mini PGD 203067 PUD 207063 PMD 1d3a0067 PTE 0
Aug 20 22:27:51 mini Oops: 0010 [1] PREEMPT SMP
Aug 20 22:27:51 mini CPU 0
Aug 20 22:27:51 mini Modules linked in: nfs lockd sunrpc i915 drm applesmc led_class coretemp hwmon eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rtc snd_hda_intel snd_pcm snd_timer snd snd_page_alloc thermal processor button sky2 i2c_i801 sg
Aug 20 22:27:51 mini Pid: 9, comm: events/0 Not tainted 2.6.23-rc3 #4
Aug 20 22:27:51 mini RIP: 0010:[<ffffffff8800b117>] [<ffffffff8800b117>]
Aug 20 22:27:51 mini RSP: 0018:ffff81001e07feb8 EFLAGS: 00010247
Aug 20 22:27:51 mini RAX: ffff81001e0a9b40 RBX: ffff8100093054f8 RCX: 0000000000000003
Aug 20 22:27:51 mini RDX: ffffffff8023e36c RSI: 0000000000000001 RDI: ffff8100093054f0
Aug 20 22:27:51 mini RBP: ffff81001e0a9b40 R08: 0000000000000001 R09: ffffffff8023e309
Aug 20 22:27:51 mini R10: 000000000057a460 R11: ffffffff803f23a1 R12: ffff8100093054f0
Aug 20 22:27:51 mini R13: ffffffff8800b117 R14: ffffffff80561200 R15: 0000000000000000
Aug 20 22:27:51 mini FS: 0000000000000000(0000) GS:ffffffff80529000(0000) knlGS:0000000000000000
Aug 20 22:27:51 mini CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 20 22:27:51 mini CR2: ffffffff8800b117 CR3: 000000001d397000 CR4: 00000000000006e0
Aug 20 22:27:51 mini DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 20 22:27:51 mini DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 20 22:27:51 mini Process events/0 (pid: 9, threadinfo ffff81001e07e000, task ffff81001e03b100)
Aug 20 22:27:51 mini Stack: ffffffff8023e389 ffff81001e01fcf0 ffff81001e0a9b40 ffffffff8023eda4
Aug 20 22:27:51 mini ffff81001e01fcf0 ffffffffffffffff ffffffff8023ee81 0000000000000000
Aug 20 22:27:51 mini ffff81001e03b100 ffffffff80241d35 ffff81001e07ff08 ffff81001e07ff08
Aug 20 22:27:51 mini Call Trace:
Aug 20 22:27:51 mini [<ffffffff8023e389>] run_workqueue+0x92/0x15e
Aug 20 22:27:51 mini [<ffffffff8023eda4>] worker_thread+0x0/0xe7
Aug 20 22:27:51 mini [<ffffffff8023ee81>] worker_thread+0xdd/0xe7
Aug 20 22:27:51 mini [<ffffffff80241d35>] autoremove_wake_function+0x0/0x2e
Aug 20 22:27:51 mini [<ffffffff80241c36>] kthread+0x47/0x75
Aug 20 22:27:51 mini [<ffffffff8020c578>] child_rip+0xa/0x12
Aug 20 22:27:51 mini [<ffffffff80241aad>] kthreadd+0x118/0x13d
Aug 20 22:27:51 mini [<ffffffff80241bef>] kthread+0x0/0x75
Aug 20 22:27:51 mini [<ffffffff8020c56e>] child_rip+0x0/0x12
Aug 20 22:27:51 mini
Aug 20 22:27:51 mini
Aug 20 22:27:51 mini Code: Bad RIP value.
Aug 20 22:27:51 mini RIP [<ffffffff8800b117>]
Aug 20 22:27:51 mini RSP <ffff81001e07feb8>
Aug 20 22:27:51 mini CR2: ffffffff8800b117
I.e. forget my comment #2.
Note on "progress": I tried adding cancel_rearming_delayed_work(&card->work); at the top of fw-card.c::fw_core_remove_card(), plus the patch in http://marc.info/?l=linux1394-devel&m=118765115403632. Didn't help.
The bug still exists in 2.6.24-rc3 (plus latest firewire development patches).
Created attachment 13747 [details]
invalid patch: raise refcount of card data structure when scheduling work

I attach this patch only for documentation purposes. This patch does *not* fix the bug. Maybe the workqueue jobs which are scheduled for devices (rather than the workqueue job for the card) cause the bug.
Created attachment 13748 [details]
invalid patch: raise refcount of card data structure when scheduling work

I attach this patch only for documentation purposes. This patch does *not* fix the bug. Maybe the workqueue jobs which are scheduled for devices (rather than the workqueue job for the card) cause the bug.
Fixes posted: http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11617 Also available in patchkit v646 and later at http://me.in-berlin.de/~s5r6/linux1394/updates/
The relevant patches of the patch series which comment #9 refers to have been merged in Linux 2.6.25-rc4.