9870 – general protection fault loading and unloading modules

Summary: general protection fault loading and unloading modules

Status:	CLOSED CODE_FIX

Alias:	None

Product:	Drivers
Classification:	Unclassified
Component:	IEEE1394
Hardware:	All Linux

Importance:	P1 normal
Assignee:	Stefan Richter

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-02-01 11:28 UTC by Jarod Wilson
Modified:	2008-03-11 05:36 UTC
CC List:	0 users

See Also:
Kernel Version:	2.6.24 + latest linux1394-git
Subsystem:
Regression:	---
Bisected commit-id:

Description Jarod Wilson 2008-02-01 11:28:13 UTC

Distribution: Fedora devel
Hardware Environment: Athlon 64, Ti FireWire + Via VT6307 FireWire controllers
Problem Description: general protection fault encountered loading and unloading modules.

Steps to reproduce:

[root@xantham ~]# while :
> do
> rmmod firewire-sbp2
> rmmod firewire-ohci
> modprobe firewire-ohci
> rmmod firewire-sbp2
> rmmod firewire-ohci
> rmmod firewire-core
> modprobe firewire-ohci
> sleep 10
> done

After a few loops, I got the following:

general protection fault: 0000 [1] SMP 
CPU 0 
Modules linked in: firewire_sbp2 firewire_ohci firewire_core radeon drm ipt_MASQUERADE iptable_nat nf_nat bridge rfcomm l2cap bluetooth autofs4 sunrpc nf_conntrack_ipv4 ipt_REJECT iptable_filter ip_tables nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand dm_multipath parport_pc parport snd_intel8x0 snd_ac97_codec floppy ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event serio_raw snd_seq pcspkr snd_seq_device snd_pcm_oss crc_itu_t snd_mixer_oss k8temp hwmon snd_pcm snd_timer snd soundcore snd_page_alloc forcedeth i2c_nforce2 i2c_core button sr_mod sg cdrom pata_amd dm_snapshot dm_zero dm_mirror dm_mod shpchp sata_sil libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 6, comm: events/0 Not tainted 2.6.24-9.fc9 #1
RIP: 0010:[<ffffffff8818e2c0>]  [<ffffffff8818e2c0>] :firewire_core:fw_card_bm_work+0x1b9/0x292
RSP: 0018:ffff81003fa1dd30  EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff8100368e7518 RCX: 000000010009faa3
RDX: 0000000000000001 RSI: 000000010009837e RDI: 0000000000020340
RBP: ffff81003f9a7e18 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff8818e132 R11: ffffffff8818f382 R12: ffff8100368e7000
R13: ffff8100368e74a8 R14: 0000000000000286 R15: 6b6b6b6b6b6b6b6b
FS:  00002aaaaaaca7b0(0000) GS:ffffffff813ee000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaad5c000 CR3: 000000003a9b3000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/0 (pid: 6, threadinfo ffff81003fa1c000, task ffff81003fa1a000)
Stack:  ffffffff810a23bd ffff81003fa1a000 ffffffff81483cd8 0000000000000046
 ffffffff81483cc0 ffffe200016cb240 ffff81003fa1a8b0 0000000100006b6b
 0000000000000002 ffffffff81055985 ffffffff81483cc0 ffff81003fa1a000
Call Trace:
 [<ffffffff810a23bd>] add_partial_tail+0x12/0x34
 [<ffffffff81055985>] mark_held_locks+0x49/0x67
 [<ffffffff810a44a3>] kfree+0xe1/0xec
 [<ffffffff81055b36>] trace_hardirqs_on+0x107/0x12a
 [<ffffffff81053c99>] lock_release_holdtime+0x27/0x49
 [<ffffffff8818e107>] :firewire_core:fw_card_bm_work+0x0/0x292
 [<ffffffff8818e107>] :firewire_core:fw_card_bm_work+0x0/0x292
 [<ffffffff81046cbe>] run_workqueue+0xdf/0x1df
 [<ffffffff81047793>] worker_thread+0x0/0xe7
 [<ffffffff81047870>] worker_thread+0xdd/0xe7
 [<ffffffff8104ad0a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8104abea>] kthread+0x47/0x75
 [<ffffffff81275fb5>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8100cde8>] child_rip+0xa/0x12
 [<ffffffff8100c4ff>] restore_args+0x0/0x30
 [<ffffffff8104aba3>] kthread+0x0/0x75
 [<ffffffff8100cdde>] child_rip+0x0/0x12


Code: 41 83 3f 01 75 72 49 8b 87 a0 03 00 00 8b 5c 24 38 f6 40 0b 
RIP  [<ffffffff8818e2c0>] :firewire_core:fw_card_bm_work+0x1b9/0x292
 RSP <ffff81003fa1dd30>
---[ end trace 4c9652622aa156e6 ]---


A bit after that, a spinlock lockup warning:

BUG: spinlock lockup on CPU#0, events/0/6, ffff8100368e74a8 (Tainted: G      D)
Pid: 6, comm: events/0 Tainted: G      D 2.6.24-9.fc9 #1

Call Trace:
 <IRQ>  [<ffffffff8113213c>] _raw_spin_lock+0xd7/0xfe
 [<ffffffff8818e08b>] :firewire_core:flush_timer_callback+0x0/0x5
 [<ffffffff812769e8>] _spin_lock_irqsave+0x4e/0x5e
 [<ffffffff8818f23e>] :firewire_core:fw_flush_transactions+0x23/0xb2
 [<ffffffff8103fd6b>] run_timer_softirq+0x166/0x1da
 [<ffffffff8103d23d>] __do_softirq+0x5e/0xe0
 [<ffffffff8100d0cc>] call_softirq+0x1c/0x28
 [<ffffffff8100e6ce>] do_softirq+0x31/0x86
 [<ffffffff8103d19b>] irq_exit+0x4e/0x92
 [<ffffffff8101d23b>] smp_apic_timer_interrupt+0x3f/0x54
 [<ffffffff8103b239>] do_exit+0x208/0x7c8
 [<ffffffff8100cc0b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffff810648b9>] acct_collect+0xa4/0x18f
 [<ffffffff810648b9>] acct_collect+0xa4/0x18f
 [<ffffffff8103b239>] do_exit+0x208/0x7c8
 [<ffffffff812768ea>] _spin_unlock_irq+0x26/0x27
 [<ffffffff8103b239>] do_exit+0x208/0x7c8
 [<ffffffff8100d6e6>] kernel_math_error+0x0/0x71
 [<ffffffff81276c1d>] error_exit+0x0/0xa9
 [<ffffffff8818f382>] :firewire_core:transmit_complete_callback+0x0/0x56
 [<ffffffff8818e132>] :firewire_core:fw_card_bm_work+0x2b/0x292
 [<ffffffff8818e2c0>] :firewire_core:fw_card_bm_work+0x1b9/0x292
 [<ffffffff810a23bd>] add_partial_tail+0x12/0x34
 [<ffffffff81055985>] mark_held_locks+0x49/0x67
 [<ffffffff810a44a3>] kfree+0xe1/0xec
 [<ffffffff81055b36>] trace_hardirqs_on+0x107/0x12a
 [<ffffffff81053c99>] lock_release_holdtime+0x27/0x49
 [<ffffffff8818e107>] :firewire_core:fw_card_bm_work+0x0/0x292
 [<ffffffff8818e107>] :firewire_core:fw_card_bm_work+0x0/0x292
 [<ffffffff81046cbe>] run_workqueue+0xdf/0x1df
 [<ffffffff81047793>] worker_thread+0x0/0xe7
 [<ffffffff81047870>] worker_thread+0xdd/0xe7
 [<ffffffff8104ad0a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8104abea>] kthread+0x47/0x75
 [<ffffffff81275fb5>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8100cde8>] child_rip+0xa/0x12
 [<ffffffff8100c4ff>] restore_args+0x0/0x30
 [<ffffffff8104aba3>] kthread+0x0/0x75
 [<ffffffff8100cdde>] child_rip+0x0/0x12

Comment 1 Jarod Wilson 2008-02-01 11:29:44 UTC

Of possible relevance is that when things *don't* lock up, I'm usually seeing a "firewire_core: BM lock failed, making local node (ffc0) root." message for the Via controller.

Comment 2 Jarod Wilson 2008-02-01 11:43:31 UTC

Yeesh. Same box on reboot, moving the hub and iidc camera over to the Via controller from the Ti, and the dv camera from the Ti over to the Via:

kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [1] SMP 
CPU 0 
Modules linked in: radeon drm ipt_MASQUERADE iptable_nat nf_nat bridge rfcomm l2cap bluetooth autofs4 sunrpc nf_conntrack_ipv4 ipt_REJECT iptable_filter ip_tables nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand dm_multipath firewire_sbp2 parport_pc parport floppy snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq pcspkr firewire_ohci firewire_core snd_seq_device serio_raw snd_pcm_oss crc_itu_t snd_mixer_oss k8temp hwmon snd_pcm snd_timer snd soundcore snd_page_alloc forcedeth i2c_nforce2 i2c_core button sr_mod sg cdrom pata_amd dm_snapshot dm_zero dm_mirror dm_mod shpchp sata_sil libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.24-9.fc9 #1
RIP: 0010:[<ffffffff8113259a>]  [<ffffffff8113259a>] __list_add+0x47/0x5b
RSP: 0018:ffffffff8155adf0  EFLAGS: 00010086
RAX: 0000000000000079 RBX: ffff81003e9cc6e8 RCX: ffff81003d53c198
RDX: ffffffff813a37a0 RSI: 0000000000000001 RDI: ffffffff813a92a0
RBP: ffff81003d53c198 R08: ffffffff813a92c0 R09: ffff810001005900
R10: 000000000000a7b9 R11: 0000000000000000 R12: ffff81003e0e0000
R13: 0000000000000000 R14: 000000000000003f R15: ffff81003d53c220
FS:  0000000040a00950(0000) GS:ffffffff813ee000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000647000 CR3: 000000003c844000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81498000, task ffffffff813a37a0)
Stack:  ffff81003e0e0000 ffffffff88193fbb ffffffff813a4008 ffffffff00000002
 ffff81003d53c1a8 ffff81003e0e0648 0000000100012a4e ffff81003e0e064c
 ffff81003e0e0640 0000000300000000 ffff81003e0e04a8 0000000000000246
Call Trace:
 <IRQ>  [<ffffffff88193fbb>] :firewire_core:fw_core_handle_bus_reset+0x67a/0x758
 [<ffffffff81276950>] _spin_unlock_irqrestore+0x3e/0x44
 [<ffffffff8103d32c>] tasklet_action+0x2e/0xb0
 [<ffffffff8103d35b>] tasklet_action+0x5d/0xb0
 [<ffffffff8103d23d>] __do_softirq+0x5e/0xe0
 [<ffffffff81053c99>] lock_release_holdtime+0x27/0x49
 [<ffffffff8100d0cc>] call_softirq+0x1c/0x28
 [<ffffffff8100e6ce>] do_softirq+0x31/0x86
 [<ffffffff8103d19b>] irq_exit+0x4e/0x92
 [<ffffffff8100e861>] do_IRQ+0x13e/0x161
 [<ffffffff8100b066>] default_idle+0x0/0x51
 [<ffffffff8100b066>] default_idle+0x0/0x51
 [<ffffffff8100c456>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff8101ce78>] lapic_next_event+0x0/0xa
 [<ffffffff8100b066>] default_idle+0x0/0x51
 [<ffffffff8100b09d>] default_idle+0x37/0x51
 [<ffffffff8100b09b>] default_idle+0x35/0x51
 [<ffffffff8100b155>] cpu_idle+0x9e/0xc6
 [<ffffffff814a2b2b>] start_kernel+0x301/0x30d
 [<ffffffff814a211d>] _sinittext+0x11d/0x124


Code: 0f 0b eb fe 48 89 7e 08 48 89 37 48 89 57 08 48 89 3a 5a c3 
RIP  [<ffffffff8113259a>] __list_add+0x47/0x5b
 RSP <ffffffff8155adf0>
---[ end trace d4a4f763a33e3c92 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!



I'm going to go out on a limb here and say we don't like Via ohci 1.0 controllers very well... ;)

Comment 3 Stefan Richter 2008-02-01 12:24:27 UTC

The bug in comment #2 seems quite different.

The bug in the description has similarities to bug 8906.

Comment 4 Anonymous Emailer 2008-02-01 12:29:56 UTC

Reply-To: stefanr@s5r6.in-berlin.de

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9870
> ------- Comment #1 from jwilson@redhat.com  2008-02-01 11:29 -------
> Of possible relevance is that when things *don't* lock up, I'm usually seeing
> a "firewire_core: BM lock failed, making local node (ffc0) root." message for
> the Via controller.

So then the time spent in the bus manager workqueue job is different and
avoids whatever race condition caused the GPF.

Comment 5 Stefan Richter 2008-02-24 11:11:14 UTC

The two bugs in the description may be fixed by patches posted today:
http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11617

Also available in patchkit v646 and later at
http://me.in-berlin.de/~s5r6/linux1394/updates/

Comment 6 Jarod Wilson 2008-02-26 11:04:32 UTC

Okay, so the original spew from loading and unloading modules does seem to be resolved by the patchset in comment #5, but the spew in comment #2 just happened again, with a slight twist.

Previously, I saw this while booting the machine up. This time, I moved the hub and camera over to the Via controller after the system was already booted. Looks like I first started getting never-ending bus resets (phy config line printing over and over on the console), and when I told the box to reboot, I hit the same list_debug.c BUG in comment #2 -- didn't panic, just hung.

Oh, back to the original spew for a sec... Looks like this patchset makes it impossible to rmmod firewire-sbp2 if there's a drive plugged in. Is this intentional? This is the case even if the drive isn't actually in use, so reloading the firewire-sbp2 module requires unplugging or powering off all sbp2 devices.

Comment 7 Jarod Wilson 2008-02-26 11:07:52 UTC

Ah, and I do still get the same panic behavior as comment #2 when freshly booted with the hub and iidc camera hooked to the via controller.

Comment 8 Stefan Richter 2008-02-26 11:16:24 UTC

> Looks like this patchset makes it impossible to rmmod firewire-sbp2
> if there's a drive plugged in. Is this intentional?

No, it is a bug.

Comment 9 Stefan Richter 2008-02-26 11:21:24 UTC

> Ah, and I do still get the same panic behavior as comment #2 when
> freshly booted with the hub and iidc camera hooked to the via controller.

I will start using CONFIG_DEBUG_LIST.

Comment 10 Stefan Richter 2008-02-26 11:55:09 UTC

>> Looks like this patchset makes it impossible to rmmod firewire-sbp2
>> if there's a drive plugged in. Is this intentional?
>
> No, it is a bug.

The original module unloading bug makes it quite time-consuming to bisect the patch series for the patch where this new bug went in...  Stay tuned.

Comment 11 Stefan Richter 2008-02-26 12:11:05 UTC

It is patch "firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_device" which keeps the refcount of firewire-sbp2 one up.  It's so obvious in hindsight.

Now back to the drawing board for a better fix of the scsi_remove_device bug.

Comment 12 Stefan Richter 2008-02-26 14:43:15 UTC

firewire-sbp2 unloading brought back in http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11631

Also available in patchkit v647 and later at
http://me.in-berlin.de/~s5r6/linux1394/updates/

Comment 13 Stefan Richter 2008-02-26 15:28:06 UTC

> Ah, and I do still get the same panic behavior as comment #2 when
> freshly booted with the hub and iidc camera hooked to the via controller.

Could you test with vanilla 2.6.25-rc3 plus linux1394-2.6.git master pulled in, or simply with Linus' current git?  This is only to rule out a bug introduced by unrelated Fedora kernel patches.

If mainline has the bug too, please narrow down where in fw_core_handle_bus_reset the bug happens, e.g. insert a few printk()s.  Your log didn't show a more specific place due to function inlining by the compiler.  build_tree() could be a suspect.  for_each_fw_node() too but that is less likely to be automatically inlined.  update_tree() might be another candidate.

Comment 14 Jarod Wilson 2008-02-26 20:43:32 UTC

(In reply to comment #12)
> firewire-sbp2 unloading brought back in
> http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11631

Excellent, will test that out in the morning. Just as an fyi, '[PATCH 5/5] firewire: refactor fw_unit reference counting' requires some minor rediffing to get applied with this in place. (It may already be fixed in your patchkit, dunno; I've already fixed things up for my local build.)

Comment 15 Jarod Wilson 2008-02-26 20:44:28 UTC

(In reply to comment #13)
> > Ah, and I do still get the same panic behavior as comment #2 when
> > freshly booted with the hub and iidc camera hooked to the via controller.
> 
> Could you test with vanilla 2.6.25-rc3 plus linux1394-2.6.git master pulled
> in, or simply with Linus' current git?  This is only to rule out a bug
> introduced by unrelated Fedora kernel patches.
> 
> If mainline has the bug too, please narrow down where in
> fw_core_handle_bus_reset the bug happens, e.g. insert a few printk()s.  Your
> log didn't show a more specific place due to function inlining by the
> compiler.  build_tree() could be a suspect.  for_each_fw_node() too but that
> is less likely to be automatically inlined.  update_tree() might be another
> candidate.

I'll see what I can do with this tomorrow as well...

Comment 16 Stefan Richter 2008-02-27 00:07:40 UTC

> Just as an fyi, '[PATCH 5/5] firewire: refactor fw_unit reference counting'
> requires some minor rediffing to get applied with this in place.

Right, there was "fuzz 2".  (Quilt is configured here to accept fuzz but I always check the result after patching with "fuzz".)

Would a duplicate of my quilt trees in git be useful?

Comment 17 Jarod Wilson 2008-02-27 12:07:10 UTC

> Would a duplicate of my quilt trees in git be useful?

The quilt bits could be useful, but istr git is set up to ignore a patches/ folder, and we actually reject patches with fuzz greater than 1 (iirc) in the fedora rpm spec patch application section as an extra safeguard against mis-merging something after a rebase.

Comment 18 Jarod Wilson 2008-02-27 12:08:27 UTC

(In reply to comment #14)
> > firewire-sbp2 unloading brought back in
> > http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11631
> 
> Excellent, will test that out in the morning.

Well, I was permitted to unload the module, but when I did, I got a general protection fault.

general protection fault: 0000 [1] SMP DEBUG_PAGEALLOC
CPU 0 
Modules linked in: radeon drm ipt_MASQUERADE iptable_nat nf_nat bridge rfcomm l2cap bluetooth autofs4 sunrpc nf_conntrack_ipv4 ipt_REJECT iptable_filter ip_tables nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 dm_multipath firewire_sbp2(-) parport_pc parport snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy floppy snd_seq_oss snd_seq_midi_event serio_raw snd_seq pcspkr firewire_ohci snd_seq_device firewire_core snd_pcm_oss crc_itu_t snd_mixer_oss k8temp snd_pcm hwmon snd_timer snd soundcore snd_page_alloc forcedeth i2c_nforce2 i2c_core button sg sr_mod cdrom pata_amd dm_snapshot dm_zero dm_mirror dm_mod shpchp sata_sil libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
Pid: 2601, comm: rmmod Not tainted 2.6.25-0.70.rc3.git1.fc9.fw #1
RIP: 0010:[<ffffffff8822adef>]  [<ffffffff8822adef>] :firewire_sbp2:sbp2_release_target+0xea/0x103
RSP: 0018:ffff810032d7bdc8  EFLAGS: 00010286
RAX: 6b6b6b6b6b6b6b6b RBX: ffff81003f031090 RCX: ffff810032d7bd28
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff810032d88000
RBP: ffff810032d7bdf8 R08: ffff810032d7bd58 R09: ffffe20001908300
R10: ffffffff8805393b R11: ffff810032d7bbc8 R12: ffff81003f0306b0
R13: ffff81003f0306a0 R14: ffff81003f0306b0 R15: ffff81003f030000
FS:  00007f7f4546e6f0(0000) GS:ffffffff81416000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fddb34e73c0 CR3: 0000000032d4a000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 2601, threadinfo ffff810032d7a000, task ffff810032d88000)
Stack:  ffff81003f0306b8 ffff81003f0306a0 ffffffff8822ad05 ffff81003d8cdb40
 ffffffff8822d740 0000000000000880 ffff810032d7be18 ffffffff8113589a
 ffff81003dcca240 ffffffff8822d740 ffff810032d7be28 ffffffff8822a130
Call Trace:
 [<ffffffff8822ad05>] ? :firewire_sbp2:sbp2_release_target+0x0/0x103
 [<ffffffff8113589a>] kref_put+0x43/0x4f
 [<ffffffff8822a130>] :firewire_sbp2:sbp2_target_put+0x10/0x12
 [<ffffffff8822a142>] :firewire_sbp2:sbp2_remove+0x10/0x14
 [<ffffffff811b43a4>] __device_release_driver+0x76/0x9a
 [<ffffffff811b490e>] driver_detach+0xe3/0x125
 [<ffffffff811b3c5b>] bus_remove_driver+0x86/0xa8
 [<ffffffff811b49b3>] driver_unregister+0x36/0x3b
 [<ffffffff8822bb3c>] :firewire_sbp2:sbp2_cleanup+0x10/0x1e
 [<ffffffff8105ccd0>] sys_delete_module+0x18e/0x1d6
 [<ffffffff81014768>] ? syscall_trace_enter+0xb0/0xb5
 [<ffffffff8100c137>] tracesys+0xdc/0xe1


Code: c9 9b e2 ff 49 8b 75 10 48 c7 c7 ed bd 22 88 31 c0 e8 df a1 e0 f8 49 8b 7d 08 e8 42 6a f8 f8 4c 89 ff e8 d5 93 e2 ff 49 8b 45 08 <48> 8b b8 a8 01 00 00 e8 2a 6a f8 f8 41 58 5b 41 5c 41 5d 41 5e 
RIP  [<ffffffff8822adef>] :firewire_sbp2:sbp2_release_target+0xea/0x103
 RSP <ffff810032d7bdc8>
---[ end trace b854416fd93a1b7c ]---

Comment 19 Stefan Richter 2008-02-27 12:34:03 UTC

I enabled DEBUG_PAGEALLOC now as well.  The general protection fault happens here too, either when unloading firewire-sbp2 or when unplugging the disk.

Comment 20 Stefan Richter 2008-02-27 12:57:16 UTC

I bisected my patch series and found that patch "firewire: fix crash in automatic module unloading" introduced the unable-to-handle-kernel-paging-request bug.

Comment 21 Stefan Richter 2008-02-27 13:01:20 UTC

> found that patch "firewire: fix crash in automatic module unloading"
> introduced the unable-to-handle-kernel-paging-request bug.

Yeah, but only after I updated it.  I think I see the problem.

Comment 22 Stefan Richter 2008-02-27 13:31:52 UTC

Fixed in
http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11631/focus=11639
and in patchkit v647a on my site.

Comment 23 Jarod Wilson 2008-02-27 14:04:18 UTC

(In reply to comment #13)
> > Ah, and I do still get the same panic behavior as comment #2 when
> > freshly booted with the hub and iidc camera hooked to the via controller.
> 
> Could you test with vanilla 2.6.25-rc3 plus linux1394-2.6.git master pulled
> in, or simply with Linus' current git?  This is only to rule out a bug
> introduced by unrelated Fedora kernel patches.

2.6.25-rc3-git1 + linux1394-2.6.git + firewire patches under review: the panic still happens. It looks like it happens with a hub between the iidc camera and the Via controller, but not with the hub removed. That is, if I plug the camera directly into the controller, all is well, but if I insert the hub between the camera and the controller, I hit the panic. And thus far, this is *only* with the Via VT6307 ohci 1.0 controller in this box. If I move the hub and camera to the Ti controller in the same box, no problems. Hm... I should try to reproduce it with another hub and on my other box w/a VT6307 ohci 1.0 controller...

> If mainline has the bug too, please narrow down where in
> fw_core_handle_bus_reset the bug happens, e.g. insert a few printk()s.  Your
> log didn't show a more specific place due to function inlining by the
> compiler.  build_tree() could be a suspect.  for_each_fw_node() too but that
> is less likely to be automatically inlined.  update_tree() might be another
> candidate.

Starting to prod right now...

Comment 24 Jarod Wilson 2008-02-27 14:16:59 UTC

(In reply to comment #23)
> Hm... I should try to reproduce it with another hub

Oh fun. Doesn't happen if I replace the iogear hub that was there with a kensington one (both bus-powered, fwiw). I'll see what I can see on my other VT6307 ohci 1.0 box, but at the moment, this appears to be specific to the combination of a VT6307 controller and this iogear hub. (iogear usb 2.0 & Firewire Combo Hub, Model# GUH420)

Comment 25 Jarod Wilson 2008-02-27 14:43:30 UTC

(In reply to comment #22)
> Fixed in
> http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11631/focus=11639
> and in patchkit v647a on my site.

Can successfully rmmod firewire-sbp2 and no more oops here either.

Comment 26 Jarod Wilson 2008-02-27 14:53:07 UTC

(In reply to comment #24)
> (In reply to comment #23)
> > Hm... I should try to reproduce it with another hub
> 
> Oh fun. Doesn't happen if I replace the iogear hub that was there with a
> kensington one (both bus-powered, fwiw). I'll see what I can see on my other
> VT6307 ohci 1.0 box, but at the moment, this appears to be specific to the
> combination of a VT6307 controller and this iogear hub. (iogear usb 2.0 &
> Firewire Combo Hub, Model# GUH420)

Oh yeah, and with *just* the hub plugged in, or the hub + an sbp2 hard disk (complete with dd'ing its block device to /dev/null), there's no panic. So it seems the iidc camera is also required in the above setup (a Unibrain Fire-i in this case).

Comment 27 Stefan Richter 2008-02-27 15:04:13 UTC

Maybe what gets into the selfID buffer looks very very special in that one case.

Comment 28 Jarod Wilson 2008-02-27 15:36:29 UTC

(In reply to comment #13)
> > Ah, and I do still get the same panic behavior as comment #2 when
> > freshly booted with the hub and iidc camera hooked to the via controller.
> 
> Could you test with vanilla 2.6.25-rc3 plus linux1394-2.6.git master pulled
> in, or simply with Linus' current git?  This is only to rule out a bug
> introduced by unrelated Fedora kernel patches.
> 
> If mainline has the bug too, please narrow down where in
> fw_core_handle_bus_reset the bug happens, e.g. insert a few printk()s.  Your
> log didn't show a more specific place due to function inlining by the
> compiler.  build_tree() could be a suspect.  for_each_fw_node() too but that
> is less likely to be automatically inlined.  update_tree() might be another
> candidate.
> 

Looks like update_tree() is the culprit. The crash seems to happen in here:

                for (i = 0; i < node0->port_count; i++) {
                        if (node0->ports[i] && node1->ports[i]) {
                                /*
                                 * This port didn't change, queue the
                                 * connected node for further
                                 * investigation.
                                 */
                                if (node0->ports[i]->color == card->color)
                                        continue;
                                list_add_tail(&node0->ports[i]->link, &list0);
                                list_add_tail(&node1->ports[i]->link, &list1);

I get a slew of bus resets that all go through that code okay for a number of iterations, but it finally gives up the ghost around one of those list_add_tail() calls. Out of time for this one tonight, gotta head homeward...

Comment 29 Stefan Richter 2008-02-28 02:19:48 UTC

"kernel BUG at lib/list_debug.c:33" moved over to bug 10128.
Closing this bug since module unloading finally works.

Comment 30 Stefan Richter 2008-03-11 05:36:16 UTC

The fix for the problem in the initial report has been merged in Linux 2.6.25-rc4.
