Most recent kernel where this bug did *NOT* occur: unknown
Hardware Environment: x86 single processor, two FireWire cards
Software Environment: 2.6.19-rc4 + ieee1394 like -rc5-mm2 + this:

--- linux-2.6.19-rc4.orig/drivers/ieee1394/nodemgr.c	2006-11-18 23:31:35.000000000 +0100
+++ linux-2.6.19-rc4/drivers/ieee1394/nodemgr.c	2006-11-19 14:19:00.000000000 +0100
@@ -1874,7 +1874,13 @@ static void nodemgr_remove_host(struct h
 	struct host_info *hi = hpsb_get_hostinfo(&nodemgr_highlevel, host);
 
 	if (hi) {
+		//up(&host->device.sem);
+		if (host->device.parent)
+			up(&host->device.parent->sem);
 		kthread_stop(hi->thread);
+		if (host->device.parent)
+			down(&host->device.parent->sem);
+		//down(&host->device.sem);
 		nodemgr_remove_host_dev(&host->device);
 	}
 }

Problem Description:

# modprobe ohci1394 && sleep 2 && modprobe -r ohci1394

ieee1394: Initialized config rom entry `ip1394'
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[17] MMIO=[e7004000-e70047ff] Max Packet=[4096] IR/IT contexts=[4/8]
ohci1394: fw-host1: OHCI-1394 1.0 (PCI): IRQ=[19] MMIO=[e7006000-e70067ff] Max Packet=[2048] IR/IT contexts=[8/8]
ieee1394: Error parsing configrom for node 0-00:1023
ieee1394: Host added: ID:BUS[0-01:1023] GUID[0001080000002d02]
eth1394: eth1: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
BUG: unable to handle kernel NULL pointer dereference at virtual address 000003e4
 printing eip:
f8c7db91
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: eth1394 ohci1394 ieee1394 nfsd exportfs lockd sunrpc snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd lp af_packet 8139too mii loop via_agp agpgart uhci_hcd
CPU: 0
EIP: 0060:[<f8c7db91>] Not tainted VLI
EFLAGS: 00010202 (2.6.19-rc4 #10)
EIP is at ether1394_remove_host+0x31/0xa0 [eth1394]
eax: f680ad0c ebx: 00000380 ecx: f678efc4 edx: f680ad0c
esi: f680ad0c edi: f5c26000 ebp: f5c57e4c esp: f5c57e30
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 6822, ti=f5c56000 task=f678ea90 task.ti=f5c56000)
Stack: f8c80fa0 f5c26000 f8f2bf66 f7639d34 f8c80fa0 f5c26000 f5c26000 f5c57e70
       f8f2c1fc f5c26000 f5c26000 00000000 00000282 f8c80fa0 f5c26000 c21e0094
       f5c57e8c f8f2cb56 f8c80fa0 f5c26000 00000000 f5c26000 f5c260c4 f5c57e9c
Call Trace:
 [<c010402f>] show_trace_log_lvl+0x2f/0x50
 [<c0104117>] show_stack_log_lvl+0x97/0xc0
 [<c0104372>] show_registers+0x1c2/0x270
 [<c0104619>] die+0x129/0x220
 [<c011499a>] do_page_fault+0x3ca/0x650
 [<c02dbac1>] error_code+0x39/0x40
 [<f8f2c1fc>] __unregister_host+0x8c/0xd0 [ieee1394]
 [<f8f2cb56>] highlevel_remove_host+0x36/0x60 [ieee1394]
 [<f8f2bc63>] hpsb_remove_host+0x43/0x70 [ieee1394]
 [<f8c0fe18>] ohci1394_pci_remove+0x68/0x240 [ohci1394]
 [<c01fee66>] pci_device_remove+0x46/0x50
 [<c0234253>] __device_release_driver+0xa3/0xc0
 [<c02343e8>] driver_detach+0x118/0x120
 [<c0233834>] bus_remove_driver+0x44/0x70
 [<c02346b2>] driver_unregister+0x12/0x20
 [<c01ff1d5>] pci_unregister_driver+0x15/0x30
 [<f8c104e2>] ohci1394_cleanup+0x12/0x14 [ohci1394]
 [<c0141566>] sys_delete_module+0x156/0x180
 [<c010326d>] sysenter_past_esp+0x56/0x79
 =======================
Code: 89 7d fc 8b 7d 08 89 75 f8 89 5d f4 c7 04 24 a0 0f c8 f8 89 7c 24 04 e8 3e e2 2a 00 85 c0 89 c6 74 38 8b 58 04 81 c3 80 03 00 00 <8b> 43 64 8b 53 68 89 7c 24 04 c7 04 24 a0 0f c8 f8 89 44 24 08
EIP: [<f8c7db91>] ether1394_remove_host+0x31/0xa0 [eth1394] SS:ESP 0068:f5c57e30
 <6>eth1394: eth2: IEEE-1394 IPv4 over 1394 Ethernet (fw-host1)
eth1394: eth2: Could not allocate isochronous receive context for the broadcast channel

The oops may happen if "modprobe ohci1394" is shortly followed by "modprobe -r ohci1394" while eth1394 is loaded. I _cannot_ reproduce this consistently, but I got it twice in about 30 attempts. The little patch quoted above is merely there to prevent the deadlock from bug 6706. (If the ieee1394 stack is stuck in that deadlock, the oops in eth1394 of course cannot occur, because the stack never gets as far as unloading eth1394.)
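
Since the oops only shows up in a fraction of attempts, the load/unload cycle can be repeated in a loop. The following is only a rough sketch of that: the variables MODPROBE, DMESG, ATTEMPTS and DELAY and the function names are hypothetical helpers, not something from the original test runs. It assumes a root shell and a kernel log that contains no earlier oops (e.g. cleared with "dmesg -c" beforehand).

```shell
#!/bin/sh
# Sketch: repeat the load/unload cycle from the report until an oops shows up.
# MODPROBE, DMESG, ATTEMPTS and DELAY are hypothetical knobs for illustration.
MODPROBE=${MODPROBE:-modprobe}
DMESG=${DMESG:-dmesg}
ATTEMPTS=${ATTEMPTS:-30}
DELAY=${DELAY:-2}

# Succeeds if the kernel log contains an oops header line.
# Assumption: the log held no older oops before the loop started.
oops_seen() {
    $DMESG | grep -q 'Oops:'
}

# Runs the cycle up to ATTEMPTS times and reports the first attempt
# after which an oops is found in the log.
try_reproduce() {
    i=1
    while [ "$i" -le "$ATTEMPTS" ]; do
        # Same command sequence as in the problem description.
        $MODPROBE ohci1394 && sleep "$DELAY" && $MODPROBE -r ohci1394
        if oops_seen; then
            echo "oops after attempt $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no oops in $ATTEMPTS attempts"
    return 1
}
```

Calling try_reproduce then runs the cycle; with the defaults it gives up after 30 attempts, which matches the roughly 2-in-30 hit rate only loosely, so a larger ATTEMPTS value may be needed.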
I could not reproduce it with 2.6.20-rc3 + today's linux1394-2.6.git. No oops even after lots of tests, only the rather harmless bug 7792 (but that one occurs _very_ often with the right timing between modprobe ohci1394 and modprobe -r ohci1394). Perhaps bug 7792 is what prevents eth1394's oops. I am therefore leaving this open until bug 7792 is fixed.
Cannot reproduce it anymore.