Bug 7550

Summary: NULL pointer dereference in ether1394_remove_host
Product: Drivers
Reporter: Stefan Richter (stefanr)
Component: IEEE1394
Assignee: drivers_ieee1394
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: low
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.19-rc
Subsystem:
Regression: ---
Bisected commit-id:
Bug Depends on: 7792
Bug Blocks:

Description Stefan Richter 2006-11-19 05:32:18 UTC
Most recent kernel where this bug did *NOT* occur: unknown
Hardware Environment: x86 single processor, two FireWire cards
Software Environment: 2.6.19-rc4 + the ieee1394 subsystem as in 2.6.19-rc5-mm2 + this patch:
--- linux-2.6.19-rc4.orig/drivers/ieee1394/nodemgr.c    2006-11-18 23:31:35.000000000 +0100
+++ linux-2.6.19-rc4/drivers/ieee1394/nodemgr.c 2006-11-19 14:19:00.000000000 +0100
@@ -1874,7 +1874,13 @@ static void nodemgr_remove_host(struct h
        struct host_info *hi = hpsb_get_hostinfo(&nodemgr_highlevel, host);

        if (hi) {
+               //up(&host->device.sem);
+               if (host->device.parent)
+                       up(&host->device.parent->sem);
                kthread_stop(hi->thread);
+               if (host->device.parent)
+                       down(&host->device.parent->sem);
+               //down(&host->device.sem);
                nodemgr_remove_host_dev(&host->device);
        }
 }


Problem Description:

# modprobe ohci1394 && sleep 2 && modprobe -r ohci1394

ieee1394: Initialized config rom entry `ip1394'
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[17]  MMIO=[e7004000-e70047ff]  Max Packet=[4096]  IR/IT contexts=[4/8]
ohci1394: fw-host1: OHCI-1394 1.0 (PCI): IRQ=[19]  MMIO=[e7006000-e70067ff]  Max Packet=[2048]  IR/IT contexts=[8/8]
ieee1394: Error parsing configrom for node 0-00:1023
ieee1394: Host added: ID:BUS[0-01:1023]  GUID[0001080000002d02]
eth1394: eth1: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
BUG: unable to handle kernel NULL pointer dereference at virtual address 000003e4
 printing eip:
f8c7db91
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP 
Modules linked in: eth1394 ohci1394 ieee1394 nfsd exportfs lockd sunrpc
snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc
snd_mpu401_uart snd_rawmidi snd lp af_packet 8139too mii loop via_agp agpgart
uhci_hcd
CPU:    0
EIP:    0060:[<f8c7db91>]    Not tainted VLI
EFLAGS: 00010202   (2.6.19-rc4 #10)
EIP is at ether1394_remove_host+0x31/0xa0 [eth1394]
eax: f680ad0c   ebx: 00000380   ecx: f678efc4   edx: f680ad0c
esi: f680ad0c   edi: f5c26000   ebp: f5c57e4c   esp: f5c57e30
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 6822, ti=f5c56000 task=f678ea90 task.ti=f5c56000)
Stack: f8c80fa0 f5c26000 f8f2bf66 f7639d34 f8c80fa0 f5c26000 f5c26000 f5c57e70 
       f8f2c1fc f5c26000 f5c26000 00000000 00000282 f8c80fa0 f5c26000 c21e0094 
       f5c57e8c f8f2cb56 f8c80fa0 f5c26000 00000000 f5c26000 f5c260c4 f5c57e9c 
Call Trace:
 [<c010402f>] show_trace_log_lvl+0x2f/0x50
 [<c0104117>] show_stack_log_lvl+0x97/0xc0
 [<c0104372>] show_registers+0x1c2/0x270
 [<c0104619>] die+0x129/0x220
 [<c011499a>] do_page_fault+0x3ca/0x650
 [<c02dbac1>] error_code+0x39/0x40
 [<f8f2c1fc>] __unregister_host+0x8c/0xd0 [ieee1394]
 [<f8f2cb56>] highlevel_remove_host+0x36/0x60 [ieee1394]
 [<f8f2bc63>] hpsb_remove_host+0x43/0x70 [ieee1394]
 [<f8c0fe18>] ohci1394_pci_remove+0x68/0x240 [ohci1394]
 [<c01fee66>] pci_device_remove+0x46/0x50
 [<c0234253>] __device_release_driver+0xa3/0xc0
 [<c02343e8>] driver_detach+0x118/0x120
 [<c0233834>] bus_remove_driver+0x44/0x70
 [<c02346b2>] driver_unregister+0x12/0x20
 [<c01ff1d5>] pci_unregister_driver+0x15/0x30
 [<f8c104e2>] ohci1394_cleanup+0x12/0x14 [ohci1394]
 [<c0141566>] sys_delete_module+0x156/0x180
 [<c010326d>] sysenter_past_esp+0x56/0x79
 =======================
Code: 89 7d fc 8b 7d 08 89 75 f8 89 5d f4 c7 04 24 a0 0f c8 f8 89 7c 24 04 e8 3e e2 2a 00 85 c0 89 c6 74 38 8b 58 04 81 c3 80 03 00 00 <8b> 43 64 8b 53 68 89 7c 24 04 c7 04 24 a0 0f c8 f8 89 44 24 08 
EIP: [<f8c7db91>] ether1394_remove_host+0x31/0xa0 [eth1394] SS:ESP 0068:f5c57e30
 <6>eth1394: eth2: IEEE-1394 IPv4 over 1394 Ethernet (fw-host1)
eth1394: eth2: Could not allocate isochronous receive context for the broadcast channel
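A worked check of the faulting address, assuming the usual 32-bit x86 decoding of the Code: bytes. Just before the marked byte, the `test eax,eax` / `je` pair did not branch, so eax (f680ad0c) was non-NULL. The code then loads a pointer from [eax+0x4] into ebx, adds 0x380, and reads [ebx+0x64]. If that loaded pointer was NULL, ebx becomes 0x380 and the read hits 0x3e4, consistent with both the register dump and the reported address. Which eth1394 field sits at offset 4, and what the 0x380 adjustment corresponds to (for example a netdev_priv()-style offset), are guesses that this report does not confirm. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Offsets read off the "Code:" bytes in the oops (32-bit x86):
 *   8b 58 04           mov ebx,[eax+0x4]
 *   81 c3 80 03 00 00  add ebx,0x380
 *   8b 43 64           mov eax,[ebx+0x64]   <- faulting instruction
 */
enum { ADD_OFFSET = 0x380, LOAD_OFFSET = 0x64 };

/* Value of ebx after "add ebx,0x380", given the pointer that
 * "mov ebx,[eax+0x4]" loaded. */
uint32_t ebx_at_fault(uint32_t loaded_ptr)
{
	return loaded_ptr + ADD_OFFSET;
}

/* Address read by the faulting "mov eax,[ebx+0x64]". */
uint32_t fault_address(uint32_t loaded_ptr)
{
	return ebx_at_fault(loaded_ptr) + LOAD_OFFSET;
}
```

With a NULL (zero) pointer, ebx_at_fault(0) is 0x380 (the "ebx: 00000380" in the register dump) and fault_address(0) is 0x3e4 (the "virtual address 000003e4" in the BUG line).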
Comment 1 Stefan Richter 2006-11-19 05:37:26 UTC
The oops may happen if "modprobe ohci1394" is shortly followed by "modprobe -r
ohci1394" while eth1394 is loaded. I _cannot_ reproduce this consistently, but I
got it twice in about 30 attempts.

The small patch quoted above merely prevents the deadlock from bug 6706.
(If the ieee1394 stack is stuck in that deadlock, the oops in eth1394 of course
cannot occur, because the stack never gets as far as unloading eth1394.)
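The patch drops the parent device's semaphore around kthread_stop() and retakes it afterwards. Below is a user-space sketch of that pattern with POSIX threads. It assumes, as the patch suggests but this report does not spell out, that the thread being stopped can block on the same semaphore its stopper holds; the exact mechanism of bug 6706 is not described here. parent_sem, worker(), stop_worker() and run_demo() are hypothetical names, and pthread_join() stands in for kthread_stop().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: parent_sem plays host->device.parent->sem,
 * worker() plays the nodemgr thread that kthread_stop() waits for. */
static pthread_mutex_t parent_sem = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop_requested;
static int work_done;

static void *worker(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_requested)) {
		/* Assumption: the thread needs the parent's lock to progress. */
		pthread_mutex_lock(&parent_sem);
		work_done++;
		pthread_mutex_unlock(&parent_sem);
	}
	return NULL;
}

/* Called with parent_sem held, like nodemgr_remove_host() in the patch. */
static void stop_worker(pthread_t t)
{
	atomic_store(&stop_requested, true);	/* request the thread to stop */
	pthread_mutex_unlock(&parent_sem);	/* like up(&parent->sem) */
	pthread_join(t, NULL);			/* wait for the thread to exit */
	pthread_mutex_lock(&parent_sem);	/* like down(&parent->sem) */
}

/* Returns how many iterations the worker completed, or -1 on error.
 * Without the unlock/lock pair in stop_worker(), pthread_join() could
 * block forever while worker() waits for parent_sem. */
int run_demo(void)
{
	pthread_t t;

	atomic_store(&stop_requested, false);
	work_done = 0;
	pthread_mutex_lock(&parent_sem);	/* caller already holds the lock */
	if (pthread_create(&t, NULL, worker, NULL) != 0) {
		pthread_mutex_unlock(&parent_sem);
		return -1;
	}
	stop_worker(t);
	pthread_mutex_unlock(&parent_sem);
	return work_done;
}
```

run_demo() returns a non-negative count whose exact value depends on scheduling. Removing the unlock/lock pair in stop_worker() can make it hang instead, which is the user-space analogue of the deadlock the patch avoids.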
Comment 2 Stefan Richter 2007-01-08 13:22:50 UTC
I could not reproduce it with 2.6.20-rc3 + today's linux1394-2.6.git. There was no
oops even after many tests, only the rather harmless bug 7792 (but that one _very_
often, given the right timing between modprobe ohci1394 and modprobe -r ohci1394).
Perhaps bug 7792 prevents eth1394's oops from occurring. Therefore I am leaving
this open until bug 7792 is fixed.
Comment 3 Stefan Richter 2007-02-10 13:23:09 UTC
Cannot reproduce it anymore.