Bug 16452

Summary: nodemgr, sbp2: NULL pointer dereference in sbp2_update
Product: Drivers Reporter: Stefan Richter (stefanr)
Component: IEEE1394Assignee: drivers_ieee1394
Status: CLOSED WILL_NOT_FIX    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32.16 - 2.6.35-rc6 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 10046    

Description Stefan Richter 2010-07-24 07:53:45 UTC
Reported by Martin Mokrejs:

When a series of bus resets happens while ieee1394/sbp2 tries to log in to a FireWire disk, sbp2_update may be run shortly after a failed login, then crashes with a NULL pointer dereference.

http://lkml.org/lkml/2010/7/23/233

Crash log from 2.6.32(.16):
===========================

ohci1394 0000:08:07.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[20]  MMIO=[febff800-febfffff]  Max Packet=[2048]  IR/IT contexts=[4/8]
ieee1394: Node added: ID:BUS[0-00:1023]  GUID[00e018000305e5fc]
ieee1394: Node added: ID:BUS[0-01:1023]  GUID[001b8c8000000105]
ieee1394: Host added: ID:BUS[0-02:1023]  GUID[0011d80001762a80]
scsi7 : SBP-2 IEEE-1394
ieee1394: sbp2: Error logging into SBP-2 device - failed
sbp2: probe of 001b8c8000000105-0 failed with error -16
BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
IP: [<ffffffffa005af32>] sbp2_update+0x1c/0x1f1 [sbp2]
PGD 22e926067 PUD 22e6d5067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/bus/ieee1394/drivers/sbp2/uevent
CPU 0 
Modules linked in: sbp2 firewire_net firewire_ohci firewire_core ohci1394 ieee1394 sky2 i2c_i801
Pid: 1397, comm: knodemgrd_0 Not tainted 2.6.32.16-default #9 System Product Name
RIP: 0010:[<ffffffffa005af32>]  [<ffffffffa005af32>] sbp2_update+0x1c/0x1f1 [sbp2]
RSP: 0018:ffff88022d101ca0  EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88022fab2ed0
RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff88022f925048
RBP: ffff88022d101cc0 R08: ffff88022d100000 R09: ffff880028210340
R10: 0000000000000400 R11: 00000000000005fe R12: ffff88022f925000
R13: ffff88022f9250a8 R14: ffff88022e693340 R15: ffff88022e687400
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000e0 CR3: 000000022eadc000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process knodemgrd_0 (pid: 1397, threadinfo ffff88022d100000, task ffff88022d1e6050)
Stack:
 ffff88022d101cc0 ffffffff8104f5c3 ffffffffa005d628 ffff88022f925000
<0> ffff88022d101cf0 ffffffffa001d385 ffff88022e6d1800 ffffffffa001d339
<0> ffff88022d101d10 ffff88022e693340 ffff88022d101d50 ffffffff812ae68a
Call Trace:
 [<ffffffff8104f5c3>] ? down+0x33/0x38
 [<ffffffffa001d385>] update_pdrv+0x4c/0x79 [ieee1394]
 [<ffffffffa001d339>] ? update_pdrv+0x0/0x79 [ieee1394]
 [<ffffffff812ae68a>] class_for_each_device+0x6e/0x99
 [<ffffffffa00181ef>] ? hpsb_free_packet+0x2c/0x2e [ieee1394]
 [<ffffffffa001daca>] node_probe+0x286/0x304 [ieee1394]
 [<ffffffffa001d844>] ? node_probe+0x0/0x304 [ieee1394]
 [<ffffffff812ae68a>] class_for_each_device+0x6e/0x99
 [<ffffffffa001f7d3>] ? csr1212_destroy_csr+0x51/0x59 [ieee1394]
 [<ffffffffa001e271>] nodemgr_host_thread+0x729/0x8bc [ieee1394]
 [<ffffffffa001db48>] ? nodemgr_host_thread+0x0/0x8bc [ieee1394]
 [<ffffffff8104bc77>] kthread+0x7a/0x82
 [<ffffffff8100c93a>] child_rip+0xa/0x20
 [<ffffffff8104bbfd>] ? kthread+0x0/0x82
 [<ffffffff8100c930>] ? child_rip+0x0/0x20
Code: 31 c0 48 83 c4 20 5b 41 5c 41 5d 41 5e c9 c3 55 48 83 c7 48 48 89 e5 41 54 53 48 83 ec 10 e8 30 29 25 e1 be 20 00 00 00 48 89 c3 <4c> 8b a0 e0 00 00 00 48 8b 40 50 c7 00 00 00 00 00 48 8b 43 50 
RIP  [<ffffffffa005af32>] sbp2_update+0x1c/0x1f1 [sbp2]
 RSP <ffff88022d101ca0>
CR2: 00000000000000e0
---[ end trace 5f2aa54eef767b34 ]---


Crash log from 2.6.35(-rc6):
============================

ohci1394 0000:08:07.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[20]  MMIO=[febff800-febfffff]  Max Packet=[2048]  IR/IT contexts=[4/8]
ieee1394: Node added: ID:BUS[0-00:1023]  GUID[00e018000305e5fc]
ieee1394: Node added: ID:BUS[0-01:1023]  GUID[001b8c8000000105]
ieee1394: Host added: ID:BUS[0-02:1023]  GUID[0011d80001762a80]
scsi7 : SBP-2 IEEE-1394
ieee1394: sbp2: Error logging into SBP-2 device - failed
sbp2: probe of 001b8c8000000105-0 failed with error -16
BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
IP: [<ffffffffa005af3b>] sbp2_update+0x1c/0x1f1 [sbp2]
PGD 22eb14067 PUD 22d872067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/bus/ieee1394/drivers/sbp2/uevent
CPU 3 
Modules linked in: sbp2 firewire_net firewire_ohci firewire_core ohci1394 ieee1394 sky2 i2c_i801
Pid: 1408, comm: knodemgrd_0 Not tainted 2.6.35-rc6-default #1 P5K WS/System Product Name
RIP: 0010:[<ffffffffa005af3b>]  [<ffffffffa005af3b>] sbp2_update+0x1c/0x1f1 [sbp2]
RSP: 0018:ffff88022c9a5ca0  EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88022ef06dd0
RDX: ffff88022d8400b0 RSI: 0000000000000020 RDI: ffff88022d840048
RBP: ffff88022c9a5cc0 R08: ffff88022c9a4000 R09: ffffffff00000001
R10: 0000000000000282 R11: ffff88022d23da98 R12: ffff88022d840000
R13: ffff88022d8400a8 R14: ffff88022d04acc0 R15: ffff88022ea83000
FS:  0000000000000000(0000) GS:ffff880001b80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000e0 CR3: 000000022eb28000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process knodemgrd_0 (pid: 1408, threadinfo ffff88022c9a4000, task ffff88022fd25230)
Stack:
 ffff88022c9a5cc0 ffffffff814c52a8 ffffffffa005d608 ffff88022d840000
<0> ffff88022c9a5cf0 ffffffffa00303f5 ffff88022d842c00 ffffffffa00303a9
<0> ffff88022c9a5d10 ffff88022d04acc0 ffff88022c9a5d50 ffffffff812c1945
Call Trace:
 [<ffffffff814c52a8>] ? mutex_lock+0x16/0x2d
 [<ffffffffa00303f5>] update_pdrv+0x4c/0x79 [ieee1394]
 [<ffffffffa00303a9>] ? update_pdrv+0x0/0x79 [ieee1394]
 [<ffffffff812c1945>] class_for_each_device+0x6e/0x99
 [<ffffffffa002b1fc>] ? hpsb_free_packet+0x2c/0x2e [ieee1394]
 [<ffffffffa0030b3c>] node_probe+0x289/0x307 [ieee1394]
 [<ffffffffa00308b3>] ? node_probe+0x0/0x307 [ieee1394]
 [<ffffffff812c1945>] class_for_each_device+0x6e/0x99
 [<ffffffffa003288b>] ? csr1212_destroy_csr+0x51/0x59 [ieee1394]
 [<ffffffffa0031323>] nodemgr_host_thread+0x769/0x902 [ieee1394]
 [<ffffffffa0030bba>] ? nodemgr_host_thread+0x0/0x902 [ieee1394]
 [<ffffffff81047737>] kthread+0x7a/0x82
 [<ffffffff81003614>] kernel_thread_helper+0x4/0x10
 [<ffffffff810476bd>] ? kthread+0x0/0x82
 [<ffffffff81003610>] ? kernel_thread_helper+0x0/0x10
Code: 31 c0 48 83 c4 20 5b 41 5c 41 5d 41 5e c9 c3 55 48 83 c7 48 48 89 e5 41 54 53 48 83 ec 10 e8 77 5b 26 e1 be 20 00 00 00 48 89 c3 <4c> 8b a0 e0 00 00 00 48 8b 40 50 c7 00 00 00 00 00 48 8b 43 50 
RIP  [<ffffffffa005af3b>] sbp2_update+0x1c/0x1f1 [sbp2]
 RSP <ffff88022c9a5ca0>
CR2: 00000000000000e0
---[ end trace 6b85c94fd5786882 ]---


"sbp2: Error logging into SBP-2 device - failed" means that sbp2 did receive status after a login request, but one or more of /resp/, /d(ead)/, or /sbp_status/ were non-zero in the status block.  I.e. the target signaled login failure due to one or another error condition.  After such or any other failure during sbp2_probe, sbp2 frees everything and returns -errno.

The real problem here is that ieee1394/nodemgr goes on to call update_pdrv on this device (presumably after a bus reset happened), even though the driver probe failed.  In theory, drv = get_driver(ud->device.driver) should be NULL at this point and hence sbp2_update never be called.  Obviously this assumption is wrong and sbp2 is still bound to this device after the failed probe.

Anyway; this problem does not exist in firewire-core + firewire-sbp2 since firewire-core's device<-->driver management is from the grounds up different from ieee1394's.