Bug 10008

Summary: firewire-sbp2: rescan-scsi-bus segfault, sbp2_scsi_slave_alloc NULL pointer dereference
Product: Drivers
Reporter: Stefan Richter (stefanr)
Component: IEEE1394
Assignee: Stefan Richter (stefanr)
Status: CLOSED CODE_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.22...2.6.25-rc
Subsystem:
Regression: ---
Bisected commit-id:

Description Stefan Richter 2008-02-17 04:53:14 UTC
Latest working kernel version: 2.6.25-rc2 (suspected), 2.6.24 + firewire updates (tested)
Earliest failing kernel version: 2.6.22 (suspected), 2.6.24 + firewire updates (tested)
Distribution: Gentoo
Hardware Environment: x86-32
Software Environment:
Problem Description: 

The /proc scsi-add-single method is dangerous to use with firewire-sbp2.

Steps to reproduce:

1.) Manually remove a scsi_device and scsi_target via sysfs, behind firewire-sbp2's back.
# echo 1 > /sys/devices/pci0000\:00/0000:00:1e.0/0000:05:04.0/fw1/fw1.0/host63/target63:0:0/63:0:0:0/delete

2.) Scan for SCSI devices via SCSI core's proc interface.
# rescan-scsi-bus
Host adapter 0 (ata_piix) found.
Host adapter 1 (ata_piix) found.
Host adapter 63 (sbp2) found.
Scanning SCSI subsystem for new devices
Scanning host 0 channels 0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning for device 0 0 0 0 ...
OLD: Host: scsi0 Channel: 00 Id: 00 Lun: 00
      Vendor: ATA      Model: ST3750640AS      Rev: 3.AA
      Type:   Direct-Access                    ANSI SCSI revision: 05
Scanning host 1 channels 0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning for device 1 0 0 0 ...
OLD: Host: scsi1 Channel: 00 Id: 00 Lun: 00
      Vendor: ATA      Model: ST3750640AS      Rev: 3.AA
      Type:   Direct-Access                    ANSI SCSI revision: 05
Scanning for device 1 0 1 0 ...
OLD: Host: scsi1 Channel: 00 Id: 01 Lun: 00
      Vendor: ATA      Model: ST3750640AS      Rev: 3.AA
      Type:   Direct-Access                    ANSI SCSI revision: 05
Scanning host 63 channels 0 for  SCSI target IDs  0 1 2 3 4 5 6 7, all LUNs
Scanning for device 63 0 0 0 ...
NEW: Segmentation fault

3.) This results in a NULL pointer dereference:
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip: f919300c *pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: firewire_sbp2 firewire_ohci firewire_core crc_itu_t nls_iso8859_1 nls_cp850 vfat fat usb_storage sr_mod ext3 jbd cpufreq_ondemand acpi_cpufreq freq_table i915 drm snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device nfsd lockd sunrpc exportfs coretemp w83627ehf hwmon_vid hwmon usbhid hid sg sd_mod snd_hda_intel snd_pcm yenta_socket rsrc_nonstatic pcmcia_core snd_timer snd uhci_hcd ehci_hcd snd_page_alloc usbcore processor ata_piix e1000 libata rtc

Pid: 369, comm: rescan-scsi-bus Not tainted (2.6.24 #1)
EIP: 0060:[<f919300c>] EFLAGS: 00210202 CPU: 0
EIP is at sbp2_scsi_slave_alloc+0xc/0x20 [firewire_sbp2]
EAX: 00000000 EBX: eaca4400 ECX: 00000001 EDX: eaca4400
ESI: eaca4800 EDI: eaca4814 EBP: f7fa0000 ESP: f57ccd34
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rescan-scsi-bus (pid: 369, ti=f57cc000 task=c487ca90 task.ti=f57cc000)
Stack: c027538a 00000000 c026d7ea 00000000 f7fa0000 00000000 f7fa0170 eaca4800
       c0275650 c02f231d 00000000 00000001 c02592d7 00000000 00000000 c03c0be0
       00200246 22222222 22222222 22222222 00000000 c04efd0c 00000000 00000000
Call Trace:
 [<c027538a>] scsi_alloc_sdev+0x18a/0x200
 [<c026d7ea>] scsi_device_lookup_by_target+0x6a/0x80
 [<c0275650>] scsi_probe_and_add_lun+0xf0/0xb90
 [<c02f231d>] mutex_lock_nested+0x15d/0x2c0
 [<c02592d7>] attribute_container_device_trigger+0x17/0xc0
 [<c02f3795>] _spin_lock_irqsave+0x45/0x60
 [<c01ece7f>] kobject_get+0xf/0x20
 [<c0253710>] get_device+0x10/0x20
 [<c027646d>] scsi_alloc_target+0x23d/0x350
 [<c02766af>] __scsi_scan_target+0x8f/0x6c0
 [<c02f3a15>] _spin_unlock+0x25/0x40
 [<c0188f63>] mntput_no_expire+0x13/0x60
 [<c017c635>] link_path_walk+0x65/0xc0
 [<c02f231d>] mutex_lock_nested+0x15d/0x2c0
 [<c0276dd6>] scsi_scan_host_selected+0x56/0x160
 [<c0276e61>] scsi_scan_host_selected+0xe1/0x160
 [<c0278157>] store_scan+0xe7/0xf0
 [<c015750f>] __alloc_pages+0x4f/0x380
 [<c02f231d>] mutex_lock_nested+0x15d/0x2c0
 [<c0278070>] store_scan+0x0/0xf0
 [<c0257606>] class_device_attr_store+0x26/0x40
 [<c01b1516>] sysfs_write_file+0xa6/0x110
 [<c0173206>] vfs_write+0xa6/0x140
 [<c01b1470>] sysfs_write_file+0x0/0x110
 [<c01738d1>] sys_write+0x41/0x70
 [<c0104406>] sysenter_past_esp+0x5f/0x91
 =======================
Code: <8b> 00 f6 40 28 02 74 04 c6 42 77 24 31 c0 c3 90 8d 74 26 00 83 ec
EIP: [<f919300c>] sbp2_scsi_slave_alloc+0xc/0x20 [firewire_sbp2] SS:ESP 0068:f57ccd34
---[ end trace 087d2b81b476ab71 ]---
Comment 1 Stefan Richter 2008-02-17 05:16:54 UTC
Also happens without step 1.
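
Since step 1 is not required, the reproduction reduces to asking the SCSI core to scan the sbp2 host. The following is a minimal sketch of that, assuming the scan is triggered through the host's sysfs `scan` attribute, as the `store_scan` frame in the trace suggests. The `scan_host` helper and its optional sysfs-root argument are hypothetical and not part of rescan-scsi-bus.

```shell
# Hypothetical helper: ask the SCSI core to scan all channels, target IDs
# and LUNs of one SCSI host by writing to its sysfs scan attribute.
# WARNING: on an affected kernel, running this against the sbp2 host is
# expected to trigger the oops shown in this report.
scan_host() {
    host_no=$1
    # The sysfs root is overridable (an assumption made for this sketch) so
    # the helper can be tried against a scratch directory first.
    sysfs_root=${2:-/sys}
    scan_attr="$sysfs_root/class/scsi_host/host$host_no/scan"

    if [ ! -w "$scan_attr" ]; then
        echo "scan_host: $scan_attr is not writable" >&2
        return 1
    fi

    # "- - -" means: wildcard channel, wildcard target ID, wildcard LUN.
    printf '%s\n' '- - -' > "$scan_attr"
}
```

With the host number from the transcript above, the call would be `scan_host 63`, run as root.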
Comment 2 Stefan Richter 2008-02-17 05:32:14 UTC
> Latest working kernel version: 2.6.25-rc2 (suspected), 2.6.24 + firewire
> updates (tested)

Hand me the Tipp-Ex.  I read that as "latest failing kernel version".  #-|
Comment 3 Stefan Richter 2008-02-17 06:01:55 UTC
Proposed fix: http://lkml.org/lkml/2008/2/17/143
Comment 4 Stefan Richter 2008-02-19 11:07:47 UTC
Fix committed to linux1394-2.6.git:master; I plan to submit it to Linus next week.
Comment 5 Stefan Richter 2008-02-28 02:27:51 UTC
The fix was merged by Linus.

Jarod noted that, because of that fix, firewire-sbp2 cannot be unloaded while one or more SBP-2 devices are attached.

A better fix is posted at http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11631
and should also be submitted before 2.6.25.
Comment 6 Stefan Richter 2008-03-11 05:37:17 UTC
The improved fix has been merged in Linux 2.6.25-rc4.