Latest working kernel version: No
Earliest failing kernel version: 2.6.22.16
Distribution: LinuxFromScratch-6.4
Hardware Environment: LSI-MPTSAS-1068E
(http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html)
Software Environment:
Problem Description: mptsas driver cannot discover some hotplugged SATA/SAS disks

Steps to reproduce:
1. use an LSI-SAS-1068E controller to manage multiple disks, including disks on multiple JBODs;
2. hot-unplug and hotplug the disks at random places;

Result:
1. directly attached disks can be rediscovered, and disks on the last JBOD can also be rediscovered; other disks cannot.

For example, in the chain below, disks on HEAD and JBOD3 can be rediscovered, but disks on JBOD1 and JBOD2 cannot.

HEAD <=> JBOD1 <=> JBOD2 <=> JBOD3

Reproducible: always
Reply-To: James.Bottomley@HansenPartnership.com

On Thu, 2008-12-04 at 00:47 -0800, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12163
> Latest working kernel version: No
> Earliest failing kernel version: 2.6.22.16
> Distribution: LinuxFromScratch-6.4
> Hardware Environment: LSI-MPTSAS-1068E
>
> (http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html)
> Software Environment:
> Problem Description: mptsas driver cannot discover some hotplugged SATA/SAS
> disks
>
> Steps to reproduce:
> 1. use LSI-SAS-1068E controller to manage multiple disks, including disks on
> multiple JBODs;
> 2. hotunplug and hotplug the disks at random places;
>
> Result:
> 1. disks direct attached can can rediscovered, disks on the last JBOD can
> also
> be rediscovered, other disks cannot.
>
> Like this, disks on HEAD and JBOD3 can be rediscovered, others on JBOD1 and
> JBOD2 cannot.
>
> HEAD <=> JBOD1 <=> JBOD2 <=> JBOD3

My suspicion is that this is because there's no firmware event triggered
(the LSI is a fat firmware device, it relies on firmware for almost
everything to function including hotplug). If this is true, it's
unfixable in the driver.

To verify, try this patch which will print out all the fw events and see
if it prints anything when you hotplug.

James

---

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 12b7325..e755bdd 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2600,6 +2600,8 @@ mptsas_hotplug_work(struct work_struct *work)
 	VirtTarget *vtarget;
 	VirtDevice *vdevice;
 
+	printk("mptsas hotplug event: event %d ioc%d SAS addr %llx device info %x phy_id %d phys_disk_num %d\n", ev->event_type, ioc->id, ev->sas_address, ev->device_info, ev->phy_id, ev->phys_disk_num);
+
 	mutex_lock(&ioc->sas_discovery_mutex);
 	switch (ev->event_type) {
 	case MPTSAS_DEL_DEVICE:
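When checking whether the firmware delivers an event for each hotplug, it can help to pull the debug lines out of a long kernel log and compare them per slot. The following is a minimal user-space sketch, assuming the log line has exactly the format of the printk in the patch above. The struct and function names (`hotplug_event`, `parse_hotplug_line`) are hypothetical and not part of the mptsas driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields printed by the debug printk; names here are illustrative only. */
struct hotplug_event {
	int event_type;
	int ioc_id;
	unsigned long long sas_address;
	unsigned int device_info;
	int phy_id;
	int phys_disk_num;
};

/*
 * Parse one kernel log line.
 * Returns 1 if the line contains a hotplug event in the format printed
 * by the debug patch and all six fields were read, 0 otherwise.
 */
int parse_hotplug_line(const char *line, struct hotplug_event *ev)
{
	static const char tag[] = "mptsas hotplug event:";
	const char *p;

	assert(line != NULL && ev != NULL);

	/* Skip any timestamp or log-level prefix in front of the tag. */
	p = strstr(line, tag);
	if (p == NULL)
		return 0;

	/* Mirror the printk format string field by field. */
	return sscanf(p + strlen(tag),
		      " event %d ioc%d SAS addr %llx device info %x"
		      " phy_id %d phys_disk_num %d",
		      &ev->event_type, &ev->ioc_id, &ev->sas_address,
		      &ev->device_info, &ev->phy_id,
		      &ev->phys_disk_num) == 6;
}
```

Feeding each line of the kernel log to `parse_hotplug_line()` while hotplugging disks on JBOD1 and JBOD2 would show whether any event arrives for those slots. If no line matches, that is consistent with the suspicion that the firmware never raises the event.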
Created attachment 19155 [details]
The vendor driver 4.00.43.00 from LSI Corporation

http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html

This is just the vendor driver from LSI. Instead of applying this patch, you can also download the driver from the link above and extract it under drivers.
Created attachment 19156 [details]
port vendor driver 4.00.43.00 for RHEL5.1 to the latest mainstream git kernel

With the previous 4.00.43.00 driver attachment plus this patch, you can run the vendor driver on the latest git kernel.
I have tested and verified that these two patches let my LSI-SAS-1068E controllers work well with 2.6.26.2 kernels.

Tested-by: Cheng Renquan <crquan@gmail.com>

The latest git kernel has not yet been tested, though.

I personally suggest that this patch be integrated into Greg KH's staging tree.
This driver works well in the general case:
1. use mptsas vendor driver 4.00.43 and this patch to make it work on 2.6.26.2;
2. all disks are recognized, both at boot and after being hotplugged.

But today we found another severe hotplug problem.

Steps to reproduce:
1. use mdadm to create a soft RAID, e.g. create md1 with level 5 on sdm, sdn, sdo;
2. if the disks in the soft RAID (sdm, sdn, sdo) are unplugged and plugged back in, they are not recognized after insertion;
3. if these disks are inserted into other slots, they are recognized by mptsas;
4. if other disks are inserted into the slots that sdm, sdn, sdo occupied, they are not recognized either;
5. at the same time, other disks not in the soft RAID still hotplug fine.

In short, a slot that has been occupied by a disk in a soft RAID no longer supports hotplug (until reboot).

I suspect this is a generic SCSI-level problem; sometimes it reports that scsi_target alloc failed with -EEXIST.

Reproducible: always
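To make the suspicion about -EEXIST concrete, here is a toy user-space model of one way such a symptom could arise. It is not kernel code and not a diagnosis: it simply assumes that an entry keyed by (channel, id) stays registered for as long as some upper layer (for example an md array) still holds a reference to it, so a new device arriving on the same slot collides with the stale entry. All names (`toy_host`, `toy_target_add`, and so on) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_TARGETS 16

/* Toy stand-in for a target entry keyed by (channel, id). */
struct toy_target {
	int in_use;	/* table slot is occupied */
	int channel;
	int id;
	int refcount;	/* host reference plus holders such as md */
	int removed;	/* hot-unplug seen, waiting for last reference */
};

struct toy_host {
	struct toy_target targets[MAX_TARGETS];
};

static struct toy_target *toy_find(struct toy_host *h, int channel, int id)
{
	for (size_t i = 0; i < MAX_TARGETS; i++) {
		struct toy_target *t = &h->targets[i];

		if (t->in_use && t->channel == channel && t->id == id)
			return t;
	}
	return NULL;
}

/* Register a target; fails with -EEXIST while a stale entry has the key. */
int toy_target_add(struct toy_host *h, int channel, int id)
{
	assert(h != NULL);
	if (toy_find(h, channel, id))
		return -EEXIST;
	for (size_t i = 0; i < MAX_TARGETS; i++) {
		struct toy_target *t = &h->targets[i];

		if (!t->in_use) {
			*t = (struct toy_target){ 1, channel, id, 1, 0 };
			return 0;
		}
	}
	return -ENOMEM;
}

/* An upper layer (for example md) takes a reference on a live target. */
int toy_target_get(struct toy_host *h, int channel, int id)
{
	struct toy_target *t = toy_find(h, channel, id);

	if (!t || t->removed)
		return -ENODEV;
	t->refcount++;
	return 0;
}

/* Drop one reference; the entry disappears only when none are left. */
int toy_target_put(struct toy_host *h, int channel, int id)
{
	struct toy_target *t = toy_find(h, channel, id);

	if (!t)
		return -ENODEV;
	if (--t->refcount == 0)
		t->in_use = 0;
	return 0;
}

/* Hot-unplug: mark the entry removed and drop the host's own reference. */
int toy_target_remove(struct toy_host *h, int channel, int id)
{
	struct toy_target *t = toy_find(h, channel, id);

	if (!t || t->removed)
		return -ENODEV;
	t->removed = 1;
	return toy_target_put(h, channel, id);
}
```

In this model, calling `toy_target_add()` for a slot after `toy_target_remove()` returns -EEXIST as long as a holder has not called `toy_target_put()`, while other slots keep working. That matches the reported pattern (only slots used by the soft RAID are affected), but whether the real SCSI midlayer behaves this way here would need to be confirmed against the kernel source.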
On another type of machine, driver 4.00.43 with RHEL 5.2 x86_64 causes a kernel panic.

Steps to reproduce:
1. install RHEL 5.2 x86_64, then mptlinux-4.00.43.00-1-rhel5.x86_64.rpm from the LSI website;
2. normal spare disks support hotplug well;
3. if you create a filesystem on a disk and mount it, or create LVM or a soft RAID on any of the disks, that disk no longer supports hotplug;
4. if you try to hotplug a disk that is in use by a mounted filesystem, LVM, or soft RAID, the kernel panics with the following message:

gektop@tux ~/sforge/linux-2.6 $ netcat -u -l -p 5140
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff800649dd>] _spin_lock+0x0/0xa
PGD 0
Oops: 0002 [1] SMP
last sysfs file: /module/ehci_hcd/sections/.text
CPU 1
Modules linked in: netconsole(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) sunrpc(U) mptctl(U) loop(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) floppy(U) sg(U) i2c_i801(U) serio_raw(U) i5000_edac(U) i2c_core(U) pcspkr(U) edac_mc(U) e1000e(U) shpchp(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) ata_piix(U) libata(U) mptfc(U) scsi_transport_fc(U) mptspi(U) scsi_transport_spi(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 455, comm: mpt/0 Tainted: G 2.6.18-u1219 #1
RIP: 0010:[<ffffffff800649dd>] [<ffffffff800649dd>] _spin_lock+0x0/0xa
RSP: 0018:ffff81007effbc78 EFLAGS: 00010282
RAX: ffff81007ee95000 RBX: ffff81007a887828 RCX: ffff81007e8ed860
RDX: ffff8100796b0000 RSI: ffff81007a887800 RDI: 0000000000000000
RBP: ffff81007a887800 R08: 00000000ffffffff R09: ffff81007e8ed860
R10: ffff81007e8ed860 R11: ffff81007e8ed860 R12: 0000000000000000
R13: ffff81007ee65c38 R14: ffff81007a887a80 R15: ffffffff880fd792
FS: 0000000000000000(0000) GS:ffff81007ff0b7c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000079867000 CR4: 00000000000006e0
Process mpt/0 (pid: 455, threadinfo ffff81007effa000, task ffff81007fb36080)
Stack: ffffffff80266193 ffff81007ee95748 ffff81007a887800 ffff81007a8877e0
 ffffffff801ad7d4 ffff810078413700 ffff81007a887800 ffff81007a887a98
 ffff81007a8877e0 ffff81007a8877e0 ffffffff880b4e4e ffff81007eea8000
Call Trace:
 [<ffffffff80266193>] klist_del+0x15/0x2a
 [<ffffffff801ad7d4>] device_del+0x22/0x17f
 [<ffffffff880b4e4e>] :scsi_transport_sas:sas_port_delete+0xdf/0xf7
 [<ffffffff880fd87d>] :mptsas:mptsas_firmware_event_work+0xeb/0xcc3
 [<ffffffff80062efb>] thread_return+0x0/0xdf
 [<ffffffff8008a60e>] __activate_task+0x27/0x39
 [<ffffffff880fd792>] :mptsas:mptsas_firmware_event_work+0x0/0xcc3
 [<ffffffff8004cea9>] run_workqueue+0x94/0xe4
 [<ffffffff800497be>] worker_thread+0x0/0x122
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800498ae>] worker_thread+0xf0/0x122
 [<ffffffff8008ac03>] default_wake_function+0x0/0xe
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003253d>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003243f>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Code: read(net): Connection refused
Created attachment 20092 [details] A patch to fix a bug in mptsas with SAS drive hotplugging
Comment on attachment 20092 [details]
A patch to fix a bug in mptsas with SAS drive hotplugging

OpenSUSE 11.1, kernel 2.6.27.7-9-default, mptsas driver 4.00.43

The driver cannot rediscover a hotplugged SAS drive. The bug is fixed with this patch.
Created attachment 20544 [details] mptsas.c patch for 2.6.29-rc5, SAS/SATA Hotplug issue.
This patch works around the issue and allows the drives to be rediscovered. It applies to v3 of the driver included in kernel 2.6.29-rc5, so you can keep using the in-kernel driver and avoid the trouble of getting v4 of the driver to work on 2.6.29. This is not a full fix, only a workaround, but it has been tested and seems to work fine. I hope it will be of some help to people.