Bug 12163 - [PATCH]mptsas driver cannot discover some hotplugged SATA/SAS disks
Summary: [PATCH]mptsas driver cannot discover some hotplugged SATA/SAS disks
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: All Linux
Importance: P1 high
Assignee: linux-scsi@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-04 00:38 UTC by Cheng Renquan
Modified: 2012-05-22 16:57 UTC

See Also:
Kernel Version: 2.6.28-rc6, 2.6.26.2, 2.6.22.16
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The vendor driver 4.00.43.00 from LSI Corporation (134.31 KB, application/x-bzip)
2008-12-05 05:18 UTC, Cheng Renquan
port vendor driver 4.00.43.00 for RHEL5.1 to the latest mainstream git kernel (25.22 KB, patch)
2008-12-05 05:20 UTC, Cheng Renquan
A patch to fix a bug in mptsas with SAS drive hotplugging (455 bytes, patch)
2009-02-03 07:11 UTC, Nikolai Kopanygin
mptsas.c patch for 2.6.29-rc5, SAS/SATA Hotplug issue. (497 bytes, patch)
2009-03-16 04:47 UTC, Benjamin ESTRABAUD

Description Cheng Renquan 2008-12-04 00:38:34 UTC
Latest working kernel version: No
Earliest failing kernel version: 2.6.22.16
Distribution: LinuxFromScratch-6.4
Hardware Environment: LSI-MPTSAS-1068E (http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html)
Software Environment:
Problem Description: mptsas driver cannot discover some hotplugged SATA/SAS disks

Steps to reproduce:
1. Use an LSI-SAS-1068E controller to manage multiple disks, including disks on multiple JBODs.
2. Hot-unplug and hot-plug disks at random positions.

Result:
1. Directly attached disks can be rediscovered, and disks on the last JBOD can also be rediscovered; other disks cannot.

For example, in the chain below, disks on HEAD and JBOD3 can be rediscovered, but those on JBOD1 and JBOD2 cannot.

HEAD  <=> JBOD1 <=> JBOD2 <=> JBOD3

Reproducible: always
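
One way to pin down which disks failed to come back is to snapshot the persistent disk names before and after a hotplug cycle and compare them. This is only a sketch: the helper names are made up for illustration, and it assumes udev populates /dev/disk/by-id on the test system.

```shell
# List persistent names of whole disks (partition links filtered out), sorted.
list_disks() {
    ls /dev/disk/by-id 2>/dev/null | grep -v -- '-part[0-9]*$' | sort
}

# Print names present in the first snapshot file but absent from the second.
# Both files must be sorted, which list_disks already guarantees.
missing_disks() {
    comm -23 "$1" "$2"
}

# Typical use:
#   list_disks > before.txt     # all disks present
#   ... hot-unplug and re-insert some disks ...
#   list_disks > after.txt
#   missing_disks before.txt after.txt
```

Comparing by-id names rather than sdX names avoids false reports when a re-inserted disk comes back under a different device letter.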
Comment 1 Anonymous Emailer 2008-12-04 07:30:44 UTC
Reply-To: James.Bottomley@HansenPartnership.com

On Thu, 2008-12-04 at 00:47 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12163


> Latest working kernel version: No
> Earliest failing kernel version: 2.6.22.16
> Distribution: LinuxFromScratch-6.4
> Hardware Environment: LSI-MPTSAS-1068E
>
> (http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html)
> Software Environment:
> Problem Description: mptsas driver cannot discover some hotplugged SATA/SAS
> disks
> 
> Steps to reproduce:
> 1. use LSI-SAS-1068E controller to manage multiple disks, including disks on
> multiple JBODs;
> 2. hotunplug and hotplug the disks at random places;
> 
> Result:
> 1. disks direct attached can can rediscovered, disks on the last JBOD can
> also
> be rediscovered, other disks cannot.
> 
> Like this, disks on HEAD and JBOD3 can be rediscovered, others on JBOD1 and
> JBOD2 cannot.
> 
> HEAD  <=> JBOD1 <=> JBOD2 <=> JBOD3

My suspicion is that this is because there's no firmware event triggered
(the LSI is a fat firmware device, it relies on firmware for almost
everything to function including hotplug).  If this is true, it's
unfixable in the driver.

To verify, try this patch which will print out all the fw events and see
if it prints anything when you hotplug.

James

---

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 12b7325..e755bdd 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2600,6 +2600,8 @@ mptsas_hotplug_work(struct work_struct *work)
 	VirtTarget *vtarget;
 	VirtDevice *vdevice;
 
+	printk("mptsas hotplug event: event %d ioc%d SAS addr %llx device info %x phy_id %d phys_disk_num %d\n", ev->event_type, ioc->id, ev->sas_address, ev->device_info, ev->phy_id, ev->phys_disk_num);
+
 	mutex_lock(&ioc->sas_discovery_mutex);
 	switch (ev->event_type) {
 	case MPTSAS_DEL_DEVICE:
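
After booting a kernel with the debug patch above, the check is whether the new line shows up in the kernel log when a disk is inserted. A minimal sketch; the helper name is made up for illustration, and the match string is the fixed prefix of the printk format in the patch.

```shell
# Keep only the lines emitted by the debug printk in mptsas_hotplug_work().
# Reads kernel log text on stdin; prints nothing if no event was logged.
filter_hotplug_events() {
    grep -F 'mptsas hotplug event:' || true
}

# Typical use after re-inserting a disk (reading the log may need root):
#   dmesg | filter_hotplug_events
```

If nothing is printed for a disk on JBOD1 or JBOD2 while disks on HEAD and JBOD3 do produce a line, that would support the suspicion that the firmware raises no event for those slots.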
Comment 2 Cheng Renquan 2008-12-05 05:18:44 UTC
Created attachment 19155
The vendor driver 4.00.43.00 from LSI Corporation

http://www.lsi.com/storage_home/products_home/standard_product_ics/sas_ics/lsisas1068e/index.html

This is just the vendor driver from LSI; you can also download it from the page above and extract it under drivers instead of applying this patch.
Comment 3 Cheng Renquan 2008-12-05 05:20:25 UTC
Created attachment 19156
port vendor driver 4.00.43.00 for RHEL5.1 to the latest mainstream git kernel

With the previous 4.00.43.00 driver and this patch, you can run it on the latest git kernel.
Comment 4 Cheng Renquan 2008-12-07 18:19:41 UTC
I have tested and verified that these two patches let my LSI-SAS-1068E controllers work well with 2.6.26.2 kernels.

Tested-by: Cheng Renquan <crquan@gmail.com>

The latest git kernel has not been tested yet. I suggest integrating this patch into Greg KH's staging tree.
Comment 5 Cheng Renquan 2008-12-17 20:31:23 UTC
This driver worked well in the general case:

1. use the mptsas vendor driver 4.00.43 and this patch to make it work on 2.6.26.2;
2. all disks are recognized, both at boot and after being hotplugged.

But today we found another severe hotplug problem:

Steps to reproduce:
1. use mdadm to create a soft RAID, e.g. create md1 with level 5 on sdm, sdn, sdo;
2. if the disks in the soft RAID (sdm, sdn, sdo) are hotplugged, they are not recognized after being inserted;
3. if these disks are inserted into other slots, they are recognized by mptsas;
4. if other disks are inserted into the slots that sdm, sdn, sdo occupied, they are not recognized either;
5. at the same time, other disks that are not in the soft RAID still hotplug fine.

In short, a slot that has been occupied by a disk in a soft RAID no longer supports hotplug until the machine is rebooted.

I suspect this is a generic SCSI-level problem; sometimes it reports that the scsi_target alloc failed with -EEXIST.
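
For anyone trying to reproduce this, step 1 can be sketched as below. The array name, RAID level and member disks come from the steps above; the wrapper function and its dry-run switch are illustrative assumptions, not part of the original setup.

```shell
# Step 1 of the reproduction: a 3-disk RAID 5 array named md1.
# Prints the command by default; set RUN=1 to execute it.
# WARNING: executing it destroys any data on the member disks.
create_md1() {
    cmd="mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sdm /dev/sdn /dev/sdo"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

create_md1
```

Once the array is assembled, pulling and re-inserting one of its member disks should exercise steps 2 to 4.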


Reproducible: always
Comment 6 Cheng Renquan 2008-12-19 01:53:11 UTC
On another type of machine, driver 4.00.43 with RHEL 5.2 x86_64 causes a kernel panic.

Steps to reproduce:
1. install RHEL 5.2 x86_64, then mptlinux-4.00.43.00-1-rhel5.x86_64.rpm from the LSI website;
2. normal spare disks hotplug fine;
3. once a filesystem is created and mounted, or LVM or a soft RAID is created, on any of the disks, that disk no longer supports hotplug;
4. hotplugging a disk that holds a mounted filesystem, LVM, or a soft RAID makes the kernel panic.

The message is:

gektop@tux ~/sforge/linux-2.6 $ netcat -u -l -p 5140

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<ffffffff800649dd>] _spin_lock+0x0/0xa
PGD 0 
Oops: 0002 [1] SMP 
last sysfs file: /module/ehci_hcd/sections/.text
CPU 1 
Modules linked in: netconsole(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) sunrpc(U) mptctl(U) loop(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) floppy(U) sg(U) i2c_i801(U) serio_raw(U) i5000_edac(U) i2c_core(U) pcspkr(U) edac_mc(U) e1000e(U) shpchp(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) ata_piix(U) libata(U) mptfc(U) scsi_transport_fc(U) mptspi(U) scsi_transport_spi(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 455, comm: mpt/0 Tainted: G      2.6.18-u1219 #1
RIP: 0010:[<ffffffff800649dd>]  [<ffffffff800649dd>] _spin_lock+0x0/0xa
RSP: 0018:ffff81007effbc78  EFLAGS: 00010282
RAX: ffff81007ee95000 RBX: ffff81007a887828 RCX: ffff81007e8ed860
RDX: ffff8100796b0000 RSI: ffff81007a887800 RDI: 0000000000000000
RBP: ffff81007a887800 R08: 00000000ffffffff R09: ffff81007e8ed860
R10: ffff81007e8ed860 R11: ffff81007e8ed860 R12: 0000000000000000
R13: ffff81007ee65c38 R14: ffff81007a887a80 R15: ffffffff880fd792
FS:  0000000000000000(0000) GS:ffff81007ff0b7c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000079867000 CR4: 00000000000006e0
Process mpt/0 (pid: 455, threadinfo ffff81007effa000, task ffff81007fb36080)
Stack:  ffffffff80266193 ffff81007ee95748 ffff81007a887800 ffff81007a8877e0
 ffffffff801ad7d4 ffff810078413700 ffff81007a887800 ffff81007a887a98
 ffff81007a8877e0 ffff81007a8877e0 ffffffff880b4e4e ffff81007eea8000
Call Trace:
 [<ffffffff80266193>] klist_del+0x15/0x2a
 [<ffffffff801ad7d4>] device_del+0x22/0x17f
 [<ffffffff880b4e4e>] :scsi_transport_sas:sas_port_delete+0xdf/0xf7
 [<ffffffff880fd87d>] :mptsas:mptsas_firmware_event_work+0xeb/0xcc3
 [<ffffffff80062efb>] thread_return+0x0/0xdf
 [<ffffffff8008a60e>] __activate_task+0x27/0x39
 [<ffffffff880fd792>] :mptsas:mptsas_firmware_event_work+0x0/0xcc3
 [<ffffffff8004cea9>] run_workqueue+0x94/0xe4
 [<ffffffff800497be>] worker_thread+0x0/0x122
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800498ae>] worker_thread+0xf0/0x122
 [<ffffffff8008ac03>] default_wake_function+0x0/0xe
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003253d>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003243f>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: read(net): Connection refused
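
The trace above appears to have been captured over netconsole (the module is in the loaded list) by a netcat listener on UDP port 5140. A sketch of the sender side follows; the interface name and receiver address are placeholders, and the helper function is made up for illustration. The parameter layout follows the kernel's netconsole documentation.

```shell
# Build the netconsole module parameter. Documented format:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Source port, source IP and target MAC are left empty so defaults apply.
netconsole_param() {
    dev=$1; tgt_ip=$2; tgt_port=$3
    printf 'netconsole=@/%s,%s@%s/\n' "$dev" "$tgt_port" "$tgt_ip"
}

# eth0 and 192.0.2.10 are placeholders; 5140 matches the listener above.
netconsole_param eth0 192.0.2.10 5140

# On the machine that panics (as root), one would then load the module with:
#   modprobe netconsole "$(netconsole_param eth0 192.0.2.10 5140)"
```

Because netconsole sends over UDP, part of an oops can be lost when the machine dies mid-transmission.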
Comment 7 Nikolai Kopanygin 2009-02-03 07:11:22 UTC
Created attachment 20092
A patch to fix a bug in mptsas with SAS drive hotplugging
Comment 8 Nikolai Kopanygin 2009-02-03 07:12:58 UTC
Comment on attachment 20092
A patch to fix a bug in mptsas with SAS drive hotplugging

OpenSUSE 11.1, kernel 2.6.27.7-9-default, mptsas driver 4.00.43
The driver cannot rediscover a hotplugged SAS drive.
The bug is fixed with this patch.
Comment 9 Benjamin ESTRABAUD 2009-03-16 04:47:12 UTC
Created attachment 20544
mptsas.c patch for 2.6.29-rc5, SAS/SATA Hotplug issue.
Comment 10 Benjamin ESTRABAUD 2009-03-16 04:47:31 UTC
This patch works around the issue and allows the drives to be rediscovered.

It works with v3 of the driver included in kernel 2.6.29-rc5, so there is no need to go to the trouble of getting v4 of the driver to work on 2.6.29; you can use the in-kernel driver and work around the issue instead.

This is not a full fix but a workaround; it has been tested and seems to work fine.

I hope this will be of some help to some people.
