Bug 81861 - Oops by mvsas v0.8.16: sas: ataX: end_device-Y:0:Z: dev error handler -> general protection fault, RIP: mvs_task_prep_ata+0x80/0x3a0
Status: NEW
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: scsi_drivers-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-07 17:33 UTC by linux-ide
Modified: 2015-04-29 13:41 UTC

See Also:
Kernel Version: 3.17.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Dmesg output from boot (49.73 KB, text/plain)
2014-08-08 08:34 UTC, linux-ide
smartctl -a /dev/sdb (HDS5C3020BLE630) (4.59 KB, text/plain)
2014-08-22 12:16 UTC, linux-ide
sg_ses PCIe port expander card output (309 bytes, text/plain)
2014-08-22 17:00 UTC, linux-ide
Ubuntu Linux/x86_64 3.13.0-35-generic Kernel Configuration (342.66 KB, text/plain)
2014-08-23 22:12 UTC, linux-ide
Patched mvsas dmesg in kernel 3.18.3 (55.02 KB, application/octet-stream)
2015-01-26 23:21 UTC, Nathan R
dmesg output after loading module (6.82 KB, text/plain)
2015-04-29 13:40 UTC, Nathan R

Description linux-ide 2014-08-07 17:33:26 UTC
There are two issues: (1) error messages and (2) kernel crashes when 4 drives (on 1 SFF SAS cable) are attached to specific ports of a SAS expander.

The issue has only been tested with an HP SAS expander card (PMC-Sierra PM8005 chip) running firmware 2.08. This expander has 36/4=9 SAS ports:
1 port of type SFF-8088, labelled 1C on the PCB.
8 ports of type SFF-8087, labelled 2C through 9C on the PCB.
Port “1C” is connected to a Supermicro SAS2LP-MV8 (Marvell 88SE9485 based); its lspci output is included below.

The issue is not always identical. When attaching the 4 drives to different ports on the expander, this is what happens, in this order:
2C, 3C, 4C = ok
5C         = error
6C, 7C, 8C = kernel crash
9C         = error

After that first run from port 2C through 9C, the behaviour seems more random:
9C = kernel crash
4C = kernel crash
3C = error
9C = error
7C = kernel crash
3C = error
2C = ok
4C = kernel crash
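
To make the "more random" pattern easier to see, the two runs above can be tallied per port. This is only a sketch that re-enters the outcomes listed in this report; the names first_run, later_runs and outcomes_by_port are made up for illustration.

```python
from collections import Counter

# Outcomes copied from the report, in the order the reporter tried them.
first_run = [("2C", "ok"), ("3C", "ok"), ("4C", "ok"), ("5C", "error"),
             ("6C", "crash"), ("7C", "crash"), ("8C", "crash"), ("9C", "error")]
later_runs = [("9C", "crash"), ("4C", "crash"), ("3C", "error"), ("9C", "error"),
              ("7C", "crash"), ("3C", "error"), ("2C", "ok"), ("4C", "crash")]


def outcomes_by_port(runs):
    """Map each port label to a Counter of the outcomes seen on it."""
    table = {}
    for port, outcome in runs:
        table.setdefault(port, Counter())[outcome] += 1
    return table


table = outcomes_by_port(first_run + later_runs)
for port in sorted(table):
    print(port, dict(table[port]))

# Ports whose outcome changed between attempts.
inconsistent = sorted(p for p, c in table.items() if len(c) > 1)
print("inconsistent:", inconsistent)
```

With the data above, ports 3C, 4C and 9C show more than one outcome, while 2C (ok) and 7C (crash) were consistent across their two attempts.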

The “error message” on ports 5C and 9C is:
scsi 5:0:4:0: Failed to get diagnostic page 0x8000002
scsi 5:0:4:0: Failed to bind enclosure -19

====
Most testing was done on Ubuntu 14.04.1 running Ubuntu’s supplied mainline kernel 3.16.0-rc6.
# modprobe -v mvsas
insmod /lib/modules/3.16.0-031600rc6-generic/kernel/drivers/scsi/scsi_transport_sas.ko
insmod /lib/modules/3.16.0-031600rc6-generic/kernel/drivers/scsi/libsas/libsas.ko
insmod /lib/modules/3.16.0-031600rc6-generic/kernel/drivers/scsi/mvsas/mvsas.ko
=============
Other tested kernels, with similar results
=============
kernel Mainline 3.16-20140724
kernel Ubuntu 3.13.11
kernel Ubuntu 3.13.0-24
kernel Ubuntu 3.12.25
kernel Ubuntu 2.6.32 = no SAS expander detected -> no further testing
=============
No drives attached to expander
============
# lsscsi
[4:0:0:0]    disk    ATA      OCZ-VERTEX       1.3   /dev/sda
[5:0:0:0]    enclosu HP       HP SAS EXP Card  2.08  -
============
With 4 drives (brown#4) attached to expander port 2C
============
# lsscsi
[4:0:0:0]    disk    ATA      OCZ-VERTEX       1.3   /dev/sda
[6:0:0:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdb
[6:0:1:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdc
[6:0:2:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdd
[6:0:3:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sde
[6:0:4:0]    enclosu HP       HP SAS EXP Card  2.08  -
============
With 4 drives (brown#4) attached to expander port 3C
============
# lsscsi
[4:0:0:0]    disk    ATA      OCZ-VERTEX       1.3   /dev/sda
[6:0:4:0]    enclosu HP       HP SAS EXP Card  2.08  -
[6:0:5:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdb
[6:0:6:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdc
[6:0:7:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdd
[6:0:8:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sde
============
With 4 drives (brown#4) attached to expander port 4C
============
# lsscsi
[4:0:0:0]    disk    ATA      OCZ-VERTEX       1.3   /dev/sda
[6:0:4:0]    enclosu HP       HP SAS EXP Card  2.08  -
[6:0:9:0]    disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdb
[6:0:10:0]   disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdc
[6:0:11:0]   disk    ATA      Hitachi HDS5C302 AAB0  /dev/sdd
[6:0:12:0]   disk    ATA      Hitachi HDS5C302 AAB0  /dev/sde
============
With 4 drives (brown#4) attached to expander port 5C
============
scsi 5:0:4:0: Failed to get diagnostic page 0x8000002
scsi 5:0:4:0: Failed to bind enclosure -19
# lsscsi
[4:0:0:0]    disk    ATA      OCZ-VERTEX       1.3   /dev/sda
[5:0:4:0]    enclosu HP       HP SAS EXP Card  2.08  -
============
With 4 drives (brown#4) attached to expander port 6C
============
Kernel crash (data from OCR-ed screenshot):
[ 269.190030] R13: ffff88020e837808 R14: ffff88021b4a0080 R15: ffff880036c11200
[ 269.190052] FS: 00007f9ef5abb740(0000) GS:ffff88021b200000(0000) knlGS:0000000000000000
[ 269.190074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 269.190091] CR2: 00007f9ef5ac2000 CR3: 000000020fbd8000 CR4: 00000000000407f0
[ 269.190111] Stack:
[ 269.190118]  0000000000000000 0000000000000002 ffff88021f5f7f08 dead000000200200
[ 269.190145]  ffff88020d1037b0 0000000000000046 ffff88020eb81e38 ffffffff811b06ae
[ 269.190171]  ffff88020e837798 ffff88020d69b140 ffff88020d1037b0 ffff88020d100000
[ 269.190197] Call Trace:
[ 269.190210] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 269.190229] [<ffffffffc06e44ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 269.190248] [<ffffffffc06e45a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 269.190269] [<ffffffffc06e5149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 269.190291] [<ffffffffc06d48ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 269.190312] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 269.190331] [<ffffffff81537dc0>] ? ata_scsi_rw_xlat+0x230/0x230
[ 269.190349] [<ffffffff81535fe4>] ata_scsi_translate+0xb4/0x1b0
[ 269.190369] [<ffffffff81539aa1>] ata_sas_queuecmd+0x121/0x2b0
[ 269.190389] [<ffffffffc06d387f>] sas_queuecommand+0x20f/0x280 [libsas]
[ 269.190409] [<ffffffff8150d6ce>] scsi_dispatch_cmd+0xce/0x280
[ 269.190428] [<ffffffff81515dd2>] scsi_request_fn+0x372/0x490
[ 269.190447] [<ffffffff813541c7>] __blk_run_queue+0x37/0x50
[ 269.190465] [<ffffffff8135305f>] __elv_add_request+0xef/0x310
[ 269.190483] [<ffffffff8135e123>] blk_execute_rq_nowait+0xb3/0x190
[ 269.190504] [<ffffffff811c2653>] ? kmem_cache_alloc_node+0x1e3/0x200
[ 269.190523] [<ffffffff8135e28d>] blk_execute_rq+0x8d/0x160
[ 269.190542] [<ffffffff812f8bf8>] ? security_capable+0x18/0x20
[ 269.190561] [<ffffffff81079e10>] ? ns_capable+0x30/0x60
[ 269.190578] [<ffffffff81079ed7>] ? capable+0x17/0x20
[ 269.191191] [<ffffffff81369b85>] ? blk_verify_command+0x25/0x70
[ 269.191806] [<ffffffff8136a1d8>] sg_io+0x168/0x2c0
[ 269.192422] [<ffffffff8136a557>] scsi_cmd_ioctl+0x227/0x520
[ 269.193030] [<ffffffff81198bfb>] ? __handle_mm_fault+0x1db/0x360
[ 269.193631] [<ffffffff8136a89e>] scsi_cmd_blk_ioctl+0x4e/0x60
[ 269.194231] [<ffffffff81520ab7>] sd_ioctl+0xd7/0x160
[ 269.194810] [<ffffffff81366b9e>] blkdev_ioctl+0xde/0x810
[ 269.195373] [<ffffffff810a8ead>] ? vtime_account_user+0x5d/0x70
[ 269.195921] [<ffffffff812152d0>] block_ioctl+0x40/0x50
[ 269.196449] [<ffffffff811f1805>] do_vfs_ioctl+0x75/0x2c0
[ 269.196966] [<ffffffff810247b5>] ? syscall_trace_enter+0x165/0x280
[ 269.197475] [<ffffffff81168835>] ? context_tracking_user_enter+0x25/0x30
[ 269.197972] [<ffffffff811f1ae1>] SyS_ioctl+0x91/0xb0
[ 269.198458] [<ffffffff817913bf>] tracesys+0xe1/0xe6
[ 269.198930] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 269.200019] RIP [<ffffffffc06e35a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 269.200534] RSP <ffff88020e837738>
============
With 4 drives (brown#4) attached to expander port 7C
============
Kernel crash (from OCR-ed screenshot):
[ 38.934484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 38.934501] CR2: 0000000000000254 CR3: 0000000001C12000 CR4: 00000000000407e0
[ 38.934522] Stack:
[ 38.934529]  ffff88021b214400 ffff880200000000 0000000000000282 0000000000000000
[ 38.934556]  ffff8800d4c03618 0000000000000046 ffff8800d5b01e38 ffffffff811b06ae
[ 38.934582]  ffff88021b214400 ffff88020d65e140 ffff8800d4c03618 ffff8800d4c00000
[ 38.934608] Call Trace:
[ 38.934619] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 38.934638] [<ffffffffc03c04ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 38.934659] [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[ 38.934682] [<ffffffffc03c05a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 38.934703] [<ffffffffc03c1149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 38.934725] [<ffffffffc03a88ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 38.934747] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 38.934764] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
[ 38.934783] [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0
[ 38.934802] [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30
[ 38.934821] [<ffffffffc03a84b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 38.934843] [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460
[ 38.934861] [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20
[ 38.934878] [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0
[ 38.934899] [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0
[ 38.934917] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 38.934937] [<ffffffffc03a84b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 38.934959] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 38.934978] [<ffffffffc03a84b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 38.935618] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 38.936257] [<ffffffffc03a84b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 38.936905] [<ffffffff81540742>] ata_do_eh+0x52/0xc0
[ 38.937538] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 38.938163] [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80
[ 38.938783] [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160
[ 38.939410] [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0
[ 38.940024] [<ffffffffc03a82c5>] async_sas_ata_eh+0x55/0x90 [libsas]
[ 38.940621] [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140
[ 38.941201] [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0
[ 38.941767] [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0
[ 38.942320] [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80
[ 38.942864] [<ffffffff81094479>] kthread+0xc9/0xe0
[ 38.943398] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 38.943927] [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0
[ 38.944443] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 38.944956] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 38.946132] RIP [<ffffffffc03bf5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 38.946688] RSP <ffff88020d7bb7c8>
============
With 4 drives (brown#4) attached to expander port 8C
============
Kernel crash (text from OCR-ed screenshot):
[ 335.117520] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 335.117537] CR2: 00007fff5S6452C0 CR3: 0000000001C12000 CR4: 00000000000407e0
[ 335.117557] Stack:
[ 335.117565] ffff88021b214400 ffff880200000000 0000000000000282 74737572745f7374
[ 335.117591] ffff8800d5b03618 0000000000000046 ffff88020f301e38 ffffffff811b06ae
[ 335.117617] ffff88021b214400 ffff8800d4bda280 ffff8800d5b03618 ffff8800d5b00000
[ 335.117644] Call Trace:
[ 335.117656] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 335.117676] [<ffffffffc03fb4ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 335.117697] [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[ 335.117720] [<ffffffffc03fb5a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 335.117741] [<ffffffffc03fc149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 335.117764] [<ffffffffc03e38ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 335.117786] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 335.117804] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
[ 335.117823] [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0
[ 335.117842] [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30
[ 335.117861] [<ffffffffc03e34b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 335.117883] [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460
[ 335.117901] [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20
[ 335.117919] [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0
[ 335.117940] [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0
[ 335.117959] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 335.117979] [<ffffffffc03e34b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 335.118001] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 335.118019] [<ffffffffc03e34b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 335.118041] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 335.118061] [<ffffffffc03e34b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 335.118083] [<ffffffff81540742>] ata_do_eh+0x52/0xc0
[ 335.118709] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 335.119338] [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80
[ 335.119970] [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160
[ 335.120600] [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0
[ 335.121215] [<ffffffffc03e32c5>] async_sas_ata_eh+0x55/0x90 [libsas]
[ 335.121812] [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140
[ 335.122394] [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0
[ 335.122963] [<ffffffff81776ba8>] ? maybe_create_worker+0xbb/0x1c5
[ 335.123520] [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0
[ 335.124064] [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80
[ 335.124605] [<ffffffff81094479>] kthread+0xc9/0xe0
[ 335.125133] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 335.125654] [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0
[ 335.126169] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 335.126685] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 335.127858] RIP [<ffffffffc03fa5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 335.128415] RSP <ffff8800d60237c8>
============
With 4 drives (brown#4) attached to expander port 9C
============
scsi 5:0:4:0: Failed to get diagnostic page 0x8000002
scsi 5:0:4:0: Failed to bind enclosure -19
# lsscsi
[4:0:0:0]    disk    ATA      OCZ-VERTEX       1.3   /dev/sda
[5:0:4:0]    enclosu HP       HP SAS EXP Card  2.08  -
============
With 4 drives (brown#4) attached to expander port 9C [a second time],
============
Kernel crash (text from screen OCR):
[ 35.957789] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.957806] CR2: 00007f6c3faf8000 CR3: 0000000001c12000 CR4: 00000000000407f0
[ 35.957826] Stack:
[ 35.957833] ffff88021b314400 ffff880200000000 0000000000000282 eb3377d73948ca01
[ 35.957860] ffff88020ed037b0 0000000000000046 ffff88020ec01e38 ffffffff811b06ae
[ 35.957885] ffff88021b314400 ffff88020d66ddc0 ffff88020ed037b0 ffff88020ed00000
[ 35.957912] Call Trace:
[ 35.957924] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 35.957944] [<ffffffffc05d14ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 35.957965] [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[ 35.957987] [<ffffffffc05d15a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 35.958008] [<ffffffffc05d2149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 35.958030] [<ffffffffc05b98ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 35.958052] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 35.958069] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
[ 35.958089] [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0
[ 35.958107] [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30
[ 35.958126] [<ffffffffc05b94b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 35.958148] [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460
[ 35.958166] [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20
[ 35.958185] [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0
[ 35.958205] [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0
[ 35.958223] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 35.958243] [<ffffffffc05b94b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 35.958265] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 35.958283] [<ffffffffc05b94b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 35.958305] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 35.958324] [<ffffffffc05b94b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 35.958346] [<ffffffff81540742>] ata_do_eh+0x52/0xc0
[ 35.958971] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 35.959600] [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80
[ 35.960231] [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160
[ 35.960861] [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0
[ 35.961475] [<ffffffffc05b92c5>] async_sas_ata_eh+0x55/0x90 [libsas]
[ 35.962071] [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140
[ 35.962652] [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0
[ 35.963218] [<ffffffff81776ba8>] ? maybe_create_worker+0xbb/0x1c5
[ 35.963775] [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0
[ 35.964319] [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80
[ 35.964858] [<ffffffff81094479>] kthread+0xc9/0xe0
[ 35.965385] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 35.965904] [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0
[ 35.966418] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 35.966932] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 35.968100] RIP [<ffffffffc05d05a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 35.968656] RSP <ffff8800d4b077c8>
============
# lspci -nn -s 01: -vv
01:00.0 RAID bus controller [0104]: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller [1b4b:9485] (rev 03)
	Subsystem: Marvell Technology Group Ltd. Device [1b4b:9480]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at f0540000 (64-bit, non-prefetchable) [size=128K]
	Region 2: Memory at f0500000 (64-bit, non-prefetchable) [size=256K]
	Expansion ROM at f0560000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Kernel driver in use: mvsas
============

A HighPoint Rocket 2720SGL controller (also based on a Marvell 9485 chip, as far as I know) ran with the same SAS expander, disk drives and power supply without errors or crashes, using the HighPoint 4.0.0.1528N driver (mv94xx.ko) on Debian 6.0.6 with kernel 2.6.32-46.
Comment 1 linux-ide 2014-08-08 08:19:37 UTC
After setting up netconsole following <https://wiki.ubuntu.com/Kernel/Netconsole> and enabling the kernel boot parameters debug and ignore_loglevel, more of the kernel crash log is available:
============
[   77.094783] mvsas 0000:01:00.0: mvsas: driver version 0.8.16
[   77.095405] mvsas 0000:01:00.0: mvsas: PCI-E x8, Bandwidth Usage: 5.0 Gbps
[   83.881049] scsi5 : mvsas
[   83.883157] sas: phy-5:4 added to port-5:0, phy_mask:0x1 (50014380182cf0e6)
[   83.883190] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 1
[   83.893532] sas: phy1 matched wide port0
[   83.893558] sas: phy-5:5 added to port-5:0, phy_mask:0x3 (50014380182cf0e6)
[   83.893580] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 3
[   83.913447] sas: phy2 matched wide port0
[   83.913468] sas: phy-5:6 added to port-5:0, phy_mask:0x7 (50014380182cf0e6)
[   83.913491] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 7
[   83.943257] sas: phy3 matched wide port0
[   83.943274] sas: phy-5:7 added to port-5:0, phy_mask:0xf (50014380182cf0e6)
[   83.943294] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map f
[   83.982994] sas: DOING DISCOVERY on port 0, pid:6
[   83.984660] sas: ex 50014380182cf0e6 phy00:D:0 attached: 0000000000000000 (no device)
[   83.985256] sas: ex 50014380182cf0e6 phy01:D:0 attached: 0000000000000000 (no device)
[   83.985851] sas: ex 50014380182cf0e6 phy02:D:0 attached: 0000000000000000 (no device)
[   83.986372] sas: ex 50014380182cf0e6 phy03:D:0 attached: 0000000000000000 (no device)
[   83.986933] sas: ex 50014380182cf0e6 phy04:D:0 attached: 0000000000000000 (no device)
[   83.987488] sas: ex 50014380182cf0e6 phy05:D:0 attached: 0000000000000000 (no device)
[   83.988086] sas: ex 50014380182cf0e6 phy06:D:0 attached: 0000000000000000 (no device)
[   83.988603] sas: ex 50014380182cf0e6 phy07:D:0 attached: 0000000000000000 (no device)
[   83.989197] sas: ex 50014380182cf0e6 phy08:D:0 attached: 0000000000000000 (no device)
[   83.989766] sas: ex 50014380182cf0e6 phy09:D:0 attached: 0000000000000000 (no device)
[   83.990300] sas: ex 50014380182cf0e6 phy10:D:0 attached: 0000000000000000 (no device)
[   83.990872] sas: ex 50014380182cf0e6 phy11:D:0 attached: 0000000000000000 (no device)
[   83.991401] sas: ex 50014380182cf0e6 phy12:D:0 attached: 0000000000000000 (no device)
[   83.991978] sas: ex 50014380182cf0e6 phy13:D:0 attached: 0000000000000000 (no device)
[   83.992515] sas: ex 50014380182cf0e6 phy14:D:0 attached: 0000000000000000 (no device)
[   83.993098] sas: ex 50014380182cf0e6 phy15:D:0 attached: 0000000000000000 (no device)
[   83.993625] sas: ex 50014380182cf0e6 phy16:D:0 attached: 0000000000000000 (no device)
[   83.994213] sas: ex 50014380182cf0e6 phy17:D:0 attached: 0000000000000000 (no device)
[   83.994785] sas: ex 50014380182cf0e6 phy18:D:0 attached: 0000000000000000 (no device)
[   83.995316] sas: ex 50014380182cf0e6 phy19:D:0 attached: 0000000000000000 (no device)
[   83.995890] sas: ex 50014380182cf0e6 phy20:D:0 attached: 0000000000000000 (no device)
[   83.996432] sas: ex 50014380182cf0e6 phy21:D:0 attached: 0000000000000000 (no device)
[   83.996998] sas: ex 50014380182cf0e6 phy22:D:0 attached: 0000000000000000 (no device)
[   83.997540] sas: ex 50014380182cf0e6 phy23:D:0 attached: 0000000000000000 (no device)
[   83.998189] sas: ex 50014380182cf0e6 phy24:U:A attached: 5005043011ab0000 (host)
[   83.998812] sas: ex 50014380182cf0e6 phy25:U:A attached: 5005043011ab0000 (host)
[   83.999386] sas: ex 50014380182cf0e6 phy26:U:A attached: 5005043011ab0000 (host)
[   84.000012] sas: ex 50014380182cf0e6 phy27:U:A attached: 5005043011ab0000 (host)
[   84.000575] sas: ex 50014380182cf0e6 phy28:S:0 attached: 0000000000000000 (no device)
[   84.001581] sas: ex 50014380182cf0e6 phy29:S:0 attached: 0000000000000000 (no device)
[   84.002561] sas: ex 50014380182cf0e6 phy30:S:0 attached: 0000000000000000 (no device)
[   84.003550] sas: ex 50014380182cf0e6 phy31:S:0 attached: 0000000000000000 (no device)
[   84.004573] sas: ex 50014380182cf0e6 phy32:S:9 attached: 50014380182cf0e0 (stp)
[   84.005580] sas: ex 50014380182cf0e6 phy33:S:9 attached: 50014380182cf0e1 (stp)
[   84.006543] sas: ex 50014380182cf0e6 phy34:S:9 attached: 50014380182cf0e2 (stp)
[   84.007442] sas: ex 50014380182cf0e6 phy35:S:9 attached: 50014380182cf0e3 (stp)
[   84.008136] sas: ex 50014380182cf0e6 phy36:D:A attached: 50014380182cf0e5 (host+target)
[   84.009969] sas: DONE DISCOVERY on port 0, pid:6, result:0
[   84.010274] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   84.010569] sas: ata6: end_device-5:0:32: dev error handler
[   84.010873] sas: ata7: end_device-5:0:33: dev error handler
[   84.011160] sas: ata8: end_device-5:0:34: dev error handler
[   84.011424] sas: ata9: end_device-5:0:35: dev error handler
[   84.164663] general protection fault: 0000 [#1] SMP
[   84.164897] Modules linked in: mvsas libsas scsi_transport_sas ppdev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm crct10dif_pclmul drm_kms_helper crc32_pclmul drm ghash_clmulni_intel cryptd i2c_algo_bit lpc_ich mei_me microcode mei serio_raw soc_button_array video parport_pc mac_hid netconsole configfs lp parport psmouse ahci libahci r8169 mii
[   84.165752] CPU: 0 PID: 1008 Comm: kworker/u4:5 Not tainted 3.16.0-031600rc6-generic #201407210035
[   84.166027] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81 Pro BTC, BIOS P1.50 02/14/2014
[   84.166325] Workqueue: events_unbound async_run_entry_fn
[   84.166630] task: ffff880036d5ef60 ti: ffff8800d4b34000 task.ti: ffff8800d4b34000
[   84.166953] RIP: 0010:[<ffffffffc028e5a0>]  [<ffffffffc028e5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[   84.167364] RSP: 0018:ffff8800d4b377c8  EFLAGS: 00010097
[   84.167714] RAX: 000000000000002c RBX: ffff88020f200000 RCX: dead000000200200
[   84.168078] RDX: ffff88020f2037b0 RSI: ffff88020f2255b8 RDI: ffff88020f200000
[   84.168451] RBP: ffff8800d4b37838 R08: 0000000000000000 R09: 0000000000001000
[   84.168834] R10: 0000000000000000 R11: ffff88020f2255b0 R12: ffff88020fbab640
[   84.169228] R13: ffff8800d4b37898 R14: ffff88021b4a0000 R15: ffff880036f19a00
[   84.169628] FS:  0000000000000000(0000) GS:ffff88021b200000(0000) knlGS:0000000000000000
[   84.170044] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.170467] CR2: 00007f0031fbf000 CR3: 0000000001c12000 CR4: 00000000000407f0
[   84.170907] Stack:
[   84.171345]  ffff88021b314400 ffff880200000000 0000000000000282 dead000000200200
[   84.171818]  ffff88020f2037b0 0000000000000046 ffff88020cd81e38 ffffffff811b06ae
[   84.172300]  ffff88021b314400 ffff88020fbab640 ffff88020f2037b0 ffff88020f200000
[   84.172791] Call Trace:
[   84.173280]  [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[   84.173785]  [<ffffffffc028f4ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[   84.174298]  [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[   84.174823]  [<ffffffffc028f5a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[   84.175358]  [<ffffffffc0290149>] mvs_queue_command+0x39/0x40 [mvsas]
[   84.175901]  [<ffffffffc02778ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[   84.176446]  [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[   84.176997]  [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
[   84.177554]  [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0
[   84.178113]  [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30
[   84.178673]  [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[   84.179245]  [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460
[   84.179825]  [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20
[   84.180409]  [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0
[   84.181002]  [<ffffffff810cd4d1>] ? vprintk_emit+0x1b1/0x560
[   84.181598]  [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0
[   84.182200]  [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[   84.182809]  [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[   84.183427]  [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[   84.184037]  [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[   84.184710]  [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[   84.185323]  [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[   84.185945]  [<ffffffff81540742>] ata_do_eh+0x52/0xc0
[   84.186574]  [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[   84.187213]  [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80
[   84.187850]  [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160
[   84.188473]  [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0
[   84.189081]  [<ffffffffc02772c5>] async_sas_ata_eh+0x55/0x90 [libsas]
[   84.189673]  [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140
[   84.190248]  [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0
[   84.190812]  [<ffffffff81776ba8>] ? maybe_create_worker+0xbb/0x1c5
[   84.191364]  [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0
[   84.191910]  [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80
[   84.192446]  [<ffffffff81094479>] kthread+0xc9/0xe0
[   84.192971]  [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[   84.193495]  [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0
[   84.194015]  [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[   84.194534] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[   84.195858] RIP  [<ffffffffc028e5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[   84.196412]  RSP <ffff8800d4b377c8>
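
As a reading aid for the oops above, the bytes at the <8b> marker in the Code: line can be decoded by hand and combined with the RCX value from the register dump. The sketch below handles only this one instruction form (opcode 0x8b with a 32-bit displacement and no SIB byte) and assumes 48-bit virtual addressing; the helper names are made up for illustration. RCX holds dead000000200200, which appears to match the list-poison value (LIST_POISON2) used by x86-64 kernels of this era, so this would be consistent with the driver following a pointer taken from an already deleted list entry. That interpretation is an inference, not something confirmed in this report.

```python
# Register value and faulting bytes copied from the oops above.
RCX = 0xdead000000200200
FAULTING = bytes.fromhex("8b 89 54 02 00 00")  # bytes starting at the <8b> marker

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp32(insn):
    """Decode 'mov r32, [base + disp32]' (opcode 0x8b, ModRM mod=10, no SIB)."""
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("not the simple form handled by this sketch")
    disp = int.from_bytes(insn[2:6], "little", signed=True)
    return reg, rm, disp


def is_canonical(addr):
    """With 48-bit virtual addresses, bits 63..47 must all be equal."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


reg, rm, disp = decode_mov_disp32(FAULTING)
addr = (RCX + disp) & 0xFFFFFFFFFFFFFFFF
print(f"mov {hex(disp)}(%{REG64[rm]}),%{REG32[reg]}")
print(hex(addr), "canonical" if is_canonical(addr) else "non-canonical")
```

The decoded instruction is a 32-bit load from RCX + 0x254. With RCX = dead000000200200 the effective address is non-canonical, which on x86-64 raises a general protection fault rather than a page fault. The port 7C trace shows CR2: 0000000000000254, which would fit the same load being attempted through a NULL pointer.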
Comment 2 linux-ide 2014-08-08 08:34:10 UTC
Created attachment 145681 [details]
Dmesg output from boot
Comment 3 linux-ide 2014-08-12 20:09:15 UTC
Because Ubuntu doesn't provide debug symbols for their mainline kernel builds <http://comments.gmane.org/gmane.linux.ubuntu.devel.kernel.general/40661>, I am reverting to their kernel version 3.13.0-24.46.

That results in a kernel crash on port 8C: 
BUG: unable to handle kernel NULL pointer dereference at 0000000000000255

Full output:
[   25.212661] mvsas 0000:01:00.0: mvsas: driver version 0.8.16
[   25.212703] mvsas 0000:01:00.0: enabling device (0000 -> 0002)
[   25.213249] mvsas 0000:01:00.0: mvsas: PCI-E x8, Bandwidth Usage: 5.0 Gbps
[   31.994771] scsi5 : mvsas
[   31.995530] sas: phy-5:0 added to port-5:0, phy_mask:0x1 (50014380182cf0e6)
[   31.995564] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 1
[   32.005672] sas: phy1 matched wide port0
[   32.005695] sas: phy-5:1 added to port-5:0, phy_mask:0x3 (50014380182cf0e6)
[   32.005720] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 3
[   32.025591] sas: phy2 matched wide port0
[   32.025611] sas: phy-5:2 added to port-5:0, phy_mask:0x7 (50014380182cf0e6)
[   32.025635] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 7
[   32.055410] sas: phy3 matched wide port0
[   32.055427] sas: phy-5:3 added to port-5:0, phy_mask:0xf (50014380182cf0e6)
[   32.055452] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map f
[   32.095144] sas: DOING DISCOVERY on port 0, pid:127
[   32.096843] sas: ex 50014380182cf0e6 phy00:D:0 attached: 0000000000000000 (no device)
[   32.097408] sas: ex 50014380182cf0e6 phy01:D:0 attached: 0000000000000000 (no device)
[   32.097917] sas: ex 50014380182cf0e6 phy02:D:0 attached: 0000000000000000 (no device)
[   32.098503] sas: ex 50014380182cf0e6 phy03:D:0 attached: 0000000000000000 (no device)
[   32.099044] sas: ex 50014380182cf0e6 phy04:D:0 attached: 0000000000000000 (no device)
[   32.099628] sas: ex 50014380182cf0e6 phy05:D:0 attached: 0000000000000000 (no device)
[   32.100205] sas: ex 50014380182cf0e6 phy06:D:0 attached: 0000000000000000 (no device)
[   32.100739] sas: ex 50014380182cf0e6 phy07:D:0 attached: 0000000000000000 (no device)
[   32.101310] sas: ex 50014380182cf0e6 phy08:D:0 attached: 0000000000000000 (no device)
[   32.101840] sas: ex 50014380182cf0e6 phy09:D:0 attached: 0000000000000000 (no device)
[   32.102412] sas: ex 50014380182cf0e6 phy10:D:0 attached: 0000000000000000 (no device)
[   32.102959] sas: ex 50014380182cf0e6 phy11:D:0 attached: 0000000000000000 (no device)
[   32.103545] sas: ex 50014380182cf0e6 phy12:D:0 attached: 0000000000000000 (no device)
[   32.104128] sas: ex 50014380182cf0e6 phy13:D:0 attached: 0000000000000000 (no device)
[   32.104661] sas: ex 50014380182cf0e6 phy14:D:0 attached: 0000000000000000 (no device)
[   32.105273] sas: ex 50014380182cf0e6 phy15:D:0 attached: 0000000000000000 (no device)
[   32.105781] sas: ex 50014380182cf0e6 phy16:D:0 attached: 0000000000000000 (no device)
[   32.106385] sas: ex 50014380182cf0e6 phy17:D:0 attached: 0000000000000000 (no device)
[   32.106904] sas: ex 50014380182cf0e6 phy18:D:0 attached: 0000000000000000 (no device)
[   32.107486] sas: ex 50014380182cf0e6 phy19:D:0 attached: 0000000000000000 (no device)
[   32.108020] sas: ex 50014380182cf0e6 phy20:D:0 attached: 0000000000000000 (no device)
[   32.108605] sas: ex 50014380182cf0e6 phy21:D:0 attached: 0000000000000000 (no device)
[   32.109183] sas: ex 50014380182cf0e6 phy22:D:0 attached: 0000000000000000 (no device)
[   32.109714] sas: ex 50014380182cf0e6 phy23:D:0 attached: 0000000000000000 (no device)
[   32.110357] sas: ex 50014380182cf0e6 phy24:U:A attached: 5005043011ab0000 (host)
[   32.110929] sas: ex 50014380182cf0e6 phy25:U:A attached: 5005043011ab0000 (host)
[   32.111558] sas: ex 50014380182cf0e6 phy26:U:A attached: 5005043011ab0000 (host)
[   32.112181] sas: ex 50014380182cf0e6 phy27:U:A attached: 5005043011ab0000 (host)
[   32.112774] sas: ex 50014380182cf0e6 phy28:S:9 attached: 50014380182cf0dc (stp)
[   32.113366] sas: ex 50014380182cf0e6 phy29:S:9 attached: 50014380182cf0dd (stp)
[   32.113934] sas: ex 50014380182cf0e6 phy30:S:9 attached: 50014380182cf0de (stp)
[   32.114557] sas: ex 50014380182cf0e6 phy31:S:9 attached: 50014380182cf0df (stp)
[   32.115138] sas: ex 50014380182cf0e6 phy32:S:0 attached: 0000000000000000 (no device)
[   32.115654] sas: ex 50014380182cf0e6 phy33:S:0 attached: 0000000000000000 (no device)
[   32.116198] sas: ex 50014380182cf0e6 phy34:S:0 attached: 0000000000000000 (no device)
[   32.116711] sas: ex 50014380182cf0e6 phy35:S:0 attached: 0000000000000000 (no device)
[   32.117003] sas: ex 50014380182cf0e6 phy36:D:A attached: 50014380182cf0e5 (host+target)
[   32.118398] sas: DONE DISCOVERY on port 0, pid:127, result:0
[   32.118435] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   32.118465] sas: ata6: end_device-5:0:28: dev error handler
[   32.119140] sas: ata7: end_device-5:0:29: dev error handler
[   32.119333] sas: ata8: end_device-5:0:30: dev error handler
[   32.119368] sas: ata9: end_device-5:0:31: dev error handler
[   32.271218] BUG: unable to handle kernel NULL pointer dereference at 0000000000000255
[   32.271791] IP: [<ffffffffa02d381e>] mvs_task_prep+0x72e/0xd50 [mvsas]
[   32.272365] PGD 0
[   32.272928] Oops: 0000 [#1] SMP
[   32.273480] Modules linked in: mvsas libsas scsi_transport_sas hid_generic usbhid hid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd i915 drm_kms_helper serio_raw lpc_ich mei_me mei drm i2c_algo_bit netconsole configfs lp parport video mac_hid psmouse ahci libahci r8169 mii
[   32.275388] CPU: 0 PID: 54 Comm: kworker/u4:1 Not tainted 3.13.0-24-generic #47-Ubuntu
[   32.276028] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81 Pro BTC, BIOS P1.80 07/21/2014
[   32.276745] Workqueue: events_unbound async_run_entry_fn
[   32.277389] task: ffff88020fe6afe0 ti: ffff8802136aa000 task.ti: ffff8802136aa000
[   32.278032] RIP: 0010:[<ffffffffa02d381e>]  [<ffffffffa02d381e>] mvs_task_prep+0x72e/0xd50 [mvsas]
[   32.278691] RSP: 0018:ffff8802136ab8c0  EFLAGS: 00010097
[   32.279337] RAX: 000000000000002c RBX: 0000000000000001 RCX: 0000000000000000
[   32.279980] RDX: 0000000000000000 RSI: ffff8800d8c255b8 RDI: ffff8800d8c00000
[   32.280619] RBP: ffff8802136ab958 R08: ffff8800d8c03618 R09: ffff8800363a0000
[   32.281246] R10: ffff880212977600 R11: 0000000000000000 R12: ffff8800d8c00000
[   32.281861] R13: 0000000000000000 R14: ffff8800d8c03618 R15: ffff88020f8dedc0
[   32.282474] FS:  0000000000000000(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000
[   32.283082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   32.283679] CR2: 0000000000000255 CR3: 0000000002c0e000 CR4: 00000000000407f0
[   32.284278] Stack:
[   32.284880]  ffff88020fe6afe0 ffff880212977200 ffff8802136ab8e0 ffffffff81719ee9
[   32.285520]  ffff880212977600 ffff8800363a0000 ffff8800d8c03618 ffff8800d8c255b0
[   32.286167]  ffff8800d8c02678 0000000000000000 00000001d8c00008 ffff8800d8c255b8
[   32.286821] Call Trace:
[   32.287473]  [<ffffffff81719ee9>] ? schedule+0x29/0x70
[   32.288144]  [<ffffffffa02d3e9d>] mvs_task_exec.isra.13+0x5d/0xe0 [mvsas]
[   32.288832]  [<ffffffffa02d49dc>] mvs_queue_command+0x30c/0x320 [mvsas]
[   32.289530]  [<ffffffff811a013f>] ? kmem_cache_free+0xef/0x120
[   32.290232]  [<ffffffff8119f692>] ? kmem_cache_alloc+0x132/0x140
[   32.290942]  [<ffffffffa028601d>] ? sas_alloc_task+0x1d/0x40 [libsas]
[   32.291662]  [<ffffffffa028fcab>] sas_ata_qc_issue+0x24b/0x290 [libsas]
[   32.292392]  [<ffffffff814f7762>] ata_qc_issue+0x172/0x380
[   32.293128]  [<ffffffff814f7c23>] ata_exec_internal_sg+0x2b3/0x570
[   32.293875]  [<ffffffff814f7f3a>] ata_exec_internal+0x5a/0xa0
[   32.294624]  [<ffffffff814f8334>] ata_dev_read_id+0x274/0x550
[   32.295380]  [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[   32.296148]  [<ffffffff81505bab>] ata_eh_recover+0x74b/0x1310
[   32.296923]  [<ffffffff810bcfe8>] ? console_unlock+0x208/0x400
[   32.297707]  [<ffffffff814facd0>] ? ata_phys_link_online+0x30/0x30
[   32.298503]  [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[   32.299367]  [<ffffffff814fae50>] ? ata_phys_link_offline+0x30/0x30
[   32.300179]  [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[   32.301001]  [<ffffffff814fae50>] ? ata_phys_link_offline+0x30/0x30
[   32.301826]  [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[   32.302661]  [<ffffffff81507299>] ata_do_eh+0x49/0xc0
[   32.303503]  [<ffffffff814facd0>] ? ata_phys_link_online+0x30/0x30
[   32.304357]  [<ffffffff8150734e>] ata_std_error_handler+0x3e/0x80
[   32.305215]  [<ffffffff81506dba>] ata_scsi_port_error_handler+0x56a/0x940
[   32.306086]  [<ffffffffa02900aa>] async_sas_ata_eh+0x4a/0x80 [libsas]
[   32.306963]  [<ffffffff81091517>] async_run_entry_fn+0x37/0x130
[   32.307849]  [<ffffffff810838a2>] process_one_work+0x182/0x450
[   32.308735]  [<ffffffff81084641>] worker_thread+0x121/0x410
[   32.309629]  [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
[   32.310530]  [<ffffffff8108b312>] kthread+0xd2/0xf0
[   32.311437]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[   32.312351]  [<ffffffff817263fc>] ret_from_fork+0x7c/0xb0
[   32.313255]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[   32.314160] Code: 63 92 a0 02 00 00 41 80 b8 84 00 00 00 7f 48 8b 80 58 01 00 00 48 8b 1c d0 0f 84 a0 05 00 00 41 8b 44 24 58 48 8b 75 c0 89 46 1c <8b> 8b 54 02 00 00 be 00 10 00 00 41 8b 54 24 58 49 8b 44 24 48
[   32.316308] RIP  [<ffffffffa02d381e>] mvs_task_prep+0x72e/0xd50 [mvsas]
[   32.317292]  RSP <ffff8802136ab8c0>
[   32.318278] CR2: 0000000000000255
Comment 4 linux-ide 2014-08-12 22:02:09 UTC
Trying to debug mvs_task_prep with the help of the tutorial at <http://www.opensourceforu.com/2011/01/understanding-a-kernel-oops/>.

# cat /sys/module/mvsas/sections/.init.text
0xffffffffa00c8000

# cd /lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas

# gdb mvsas.ko

(gdb) add-symbol-file /usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko 0xffffffffa00c8000

(gdb) disassemble mvs_task_prep

Hex to decimal: 0x72e = <+1838>

0xffffffffa00ca81e <+1838>:	mov    0x254(%rbx),%ecx

Thanks to the trick from <https://blogs.oracle.com/ksplice/entry/8_gdb_tricks_you_should>
(gdb) set substitute-path /build/buildd /home/user/src

(gdb) list *0xffffffffa00ca81e
0xffffffffa00ca81e is in mvs_task_prep (/build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c:471).
Line number 466 out of range; /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c has 306 lines.

I guess my gdb version 7.7 has a line counting bug according to <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730630>

A manual approach using <http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-trusty.git;a=blob;f=drivers/scsi/mvsas/mv_sas.c;h=6c1f223a8e1d335fa7c86a374e470e666e848906;hb=HEAD>:

467         slot = &mvi->slot_info[tag];
468         slot->tx = mvi->tx_prod;
469         del_q = TXQ_MODE_I | tag |
470                 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471                 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472                 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
473         mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

This suggests that "(MVS_PHY_ID << TXQ_PHY_SHIFT)" is the offending code.

How should that be patched?
Comment 5 Alan 2014-08-21 18:35:45 UTC
That's not a sensible resolution; it can't be faulting on that line.
Comment 6 linux-ide 2014-08-22 12:13:42 UTC
When connecting just a single 4 drive group to the good ports (for example 2C) of the external PCIe expander card:
cold boot = doesn't detect any of the 4 PUIS drives
warm boot = does detect all 4 PUIS drives

When powering up using the warm boot method, neither smartctl nor sg_ses seems to report any errors.

However, this cold boot issue might be separate from the kernel crash. According to the debug messages, a "Set Features" (0xEF) command is sent first. My guess is that it issues subcommand 0x07: spin up media.

The "Identify Device" (0xEC) command is sent later.


If I read the Hitachi specification correctly, the spin-up (Set Features) should be sent after "Identify Device". For this Hitachi HDS5C3020BLE630, Identify Device (# sg_sat_identify -v /dev/sdb) word 2 reads "738c" (hex), which the HGST specification (page 127) describes as "Need Set Feature for spin-up after power-up Identify Device is complete".

Is there a boot parameter (or similar way) to load the mvsas driver without sending the "Set Features" (0xEF) command?
Comment 7 linux-ide 2014-08-22 12:16:07 UTC
Created attachment 147751 [details]
smartctl -a /dev/sdb (HDS5C3020BLE630)
Comment 8 linux-ide 2014-08-22 12:17:30 UTC
Comment on attachment 145681 [details]
Dmesg output from boot

This is without loading the mvsas kernel module.
Comment 9 linux-ide 2014-08-22 13:19:17 UTC
re: Thats not a sensible resolution, it can't be faulting on that line.

Another try with a newer version of the gdb-minimal package (Ubuntu 7.7-0ubuntu3.2 from trusty-proposed) gives identical results: address <+1838> maps to line 471 in mv_sas.c, which points to "(MVS_PHY_ID << TXQ_PHY_SHIFT) |".

# cat /sys/module/mvsas/sections/.init.text
0xffffffffa01c2000

(gdb) add-symbol-file /usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko 0xffffffffa01c2000
add symbol table from file "/usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko" at
	.text_addr = 0xffffffffa01c2000

   0xffffffffa01c481e <+1838>:	mov    0x254(%rbx),%ecx

(gdb) list *0xffffffffa01c481e
0xffffffffa01c481e is in mvs_task_prep (/build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c:471).
466		}
467		slot = &mvi->slot_info[tag];
468		slot->tx = mvi->tx_prod;
469		del_q = TXQ_MODE_I | tag |
470			(TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471			(MVS_PHY_ID << TXQ_PHY_SHIFT) |
472			(mvi_dev->taskfileset << TXQ_SRS_SHIFT);
473		mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);
474
475		if (task->data_dir == DMA_FROM_DEVICE)
Comment 10 linux-ide 2014-08-22 14:05:25 UTC
Another test round to see whether the crash differs between cold and warm boot:
5C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
5C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
6C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
6C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
7C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
7C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
8C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
8C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
9C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
9C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]

In cases 6C, 7C and 9C the r8169 NIC doesn't come up after the first automatic reboot after cold boot ("Waiting for network configuration..." and "Waiting up to 60 more seconds for network configuration..."), but does come up after the second automatic reboot after cold boot [reproducible=yes].
Comment 11 linux-ide 2014-08-22 17:00:19 UTC
Created attachment 147771 [details]
sg_ses PCIe port expander card output
Comment 12 Alan 2014-08-22 17:36:22 UTC
   0xffffffffa01c481e <+1838>:	mov    0x254(%rbx),%ecx

is loading from an offset relative to something. It can't be line 471.

It could be line 472, or it could be 468, but the offset looks way too big to be either unless it's been optimised somewhat. The line mapping is not always entirely accurate.

At this point what might be useful is to add lines between them and rebuild, i.e.



                printk("[");
467		slot = &mvi->slot_info[tag];
                printk("%d ", tag);
468		slot->tx = mvi->tx_prod;
                printk("%p ",  slot);
469		del_q = TXQ_MODE_I | tag |
470			(TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471			(MVS_PHY_ID << TXQ_PHY_SHIFT) |
472			(mvi_dev->taskfileset << TXQ_SRS_SHIFT);
                printk("%d", mvi->tx_prod]);
473		mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);
                printk("]\n");

and try again. When it dies, just before the oops you should have lines of the form

[num num num]

the final one of which is incomplete. Where it ends tells us where it died and the values may even give
us a guess at why. If the final [ .. ] sequence is complete then it crashed somewhere else in the routine and gdb is confused.
Comment 13 linux-ide 2014-08-23 20:04:10 UTC
It dies between printing the second and the third variable:

[   30.455440] sas: DONE DISCOVERY on port 0, pid:128, result:0
[   30.455502] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   30.455534] sas: ata6: end_device-5:0:20: dev error handler
[   30.455744] sas: ata7: end_device-5:0:21: dev error handler
[   30.456186] sas: ata8: end_device-5:0:22: dev error handler
[   30.456367] sas: ata9: end_device-5:0:23: dev error handler
[   30.611146] [0 ffff8800d8e255b8 44]
[   30.611959] [0 ffff8800d8e255b8 46]
[   30.612511] [2 ffff8800d8e25668
[   30.612537] BUG: unable to handle kernel NULL pointer dereference at 0000000000000255
[   30.613511] IP: [<ffffffffa022c872>] mvs_task_prep+0x782/0xdd0 [mvsas]
[   30.614003] PGD 0
[   30.614486] Oops: 0000 [#1] SMP
[   30.614967] Modules linked in: mvsas(OF) libsas scsi_transport_sas x86_pkg_temp_thermal intel_powerclamp hid_generic coretemp usbhid kvm_intel i915 kvm hid crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel cryptd drm netconsole configfs i2c_algo_bit serio_raw mei_me lpc_ich mei lp video mac_hid parport psmouse r8169 mii ahci libahci
[   30.616702] CPU: 0 PID: 6 Comm: kworker/u4:0 Tainted: GF          O 3.13.0-35-generic #62-Ubuntu
[   30.617279] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81 Pro BTC, BIOS P1.80 07/21/2014
[   30.617853] Workqueue: events_unbound async_run_entry_fn
[   30.618426] task: ffff8802139b0000 ti: ffff8802139ae000 task.ti: ffff8802139ae000
[   30.619007] RIP: 0010:[<ffffffffa022c872>]  [<ffffffffa022c872>] mvs_task_prep+0x782/0xdd0 [mvsas]
[   30.619604] RSP: 0018:ffff8802139af8c0  EFLAGS: 00010096
[   30.620188] RAX: ffff8800d8e03618 RBX: 0000000000000002 RCX: 0000000000002ace
[   30.620779] RDX: 00000000000064e6 RSI: 0000000000000046 RDI: 0000000000000046
[   30.621363] RBP: ffff8802139af958 R08: 0000000000000086 R09: 0000000000000426
[   30.621941] R10: ffff880213bf4098 R11: 0000000000000001 R12: 0000000000000001
[   30.622508] R13: ffff8800d8e00000 R14: ffff8800d8e03618 R15: ffff88007f912500
[   30.623068] FS:  0000000000000000(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000
[   30.623649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.624238] CR2: 0000000000000255 CR3: 0000000002c0e000 CR4: 00000000000407f0
[   30.624844] Stack:
[   30.625450]  ffffffff8109d415 ffff88021f314440 ffff88021f314440 ffff88021f314440
[   30.626097]  ffff88020f97064c ffff8800d8e01e38 ffff880211d6fe00 ffff8800d8e25660
[   30.626752]  ffff88007f740080 ffff8800d8e02678 0000000181098129 ffff8800d8e25668
[   30.627413] Call Trace:
[   30.628072]  [<ffffffff8109d415>] ? sched_clock_cpu+0xb5/0x100
[   30.628753]  [<ffffffffa022cf1d>] mvs_task_exec.isra.13+0x5d/0xe0 [mvsas]
[   30.629450]  [<ffffffffa022da5c>] mvs_queue_command+0x30c/0x320 [mvsas]
[   30.630155]  [<ffffffff811a2362>] ? kmem_cache_alloc+0x1b2/0x1e0
[   30.630867]  [<ffffffffa020c787>] ? sas_free_task+0x37/0x40 [libsas]
[   30.631593]  [<ffffffffa0215cab>] sas_ata_qc_issue+0x24b/0x290 [libsas]
[   30.632326]  [<ffffffff814fe742>] ata_qc_issue+0x172/0x380
[   30.633067]  [<ffffffff814fec03>] ata_exec_internal_sg+0x2b3/0x570
[   30.633817]  [<ffffffff814fef1a>] ata_exec_internal+0x5a/0xa0
[   30.634570]  [<ffffffff814ff314>] ata_dev_read_id+0x274/0x550
[   30.635332]  [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[   30.636166]  [<ffffffff8150cbab>] ata_eh_recover+0x74b/0x1310
[   30.636938]  [<ffffffff81501cb0>] ? ata_phys_link_online+0x30/0x30
[   30.637721]  [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[   30.638512]  [<ffffffff81501e30>] ? ata_phys_link_offline+0x30/0x30
[   30.639314]  [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[   30.640119]  [<ffffffff81501e30>] ? ata_phys_link_offline+0x30/0x30
[   30.640933]  [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[   30.641758]  [<ffffffff8150e299>] ata_do_eh+0x49/0xc0
[   30.642588]  [<ffffffff81501cb0>] ? ata_phys_link_online+0x30/0x30
[   30.643425]  [<ffffffff8150e34e>] ata_std_error_handler+0x3e/0x80
[   30.644271]  [<ffffffff8150ddba>] ata_scsi_port_error_handler+0x56a/0x940
[   30.645128]  [<ffffffffa02160aa>] async_sas_ata_eh+0x4a/0x80 [libsas]
[   30.645996]  [<ffffffff81091657>] async_run_entry_fn+0x37/0x130
[   30.646871]  [<ffffffff810839d2>] process_one_work+0x182/0x450
[   30.647750]  [<ffffffff810847c1>] worker_thread+0x121/0x410
[   30.648638]  [<ffffffff810846a0>] ? rescuer_thread+0x430/0x430
[   30.649534]  [<ffffffff8108b4a2>] kthread+0xd2/0xf0
[   30.650429]  [<ffffffff8108b3d0>] ? kthread_create_on_node+0x1c0/0x1c0
[   30.651321]  [<ffffffff8172ecbc>] ret_from_fork+0x7c/0xb0
[   30.652211]  [<ffffffff8108b3d0>] ? kthread_create_on_node+0x1c0/0x1c0
[   30.653102] Code: 03 47 23 a0 31 c0 e8 62 b7 4e e1 48 8b 4d c0 41 8b 45 58 48 c7 c7 07 47 23 a0 89 41 1c 48 89 ce 31 c0 e8 46 b7 4e e1 48 8b 45 d0 <41> 8b 8c 24 54 02 00 00 41 bc 00 10 00 00 41 8b 75 58 48 c7 c7
[   30.655215] RIP  [<ffffffffa022c872>] mvs_task_prep+0x782/0xdd0 [mvsas]
[   30.656195]  RSP <ffff8802139af8c0>
[   30.657163] CR2: 0000000000000255
Comment 14 linux-ide 2014-08-23 20:06:31 UTC
By the way:

printk("%d", mvi->tx_prod]);

was changed to:

printk("%d", mvi->tx_prod);

The square bracket after tx_prod was removed.
Comment 15 linux-ide 2014-08-23 22:12:23 UTC
Created attachment 147881 [details]
Ubuntu Linux/x86_64 3.13.0-35-generic Kernel Configuration

This kernel configuration was used to build both the patched and unpatched mvsas.ko
Comment 16 linux-ide 2014-09-23 21:56:05 UTC
When line-by-line dumping the called constants/vars from:
469		del_q = TXQ_MODE_I | tag |
470			(TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471			(MVS_PHY_ID << TXQ_PHY_SHIFT) |
472			(mvi_dev->taskfileset << TXQ_SRS_SHIFT);

using the prepended statements:
        printk("slot=%p ", slot);
        printk(KERN_INFO "TXQ_MODE_I=%d ", TXQ_MODE_I);
        printk(KERN_INFO "tag=%d ", tag);
        printk(KERN_INFO "TXQ_CMD_STP=%d ", TXQ_CMD_STP);
        printk(KERN_INFO "TXQ_CMD_SHIFT=%d ", TXQ_CMD_SHIFT);
        printk(KERN_INFO "MVS_PHY_ID=%d ", MVS_PHY_ID);
        printk(KERN_INFO "TXQ_PHY_SHIFT=%d ", TXQ_PHY_SHIFT);
        del_q = TXQ_MODE_I | tag |
                (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
                (MVS_PHY_ID << TXQ_PHY_SHIFT) |
                (mvi_dev->taskfileset << TXQ_SRS_SHIFT);

the kernel crash occurs after printing "TXQ_CMD_SHIFT", i.e. when trying to output the value of "MVS_PHY_ID":
[  529.113152] sas: DONE DISCOVERY on port 0, pid:133, result:0
[  529.114313] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[  529.115460] sas: ata7: end_device-6:0:28: dev error handler
[  529.115522] sas: ata8: end_device-6:0:29: dev error handler
[  529.118706] sas: ata9: end_device-6:0:30: dev error handler
[  529.119840] sas: ata10: end_device-6:0:31: dev error handler
[  529.271634] [mvi=ffff8800d3680000, mvi_dev=ffff8800d36836a0 tag=0 slot=ffff8800d36a55b8
[  529.271753] TXQ_MODE_I=268435456 tag=0
[  529.272679] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[  529.273618] MVS_PHY_ID=32768 TXQ_PHY_SHIFT=12 tx_prod=44]
[  529.276091] [mvi=ffff8800d3680000, mvi_dev=ffff8800d3683618 tag=1 slot=ffff8800d36a5610
[  529.276207] TXQ_MODE_I=268435456 tag=1
[  529.277095] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[  529.278038] MVS_PHY_ID=1 TXQ_PHY_SHIFT=12 tx_prod=46]
[  529.280271] [mvi=ffff8800d3680000, mvi_dev=ffff8800d3683618 tag=1 slot=ffff8800d36a5610
[  529.280385] TXQ_MODE_I=268435456 tag=1
[  529.281445] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[  529.282562] MVS_PHY_ID=1 TXQ_PHY_SHIFT=12 tx_prod=48]
[  529.284894] [mvi=ffff8800d3680000, mvi_dev=ffff8800d36837b0 tag=2 slot=ffff8800d36a5668
[  529.285010] TXQ_MODE_I=268435456 tag=2
[  529.286248] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[  529.287555] BUG: unable to handle kernel NULL pointer dereference at 0000000000000257
[  529.290225] IP: [<ffffffffa02888bb>] mvs_task_prep+0x7cb/0xe50 [mvsas]
[  529.291686] PGD 0
[  529.293141] Oops: 0000 [#1] SMP
[  529.294630] Modules linked in: mvsas(OF) libsas scsi_transport_sas x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd serio_raw lpc_ich i915 mei_me mei drm_kms_helper video netconsole drm configfs mac_hid i2c_algo_bit psmouse r8169 ahci mii libahci

Any suggestions why accessing "MVS_PHY_ID" leads to the kernel NULL pointer dereference oops?
Comment 17 Leon Woestenberg 2014-09-26 07:04:54 UTC
With TXQ_PHY_SHIFT being 12 and TXQ_CMD_SHIFT being 29, it seems the one-hot PHY encoding occupies bits 12 through 27 inclusive, since bit 28 is TXQ_MODE_I (268435456 = 1 << 28 in the output above).

I.e. 16 bits, or 16 PHY IDs, are supported.

The register transmitted to the controller seems to be a fixed 32-bit register, so this looks like a hardware limitation rather than a software driver limitation.

469		del_q = TXQ_MODE_I | tag |
470			(TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471			(MVS_PHY_ID << TXQ_PHY_SHIFT) |
472			(mvi_dev->taskfileset << TXQ_SRS_SHIFT);
                printk("%d", mvi->tx_prod]);
473		mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

Remaining question: how is this supposed to fly with port expanders where PHY ID's get >16?


An extensive debug report arrived by e-mail from Rob Elliott (HP Server Storage), thanks! I copied it verbatim:

---
1. Although MVS_PHY_ID looks like a constant, it's really not:
#define MVS_PHY_ID (1U << sas_phy->id)

2. This fault:
[   32.271218] BUG: unable to handle kernel NULL pointer dereference at 0000000000000255
(although 255 looks like a decimal number 0xff, it's really hex 0x255)

at this line:
  0xffffffffa01c481e <+1838>:	mov    0x254(%rbx),%ecx

implies that rbx contains 1, so 0x254 + 1 = 0x255.

3. pahole drivers/scsi/mvsas/mv_sas.o
shows there are two structures with fields at offset 596:
* asd_sas_phy.id
* asd_sas_port.sas_addr[8]

4. objdump -drS drivers/scsi/mvsas/mv_sas.o
shows only a few lines with 0x254(%something), one of which
is the del_q line you've identified:

mvs_task_prep_ata(struct mvs_info *mvi, struct mvs_task_exec_info *tei):
       struct sas_ha_struct *sha = mvi->sas;
       struct sas_task *task = tei->task;
       struct domain_device *dev = task->dev;
       struct sas_phy *sphy = dev->phy;
       struct asd_sas_phy *sas_phy = sha->sas_phy[sphy->number];

       ...
       del_q = TXQ_MODE_I | tag |
               (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
               (MVS_PHY_ID << TXQ_PHY_SHIFT) |
               (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
       mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

MVS_PHY_ID =
(1U << sas_phy->id), where sas_phy->id =
sha->sas_phy[sphy->number]->id =
mvi->sas->sas_phy[dev->phy->number]->id =
mvi->sas->sas_phy[task->dev->phy->number]->id =
mvi->sas->sas_phy[tei->task->dev->phy->number]->id

Looking at the offsets reported by pahole, that means:
%rdi->56->344[%rsi->0->0->56->688]->596 (0x254)

mvi->sas->sas_phy is a pointer to a pointer:
struct sas_ha_struct {
...
       struct asd_sas_phy * *     sas_phy;              /*   344     8 */

You might look for somewhere that could accidentally
be setting sas_phy[something] to a for loop index,
with a typecast hiding the problem from the compiler.
Or, the phy->number value being passed might be
out of range; if there were discovery errors, something
might not have been initialized like this function expects.


Rob Elliott    HP Server Storage
---
Comment 18 linux-ide 2014-10-19 15:56:21 UTC
Even after flashing the SAS2LP-MV8 firmware from version 4.0.0.1800 to version 4.0.0.1812, the mvs_task_prep_ata+0x80/0x3a0 [mvsas] kernel oops persists on these kernels:

1. "Linux ubuntu25 3.17.1-031701-generic #201410150735 SMP Wed Oct 15 11:36:31 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux" and
2. "Linux ubuntu25 3.17.0-999-generic #201410182205 SMP Sun Oct 19 02:06:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux"
Comment 19 Christian Vilhelm 2014-12-17 19:49:06 UTC
The problem was introduced by this patch:
commit 7c237c5f6d5c62724ccd82aecdcd1fd9bd71dc75
Author: Xiangliang Yu <yuxiangl@marvell.com>
Date:   Wed Jan 30 00:25:53 2013 +0800

    [SCSI] mvsas: fixed timeout issue when removing module


The offending line:
    (MVS_PHY_ID << TXQ_PHY_SHIFT)
previously read:
    (sas_port->phy_mask << TXQ_PHY_SHIFT)

Reverting the patch corrects the problem for me (kernel 3.18.1).
Comment 20 Nathan R 2015-01-26 23:19:03 UTC
There seem to be various issues with this driver. After reverting that commit, I can load the driver, but insmod takes a long time to return. One time it took about 3 minutes; the other times I gave up waiting and rebooted after 5 minutes. I'm using an Areca ARC-1320, so I've ended up downgrading my kernel and using the proprietary driver just so that it works.
Comment 21 Nathan R 2015-01-26 23:20:32 UTC
(In reply to Nathan R from comment #20)
> There seems to be various issues with this driver. After reverting that
> commit, I can load the driver, but insmod takes a long time to return. One
> time was about 3mins, the other times I gave up waiting and rebooted after
> 5mins. I'm using an Areca ARC-1320, so I've ended up downgrading my kernel
> and using the proprietary driver just so that it works.

I should add that this was with kernel 3.18.3; I'll attach the dmesg section from insmod.
Comment 22 Nathan R 2015-01-26 23:21:21 UTC
Created attachment 164841 [details]
Patched mvsas dmesg in kernel 3.18.3
Comment 23 linux-ide 2015-04-23 11:31:26 UTC
A possible patch was posted on the linux-scsi mailing list; it has been tested and fixes another occurrence of the mvsas port expander mvs_task_prep panic.

In that case the resulting panics for the combination mvsas + port expander + SATA drives were:
1. RIP  [<ffffffffa00cd7ed>] mvs_task_prep+0x78d/0xe40 [mvsas]
2. RIP  [<ffffffffa00bd90f>] mvs_task_prep+0x73f/0xd50 [mvsas]
3. RIP  [<ffffffffa006f5b0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
4. RIP: 0010:[<ffffffffa00f1877>]  [<ffffffffa00f1877>] mvs_task_exec.isra.13+0x827/0xf10 [mvsas]

---

James Bottomley wrote on 16-04-15 at 07:16:

Well, that narrows it down.  It looks like there's a longstanding bug in
mvs_task_prep_ata() where the physical PHY field is populated by taking
an index through the HBA phy table.  This field is ignored for STP but
the phy table is too small and it uses the expander phy number to index
it (hence the GPF as we fall off the end of the phy table trying to
dereference sas_phy->id).

This should fix the problem.

James

---

diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index 2d5ab6d..454536c 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -441,14 +441,11 @@ static u32 mvs_get_ncq_tag(struct sas_task *task, u32 *tag)
 static int mvs_task_prep_ata(struct mvs_info *mvi,
 			     struct mvs_task_exec_info *tei)
 {
-	struct sas_ha_struct *sha = mvi->sas;
 	struct sas_task *task = tei->task;
 	struct domain_device *dev = task->dev;
 	struct mvs_device *mvi_dev = dev->lldd_dev;
 	struct mvs_cmd_hdr *hdr = tei->hdr;
 	struct asd_sas_port *sas_port = dev->port;
-	struct sas_phy *sphy = dev->phy;
-	struct asd_sas_phy *sas_phy = sha->sas_phy[sphy->number];
 	struct mvs_slot_info *slot;
 	void *buf_prd;
 	u32 tag = tei->tag, hdr_tag;
@@ -468,7 +465,7 @@ static int mvs_task_prep_ata(struct mvs_info *mvi,
 	slot->tx = mvi->tx_prod;
 	del_q = TXQ_MODE_I | tag |
 		(TXQ_CMD_STP << TXQ_CMD_SHIFT) |
-		(MVS_PHY_ID << TXQ_PHY_SHIFT) |
+		((sas_port->phy_mask & TXQ_PHY_MASK) << TXQ_PHY_SHIFT) |
 		(mvi_dev->taskfileset << TXQ_SRS_SHIFT);
 	mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);
Comment 24 Nathan R 2015-04-29 13:40:15 UTC
Created attachment 175261 [details]
dmesg output after loading module

Just tested the driver from linux-stable now that the patch has been merged.
After loading, I get a bunch of "failed to IDENTIFY" errors, then an oops, and insmod never returns (15 minutes so far and nothing new in dmesg).
