Bug 81861
Summary: Oops by mvsas v0.8.16: sas: ataX: end_device-Y:0:Z: dev error handler -> general protection fault, RIP: mvs_task_prep_ata+0x80/0x3a0
Product: SCSI Drivers
Reporter: linux-ide
Component: Other
Assignee: scsi_drivers-other
Status: NEW
Severity: blocking
CC: alan, christian.vilhelm, linux-ide, nathan.renniewaldock+kernelbugs, sidebranch.linux
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.17.1
Subsystem: (not set)
Regression: No
Bisected commit-id: (not set)
Attachments:
Dmesg output from boot
smartctl -a /dev/sdb (HDS5C3020BLE630)
sg_ses PCIe port expander card output
Ubuntu Linux/x86_64 3.13.0-35-generic Kernel Configuration
Patched mvsas dmesg in kernel 3.18.3 dmesg output after loading module
Description
linux-ide
2014-08-07 17:33:26 UTC
After setting up netconsole using <https://wiki.ubuntu.com/Kernel/Netconsole> and enabling the kernel boot parameters debug and ignore_loglevel, more kernel crash log lines are available:
============ [ 77.094783] mvsas 0000:01:00.0: mvsas: driver version 0.8.16 [ 77.095405] mvsas 0000:01:00.0: mvsas: PCI-E x8, Bandwidth Usage: 5.0 Gbps [ 83.881049] scsi5 : mvsas [ 83.883157] sas: phy-5:4 added to port-5:0, phy_mask:0x1 (50014380182cf0e6) [ 83.883190] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 1 [ 83.893532] sas: phy1 matched wide port0 [ 83.893558] sas: phy-5:5 added to port-5:0, phy_mask:0x3 (50014380182cf0e6) [ 83.893580] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 3 [ 83.913447] sas: phy2 matched wide port0 [ 83.913468] sas: phy-5:6 added to port-5:0, phy_mask:0x7 (50014380182cf0e6) [ 83.913491] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 7 [ 83.943257] sas: phy3 matched wide port0 [ 83.943274] sas: phy-5:7 added to port-5:0, phy_mask:0xf (50014380182cf0e6) [ 83.943294] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map f [ 83.982994] sas: DOING DISCOVERY on port 0, pid:6 [ 83.984660] sas: ex 50014380182cf0e6 phy00:D:0 attached: 0000000000000000 (no device) [ 83.985256] sas: ex 50014380182cf0e6 phy01:D:0 attached: 0000000000000000 (no device) [ 83.985851] sas: ex 50014380182cf0e6 phy02:D:0 attached: 0000000000000000 (no device) [ 83.986372] sas: ex 50014380182cf0e6 phy03:D:0 attached: 0000000000000000 (no device) [ 83.986933] sas: ex 50014380182cf0e6 phy04:D:0 attached: 0000000000000000 (no device) [ 83.987488] sas: ex 50014380182cf0e6 phy05:D:0 attached: 0000000000000000 (no device) [ 83.988086] sas: ex 50014380182cf0e6 phy06:D:0 attached: 0000000000000000 (no device) [ 83.988603] sas: ex 50014380182cf0e6 phy07:D:0 attached: 0000000000000000 (no device) [ 83.989197] sas: ex 50014380182cf0e6 phy08:D:0 attached: 0000000000000000 (no device)
[ 83.989766] sas: ex 50014380182cf0e6 phy09:D:0 attached: 0000000000000000 (no device) [ 83.990300] sas: ex 50014380182cf0e6 phy10:D:0 attached: 0000000000000000 (no device) [ 83.990872] sas: ex 50014380182cf0e6 phy11:D:0 attached: 0000000000000000 (no device) [ 83.991401] sas: ex 50014380182cf0e6 phy12:D:0 attached: 0000000000000000 (no device) [ 83.991978] sas: ex 50014380182cf0e6 phy13:D:0 attached: 0000000000000000 (no device) [ 83.992515] sas: ex 50014380182cf0e6 phy14:D:0 attached: 0000000000000000 (no device) [ 83.993098] sas: ex 50014380182cf0e6 phy15:D:0 attached: 0000000000000000 (no device) [ 83.993625] sas: ex 50014380182cf0e6 phy16:D:0 attached: 0000000000000000 (no device) [ 83.994213] sas: ex 50014380182cf0e6 phy17:D:0 attached: 0000000000000000 (no device) [ 83.994785] sas: ex 50014380182cf0e6 phy18:D:0 attached: 0000000000000000 (no device) [ 83.995316] sas: ex 50014380182cf0e6 phy19:D:0 attached: 0000000000000000 (no device) [ 83.995890] sas: ex 50014380182cf0e6 phy20:D:0 attached: 0000000000000000 (no device) [ 83.996432] sas: ex 50014380182cf0e6 phy21:D:0 attached: 0000000000000000 (no device) [ 83.996998] sas: ex 50014380182cf0e6 phy22:D:0 attached: 0000000000000000 (no device) [ 83.997540] sas: ex 50014380182cf0e6 phy23:D:0 attached: 0000000000000000 (no device) [ 83.998189] sas: ex 50014380182cf0e6 phy24:U:A attached: 5005043011ab0000 (host) [ 83.998812] sas: ex 50014380182cf0e6 phy25:U:A attached: 5005043011ab0000 (host) [ 83.999386] sas: ex 50014380182cf0e6 phy26:U:A attached: 5005043011ab0000 (host) [ 84.000012] sas: ex 50014380182cf0e6 phy27:U:A attached: 5005043011ab0000 (host) [ 84.000575] sas: ex 50014380182cf0e6 phy28:S:0 attached: 0000000000000000 (no device) [ 84.001581] sas: ex 50014380182cf0e6 phy29:S:0 attached: 0000000000000000 (no device) [ 84.002561] sas: ex 50014380182cf0e6 phy30:S:0 attached: 0000000000000000 (no device) [ 84.003550] sas: ex 50014380182cf0e6 phy31:S:0 attached: 0000000000000000 (no device) [ 84.004573] sas: 
ex 50014380182cf0e6 phy32:S:9 attached: 50014380182cf0e0 (stp) [ 84.005580] sas: ex 50014380182cf0e6 phy33:S:9 attached: 50014380182cf0e1 (stp) [ 84.006543] sas: ex 50014380182cf0e6 phy34:S:9 attached: 50014380182cf0e2 (stp) [ 84.007442] sas: ex 50014380182cf0e6 phy35:S:9 attached: 50014380182cf0e3 (stp) [ 84.008136] sas: ex 50014380182cf0e6 phy36:D:A attached: 50014380182cf0e5 (host+target) [ 84.009969] sas: DONE DISCOVERY on port 0, pid:6, result:0 [ 84.010274] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 [ 84.010569] sas: ata6: end_device-5:0:32: dev error handler [ 84.010873] sas: ata7: end_device-5:0:33: dev error handler [ 84.011160] sas: ata8: end_device-5:0:34: dev error handler [ 84.011424] sas: ata9: end_device-5:0:35: dev error handler [ 84.164663] general protection fault: 0000 [#1] SMP [ 84.164897] Modules linked in: mvsas libsas scsi_transport_sas ppdev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm crct10dif_pclmul drm_kms_helper crc32_pclmul drm ghash_clmulni_intel cryptd i2c_algo_bit lpc_ich mei_me microcode mei serio_raw soc_button_array video parport_pc mac_hid netconsole configfs lp parport psmouse ahci libahci r8169 mii [ 84.165752] CPU: 0 PID: 1008 Comm: kworker/u4:5 Not tainted 3.16.0-031600rc6-generic #201407210035 [ 84.166027] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./H81 Pro BTC, BIOS P1.50 02/14/2014 [ 84.166325] Workqueue: events_unbound async_run_entry_fn [ 84.166630] task: ffff880036d5ef60 ti: ffff8800d4b34000 task.ti: ffff8800d4b34000 [ 84.166953] RIP: 0010:[<ffffffffc028e5a0>] [<ffffffffc028e5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas] [ 84.167364] RSP: 0018:ffff8800d4b377c8 EFLAGS: 00010097 [ 84.167714] RAX: 000000000000002c RBX: ffff88020f200000 RCX: dead000000200200 [ 84.168078] RDX: ffff88020f2037b0 RSI: ffff88020f2255b8 RDI: ffff88020f200000 [ 84.168451] RBP: ffff8800d4b37838 R08: 0000000000000000 R09: 0000000000001000 [ 84.168834] R10: 0000000000000000 R11: ffff88020f2255b0 R12: ffff88020fbab640 [ 84.169228] R13: ffff8800d4b37898 R14: ffff88021b4a0000 R15: ffff880036f19a00 [ 84.169628] FS: 0000000000000000(0000) GS:ffff88021b200000(0000) knlGS:0000000000000000 [ 84.170044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 84.170467] CR2: 00007f0031fbf000 CR3: 0000000001c12000 CR4: 00000000000407f0 [ 84.170907] Stack: [ 84.171345] ffff88021b314400 ffff880200000000 0000000000000282 dead000000200200 [ 84.171818] ffff88020f2037b0 0000000000000046 ffff88020cd81e38 ffffffff811b06ae [ 84.172300] ffff88021b314400 ffff88020fbab640 ffff88020f2037b0 ffff88020f200000 [ 84.172791] Call Trace: [ 84.173280] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100 [ 84.173785] [<ffffffffc028f4ab>] mvs_task_prep+0x58b/0x620 [mvsas] [ 84.174298] [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70 [ 84.174823] [<ffffffffc028f5a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas] [ 84.175358] [<ffffffffc0290149>] mvs_queue_command+0x39/0x40 [mvsas] [ 84.175901] [<ffffffffc02778ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas] [ 84.176446] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0 [ 84.176997] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0 [ 84.177554] [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0 [ 84.178113] [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30 [ 84.178673] [<ffffffffc02774b0>] ? 
sas_ata_internal_abort+0x120/0x120 [libsas] [ 84.179245] [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460 [ 84.179825] [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20 [ 84.180409] [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0 [ 84.181002] [<ffffffff810cd4d1>] ? vprintk_emit+0x1b1/0x560 [ 84.181598] [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0 [ 84.182200] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0 [ 84.182809] [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas] [ 84.183427] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50 [ 84.184037] [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas] [ 84.184710] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50 [ 84.185323] [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas] [ 84.185945] [<ffffffff81540742>] ata_do_eh+0x52/0xc0 [ 84.186574] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0 [ 84.187213] [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80 [ 84.187850] [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160 [ 84.188473] [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0 [ 84.189081] [<ffffffffc02772c5>] async_sas_ata_eh+0x55/0x90 [libsas] [ 84.189673] [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140 [ 84.190248] [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0 [ 84.190812] [<ffffffff81776ba8>] ? maybe_create_worker+0xbb/0x1c5 [ 84.191364] [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0 [ 84.191910] [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80 [ 84.192446] [<ffffffff81094479>] kthread+0xc9/0xe0 [ 84.192971] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0 [ 84.193495] [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0 [ 84.194015] [<ffffffff810943b0>] ? 
flush_kthread_worker+0xb0/0xb0 [ 84.194534] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48 [ 84.195858] RIP [<ffffffffc028e5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas] [ 84.196412] RSP <ffff8800d4b377c8> Created attachment 145681 [details]
Dmesg output from boot
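One possible reading of the register dump above, offered as a sketch rather than a diagnosis: RCX holds dead000000200200, and the bytes marked in the Code: line (8b 89 54 02 00 00) appear to decode to a load from 0x254(%rcx). On x86-64 with 48-bit virtual addressing, an address built on that RCX value is not canonical, which would explain why this kernel reports a general protection fault instead of a page fault. The helper names below are hypothetical, and the two poison constants are assumptions based on the list-poison values that 3.x-era x86-64 kernels are believed to use; check include/linux/poison.h for the kernel in question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, not taken from the report: list-poison constants
 * believed to apply to 3.x x86-64 kernels (0x00100100 / 0x00200200
 * plus a 0xdead000000000000 poison delta). */
#define ASSUMED_LIST_POISON1 0xdead000000100100ULL
#define ASSUMED_LIST_POISON2 0xdead000000200200ULL

/* With 48-bit virtual addressing, an address is canonical when
 * bits 63..47 are either all zeros or all ones. */
static bool is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the 17 most significant bits */
    return top == 0 || top == 0x1FFFF;
}

/* True when a register value matches one of the assumed poison values. */
static bool looks_like_list_poison(uint64_t value)
{
    return value == ASSUMED_LIST_POISON1 || value == ASSUMED_LIST_POISON2;
}
```

If the assumption holds, a poisoned (already unlinked) list pointer was used as the base of the load, which would point at a stale or uninitialised entry rather than at the del_q arithmetic itself.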
Because Ubuntu doesn't provide debug symbols for their mainline kernel builds <http://comments.gmane.org/gmane.linux.ubuntu.devel.kernel.general/40661> I am reverting back to their kernel version 3.13.0-24.46 That results in a kernel crash on port 8C: BUG: unable to handle kernel NULL pointer dereference at 0000000000000255 Full output: [ 25.212661] mvsas 0000:01:00.0: mvsas: driver version 0.8.16 [ 25.212703] mvsas 0000:01:00.0: enabling device (0000 -> 0002) [ 25.213249] mvsas 0000:01:00.0: mvsas: PCI-E x8, Bandwidth Usage: 5.0 Gbps [ 31.994771] scsi5 : mvsas [ 31.995530] sas: phy-5:0 added to port-5:0, phy_mask:0x1 (50014380182cf0e6) [ 31.995564] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 1 [ 32.005672] sas: phy1 matched wide port0 [ 32.005695] sas: phy-5:1 added to port-5:0, phy_mask:0x3 (50014380182cf0e6) [ 32.005720] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 3 [ 32.025591] sas: phy2 matched wide port0 [ 32.025611] sas: phy-5:2 added to port-5:0, phy_mask:0x7 (50014380182cf0e6) [ 32.025635] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map 7 [ 32.055410] sas: phy3 matched wide port0 [ 32.055427] sas: phy-5:3 added to port-5:0, phy_mask:0xf (50014380182cf0e6) [ 32.055452] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set wide port phy map f [ 32.095144] sas: DOING DISCOVERY on port 0, pid:127 [ 32.096843] sas: ex 50014380182cf0e6 phy00:D:0 attached: 0000000000000000 (no device) [ 32.097408] sas: ex 50014380182cf0e6 phy01:D:0 attached: 0000000000000000 (no device) [ 32.097917] sas: ex 50014380182cf0e6 phy02:D:0 attached: 0000000000000000 (no device) [ 32.098503] sas: ex 50014380182cf0e6 phy03:D:0 attached: 0000000000000000 (no device) [ 32.099044] sas: ex 50014380182cf0e6 phy04:D:0 attached: 0000000000000000 (no device) [ 32.099628] sas: ex 50014380182cf0e6 phy05:D:0 attached: 0000000000000000 (no device) [ 32.100205] sas: ex 
50014380182cf0e6 phy06:D:0 attached: 0000000000000000 (no device) [ 32.100739] sas: ex 50014380182cf0e6 phy07:D:0 attached: 0000000000000000 (no device) [ 32.101310] sas: ex 50014380182cf0e6 phy08:D:0 attached: 0000000000000000 (no device) [ 32.101840] sas: ex 50014380182cf0e6 phy09:D:0 attached: 0000000000000000 (no device) [ 32.102412] sas: ex 50014380182cf0e6 phy10:D:0 attached: 0000000000000000 (no device) [ 32.102959] sas: ex 50014380182cf0e6 phy11:D:0 attached: 0000000000000000 (no device) [ 32.103545] sas: ex 50014380182cf0e6 phy12:D:0 attached: 0000000000000000 (no device) [ 32.104128] sas: ex 50014380182cf0e6 phy13:D:0 attached: 0000000000000000 (no device) [ 32.104661] sas: ex 50014380182cf0e6 phy14:D:0 attached: 0000000000000000 (no device) [ 32.105273] sas: ex 50014380182cf0e6 phy15:D:0 attached: 0000000000000000 (no device) [ 32.105781] sas: ex 50014380182cf0e6 phy16:D:0 attached: 0000000000000000 (no device) [ 32.106385] sas: ex 50014380182cf0e6 phy17:D:0 attached: 0000000000000000 (no device) [ 32.106904] sas: ex 50014380182cf0e6 phy18:D:0 attached: 0000000000000000 (no device) [ 32.107486] sas: ex 50014380182cf0e6 phy19:D:0 attached: 0000000000000000 (no device) [ 32.108020] sas: ex 50014380182cf0e6 phy20:D:0 attached: 0000000000000000 (no device) [ 32.108605] sas: ex 50014380182cf0e6 phy21:D:0 attached: 0000000000000000 (no device) [ 32.109183] sas: ex 50014380182cf0e6 phy22:D:0 attached: 0000000000000000 (no device) [ 32.109714] sas: ex 50014380182cf0e6 phy23:D:0 attached: 0000000000000000 (no device) [ 32.110357] sas: ex 50014380182cf0e6 phy24:U:A attached: 5005043011ab0000 (host) [ 32.110929] sas: ex 50014380182cf0e6 phy25:U:A attached: 5005043011ab0000 (host) [ 32.111558] sas: ex 50014380182cf0e6 phy26:U:A attached: 5005043011ab0000 (host) [ 32.112181] sas: ex 50014380182cf0e6 phy27:U:A attached: 5005043011ab0000 (host) [ 32.112774] sas: ex 50014380182cf0e6 phy28:S:9 attached: 50014380182cf0dc (stp) [ 32.113366] sas: ex 50014380182cf0e6 
phy29:S:9 attached: 50014380182cf0dd (stp) [ 32.113934] sas: ex 50014380182cf0e6 phy30:S:9 attached: 50014380182cf0de (stp) [ 32.114557] sas: ex 50014380182cf0e6 phy31:S:9 attached: 50014380182cf0df (stp) [ 32.115138] sas: ex 50014380182cf0e6 phy32:S:0 attached: 0000000000000000 (no device) [ 32.115654] sas: ex 50014380182cf0e6 phy33:S:0 attached: 0000000000000000 (no device) [ 32.116198] sas: ex 50014380182cf0e6 phy34:S:0 attached: 0000000000000000 (no device) [ 32.116711] sas: ex 50014380182cf0e6 phy35:S:0 attached: 0000000000000000 (no device) [ 32.117003] sas: ex 50014380182cf0e6 phy36:D:A attached: 50014380182cf0e5 (host+target) [ 32.118398] sas: DONE DISCOVERY on port 0, pid:127, result:0 [ 32.118435] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 [ 32.118465] sas: ata6: end_device-5:0:28: dev error handler [ 32.119140] sas: ata7: end_device-5:0:29: dev error handler [ 32.119333] sas: ata8: end_device-5:0:30: dev error handler [ 32.119368] sas: ata9: end_device-5:0:31: dev error handler [ 32.271218] BUG: unable to handle kernel NULL pointer dereference at 0000000000000255 [ 32.271791] IP: [<ffffffffa02d381e>] mvs_task_prep+0x72e/0xd50 [mvsas] [ 32.272365] PGD 0 [ 32.272928] Oops: 0000 [#1] SMP [ 32.273480] Modules linked in: mvsas libsas scsi_transport_sas hid_generic usbhid hid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd i915 drm_kms_helper serio_raw lpc_ich mei_me mei drm i2c_algo_bit netconsole configfs lp parport video mac_hid psmouse ahci libahci r8169 mii [ 32.275388] CPU: 0 PID: 54 Comm: kworker/u4:1 Not tainted 3.13.0-24-generic #47-Ubuntu [ 32.276028] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./H81 Pro BTC, BIOS P1.80 07/21/2014 [ 32.276745] Workqueue: events_unbound async_run_entry_fn [ 32.277389] task: ffff88020fe6afe0 ti: ffff8802136aa000 task.ti: ffff8802136aa000 [ 32.278032] RIP: 0010:[<ffffffffa02d381e>] [<ffffffffa02d381e>] mvs_task_prep+0x72e/0xd50 [mvsas] [ 32.278691] RSP: 0018:ffff8802136ab8c0 EFLAGS: 00010097 [ 32.279337] RAX: 000000000000002c RBX: 0000000000000001 RCX: 0000000000000000 [ 32.279980] RDX: 0000000000000000 RSI: ffff8800d8c255b8 RDI: ffff8800d8c00000 [ 32.280619] RBP: ffff8802136ab958 R08: ffff8800d8c03618 R09: ffff8800363a0000 [ 32.281246] R10: ffff880212977600 R11: 0000000000000000 R12: ffff8800d8c00000 [ 32.281861] R13: 0000000000000000 R14: ffff8800d8c03618 R15: ffff88020f8dedc0 [ 32.282474] FS: 0000000000000000(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000 [ 32.283082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.283679] CR2: 0000000000000255 CR3: 0000000002c0e000 CR4: 00000000000407f0 [ 32.284278] Stack: [ 32.284880] ffff88020fe6afe0 ffff880212977200 ffff8802136ab8e0 ffffffff81719ee9 [ 32.285520] ffff880212977600 ffff8800363a0000 ffff8800d8c03618 ffff8800d8c255b0 [ 32.286167] ffff8800d8c02678 0000000000000000 00000001d8c00008 ffff8800d8c255b8 [ 32.286821] Call Trace: [ 32.287473] [<ffffffff81719ee9>] ? schedule+0x29/0x70 [ 32.288144] [<ffffffffa02d3e9d>] mvs_task_exec.isra.13+0x5d/0xe0 [mvsas] [ 32.288832] [<ffffffffa02d49dc>] mvs_queue_command+0x30c/0x320 [mvsas] [ 32.289530] [<ffffffff811a013f>] ? kmem_cache_free+0xef/0x120 [ 32.290232] [<ffffffff8119f692>] ? kmem_cache_alloc+0x132/0x140 [ 32.290942] [<ffffffffa028601d>] ? 
sas_alloc_task+0x1d/0x40 [libsas] [ 32.291662] [<ffffffffa028fcab>] sas_ata_qc_issue+0x24b/0x290 [libsas] [ 32.292392] [<ffffffff814f7762>] ata_qc_issue+0x172/0x380 [ 32.293128] [<ffffffff814f7c23>] ata_exec_internal_sg+0x2b3/0x570 [ 32.293875] [<ffffffff814f7f3a>] ata_exec_internal+0x5a/0xa0 [ 32.294624] [<ffffffff814f8334>] ata_dev_read_id+0x274/0x550 [ 32.295380] [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas] [ 32.296148] [<ffffffff81505bab>] ata_eh_recover+0x74b/0x1310 [ 32.296923] [<ffffffff810bcfe8>] ? console_unlock+0x208/0x400 [ 32.297707] [<ffffffff814facd0>] ? ata_phys_link_online+0x30/0x30 [ 32.298503] [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas] [ 32.299367] [<ffffffff814fae50>] ? ata_phys_link_offline+0x30/0x30 [ 32.300179] [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas] [ 32.301001] [<ffffffff814fae50>] ? ata_phys_link_offline+0x30/0x30 [ 32.301826] [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas] [ 32.302661] [<ffffffff81507299>] ata_do_eh+0x49/0xc0 [ 32.303503] [<ffffffff814facd0>] ? ata_phys_link_online+0x30/0x30 [ 32.304357] [<ffffffff8150734e>] ata_std_error_handler+0x3e/0x80 [ 32.305215] [<ffffffff81506dba>] ata_scsi_port_error_handler+0x56a/0x940 [ 32.306086] [<ffffffffa02900aa>] async_sas_ata_eh+0x4a/0x80 [libsas] [ 32.306963] [<ffffffff81091517>] async_run_entry_fn+0x37/0x130 [ 32.307849] [<ffffffff810838a2>] process_one_work+0x182/0x450 [ 32.308735] [<ffffffff81084641>] worker_thread+0x121/0x410 [ 32.309629] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0 [ 32.310530] [<ffffffff8108b312>] kthread+0xd2/0xf0 [ 32.311437] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0 [ 32.312351] [<ffffffff817263fc>] ret_from_fork+0x7c/0xb0 [ 32.313255] [<ffffffff8108b240>] ? 
kthread_create_on_node+0x1d0/0x1d0 [ 32.314160] Code: 63 92 a0 02 00 00 41 80 b8 84 00 00 00 7f 48 8b 80 58 01 00 00 48 8b 1c d0 0f 84 a0 05 00 00 41 8b 44 24 58 48 8b 75 c0 89 46 1c <8b> 8b 54 02 00 00 be 00 10 00 00 41 8b 54 24 58 49 8b 44 24 48 [ 32.316308] RIP [<ffffffffa02d381e>] mvs_task_prep+0x72e/0xd50 [mvsas] [ 32.317292] RSP <ffff8802136ab8c0> [ 32.318278] CR2: 0000000000000255

Trying to debug mvs_task_prep with the help of the tutorial at <http://www.opensourceforu.com/2011/01/understanding-a-kernel-oops/>.

# cat /sys/module/mvsas/sections/.init.text
0xffffffffa00c8000
# cd /lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas
# gdb mvsas.ko
(gdb) add-symbol-file /usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko 0xffffffffa00c8000
(gdb) disassemble mvs_task_prep

Hex to decimal: 0x72e = <+1838>

0xffffffffa00ca81e <+1838>: mov 0x254(%rbx),%ecx

Thanks to the trick from <https://blogs.oracle.com/ksplice/entry/8_gdb_tricks_you_should>:

(gdb) set substitute-path /build/buildd /home/user/src
(gdb) list *0xffffffffa00ca81e
0xffffffffa00ca81e is in mvs_task_prep (/build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c:471).
Line number 466 out of range; /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c has 306 lines.

I guess my gdb version 7.7 has a line counting bug, according to <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730630>.

A manual approach using <http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-trusty.git;a=blob;f=drivers/scsi/mvsas/mv_sas.c;h=6c1f223a8e1d335fa7c86a374e470e666e848906;hb=HEAD>:

467 slot = &mvi->slot_info[tag];
468 slot->tx = mvi->tx_prod;
469 del_q = TXQ_MODE_I | tag |
470 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
473 mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

This suggests that "(MVS_PHY_ID << TXQ_PHY_SHIFT)" is the offending code. How should that be patched?

That's not a sensible resolution, it can't be faulting on that line.
When connecting just a single 4-drive group to the good ports (for example 2C) of the external PCIe expander card:

cold boot = doesn't detect any of the 4 PUIS drives
warm boot = does detect all 4 PUIS drives

When powering up using the warm boot method, neither smartctl nor sg_ses seems to report any errors. However, this cold boot issue might be a different issue from this kernel crash.

According to the debug messages, a "Set Features" (0xEF) command is sent first. My guess is that this Set Features command issues subcommand 0x07: spin up media. The "Identify Device" (0xEC) command is sent later. If I read the Hitachi specification correctly, the spin-up (Set Features) should be sent after "Identify Device".

For this Hitachi HDS5C3020BLE630, word 2 of the Identify Device data (# sg_sat_identify -v /dev/sdb) is "738c" (hex), which, according to page 127 of the HGST specification, means "Need Set Feature for spin-up after power-up Identify Device is complete".

Is there a boot parameter (or a similar way) to load the mvsas driver without sending the "Set Features" (0xEF) command?

Created attachment 147751 [details]
smartctl -a /dev/sdb (HDS5C3020BLE630)
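The word 2 value quoted in the comment above can be decoded mechanically. The following is a minimal sketch, not driver code: the struct and function names are hypothetical, and only the 0x738c entry comes from this report. The other three codes are recalled from the ATA/ATAPI command set specification (IDENTIFY DEVICE word 2, "specific configuration") and should be verified against the standard before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type describing what IDENTIFY DEVICE word 2 says. */
struct puis_state {
    bool known;              /* word 2 matched one of the four codes */
    bool needs_set_features; /* SET FEATURES spin-up is required */
    bool identify_complete;  /* IDENTIFY DEVICE data is complete */
};

static struct puis_state decode_identify_word2(uint16_t word2)
{
    struct puis_state s = { false, false, false };

    switch (word2) {
    case 0x37C8: /* assumed from the ATA spec: spin-up needed, data incomplete */
        s = (struct puis_state){ true, true, false };
        break;
    case 0x738C: /* the value reported for the HDS5C3020BLE630 */
        s = (struct puis_state){ true, true, true };
        break;
    case 0x8C73: /* assumed from the ATA spec: no spin-up needed, data incomplete */
        s = (struct puis_state){ true, false, false };
        break;
    case 0xC837: /* assumed from the ATA spec: no spin-up needed, data complete */
        s = (struct puis_state){ true, false, true };
        break;
    default:
        break;
    }
    return s;
}
```

For 0x738c the sketch reports that the drive wants a Set Features spin-up and that its Identify Device data is already complete, which matches the wording quoted from the HGST specification.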
Comment on attachment 145681 [details]
Dmesg output from boot
This is without loading the mvsas kernel module.
re: That's not a sensible resolution, it can't be faulting on that line.

Another try using a newer version of the package gdb-minimal (Ubuntu 7.7-0ubuntu3.2 from trusty-proposed) gives identical results: address <+1838> maps to line 471 in mv_sas.c, which points to "(MVS_PHY_ID << TXQ_PHY_SHIFT) |".

# cat /sys/module/mvsas/sections/.init.text
0xffffffffa01c2000
(gdb) add-symbol-file /usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko 0xffffffffa01c2000
add symbol table from file "/usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko" at .text_addr = 0xffffffffa01c2000

0xffffffffa01c481e <+1838>: mov 0x254(%rbx),%ecx

(gdb) list *0xffffffffa01c481e
0xffffffffa01c481e is in mvs_task_prep (/build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c:471).
466 }
467 slot = &mvi->slot_info[tag];
468 slot->tx = mvi->tx_prod;
469 del_q = TXQ_MODE_I | tag |
470 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
473 mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);
474
475 if (task->data_dir == DMA_FROM_DEVICE)

Another test round to see whether the crash differs between cold and warm boot:

5C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
5C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
6C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
6C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
7C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
7C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
8C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
8C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
9C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
9C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]

In cases 6C, 7C and 9C the r8169 NIC doesn't come up after the first automatic reboot after cold boot ("Waiting for network configuration..." and "Waiting up to 60 more seconds for network configuration..."), but it does come up after the second automatic reboot after cold boot [reproducible=yes]

Created attachment 147771 [details]
sg_ses PCIe port expander card output
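The arithmetic behind the oops lines quoted so far can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: the struct and function names are hypothetical, and the inputs are the values printed in the 3.13.0-24 dump (mvs_task_prep+0x72e/0xd50, RBX = 1, CR2 = 0x255, and the instruction mov 0x254(%rbx),%ecx).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the "function+0xOFF/0xSIZE" notation. */
struct oops_symbol {
    char name[64];
    unsigned long offset; /* bytes from the start of the function */
    unsigned long size;   /* total size of the function in bytes */
};

/* Parse e.g. "mvs_task_prep+0x72e/0xd50". Returns 0 on success, -1 otherwise. */
static int parse_oops_symbol(const char *text, struct oops_symbol *out)
{
    /* %lx accepts an optional 0x prefix, like strtoul() with base 16. */
    int n = sscanf(text, "%63[^+]+%lx/%lx", out->name, &out->offset, &out->size);
    return n == 3 ? 0 : -1;
}

/* Address touched by an operand of the form disp(%base). */
static uint64_t effective_address(uint64_t base, int64_t disp)
{
    return base + (uint64_t)disp;
}
```

Parsing "mvs_task_prep+0x72e/0xd50" yields offset 1838, the same <+1838> that gdb prints, and effective_address(1, 0x254) gives 0x255, which is the CR2 value in the dump. In other words, the faulting load used a base register holding 1 where a structure pointer was presumably expected.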
0xffffffffa01c481e <+1838>: mov 0x254(%rbx),%ecx

is loading an offset from something. It can't be line 471. It could be line 472, or it could be line 468, but the offset looks way too big to be either unless it's been optimised somewhat. The gdb line mapping is not always entirely accurate.

At this point what might be useful is to add lines between them and rebuild, i.e.

printk("[");
467 slot = &mvi->slot_info[tag];
printk("%d ", tag);
468 slot->tx = mvi->tx_prod;
printk("%p ", slot);
469 del_q = TXQ_MODE_I | tag |
470 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
printk("%d", mvi->tx_prod]);
473 mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);
printk("]\n");

and try again. When it dies, just before the oops you should have lines of the form

[num num num]

the final one of which is incomplete. Where it ends tells us where it died, and the values may even give us a guess at why. If the final [ .. ] sequence is complete, then it crashed somewhere else in the routine and gdb is confused.
It dies between printing the second and the third variable: [ 30.455440] sas: DONE DISCOVERY on port 0, pid:128, result:0 [ 30.455502] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 [ 30.455534] sas: ata6: end_device-5:0:20: dev error handler [ 30.455744] sas: ata7: end_device-5:0:21: dev error handler [ 30.456186] sas: ata8: end_device-5:0:22: dev error handler [ 30.456367] sas: ata9: end_device-5:0:23: dev error handler [ 30.611146] [0 ffff8800d8e255b8 44] [ 30.611959] [0 ffff8800d8e255b8 46] [ 30.612511] [2 ffff8800d8e25668 [ 30.612537] BUG: unable to handle kernel NULL pointer dereference at 0000000000000255 [ 30.613511] IP: [<ffffffffa022c872>] mvs_task_prep+0x782/0xdd0 [mvsas] [ 30.614003] PGD 0 [ 30.614486] Oops: 0000 [#1] SMP [ 30.614967] Modules linked in: mvsas(OF) libsas scsi_transport_sas x86_pkg_temp_thermal intel_powerclamp hid_generic coretemp usbhid kvm_intel i915 kvm hid crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel cryptd drm netconsole configfs i2c_algo_bit serio_raw mei_me lpc_ich mei lp video mac_hid parport psmouse r8169 mii ahci libahci [ 30.616702] CPU: 0 PID: 6 Comm: kworker/u4:0 Tainted: GF O 3.13.0-35-generic #62-Ubuntu [ 30.617279] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./H81 Pro BTC, BIOS P1.80 07/21/2014 [ 30.617853] Workqueue: events_unbound async_run_entry_fn [ 30.618426] task: ffff8802139b0000 ti: ffff8802139ae000 task.ti: ffff8802139ae000 [ 30.619007] RIP: 0010:[<ffffffffa022c872>] [<ffffffffa022c872>] mvs_task_prep+0x782/0xdd0 [mvsas] [ 30.619604] RSP: 0018:ffff8802139af8c0 EFLAGS: 00010096 [ 30.620188] RAX: ffff8800d8e03618 RBX: 0000000000000002 RCX: 0000000000002ace [ 30.620779] RDX: 00000000000064e6 RSI: 0000000000000046 RDI: 0000000000000046 [ 30.621363] RBP: ffff8802139af958 R08: 0000000000000086 R09: 0000000000000426 [ 30.621941] R10: ffff880213bf4098 R11: 0000000000000001 R12: 0000000000000001 [ 30.622508] R13: ffff8800d8e00000 R14: ffff8800d8e03618 R15: ffff88007f912500 [ 30.623068] FS: 0000000000000000(0000) GS:ffff88021f200000(0000) knlGS:0000000000000000 [ 30.623649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 30.624238] CR2: 0000000000000255 CR3: 0000000002c0e000 CR4: 00000000000407f0 [ 30.624844] Stack: [ 30.625450] ffffffff8109d415 ffff88021f314440 ffff88021f314440 ffff88021f314440 [ 30.626097] ffff88020f97064c ffff8800d8e01e38 ffff880211d6fe00 ffff8800d8e25660 [ 30.626752] ffff88007f740080 ffff8800d8e02678 0000000181098129 ffff8800d8e25668 [ 30.627413] Call Trace: [ 30.628072] [<ffffffff8109d415>] ? sched_clock_cpu+0xb5/0x100 [ 30.628753] [<ffffffffa022cf1d>] mvs_task_exec.isra.13+0x5d/0xe0 [mvsas] [ 30.629450] [<ffffffffa022da5c>] mvs_queue_command+0x30c/0x320 [mvsas] [ 30.630155] [<ffffffff811a2362>] ? kmem_cache_alloc+0x1b2/0x1e0 [ 30.630867] [<ffffffffa020c787>] ? sas_free_task+0x37/0x40 [libsas] [ 30.631593] [<ffffffffa0215cab>] sas_ata_qc_issue+0x24b/0x290 [libsas] [ 30.632326] [<ffffffff814fe742>] ata_qc_issue+0x172/0x380 [ 30.633067] [<ffffffff814fec03>] ata_exec_internal_sg+0x2b3/0x570 [ 30.633817] [<ffffffff814fef1a>] ata_exec_internal+0x5a/0xa0 [ 30.634570] [<ffffffff814ff314>] ata_dev_read_id+0x274/0x550 [ 30.635332] [<ffffffffa02158f0>] ? 
sas_ata_printk+0x80/0x80 [libsas] [ 30.636166] [<ffffffff8150cbab>] ata_eh_recover+0x74b/0x1310 [ 30.636938] [<ffffffff81501cb0>] ? ata_phys_link_online+0x30/0x30 [ 30.637721] [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas] [ 30.638512] [<ffffffff81501e30>] ? ata_phys_link_offline+0x30/0x30 [ 30.639314] [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas] [ 30.640119] [<ffffffff81501e30>] ? ata_phys_link_offline+0x30/0x30 [ 30.640933] [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas] [ 30.641758] [<ffffffff8150e299>] ata_do_eh+0x49/0xc0 [ 30.642588] [<ffffffff81501cb0>] ? ata_phys_link_online+0x30/0x30 [ 30.643425] [<ffffffff8150e34e>] ata_std_error_handler+0x3e/0x80 [ 30.644271] [<ffffffff8150ddba>] ata_scsi_port_error_handler+0x56a/0x940 [ 30.645128] [<ffffffffa02160aa>] async_sas_ata_eh+0x4a/0x80 [libsas] [ 30.645996] [<ffffffff81091657>] async_run_entry_fn+0x37/0x130 [ 30.646871] [<ffffffff810839d2>] process_one_work+0x182/0x450 [ 30.647750] [<ffffffff810847c1>] worker_thread+0x121/0x410 [ 30.648638] [<ffffffff810846a0>] ? rescuer_thread+0x430/0x430 [ 30.649534] [<ffffffff8108b4a2>] kthread+0xd2/0xf0 [ 30.650429] [<ffffffff8108b3d0>] ? kthread_create_on_node+0x1c0/0x1c0 [ 30.651321] [<ffffffff8172ecbc>] ret_from_fork+0x7c/0xb0 [ 30.652211] [<ffffffff8108b3d0>] ? kthread_create_on_node+0x1c0/0x1c0 [ 30.653102] Code: 03 47 23 a0 31 c0 e8 62 b7 4e e1 48 8b 4d c0 41 8b 45 58 48 c7 c7 07 47 23 a0 89 41 1c 48 89 ce 31 c0 e8 46 b7 4e e1 48 8b 45 d0 <41> 8b 8c 24 54 02 00 00 41 bc 00 10 00 00 41 8b 75 58 48 c7 c7 [ 30.655215] RIP [<ffffffffa022c872>] mvs_task_prep+0x782/0xdd0 [mvsas] [ 30.656195] RSP <ffff8802139af8c0> [ 30.657163] CR2: 0000000000000255 By the way: printk("%d", mvi->tx_prod]); was changed to: printk("%d", mvi->tx_prod); The square bracket after tx_prod was removed. Created attachment 147881 [details]
Ubuntu Linux/x86_64 3.13.0-35-generic Kernel Configuration
This kernel configuration was used to build both the patched and unpatched mvsas.ko
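The way the bracketed printk trace is read can be expressed as a small userspace check. This is a sketch with hypothetical names, and it assumes the kernel timestamps (which contain brackets of their own) have already been stripped from the trace text: it locates the last '[' and reports whether the record was closed and how many values were printed before the output stopped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the final "[a b c]" record in a trace. */
struct trace_tail {
    bool complete; /* the last '[' has a matching ']' */
    size_t fields; /* whitespace-separated values printed after it */
};

static struct trace_tail inspect_trace_tail(const char *text)
{
    struct trace_tail t = { false, 0 };
    const char *p = strrchr(text, '[');
    bool in_field = false;

    if (p == NULL)
        return t; /* no record at all */

    for (p++; *p != '\0'; p++) {
        if (*p == ']') {
            t.complete = true;
            break;
        }
        if (*p == ' ' || *p == '\n') {
            in_field = false;
        } else if (!in_field) {
            in_field = true; /* first character of a new value */
            t.fields++;
        }
    }
    return t;
}
```

Applied to the trace above with timestamps removed, the last record is "[2 ffff8800d8e25668 ", so the sketch reports an incomplete record with two fields: the tag and the slot pointer were printed, and the crash happened before tx_prod could be printed, that is, somewhere in the del_q assignment.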
When line-by-line dumping the called constants/vars from:

    del_q = TXQ_MODE_I | tag |
            (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
            (MVS_PHY_ID << TXQ_PHY_SHIFT) |
            (mvi_dev->taskfileset << TXQ_SRS_SHIFT);

using the prepended statements:

    printk("slot=%p ", slot);
    printk(KERN_INFO "TXQ_MODE_I=%d ", TXQ_MODE_I);
    printk(KERN_INFO "tag=%d ", tag);
    printk(KERN_INFO "TXQ_CMD_STP=%d ", TXQ_CMD_STP);
    printk(KERN_INFO "TXQ_CMD_SHIFT=%d ", TXQ_CMD_SHIFT);
    printk(KERN_INFO "MVS_PHY_ID=%d ", MVS_PHY_ID);
    printk(KERN_INFO "TXQ_PHY_SHIFT=%d ", TXQ_PHY_SHIFT);
    del_q = TXQ_MODE_I | tag |
            (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
            (MVS_PHY_ID << TXQ_PHY_SHIFT) |
            (mvi_dev->taskfileset << TXQ_SRS_SHIFT);

the kernel crash occurs after printing "TXQ_CMD_SHIFT", that is, when trying to output the value of "MVS_PHY_ID":

[ 529.113152] sas: DONE DISCOVERY on port 0, pid:133, result:0
[ 529.114313] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 529.115460] sas: ata7: end_device-6:0:28: dev error handler
[ 529.115522] sas: ata8: end_device-6:0:29: dev error handler
[ 529.118706] sas: ata9: end_device-6:0:30: dev error handler
[ 529.119840] sas: ata10: end_device-6:0:31: dev error handler
[ 529.271634] [mvi=ffff8800d3680000, mvi_dev=ffff8800d36836a0 tag=0 slot=ffff8800d36a55b8
[ 529.271753] TXQ_MODE_I=268435456 tag=0
[ 529.272679] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[ 529.273618] MVS_PHY_ID=32768 TXQ_PHY_SHIFT=12 tx_prod=44]
[ 529.276091] [mvi=ffff8800d3680000, mvi_dev=ffff8800d3683618 tag=1 slot=ffff8800d36a5610
[ 529.276207] TXQ_MODE_I=268435456 tag=1
[ 529.277095] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[ 529.278038] MVS_PHY_ID=1 TXQ_PHY_SHIFT=12 tx_prod=46]
[ 529.280271] [mvi=ffff8800d3680000, mvi_dev=ffff8800d3683618 tag=1 slot=ffff8800d36a5610
[ 529.280385] TXQ_MODE_I=268435456 tag=1
[ 529.281445] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[ 529.282562] MVS_PHY_ID=1 TXQ_PHY_SHIFT=12 tx_prod=48]
[ 529.284894] [mvi=ffff8800d3680000, mvi_dev=ffff8800d36837b0 tag=2 slot=ffff8800d36a5668
[ 529.285010] TXQ_MODE_I=268435456 tag=2
[ 529.286248] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[ 529.287555] BUG: unable to handle kernel NULL pointer dereference at 0000000000000257
[ 529.290225] IP: [<ffffffffa02888bb>] mvs_task_prep+0x7cb/0xe50 [mvsas]
[ 529.291686] PGD 0
[ 529.293141] Oops: 0000 [#1] SMP
[ 529.294630] Modules linked in: mvsas(OF) libsas scsi_transport_sas x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd serio_raw lpc_ich i915 mei_me mei drm_kms_helper video netconsole drm configfs mac_hid i2c_algo_bit psmouse r8169 ahci mii libahci

Any suggestions why accessing "MVS_PHY_ID" leads to the kernel NULL pointer dereference oops?

With TXQ_PHY_SHIFT being 12 and TXQ_CMD_SHIFT being 29, it seems the one-hot PHY coding occupies bits 12 through 27 inclusive (bit 28 is TXQ_MODE_I), i.e. 16 bits, so 16 PHY IDs are supported. The register transmitted to the controller seems to be a fixed 32-bit register, so this looks like a hardware limitation rather than a software driver limitation.

    del_q = TXQ_MODE_I | tag |
            (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
            (MVS_PHY_ID << TXQ_PHY_SHIFT) |
            (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
    printk("%d", mvi->tx_prod);
    mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

Remaining question: how is this supposed to work with port expanders, where PHY IDs get larger than 16?

Thanks to Rob Elliott (HP Server Storage) for an extensive debug report by e-mail, which I copy verbatim:
---
1. Although MVS_PHY_ID looks like a constant, it's really not:
       #define MVS_PHY_ID (1U << sas_phy->id)
2. This fault:
       [ 32.271218] BUG: unable to handle kernel NULL pointer dereference at 0000000000000255
   (although 255 looks like a decimal number 0xff, it's really hex 0x255)
   at this line:
       0xffffffffa01c481e <+1838>: mov 0x254(%rbx),%ecx
   implies that rbx contains 1, so 0x254 + 1 = 0x255.
3. pahole drivers/scsi/mvsas/mv_sas.o shows there are two structures with fields at offset 596:
   * asd_sas_phy.id
   * asd_sas_port.sas_addr[8]
4. objdump -drS drivers/scsi/mvsas/mv_sas.o shows only a few lines with 0x254(%something), one of which is the del_q line you've identified:

       mvs_task_prep_ata(struct mvs_info *mvi, struct mvs_task_exec_info *tei):
           struct sas_ha_struct *sha = mvi->sas;
           struct sas_task *task = tei->task;
           struct domain_device *dev = task->dev;
           struct sas_phy *sphy = dev->phy;
           struct asd_sas_phy *sas_phy = sha->sas_phy[sphy->number];
           ...
           del_q = TXQ_MODE_I | tag |
                   (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
                   (MVS_PHY_ID << TXQ_PHY_SHIFT) |
                   (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
           mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

   MVS_PHY_ID = sas_phy->id
              = sha->sas_phy[sphy->number]
              = mvi->sas->sas_phy[dev->phy->number]
              = mvi->sas->sas_phy[task->dev->phy->number]->id
                mvi->sas->sas_phy[tei->task->dev->phy->number]->id

   Looking at the offsets reported by pahole, that means:
       %rdi->56->344[%rsi->0->0->56->688]->254

   mvi->sas->sas_phy is a pointer to a pointer:
       struct sas_ha_struct {
           ...
           struct asd_sas_phy * * sas_phy; /* 344 8 */

You might look for somewhere that could accidentally be setting sas_phy[something] to a for loop index, with a typecast hiding the problem from the compiler. Or, the phy->number value being passed might be out of range; if there were discovery errors, something might not have been initialized like this function expects.

Rob Elliott
HP Server Storage
---

Even after flashing the SAS2LP-MV8 firmware from version 4.0.0.1800 to version 4.0.0.1812, the mvs_task_prep_ata+0x80/0x3a0 [mvsas] kernel oops persists on these kernels:
1. "Linux ubuntu25 3.17.1-031701-generic #201410150735 SMP Wed Oct 15 11:36:31 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux"
2. "Linux ubuntu25 3.17.0-999-generic #201410182205 SMP Sun Oct 19 02:06:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux"

The problem was introduced with this patch:

    commit 7c237c5f6d5c62724ccd82aecdcd1fd9bd71dc75
    Author: Xiangliang Yu <yuxiangl@marvell.com>
    Date: Wed Jan 30 00:25:53 2013 +0800
    [SCSI] mvsas: fixed timeout issue when removing module

The offending line:
    (MVS_PHY_ID << TXQ_PHY_SHIFT)
was before:
    (sas_port->phy_mask << TXQ_PHY_SHIFT)

Reverting the patch corrects the problem for me (kernel 3.18.1).

There seem to be various issues with this driver. After reverting that commit, I can load the driver, but insmod takes a long time to return. One time was about 3mins, the other times I gave up waiting and rebooted after 5mins. I'm using an Areca ARC-1320, so I've ended up downgrading my kernel and using the proprietary driver just so that it works.

(In reply to Nathan R from comment #20)
> There seem to be various issues with this driver. After reverting that
> commit, I can load the driver, but insmod takes a long time to return. One
> time was about 3mins, the other times I gave up waiting and rebooted after
> 5mins. I'm using an Areca ARC-1320, so I've ended up downgrading my kernel
> and using the proprietary driver just so that it works.

I should add, this was with kernel 3.18.3 and I'll attach the dmesg section from insmod.

Created attachment 164841 [details]
Patched mvsas dmesg in kernel 3.18.3
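The address arithmetic in Rob Elliott's analysis earlier in this thread can be checked with a few lines of C. This is only a sketch for illustration: the helper names (fault_address, base_from_fault, one_hot_to_id) are hypothetical and are not part of the mvsas driver. The only inputs taken from the report are the field offset 0x254 (596 decimal, from pahole), the fault addresses 0x255 and 0x257, and the logged MVS_PHY_ID values.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the field read by `mov 0x254(%rbx),%ecx`; pahole reports it
 * as 596 decimal. */
#define ID_FIELD_OFFSET 0x254u

/* Address the CPU touches when it loads the field through pointer `base`. */
static uint64_t fault_address(uint64_t base)
{
    return base + ID_FIELD_OFFSET;
}

/* Inverse: recover the bogus base pointer from a reported fault address. */
static uint64_t base_from_fault(uint64_t fault)
{
    assert(fault >= ID_FIELD_OFFSET);
    return fault - ID_FIELD_OFFSET;
}

/* MVS_PHY_ID is (1U << sas_phy->id); map a logged one-hot value back to
 * the id. Returns -1 if the value does not have exactly one bit set. */
static int one_hot_to_id(uint32_t mask)
{
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;
    int id = 0;
    while ((mask >>= 1) != 0)
        id++;
    return id;
}
```

Applied to the logs in this report, base_from_fault(0x255) gives 1 and base_from_fault(0x257) gives 3, so in both oopses the "pointer" loaded from the phy table looks like a small integer rather than a valid kernel address. The successfully printed values MVS_PHY_ID=32768 and MVS_PHY_ID=1 decode to phy ids 15 and 0.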
On the Linux-scsi mailing list a possible patch was introduced that has been tested to fix another appearance of the mvsas port expander mvs_task_prep panic. In that case the resulting panics for the combination mvsas + port expander + SATA drives were:
1. RIP [<ffffffffa00cd7ed>] mvs_task_prep+0x78d/0xe40 [mvsas]
2. RIP [<ffffffffa00bd90f>] mvs_task_prep+0x73f/0xd50 [mvsas]
3. RIP [<ffffffffa006f5b0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
4. RIP: 0010:[<ffffffffa00f1877>] [<ffffffffa00f1877>] mvs_task_exec.isra.13+0x827/0xf10 [mvsas]

---
James Bottomley wrote on 16-04-15 at 07:16:

Well, that narrows it down. It looks like there's a longstanding bug in mvs_task_prep_ata() where the physical PHY field is populated by taking an index through the HBA phy table. This field is ignored for STP but the phy table is too small and it uses the expander phy number to index it (hence the GPF as we fall off the end of the phy table trying to dereference sas_phy->id).

This should fix the problem.

James
---
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index 2d5ab6d..454536c 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -441,14 +441,11 @@ static u32 mvs_get_ncq_tag(struct sas_task *task, u32 *tag)
 static int mvs_task_prep_ata(struct mvs_info *mvi,
                              struct mvs_task_exec_info *tei)
 {
-        struct sas_ha_struct *sha = mvi->sas;
         struct sas_task *task = tei->task;
         struct domain_device *dev = task->dev;
         struct mvs_device *mvi_dev = dev->lldd_dev;
         struct mvs_cmd_hdr *hdr = tei->hdr;
         struct asd_sas_port *sas_port = dev->port;
-        struct sas_phy *sphy = dev->phy;
-        struct asd_sas_phy *sas_phy = sha->sas_phy[sphy->number];
         struct mvs_slot_info *slot;
         void *buf_prd;
         u32 tag = tei->tag, hdr_tag;
@@ -468,7 +465,7 @@ static int mvs_task_prep_ata(struct mvs_info *mvi,
         slot->tx = mvi->tx_prod;
         del_q = TXQ_MODE_I | tag |
                 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
-                (MVS_PHY_ID << TXQ_PHY_SHIFT) |
+                ((sas_port->phy_mask & TXQ_PHY_MASK) << TXQ_PHY_SHIFT) |
                 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
         mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

Created attachment 175261 [details]
dmesg output after loading module
Just tested the driver from linux-stable, now that the patch has been merged.
After loading, I get a bunch of "failed to IDENTIFY" errors, then an oops, and insmod never returns (15 minutes so far, with nothing new in dmesg).
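To make James Bottomley's explanation of the patch concrete, here is a small user-space sketch of the two ideas involved: an expander phy number can exceed the size of the HBA phy table, and the patched code builds the delivery-queue word from the port's phy_mask instead of indexing that table. The values of TXQ_MODE_I, TXQ_CMD_STP, TXQ_CMD_SHIFT and TXQ_PHY_SHIFT are the ones printed in the debug log earlier in this report. The values of TXQ_PHY_MASK, TXQ_SRS_SHIFT and HBA_N_PHYS are assumptions made for illustration only, and the function names are hypothetical, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values printed in the debug log in this report. */
#define TXQ_MODE_I    (1u << 28)   /* logged as 268435456 */
#define TXQ_CMD_STP   3u
#define TXQ_CMD_SHIFT 29
#define TXQ_PHY_SHIFT 12

/* ASSUMPTIONS: these values are not shown in this report. */
#define TXQ_PHY_MASK  0xffu
#define TXQ_SRS_SHIFT 20
#define HBA_N_PHYS    8u

/* True if `phy_number` is a valid index into a table of HBA_N_PHYS
 * entries. An expander phy number may well fail this check. */
static bool phy_index_in_range(uint32_t phy_number)
{
    return phy_number < HBA_N_PHYS;
}

/* Build the STP delivery-queue word the way the patched line does:
 * from the port's phy_mask, with no lookup in the HBA phy table. */
static uint32_t build_stp_del_q(uint32_t tag, uint32_t phy_mask,
                                uint32_t taskfileset)
{
    /* Assumption: the tag must fit below the PHY field. */
    assert(tag < (1u << TXQ_PHY_SHIFT));
    return TXQ_MODE_I | tag |
           (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
           ((phy_mask & TXQ_PHY_MASK) << TXQ_PHY_SHIFT) |
           (taskfileset << TXQ_SRS_SHIFT);
}
```

Under these assumptions, an index such as 28 (the trailing number in "end_device-6:0:28", if that is the expander phy number) fails phy_index_in_range(), which matches the "fall off the end of the phy table" description. With the wide-port phy_mask 0xf from the boot log, tag 2 and taskfileset 0, build_stp_del_q() yields 0x7000F002.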
|