Bug 201313 - pm80xx mpi_ssp_completion 1883:SAS Address of IO Failure Drive:
Summary: pm80xx mpi_ssp_completion 1883:SAS Address of IO Failure Drive:
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: Other Linux
Importance: P1 normal
Assignee: linux-scsi@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-03 04:42 UTC by MasterCATZ
Modified: 2019-03-29 07:05 UTC
CC List: 2 users

See Also:
Kernel Version: 4.18.11-041811-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel logs for last few days (2.69 MB, application/zip)
2018-10-03 04:42 UTC, MasterCATZ

Description MasterCATZ 2018-10-03 04:42:09 UTC
Created attachment 278901
kernel logs for last few days

PM8003 PMC-Sierra Rev3 card with rev5 firmware,
connected to NetApp DS4246 shelves.

The system was running fine on Ubuntu 18.04 with the 4.15 kernel; then I went to put another shelf into circulation and everything broke.

Either 4.18.11-041811-generic or something else is causing the issue. I replaced the controller card with a backup and also tried a backup shelf, with the same result.

All disks passed a badblocks check after a 160-hour scan, but when I tried to format the SEAGATE ST33000650NS SM drives, they kept dropping out or the PC froze completely.


aio@aio:~$ sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 -v -L SRD0NA1B0 -m 1 /dev/sdac1
mke2fs 1.44.1 (24-Mar-2018)
fs_types for mke2fs.conf resolution: 'ext4'
Filesystem label=SRD0NA1B0
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
183148544 inodes, 732566016 blocks
7325660 blocks (1.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2881486848
22357 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Filesystem UUID: 1f85920d-cd5a-4120-bb84-e290cb8c8808
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information:            
Warning, had trouble writing out superblocks.
aio@aio:~$ sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 -v -L SRD0NA1B0 -m 1 /dev/sdac1
[sudo] password for aio: 
mke2fs 1.44.1 (24-Mar-2018)
fs_types for mke2fs.conf resolution: 'ext4'
Filesystem label=SRD0NA1B0
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
183148544 inodes, 732566016 blocks
7325660 blocks (1.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2881486848
22357 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Filesystem UUID: 6b4fa378-38aa-4c02-ad46-4e2876ce1cb1
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
Comment 1 MasterCATZ 2018-10-03 04:45:04 UTC
pm80xx mpi_ssp_completion 1874:sas IO status 0x24
[  708.424035] pm80xx mpi_ssp_completion 1883:SAS Address of IO Failure Drive:500605ba00b9cca2
[  708.424241] sd 1:0:8:0: Power-on or device reset occurred
[  708.712116] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.072865] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[  709.073843] sd 1:0:8:0: Power-on or device reset occurred
[  709.074231] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
[  709.082924] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.083028] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.083030] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.083032] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.083148] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.083151] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.083259] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.473399] sas: Enter sas_scsi_recover_host busy: 7 failed: 7
[  709.478894] sd 1:0:8:0: Power-on or device reset occurred
[  709.479203] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 7 tries: 1
[  709.482057] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
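The repeated warning says the drive returned sense data with a SCSI status of 28 instead of CHECK CONDITION. Assuming the value inside stat(...) is printed in hexadecimal and follows the standard SAM status codes, 0x28 is TASK SET FULL, meaning the target reported that it could not accept more commands at that moment. The minimal Python sketch below tallies these warnings per SAS address in a saved kernel log; the regex, the function name `summarize`, and the hex interpretation are illustrative assumptions, not part of any kernel tool.

```python
import re
from collections import Counter

# Standard SAM status codes (hexadecimal values).
SAM_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x04: "CONDITION MET",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
    0x30: "ACA ACTIVE",
    0x40: "TASK ABORTED",
}

# Matches the warning format seen in this report, e.g.
# "dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION"
SENSE_RE = re.compile(r"dev ([0-9a-f]{16}) sent sense data, but stat\(([0-9a-f]+)\)")


def summarize(log_text: str) -> Counter:
    """Count (SAS address, decoded status) pairs found in a kernel log.

    Assumption: the stat(...) value is hexadecimal.
    """
    counts: Counter = Counter()
    for addr, stat in SENSE_RE.findall(log_text):
        code = int(stat, 16)
        counts[(addr, SAM_STATUS.get(code, f"unknown 0x{code:02x}"))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
[  708.712116] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
[  709.082924] pm80xx 0000:04:00.0: dev 500605ba00b9cca2 sent sense data, but stat(28) is not CHECK CONDITION
"""
    for (addr, status), n in summarize(sample).items():
        print(f"{addr}: {status} x{n}")  # 500605ba00b9cca2: TASK SET FULL x2
```

Running it over the attached logs would show whether the warnings cluster on one address (one drive or slot) or are spread across the shelf.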
Comment 2 MasterCATZ 2018-10-03 09:08:06 UTC
It might also be related to the SEAGATE ST33000650NS, as my other drives are HITACHI HUA723030ALA64SA. I used a spare and it formatted with fewer kernel errors, but it still showed some of the same issues.

I tried 4 spare PM8003 controllers (rev3 and rev5 cards)
and kernels 4.17 through 4.18.

I also tried different shelves, IOMs and cables. The only other variables might be the PCIe slot or a BIOS setting, but the old backup OS, Ubuntu 14.04.5 LTS, works.

I'm unsure how to add another attachment, but the kernel logs of the other trials are here:

https://drive.google.com/file/d/1evzTbENvVRfgU_p5I125ADkbbC11dx77/view?usp=sharing
Comment 3 MasterCATZ 2018-10-04 05:31:27 UTC
It might actually be part of this bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=740701
Comment 4 MasterCATZ 2018-10-04 07:23:37 UTC
However, it might also have something to do with the serial number being read incorrectly?


The drive is a
SEAGATE ST33000650NS SM (NA01)
Serial # 500605ba00b9be18

but the logs show:

[11125.292669] pm80xx 0000:08:00.0: dev 500605ba00b9be19 sent sense data, but stat(28) is not CHECK CONDITION
[11125.645491] sas: Enter sas_scsi_recover_host busy: 54 failed: 54
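The labelled value and the logged address differ only in the last hex digit, and a quick check shows the difference is exactly 1. An off-by-one like this is consistent with a device exposing related SAS addresses derived from a common base (for example, one per port) rather than a random misread, but that reading is an assumption the logs alone do not confirm. The helper name `address_delta` below is hypothetical.

```python
def address_delta(label: str, logged: str) -> int:
    """Return logged - label, treating both as hexadecimal identifiers."""
    return int(logged, 16) - int(label, 16)


label = "500605ba00b9be18"   # value printed on the drive, per the report
logged = "500605ba00b9be19"  # address shown in the kernel log
delta = address_delta(label, logged)

print(f"difference: {delta}")                                     # difference: 1
print(f"differing bits: {int(label, 16) ^ int(logged, 16):#x}")  # differing bits: 0x1
```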
Comment 5 Marius Schiffer 2019-03-23 12:36:12 UTC
I have the same problem, using Arch Linux and kernel 5.0.2.
Currently testing with 4.14.
For me, one (sometimes two) drives randomly drop out every few hours to days.
Comment 6 Deepak Ukey 2019-03-29 06:54:26 UTC
I will try to reproduce the issue on my setup to debug it further.
Comment 7 MasterCATZ 2019-03-29 07:05:52 UTC
I am now using an
LSI / Avago 9302-16e 12Gb/s PCIe 3.0 SAS HBA RAID controller (03-25688-00)
without any issues.

If you want anything tested, I can put the

PM8003 PMC-Sierra

back in.
