Bug 14534 - mvsas : repeatably disconnects all disks during RAID6 resync
Status: RESOLVED OBSOLETE
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: All, OS: Linux
Importance: P1 normal
Assigned To: linux-scsi@vger.kernel.org
Reported: 2009-11-03 02:08 UTC by Arno Wagner
Modified: 2015-02-19 15:34 UTC
CC List: 6 users

Kernel Version: 3.0-rc7
Tree: Mainline
Regression: No


Attachments
- Error log from /var/log/kern.log (78.19 KB, text/plain), 2009-11-03 02:08 UTC, Arno Wagner
- kernel .config (58.21 KB, text/plain), 2009-11-03 02:10 UTC, Arno Wagner
- /var/log/messages (3.43 KB, text/plain), 2011-07-04 22:47 UTC, Stonefish
- dmesg output (245.20 KB, text/plain), 2011-07-14 10:43 UTC, taeuber

Description Arno Wagner 2009-11-03 02:08:19 UTC
Created attachment 23624
Error log from /var/log/kern.log

I am in the process of evaluating 8-port SATA/SAS PCIe controllers for 
a storage server. I started with a Supermicro AOC-SASLP-MV8, which is 
based on a Marvell 6480 host controller. 
- Direct storage access was fine.
- RAID1 with 2 disks was fine. 
- A RAID6 with 4 disks disconnected all drives very shortly after 
  creation (error log attached). The disks only became accessible 
  again after a cold reboot, and I had to manually disconnect 2 of 
  them before I could access them at all, because the resync 
  process immediately disconnected them again.

The disks had no errors in their logs. Three have seen extended use 
without problems; the fourth is new, but has also passed a long SMART 
self-test and run for some hours without problems. They are all 
different models, which should rule out a controller/disk interaction 
issue, unless such a problem on one port can take down all four ports.

Disk list:
=========
sda: 160GB Samsung SP1614C
sdb: 500GB Seagate Barracuda 7200.12  (new)
sdc: 80 GB Seagate ST380021A, attached via RockerHead 100 SATA->IDE converter
sdd: 40 GB Hitachi Travelstar 5K100 HTS541040G9SA00
sde (system drive, not in RAID): Same as sdc

Output from lspci:
=================
exp:root ~>lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:03.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 1)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3300 Graphics
01:05.1 Audio device: ATI Technologies Inc RS780 Azalia controller
02:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)
03:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)
exp:root ~>

Attached files:

- relevant part of /var/log/kern.log
- .config in a follow-up
Comment 1 Arno Wagner 2009-11-03 02:10:23 UTC
Created attachment 23625
kernel .config
Comment 2 Arno Wagner 2009-11-03 02:22:02 UTC
Please send email to arno@wagner.name. I accidentally posted from the wrong account, and the forwarding for my old email address no longer seems to work.

Arno
Comment 3 andy yan 2009-11-03 09:41:02 UTC
I will commit a patch that updates the mvsas code to the latest version; this version fixes some disk I/O issues.
Comment 4 Arno Wagner 2009-11-03 12:45:56 UTC
I will give it a try when I have it.
Comment 5 andy yan 2009-11-10 01:29:32 UTC
I have committed the patch; please check it.
Comment 6 Arno Wagner 2009-11-10 15:23:05 UTC
It does not seem to be in git yet; at least I cannot find it anywhere. Can you mail me the patch directly and tell me what to apply it to?
Comment 7 andy yan 2009-11-11 01:14:53 UTC
Are you on the SCSI mailing list? If so, you should have received it. OK, I will forward it to you.
Comment 8 Christian Vilhelm 2009-11-11 20:50:40 UTC
I just tried the patches on 2.6.32-rc6. I did not apply the 7th patch, as the new libsas version is not in -rc6.
I was able to create a RAID5 array on 6 SATA disks without problems, using a 64xx-based card (Areca 1300). Building the RAID5 was not possible without the patches (I tried many times with different configurations).
I will do some more tests in the coming days to confirm this.
Many thanks Andy for fixing this.
Comment 9 Arno Wagner 2009-11-14 05:01:40 UTC
Just found the patches in the SCSI mailing list archives at 
http://marc.info. I am in the process of trying them out 
on a 4 disk RAID6 setup.
Comment 10 Arno Wagner 2009-11-14 11:03:55 UTC
I did apply patches 1...6 to 2.6.32-rc7 successfully. Patch 7 applies but then causes a compile error (looks like the same thing Christian found).

Functionality looks good. No issues building the RAID6. No issues during fast sequential reading or writing. No issues when compiling kernels in a loop. Hot plugging works. Performance looks about right: roughly 1.5 times the throughput of the slowest disk for sequential reading and writing.
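For context on the 1.5x figure: a common rule of thumb (an assumption here, not something measured in this report) is that an N-disk RAID6 array can stream at most about (N - 2) times the rate of its slowest member, because two chunks of every stripe hold parity. For the 4-disk array above that ceiling is 2x, so 1.5x is about 75% of it. A minimal sketch of that arithmetic; the function names are hypothetical:

```shell
#!/bin/sh
# Rule-of-thumb ceiling (an assumption, not a measured limit) for sequential
# throughput of an N-disk RAID6: two chunks per stripe are parity, so at most
# N - 2 disks deliver user data, paced by the slowest member.
# Usage: raid6_seq_ceiling N SLOWEST_DISK_MB_PER_S   -> prints MB/s
raid6_seq_ceiling() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", (n - 2) * s }'
}

# Fraction of that ceiling reached by a measured rate expressed as a
# multiple of the slowest disk (e.g. 1.5 for "1.5 times the slowest disk").
# Usage: raid6_efficiency N MEASURED_MULTIPLE
raid6_efficiency() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.2f\n", m / (n - 2) }'
}
```

For example, `raid6_efficiency 4 1.5` prints `0.75`, and `raid6_seq_ceiling 4 60` prints `120.0` for a hypothetical 60 MB/s slowest disk.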

I will do some more tests and if anything shows up, I will post it here.

Thanks for the quick fix, Andy!
Comment 11 Stonefish 2011-07-04 22:44:11 UTC
Greetings all,

I'm seeing a very similar problem, and have been ever since I got three of the same Supermicro AOC-SASLP-MV8 cards six months ago. I'm currently running just one, but that doesn't help.

I've created a software RAID5, and it's usable. When the error strikes, the array does recover: I/O performance drops to zero for one to three minutes, but it does come back. Almost always :)
I've tried several kernels since about 2.6.34, vanilla and Gentoo alike. Currently on 2.6.39.2.

I was able to write some 140GB of random data to the array yesterday, and it didn't fail until I started running a KVM instance and writing to its virtual disk. That kills it every time.
If I don't write large quantities of data to the array, it remains quite stable, and I'm able to read at excellent speeds.
The disks are Seagate ST31500341AS.

I'll attach a log file.  Is there anything else anyone needs?

Cheers.
Comment 12 Stonefish 2011-07-04 22:47:15 UTC
Created attachment 64642
/var/log/messages

That's the entire output of /var/log/messages from the time it fails.
Comment 13 taeuber 2011-07-05 08:25:37 UTC
Hello,

it seems my problem is also related to the mvsas driver/scsi subsystem:
http://thread.gmane.org/gmane.linux.kernel/1150608

I can reproduce the problem with a simple:
dd if=/dev/zero of=/dev/sdc

I tested this with the newest stable kernel 2.6.39.2 and two different enterprise SATA drives:
* Seagate Constellation ES
* Hitachi Ultrastar 7k3000

I'd like to help debug this.
Please give me some instructions.

Lars
Comment 14 Stonefish 2011-07-11 04:56:49 UTC
Please note that I've updated to the latest firmware on the AOC-SASLP-MV8, and it didn't help any.
Comment 15 taeuber 2011-07-14 10:40:58 UTC
Hi,

I've tested the newest 3.0-rc7 version and still get the errors.
I'll attach the dmesg output.

The crash was provoked with:

# dd if=/dev/zero of=/dev/sdc

* ARECA ARC1300 SAS HBA with latest firmware
* Hitachi Ultrastar 7k3000 SATA disks
* 64 bit kernel
* mvsas driver
Comment 16 taeuber 2011-07-14 10:43:05 UTC
Created attachment 65602
dmesg output

The beginning of the log is missing because of the buffer size limit.
Comment 17 Alan 2015-02-19 15:34:50 UTC
This bug relates to a very old kernel. Closing as obsolete.
