Kernel Bug Tracker – Bug 14534
mvsas : repeatably disconnects all disks during RAID6 resync
Last modified: 2015-02-19 15:34:50 UTC
Created attachment 23624
Error log from /var/log/kern.log
I am in the process of evaluating 8-port SATA/SAS PCI-E controllers for
a storage server. I started with a Supermicro AOC-SASLP-MV8, which is
based on a Marvell 6480 host controller.
- Direct storage access was fine.
- RAID1 with 2 disks was fine.
- A 4-disk RAID6 array disconnected all drives very shortly
after creation (error log attached). The disks only became
accessible again after a cold reboot, and I had to manually
disconnect 2 of them to be able to access them at all,
since the resync process immediately disconnected them
again.
The disks had no errors in their logs. Three have seen extended use
without problems; one is new, but it has also passed a long SMART
self-test and some hours of use without problems. They are all different
models, which should rule out a controller/disk interaction issue, unless
such a problem on one port can take down all four ports.
sda: 160GB Samsung SP1614C
sdb: 500GB Seagate Barracuda 7200.12 (new)
sdc: 80GB Seagate ST380021A, attached via RockerHead 100 SATA->IDE converter
sdd: 40GB Hitachi Travelstar 5K100 HTS541040G9SA00
sde (system drive, not in RAID): same as sdc
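The report does not say which tool built the array. Assuming Linux md software RAID managed with `mdadm` (the mention of a resync suggests md), an array like the one above could be recreated for testing roughly as follows. This is a dry-run sketch: `mdadm_create_cmd` is a hypothetical helper and `/dev/md0` an assumed device name. The script only prints the command, because actually running it wipes the member disks.

```python
import shlex

# Hypothetical helper (not from this thread): builds, but does not run, the
# mdadm command for an md array like the one described above.
# Assumptions: the array is created with mdadm and /dev/md0 is unused.
def mdadm_create_cmd(md_dev, level, members):
    """Return the argv list for `mdadm --create`; nothing is executed."""
    return [
        "mdadm", "--create", md_dev,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        *members,
    ]

if __name__ == "__main__":
    # sde is the system drive and is deliberately left out of the array.
    cmd = mdadm_create_cmd("/dev/md0", 6,
                           ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"])
    # Print a shell-quoted dry run; running it destroys data on the members.
    print(shlex.join(cmd))
    # Resync progress can then be followed with `cat /proc/mdstat`.
```

Building the argument list separately from running it keeps the destructive step explicit: the printed line can be reviewed before it is pasted into a root shell.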
Output from lspci:
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:03.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 1)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3300 Graphics
01:05.1 Audio device: ATI Technologies Inc RS780 Azalia controller
02:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)
03:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)
Relevant part of /var/log/kern.log attached.
.config in a follow-up.
Created attachment 23625
Please send email to firstname.lastname@example.org. I accidentally posted from the wrong account, and forwarding from my old email address seems not to work anymore.
I will commit a patch to update the mvsas code to the latest version; this version fixes some disk I/O issues.
I will give it a try when I have it.
I committed the patch; please check it.
It does not seem to be in git yet; at least I cannot find it anywhere. Can you mail me the patch directly and tell me what to apply it to?
Are you on the SCSI mailing list? If so, you should have received it. OK, I will forward it to you.
I just tried the patches on 2.6.32-rc6. I did not apply the 7th patch, as the new libsas version is not in -rc6.
I could create a RAID5 array just fine on 6 SATA disks, using a 64xx-based card (Areca 1300). Building the RAID5 was not possible without the patches (I tried many times with different configurations).
I will do some more tests in the coming days to confirm this.
Many thanks Andy for fixing this.
I just found the patches in the SCSI mailing list archives at
http://marc.info and am in the process of trying them out
on a 4-disk RAID6 setup.
I applied patches 1 to 6 to 2.6.32-rc7 successfully. Patch 7 applies, but then causes a compile error (it looks like the same issue Christian found).
Functionality looks good: no issues building the RAID6, no issues during fast sequential reads or writes, and no issues when compiling kernels in a loop. Hot plugging works. Performance looks about right, at roughly 1.5 times the throughput of the slowest disk for sequential reads and writes.
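To put the "about right" figure in context, assume an idealised model in which every full-stripe sequential transfer is paced by the slowest member disk. A 4-disk RAID6 keeps 2 data blocks per stripe, so its ceiling is about 2x the slowest disk, and the reported 1.5x is roughly 75% of that. The sketch below does only this arithmetic; the helper names are hypothetical, and real md throughput also depends on chunk size, parity computation and the controller.

```python
# Back-of-the-envelope check of the "roughly 1.5x the slowest disk" figure.
# Assumption (a simplification): each stripe waits for its slowest member,
# so sequential throughput is bounded by data disks x slowest disk speed.

PARITY_DISKS = {5: 1, 6: 2}  # parity blocks per stripe: RAID5 one, RAID6 two

def data_disks(n_disks, level):
    """Number of blocks per stripe that carry data rather than parity."""
    return n_disks - PARITY_DISKS[level]

def seq_ceiling(n_disks, level, slowest):
    """Idealised sequential throughput ceiling, in units of `slowest`."""
    return data_disks(n_disks, level) * slowest

if __name__ == "__main__":
    ceiling = seq_ceiling(4, 6, 1.0)  # 4-disk RAID6 -> 2 data disks -> 2.0x
    observed = 1.5                    # figure reported above
    print(f"ceiling {ceiling:.1f}x, observed {observed:.1f}x, "
          f"efficiency {observed / ceiling:.0%}")
    # prints: ceiling 2.0x, observed 1.5x, efficiency 75%
```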
I will do some more tests and if anything shows up, I will post it here.
Thanks for the quick fix, Andy!
I'm seeing a very similar problem, and have been ever since I got three of the same Supermicro AOC-SASLP-MV8 cards six months ago. I'm currently running just one, but that doesn't help.
I've created a software RAID5, and it's usable. When this error strikes, the array does recover: I/O throughput drops to zero for one to three minutes, but it does come back. Almost always :)
I've tried several kernels from about 2.6.34 onwards, both vanilla and Gentoo. Currently on 188.8.131.52.
Yesterday I was able to write some 140GB of random data to the array, and it didn't fail until I started a KVM instance and wrote to its virtual disk. That kills it every time.
If I don't write large quantities of data to the array, it remains quite stable, and I'm able to read at excellent speeds.
The disks are Seagate ST31500341AS.
I'll attach a log file. Is there anything else anyone needs?
Created attachment 64642
That's the entire output of /var/log/messages from the time it fails.
It seems my problem is also related to the mvsas driver/SCSI subsystem.
I can reproduce the problem with a simple command:
dd if=/dev/zero of=/dev/sdc
I tested this with the newest stable kernel 184.108.40.206 and two different enterprise SATA drives:
* Seagate Constellation ES
* Hitachi Ultrastar 7k3000
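For scripting repeated runs, a bounded stand-in for the `dd if=/dev/zero of=/dev/sdc` reproducer might look like the sketch below. It is a hypothetical helper, not something used in this thread: unlike dd without `count=`, it stops after `total_bytes`, and the 1 MiB chunk size is an arbitrary assumption. Aimed at a real disk, it is just as destructive as dd.

```python
import os

# Hypothetical helper: a bounded stand-in for `dd if=/dev/zero of=TARGET`,
# writing zeros sequentially in fixed-size chunks. Pointing it at a block
# device such as /dev/sdc DESTROYS ITS CONTENTS; try it on a scratch file.
CHUNK = 1 << 20  # 1 MiB per write() call (an arbitrary choice)

def write_zeros(path, total_bytes, chunk=CHUNK):
    """Write `total_bytes` zero bytes to `path`; return the number written."""
    buf = bytes(chunk)  # zero-filled buffer, like data read from /dev/zero
    written = 0
    # No O_TRUNC: a block device cannot be truncated anyway, and O_CREAT only
    # matters when the target is a scratch file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        while written < total_bytes:
            # os.write may write fewer bytes than asked; the loop retries.
            n = os.write(fd, buf[: min(chunk, total_bytes - written)])
            written += n
        os.fsync(fd)  # push the data to the device, not just the page cache
    finally:
        os.close(fd)
    return written
```

For example, `write_zeros("/tmp/scratch.img", 64 << 20)` writes 64 MiB to a scratch file; only with the real disk path does it become the destructive reproducer.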
I'd like to help debug this.
Please give me some instructions.
Please note that I've updated to the latest firmware on the AOC-SASLP-MV8, and it didn't help.
I've tested the newest 3.0-rc7 version and still get the errors.
I'll attach the dmesg output.
The crash was provoked with:
# dd if=/dev/zero of=/dev/sdc
* ARECA ARC1300 SAS HBA with latest firmware
* Hitachi Ultrastar 7k3000 SATA disks
* 64 bit kernel
* mvsas driver
Created attachment 65602
The beginning of the log is missing because of the buffer size limit.
This bug relates to a very old kernel. Closing as obsolete.