Bug 14534
Summary: | mvsas : repeatably disconnects all disks during RAID6 resync | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Arno Wagner (wagner) |
Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, arno, ayan, christian.vilhelm, iamthestonefishdammit, taeuber |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.0-rc7 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Error log from /var/log/kern.log; kernel .config; /var/log/messages; dmesg output |
Created attachment 23625 [details]
kernel .config
Please send email to arno@wagner.name. I accidentally posted from the wrong account and the forwarding for my old email seems not to work anymore. Arno

I will commit a patch to update the mvsas code to the latest version; this version fixes some issues with disk IO.

I will give it a try when I have it.

I committed the patch, please check it.

Seems not to be in git yet, at least I cannot find it anywhere. Can you mail me the patch directly and tell me what to apply it to?

Are you on the SCSI mailing list? If yes, you should have received it. OK, I will forward it to you.

I just tried the patches on 2.6.32-rc6. I did not apply the 7th patch, as the new libsas version is not in -rc6. I could create a raid 5 array just fine on 6 SATA disks, using a 64xx based card (Areca 1300). The raid-5 build was not possible without the patches (tried many times with different configurations). I will do some more tests in the coming days to confirm this. Many thanks Andy for fixing this.

Just found the patches in the SCSI mailing list archives at http://marc.info. I am in the process of trying them out on a 4 disk RAID6 setup. I did apply patches 1...6 to 2.6.32-rc7 successfully. Patch 7 applies but then causes a compile error (looks like the same thing Christian found). Function looks good. No issues on building the RAID6. No issues during fast sequential reading or writing. No issues when compiling kernels in a loop. Hot plugging works. Performance looks about right and is at roughly 1.5 times the throughput of the slowest disk for sequential reading and writing. I will do some more tests and if anything shows up, I will post it here. Thanks for the quick fix, Andy!

Greetings all, I'm seeing a very similar problem to this, and have been ever since I got three of those same Supermicro AOC-SASLP-MV8 cards six months back. I'm currently running just one, but that doesn't help. I've created a software raid5, and it's usable. When this error strikes, it does recover. IO performance drops to zero for about a minute or three, but it does come back. Almost always :) I've tried several kernels since about 2.6.34, vanilla and gentoo alike. Currently on 2.6.39.2. I was able to write some 140GB of random data to the array yesterday, and it didn't fail until I started running a kvm instance and writing to the virtual disk on it. That kills it every time. If I don't write large quantities of data to the array, it remains quite stable. I'm able to read at excellent speeds. The disks are Seagate ST31500341AS. I'll attach a log file. Is there anything else anyone needs? Cheers.

Created attachment 64642 [details]
/var/log/messages
That's the entire output of /var/log/messages from the time it fails.
Hello, it seems my problem is also related to the mvsas driver/scsi subsystem: http://thread.gmane.org/gmane.linux.kernel/1150608
I can reproduce the problem with a simple:
dd if=/dev/zero of=/dev/sdc
I tested this with the newest stable kernel 2.6.39.2 and two different enterprise SATA drives:
* Seagate Constellation ES
* Hitachi Ultrastar 7k3000
I'd like to help debug this. Please give some instructions. Lars

Please note that I've updated to the latest firmware on the AOC-SASLP-MV8, and it didn't help any.

Hi, I've tested the newest 3.0-rc7 version and still get the errors. I'll attach the dmesg output. The crash was provoked with:
# dd if=/dev/zero of=/dev/sdc
* ARECA ARC1300 SAS HBA with latest firmware
* Hitachi Ultrastar 7k3000 SATA disks
* 64 bit kernel
* mvsas driver

Created attachment 65602 [details]
dmesg output
beginning of log is missing because of buffer size limitation
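
The `dd if=/dev/zero of=/dev/sdc` reproduction used in the comments above can be approximated by a small script that also records which blocks stall, which may make the "IO performance drops to zero" phase easier to spot. This is only a minimal sketch in Python and not part of the original report; `write_zeros`, `block_size` and `stall_seconds` are hypothetical names chosen for illustration, and the default run writes to a temporary file rather than a disk.

```python
import os
import tempfile
import time


def write_zeros(path, total_bytes, block_size=1024 * 1024, stall_seconds=5.0):
    """Sequentially write zeros to `path`, roughly like `dd if=/dev/zero of=path`.

    Returns (bytes_written, elapsed_seconds, stalls), where `stalls` is a list
    of (offset, seconds) for every block whose write plus fsync took longer
    than `stall_seconds` (an arbitrary threshold, assumed for illustration).
    """
    block = bytes(block_size)  # one block of zero bytes
    written = 0
    stalls = []
    start = time.monotonic()
    # buffering=0 opens a raw file object, so each write() maps to one syscall.
    with open(path, "wb", buffering=0) as target:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            t0 = time.monotonic()
            n = target.write(chunk)    # may be a short write; handled below
            os.fsync(target.fileno())  # force the data out to the device
            took = time.monotonic() - t0
            if took > stall_seconds:
                stalls.append((written, took))
            written += n
    return written, time.monotonic() - start, stalls


if __name__ == "__main__":
    # Safe default: exercise a temporary file, not a real disk.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "zeros.bin")
        written, elapsed, stalls = write_zeros(path, 8 * 1024 * 1024)
        print(f"{written} bytes in {elapsed:.2f} s, {len(stalls)} stalled block(s)")
```

Pointing `path` at a block device such as /dev/sdc would overwrite that device, exactly as the dd command does, so that should only be done on disks whose contents are disposable. Calling fsync after every block slows the run down, but it ties each timing to an actual device write instead of the page cache.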
This bug relates to a very old kernel. Closing as obsolete.
Created attachment 23624 [details]
Error log from /var/log/kern.log

I am in the process of evaluating 8 port SATA/SAS PCI-E controllers for a storage server. I started with a Supermicro AOC-SASLP-MV8 which runs on a Marvell 6480 host controller.

- Direct storage access was fine.
- RAID1 with 2 disks was fine.
- A RAID6 with 4 disks disconnected all drives within a very short time directly after creation, error log attached.

The disks only become accessible again after cold reboot and I had to manually disconnect 2 of them in order to be able to access them at all, since the disks were immediately disconnected again by the resync process.

The disks had no errors in their logs. 3 have seen extended use without problems and one is new, but also has a long SMART selftest and some hours of use without problems. They are all different, which should rule out a controller/disk interaction issue, unless such a problem on one port can kill all four ports.

Disk list:
=========
sda: 160GB Samsung SP1614C
sdb: 500GB Seagate Barracuda 7200.12 (new)
sdc: 80 GB Seagate ST380021A, attached via RockerHead 100 SATA->IDE converter
sdd: 40 GB Hitachi Travelstar 5K100 HTS541040G9SA00
sde (system drive, not in RAID): Same as sdc

Output from lspci:
=================
exp:root ~>lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:03.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 1)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 2)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3300 Graphics
01:05.1 Audio device: ATI Technologies Inc RS780 Azalia controller
02:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)
03:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)
exp:root ~>

Attached files:
relevant part of /var/log/kern.log
.config in followup