Bug 15722 - unreliable IO for ata disks under heavy io load
Summary: unreliable IO for ata disks under heavy io load
Status: RESOLVED INVALID
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 low
Assignee: scsi_drivers-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-08 05:27 UTC by dujun
Modified: 2010-04-10 00:50 UTC

See Also:
Kernel Version: 2.6.32-2.6.33
Subsystem:
Regression: No
Bisected commit-id:


Description dujun 2010-04-08 05:27:39 UTC
With the vanilla 2.6.32 mptsas driver and the LSI 4.22.00 driver, and SATA disks
connected through an LSI 24x expander, SATA passthrough commands sometimes
get reset during heavy IO. This problem looks very similar to
bug 14831 on bugzilla.kernel.org:
https://bugzilla.kernel.org/show_bug.cgi?id=14831
dmesg shows many task abort, task reset, and bus reset messages. IO then
stalls for a while until the resets succeed.

However, the problem only occurs when more than 12 SATA disks are
connected through the expander and the disks are under heavy IO. If we
use SAS disks, connect 16 SATA disks directly to the 1068e card, or use
fewer than 12 disks, the problem seems to disappear.
The mptsas driver in the 2.6.21 kernel seems worse.
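
To compare configurations (for example, 12 disks versus more than 12), it may help to count the abort and reset messages in a saved dmesg capture. The following is only a minimal sketch: the matched phrases are taken from the wording of this report, not from the driver source, so the actual mptsas message text may differ and the patterns would need adjusting. The function name `count_reset_events` is hypothetical.

```python
import re

# Assumed phrases, based on the wording in this report. The exact strings
# printed by the mptsas driver may differ; adjust the patterns as needed.
PATTERNS = {
    "task abort": re.compile(r"task\s+abort", re.IGNORECASE),
    "task reset": re.compile(r"task\s+reset", re.IGNORECASE),
    "bus reset": re.compile(r"bus\s+reset", re.IGNORECASE),
}


def count_reset_events(lines):
    """Count how many log lines mention each kind of abort/reset event."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

With the output of `dmesg` saved to a file (the name `dmesg.txt` is just an example), calling `count_reset_events(open("dmesg.txt"))` returns a dictionary of counts per event type, which can be compared between runs.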
Comment 1 dujun 2010-04-09 09:15:33 UTC
This should not be related to the mptsas driver. We tested the same scenario with an LSI 36x expander and could not reproduce the problem. It seems to be related to the original LSI 24x expander, possibly a problem in the expander firmware.
Comment 2 Andrew Morton 2010-04-09 20:57:05 UTC
I think this is more likely to be a libata problem than a scsi problem?

I recategorised it that way, thanks.
Comment 3 dujun 2010-04-10 00:39:31 UTC
(In reply to comment #2)
> I think this is more likely to be a libata problem than a scsi problem?
> 
> I recategorised it that way, thanks.

Andrew, I don't think this is a libata problem. In our setup, the disks are not connected to any SATA controller chips; they go through the LSI expander chip to the LSI 1068e SAS HBA. As noted in my previous comment, we found this may be caused by a firmware bug in the expander. We are investigating the problem further and will post any updates here.

Also, we have a low-end setup with 8 SATA disks connected to AHCI and Silicon Image 3124 chips, which has no problems under heavy IO. So libata seems fine to us.

Thank you very much for your help.
