Bug 15561

Summary: SCSI Generic READ_10 to SATA fails when starting multiple processes
Product: IO/Storage Reporter: Mike Hayward (mh-linux-kernel)
Component: SCSI Assignee: linux-scsi (linux-scsi)
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.18-2.6.32 Subsystem:
Regression: No Bisected commit-id:
Attachments: aborted sg_io_hdr and kernel logs for various kernels

Description Mike Hayward 2010-03-17 21:55:03 UTC
Created attachment 25572
aborted sg_io_hdr and kernel logs for various kernels

Issuing many concurrent READ_10 commands through the sg driver to SATA
drives causes the commands to be aborted for no apparent reason.  I
can reproduce the problem within a few seconds on multiple known-good
machines and drives, across a wide range of kernels.

I queue 16 concurrent 64k reads to each of eight SATA drives, using
eight separate processes that start at roughly the same time.  At
least one of them, and typically several, trigger kernel log errors
(a reset of the associated SATA bus) and return task aborted.
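
For reference, a minimal sketch of the command blocks such a workload
would queue to one drive.  This is not the reporter's test program: the
512-byte sector size, the sequential LBA layout, and the helper names
build_read10 and build_queue are assumptions made for illustration.
Only the READ_10 opcode, the 64k read size, and the queue depth of 16
come from the report.

```python
import struct

READ_10 = 0x28          # SCSI READ(10) opcode
SECTOR_SIZE = 512       # assumption: 512-byte logical sectors
READ_BYTES = 64 * 1024  # 64k per read, as in the report
QUEUE_DEPTH = 16        # concurrent reads per drive, as in the report


def build_read10(lba, blocks):
    """Return the 10-byte READ(10) CDB for `blocks` sectors starting at `lba`."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA does not fit in 32 bits")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("transfer length does not fit in 16 bits")
    # Layout: opcode, flags, LBA (big-endian u32), group number,
    # transfer length in blocks (big-endian u16), control.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, blocks, 0)


def build_queue(start_lba=0):
    """Return QUEUE_DEPTH back-to-back 64k READ(10) CDBs.

    The sequential layout is an assumption; the report does not say
    which LBAs were read.
    """
    blocks = READ_BYTES // SECTOR_SIZE
    return [build_read10(start_lba + i * blocks, blocks)
            for i in range(QUEUE_DEPTH)]
```

Each CDB would then be referenced from an sg_io_hdr (cmdp, with
cmd_len set to 10 and dxfer_direction set to SG_DXFER_FROM_DEV) and
submitted to the drive's /dev/sg node.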

One possible clue: driver_duration shows that reads take far longer
than normal (greater than 10ms) when I first start queuing I/O, even
with only one drive, after which performance is closer to what one
would expect from a SATA disk drive.  This slow start occurs on both
arm and x86_64, although with only one drive I've never seen an
error.

Older x86_64 kernels are less verbose in the kernel log and report
fixed-format sense data instead of sense descriptors, but the same ATA
event is occurring.  See the attachment for typical sg_io_hdr and
kernel logs.