Bug 11751 - libata: sata_nv: enabling SWNCQ causes errors
Summary: libata: sata_nv: enabling SWNCQ causes errors
Status: CLOSED OBSOLETE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-10-13 12:57 UTC by Chuck Ebbert
Modified: 2012-05-14 16:50 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.26
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
output of smartctl -a (52 bytes, text/plain)
2008-10-14 12:13 UTC, Chuck Ebbert
output of hdparm -I (52 bytes, text/plain)
2008-10-14 12:14 UTC, Chuck Ebbert

Description Chuck Ebbert 2008-10-13 12:57:10 UTC
Latest working kernel version: 2.6.25
Earliest failing kernel version: 2.6.26
Distribution: Fedora 9
Hardware Environment: x86_64
Software Environment:
Problem Description:

SWNCQ is now enabled by default in sata_nv

This causes errors:
https://bugzilla.redhat.com/show_bug.cgi?id=463034

Disabling SWNCQ makes the errors go away.

Sep 21 23:30:26 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 21 23:30:26 localhost kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Sep 21 23:30:26 localhost kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 21 23:30:26 localhost kernel: ata1.00: status: { DRDY }
Sep 21 23:30:26 localhost kernel: ata1: hard resetting link
Sep 21 23:30:27 localhost kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 21 23:30:27 localhost kernel: ata1.00: configured for UDMA/133
Sep 21 23:30:27 localhost kernel: end_request: I/O error, dev sda, sector 312576468
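
For readers unfamiliar with the libata error format, the log above can be read as follows: the first byte of the "cmd" taskfile is the ATA command opcode, and the first byte of the "res" taskfile is the device status register. The Python sketch below decodes just those two bytes. The function and table names are hypothetical, the opcode table covers only the two flush commands discussed in this report, and the input is assumed to be a log line with the taskfile on the same line as the "cmd" or "res" keyword.

```python
import re

# ATA opcodes relevant to this report (flush commands only, not a full table).
ATA_COMMANDS = {0xE7: "FLUSH CACHE", 0xEA: "FLUSH CACHE EXT"}

# Bits of the ATA status register, highest bit first.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]

# Matches "cmd ea/..." or "res 40/..." and captures the first taskfile byte.
TASKFILE_RE = re.compile(r"\b(cmd|res)\s+([0-9a-f]{2})/", re.IGNORECASE)


def decode_taskfile(line):
    """Decode the first byte of a libata 'cmd' or 'res' log line.

    Returns ("cmd", command_name), ("res", [status flag names]),
    or None if the line carries no taskfile.
    """
    match = TASKFILE_RE.search(line)
    if match is None:
        return None
    kind = match.group(1).lower()
    first_byte = int(match.group(2), 16)
    if kind == "cmd":
        name = ATA_COMMANDS.get(first_byte, "unknown opcode 0x%02x" % first_byte)
        return ("cmd", name)
    flags = [name for bit, name in STATUS_BITS if first_byte & bit]
    return ("res", flags)


if __name__ == "__main__":
    print(decode_taskfile("ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0"))
    print(decode_taskfile("res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)"))
```

Applied to the log above, the opcode ea decodes to FLUSH CACHE EXT and the status byte 40 decodes to DRDY, which agrees with the "status: { DRDY }" line the kernel printed.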
Comment 1 Tejun Heo 2008-10-13 16:28:27 UTC
Please post the results of "hdparm -I" and "smartctl -a" on the drive.
Comment 2 Chuck Ebbert 2008-10-14 12:13:54 UTC
Created attachment 18307
output of smartctl -a
Comment 3 Chuck Ebbert 2008-10-14 12:14:54 UTC
Created attachment 18309
output of hdparm -I
Comment 4 Tejun Heo 2008-10-14 21:59:33 UTC
Eh... the above attachments contain links to Red Hat attachments. That's a weird way to post files. Please don't do that.

I have a HD160JJ and it's a pretty well behaved drive.  It never had any problem with NCQ.  Till now, many (but not all) cases of FLUSH(_EXT) timeouts were caused by power problems.  Can you please connect the hard drive to a separate power supply and see whether it makes any difference?

Thanks.
Comment 5 Chuck Ebbert 2008-10-15 13:28:09 UTC
From the comments in the Red Hat bug, I gather the array worked perfectly until sata_nv SWNCQ got enabled in 2.6.26. So I doubt this is caused by power problems.
Comment 6 Tejun Heo 2008-10-15 16:35:30 UTC
Unless it's too difficult to try, I would really like to rule that out. This wouldn't be the first time that enabling NCQ and a power problem showed a correlation, and drivers really can't do much wrong for FLUSH(_EXT) commands. It's just a non-data command which tells the drive to flush. Timeouts on it usually indicate that the problem is on the drive's side.

libata.force=noncq kills the problem too, right?
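
When testing this, it helps to confirm that the option actually reached the kernel. The Python sketch below checks a kernel command line string for a libata.force entry that requests noncq. The function name is hypothetical, and the parsing assumes the documented libata.force syntax of comma-separated [ID:]VALUE entries; it does not validate the ID part.

```python
def ncq_forced_off(cmdline):
    """Return True if any libata.force entry on the command line requests noncq."""
    prefix = "libata.force="
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        for entry in token[len(prefix):].split(","):
            # Each entry is [ID:]VALUE; the optional ID selects a port or device.
            value = entry.rsplit(":", 1)[-1]
            if value.lower() == "noncq":
                return True
    return False


if __name__ == "__main__":
    # On a running system the string would come from /proc/cmdline.
    with open("/proc/cmdline") as handle:
        print(ncq_forced_off(handle.read()))
```

A True result only shows that the parameter was passed at boot. Whether NCQ is really off for a given drive is still best confirmed from the kernel log.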
