Latest working kernel version: 2.6.25
Earliest failing kernel version: 2.6.26
Distribution: Fedora 9
Hardware Environment: x86_64
Software Environment:
Problem Description:
SWNCQ is now enabled by default in sata_nv. This causes errors:
https://bugzilla.redhat.com/show_bug.cgi?id=463034

Disabling SWNCQ makes the errors go away.

Sep 21 23:30:26 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 21 23:30:26 localhost kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Sep 21 23:30:26 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 21 23:30:26 localhost kernel: ata1.00: status: { DRDY }
Sep 21 23:30:26 localhost kernel: ata1: hard resetting link
Sep 21 23:30:27 localhost kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 21 23:30:27 localhost kernel: ata1.00: configured for UDMA/133
Sep 21 23:30:27 localhost kernel: end_request: I/O error, dev sda, sector 312576468
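The "SATA link up" line prints the SStatus and SControl registers in hex. Under the standard SATA layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8), they can be decoded with a small sketch like the one below. The helper names `decode_sstatus` and `decode_scontrol` are hypothetical, and only the common field values are tabulated.

```python
# Hypothetical helpers decoding the SATA SStatus/SControl values that libata
# prints in hex, e.g. "SATA link up 3.0 Gbps (SStatus 123 SControl 300)".

DET_STATUS = {
    0x0: "no device detected",
    0x1: "device present, no Phy communication",
    0x3: "device present, Phy communication established",
    0x4: "Phy offline",
}
SPD_STATUS = {0x0: "no speed negotiated", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps", 0x3: "6.0 Gbps"}
IPM_STATUS = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}
IPM_DISABLED = {0x0: "none", 0x1: "partial", 0x2: "slumber", 0x3: "partial and slumber"}


def fields(reg):
    """Split a status/control register into its DET, SPD and IPM nibbles."""
    return reg & 0xF, (reg >> 4) & 0xF, (reg >> 8) & 0xF


def decode_sstatus(text):
    """Decode a hex SStatus string as printed by libata."""
    det, spd, ipm = fields(int(text, 16))
    return {
        "det": DET_STATUS.get(det, f"reserved ({det:#x})"),
        "spd": SPD_STATUS.get(spd, f"reserved ({spd:#x})"),
        "ipm": IPM_STATUS.get(ipm, f"reserved ({ipm:#x})"),
    }


def decode_scontrol(text):
    """Decode a hex SControl string: requested action, speed limit, IPM restrictions."""
    det, spd, ipm = fields(int(text, 16))
    return {
        "det_action": "none" if det == 0 else f"{det:#x}",
        "spd_limit": "no restriction" if spd == 0 else f"limit to Gen{spd}",
        "ipm_disabled": IPM_DISABLED.get(ipm, f"reserved ({ipm:#x})"),
    }


print(decode_sstatus("123"))
print(decode_scontrol("300"))
```

For the values in this report, SStatus 0x123 decodes to an established Phy link at 3.0 Gbps in the active power state, and SControl 0x300 means no speed limit with both partial and slumber power states disabled. In other words, the link came back fully up after the hard reset.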
Please post the results of "hdparm -I" and "smartctl -a" on the drive.
Created attachment 18307: output of smartctl -a
Created attachment 18309: output of hdparm -I
Eh... the attachments above just contain links to Red Hat attachments. That's a weird way to post files. Please don't do that. I have an HD160JJ and it's a pretty well-behaved drive. It has never had any problem with NCQ. So far, many (but not all) cases of FLUSH(_EXT) timeouts have been caused by power problems. Can you please connect the hard drive to a separate power supply and see whether it makes any difference? Thanks.
From the comments in the Red Hat bug, I gather the array worked perfectly until sata_nv SWNCQ got enabled in 2.6.26. So I doubt this is caused by power problems.
Unless it's too difficult to try, I would really like to rule that out. This wouldn't be the first time that enabling NCQ has correlated with power problems, and a driver really can't do much wrong with FLUSH(_EXT) commands. It's just a non-data command that tells the drive to flush its write cache, so timeouts on it usually indicate that the problem is on the drive's side. libata.force=noncq makes the problem go away too, right?
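The failing command can be read straight off the log: libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, all in hex. The sketch below parses that line and looks up the opcode. `parse_cmd` is a hypothetical helper, the regex assumes exactly the message format shown in this report, and the opcode table is a small hand-picked subset of the ATA command set.

```python
import re

# Small subset of ATA opcodes, tagged with their transfer protocol.
ATA_OPCODES = {
    0x25: ("READ DMA EXT", "dma"),
    0x35: ("WRITE DMA EXT", "dma"),
    0x60: ("READ FPDMA QUEUED", "ncq"),
    0x61: ("WRITE FPDMA QUEUED", "ncq"),
    0xE7: ("FLUSH CACHE", "non-data"),
    0xEA: ("FLUSH CACHE EXT", "non-data"),
}

# Matches "cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0" as printed by libata.
CMD_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/(?P<lo>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/(?P<hob>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})/(?P<dev>[0-9a-f]{2}) tag (?P<tag>\d+)"
)


def parse_cmd(line):
    """Parse the taskfile from a libata 'cmd' error line into named fields."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd field found")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m["lo"].split(":"))
    hob = [int(x, 16) for x in m["hob"].split(":")]
    opcode = int(m["cmd"], 16)
    name, protocol = ATA_OPCODES.get(opcode, (f"unknown ({opcode:#04x})", "unknown"))
    return {
        "opcode": opcode,
        "name": name,
        "protocol": protocol,
        "feature": feature | (hob[0] << 8),
        # Raw register contents; a non-data command does not use them.
        "sector_count": nsect | (hob[1] << 8),
        "lba": lbal | (lbam << 8) | (lbah << 16) | (hob[2] << 24) | (hob[3] << 32) | (hob[4] << 40),
        "device": int(m["dev"], 16),
        "tag": int(m["tag"]),
    }


line = "Sep 21 23:30:26 localhost kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0"
print(parse_cmd(line))
```

On the cmd line from the report, this yields FLUSH CACHE EXT (0xEA), a non-data command rather than one of the NCQ FPDMA QUEUED opcodes, with tag 0 and a zero sector count and LBA. The matching "res 40/..." line uses the same layout with the status register in the first byte; 0x40 is DRDY, which agrees with "status: { DRDY }".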