Bug 89261
Summary: | ata1.00: failed command: READ FPDMA QUEUED | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Sumit Rai (sumitrai96) |
Component: | Serial ATA | Assignee: | Tejun Heo (tj) |
Status: | NEW --- | ||
Severity: | normal | CC: | alan, am1n, spamtrap1, sumitrai96, szg00000 |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 3.17.4-300.fc21.x86_64 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg
lspci -nn hdparm -I from Crucial MX100 512GB |
Description
Sumit Rai
2014-12-05 04:25:58 UTC
Created attachment 159761 [details]
dmesg
Created attachment 159771 [details]
lspci -nn
[ 3657.844533] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x50000 action 0x6 frozen [ 3657.844543] ata1: SError: { PHYRdyChg CommWake } That is the device disconnecting and reconnecting by the look of things. Check if there is newer firmware available for your drive,but failing that you may need to see whether running with NCQ or with trim disabled helps. (In reply to Alan from comment #3) > [ 3657.844533] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x50000 > action 0x6 frozen > [ 3657.844543] ata1: SError: { PHYRdyChg CommWake } > > That is the device disconnecting and reconnecting by the look of things. > Check if there is newer firmware available for your drive,but failing that > you may need to see whether running with NCQ or with trim disabled helps. I havn't been able to get newer firmware from Manufacturer. Some folks facing similar issue on same ssd model at https://github.com/linrunner/TLP/issues/84 are suggesting it's ALPM issue, and chaging link_power_management_policy to medium or max helps. I am testing with echo max_performance > /sys/class/scsi_host/host0/link_power_management_policy to see if it fixes the issue. NCQ and TRIM are both enabled. I am reluctant to disable NCQ since It might significantly lower performance. I will try to reproduce with TRIM or NCQ disabled if issue occurs again in max_power. If you do figure out the trigger then please update the bug accordingly and include the hdparm -I output for the drive. That way if we've got multiple reporters we can blacklist the problem features of the drive Changing ALPM settings to max performance does seem reduce the frequency of errors to significant degree, but it doesn't go away completely. I am trying to reproduce it with min_power and ncq disabled. Disabling ncq with libata.force=noncq doesn't resolve this issue either with min_power or max_performance. [srai@localhost ~]$ dmesg | grep ncq [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.17.4-300.fc21.x86_64 root=/dev/mapper/fedora_20-root ro selinux=0 kvm-intel.nested=1 libata.force=noncq [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.17.4-300.fc21.x86_64 root=/dev/mapper/fedora_20-root ro selinux=0 kvm-intel.nested=1 libata.force=noncq [ 2.719890] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst [ 3.035244] ata1.00: FORCE: horkage modified (noncq) [ 3.042855] ata2.00: FORCE: horkage modified (noncq) I am moving on to see if disabling TRIM helps. Created attachment 178781 [details] hdparm -I from Crucial MX100 512GB I am observing the same problem on a Thinkpad (S1) Yoga with the same Crucial MX100 512G SSD as Sumit. In my case the initial error is ata1: SError: { CommWake } without the PHYRdyChange, but everything else is similar. For me changing the ALPM setting to max performance seems to have solved the issue. (Upgrading thd drive firmware might have helped too). Based on the comments of multiple users in the linrunner bug report https://github.com/linrunner/TLP/issues/84 and also multiple reports of a similar problem under Windows on Crucial's own support forums, I would think a blacklisting would be in order. |