I am seeing the below error in dmesg: [ 3657.844533] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x50000 action 0x6 frozen [ 3657.844543] ata1: SError: { PHYRdyChg CommWake } [ 3657.844549] ata1.00: failed command: READ FPDMA QUEUED [ 3657.844559] ata1.00: cmd 60/20:d8:98:84:52/00:00:0c:00:00/40 tag 27 ncq 16384 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 3657.844564] ata1.00: status: { DRDY } [ 3657.844570] ata1: hard resetting link [ 3658.149326] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 3658.149803] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out [ 3658.149840] ata1.00: supports DRM functions and may not be fully accessible [ 3658.158491] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out [ 3658.158528] ata1.00: supports DRM functions and may not be fully accessible [ 3658.166701] ata1.00: configured for UDMA/133 [ 3658.177338] ata1.00: device reported invalid CHS sector 0 [ 3658.177381] ata1: EH complete Machine Details: [ 0.000000] DMI: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, BIOS MBP81.88Z.0047.B27.1201241646 01/24/12 OS: Fedora 21 SSD: Crucial_CT512MX100 [root@localhost ~]# smartctl -H /dev/sda smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.4-300.fc21.x86_64] (local build) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED
Created attachment 159761 [details] dmesg
Created attachment 159771 [details] lspci -nn
[ 3657.844533] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x50000 action 0x6 frozen [ 3657.844543] ata1: SError: { PHYRdyChg CommWake } That is the device disconnecting and reconnecting by the look of things. Check if there is newer firmware available for your drive,but failing that you may need to see whether running with NCQ or with trim disabled helps.
(In reply to Alan from comment #3) > [ 3657.844533] ata1.00: exception Emask 0x0 SAct 0x8000000 SErr 0x50000 > action 0x6 frozen > [ 3657.844543] ata1: SError: { PHYRdyChg CommWake } > > That is the device disconnecting and reconnecting by the look of things. > Check if there is newer firmware available for your drive,but failing that > you may need to see whether running with NCQ or with trim disabled helps. I havn't been able to get newer firmware from Manufacturer. Some folks facing similar issue on same ssd model at https://github.com/linrunner/TLP/issues/84 are suggesting it's ALPM issue, and chaging link_power_management_policy to medium or max helps. I am testing with echo max_performance > /sys/class/scsi_host/host0/link_power_management_policy to see if it fixes the issue. NCQ and TRIM are both enabled. I am reluctant to disable NCQ since It might significantly lower performance. I will try to reproduce with TRIM or NCQ disabled if issue occurs again in max_power.
If you do figure out the trigger then please update the bug accordingly and include the hdparm -I output for the drive. That way if we've got multiple reporters we can blacklist the problem features of the drive
Changing ALPM settings to max performance does seem reduce the frequency of errors to significant degree, but it doesn't go away completely. I am trying to reproduce it with min_power and ncq disabled.
Disabling ncq with libata.force=noncq doesn't resolve this issue either with min_power or max_performance. [srai@localhost ~]$ dmesg | grep ncq [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.17.4-300.fc21.x86_64 root=/dev/mapper/fedora_20-root ro selinux=0 kvm-intel.nested=1 libata.force=noncq [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.17.4-300.fc21.x86_64 root=/dev/mapper/fedora_20-root ro selinux=0 kvm-intel.nested=1 libata.force=noncq [ 2.719890] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst [ 3.035244] ata1.00: FORCE: horkage modified (noncq) [ 3.042855] ata2.00: FORCE: horkage modified (noncq) I am moving on to see if disabling TRIM helps.
Created attachment 178781 [details] hdparm -I from Crucial MX100 512GB I am observing the same problem on a Thinkpad (S1) Yoga with the same Crucial MX100 512G SSD as Sumit. In my case the initial error is ata1: SError: { CommWake } without the PHYRdyChange, but everything else is similar. For me changing the ALPM setting to max performance seems to have solved the issue. (Upgrading thd drive firmware might have helped too). Based on the comments of multiple users in the linrunner bug report https://github.com/linrunner/TLP/issues/84 and also multiple reports of a similar problem under Windows on Crucial's own support forums, I would think a blacklisting would be in order.