Bug 89171 - failed command: READ FPDMA QUEUED - Emask 0x4 (timeout) - Samsung XP941 MZHPU128
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Tejun Heo
Reported: 2014-12-02 12:31 UTC by Dominik Mierzejewski
Modified: 2017-01-14 08:17 UTC

See Also:
Kernel Version: 3.17.4-200.fc20.x86_64
Tree: Fedora
Regression: No


Attachments
lspci -vnn (11.33 KB, text/plain)
2014-12-02 22:54 UTC, Sumit Rai
hdparm -I /dev/sda (3.23 KB, text/plain)
2014-12-02 22:56 UTC, Sumit Rai
samsung-a800-nomsi.patch (449 bytes, patch)
2014-12-03 15:42 UTC, Tejun Heo
dmesg after applying Tejun's patch (65.01 KB, text/plain)
2014-12-04 13:42 UTC, Dominik Mierzejewski
dmesg before applying Tejun's patch (ran with libata.force=noncq) (65.28 KB, text/plain)
2014-12-04 13:45 UTC, Dominik Mierzejewski

Description Dominik Mierzejewski 2014-12-02 12:31:57 UTC
The following error messages are seen upon system boot:

[   39.828245] ata1.00: failed command: READ FPDMA QUEUED
[   39.828309] ata1.00: cmd 60/18:c8:20:40:7e/00:00:0b:00:00/40 tag 25 ncq 12288 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   39.828474] ata1.00: status: { DRDY }
[   39.828520] ata1.00: failed command: READ FPDMA QUEUED
[   39.828581] ata1.00: cmd 60/08:d0:10:40:7e/00:00:0b:00:00/40 tag 26 ncq 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   39.828740] ata1.00: status: { DRDY }
[   39.828788] ata1: hard resetting link
[   40.135267] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   40.140831] ata1.00: configured for UDMA/133
[   40.140840] ata1.00: device reported invalid CHS sector 0
[   40.140843] ata1.00: device reported invalid CHS sector 0
[   40.140852] ata1: EH complete

The above repeats three times, and then the system boots up normally after:
[  132.831403] ata1.00: NCQ disabled due to excessive errors
and another link reset.
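
For readers decoding the "cmd" lines above, here is a minimal Python sketch. It assumes the field order libata uses when printing a failed taskfile (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and a 512-byte logical sector. The function name `decode_ncq_cmd` is made up for illustration. For NCQ commands the sector count is carried in the feature fields and the tag in the upper bits of the count field, which is why 0x18 sectors lines up with "ncq 12288" and 0xc8 with "tag 25".

```python
import re

# Assumed field order of the libata "cmd" report:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)

NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_ncq_cmd(line, sector_size=512):
    """Decode tag, transfer length and LBA from one NCQ 'cmd' log line."""
    match = CMD_RE.search(line)
    if match is None:
        raise ValueError("no taskfile found in line")
    command = int(match.group(1), 16)
    if command not in NCQ_OPCODES:
        raise ValueError("not an NCQ read/write command")
    feature, nsect, lbal, lbam, lbah = (
        int(byte, 16) for byte in match.group(2).split(":")
    )
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(byte, 16) for byte in match.group(3).split(":")
    )
    # NCQ carries the sector count in the feature fields; 0 means 65536.
    sectors = ((hob_feature << 8) | feature) or 65536
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "name": NCQ_OPCODES[command],
        "tag": nsect >> 3,  # tag sits in bits 7:3 of the count field
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "lba": lba,
    }


if __name__ == "__main__":
    line = "ata1.00: cmd 60/18:c8:20:40:7e/00:00:0b:00:00/40 tag 25 ncq 12288 in"
    print(decode_ncq_cmd(line))
```

Both failed reads target neighbouring LBAs, so they look like ordinary boot-time I/O rather than accesses to one bad region.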

The machine is a Sony Vaio Pro 13:
[    0.000000] DMI: Sony Corporation SVP1322C5E/VAIO, BIOS R2080V7 12/21/2013
running Fedora 20 with kernel 3.17.4-200.fc20.x86_64 and the disk is a PCIe-attached Samsung SSD XP941 MZHPU128HCGM:

$ sudo smartctl --identify=w /dev/sda
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.4-200.fc20.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== ATA IDENTIFY DATA ===
Word     Bit     Value   Description

   0      -     0x0040   General configuration
   0      6          1   Not removable controller and/or device [OBS-6]

   1      -     0x3fff   Cylinders [OBS-6]

   2      -     0xc837   Specific configuration (0x37c8/738c/8c73/c837)

   3      -     0x0010   Heads [OBS-6]

   4      -     0x0000   Vendor specific [RET-3]

   5      -     0x0000   Vendor specific [RET-3]

   6      -     0x003f   Sectors per track [OBS-6]

   7-8    -     0x00...  Reserved for CFA (Sectors per card)

   9      -     0x0000   Vendor specific [RET-4]

  10-19   -     .        Serial number (String)
  10-13   .     0x5331:4143:4e59:4144  "S1ACNYAD"
  14-17   .     0x4331:3238:3630:2020  "C12860  "
  18-19   .     0x2020:2020            "    "

  20      -     0x0000   Vendor specific [RET-3]

  21      -     0x0000   Vendor specific [RET-3]

  22      -     0x0000   Vendor specific bytes on READ/WRITE LONG [OBS-4]

  23-26   -     .        Firmware revision (String)
  23-26   .     0x5558:4d36:3430:3151  "UXM6401Q"

  27-46   -     .        Model number (String)
  27-30   .     0x5341:4d53:554e:4720  "SAMSUNG "
  31-34   .     0x4d5a:4850:5531:3238  "MZHPU128"
  35-38   .     0x4843:474d:2d30:3030  "HCGM-000"
  39-42   .     0x3030:2020:2020:2020  "00      "
  43-46   .     0x2020:2020:2020:2020  "        "

  47      -     0x8010   READ/WRITE MULTIPLE support
  47     15:8     0x80   Must be set to 0x80
  47      7:0     0x10   Maximum sectors per DRQ on READ/WRITE MULTIPLE

  48      -     0x4000   Trusted Computing feature set options
  48     15:14     0x1   Must be set to 0x1

  49      -     0x2f00   Capabilities
  49     13          1   Standard standby timer values supported
  49     11          1   IORDY supported
  49     10          1   IORDY may be disabled
  49      9          1   LBA supported
  49      8          1   DMA supported

  50      -     0x4000   Capabilities
  50     15:14     0x1   Must be set to 0x1

  51      -     0x0200   PIO data transfer mode [OBS-5]

  52      -     0x0200   Single Word DMA data transfer mode [OBS-3]

  53      -     0x0007   Field validity / Free-fall Control
  53      2          1   Word 88 (Ultra DMA modes) is valid
  53      1          1   Words 64-70 (PIO modes) are valid
  53      0          1   Words 54-58 (CHS) are valid [OBS-6]

  54      -     0x3fff   Current cylinders [OBS-6]

  55      -     0x0010   Current heads [OBS-6]

  56      -     0x003f   Current sectors per track [OBS-6]

  57-58   -     .        Current capacity in sectors (DWord) [OBS-6]
  57-58   .     0xfc10:00fb  (16514064)

  59      -     0xd110   Sanitize Device - READ/WRITE MULTIPLE support
  59     15          1   BLOCK ERASE EXT supported
  59     14          1   OVERWRITE EXT supported
  59     12          1   Sanitize Device feature set supported
  59      8          1   Bits 7:0 are valid
  59      7:0     0x10   Current sectors per DRQ on READ/WRITE MULTIPLE

  60-61   -     .        User addressable sectors for 28-bit commands (DWord)
  60-61   .     0xc2b0:0ee7  (250069680)

  62      -     0x0000   Single Word DMA modes [OBS-3]

  63      -     0x0007   Multiword DMA modes
  63      2          1   Multiword DMA mode 2 and below supported
  63      1          1   Multiword DMA mode 1 and below supported
  63      0          1   Multiword DMA mode 0 supported

  64      -     0x0003   PIO modes
  64      1          1   PIO mode 4 supported
  64      0          1   PIO mode 3 supported

  65      -     0x0078   Minimum Multiword DMA cycle time per word in ns

  66      -     0x0078   Recommended Multiword DMA cycle time in ns

  67      -     0x0078   Minimum PIO cycle time without flow control in ns

  68      -     0x0078   Minimum PIO cycle time with IORDY flow control in ns

  69      -     0x0e00   Additional support
  69     11          1   READ BUFFER DMA supported
  69     10          1   WRITE BUFFER DMA supported
  69      9          1   SET MAX SET PASSWORD/UNLOCK DMA supported [OBS-ACS-3]

  70      -     0x0000   Reserved

  71-74   -     0x00...  Reserved for IDENTIFY PACKET DEVICE

  75      -     0x001f   Queue depth
  75      4:0     0x1f   Maximum queue depth - 1

  76      -     0x850e   Serial ATA capabilities
  76     15          1   READ LOG DMA EXT as equiv to READ LOG EXT supported
  76     10          1   Phy Event Counters supported
  76      8          1   NCQ feature set supported
  76      3          1   SATA Gen3 signaling speed (6.0 Gb/s) supported
  76      2          1   SATA Gen2 signaling speed (3.0 Gb/s) supported
  76      1          1   SATA Gen1 signaling speed (1.5 Gb/s) supported

  77      -     0x0006   Serial ATA additional capabilities
  77      3:1      0x3   Current Serial ATA signal speed

  78      -     0x0044   Serial ATA features supported
  78      6          1   Software Settings Preservation supported
  78      2          1   DMA Setup auto-activation supported

  79      -     0x0044   Serial ATA features enabled
  79      6          1   Software Settings Preservation enabled
  79      2          1   DMA Setup auto-activation enabled

  80      -     0x03fc   Major version number
  80      9          1   ACS-2 supported
  80      8          1   ATA8-ACS supported
  80      7          1   ATA/ATAPI-7 supported
  80      6          1   ATA/ATAPI-6 supported
  80      5          1   ATA/ATAPI-5 supported
  80      4          1   ATA/ATAPI-4 supported [OBS-8]
  80      3          1   ATA-3 supported [OBS-7]
  80      2          1   ATA-2 supported [OBS-6]

  81      -     0x0039   Minor version number

  82      -     0x746b   Commands and feature sets supported
  82     14          1   NOP supported
  82     13          1   READ BUFFER supported
  82     12          1   WRITE BUFFER supported
  82     10          1   HPA feature set supported [OBS-ACS-3]
  82      6          1   Read look-ahead supported
  82      5          1   Volatile write cache supported
  82      3          1   Power Management feature set supported
  82      1          1   Security feature set supported
  82      0          1   SMART feature set supported

  83      -     0x7d01   Commands and feature sets supported
  83     15:14     0x1   Must be set to 0x1
  83     13          1   FLUSH CACHE EXT supported
  83     12          1   FLUSH CACHE supported
  83     11          1   DCO feature set supported [OBS-ACS-3]
  83     10          1   48-bit Address feature set supported
  83      8          1   SET MAX security extension supported [OBS-ACS-3]
  83      0          1   DOWNLOAD MICROCODE supported

  84      -     0x4163   Commands and feature sets supported
  84     15:14     0x1   Must be set to 0x1
  84      8          1   64-bit World Wide Name supported
  84      6          1   WRITE DMA/MULTIPLE FUA EXT supported
  84      5          1   GPL feature set supported
  84      1          1   SMART self-test supported
  84      0          1   SMART error logging supported

  85      -     0x7469   Commands and feature sets supported or enabled
  85     14          1   NOP supported
  85     13          1   READ BUFFER supported
  85     12          1   WRITE BUFFER supported
  85     10          1   HPA feature set supported [OBS-ACS-3]
  85      6          1   Read look-ahead enabled
  85      5          1   Write cache enabled
  85      3          1   Power Management feature set supported
  85      0          1   SMART feature set enabled

  86      -     0xbc01   Commands and feature sets supported or enabled
  86     15          1   Words 119-120 are valid
  86     13          1   FLUSH CACHE EXT supported
  86     12          1   FLUSH CACHE supported
  86     11          1   DCO feature set supported [OBS-ACS-3]
  86     10          1   48-bit Address features set supported
  86      0          1   DOWNLOAD MICROCODE supported

  87      -     0x4163   Commands and feature sets supported or enabled
  87     15:14     0x1   Must be set to 0x1
  87      8          1   64-bit World Wide Name supported
  87      6          1   WRITE DMA/MULTIPLE FUA EXT supported
  87      5          1   GPL feature set supported
  87      1          1   SMART self-test supported
  87      0          1   SMART error logging supported

  88      -     0x407f   Ultra DMA modes
  88     14          1   Ultra DMA mode 6 selected
  88      6          1   Ultra DMA mode 6 and below supported
  88      5          1   Ultra DMA mode 5 and below supported
  88      4          1   Ultra DMA mode 4 and below supported
  88      3          1   Ultra DMA mode 3 and below supported
  88      2          1   Ultra DMA mode 2 and below supported
  88      1          1   Ultra DMA mode 1 and below supported
  88      0          1   Ultra DMA mode 0 supported

  89      -     0x0003   SECURITY ERASE UNIT time

  90      -     0x0010   ENHANCED SECURITY ERASE UNIT time

  91      -     0x0000   Current APM level

  92      -     0xfffe   Master password revision code

  93      -     0x0000   Hardware reset result (PATA)

  94      -     0x0000   AAM level [OBS-ACS-2]

  95      -     0x0000   Stream Minimum Request Size

  96      -     0x0000   Streaming Transfer Time - DMA

  97      -     0x0000   Streaming Access Latency - DMA and PIO

  98-99   -     0x00...  Streaming Performance Granularity (DWord)

 100-103  -     .        User addressable sectors for 48-bit commands (QWord)
 100-103  .     0xc2b0:0ee7:0000:0000  (250069680)

 104      -     0x0000   Streaming Transfer Time - PIO

 105      -     0x0008   Max blocks of LBA Range Entries per DS MANAGEMENT cmd

 106      -     0x4000   Physical sector size / logical sector size
 106     15:14     0x1   Must be set to 0x1

 107      -     0x0000   Inter-seek delay for ISO 7779 acoustic testing

 108-111  -     .        64-bit World Wide Name
 108-111  .     0x5002:5386:0002:ef69

 112-115  -     0x00...  Reserved for a 128-bit World Wide Name

 116      -     0x0000   Reserved for TLC [OBS-ACS-3]

 117-118  -     0x00...  Logical sector size (DWord)

 119      -     0x401e   Commands and feature sets supported
 119     15:14     0x1   Must be set to 0x1
 119      4          1   DOWNLOAD MICROCODE with mode 3 supported
 119      3          1   READ/WRITE LOG DMA EXT supported
 119      2          1   WRITE UNCORRECTABLE EXT supported
 119      1          1   Write-Read-Verify feature set supported

 120      -     0x401c   Commands and feature sets supported or enabled
 120     15:14     0x1   Must be set to 0x1
 120      4          1   DOWNLOAD MICROCODE with mode 3 supported
 120      3          1   READ/WRITE LOG DMA EXT supported
 120      2          1   WRITE UNCORRECTABLE EXT supported

 121-126  -     0x00...  Reserved

 127      -     0x0000   Removable Media Status Notification [OBS-8]

 128      -     0x0021   Security status
 128      5          1   Enhanced security erase supported
 128      0          1   Security supported

 129-159  -     0x00...  Vendor specific

 160      -     0x0000   CFA power mode

 161-167  -     0x00...  Reserved for CFA

 168      -     0x0000   Form factor

 169      -     0x0001   Data Set Management support
 169      0          1   Trim bit in DATA SET MANAGEMENT command supported

 206      -     0x003d   SCT Command Transport
 206      5          1   SCT Data Tables supported
 206      4          1   SCT Feature Control supported
 206      3          1   SCT Error Recovery Control supported
 206      2          1   SCT Write Same supported
 206      0          1   SCT Command Transport supported

 207-208  -     0x00...  Reserved for CE-ATA

 209      -     0x4000   Alignment of logical sectors
 209     15:14     0x1   Must be set to 0x1

 210-211  -     0x00...  Write-Read-Verify sector count mode 3 (DWord)

 212-213  -     0x00...  Write-Read-Verify sector count mode 2 (DWord)

 214      -     0x0000   NV Cache capabilities [OBS-ACS-3]

 215-216  -     0x00...  NV Cache size in logical blocks (DWord) [OBS-ACS-3]

 217      -     0x0001   Nominal media rotation rate

 218      -     0x0000   Reserved

 219      -     0x0000   NV Cache options [OBS-ACS-3]

 220      -     0x0000   Write-Read-Verify mode

 221      -     0x0000   Reserved

 222      -     0x103f   Transport major version number
 222     15:12     0x1   Transport type: 0x0 = Parallel, 0x1 = Serial
 222      5          1   Reserved    | SATA 3.0
 222      4          1   Reserved    | SATA 2.6
 222      3          1   Reserved    | SATA 2.5
 222      2          1   Reserved    | SATA II: Extensions
 222      1          1   ATA/ATAPI-7 | SATA 1.0a
 222      0          1   ATA8-APT    | ATA8-AST

 223      -     0x0000   Transport minor version number

 224-229  -     0x00...  Reserved

 230-233  -     0x00...  Extended number of user addressable sectors (QWord)

 234      -     0x0000   Minimum blocks per DOWNLOAD MICROCODE mode 3 command

 235      -     0x0800   Maximum blocks per DOWNLOAD MICROCODE mode 3 command

 236-254  -     .        Reserved
 236-239  .     0x0000:0000:0000:0000
 240-243  .     0x0000:0000:0000:0000
 244-247  .     0x0000:0000:0000:0000
 248-251  .     0x0000:0000:0000:0000
 252-254  .     0xff7f:5eff:ffff

 255      -     0x2cef   Integrity word
 255     15:8     0x2c   Checksum
 255      7:0     0xef   Signature
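
A few of the IDENTIFY fields above can be cross-checked by hand. The Python sketch below uses only word values copied from the dump. The helper names are hypothetical, and the 512-byte logical sector size is an assumption (word 106 does not flag a larger one). Strings are stored two ASCII characters per 16-bit word, high byte first, and DWord values are stored low word first.

```python
def identify_string(words):
    """Decode an IDENTIFY string: two ASCII chars per word, high byte first."""
    raw = b"".join(word.to_bytes(2, "big") for word in words)
    return raw.decode("ascii").rstrip()


def identify_dword(low_word, high_word):
    """Combine two consecutive IDENTIFY words (low word first) into a DWord."""
    return (high_word << 16) | low_word


# Words 27-46 (model number) and 23-26 (firmware revision) from the dump above.
MODEL_WORDS = [
    0x5341, 0x4D53, 0x554E, 0x4720, 0x4D5A, 0x4850, 0x5531, 0x3238,
    0x4843, 0x474D, 0x2D30, 0x3030, 0x3030, 0x2020, 0x2020, 0x2020,
    0x2020, 0x2020, 0x2020, 0x2020,
]
FIRMWARE_WORDS = [0x5558, 0x4D36, 0x3430, 0x3151]

model = identify_string(MODEL_WORDS)
firmware = identify_string(FIRMWARE_WORDS)
sectors = identify_dword(0xC2B0, 0x0EE7)   # words 60-61
capacity_bytes = sectors * 512             # assumes 512-byte logical sectors
queue_depth = (0x001F & 0x1F) + 1          # word 75 stores depth minus one

if __name__ == "__main__":
    print(model, firmware, sectors, capacity_bytes, queue_depth)
```

The capacity works out to roughly 128 GB, and the queue depth of 32 confirms the drive advertises full NCQ support, so the timeouts are not caused by a missing capability.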
Comment 1 Tejun Heo 2014-12-02 16:24:12 UTC
Can you please post the output of "lspci -nn"?

Thanks.
Comment 2 Sumit Rai 2014-12-02 22:54:15 UTC
I am also experiencing a similar issue. Please take a look.

Description of problem:
I am seeing these error messages in the dmesg output, and the computer sometimes freezes.
However, this is a new SSD and its SMART tests are passing.
I am dual-booting with another OS (Mac OS X) and don't have any issues on that OS.

Version-Release number of selected component (if applicable):
3.17.4-300.fc21.x86_64

BOOT_IMAGE=/boot/vmlinuz-3.17.4-300.fc21.x86_64 root=/dev/mapper/fedora_20-root ro selinux=0

Please also take a look at Bug 1084928 which appears pretty similar.

Dmesg:
[  824.591117] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
[  824.591324] ata1: SError: { PHYRdyChg CommWake }
[  824.591442] ata1.00: failed command: IDLE
[  824.591553] ata1.00: cmd e3/00:e6:00:00:00/00:00:00:00:00/40 tag 23
         res 40/00:d2:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[  824.591872] ata1.00: status: { DRDY }
[  824.591969] ata1: hard resetting link
[  824.896939] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  824.897620] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[  824.897819] ata1.00: supports DRM functions and may not be fully accessible
[  824.906648] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[  824.906840] ata1.00: supports DRM functions and may not be fully accessible
[  824.915151] ata1.00: configured for UDMA/133
[  824.926007] ata1: EH complete
[  925.539433] ata1.00: exception Emask 0x0 SAct 0xc000000 SErr 0x50000 action 0x6 frozen
[  925.543577] ata1: SError: { PHYRdyChg CommWake }
[  925.547750] ata1.00: failed command: READ FPDMA QUEUED
[  925.551492] ata1.00: cmd 60/10:d0:18:c7:68/00:00:39:00:00/40 tag 26 ncq 8192 in
         res 40/00:fa:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[  925.559065] ata1.00: status: { DRDY }
[  925.562429] ata1.00: failed command: WRITE FPDMA QUEUED
[  925.565739] ata1.00: cmd 61/20:d8:b8:87:a5/00:00:39:00:00/40 tag 27 ncq 16384 out
         res 40/00:be:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[  925.572431] ata1.00: status: { DRDY }
[  925.575524] ata1: hard resetting link
[  925.883261] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  925.888396] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[  925.892562] ata1.00: supports DRM functions and may not be fully accessible
[  925.905285] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[  925.909043] ata1.00: supports DRM functions and may not be fully accessible
[  925.920859] ata1.00: configured for UDMA/133
[  925.935261] ata1.00: device reported invalid CHS sector 0
[  925.939878] ata1.00: device reported invalid CHS sector 0
[  925.944296] ata1: EH complete

[root@localhost ~]# smartctl -H /dev/sda
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.4-300.fc21.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
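
The register values in the dmesg excerpt above can be unpacked with a short Python sketch. The SError names are the labels libata prints for the DIAG bits, the SStatus field layout (DET, SPD, IPM) follows the SATA register definition, and the function names are made up for illustration.

```python
# Labels libata prints for the SError DIAG bits (bit number -> name).
SERR_NAMES = {
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}


def set_bits(mask):
    """Return the positions of all set bits in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode_sact(sact):
    """SAct has one bit per outstanding NCQ tag."""
    return set_bits(sact)


def decode_serr(serr):
    """Translate SError bits into libata's short names."""
    return [SERR_NAMES.get(bit, f"bit{bit}") for bit in set_bits(serr)]


def decode_sstatus(sstatus):
    """Split SStatus into device detection, speed and power-state fields."""
    return {
        "det": sstatus & 0xF,
        "spd": (sstatus >> 4) & 0xF,
        "ipm": (sstatus >> 8) & 0xF,
    }


if __name__ == "__main__":
    print(decode_sact(0xC000000))   # tags outstanding at 925.539433
    print(decode_serr(0x50000))     # matches the "SError: { ... }" line
    print(decode_sstatus(0x133))    # libata prints SStatus in hex
```

SAct 0xc000000 has bits 26 and 27 set, matching the two failed commands with tags 26 and 27, while the IDLE failure shows SAct 0x0 because IDLE is not a queued command.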
Comment 3 Sumit Rai 2014-12-02 22:54:52 UTC
Created attachment 159491
lspci -vnn
Comment 4 Sumit Rai 2014-12-02 22:56:19 UTC
Created attachment 159501
hdparm -I /dev/sda
Comment 5 Dominik Mierzejewski 2014-12-03 14:29:19 UTC
(In reply to Tejun Heo from comment #1)
> Can you please post the output of "lspci -nn"?

# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Haswell-ULT DRAM Controller [8086:0a04] (rev 09)
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell-ULT Integrated Graphics Controller [8086:0a16] (rev 09)
00:03.0 Audio device [0403]: Intel Corporation Haswell-ULT HD Audio Controller [8086:0a0c] (rev 09)
00:14.0 USB controller [0c03]: Intel Corporation 8 Series USB xHCI HC [8086:9c31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation 8 Series HECI #0 [8086:9c3a] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation 8 Series HD Audio Controller [8086:9c20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series PCI Express Root Port 3 [8086:9c14] (rev e4)
00:1c.3 PCI bridge [0604]: Intel Corporation 8 Series PCI Express Root Port 4 [8086:9c16] (rev e4)
00:1c.4 PCI bridge [0604]: Intel Corporation 8 Series PCI Express Root Port 6 [8086:9c1a] (rev e4)
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series USB EHCI #1 [8086:9c26] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation 8 Series LPC Controller [8086:9c43] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation 8 Series SMBus Controller [8086:9c22] (rev 04)
01:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 6b)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader [10ec:5209] (rev 01)
03:00.0 SATA controller [0106]: Samsung Electronics Co Ltd XP941 PCIe SSD [144d:a800] (rev 01)
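
The vendor:device pair is what a per-device quirk would key on. As a rough Python sketch (the regular expression and the function name `sata_controllers` are assumptions for illustration, not part of any tool), the `lspci -nn` output can be filtered for class 0106 (SATA controller) like this:

```python
import re

# One "lspci -nn" line: slot, class name [class id]: device name [vendor:device] (rev xx)
LSPCI_RE = re.compile(
    r"^(?P<slot>\S+) (?P<class_name>.+?) \[(?P<class_id>[0-9a-f]{4})\]: "
    r"(?P<name>.+) \[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
    r"(?: \(rev (?P<rev>[0-9a-f]{2})\))?$"
)


def sata_controllers(lspci_text):
    """Return (slot, vendor, device) for every class-0106 entry."""
    found = []
    for line in lspci_text.splitlines():
        match = LSPCI_RE.match(line.strip())
        if match and match.group("class_id") == "0106":
            found.append(
                (match.group("slot"), match.group("vendor"), match.group("device"))
            )
    return found


if __name__ == "__main__":
    sample = (
        "01:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 6b)\n"
        "03:00.0 SATA controller [0106]: Samsung Electronics Co Ltd XP941 PCIe SSD [144d:a800] (rev 01)\n"
    )
    print(sata_controllers(sample))
```

On this machine the only SATA controller is the SSD itself, with vendor 144d and device a800.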
Comment 6 Dominik Mierzejewski 2014-12-03 14:32:01 UTC
(In reply to Sumit Rai from comment #2)
> I am also experiencing the similar issue. Please take a look.
[...]
> Dmesg:
> [  824.591117] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6
> frozen
> [  824.591324] ata1: SError: { PHYRdyChg CommWake }
> [  824.591442] ata1.00: failed command: IDLE
> [  824.591553] ata1.00: cmd e3/00:e6:00:00:00/00:00:00:00:00/40 tag 23
>          res 40/00:d2:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
> [  824.591872] ata1.00: status: { DRDY }
> [  824.591969] ata1: hard resetting link
> [  824.896939] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [  824.897620] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> filtered out
> [  824.897819] ata1.00: supports DRM functions and may not be fully
> accessible
> [  824.906648] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> filtered out
> [  824.906840] ata1.00: supports DRM functions and may not be fully
> accessible
> [  824.915151] ata1.00: configured for UDMA/133
> [  824.926007] ata1: EH complete

This doesn't look similar at all, the error messages are completely different. Please open a separate bug report instead of introducing noise here.
Comment 7 Dominik Mierzejewski 2014-12-03 14:33:36 UTC
For the record, this was originally reported (by me) in Fedora as https://bugzilla.redhat.com/show_bug.cgi?id=1084928 .
Comment 8 Tejun Heo 2014-12-03 15:42:41 UTC
Created attachment 159561
samsung-a800-nomsi.patch

Can you please see whether the attached patch resolves the issue? Please post boot dmesg before and after the patch.

Thanks.
Comment 9 Sumit Rai 2014-12-04 00:03:40 UTC
(In reply to Dominik Mierzejewski from comment #6)
> (In reply to Sumit Rai from comment #2)
> > I am also experiencing the similar issue. Please take a look.
> [...]
> > Dmesg:
> > [  824.591117] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action
> 0x6
> > frozen
> > [  824.591324] ata1: SError: { PHYRdyChg CommWake }
> > [  824.591442] ata1.00: failed command: IDLE
> > [  824.591553] ata1.00: cmd e3/00:e6:00:00:00/00:00:00:00:00/40 tag 23
> >          res 40/00:d2:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
> > [  824.591872] ata1.00: status: { DRDY }
> > [  824.591969] ata1: hard resetting link
> > [  824.896939] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [  824.897620] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> > filtered out
> > [  824.897819] ata1.00: supports DRM functions and may not be fully
> > accessible
> > [  824.906648] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> > filtered out
> > [  824.906840] ata1.00: supports DRM functions and may not be fully
> > accessible
> > [  824.915151] ata1.00: configured for UDMA/133
> > [  824.926007] ata1: EH complete
> 
> This doesn't look similar at all, the error messages are completely
> different. Please open a separate bug report instead of introducing noise
> here.

You are right, the part of the output you quoted doesn't look similar.

1. However, if you scroll down, you will find:

"
[  925.539433] ata1.00: exception Emask 0x0 SAct 0xc000000 SErr 0x50000 action 0x6 frozen
[  925.543577] ata1: SError: { PHYRdyChg CommWake }
[  925.547750] ata1.00: failed command: READ FPDMA QUEUED
[  925.551492] ata1.00: cmd 60/10:d0:18:c7:68/00:00:39:00:00/40 tag 26 ncq 8192 in
         res 40/00:fa:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[  925.559065] ata1.00: status: { DRDY }
..................
[  925.920859] ata1.00: configured for UDMA/133
[  925.935261] ata1.00: device reported invalid CHS sector 0
[  925.939878] ata1.00: device reported invalid CHS sector 0
[  925.944296] ata1: EH complete"

which looks pretty similar to the error Mr. Dominik Mierzejewski is getting.
   
2. I looked at your patch and got the idea to reboot my machine with the
pci=nomsi kernel parameter. It seems to fix the issue for now; I am no longer getting the error messages. Thanks.

3. My machine and Mr. Dominik's machine both use SSDs. I have not seen this issue with different storage media in the same setup.

4. pci=nomsi not only fixed READ FPDMA QUEUED; it also fixed the error message
you quoted, i.e.
[  824.591324] ata1: SError: { PHYRdyChg CommWake }
[  824.591442] ata1.00: failed command: IDLE.

Points 1, 2, 3, and 4 lead me to believe it's the same or a similar issue. If it is the same issue, it would be redundant to file a new bug report.

If you still believe it's a different issue, I will file a new bug report.

However, if you believe it's the same: since pci=nomsi disables MSI interrupts system-wide, it would help to have a patch that targets only the PCI ID of my system's SATA controller. To that end, I have already attached the output of lspci -vnn in my earlier comment.

I am still getting the following messages, which I agree are a different issue and should not be discussed in this bug report:
[    3.043884] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[    3.044721] ata2.00: ATAPI: MATSHITADVD-R   UJ-8A8, HA13, max UDMA/100
[    3.048037] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[    3.048863] ata2.00: configured for UDMA/100
Comment 10 Dominik Mierzejewski 2014-12-04 11:06:51 UTC
pci=nomsi helps here as well. I'm building a kernel with Tejun's patch from comment #8 right now.
Comment 11 Dominik Mierzejewski 2014-12-04 13:41:54 UTC
(In reply to Tejun Heo from comment #8)
> Created attachment 159561
> samsung-a800-nomsi.patch
> 
> Can you please see whether the attached patch resolves the issue? Please
> post boot dmesg before and after the patch.

It does.
Comment 12 Dominik Mierzejewski 2014-12-04 13:42:33 UTC
Created attachment 159691
dmesg after applying Tejun's patch
Comment 13 Dominik Mierzejewski 2014-12-04 13:45:47 UTC
Created attachment 159701
dmesg before applying Tejun's patch (ran with libata.force=noncq)
Comment 14 Dominik Mierzejewski 2014-12-04 14:19:44 UTC
Out of curiosity, what are the downsides of disabling MSI for my disk/controller vs. disabling NCQ?
Comment 15 Tejun Heo 2014-12-04 18:12:06 UTC
Without NCQ, the drive can only process one command at a time, which can show up as a noticeable performance degradation on certain workloads. Disabling MSI shouldn't cause any noticeable difference. It may lead to a marginal increase in CPU consumption, but it should be minuscule.

Sumit, your issue is entirely different. It's a different controller leading to different errors under different circumstances. Can you please open a new bug report? This MSI/NCQ issue has already been reported on a similar Samsung device, but I don't want to blanket-disable MSI on an Intel chipset without knowing what's going on.

Thanks.
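
Two boot-time workarounds come up in this thread: pci=nomsi (disables MSI system-wide) and libata.force=noncq (disables NCQ). When comparing dmesg captures, it helps to know which of them a given boot actually used. The Python sketch below checks a kernel command line for either one. The function name is hypothetical, and the parsing assumes the usual comma-separated option syntax, including the optional "PORT.DEVICE:" prefix that libata.force accepts.

```python
def active_workarounds(cmdline):
    """Return which of the two workarounds appear on a kernel command line."""
    found = set()
    for token in cmdline.split():
        key, _, value = token.partition("=")
        options = value.split(",")
        if key == "pci" and "nomsi" in options:
            found.add("pci=nomsi")
        elif key == "libata.force":
            # Each option may carry an "ID:" prefix such as "1.00:noncq".
            if any(option.split(":")[-1] == "noncq" for option in options):
                found.add("libata.force=noncq")
    return found


if __name__ == "__main__":
    # On a running system the command line is available in /proc/cmdline.
    with open("/proc/cmdline") as handle:
        print(active_workarounds(handle.read()))
```

A boot with neither option and no timeouts would indicate that the per-device patch, rather than a global parameter, is what resolved the problem.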
Comment 16 Tejun Heo 2014-12-04 18:21:51 UTC
Patch posted and applied.

http://lkml.kernel.org/g/20141204181959.GB4080@htj.dyndns.org

Thanks.
Comment 17 Sumit Rai 2014-12-05 03:57:01 UTC
(In reply to Tejun Heo from comment #15)

> Sumit, your issue is entirely different. It's a different controller leading
> to different errors under different circumstances. Can you please open a new
> bug report? This MSI/NCQ issue has already been reported on a similar
> samsung device, but I don't want to blanket disable MSI on an intel chipset
> without knowing what's going on.
> 
> Thanks.

Thanks for clarifying. I have not been able to reproduce this issue after removing VirtualBox (vboxdrv), which taints the kernel. I will file a new bug if the issue occurs again on an untainted kernel. Sorry for the inconvenience.
Comment 18 Sumit Rai 2014-12-05 04:27:09 UTC
TWIMC: Filed a new bug: https://bugzilla.kernel.org/show_bug.cgi?id=89261.
