Bug 203475 - Samsung 860 EVO queued TRIM issues
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Tejun Heo
 
Reported: 2019-05-01 22:00 UTC by Roman Mamedov
Modified: 2021-02-19 15:52 UTC
CC List: 16 users

Kernel Version: 4.14.114
Tree: Mainline
Regression: No


Attachments
dmesg of the errors occurring (14.52 KB, text/plain)
2019-05-01 22:00 UTC, Roman Mamedov
disable queued TRIM for Samsung 860 series SSDs (548 bytes, patch)
2019-05-01 22:01 UTC, Roman Mamedov

Description Roman Mamedov 2019-05-01 22:00:54 UTC
Created attachment 282579
dmesg of the errors occurring

I have a Samsung SSD 860 EVO mSATA 500GB connected via an ASMedia ASM1062 Serial ATA controller. It locks up for 20-30 seconds on fstrim (which runs during bootup on my system), with messages such as:

[  332.792044] ata14.00: exception Emask 0x0 SAct 0x3fffe SErr 0x0 action 0x6 frozen
[  332.798271] ata14.00: failed command: SEND FPDMA QUEUED
[  332.804499] ata14.00: cmd 64/01:08:00:00:00/00:00:00:00:00/a0 tag 1 ncq dma 512 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  332.817145] ata14.00: status: { DRDY }

After disabling queued TRIM via the included patch, the issue disappears.
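
For readers of the log above: the SAct value on the exception line is a bitmask of the NCQ tags that were outstanding when the port froze, one bit per tag. The following is a minimal Python sketch for decoding it, not part of the report or the patch; the function name active_ncq_tags is made up for illustration.

```python
def active_ncq_tags(sact):
    """Return the NCQ tag numbers whose bits are set in an SAct value."""
    # NCQ allows at most 32 outstanding commands, so only bits 0-31 matter.
    return [tag for tag in range(32) if sact & (1 << tag)]


# SAct value taken from the exception line in the description.
print(active_ncq_tags(0x3FFFE))
```

For 0x3fffe this lists tags 1 through 17, which is consistent with the failed SEND FPDMA QUEUED command being reported as tag 1.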
Comment 1 Roman Mamedov 2019-05-01 22:01:44 UTC
Created attachment 282581
disable queued TRIM for Samsung 860 series SSDs
Comment 2 Solomon Peachy 2019-07-13 12:29:27 UTC
This patch is still relevant for master.  Add my vote to merging this; I'd like to be able to re-enable NCQ on this SSD.
Comment 3 Jens Axboe 2019-07-14 16:57:43 UTC
This patch looks good - any chance you can email one with a proper commit log and signed-off-by etc to linux-ide@vger.kernel.org? And you can CC me, axboe@kernel.dk, and I'll get it queued up for the current kernel.
Comment 4 Roman Mamedov 2019-07-15 17:41:33 UTC
Jens, thanks, sent to https://marc.info/?l=linux-ide&m=156312691006716&w=2, it is now being discussed there.

Solomon: what model do you have that also has a problem with TRIM, 860 EVO mSATA too? And which firmware revision?
Comment 5 Solomon Peachy 2019-07-15 17:54:25 UTC
I have the 1TB SATA (not mSATA!) version.

smartctl -a dump:

Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 860 EVO 1TB
Serial Number:    S3Z8NB0K717690X
LU WWN Device Id: 5 002538 e4054049c
Firmware Version: RVT01B6Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jul 15 13:47:44 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

kernel log snippet: (Untainted Fedora 5.1.16-300.fc30.x86_64 kernel)

ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata1.00: supports DRM functions and may not be fully accessible
ata1.00: ATA-11: Samsung SSD 860 EVO 1TB, RVT01B6Q, max UDMA/133
ata1.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
ata1.00: supports DRM functions and may not be fully accessible
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      Samsung SSD 860  1B6Q PQ: 0 ANSI: 5
sd 0:0:0:0: Attached scsi generic sg0 type 0
ata1.00: Enabling discard_zeroes_data
sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1.00: Enabling discard_zeroes_data
sda: sda1 sda2 sda3
ata1.00: Enabling discard_zeroes_data
sd 0:0:0:0: [sda] supports TCG Opal
sd 0:0:0:0: [sda] Attached SCSI disk
Comment 6 Solomon Peachy 2019-07-15 17:59:18 UTC
See also BZ #201693
Comment 7 Roman Mamedov 2019-07-15 18:38:08 UTC
> See also BZ #201693

Did you confirm that with my patch applied you have no problem with 860 EVO on the AMD SATA controller anymore? I thought that one is a hopeless matter and the issues extend to more than just TRIM, to regular (high-speed) reads/writes too. For that reason I moved mine to an ASMedia controller, and here it is clear-cut that only the queued TRIM fails, everything else works fine.
Comment 8 Solomon Peachy 2019-07-15 18:50:52 UTC
I'm building a Fedora kernel with the patch applied, and will get back to you later today.

But in the meantime I can confirm that by setting the drive's queue depth to 1, I have no timeout or corruption issues.  [[ echo 1 > /sys/block/sda/device/queue_depth ]]
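
The queue_depth workaround above can also be scripted. This is a rough Python sketch under stated assumptions: the helper names get_queue_depth and set_queue_depth and the sys_root parameter are invented here (sys_root exists only so the functions can be pointed at a scratch directory), while the sysfs path is the one from the echo command.

```python
from pathlib import Path


def _queue_depth_attr(device, sys_root="/sys"):
    # Same attribute the echo command writes to, e.g.
    # /sys/block/sda/device/queue_depth
    return Path(sys_root) / "block" / device / "device" / "queue_depth"


def get_queue_depth(device, sys_root="/sys"):
    """Read the current queue depth of a block device."""
    return int(_queue_depth_attr(device, sys_root).read_text().strip())


def set_queue_depth(device, depth, sys_root="/sys"):
    """Write a new queue depth and return the value read back."""
    _queue_depth_attr(device, sys_root).write_text(f"{depth}\n")
    return get_queue_depth(device, sys_root)
```

Calling set_queue_depth("sda", 1) as root would correspond to the echo command. The value is a runtime setting and has to be applied again after every boot.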
Comment 9 Solomon Peachy 2019-07-16 02:40:21 UTC
Finally got it built and booted up.. and it went kaboom.

Same kernel (Fedora 5.1.16-300) but with Roman's patch applied, yields much the same kernel log, with this addition:

ata1.00: disabling queued TRIM support

Unfortunately, about 30 seconds later, it went kaboom:

[   35.527148] ata1.00: exception Emask 0x10 SAct 0xfc000 SErr 0x0 action 0x6 frozen
[   35.527155] ata1.00: irq_stat 0x08000000, interface fatal error
[   35.527161] ata1.00: failed command: WRITE FPDMA QUEUED
[   35.527171] ata1.00: cmd 61/20:70:e0:a6:8b/00:00:25:00:00/40 tag 14 ncq dma 16384 out
                        res 40/00:70:e0:a6:8b/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
[   35.527176] ata1.00: status: { DRDY }
[   35.527179] ata1.00: failed command: WRITE FPDMA QUEUED
[   35.527187] ata1.00: cmd 61/08:78:e0:ad:8b/00:00:25:00:00/40 tag 15 ncq dma 4096 out
                        res 40/00:70:e0:a6:8b/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
[   35.527191] ata1.00: status: { DRDY }
[   35.527194] ata1.00: failed command: WRITE FPDMA QUEUED
[   35.527202] ata1.00: cmd 61/20:80:60:d0:91/00:00:25:00:00/40 tag 16 ncq dma 16384 out
                        res 40/00:70:e0:a6:8b/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
[   35.527205] ata1.00: status: { DRDY }
[   35.527208] ata1.00: failed command: WRITE FPDMA QUEUED
[   35.527216] ata1.00: cmd 61/40:88:00:d1:91/00:00:25:00:00/40 tag 17 ncq dma 32768 out
                        res 40/00:70:e0:a6:8b/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
[   35.527219] ata1.00: status: { DRDY }
[   35.527222] ata1.00: failed command: WRITE FPDMA QUEUED
[   35.527230] ata1.00: cmd 61/08:90:c0:51:92/00:00:25:00:00/40 tag 18 ncq dma 4096 out
                        res 40/00:70:e0:a6:8b/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
[   35.527233] ata1.00: status: { DRDY }
[   35.527236] ata1.00: failed command: WRITE FPDMA QUEUED
[   35.527243] ata1.00: cmd 61/20:98:20:52:92/00:00:25:00:00/40 tag 19 ncq dma 16384 out
                        res 40/00:70:e0:a6:8b/00:00:25:00:00/40 Emask 0x10 (ATA bus error)
[   35.527246] ata1.00: status: { DRDY }
[   35.527252] ata1: hard resetting link
[   35.986132] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   35.986457] ata1.00: supports DRM functions and may not be fully accessible
[   35.987384] ata1.00: disabling queued TRIM support
[   35.989818] ata1.00: supports DRM functions and may not be fully accessible
[   35.990591] ata1.00: disabling queued TRIM support
[   35.992641] ata1.00: configured for UDMA/133
[   35.992670] ata1: EH complete
[   35.992941] ata1.00: Enabling discard_zeroes_data

So perhaps this SSD is simply incompatible with NCQ.  Sigh.
Comment 10 Roman Mamedov 2019-07-16 04:14:12 UTC
> So perhaps this SSD is simply incompatible with NCQ.

Not in general, only in combination with AMD SATA, as discussed in that other bugreport. And indeed there it's not only TRIM, but also regular writes. Any chance you could test on a different controller (ASMedia, Marvell, ...)?
Comment 11 Solomon Peachy 2019-07-16 12:03:54 UTC
It's frustrating that Samsung has demonstrated no interest in solving this problem properly.  It's not like AMD-based systems are _that_ rare.

Every system I have at home is AMD-based or has an incompatible form factor.  I'll see what I can dig up around the office.
Comment 12 Solomon Peachy 2019-07-25 23:50:42 UTC
I just swapped in an ASMedia-based SATA controller, and re-enabled NCQ (by using the default queue_depth).  The system is subjectively much, much faster and is (so far) error free.
Comment 13 Simon Arlott 2020-07-04 09:15:00 UTC
I'm getting the same issue on 4.15..5.4.49 with an Intel ASRock Z170 Extreme4 SATA controller:

[389520.385306] ata2.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x6 frozen
[389520.385315] ata2.00: failed command: WRITE FPDMA QUEUED
[389520.385327] ata2.00: cmd 61/60:00:80:8e:20/00:00:98:00:00/40 tag 0 ncq dma 49152 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389520.385332] ata2.00: status: { DRDY }
[389520.385336] ata2.00: failed command: WRITE FPDMA QUEUED
[389520.385345] ata2.00: cmd 61/20:08:00:8f:20/00:00:98:00:00/40 tag 1 ncq dma 16384 out
                         res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[389520.385349] ata2.00: status: { DRDY }
[389520.385353] ata2.00: failed command: SEND FPDMA QUEUED
[389520.385364] ata2.00: cmd 64/01:10:00:00:00/00:00:00:00:00/a0 tag 2 ncq dma 512 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389520.385370] ata2.00: status: { DRDY }
[389520.385374] ata2.00: failed command: WRITE FPDMA QUEUED
[389520.385382] ata2.00: cmd 61/e0:18:b8:ea:77/05:00:97:00:00/40 tag 3 ncq dma 770048 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389520.385386] ata2.00: status: { DRDY }
[389520.385393] ata2: hard resetting link
[389520.699442] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[389520.701434] ata2.00: supports DRM functions and may not be fully accessible
[389520.704682] ata2.00: supports DRM functions and may not be fully accessible
[389520.707501] ata2.00: configured for UDMA/133
[389520.707511] ata2: EH complete
[389520.707742] ata2.00: Enabling discard_zeroes_data
[389551.093259] ata2.00: exception Emask 0x0 SAct 0x1fc0000 SErr 0x0 action 0x6 frozen
[389551.093261] ata2.00: failed command: WRITE FPDMA QUEUED
[389551.093264] ata2.00: cmd 61/d8:90:a8:bc:a0/09:00:97:00:00/40 tag 18 ncq dma 1290240 ou
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389551.093265] ata2.00: status: { DRDY }
[389551.093266] ata2.00: failed command: WRITE FPDMA QUEUED
[389551.093267] ata2.00: cmd 61/e0:98:b8:ea:77/05:00:97:00:00/40 tag 19 ncq dma 770048 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389551.093268] ata2.00: status: { DRDY }
[389551.093269] ata2.00: failed command: SEND FPDMA QUEUED
[389551.093271] ata2.00: cmd 64/01:a0:00:00:00/00:00:00:00:00/a0 tag 20 ncq dma 512 out
                         res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[389551.093271] ata2.00: status: { DRDY }
[389551.093272] ata2.00: failed command: WRITE FPDMA QUEUED
[389551.093274] ata2.00: cmd 61/20:a8:00:8f:20/00:00:98:00:00/40 tag 21 ncq dma 16384 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389551.093274] ata2.00: status: { DRDY }
[389551.093275] ata2.00: failed command: WRITE FPDMA QUEUED
[389551.093295] ata2.00: cmd 61/60:b0:80:8e:20/00:00:98:00:00/40 tag 22 ncq dma 49152 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389551.093296] ata2.00: status: { DRDY }
[389551.093296] ata2.00: failed command: WRITE FPDMA QUEUED
[389551.093298] ata2.00: cmd 61/b0:b8:80:c6:a0/09:00:97:00:00/40 tag 23 ncq dma 1269760 ou
                         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[389551.093299] ata2.00: status: { DRDY }
[389551.093300] ata2.00: failed command: WRITE FPDMA QUEUED
[389551.093301] ata2.00: cmd 61/10:c0:f0:21:22/00:00:96:00:00/40 tag 24 ncq dma 8192 out
                         res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[389551.093302] ata2.00: status: { DRDY }
[389551.093303] ata2: hard resetting link
[389551.407389] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[389551.409259] ata2.00: supports DRM functions and may not be fully accessible
[389551.412712] ata2.00: supports DRM functions and may not be fully accessible
[389551.415759] ata2.00: configured for UDMA/133
[389551.415773] ata2: EH complete
[389581.797243] ata2.00: exception Emask 0x0 SAct 0x3f80 SErr 0x0 action 0x6 frozen
[389581.797246] ata2.00: failed command: WRITE FPDMA QUEUED
[389581.797248] ata2.00: cmd 61/10:38:f0:21:22/00:00:96:00:00/40 tag 7 ncq dma 8192 out
                         res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[389581.797249] ata2.00: status: { DRDY }
[389581.797250] ata2.00: failed command: WRITE FPDMA QUEUED
[389581.797252] ata2.00: cmd 61/b0:40:80:c6:a0/09:00:97:00:00/40 tag 8 ncq dma 1269760 ou
                         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[389581.797253] ata2.00: status: { DRDY }
[389581.797253] ata2.00: failed command: WRITE FPDMA QUEUED
[389581.797255] ata2.00: cmd 61/60:48:80:8e:20/00:00:98:00:00/40 tag 9 ncq dma 49152 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[389581.797256] ata2.00: status: { DRDY }
[389581.797257] ata2.00: failed command: WRITE FPDMA QUEUED
[389581.797258] ata2.00: cmd 61/20:50:00:8f:20/00:00:98:00:00/40 tag 10 ncq dma 16384 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[389581.797259] ata2.00: status: { DRDY }
[389581.797260] ata2.00: failed command: SEND FPDMA QUEUED
[389581.797262] ata2.00: cmd 64/01:58:00:00:00/00:00:00:00:00/a0 tag 11 ncq dma 512 out
                         res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[389581.797262] ata2.00: status: { DRDY }
[389581.797263] ata2.00: failed command: WRITE FPDMA QUEUED
[389581.797265] ata2.00: cmd 61/e0:60:b8:ea:77/05:00:97:00:00/40 tag 12 ncq dma 770048 out
                         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[389581.797265] ata2.00: status: { DRDY }
[389581.797266] ata2.00: failed command: WRITE FPDMA QUEUED
[389581.797268] ata2.00: cmd 61/d8:68:a8:bc:a0/09:00:97:00:00/40 tag 13 ncq dma 1290240 ou
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389581.797268] ata2.00: status: { DRDY }
[389581.797270] ata2: hard resetting link
[389582.111393] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[389582.113289] ata2.00: supports DRM functions and may not be fully accessible
[389582.116517] ata2.00: supports DRM functions and may not be fully accessible
[389582.119421] ata2.00: configured for UDMA/133
[389582.119438] ata2: EH complete
[389582.119715] ata2.00: Enabling discard_zeroes_data
[389582.120788] ata2.00: Enabling discard_zeroes_data
[389612.533285] ata2.00: NCQ disabled due to excessive errors
[389612.533292] ata2.00: exception Emask 0x0 SAct 0x7c00000f SErr 0x0 action 0x6 frozen
[389612.533301] ata2.00: failed command: WRITE FPDMA QUEUED
[389612.533313] ata2.00: cmd 61/b0:00:80:c6:a0/09:00:97:00:00/40 tag 0 ncq dma 1269760 ou
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533317] ata2.00: status: { DRDY }
[389612.533322] ata2.00: failed command: WRITE FPDMA QUEUED
[389612.533331] ata2.00: cmd 61/10:08:f0:21:22/00:00:96:00:00/40 tag 1 ncq dma 8192 out
                         res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533335] ata2.00: status: { DRDY }
[389612.533339] ata2.00: failed command: READ FPDMA QUEUED
[389612.533347] ata2.00: cmd 60/18:10:c0:d3:00/00:00:00:00:00/40 tag 2 ncq dma 12288 in
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533351] ata2.00: status: { DRDY }
[389612.533354] ata2.00: failed command: READ FPDMA QUEUED
[389612.533363] ata2.00: cmd 60/20:18:80:b9:e7/00:00:58:00:00/40 tag 3 ncq dma 16384 in
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533366] ata2.00: status: { DRDY }
[389612.533371] ata2.00: failed command: WRITE FPDMA QUEUED
[389612.533380] ata2.00: cmd 61/d8:d0:a8:bc:a0/09:00:97:00:00/40 tag 26 ncq dma 1290240 ou
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533383] ata2.00: status: { DRDY }
[389612.533387] ata2.00: failed command: WRITE FPDMA QUEUED
[389612.533396] ata2.00: cmd 61/e0:d8:b8:ea:77/05:00:97:00:00/40 tag 27 ncq dma 770048 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533399] ata2.00: status: { DRDY }
[389612.533402] ata2.00: failed command: SEND FPDMA QUEUED
[389612.533410] ata2.00: cmd 64/01:e0:00:00:00/00:00:00:00:00/a0 tag 28 ncq dma 512 out
                         res 40/00:01:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533414] ata2.00: status: { DRDY }
[389612.533417] ata2.00: failed command: WRITE FPDMA QUEUED
[389612.533426] ata2.00: cmd 61/20:e8:00:8f:20/00:00:98:00:00/40 tag 29 ncq dma 16384 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533429] ata2.00: status: { DRDY }
[389612.533433] ata2.00: failed command: WRITE FPDMA QUEUED
[389612.533441] ata2.00: cmd 61/60:f0:80:8e:20/00:00:98:00:00/40 tag 30 ncq dma 49152 out
                         res 40/00:01:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[389612.533445] ata2.00: status: { DRDY }
[389612.533451] ata2: hard resetting link
[389612.851755] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[389612.853797] ata2.00: supports DRM functions and may not be fully accessible
[389612.857594] ata2.00: supports DRM functions and may not be fully accessible
[389612.860819] ata2.00: configured for UDMA/133
[389612.860879] ata2: EH complete
[389612.865362] ata2.00: Enabling discard_zeroes_data

This is during an fstrim, and it doesn't happen on the Samsung 850 EVO.

Device Model:     Samsung SSD 850 EVO 2TB
Firmware Version: EMT02B6Q

Device Model:     Samsung SSD 860 EVO 2TB
Firmware Version: RVT04B6Q

00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
Comment 14 stathis 2020-12-01 20:58:37 UTC
Same issue, different controller:

System: FUJITSU PRIMERGY TX1310 M1/D3219-A1, BIOS V4.6.5.4 R1.11.0 for D3219-A1x 09/25/2018

Kernel: Linux server 5.4.72-gentoo-x86_64 #1 SMP Sat Oct 17 05:17:10 EET 2020 x86_64 Intel(R) Xeon(R) CPU E3-1226 v3 @ 3.30GHz GenuineIntel GNU/Linux

Controller: 00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 04)

Device Model:     Samsung SSD 860 EVO 500GB
Firmware Version: RVT04B6Q

[395138.151251] ata6.00: exception Emask 0x10 SAct 0x40003fff SErr 0x400100 action 0x6 frozen
[395138.152011] ata6.00: irq_stat 0x08000008, interface fatal error
[395138.152755] ata6: SError: { UnrecovData Handshk }
[395138.153470] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.154222] ata6.00: cmd 61/08:00:78:38:80/00:00:0e:00:00/40 tag 0 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.155801] ata6.00: status: { DRDY }
[395138.156579] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.156581] ata6.00: cmd 61/08:08:18:26:81/00:00:0e:00:00/40 tag 1 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.156581] ata6.00: status: { DRDY }
[395138.156582] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.156593] ata6.00: cmd 61/08:10:50:59:81/00:00:0e:00:00/40 tag 2 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.156594] ata6.00: status: { DRDY }
[395138.156594] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.156596] ata6.00: cmd 61/08:18:90:6a:81/00:00:0e:00:00/40 tag 3 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.156596] ata6.00: status: { DRDY }
[395138.156597] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.156598] ata6.00: cmd 61/08:20:58:b2:81/00:00:0e:00:00/40 tag 4 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.156599] ata6.00: status: { DRDY }
[395138.156599] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.156601] ata6.00: cmd 61/08:28:b0:26:c0/00:00:0e:00:00/40 tag 5 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.156602] ata6.00: status: { DRDY }
[395138.171913] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.171915] ata6.00: cmd 61/10:30:a0:27:c0/00:00:0e:00:00/40 tag 6 ncq dma 8192 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.171916] ata6.00: status: { DRDY }
[395138.171916] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.171919] ata6.00: cmd 61/08:38:50:2a:c0/00:00:0e:00:00/40 tag 7 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.176836] ata6.00: status: { DRDY }
[395138.176837] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.176839] ata6.00: cmd 61/08:40:e8:49:c8/00:00:0e:00:00/40 tag 8 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.176839] ata6.00: status: { DRDY }
[395138.176840] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.176841] ata6.00: cmd 61/08:48:58:08:80/00:00:0f:00:00/40 tag 9 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.176842] ata6.00: status: { DRDY }
[395138.183063] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.183065] ata6.00: cmd 61/08:50:08:08:c0/00:00:13:00:00/40 tag 10 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.183075] ata6.00: status: { DRDY }
[395138.183076] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.183077] ata6.00: cmd 61/08:58:80:08:c0/00:00:13:00:00/40 tag 11 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.183078] ata6.00: status: { DRDY }
[395138.189053] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.189055] ata6.00: cmd 61/08:60:a8:12:c0/00:00:13:00:00/40 tag 12 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.189055] ata6.00: status: { DRDY }
[395138.189065] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.189066] ata6.00: cmd 61/08:68:f8:12:c0/00:00:13:00:00/40 tag 13 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.189067] ata6.00: status: { DRDY }
[395138.189068] ata6.00: failed command: WRITE FPDMA QUEUED
[395138.189070] ata6.00: cmd 61/08:f0:90:2d:80/00:00:0e:00:00/40 tag 30 ncq dma 4096 out
                         res 40/00:68:f8:12:c0/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
[395138.189071] ata6.00: status: { DRDY }
[395138.199031] ata6: hard resetting link
[395138.511140] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[395138.517064] ata6.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[395138.519256] ata6.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[395138.521402] ata6.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[395138.523837] ata6.00: supports DRM functions and may not be fully accessible
[395138.529475] ata6.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[395138.530403] ata6.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[395138.531236] ata6.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[395138.532417] ata6.00: supports DRM functions and may not be fully accessible
[395138.536106] ata6.00: configured for UDMA/133
[395138.537034] ata6: EH complete
[395138.537973] ata6.00: Enabling discard_zeroes_data


What's the recommended way to go? Disable NCQ?
Comment 15 Roman Mamedov 2020-12-05 09:12:16 UTC
> What's the recommended way to go? Disable NCQ?

I believe if you see "WRITE FPDMA QUEUED" messages, the issue is with NCQ in general, and yes, you should try disabling it for the device. But if you see only "SEND FPDMA QUEUED" as in the initial post, then you may get away with disabling just the queued TRIM.
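
The rule of thumb above can be applied to a saved dmesg excerpt with a few lines of Python. This is only a sketch of that heuristic, not a diagnostic tool; the function names and the returned strings are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "ata14.00: failed command: SEND FPDMA QUEUED".
FAILED_CMD = re.compile(r"failed command:\s*(.+)")


def count_failed_commands(dmesg_text):
    """Count how often each failed ATA command name appears in the log."""
    return Counter(name.strip() for name in FAILED_CMD.findall(dmesg_text))


def suggest_workaround(dmesg_text):
    """Apply the heuristic: SEND-only failures point at queued TRIM."""
    counts = count_failed_commands(dmesg_text)
    if not counts:
        return "no failed commands"
    if set(counts) == {"SEND FPDMA QUEUED"}:
        return "disable queued TRIM"
    # Any READ/WRITE FPDMA QUEUED failure is treated as a general NCQ problem.
    return "disable NCQ"
```

A mixed result is not conclusive on its own: the log in comment 13 shows both WRITE and SEND failures, yet it was captured during an fstrim.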

It is surprising to see that it even fails on Intel's controllers as well, all of this was mostly discussed with regard to AMD SATA.
Comment 16 Simon Arlott 2020-12-05 10:35:43 UTC
(In reply to Roman Mamedov from comment #15)
> It is surprising to see that it even fails on Intel's controllers as well,
> all of this was mostly discussed with regard to AMD SATA.

It's not surprising when you realise that queued trim used to be disabled on the Samsung 8* until Samsung's marketing department made an unsubstantiated claim that "the improved queued trim enhances Linux compatibility":

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca6bfcb2f6d9deab3924bf901e73622a94900473
Comment 17 Hans de Goede 2020-12-05 15:30:23 UTC
(In reply to Simon Arlott from comment #16)
> It's not surprising when you realise that queued trim used to be disabled on
> the Samsung 8* until Samsung's marketing department made an unsubstantiated
> claim that "the improved queued trim enhances Linux compatibility":
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=ca6bfcb2f6d9deab3924bf901e73622a94900473

So it sounds like we just need to revert that patch, or at least re-enable the ATA_HORKAGE_NO_NCQ_TRIM quirk for the 860 series?
Comment 18 Sitsofe Wheeler 2020-12-08 09:14:08 UTC
Hans: also see https://bugzilla.kernel.org/show_bug.cgi?id=201693 . My personal experience is detailed over on https://marc.info/?t=154644279600003&r=1&w=2 and happens on plain reads. I've been booting with the kernel param libata.force=2.00:noncq to disable NCQ on the second ATA port where the Samsung 860 is plugged in which seems to stabilize things.
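
The port and device numbers in libata.force match the ataN.NN prefix that libata prints in the kernel log, so the parameter can be derived from any of the error lines quoted in this bug. A small Python sketch, assuming that log format; the function name noncq_param_from_log is hypothetical.

```python
import re

# Device-level libata messages start with an ID such as "ata2.00:".
ATA_ID = re.compile(r"\bata(\d+)\.(\d+):")


def noncq_param_from_log(line):
    """Build a libata.force=...:noncq parameter from a kernel log line.

    Returns None if the line carries no device-level ATA ID.
    """
    match = ATA_ID.search(line)
    if match is None:
        return None
    port, device = match.groups()
    return f"libata.force={port}.{device}:noncq"


print(noncq_param_from_log("ata2.00: failed command: WRITE FPDMA QUEUED"))
```

For the ata2.00 lines quoted in comment 13 this yields the same libata.force=2.00:noncq string as above. Port numbering can differ between machines, so the ID should be taken from the affected system's own log.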
Comment 19 stathis 2020-12-08 21:14:45 UTC

(In reply to Sitsofe Wheeler from comment #18)
> Hans: also see https://bugzilla.kernel.org/show_bug.cgi?id=201693 . My
> personal experience is detailed over on
> https://marc.info/?t=154644279600003&r=1&w=2 and happens on plain reads.
> I've been booting with the kernel param libata.force=2.00:noncq to disable
> NCQ on the second ATA port where the Samsung 860 is plugged in which seems
> to stabilize things.


I disabled NCQ for the drive using the equivalent kernel parameter and have not seen these messages again (although they have only appeared once recently - after a few months of the SSD's operation). 

For what it's worth, performance of 4K random reads has seen a tenfold decline (from 380MB/s down to 38MB/s) without NCQ, which I guess is to be expected. Performance on other tests, with NCQ vs without NCQ, didn't seem to be affected much.
Comment 20 Andriy 2021-02-12 13:05:29 UTC
Intel controller, same issue.

Model: Samsung SSD 860 EVO 1TB
Firmware Revision:  RVT04B6Q

Machine: Dell Precision M4700
BIOS: A19, 11/30/2018

SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)

Kernel: Linux  5.10.0-1-amd64 #1 SMP Debian 5.10.5-1 (2021-01-09) x86_64 GNU/Linux

Linux version 5.10.0-1-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-5) 10.2.1 20210108, GNU ld (GNU Binutils for Debian) 2.35.1) #1 SMP Debian 5.10.5-1 (2021-01-09)

ata1.00: exception Emask 0x10 SAct 0x7f80 SErr 0x440100 action 0x6 frozen
ata1.00: irq_stat 0x08000000, interface fatal error
ata1: SError: { UnrecovData CommWake Handshk }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/00:38:20:16:02/0a:00:65:00:00/40 tag 7 ncq dma 1310720 ou
         res 40/00:40:20:20:02/00:00:65:00:00/40 Emask 0x10 (ATA bus error)
ata1.00: status: { DRDY }

Disabling NCQ and setting link_power_management_policy to max_performance reduces the frequency of errors.    

echo 1 > /sys/block/sda/device/queue_depth
echo max_performance > /sys/class/scsi_host/host*/link_power_management_policy

I had some days without errors, but they still recur occasionally, mostly after updating or installing packages.
Comment 21 PJBrs 2021-02-19 15:52:04 UTC
I'm encountering this bug as well, on a ThinkPad T450s with a Samsung SSD 860 EVO 1TB (firmware RVT04B6Q), running Slackware 14.2 with the kernel upgraded to 5.10.15. I'm adding my info particularly because of my non-AMD SATA controller:

00:1f.2 SATA controller: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] (rev 03) (prog-if 01 [AHCI 1.0])
        Subsystem: Lenovo Wildcat Point-LP SATA Controller [AHCI Mode]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 44
        Region 0: I/O ports at 30a8 [size=8]
        Region 1: I/O ports at 30b4 [size=4]
        Region 2: I/O ports at 30a0 [size=8]
        Region 3: I/O ports at 30b0 [size=4]
        Region 4: I/O ports at 3060 [size=32]
        Region 5: Memory at f123c000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00298  Data: 0000
        Capabilities: [70] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
        Kernel driver in use: ahci

I have two ext4 partitions mounted with discards on, one of which is encrypted. I see ata errors just about every time I reboot my machine, and was able to easily provoke them manually by issuing fstrim on my root and home partitions.

I was (apparently) able to work around this bug both by issuing echo 1 > /sys/block/sda/device/queue_depth and by reverting https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca6bfcb2f6d9deab3924bf901e73622a94900473

Please let me know if there's anything else I can do to help. I personally was quite put off by the sudden onset of all these ata errors after I thought I had prolonged my laptop's life with a nice and big SSD. I'm happy to work around the issue, but it would be better to be able to use vanilla sources without failures.
