Bug 104421

Summary: NCQ writes on encrypted Samsung SSD 850 EVO cause data corruption
Product: IO/Storage
Reporter: throwaway42
Component: LVM2/DM
Assignee: Alasdair G Kergon (agk)
Status: CLOSED CODE_FIX
Severity: normal
CC: agk, snitzer
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.1.6
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg ata errors

Description throwaway42 2015-09-11 16:27:32 UTC
Created attachment 187411: dmesg ata errors

The affected drive contains my root partition, the following specs apply:

SSD Model: Samsung SSD 850 EVO 500GB
SSD Firmware: EMT01B6Q
Partition encrypted with LUKS, header info:
    Version:       	1
    Cipher name:   	aes
    Cipher mode:   	xts-plain
    Hash spec:     	sha1
NCQ enabled
TRIM not enabled
Filesystem in LUKS container: f2fs
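
For anyone comparing their own setup against the "NCQ enabled" line above, here is a minimal sketch of how the NCQ state could be read from sysfs. It assumes the drive is a libata/SCSI disk that exposes a `queue_depth` file, and that a depth of 1 means NCQ is effectively off. The helper names are made up for illustration and are not part of any tool mentioned in this report.

```python
def queue_depth_path(device="sda"):
    # Assumption: the drive is a libata/SCSI disk, so sysfs exposes this file.
    return f"/sys/block/{device}/device/queue_depth"


def parse_queue_depth(text):
    """Parse the contents of a sysfs queue_depth file into an integer."""
    return int(text.strip())


def ncq_active(queue_depth):
    # Assumption: a depth of 1 means commands are issued one at a time,
    # so NCQ is effectively off; a larger value means queuing is in use.
    return queue_depth > 1
```

On the affected machine this could be used as `ncq_active(parse_queue_depth(open(queue_depth_path("sda")).read()))`.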

lsblk -f
sda                                                                                                    
├─sda1                                        vfat                5A43-9F90                            
├─sda2                                        vfat        kernel  5996-1372                            
└─sda3                                        crypto_LUKS         35200bb8-4634-4216-879b-909aae3f460d 
  └─luks-35200bb8-4634-4216-879b-909aae3f460d f2fs        gentoo  a663bae5-b7cc-461e-8050-ca8477b3d79a /


If I boot a kernel >= 4.0, my root partition is remounted read-only after a while, and dmesg shows the ATA errors seen in the attachment. I can trigger the problem by writing a huge file into a directory on the root partition: after about 20 s the partition is "ro" and dmesg is full of ATA errors.
With the 3.18 LTS kernel the problem does not occur.
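
To confirm the "ro" state described above without reading dmesg, one could check the mount options listed in `/proc/mounts`. The following is only a sketch under that assumption; the function names are hypothetical, and it parses text in the usual `/proc/mounts` layout (device, mount point, filesystem type, options, ...).

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not listed."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones, so keep the last match.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point="/"):
    """True if mount_point is listed and carries the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options
```

During a reproduction run, `is_read_only(open("/proc/mounts").read())` could be polled while the large file is being written; it should flip to `True` once the kernel remounts the root filesystem read-only.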

I found a thread on the dm-crypt mailing list in which someone describes the same problem, also on an SSD with dm-crypt + f2fs. Unlike my setup, he also uses LVM: https://www.redhat.com/archives/dm-devel/2015-May/msg00001.html
Comment 1 Mike Snitzer 2015-09-11 17:44:06 UTC
Please try this patch: https://lkml.org/lkml/2015/9/11/433

And report back on your results.
Comment 2 throwaway42 2015-09-12 09:15:05 UTC
(In reply to Mike Snitzer from comment #1)
> Please try this patch: https://lkml.org/lkml/2015/9/11/433
> 
> And report back on your results.

I applied the patch to kernel 4.1.6 and was able to copy a 10 GiB file to the encrypted partition without problems. I will continue testing and report back.
Comment 3 throwaway42 2015-09-21 09:59:48 UTC
Still no problems as of today.
Comment 4 Mike Snitzer 2015-09-25 18:56:13 UTC
The fix is now upstream and will be included in 4.3-rc3, see:
http://git.kernel.org/linus/586b286b110e94eb31840ac5afc0c24e0881fe34