Bug 195895 - failed to set xfermode (err_mask=0x40) on some SSDs
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Tejun Heo
Reported: 2017-05-28 14:25 UTC by Alex Ivanov
Modified: 2021-11-06 20:29 UTC

Kernel Version: 4.9.26+
Tree: Mainline
Regression: No


Attachments
Add libata.force=nodmalog parameter (2.05 KB, patch)
2021-03-03 18:03 UTC, Reimar D

Description Alex Ivanov 2017-05-28 14:25:15 UTC
Two drives SB120GB-SPLH-25SAT3 and Qumox 240GB have errors in dmesg. Please check.

SB120GB-SPLH-25SAT3 consists of Marvell 88NV1120 controller and 2x SK Hynix H27QFG8PEM5R-BFC flash

SB120GB-SPLH-25SAT3 drive has the following errors in dmesg:

[    6.671380] ata1.00: qc timeout (cmd 0x47)
[    6.673713] ata1.00: READ LOG DMA EXT failed, trying unqueued
[    6.673717] ata1.00: failed to get Log Directory Emask 0x40
[    6.673719] ata1.00: ATA-10: 120GB SSD, V2.8, max UDMA/133
[    6.673720] ata1.00: 234441648 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[    6.673722] ata1.00: failed to get Identify Device Data, Emask 0x40
[    6.673730] ata1.00: failed to set xfermode (err_mask=0x40)
[    7.143386] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    7.143811] ata1.00: NCQ Send/Recv Log not supported
[    7.144547] ata1.00: NCQ Send/Recv Log not supported
[    7.144809] ata1.00: configured for UDMA/133

Qumox 240GB drive has the following errors in dmesg:

ata1.00: qc timeout (cmd 0x47)
ata1.00: READ LOG DMA EXT failed, trying unqueued
ata1.00: failed to get Log Directory Emask 0x40
ata1.00: ATA-10: Qumox 240GB SSD, V2.7, max UDMA/133
ata1.00: 468862128 sectors, multi 1: LBA48 NCQ (depth 31/32),
ata1.00: failed to get Identify Device Data, Emask 0x40
ata1.00: failed to set xfermode (err_mask=0x40)
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata1.00: NCQ Send/Recv Log not supported
ata1.00: NCQ Send/Recv Log not supported 
ata1.00: configured for UDMA/133
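
The err_mask value in these logs is a bitmask. As a reading aid, here is a minimal Python sketch that decodes it. The bit names are taken from enum ata_completion_errors in include/linux/libata.h as I recall them, so treat the table as an assumption and check it against your kernel tree; decode_err_mask is a hypothetical helper name.

```python
# Bit positions as recalled from enum ata_completion_errors in
# include/linux/libata.h (assumption: verify against your kernel source).
AC_ERR_BITS = {
    0: "AC_ERR_DEV",         # device reported error
    1: "AC_ERR_HSM",         # host state machine violation
    2: "AC_ERR_TIMEOUT",     # timeout
    3: "AC_ERR_MEDIA",       # media error
    4: "AC_ERR_ATA_BUS",     # ATA bus error
    5: "AC_ERR_HOST_BUS",    # host bus error
    6: "AC_ERR_SYSTEM",      # system error
    7: "AC_ERR_INVALID",     # invalid argument
    8: "AC_ERR_OTHER",       # unknown
    9: "AC_ERR_NODEV_HINT",  # polling device detection hint
    10: "AC_ERR_NCQ",        # marker for offending NCQ command
}


def decode_err_mask(mask):
    """Return the names of all bits set in a libata err_mask value."""
    names = [name for bit, name in AC_ERR_BITS.items() if mask & (1 << bit)]
    known = sum(1 << bit for bit in AC_ERR_BITS)
    unknown = mask & ~known
    if unknown:
        names.append("unknown bits 0x%x" % unknown)
    return names


if __name__ == "__main__":
    print(decode_err_mask(0x40))
```

Under that table, the 0x40 seen throughout this report is the single "system error" bit rather than a device-reported error or a plain timeout bit.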
Comment 1 nezuxav 2017-07-24 11:05:37 UTC
I also have the same errors on my SSD: Silicon Power S55 120GB SP120GBSS3S55S25.

[    6.112090] ata1.00: qc timeout (cmd 0x47)
[    6.112133] ata1.00: READ LOG DMA EXT failed, trying unqueued
[    6.112176] ata1.00: failed to get Log Directory Emask 0x40
[    6.112178] ata1.00: ATA-10: SPCC Solid State Disk, V3.3, max UDMA/133
[    6.112231] ata1.00: 234441648 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[    6.112269] ata1.00: failed to set xfermode (err_mask=0x40)
[    6.422991] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    6.423395] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[    6.423398] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[    6.423828] ata1.00: NCQ Send/Recv Log not supported
[    6.424158] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[    6.424160] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[    6.424625] ata1.00: NCQ Send/Recv Log not supported
[    6.424668] ata1.00: configured for UDMA/133
[    6.424873] scsi 0:0:0:0: Direct-Access     ATA      SPCC Solid State V3.3 PQ: 0 ANSI: 5
[    6.438365] sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[    6.438487] sd 0:0:0:0: [sda] Write Protect is off
[    6.438517] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    6.438540] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    6.439089]  sda: sda1
[    6.439507] sd 0:0:0:0: [sda] Attached SCSI disk
[    6.751138] ata2: SATA link down (SStatus 0 SControl 300)
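
The logs in this thread show both "SStatus 133" (6.0 Gbps) and "SStatus 123" (3.0 Gbps). A small Python sketch of how that hexadecimal value splits into fields may help when comparing reports. The field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) follows the SATA SStatus register definition as I understand it; decode_sstatus is a hypothetical helper name.

```python
# Negotiated link speed by SPD field value (SATA generations 1 to 3).
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_sstatus(value_hex):
    """Split the hex SStatus value printed by libata into its fields."""
    raw = int(value_hex, 16)
    spd = (raw >> 4) & 0xF
    return {
        "det": raw & 0xF,          # 3 = device present, PHY link established
        "spd": spd,                # negotiated speed generation
        "ipm": (raw >> 8) & 0xF,   # 1 = interface in active power state
        "speed": SPEEDS.get(spd, "unknown"),
    }


if __name__ == "__main__":
    for value in ("133", "123", "0"):
        print(value, decode_sstatus(value))
```

In every failing log here DET is 3 and IPM is 1, which suggests the link itself came up normally before the command timeout.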
Comment 2 huangshuhuai 2018-10-04 17:08:14 UTC
My SSD:Lenovo SPEED UP-CL-240GB

[    1.245078] ata5: SATA max UDMA/133 abar m2048@0xc161b000 port 0xc161b300 irq 28
[    1.557761] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.557929] ata5.00: FORCE: horkage modified (noncq)
[    1.558096] ata5.00: ACPI cmd ef/10:09:00:00:00:b0 (SET FEATURES) succeeded
[    1.558101] ata5.00: ACPI cmd ef/10:03:00:00:00:b0 (SET FEATURES) filtered out
[    1.558250] ata5.00: ATA-10: Lenovo SPEED UP-CL-240GB, V2.7, max UDMA/133
[    1.558253] ata5.00: 468862128 sectors, multi 0: LBA48 NCQ (not used)
[    6.663379] ata5.00: qc timeout (cmd 0x47)
[    6.663390] ata5.00: READ LOG DMA EXT failed, trying PIO
[    6.663394] ata5.00: failed to get Identify Device Data, Emask 0x40
[    6.663407] ata5.00: failed to set xfermode (err_mask=0x40)
[    6.977799] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    6.978161] ata5.00: ACPI cmd ef/10:09:00:00:00:b0 (SET FEATURES) succeeded
[    6.978169] ata5.00: ACPI cmd ef/10:03:00:00:00:b0 (SET FEATURES) filtered out
[    6.978980] ata5.00: ACPI cmd ef/10:09:00:00:00:b0 (SET FEATURES) succeeded
[    6.978987] ata5.00: ACPI cmd ef/10:03:00:00:00:b0 (SET FEATURES) filtered out
[    6.979406] ata5.00: configured for UDMA/133
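
The timestamps above jump from about 1.56 s to about 6.66 s, which is the boot delay this bug causes. The following Python sketch finds the largest gap between consecutive timestamped dmesg lines; largest_gap is a hypothetical helper, and it assumes the default "[ seconds.micros] message" prefix.

```python
import re

# Matches the default dmesg prefix, e.g. "[    6.663379] ata5.00: ...".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the
    largest pause between two consecutive timestamped lines."""
    best = (0.0, None, None)
    prev = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        stamp, message = float(match.group(1)), match.group(2)
        if prev is not None and stamp - prev[0] > best[0]:
            best = (stamp - prev[0], prev[1], message)
        prev = (stamp, message)
    return best
```

Feeding it the output of `dmesg | grep ata5` from this comment points at the "qc timeout (cmd 0x47)" line as the end of a pause of roughly five seconds.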
Comment 3 Rui Salvaterra 2018-10-21 09:40:47 UTC
On my machine (Eee PC 901), falling back to PIO(!!), it seems:

rui@bonnell:~$ dmesg | grep ata1
[    5.860265] ata1: SATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
[    6.036491] ata1.01: supports DRM functions and may not be fully accessible
[    6.036500] ata1.01: ATA-9: Crucial_CT120M500SSD3, MU05, max UDMA/133
[    6.036505] ata1.01: 234441648 sectors, multi 16: LBA48 NCQ (depth 0/32)
[    6.036667] ata1.01: READ LOG DMA EXT failed, trying PIO
[    6.053435] ata1.01: supports DRM functions and may not be fully accessible
[    6.068017] ata1.01: Enabling discard_zeroes_data
[    6.068857] ata1.01: Enabling discard_zeroes_data
[    6.070837] ata1.01: Enabling discard_zeroes_data
rui@bonnell:~$ uname -a
Linux bonnell 4.19.0-041900rc8-generic #201810150631 SMP Mon Oct 15 06:43:36 UTC 2018 i686 i686 i686 GNU/Linux
rui@bonnell:~$
Comment 4 faust6 2018-11-02 20:26:38 UTC
Smartbuy Splash 3.

Nov 01 21:11:10 server kernel: ata1.00: qc timeout (cmd 0x47)
Nov 01 21:11:10 server kernel: ata1.00: READ LOG DMA EXT failed, trying PIO
Nov 01 21:11:10 server kernel: ata1.00: NCQ Send/Recv Log not supported
Nov 01 21:11:10 server kernel: ata1.00: ATA-10: 120GB SSD, V3.24, max UDMA/133
Nov 01 21:11:10 server kernel: ata1.00: 234441648 sectors, multi 0: LBA48 NCQ (depth 32), AA
Nov 01 21:11:10 server kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Nov 01 21:11:10 server kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov 01 21:11:10 server kernel: ata1.00: NCQ Send/Recv Log not supported
Nov 01 21:11:10 server kernel: ata1.00: NCQ Send/Recv Log not supported
Nov 01 21:11:10 server kernel: ata1.00: configured for UDMA/133

Linux server 4.18.16-arch1-1-ARCH #1 SMP PREEMPT Sat Oct 20 22:06:45 UTC 2018 x86_64 GNU/Linux
Comment 5 ohill 2019-03-24 22:24:13 UTC
Since the libata refactoring (between kernels 4.11 and 4.13), I see a regression with a Micron MX500 M.2 SSD.
The SSD is connected through an IDE <-> SATA converter.

I tried many combinations at boot (libata.force=1.0:udma100,1.0:noncq,1.0:norst,1.0:80c) without any improvement; xfermode cannot be set.

[    0.000000] Linux version 4.18.16-300.fc29.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) (gcc version 8.2.1 20180801 (Red Hat 8.2.1-2) (GCC)) #1 SMP Sat Oct 20 23:24:08 UTC 2018
[    0.000000] Command line: BOOT_IMAGE=vmlinuz initrd=initrd.img inst.stage2=hd:LABEL=Fedora-WS-dvd-x86_64-29 rescue libata.force=1.0:dump_id,1.0:norst,1.0:noncq,1.0:udma100,1.0:80c
[    0.112290] SCSI subsystem initialized
[    0.112505] libata version 3.00 loaded.
[   14.672096] ata_piix 0000:00:1f.1: version 2.13
[   14.674330] scsi host0: ata_piix
[   14.674723] scsi host1: ata_piix
[   14.674921] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
[   14.675060] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[   14.675411] ata2: port disabled--ignoring
[   14.855932] ata1: FORCE: cable set to 80c
[   14.856058] ata1.00: FORCE: horkage modified (dump_id)
[   14.856168] ata1.00: FORCE: horkage modified (noncq)
[   14.856280] ata1.00: supports DRM functions and may not be fully accessible
[   14.856395] ata1.00: ATA-10: CT500MX500SSD4, M3CR023, max UDMA/133
[   14.856506] ata1.00: 976773168 sectors, multi 1: LBA48
[   19.936058] ata1.00: qc timeout (cmd 0x47)
[   19.936175] ata1.00: READ LOG DMA EXT failed, trying PIO
[   19.936289] ata1.00: ATA Identify Device Log not supported
[   19.936399] ata1.00: Security Log not supported
[   19.936519] ata1.01: ATAPI: Optiarc  DVD RW AD-7910A, 1.D1, max UDMA/33
[   19.936636] ata1.00: FORCE: xfer_mask set to udma100
[   19.936787] ata1.00: failed to set xfermode (err_mask=0x40)
[   25.140019] ata1: link is slow to respond, please be patient (ready=0)
[   29.976019] ata1: SRST failed (errno=-16)
[   35.172019] ata1: link is slow to respond, please be patient (ready=0)
[   40.008020] ata1: SRST failed (errno=-16)
[   45.204019] ata1: link is slow to respond, please be patient (ready=0)
[   75.052022] ata1: SRST failed (errno=-16)
[   80.096021] ata1: SRST failed (errno=-16)
[   80.108322] ata1: reset failed, giving up
[   80.108431] ata1.00: disabled
[   80.108539] ata1.01: disabled
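
For readers unfamiliar with the libata.force syntax used on the command line above, here is a hedged Python sketch of how such a value breaks down. It assumes the "[ID:]VAL" format with ID being PORT[.DEVICE], and the rule that an entry without an ID reuses the last ID given, as described in the kernel's kernel-parameters documentation; parse_libata_force is a hypothetical helper, not kernel code.

```python
def parse_libata_force(value):
    """Parse a libata.force= string into (port, device, setting) tuples.

    port/device are None when not specified, meaning "applies to all".
    """
    entries = []
    port = device = None  # last ID seen; reused when an entry omits the ID
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        ident, sep, setting = item.rpartition(":")
        if sep:
            port_str, dot, dev_str = ident.partition(".")
            port = int(port_str)
            device = int(dev_str) if dot else None
        entries.append((port, device, setting))
    return entries


if __name__ == "__main__":
    cmdline = "1.0:dump_id,1.0:norst,1.0:noncq,1.0:udma100,1.0:80c"
    for entry in parse_libata_force(cmdline):
        print(entry)
```

For the command line in this comment, every setting targets port 1, device 0, which matches the "ata1.00: FORCE: ..." lines in the log.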
Comment 6 Mike Kuznetsov 2019-04-05 10:57:11 UTC
mike ~$ uname -a
Linux delorean 5.0.0-8-generic #9-Ubuntu SMP Tue Mar 12 21:58:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
mike ~$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=19.04
DISTRIB_CODENAME=disco
DISTRIB_DESCRIPTION="Ubuntu Disco Dingo (development branch)"

[   16.214236] ata1.00: qc timeout (cmd 0x47)
[   16.214287] ata1.00: READ LOG DMA EXT failed, trying PIO
[   16.214324] ata1.00: NCQ Send/Recv Log not supported
[   16.214351] ata1.00: ATA-10: 120GB SSD, V2.7, max UDMA/133
[   16.214381] ata1.00: 234441648 sectors, multi 0: LBA48 NCQ (depth 32), AA
[   16.214415] ata1.00: failed to get Identify Device Data, Emask 0x40
[   16.214420] ata1.00: failed to set xfermode (err_mask=0x40)
[   16.529959] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   16.530382] ata1.00: NCQ Send/Recv Log not supported
[   16.531114] ata1.00: NCQ Send/Recv Log not supported
[   16.531383] ata1.00: configured for UDMA/133
Comment 7 Enrico Bartky 2019-12-06 11:59:01 UTC
There is an ATA_HORKAGE_* flag for the case where "READ LOG DMA EXT" fails: ATA_HORKAGE_NO_DMA_LOG (current 5.x kernels) / ATA_HORKAGE_NO_NCQ_LOG (long-term 4.4 kernel).

To use it, extend the "ata_device_blacklist" table in drivers/ata/libata-core.c; each entry is (device string, firmware revision or NULL, ATA_HORKAGE_* flags).

Background: I had the same issue with my FSC Lifebook and a PATA <-> mSATA adapter (Transcend mSATA SSD 230S).
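
To illustrate how such a table lookup behaves, here is a rough Python model. It is not the kernel implementation: the flag value and the table entry are hypothetical, and fnmatchcase stands in for the kernel's own glob matching. The only parts taken from this comment are the entry shape (device string, firmware revision or NULL, flags) and the flag name.

```python
from fnmatch import fnmatchcase

# Hypothetical bit value; the real ATA_HORKAGE_NO_DMA_LOG constant
# lives in include/linux/libata.h.
NO_DMA_LOG = 1 << 0

# Hypothetical entries: (model pattern, firmware pattern or None, flags).
BLACKLIST = [
    ("Example SSD *", None, NO_DMA_LOG),
    ("Other Drive", "V1.*", NO_DMA_LOG),
]


def lookup_horkage(model, fw_rev, table=BLACKLIST):
    """Return the flags of the first matching entry, or 0 if none match."""
    for model_pat, rev_pat, flags in table:
        if not fnmatchcase(model, model_pat):
            continue
        # A firmware pattern of None matches every firmware revision.
        if rev_pat is None or fnmatchcase(fw_rev, rev_pat):
            return flags
    return 0
```

A match on the model string with a None firmware pattern flags every firmware revision of that drive, which is why comment 13 cautions against this approach for drives that only fail behind a converter.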
Comment 8 Enrico Bartky 2019-12-06 12:15:22 UTC
In addition to my comment: a failing READ LOG DMA EXT causes the following ATA commands (set xfermode, etc.) to fail as well, at least in my environment.



(In reply to Enrico Bartky from comment #7)
> There is a ATA_HORKAGE_* for "READ LOG DMA EXT" - fails:
> ATA_HORKAGE_NO_DMA_LOG (current 5.x Kernel) / ATA_HORKAGE_NO_NCQ_LOG
> (LongTerm 4.4 Kernel).
> 
> Extend the "ata_device_blacklist" - table in drivers/ata/libata-core.c
> (device string, firmware-rev or NULL, ATA_HORKAGE*.)
> 
> Background: I had the same issue with my FSC Lifebook and a PATA <-> mSATA
> adapter (Transcend mSATA SSD 230S).
Comment 9 CorpChAoS 2020-07-07 12:10:01 UTC
Jul 07 04:07:53
kernel: ata1: SATA max UDMA/133 abar m2048@0xfed1c800 port 0xfed1c900 irq 28
kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: ata1.00: qc timeout (cmd 0x47)
kernel: ata1.00: READ LOG DMA EXT failed, trying PIO
kernel: ata1.00: NCQ Send/Recv Log not supported
kernel: ata1.00: ATA-10: SPCC Solid State Disk, V2.8, max UDMA/133
kernel: ata1.00: 234441648 sectors, multi 1: LBA48 NCQ (depth 32), AA
kernel: ata1.00: failed to get Identify Device Data, Emask 0x40
kernel: ata1.00: failed to set xfermode (err_mask=0x40)
kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: ata1.00: NCQ Send/Recv Log not supported
kernel: ata1.00: NCQ Send/Recv Log not supported
kernel: ata1.00: configured for UDMA/133


Same issue. Still hasn't been resolved
Comment 10 CorpChAoS 2020-07-07 12:15:48 UTC
It seems that most people with this problem have it occurring with an SSD:

$ sudo hdparm -iI /dev/sdb

/dev/sdb:

 Model=SPCC Solid State Disk, FwRev=V2.8, SerialNo=P1602347000000003975
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=1, MultSect=1
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-3,4,5,6,7

 * signifies the current active mode


ATA device, with non-removable media
	Model Number:       SPCC Solid State Disk                   
	Serial Number:      P1602347000000003975
	Firmware Revision:  V2.8    
	Media Serial Num:   MM32g16K4CE1MABT5200                    
	Media Manufacturer: 
	Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
	Used: unknown (minor revision code 0x011b) 
	Supported: 10 9 8 7 6 5 
	Likely used: 10
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:    16514064
	LBA    user addressable sectors:   234441648
	LBA48  user addressable sectors:   234441648
	Logical  Sector size:                   512 bytes
	Physical Sector size:                   512 bytes
	Logical Sector-0 offset:                  0 bytes
	device size with M = 1024*1024:      114473 MBytes
	device size with M = 1000*1000:      120034 MBytes (120 GB)
	cache/buffer size  = unknown
	Nominal Media Rotation Rate: Solid State Device
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 1	Current = 1
	Advanced power management level: 254
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	   *	Advanced Power Management feature set
	   *	48-bit Address feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	General Purpose Logging feature set
	   *	WRITE_{DMA|MULTIPLE}_FUA_EXT
	   *	64-bit World wide name
	   *	WRITE_UNCORRECTABLE_EXT command
	   *	{READ,WRITE}_DMA_EXT_GPL commands
	   *	Segmented DOWNLOAD_MICROCODE
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Gen3 signaling speed (6.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Host automatic Partial to Slumber transitions
	   *	Device automatic Partial to Slumber transitions
	   *	READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
	   *	DMA Setup Auto-Activate optimization
	    	Device Sleep (DEVSLP)
	   *	WRITE BUFFER DMA command
	   *	READ BUFFER DMA command
	   *	Data Set Management TRIM supported (limit 8 blocks)
	   *	Deterministic read ZEROs after TRIM
Logical Unit WWN Device Identifier: 502b2a201d1c1b1a
	NAA		: 5
	IEEE OUI	: 02b2a2
	Unique ID	: 01d1c1b1a
Device Sleep:
	DEVSLP Exit Timeout (DETO): 20 ms (drive)
	Minimum DEVSLP Assertion Time (MDAT): 10 ms (drive)
Checksum: correct
Comment 11 Otto 2021-02-02 03:38:32 UTC
It happens at random for me, like once in every three bootups. The drive works fine in my other PC with an Intel QM77 chipset.
Either running it with NCQ disabled or in IDE mode fixes it for me.

$ uname -a
Linux fastloader1564 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster

$ lspci -ks 00:1f.2
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA AHCI Controller (rev 06)
        Subsystem: Dell 5 Series/3400 Series Chipset 4 port SATA AHCI Controller
        Kernel driver in use: ahci
        Kernel modules: ahci

# dmesg | grep -i ata1
[    2.088435] ata1: SATA max UDMA/133 abar m2048@0xf0705000 port 0xf0705100 irq 24
[    2.401172] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    7.490467] ata1.00: qc timeout (cmd 0x47)
[    7.490480] ata1.00: READ LOG DMA EXT failed, trying PIO
[    7.490482] ata1.00: NCQ Send/Recv Log not supported
[    7.490485] ata1.00: ATA-10: HS-SSD-C100 120G, V4.15.0, max UDMA/133
[    7.490488] ata1.00: 234441648 sectors, multi 0: LBA48 NCQ (depth 32), AA
[    7.490498] ata1.00: failed to set xfermode (err_mask=0x40)
[    7.805990] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    7.806499] ata1.00: NCQ Send/Recv Log not supported
[    7.807077] ata1.00: NCQ Send/Recv Log not supported
[    7.807083] ata1.00: configured for UDMA/133

# hdparm -I /dev/sda
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 HDIO_GET_IDENTITY failed: Invalid argument

/dev/sda:

With libata.force=noncq:
# dmesg | grep -i ata1
[    1.994136] ata1: SATA max UDMA/133 abar m2048@0xf0705000 port 0xf0705100 irq 24
[    2.308106] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.308353] ata1.00: FORCE: horkage modified (noncq)
[    2.308359] ata1.00: ATA-10: HS-SSD-C100 120G, V4.15.0, max UDMA/133
[    2.308361] ata1.00: 234441648 sectors, multi 0: LBA48 NCQ (not used)
[    2.308739] ata1.00: configured for UDMA/133
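
Since the failure is intermittent, it can help to check saved boot logs for the signature automatically. This Python sketch flags devices that log a "qc timeout (cmd 0x47)" followed later by "failed to set xfermode"; affected_devices is a hypothetical helper, and the patterns are based only on the message wording seen in this thread.

```python
import re

TIMEOUT = re.compile(r"(ata\d+\.\d+): qc timeout \(cmd 0x47\)")
XFER_FAIL = re.compile(r"(ata\d+\.\d+): failed to set xfermode \(err_mask=0x[0-9a-f]+\)")


def affected_devices(lines):
    """Return device IDs (e.g. 'ata1.00') that hit the READ LOG DMA EXT
    timeout and afterwards failed to set the transfer mode."""
    timed_out = set()
    hits = []
    for line in lines:
        match = TIMEOUT.search(line)
        if match:
            timed_out.add(match.group(1))
            continue
        match = XFER_FAIL.search(line)
        if match and match.group(1) in timed_out and match.group(1) not in hits:
            hits.append(match.group(1))
    return hits
```

Running it over the two boots shown in this comment should report ata1.00 for the first one and nothing for the boot with NCQ disabled.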
Comment 12 Reimar D 2021-03-03 18:03:36 UTC
Created attachment 295611 [details]
Add libata.force=nodmalog parameter

It seems to me that there are a few different issues here, and I am not sure all of them are related to the READ LOG DMA EXT command.
As I ran into the issue in comment 5 myself, I made a patch that adds a libata.force=nodmalog parameter, which disables the use of the READ LOG DMA EXT command.
It would be useful if anyone is able to test if it works and helps for their problem.
(how to auto-detect if it's needed might be a much bigger problem though)
Comment 13 Reimar D 2021-03-03 18:14:42 UTC
Note that adding to ata_device_blacklist as per comment 7 seems incorrect, at least in my case: the MX500 device works fine in a native SATA M.2 slot; it only fails when using a PATA-SATA converter.
I do not know if and how detecting the presence of such a converter is possible, or if maybe any device connected to a PATA controller should have this flag set by default or so.
Comment 14 Paul Menzel 2021-03-04 07:56:33 UTC
On the MSI B350M MORTAR [1] with a M.2 Crucial MX500 SSD (SATA)

    CT1000MX500SSD4, M3CR020, max UDMA/133

I am only getting this error sometimes, so in my case it seems to be a timing problem or an SSD firmware issue?

Should I create a new issue for my problem?

PS: I haven’t done a firmware update to M3CR033 [2] yet, as it does not seem to be easily doable with fwupd/LVFS, and I am afraid of losing the data in case it fails.


[1]: https://de.msi.com/Motherboard/support/B350M-MORTAR.html
[2]: https://www.crucial.de/support/ssd-support/mx500-support
Comment 15 Reimar D 2021-03-04 16:43:30 UTC
I may have put that a bit too strongly: I tested in an M.2 slot on an ASRock AB350 Pro4 for only a very short time.
So it is possible your issue has the same underlying cause, but the PATA<->SATA converter case I was looking at had a 100% failure rate without the workaround.
If you have the ability to test my patch (+ libata.force=nodmalog kernel parameter) that might at least confirm if your problem is really due to the READ LOG DMA EXT command - if so then you'd at least have a workaround if we manage to get the patch accepted.
Btw one thing I found suspicious is that the code executes the READ LOG DMA EXT command first, but only after that does it set the transfer mode to DMA.
Is that definitely legal, to use a command involving DMA without being in DMA transfer mode? (sorry if it's a stupid question, I have no idea about the protocol)
Comment 16 Paul Menzel 2021-03-04 16:48:40 UTC
I am going to test it in the next week, when I have access to the system again.

Nice observation regarding the order. I am not sure whether the maintainers actually follow this bug report. I suggest you send the patch to the maintainers and also ask the ordering question on the mailing list to get an answer.
Comment 17 Enrico Bartky 2021-03-15 13:04:09 UTC
The content of the libata-force-nodmalog patch works perfectly with:

Debian 9 (kernel 4.9.x)
Debian 10 (kernel 4.19.x)
Debian 11 (kernel 5.10.x)

(manually applied the two lines)

If it is possible to check whether the underlying controller is "real" SATA (and not a PATA/SATA converter), then we could apply this automatically?!

In my environment the bootlog contains following:

[...]
ata1: PATA max UDMA/100 [...]
[...]
ata1.00: ATA-9: TS128GM[...]
[...]

So it is clear (hopefully)
Comment 18 Paul Menzel 2021-03-15 13:23:33 UTC
(In reply to Paul Menzel from comment #16)
> I am going to test it in the next week, when I have access to the system
> again.

I can confirm that with your patch the error/delay does not happen with the M.2 SATA Crucial SSD (whereas it appears randomly with the upstream Linux kernel).
Comment 19 Reimar D 2021-03-16 20:45:33 UTC
Thanks for checking. So it might be that Crucial drives are affected regardless of controller, which might be something an automatic workaround could key on.
Unfortunately I don't see an entirely clear pattern for the other cases reported.
I also don't know what the point of this command really is, there seems to have been an effort to enable usage of this command, but none of the commit messages seemed to describe WHY it is desirable.
For the moment, I've submitted the patch to the linux-ide list, so that there's a chance to get the workaround in mainline.
I'll have to see if I can find the time and some expert with time to start a discussion on how this is all supposed to work and why, which might show a path to a proper solution...
Comment 20 Paul Menzel 2021-03-16 21:08:30 UTC
(In reply to Reimar D from comment #19)

[…]

> For the moment, I've submitted the patch to the linux-ide list, so that
> there's a chance to get the workaround in mainline.

Nice. I do not yet see it in the archive [1] though.

[1]: https://lore.kernel.org/linux-ide/
Comment 21 Reimar D 2021-03-17 18:07:20 UTC
The patch is now on the ML and in the archive for real.
It seems the list doesn't accept email from gmx.de :(
Comment 22 Paul Menzel 2021-07-12 07:06:04 UTC
(In reply to Reimar D from comment #21)
> The patch is now on the ML and in the archive for real.

Awesome. For some reason the search on lore.kernel.org did not find it, but here are the URLs:

1.  https://lore.kernel.org/linux-ide/20210317180413.2992-1-Reimar.Doeffinger@gmx.de/
2.  https://patchwork.ozlabs.org/project/linux-ide/patch/20210317180413.2992-1-Reimar.Doeffinger@gmx.de/

Unfortunately nobody replied yet.

> It seems the list doesn't accept email from gmx.de :(

That would be bad? I know that LKML rejects messages containing HTML.
Comment 23 Reimar D 2021-08-19 08:21:43 UTC
During the very helpful discussion I realized that there seems to be a very straight-forward logic bug in the kernel, the patch for that is here:
https://lore.kernel.org/linux-ide/20210819081340.4362-1-Reimar.Doeffinger@gmx.de/
I'd welcome testing, especially with other IDE controllers, and above all by you, Paul Menzel, since I'd like to be sure that your issue really is due to the MX500 and not also caused by this bug. That will decide whether I should continue to push the original change as well or drop it.
Btw the latest firmware version for your device is 23 (and the one mine has), 33 is for a different hardware revision. Unfortunately there is no changelog for version 23 compared to 22, so it's possible something was fixed there. Or maybe not.
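
For context on the logic bug mentioned above, here is a small Python model of my understanding of the linked patch: the device's dma_mode field starts out as 0xff (meaning "DMA not configured yet"), so a plain truthiness test treats it as "DMA enabled", whereas a helper that compares against 0xff does not. This is an interpretation of the commit, not a copy of kernel code; old_check and new_check are hypothetical names, and the constants should be verified against the kernel source.

```python
DMA_MODE_UNSET = 0xFF  # assumed initial dma_mode before a transfer mode is set
XFER_UDMA_6 = 0x46     # assumed transfer mode value for UDMA/133


def dma_enabled(dma_mode):
    """Model of a helper that treats 0xff as 'DMA disabled'."""
    return dma_mode != DMA_MODE_UNSET


def old_check(dma_mode):
    """Model of the buggy test: any non-zero value counts as DMA enabled."""
    return bool(dma_mode)


def new_check(dma_mode):
    """Model of the fixed test: defer to the helper."""
    return dma_enabled(dma_mode)


if __name__ == "__main__":
    for mode in (DMA_MODE_UNSET, XFER_UDMA_6):
        print(hex(mode), old_check(mode), new_check(mode))
```

If this reading is right, it also answers the ordering question from comment 15: with the old test, the DMA variant of the log read could be issued before the transfer mode had been set.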
Comment 24 Paul Menzel 2021-09-27 08:58:19 UTC
I replied to the patch with my Tested-by line. Thank you for your great work.
Comment 25 Reimar D 2021-11-06 20:29:29 UTC
The fix has been merged for 5.16: https://git.kernel.org/torvalds/c/61f90a8e8068c1176593858df9daf02b430fb4d7
It has also been marked for backporting to stable kernels, so if I understand the process right, once 5.16 is out it should make its way into older kernels as well. I am not sure it will make it into all relevant distribution kernels on its own, though.
If it doesn't fix things for anyone who reported on this ticket, you should probably speak up; it might be a different problem.
If someone has permissions to change the status of this issue it might be a good time to do so.
