Bug 60731

Summary: MacBook Air 6,2 ata command timeout prevents boot
Product: IO/Storage Reporter: sam.attwell
Component: Serial ATA    Assignee: Jeff Garzik (jgarzik)
Status: NEW ---    
Severity: high CC: adrien-xx-kernel-bz, adrien.lassere, afiestas, aim, alex, alexfeinman, andrew.cobb, ben, bpshacklett, dorin, funtoos, gwhite, june, kaloz, kernel-bugzilla, lasse.makholm, levex, luke.marsden, miek, satadru, stephen, szg00000, tj, uriahheep, vingarzan
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.11.0-rc4 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg output for successful boot
dmesg output for unsuccessful boot
3.11-rc7 dmesg with ncq
MacBookAir6,2 DSDT Disassembly
MacBookAir6,2 Mac OS X 10.9 dmesg
MacBookAir6,2 System Information SATA Report
ahci: add NO_NCQ flag to this controller
ahci: add NO_NCQ flag to this controller
dmesg log from kernel-3.12 with proposed patch applied
ahci-noncq.patch
0001-ahci-disable-NCQ-on-Samsung-pci-e-SSDs-on-macbooks.patch
samsung-1600-nomsi.patch

Description sam.attwell 2013-08-11 22:49:34 UTC
Booting on the MacBook Air
Comment 1 sam.attwell 2013-08-11 22:58:32 UTC
Created attachment 107180 [details]
dmesg output for successful boot

Booting on the MacBook Air 6,2 fails most of the time after an ATA command timeout on an NCQ command.

MacBook Air 6,2
Intel Core i7-4650U 1.7 GHz
8GB RAM
128GB SSD
3.11.0-rc4

Appending libata.force=noncq seems to resolve the problem.

Successful boot dmesg attached (kern.log doesn't report on the failed boot here).
Comment 2 sam.attwell 2013-08-11 23:01:08 UTC
(In reply to sam.attwell from comment #1)
> Created attachment 107180 [details]
> dmesg output for succesful boot
> 
> Booting on the MacBook Air 6,2 fails most of the time after ata command for
> ncq.
> 
> MacBook Air 6,2
> Intel Core i7-4650U 1.7Ghz
> 8GB RAM
> 128GB SSD
> 3.11.0-rc4
> 
> Appending libata.force=noncq seems to resolve the problem.
> 
> Succesful boot dmesg attached (kern.log doesn't report on failed boot here).

Here's lspci:

00:00.0 Host bridge [0600]: Intel Corporation Haswell-ULT DRAM Controller [8086:0a04] (rev 09)
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell-ULT Integrated Graphics Controller [8086:0a26] (rev 09)
00:03.0 Audio device [0403]: Intel Corporation Device [8086:0a0c] (rev 09)
00:14.0 USB controller [0c03]: Intel Corporation Lynx Point-LP USB xHCI HC [8086:9c31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation Lynx Point-LP HECI #0 [8086:9c3a] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation Lynx Point-LP HD Audio Controller [8086:9c20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation Lynx Point-LP PCI Express Root Port 1 [8086:9c10] (rev e4)
00:1c.1 PCI bridge [0604]: Intel Corporation Lynx Point-LP PCI Express Root Port 2 [8086:9c12] (rev e4)
00:1c.2 PCI bridge [0604]: Intel Corporation Lynx Point-LP PCI Express Root Port 3 [8086:9c14] (rev e4)
00:1c.4 PCI bridge [0604]: Intel Corporation Lynx Point-LP PCI Express Root Port 5 [8086:9c18] (rev e4)
00:1c.5 PCI bridge [0604]: Intel Corporation Lynx Point-LP PCI Express Root Port 6 [8086:9c1a] (rev e4)
00:1f.0 ISA bridge [0601]: Intel Corporation Lynx Point-LP LPC Controller [8086:9c43] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation Lynx Point-LP SMBus Controller [8086:9c22] (rev 04)
02:00.0 Multimedia controller [0480]: Broadcom Corporation Device [14e4:1570]
03:00.0 Network controller [0280]: Broadcom Corporation Device [14e4:43a0] (rev 03)
04:00.0 SATA controller [0106]: Samsung Electronics Co Ltd Device [144d:1600] (rev 01)
05:00.0 PCI bridge [0604]: Intel Corporation DSL3510 Thunderbolt Port [Cactus Ridge] [8086:1547] (rev 03)
06:00.0 PCI bridge [0604]: Intel Corporation DSL3510 Thunderbolt Port [Cactus Ridge] [8086:1547] (rev 03)
06:03.0 PCI bridge [0604]: Intel Corporation DSL3510 Thunderbolt Port [Cactus Ridge] [8086:1547] (rev 03)
06:04.0 PCI bridge [0604]: Intel Corporation DSL3510 Thunderbolt Port [Cactus Ridge] [8086:1547] (rev 03)
06:05.0 PCI bridge [0604]: Intel Corporation DSL3510 Thunderbolt Port [Cactus Ridge] [8086:1547] (rev 03)
06:06.0 PCI bridge [0604]: Intel Corporation DSL3510 Thunderbolt Port [Cactus Ridge] [8086:1547] (rev 03)
07:00.0 System peripheral [0880]: Intel Corporation DSL3510 Thunderbolt Port [Cactus Ridge] [8086:1547] (rev 03)
08:00.0 PCI bridge [0604]: Intel Corporation DSL3510 Thunderbolt Controller [Cactus Ridge] [8086:1549]
09:00.0 PCI bridge [0604]: Intel Corporation DSL3510 Thunderbolt Controller [Cactus Ridge] [8086:1549]
0a:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme BCM57762 Gigabit Ethernet PCIe [14e4:1682]
Comment 3 sam.attwell 2013-08-11 23:19:34 UTC
Created attachment 107181 [details]
dmesg output for unsuccessful boot

Here's an edited dmesg for unsuccessful boot.
Comment 4 Imre Kaloz 2013-08-29 13:30:28 UTC
Created attachment 107355 [details]
3.11-rc7 dmesg with ncq

The problem is still there with -rc7. Sometimes the system fails to boot up, sometimes it causes "pauses". I've attached dmesg from -rc7.
Comment 5 Levente Kurusa 2013-11-01 15:43:40 UTC
Hi,

Can you try the following kernel parameter? "libata.force=3.0"
Comment 6 Alex Markley 2013-11-01 22:40:45 UTC
All,

I am also running a MacBookAir6,2 and I think I am experiencing some variant of this problem.

(In reply to Levente Kurusa from comment #5)
> Can you try the following kernel parameter? "libata.force=3.0"

I've tested with kernel-3.12.0-0.rc7.git2.1.fc21.x86_64 and kernel-3.11.6-200.fc19.x86_64.

I tried libata.force=3.0 with both kernels and it prevents the system from booting 100% of the time. (I do not know where to get further debugging information in that case.)

With no libata kernel flags, I definitely find the same sequence of error messages in dmesg as reported by Sam and Imre.

Using libata.force=noncq suppresses the error messages and smooths out the booting process.

Thanks!
Comment 7 Alex Markley 2013-11-07 03:37:26 UTC
All,

I am still experiencing this issue with 3.12 release. The kernel often fails to boot, and if it does boot it will frequently hang up the whole system with IO waits, followed by a flood of ata messages in dmesg.

Please advise on further troubleshooting steps?

I am still using libata.force=1:noncq to make things work, but this seems like an ugly workaround!

Thanks.
Comment 8 Levente Kurusa 2013-11-07 11:49:37 UTC
Hi,

Can you try it with another operating system (Windows XP, Mac OS X, etc.)?
There are a lot of chips with broken NCQ support. I will look into these; in the meantime, please try to test with another OS. Also, try changing some BIOS options as well.
Comment 9 Alex Markley 2013-11-07 13:49:03 UTC
Levente,

Thanks for your quick reply!

(In reply to Levente Kurusa from comment #8)
> Hi,
> 
> Can you try it with an another operating system? (win xp, mac, etc.)
> There are a lot of chips with broken NCQ support. I will look into these, in
> the meantime please try to test it with another OS. Also, try to change some
> BIOS options as well.

Per my earlier post (in comment #6) I am testing on a MacBookAir6,2. It has a Samsung SSD right on the PCIe bus:

http://www.ifixit.com/Teardown/MacBook+Air+13-Inch+Mid+2013+Teardown/15042#s49088

It goes without saying that this hardware works great on Mac OS X. ;) I can confirm it works flawlessly there. (Also, no BIOS.)

If NCQ needs to be disabled for this class of hardware, shouldn't the kernel automatically detect that? The default behavior is very bad for novice users.

Thanks,
Comment 10 Levente Kurusa 2013-11-09 12:35:10 UTC
We have several controllers where NCQ is disabled. Since it works on Mac OS X, can you post the disassembled DSDT table?

The reason is that it could be some kind of _OSI-related bug, which we have had in the past.
Comment 11 Alex Markley 2013-11-09 14:47:36 UTC
(In reply to Levente Kurusa from comment #10)
> can you post the disassembled DSDT table?

I will gladly post that information if you can link me to instructions on how to do that?
Comment 12 Levente Kurusa 2013-11-09 14:50:03 UTC
(In reply to Alex Markley from comment #11)
> (In reply to Levente Kurusa from comment #10)
> > can you post the disassembled DSDT table?
> 
> I will gladly post that information if you can link me to instructions on
> how to do that?

Pardon me. :-)

cat /sys/firmware/acpi/tables/DSDT > dsdt.aml
iasl -d dsdt.aml

This will dump out dsdt.dsl; please post it.
Comment 13 Alex Markley 2013-11-09 14:59:09 UTC
Created attachment 113951 [details]
MacBookAir6,2 DSDT Disassembly

Here is the command output. (Note the warning!)

Intel ACPI Component Architecture
ASL Optimizing Compiler version 20130823-64 [Oct  8 2013]
Copyright (c) 2000 - 2013 Intel Corporation

Loading Acpi table from file dsdt.aml
Acpi table [DSDT] successfully installed and loaded
Pass 1 parse of [DSDT]
Pass 2 parse of [DSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)

Parsing completed

Found 6 external control methods, reparsing with new information
Pass 1 parse of [DSDT]
Pass 2 parse of [DSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)

Parsing completed
Disassembly completed
ASL Output:    dsdt.dsl - 278845 bytes

iASL Warning: There were 6 external control methods found during
disassembly, but additional ACPI tables to resolve these externals
were not specified. The resulting disassembler output file may not
compile because the disassembler did not know how many arguments
to assign to these methods. To specify the tables needed to resolve
external control method references, use the one of the following
example iASL invocations:
    iasl -e <ssdt1.aml,ssdt2.aml...> -d <dsdt.aml>
    iasl -e <dsdt.aml,ssdt2.aml...> -d <ssdt1.aml>
Comment 14 Levente Kurusa 2013-11-09 15:18:45 UTC
A couple of debugging steps I would try:
Try the following kernel params (without the enclosing quotes):
"acpi_osi="!Linux""
"acpi_osi=! acpi_osi="Darwin""
"acpi_osi= acpi_osi="Darwin""

Also, it might be worth checking whether Mac OS X actually uses NCQ. Unfortunately, I do not know how to find that information on a Mac.
Comment 15 Alex Markley 2013-11-09 16:35:57 UTC
Levente,

Thanks for providing some debugging steps. Can you clarify for me, am I testing with all of those kernel parameters on the same command line? Or should I boot once for each acpi_osi parameter and record results with each one?

I will look into the NCQ issue on OS X. It's entirely possible that some system logs will make mention of it. I can boot into OS X and inspect the logs.

Thanks.
Comment 16 Levente Kurusa 2013-11-09 16:39:50 UTC
(In reply to Alex Markley from comment #15)
> Levente,
> 
> Thanks for providing some debugging steps. Can you clarify for me, am I
> testing with all of those kernel parameters on the same command line? Or
> should I boot once for each acpi_osi parameter and record results with each
> one?
> 
For each boot attempt, take one line. :-)
Post the results as well.

> I will look into the NCQ issue on OS X. It's entirely possible that some
> system logs will make mention of it. I can boot into OS X and inspect the
> logs.
Yes please. If you can post something like a 'dmesg' for Mac that'd be great.
Comment 17 Alex Markley 2013-11-09 17:52:05 UTC
Created attachment 113961 [details]
MacBookAir6,2 Mac OS X 10.9 dmesg

Here are the contents of the Mac OS X dmesg buffer after booting on my MacBookAir6,2.
Comment 18 Alex Markley 2013-11-09 17:54:00 UTC
Created attachment 113971 [details]
MacBookAir6,2 System Information SATA Report

This is a copy/paste from the OS X System Information report on SATA devices. Notice that NCQ is explicitly mentioned as being supported.
Comment 19 Levente Kurusa 2013-11-09 18:06:16 UTC
(In reply to Alex Markley from comment #18)
> Created attachment 113971 [details]
> MacBookAir6,2 System Information SATA Report
> 
> This is a copy/paste from the OS X System Information report on SATA
> devices. Notice that NCQ is explicitly mentioned as being supported.

Supported doesn't mean it is in use.
I have not found anything that might be problematic in the Mac 'dmesg'. I will check whether the chip's datasheet differs from the AHCI docs.
Comment 20 Alex Markley 2013-11-09 18:14:51 UTC
I tried booting with all three sets of acpi_osi= parameters. (I also of course removed libata.force=1:noncq from my boot parameters.)

In all three cases, I ran into severe NCQ-related command timeouts preventing system boot.

This was all tested with kernel 3.12 release.

Please advise on further troubleshooting steps. :)
Comment 21 Levente Kurusa 2013-11-09 18:42:41 UTC
Created attachment 113981 [details]
ahci: add NO_NCQ flag to this controller

Please try to apply this patch with:

patch drivers/ata/ahci.c patch.patch

then recompile the kernel and install it with:
make defconfig && make -j4 && make install

(commands must be executed from the kernel source tree's root)
After this, please report the dmesg and the results! :-)
Comment 22 Levente Kurusa 2013-11-09 18:43:28 UTC
(In reply to Levente Kurusa from comment #21)
> Created attachment 113981 [details]
> ahci: add NO_NCQ flag to this controller
> 
> Please try to apply this patch with:
> 
> patch drivers/ata/ahci.c patch.patch
> 
> then recompile the kernel and install it with:
> make defconfig && make -j4 && make install
> 
> (commands must be executed from the kernel source tree's root)
> After this, please report the dmesg and the results! :-)

Please disregard. Wrong patch attached.
Comment 23 Levente Kurusa 2013-11-09 18:44:43 UTC
Created attachment 113991 [details]
ahci: add NO_NCQ flag to this controller

Please do what I posted with this patch.
Comment 24 Alex Markley 2013-11-09 19:23:07 UTC
I will go ahead and test this patch ASAP.

However, my question is this: Would there be any hope of NCQ support being fixed rather than blacklisted? I don't know much about ATA but wouldn't it be desirable to enable the full command set?
Comment 25 Alex Markley 2013-11-09 19:26:47 UTC
Also, with regards to investigating the hardware more closely, would it help if I disassemble the laptop and get part numbers off of the SSD module?

I believe the SSD module is considered a "user serviceable" part, so I can gain access to it easily.
Comment 26 Levente Kurusa 2013-11-10 08:55:34 UTC
This is a great question. On one hand, Linux's NCQ support might be broken, since we see lots of controllers not working on Linux. On the other hand, lots of chips do work. So it is difficult to tell right now where the problem is.

No, this is not the drive's fault. This is the controller's (or Linux's) fault most likely.
Comment 27 Alex Markley 2013-11-10 13:08:40 UTC
(In reply to Levente Kurusa from comment #26)
> No, this is not the drive's fault. This is the controller's (or Linux's)
> fault most likely.

On this device the ATA controller is directly on the board with the flash chips. This SSD module is a card that plugs directly into the PCIe bus.

That's why I thought a scan of the board might help. :)
Comment 28 Levente Kurusa 2013-11-12 11:07:15 UTC
(In reply to Alex Markley from comment #27)
> (In reply to Levente Kurusa from comment #26)
> > No, this is not the drive's fault. This is the controller's (or Linux's)
> > fault most likely.
> 
> On this device the ATA controller is directly on the board with the flash
> chips. This SSD module is a card that plugs directly into the PCIe bus.
> 

lspci already told us which controller controls the SSD, so there is no need. How is the patch doing?
Comment 29 Alex Markley 2013-11-13 23:05:05 UTC
(In reply to Levente Kurusa from comment #28)
> How is the patch doing?

Levente,

Thanks for your patience! I was able to test the patch today.

I tested it against kernel-3.12 and unfortunately it does NOT appear to successfully disable NCQ. I will attach a dmesg illustrating the problem.

Thanks.
Comment 30 Alex Markley 2013-11-13 23:06:25 UTC
Created attachment 114581 [details]
dmesg log from kernel-3.12 with proposed patch applied
Comment 31 Tejun Heo 2013-11-14 05:29:13 UTC
Given that the controller is Intel AHCI, it's highly unlikely that the controller or driver is at fault. Probably best to blacklist the SSD device in ata_device_blacklist[].

Thanks.
Comment 32 Levente Kurusa 2013-11-15 18:04:45 UTC
(In reply to Tejun Heo from comment #31)
> Given the controller is intel ahci, it's highly unlikely that the controller
> or driver is at fault. Prolly best to blacklist the ssd device from
> ata_device_blacklist[].
> 
> Thanks.

The controller picked up by the ahci module is: [144d:1600], which is not an Intel one.

Also, it might be worth dumping _GTF; maybe it has some chip-specific initialization sequence. If I recall correctly, we had something like that years ago.
Comment 33 Tejun Heo 2013-11-16 04:59:35 UTC
Ooh, right, this is the new SATA Express thing, so yeah, the controller could actually be at fault here. I wonder whether Mac OS uses NCQ at all. Is there any way to find out which commands are being used there?

Thanks.
Comment 34 Alex Markley 2013-11-16 16:32:22 UTC
I'd be glad to perform any troubleshooting / debugging steps on Mac OS X to determine answers to some of these questions. I just need some help figuring out _what_ steps to take. :)

I'm competent with the terminal, but I'm not an expert in OS X by any means...
Comment 35 Tejun Heo 2013-11-17 00:30:06 UTC
Heh, and I have no clue whatsoever about Macs. The best thing would be snooping which commands are going to the device, but checking whether there's a significant bandwidth difference between queue-depth-1 and queue-depth-32 reads using IO benchmark tools could be a useful indicator.

Thanks.
Comment 36 uriahheep 2014-01-08 18:18:41 UTC
Just FYI, at least two other recent Apple laptop models use the same SATA controller for their SSDs: the MacBook Pro Retina 11,1 and the MacBook Pro Retina 11,3 (probably the 11,2 as well). Here are indications that this is the case; see the lspci output on the following pages:

Macbook Pro 11,3: https://bbs.archlinux.org/viewtopic.php?pid=1368083#p1368083
Macbook Pro 11,1: https://wiki.debian.org/InstallingDebianOn/Apple/MacBookPro/11-1#lspci

I am ready to provide my assistance regarding this issue as well.

Thanks,
uriah
Comment 37 Greg White 2014-01-23 18:28:20 UTC
I have a MacBook Pro 11,3 with the same issue. If I can be of any help, don't hesitate to let me know. I'm generally running the latest kernel from git, so building a kernel is no issue.
Comment 38 Tejun Heo 2014-02-14 20:23:42 UTC
Created attachment 126161 [details]
ahci-noncq.patch

Hmmm... no idea why Levente's previous patch didn't work. Can someone please try the attached patch and report dmesg?

Thanks.
Comment 39 Levente Kurusa 2014-02-14 21:10:04 UTC
Guys,

please post your queue depths if Tejun's patch is not working:

cat /sys/block/sda/device/queue_depth

(where sda is your main SSD/HDD)

Also, try changing your queue depth by echoing 1, 2, 4, etc. to it.
I am not sure, though, whether that will work with force=noncq.
Tejun, do you think it could work? It might give some extra hints if
disabling NCQ does not work. If it turns out that a QD of 32 does not
work, we might be up and running with a QD of 8 or so, which would
still give some performance gain.

Thanks,
Levente Kurusa
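[Editorial note: a sysfs write like the `echo` suggested above fails with "invalid argument" (EINVAL) when the requested depth is not accepted in the current mode, e.g. with NCQ off the depth is pinned at 1. A small sketch that steps down until a depth is accepted; the `set_queue_depth` helper and its path argument are illustrative:]

```shell
# Sketch: try queue depths from high to low against a sysfs attribute,
# keeping the first value the kernel accepts. Pass the attribute path,
# e.g. /sys/block/sda/device/queue_depth.
set_queue_depth() {
    attr="$1"
    for qd in 32 16 8 4 2 1; do
        if echo "$qd" > "$attr" 2>/dev/null; then
            echo "queue_depth accepted: $qd"
            return 0
        fi
    done
    echo "no queue_depth value accepted" >&2
    return 1
}
```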
Comment 40 Dragos Vingarzan 2014-02-17 11:39:33 UTC
I applied Tejun's patch and unfortunately I can't say that I see a difference, other than actually being able to boot reliably without libata.force=noncq. The speed still drops from 800-900 MB/s to <1 MB/s.

I have a MacBook Pro 11,3.

➜  ~  cat /proc/cmdline                              
BOOT_IMAGE=/vmlinuz-3.13.0-8-generic.efi.signed root=UUID=aa56465a-a88b-4902-8c54-6dd18749bcce ro allow-discards root_trim=yes quiet splash rootflags=data=writeback crashkernel=384M-:128M vt.handoff=7
➜  ~  cat /sys/block/sda/device/queue_depth          
1
➜  ~  echo 2 > /sys/block/sda/device/queue_depth          
echo: write error: invalid argument

No clue why the echo fails. As you can see above, I did not have libata.force=noncq set. So, is there any other way to change the QD?

Cheers&Thanks,
-Dragos
Comment 41 Dragos Vingarzan 2014-02-17 18:16:14 UTC
Based on the patch, it seems that NCQ is disabled by it... well, not what I was looking for, to be honest. Can we do something other than blacklisting this controller?

I have Mac OS X and Win7 running on this machine, so let me know if I can help with any traces. Needless to say, only in Linux I have terrible performance.
Comment 42 Levente Kurusa 2014-02-17 18:45:04 UTC
Excellent,

do you know of any way to make sure that Win7 or Mac OS X actually
USES (not just supports) NCQ?

Oh, and I have an idea for something that might be worth dumping, but unfortunately
I don't know how to dump the _GTF object. Tejun, maybe you know how to do that?
The reason I want to do so is that, IIRC, _GTF is used to set up the devices, right?
If so, it is entirely possible that the ACPI BIOS might want to write some kind
of init sequence to the device. Currently (looking at dmesg) we don't write
the contents of _GTF to the device, since its length (8) is bigger than the
count of IDE registers (7)... Just a thought.


Thanks,
Levente Kurusa
Comment 43 Dragos Vingarzan 2014-02-18 09:51:02 UTC
That is a good question. I booted into Windows and HD-Tune seems to think that it does support NCQ. Performance varies with block size, of course: for 8 MB blocks it averages close to 1 GB/s, and it still gets 200-300 MB/s with 64 KB blocks. Throughput is not very stable, but nothing to complain about.

An interesting thing was that HD-Tune reported it running quite hot at 55 °C. In Linux it ranges between 41 °C and 53 °C. I did a test and there seems to be no correlation between temperature and performance, so I'd rule that out.

The behavior for me is that under frequent operations it grinds down, with fsync() syscalls taking 1-2 seconds to complete for every small file. Oddly, doing something counterintuitive (e.g. cat /dev/sda) actually seems to make fsync performance better. Then, if there is no disk activity for a while, it comes back to sane performance.
Comment 44 Tejun Heo 2014-02-18 15:11:13 UTC
Levente, you can access _GTF using acpidump. I'm a bit skeptical whether _GTF would be doing anything drastic, though. Note that the controller is embedded with the drive, so I suspect the problem is between the controller and the host rather than in what it presents as the attached drive.

For now, blacklisting ncq is probably the right thing to do so that those machines can at least boot.

Thanks.
Comment 45 Tejun Heo 2014-02-18 15:27:51 UTC
Created attachment 126611 [details]
0001-ahci-disable-NCQ-on-Samsung-pci-e-SSDs-on-macbooks.patch

Applied to libata/for-3.14-fixes.  Thanks.
Comment 46 Lasse Makholm 2014-04-23 13:10:14 UTC
I'm also affected by this on a new MacBook Pro 11,3. With a vanilla 3.14 kernel (which includes the above patch and thus disables NCQ for this controller by default), the problems are far from gone.

With NCQ disabled, the system boots and runs mostly without (I/O-related) problems, but for certain workloads performance drops through the floor. It seems to bottom out at about 3 writes/second. Typically this happens during write- and fsync()-intensive (and presumably single-threaded) workloads such as apt-get install|upgrade or running resize2fs. I see no errors in dmesg or any other signs of anything wrong - except lousy performance...

I'm wondering if NCQ is a bit of a red herring here and merely aggravates a different root problem. I don't, however, have the knowledge to know where to look instead...

Please let us know if there is anything we can do to help to figure this out. Be it testing patches/running tests/diagnostics/etc...
Comment 47 Lasse Makholm 2014-04-23 13:21:04 UTC
Here is an iostat log snippet illustrating the poor I/O performance while apt is installing a few big packages:

Device:  wrqm/s     w/s    wkB/s  w_await   svctm   %util
sda        0.00    0.00     0.00     0.00    0.00  100.00
sda       23.00   11.00   200.00   753.82   90.91  100.00
sda        0.00    7.00   136.00  1005.14  142.86  100.00
sda       10.00    9.00   328.00   840.00  111.11  100.00
sda       29.00   11.00   124.00   556.36   90.91  100.00
sda        0.00   18.00   132.00  1474.67   55.56  100.00
sda        0.00    4.00    16.00  1591.00  250.00  100.00
sda        0.00    7.00    36.00  1840.00  142.86  100.00
sda        0.00    0.00     0.00     0.00    0.00  100.00
sda       15.00    7.00    52.00   481.71  142.86  100.00
sda        0.00    4.00    28.00   568.00  250.00  100.00
sda       21.00   10.00   144.00   461.20  100.00  100.00
sda       11.00   10.00   356.00   492.80  100.00  100.00
sda        1.00    3.00    24.00  1112.00  333.33  100.00
sda       18.00    8.00   132.00   549.00  125.00  100.00
sda        0.00    3.00    16.00   872.00  333.33  100.00
sda       27.00    5.00    64.00   892.00  200.00  100.00
sda        0.00    5.00    72.00    20.00  200.80  100.40
sda        6.00   18.00   644.00   545.78   55.56  100.00
sda       17.00   17.00   132.00   191.76   58.82  100.00
sda      141.00  296.00  3976.00    17.26    3.20   94.80
sda        0.00    3.00     8.00   945.33  333.33  100.00
sda       22.00    1.00     0.00    28.00 1000.00  100.00
sda        7.00    8.00   444.00   704.00  125.00  100.00
sda        0.00    5.00     8.00   691.20  200.00  100.00
sda        0.00    5.00    12.00   548.80  200.00  100.00
sda       22.00    8.00   104.00   711.50  125.00  100.00
sda       37.00   55.00   700.00   375.85   18.18  100.00
sda       43.00   72.00   416.00    55.50   13.67   98.40
sda      143.00  170.00  1352.00     1.39    5.57   95.20
sda       26.00    5.00   112.00   308.80  200.00  100.00
sda        0.00    3.00     4.00   757.33  333.33  100.00
sda        6.00    8.00  1468.00   561.00  125.00  100.00
sda       20.00    4.00    76.00   734.00  250.00  100.00
sda       14.00    4.00    88.00   842.00  250.00  100.00
sda        0.00    0.00     0.00     0.00    0.00  100.00

And here with all the columns: http://pastebin.com/yYrcxMu4
Comment 48 Levente Kurusa 2014-04-23 15:02:03 UTC
Hi,

can you please post a dmesg with the patch applied?

Thanks
Levente Kurusa
Comment 49 Lasse Makholm 2014-04-23 16:03:48 UTC
Sure: http://pastebin.com/UFjzssQ3
Comment 50 Brennan Shacklett 2014-09-03 06:10:31 UTC
fsync performance is bad because SYNCHRONIZE CACHE / FLUSH CACHE takes anywhere from 0.02 to 1.2 seconds (I'm basing this on the dmesg timestamps between the issued command and ahci_hw_interrupt). I tend to get much more predictable performance by just turning off write caching for the SSD (or by manually disabling the flush command), but that obviously isn't a real solution.

I haven't been able to figure out the underlying cause of these issues, but I have an affected machine (MBP 11,3), and some kernel development experience so if anyone could point me in the right direction (or any direction really) that would be very helpful.

Thanks,
Brennan
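[Editorial note: the flush latency Brennan describes can be reproduced without kernel instrumentation by timing small fsync'd writes; dd's conv=fsync forces a flush per file. A rough sketch; the `fsync_probe` name, paths, and counts are illustrative:]

```shell
# Sketch: time N small fsync'd writes and report the average latency.
# Each dd writes one 4k block and fsyncs it before exiting, so each
# iteration pays the cost of one FLUSH to the device.
fsync_probe() {
    dir="$1"; n="${2:-10}"
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$n" ]; do
        dd if=/dev/zero of="$dir/probe.$i" bs=4k count=1 conv=fsync 2>/dev/null
        i=$((i + 1))
    done
    end=$(date +%s%N)
    rm -f "$dir"/probe.*
    echo "avg write+fsync: $(( (end - start) / n / 1000000 )) ms"
}
```

On an affected machine, the reported average would be expected to jump from a few milliseconds into the hundreds while the stall is happening.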
Comment 51 Brennan Shacklett 2014-09-04 03:36:17 UTC
Also of interest: disabling MSI on the SSD fixes NCQ but the poor FLUSH_EXT performance remains.
Comment 52 Nathan Grennan 2014-09-17 18:42:24 UTC
I tried Brennan's idea of disabling MSI on the SSD, using device_nomsi=0x1600144d on the kernel command line. It does seem to work.

A thought I had was to try nobarrier as an ext4 mount option on my / filesystem. It restores performance, though at the cost of potential reliability. It is a trade-off I am willing to make for now. I keep /home as its own filesystem; my really valuable data is there. Worst case, I reinstall the OS. I would guess OS X doesn't do write barriers.
Comment 53 Tejun Heo 2014-09-17 21:50:29 UTC
Created attachment 150741 [details]
samsung-1600-nomsi.patch

The attached patch disables MSI and allows NCQ. Can someone please verify that the patch makes NCQ work w/o any extra kernel params?

Thanks!
Comment 54 dorin 2014-09-18 19:47:34 UTC
(In reply to Tejun Heo from comment #53)

appears to be working fine for me.
Comment 55 Imre Kaloz 2014-10-19 22:57:59 UTC
(In reply to Tejun Heo from comment #53)

confirmed working here, too
Comment 56 Ben Copeland 2014-10-23 08:26:36 UTC
I have a MBP 11,2 and the patch made no difference. Still get high IO wait.

Kernel compile time before: 9.15.98
Kernel compile time after: 9.20.20.

I also noticed when launching chromium I get extremely high IO wait. 

I am *not* running with libata.force=noncq.
Comment 57 Ben Copeland 2014-10-23 08:29:35 UTC
11,1 not 11,2. (MBP 13")
Comment 58 Imre Kaloz 2014-10-23 09:57:19 UTC
Are you sure NCQ is active? Did you also disable barriers for ext4?
Comment 59 Ben Copeland 2014-10-23 10:31:12 UTC
Okay, I added barrier=0 to fstab and the iowait issue seems to have gone away.

I then booted into my "non-patched" kernel, which has NCQ disabled, and the iowait issues have also gone away there. Why does disabling barriers make everything so much better? It is also a dangerous place to be; I was hoping for a performance boost without disabling barriers.

queue_depth in patched kernel = 31
queue_depth in non-patched kernel = 1

Thanks for your time.
Comment 60 Tejun Heo 2014-10-27 14:20:03 UTC
W/o barriers, the journal is largely useless and you risk corrupting the filesystem across a power loss, which may or may not be okay depending on your use case, I guess. I'll apply the patch to libata/for-3.18-fixes w/ stable cc'd.

Thanks.
Comment 61 Tejun Heo 2014-10-27 14:32:01 UTC
Patch applied to libata/for-3.18-fixes w/ stable cc'd.

http://lkml.kernel.org/g/20141027143052.GM4436@htj.dyndns.org

Thanks.
Comment 62 Alex Fiestas 2015-02-20 21:28:29 UTC
Another affected user here, anything I can do to give debug information please ask... :/
Comment 63 Alex Fiestas 2015-02-20 21:51:42 UTC
To clarify: I'm affected by the poor FLUSH_EXT performance; the MSI/NCQ issue is fixed (or worked around) in 3.19.
Comment 64 Alex Feinman 2015-05-07 15:28:30 UTC
On an MBP 11,3 (Samsung PCIe SSD 144d:1600) with kernel 3.19 (Ubuntu 15.04) the problem remains: high iowait when performing any disk activity. The worst offenders are launching Chrome to restore a large number of windows, and dpkg (e.g. installing linux-headers, which apparently involves writing many small files).

Example iostat when launching chrome:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    31.00    4.00    7.00   200.00   380.00   105.45     6.53  697.45 1104.00  465.14  90.91 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.65    0.00    1.13   46.55    0.00   44.67

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1.00    27.00   13.00   17.00   684.00   936.50   108.03     6.37  252.27  196.92  294.59  33.33 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.03    0.00    0.38   52.84    0.00   43.74

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1.00    19.00    5.00    7.00   324.00   236.00    93.33     9.06  842.67  916.80  789.71  83.33 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.02    0.00    0.51   36.36    0.00   61.11

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1.00     0.00    2.00    5.00   260.00    84.00    98.29    13.07  980.00 1148.00  912.80 142.86 100.00


This goes on for several minutes. Setting barrier=0 in fstab solves the problem but, as others pointed out, it makes the filesystem vulnerable to corruption and is not really a solution.
Comment 65 devsk 2015-05-13 05:42:56 UTC
Another user affected by this. A 'sync' takes several seconds to finish, and 4k random IO performance is horrendous (in KB/s).

The whole point of an SSD is to improve 4k random IO performance... :)

The question is: where does this slow FLUSH_EXT performance come from on this SSD? How can we debug it?

It certainly is not an issue when booted into OS X.
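One rough way to quantify the slow synchronous 4k write path described above (a sketch only; `oflag=dsync` makes dd flush after every 4k block, which exercises the same flush path, and the scratch file name is arbitrary):

```shell
# Write 1000 x 4k blocks, flushing after each one; the throughput
# line dd prints approximates the 4k sync-write rate of the disk.
dd if=/dev/zero of=./dsync-test.bin bs=4k count=1000 oflag=dsync 2>&1 | tail -n 1
rm -f ./dsync-test.bin
```

On an affected machine this should show throughput in the hundreds of KB/s range; running it with barrier=0 in place should not change it, since dd forces the flushes itself.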
Comment 66 Stephen Niedzielski 2015-05-17 01:47:22 UTC
apt-get install brings my MacBook Pro 15" mid-2014 to an unusable stutter for many minutes. Chromium also often displays the status line "Waiting for AppCache..." for several to ten seconds when visiting any website. To summarize, if I understand correctly, the present workaround in layman's terms is:

1. Add barrier=0 option to /etc/fstab. This risks filesystem corruption in the event of a power loss.
2. Add device_nomsi=0x1600144d to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and run grub-mkconfig -o /boot/grub/grub.cfg.
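The grub edit in step 2 can be sketched like this (a sketch only; it demonstrates the sed expression on a scratch copy rather than touching /etc/default/grub, and the "quiet splash" contents are illustrative):

```shell
# Demonstrate the edit on a scratch copy of the GRUB defaults file.
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > /tmp/grub-default.test

# Prepend device_nomsi=0x1600144d to the existing default command line
# ('&' in the replacement re-inserts the matched text).
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&device_nomsi=0x1600144d /' /tmp/grub-default.test

cat /tmp/grub-default.test
# -> GRUB_CMDLINE_LINUX_DEFAULT="device_nomsi=0x1600144d quiet splash"
```

Against the real file this would be `sudo sed -i.bak ... /etc/default/grub` followed by `grub-mkconfig -o /boot/grub/grub.cfg` (or `update-grub` on Debian/Ubuntu), keeping the `.bak` copy in case the boot line needs to be reverted.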
Comment 67 Alex Feinman 2015-05-17 04:02:31 UTC
(In reply to Stephen Niedzielski from comment #66)
Yes, this pretty much summarizes my experience.
Comment 68 Brennan Shacklett 2015-05-18 23:51:31 UTC
(In reply to Stephen Niedzielski from comment #66)
Modern kernels should only need 1. I'll look into this more in the near future -- I stopped working on it when school started for me.
Comment 69 Stephen Niedzielski 2015-05-19 03:29:07 UTC
Thank you for the confirmation, Alex and Brennan; I appreciate it. I've only tried with both changes made, on Linux 3.19.0. Chromium has been working much better and apt-get works night-and-day better. I've disabled suspend, which seemed to fail sporadically for me prior to the change, and am hoping regular backups will be adequate in the event of a corrupted filesystem.
Comment 70 Stephen Niedzielski 2015-05-25 03:48:05 UTC
I tried with just barrier=0 on 3.13.0-rc7 and that works too.
Comment 71 June Tate-Gans 2015-08-29 19:33:50 UTC
I'm running the Debian testing debian-installer with kernel 4.1.3-1, and I note that the high I/O wait condition is still a problem, even on MacBookPro11,1 systems.
Comment 72 June Tate-Gans 2015-08-31 17:00:46 UTC
I should also note that this issue prompted me to switch to btrfs (prior to finding this bug) because it was faster than ext4, which eventually led to filesystem-killing corruption over time. I'm still trying to isolate whether this was simply btrfs failing, or something related to this bug.
Comment 73 Stephen Niedzielski 2015-08-31 17:11:50 UTC
At some point, I think when I upgraded my distro, barrier=0 seemed to stop working for me so I also set device_nomsi=0x1600144d again. At any rate, the two changes _seem_ to be working fine.
Comment 74 Satadru Pramanik 2016-01-14 02:48:55 UTC
Are there any updates to this issue for newer kernels? I have a MacBookPro11,3, as listed above, on kernel 4.4.0, and I'm getting exactly the same issues with slow small writes as listed previously.

Setting device_nomsi=0x1600144d doesn't appear to do anything, and setting libata.force=noncq seems to be a partial workaround... but performance is much, much higher in OS X or Windows 10.

Amusingly, booting the machine in virtualbox from OS X and just giving virtualbox access to the raw disk partitions through OS X allows disk access to work just fine.

Any information I can give to help find a way to resolve this issue?
Comment 75 Stephen Niedzielski 2016-01-14 04:39:22 UTC
I use a MacBookPro11,3 with kernel v4.2.0-23. I haven't done a comparison with / without in a while, but just device_nomsi=0x1600144d seems to work fine. I do notice slowdowns when I'm installing packages via apt-get but it's nothing dramatic like before. Make sure you run update-grub :) I don't think I've tried libata.force=noncq.
Comment 76 Satadru Pramanik 2016-01-14 05:19:23 UTC
The device_nomsi=0x1600144d option doesn't seem to be making any difference on kernel 4.4. Do you really notice any drive benchmark changes with it enabled? Does that change at all with a newer kernel?

Is there any way to fix msi on this device? I imagine all newer MBPs will have this sort of issue...
Comment 77 Stephen Niedzielski 2016-01-14 14:56:27 UTC
I haven't run any benchmarks or tried with v4.4.
Comment 78 Luke Marsden 2017-01-27 08:04:17 UTC
This is a MacBookPro11,1 running Ubuntu 16.10 with kernel 4.8.0-34-generic.

The onboard SSD is:
04:00.0 SATA controller: Samsung Electronics Co Ltd Apple PCIe SSD (rev 01)

I'm seeing the same high iowait times (observed in await column in `iostat -xz 1` output), which can be triggered easily by apt or other disk-intensive workloads such as bonnie++ (`bonnie++ -d /tmp -s 128 -b -r 0` is enough to trigger it).

I added `device_nomsi=0x1600144d` to the kernel boot args first; this _might_ have made a slight improvement to iowait when running bonnie++, and I _think_ it reduced the pegged-at-100% `%util` column. But I was eyeballing it, so this might have been confirmation bias.

Then I added `libata.force=noncq` to kernel boot args and this reduced the output of `cat /sys/block/sda/device/queue_depth` from 31 to 1, and I _think_ it made another _slight_ improvement, but things were still pretty rough.

Finally, I added `barrier=0` to my /etc/fstab for the / partition, and (predictably) the overall performance of the machine skyrocketed. It's a night-and-day difference.

There's clearly something still amiss with the system when the first two workarounds are in place. I'm OK with running with the dangerous `barrier=0` setting, but it would be great for other users to not have to debug and find this thread to get it fixed :)

I'm happy to provide debugging data on request and/or to engage in an interactive debugging session if anyone's interested in trying to get to the bottom of this.
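For anyone reproducing this, the queue-depth check above can be scripted (a sketch; the device name `sda` and the guard for machines without that sysfs node are illustrative):

```shell
# Read the sysfs queue depth to see whether libata.force=noncq took
# effect: 1 means NCQ is disabled, 31 is the usual depth with NCQ on.
dev=sda
qd_file=/sys/block/$dev/device/queue_depth
if [ -r "$qd_file" ]; then
  echo "queue_depth for $dev: $(cat "$qd_file")"
else
  echo "no readable queue_depth for $dev on this system"
fi
```

`dmesg | grep -i ncq` also shows what depth libata negotiated at probe time.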
Comment 79 Luke Marsden 2017-01-27 08:10:23 UTC
Some additional context: I was able to compare `await` times on this machine versus a Thinkpad X230 running the same kernel and the same bonnie++ test. `await` times were 100ms+ on the MacBookPro11,1 and sub-5ms on the Thinkpad X230, which has a SAMSUNG SSD 830 Series (CXM03B1Q) in it.