Bug 51861

Summary: Intel SSD 520 stops working under load (SSDSC2BW180A3L in Lenovo ThinkPad T430s)
Product: IO/Storage
Reporter: Robert Buchholz (rhbugs)
Component: Serial ATA
Assignee: Alan (alan)
Status: ASSIGNED
Severity: normal
CC: alan, cscsordas+bugzilla.kernel.org, knud.poulsen, pere, szg00000, vrodic
Priority: P1
Hardware: All   
OS: Linux   
URL: http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430s-Intel-SSD-520-180GB-issue/td-p/888083
Kernel Version: 3.6.10-4.fc18.x86_64
Tree: Mainline
Regression: No
Attachments: dmesg written during the disk detach
unofficial Lenovo ssd fw LB3i which /may/ fix this
SSDSC2BW180A3L with LE1i firmware in a T430, UEFI G1ETB0WW (2.70 ) 01/21/2016
SSDSC2BW180A3L with LB3i firmware in a T430, UEFI G1ETB0WW (2.70 ) 01/21/2016

Description Robert Buchholz 2012-12-20 17:45:58 UTC
During heavy write load, the device stops responding and all subsequent read or write accesses fail with I/O errors. I can reproduce the issue by syncing 10 maildirs of 1k+ mails each with a multi-threaded MUA at the same time, or by inserting many rows into an sqlite file while writing a log file.

The problem has been reported by several users of the notebook/hdd combination:

This does not happen under Windows, or under Linux with a Samsung SSD. Reportedly, at least kernels 3.2 through 3.6.10 are affected, with vanilla, Ubuntu and Fedora default configurations.

lshw extract:
          description: BIOS
          vendor: LENOVO
          physical id: e
          version: G7ET29WW (1.11 )
          date: 05/24/2012
          size: 128KiB
          capacity: 15MiB
          capabilities: pci pnp upgrade shadowing cdboot bootselect edd int13floppy720 int5printscreen int9keyboard int14serial int17printer int10video acpi usb biosbootspecification uefi

             description: SATA controller
             product: 7 Series Chipset Family 6-port SATA Controller [AHCI mode]
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@0000:00:1f.2
             version: 04
             width: 32 bits
             clock: 66MHz
             capabilities: storage msi pm ahci_1.0 bus_master cap_list
             configuration: driver=ahci latency=0
             resources: irq:42 ioport:50a8(size=8) ioport:50bc(size=4) ioport:50a0(size=8) ioport:50b8(size=4) ioport:5060(size=32) memory:d2538000-d25387ff

          physical id: 0
          logical name: scsi0
          capabilities: emulated
             description: ATA Disk
             product: INTEL SSDSC2BW18
             physical id: 0.0.0
             bus info: scsi@0:0.0.0
             logical name: /dev/sda
             version: LE1i
             size: 167GiB (180GB)
             capabilities: partitioned partitioned:dos
             configuration: ansiversion=5 sectorsize=512 signature=0000f15b
Comment 1 Robert Buchholz 2012-12-20 17:46:41 UTC
Created attachment 89521 [details]
dmesg written during the disk detach
Comment 2 Alan 2013-01-02 14:38:54 UTC
[ 1843.098382] ata1.00: exception Emask 0x0 SAct 0xc SErr 0x0 action 0x6 frozen
[ 1843.098387] ata1.00: failed command: WRITE FPDMA QUEUED
[ 1843.098391] ata1.00: cmd 61/08:10:50:96:e6/00:00:03:00:00/40 tag 2 ncq 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1843.098393] ata1.00: status: { DRDY }
[ 1843.098395] ata1.00: failed command: WRITE FPDMA QUEUED
[ 1843.098398] ata1.00: cmd 61/08:18:60:96:e6/00:00:03:00:00/40 tag 3 ncq 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1843.098400] ata1.00: status: { DRDY }
[ 1843.098404] ata1: hard resetting link
[ 1848.440762] ata1: link is slow to respond, please be patient (ready=0)
[ 1853.120703] ata1: COMRESET failed (errno=-16)
[ 1853.120711] ata1: hard resetting link
[ 1858.464024] ata1: link is slow to respond, please be patient (ready=0)
[ 1863.144072] ata1: COMRESET failed (errno=-16)
[ 1863.144080] ata1: hard resetting link
[ 1868.486435] ata1: link is slow to respond, please be patient (ready=0)
[ 1898.093447] ata1: COMRESET failed (errno=-16)
[ 1898.093458] ata1: limiting SATA link speed to 3.0 Gbps
[ 1898.093463] ata1: hard resetting link
[ 1903.129518] ata1: COMRESET failed (errno=-16)
[ 1903.129525] ata1: reset failed, giving up
[ 1903.129527] ata1.00: disabled

Your drive simply stopped responding. Linux tried to reset it but it never came back. Check you have the latest firmware from Intel.

I'll see what I can find out about this with our disk folks.
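To check whether another machine shows the same failure signature, the markers in the log above can be counted with a small filter. This is only a sketch: the function name scan_ata_log is made up for illustration, and the patterns are copied from the dmesg excerpt in this comment, so other kernel versions may word these messages differently.

```shell
# Summarise libata failure markers in a kernel log read from stdin.
# Prints one "key=count" line per marker. The marker strings are taken
# from the dmesg excerpt in this bug; scan_ata_log is a hypothetical name.
scan_ata_log() {
    awk '
        /failed command: WRITE FPDMA QUEUED/ { ncq_write_fail++ }
        /Emask 0x4 \(timeout\)/              { timeouts++ }
        /COMRESET failed/                    { comreset_fail++ }
        /reset failed, giving up/            { gave_up++ }
        END {
            printf "ncq_write_fail=%d\n", ncq_write_fail
            printf "timeouts=%d\n", timeouts
            printf "comreset_fail=%d\n", comreset_fail
            printf "gave_up=%d\n", gave_up
        }
    '
}
```

Typical use would be `dmesg | scan_ata_log`. A non-zero gave_up count corresponds to the "reset failed, giving up" line, after which the disk is disabled.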
Comment 3 Robert Buchholz 2013-01-03 14:19:30 UTC
Thanks for getting back on this issue. I booted Intel's latest firmware update tool 1112335T036202M388208850.iso dated 11/27/2012 and verified that I already had the latest firmware installed when I encountered this error.
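For comparing firmware revisions between affected machines without booting the vendor tool, the revision can usually be read from sysfs. A minimal sketch, assuming the disk exposes the `model` and `rev` attributes under /sys/block/<dev>/device (the values lshw reports above as product and version); the function name disk_fw_rev and its optional sysfs-root argument are made up for illustration.

```shell
# Print "<model> <firmware revision>" for a disk, as exposed in sysfs.
# The second argument (sysfs root) is an assumption added so the function
# can be pointed at a test directory; it defaults to /sys.
disk_fw_rev() {
    dev=$1
    sysroot=${2:-/sys}
    base="$sysroot/block/$dev/device"
    # Fail quietly if the device or its attributes are not present.
    [ -r "$base/model" ] && [ -r "$base/rev" ] || return 1
    # These fields may be padded with trailing spaces; strip them.
    model=$(sed 's/[[:space:]]*$//' "$base/model")
    rev=$(sed 's/[[:space:]]*$//' "$base/rev")
    printf '%s %s\n' "$model" "$rev"
}
```

On the reporter's machine, `disk_fw_rev sda` would be expected to show the LE1i revision listed in the lshw extract.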
Comment 4 Christoph Gritschenberger 2013-01-05 11:48:16 UTC
Got the same issue on my brand-new T430s (INTEL SSDSC2BW240A3L (LE1i)).
It's very hard to reproduce but seems related to heavy disk usage like indexers do (e.g. Dropbox startup ip).

I'm running Linux 3.6.11-1-ARCH.
Comment 5 Christoph Gritschenberger 2013-01-05 11:50:53 UTC
Maybe it's worth noting. I'm running my laptop in UEFI-only mode with CSM disabled.
Comment 6 Christoph Gritschenberger 2013-01-11 00:54:25 UTC
OK, for me the issue went away on its own. After the third reinstall (from the same install-media) it just stopped happening. I'd guess the Controller has singled out the defective sectors.
Comment 7 Petter Reinholdtsen 2013-07-04 08:09:16 UTC
I believe this is the problem also reported to the Debian BTS as
http://bugs.debian.org/691427 , and the problem I experience with my Thinkpad

In that bug report, Mathieu Desnoyers reports that he wrote a tool to trigger the
bug and made it available at https://git.efficios.com/?p=test-ssd.git;a=tree .

In http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430s-Intel-SSD-520-180GB-issue/td-p/888083/page/2
a user claims his problem went away after replacing the motherboard.  Perhaps it
really is a controller issue, or perhaps his problem just went away on its own
after a few reboots?

Just wanted to let you know that the problem still affects users. :)
Comment 8 Robert Buchholz 2013-10-22 12:12:04 UTC
Also still seeing this error when I do multi-threaded writes to that disk.
Comment 9 Alan 2014-01-20 18:32:51 UTC
Does disabling NCQ on the drive help? If it does, then we can probably blacklist that exact device identifier, but if not, then I can't see anything we can do but close as WONTFIX.
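For anyone wanting to run that test, one common way to turn NCQ off at runtime is to set the device's queue depth to 1 through sysfs. A minimal sketch, assuming the drive shows up as a libata disk with a writable queue_depth attribute; the function name disable_ncq and its optional sysfs-root argument are made up for illustration.

```shell
# Limit a disk to a queue depth of 1, which effectively disables NCQ for it.
# Needs root on a real system. The optional second argument (sysfs root)
# is an assumption added for testing; it defaults to /sys.
disable_ncq() {
    dev=$1
    sysroot=${2:-/sys}
    qd="$sysroot/block/$dev/device/queue_depth"
    [ -w "$qd" ] || { echo "cannot write $qd" >&2; return 1; }
    echo "queue_depth before: $(cat "$qd")"
    echo 1 > "$qd"
    echo "queue_depth after: $(cat "$qd")"
}
```

Running `disable_ncq sda` as root only lasts until the next reboot. To keep NCQ off from boot, the kernel parameter `libata.force=noncq` should have the same effect for all ports. Either way, the write-load reproducer then needs to be re-run to see whether the timeouts still occur.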
Comment 10 Christoph Gritschenberger 2014-03-12 07:44:20 UTC
I don't think this is related to Linux after all.
I can reproduce this problem under Linux AND Windows (8) using sqlite.
*It's rather an RMA case, not a kernel bug.*

1. generate a large SQL-INSERT-script using the shell-script at the bottom.
--> generate.sh > out.sql
2. execute it against a new sqlite-database
--> sqlite3 test.sqlite < out.sql
3. wait a few minutes

On Linux I was able to observe the behavior described in this bug. On Windows I saw that I could no longer open new applications, only use those that were already open. Eventually the system would crash with a bluescreen.

--> generate.sh

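echo "CREATE TABLE test (id, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10);"  # assumed: table name and schema are missing from the report text
for x in `seq 100000`; do
    echo "INSERT INTO test VALUES ($x, 'FOO_${x}_1', 'FOO_${x}_2', 'FOO_${x}_3', 'FOO_${x}_4', 'FOO_${x}_5', 'FOO_${x}_6', 'FOO_${x}_7', 'FOO_${x}_8', 'FOO_${x}_9', 'FOO_${x}_10');"
done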
Comment 11 Knud Poulsen 2014-06-10 18:23:04 UTC
Check out: http://forums.lenovo.com/t5/X-Series-ThinkPad-Laptops/x230-SATA-errors-with-180GB-Intel-520-SSD-under-heavy-write-load/td-p/1066041/page/5

A Lenovo employee has posted an ssd firmware update "LB3i" that may fix this, search for "Re: x230: SATA errors with 180GB Intel 520 SSD under heavy write load"

I've attached his file here also, password: Lenovo

Testing it myself now...
Comment 12 Knud Poulsen 2014-06-10 18:24:47 UTC
Created attachment 138951 [details]
unofficial Lenovo ssd fw LB3i which /may/ fix this

unofficial Lenovo fw LB3i from: http://forums.lenovo.com/t5/X-Series-ThinkPad-Laptops/x230-SATA-errors-with-180GB-Intel-520-SSD-under-heavy-write-load/td-p/1066041/page/5 , unzip with password: Lenovo
Comment 13 Knud Poulsen 2014-06-12 07:34:37 UTC
The LB3i firmware for 520 SSDs alleviates (but does not solve) the problem in my case. The SSD still becomes unresponsive under continuous write, but after a significantly longer time than with the previous firmware: up from 10 minutes to 30 minutes in my case. This only happens in the Lenovo laptop, not when hooking the drive up to another computer.
Comment 14 Christoph Gritschenberger 2014-06-13 08:35:03 UTC
I had no luck with the LB3i-firmware. Still got a bluescreen on Windows.
Sadly, Lenovo refuses to admit the bug (despite my efforts to prove and reproduce it).
Comment 15 Csaba Csordás 2016-05-15 17:07:33 UTC
I'm having the same problem. The newest experimental Lenovo firmware did not resolve the issue. So, this drive is useless. Wish I could flash it with the Intel firmware 400i.
Comment 16 Csaba Csordás 2016-05-15 17:15:12 UTC
Created attachment 216331 [details]
SSDSC2BW180A3L with LE1i firmware in a T430, UEFI G1ETB0WW (2.70 ) 01/21/2016
Comment 17 Csaba Csordás 2016-05-15 17:15:19 UTC
Created attachment 216341 [details]
SSDSC2BW180A3L with LB3i firmware in a T430, UEFI G1ETB0WW (2.70 ) 01/21/2016
Comment 18 Csaba Csordás 2016-05-23 16:48:54 UTC
Interestingly, Hewlett Packard has also released the LB3i firmware version in sp61213.
They also mention:
"For the Intel SSD 520 Series models: 
- Provides improved PHY margin setting to prevented losing the link to the SATA

However, that is for the "HP variant" of SSDSC2BW180A3: INTEL SSDSC2BW180A3H = "180GB Intel SSD 520 Series Drive".