Bug 51861
| Summary: | Intel SSD 520 stops working under load (SSDSC2BW180A3L in Lenovo ThinkPad T430s) | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Robert Buchholz (rhbugs) |
| Component: | Serial ATA | Assignee: | Alan (alan) |
| Status: | ASSIGNED | | |
| Severity: | normal | CC: | alan, cscsordas+bugzilla.kernel.org, knud.poulsen, pere, szg00000, vrodic |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| URL: | http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430s-Intel-SSD-520-180GB-issue/td-p/888083 | | |
| Kernel Version: | 3.6.10-4.fc18.x86_64 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | dmesg written during the disk detach | | |
| | unofficial Lenovo ssd fw LB3i which /may/ fix this | | |
| | SSDSC2BW180A3L with LE1i firmware in a T430, UEFI G1ETB0WW (2.70 ) 01/21/2016 | | |
| | SSDSC2BW180A3L with LB3i firmware in a T430, UEFI G1ETB0WW (2.70 ) 01/21/2016 | | |
Description
Robert Buchholz
2012-12-20 17:45:58 UTC
Created attachment 89521
dmesg written during the disk detach
    [ 1843.098382] ata1.00: exception Emask 0x0 SAct 0xc SErr 0x0 action 0x6 frozen
    [ 1843.098387] ata1.00: failed command: WRITE FPDMA QUEUED
    [ 1843.098391] ata1.00: cmd 61/08:10:50:96:e6/00:00:03:00:00/40 tag 2 ncq 4096 out
                            res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    [ 1843.098393] ata1.00: status: { DRDY }
    [ 1843.098395] ata1.00: failed command: WRITE FPDMA QUEUED
    [ 1843.098398] ata1.00: cmd 61/08:18:60:96:e6/00:00:03:00:00/40 tag 3 ncq 4096 out
                            res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    [ 1843.098400] ata1.00: status: { DRDY }
    [ 1843.098404] ata1: hard resetting link
    [ 1848.440762] ata1: link is slow to respond, please be patient (ready=0)
    [ 1853.120703] ata1: COMRESET failed (errno=-16)
    [ 1853.120711] ata1: hard resetting link
    [ 1858.464024] ata1: link is slow to respond, please be patient (ready=0)
    [ 1863.144072] ata1: COMRESET failed (errno=-16)
    [ 1863.144080] ata1: hard resetting link
    [ 1868.486435] ata1: link is slow to respond, please be patient (ready=0)
    [ 1898.093447] ata1: COMRESET failed (errno=-16)
    [ 1898.093458] ata1: limiting SATA link speed to 3.0 Gbps
    [ 1898.093463] ata1: hard resetting link
    [ 1903.129518] ata1: COMRESET failed (errno=-16)
    [ 1903.129525] ata1: reset failed, giving up
    [ 1903.129527] ata1.00: disabled

Your drive simply stopped responding. Linux tried to reset it but it never came back. Check that you have the latest firmware from Intel. I'll see what I can find out about this with our disk folks.

Thanks for getting back on this issue. I booted Intel's latest firmware update tool 1112335T036202M388208850.iso dated 11/27/2012 and verified that I already had the latest firmware installed when I encountered this error.

Got the same issue on my brand-new T430s (INTEL SSDSC2BW240A3L (LE1i)). It's very hard to reproduce but seems related to heavy disk usage such as indexers cause (e.g. Dropbox on startup). I'm running Linux 3.6.11-1-ARCH.

Maybe it's worth noting: I'm running my laptop in UEFI-only mode with CSM disabled.
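The failure signature in the log from the description (command timeouts, repeated `COMRESET failed`, and finally `ata1.00: disabled`) can be checked for mechanically when trying to reproduce the problem. The following is only a minimal sketch: the function name `classify_ata_log` and its three output words are made up for illustration, and it matches nothing beyond the message strings visible in the dmesg excerpt.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool): reads kernel log text
# on stdin and prints one word describing how far the failure progressed.
#   detached        the device was disabled after resets failed
#   resets-failing  at least one COMRESET failed, device not (yet) disabled
#   clean           neither message was found
classify_ata_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -Eq 'ata[0-9]+\.[0-9]+: disabled'; then
        echo "detached"
    elif printf '%s\n' "$log" | grep -q 'COMRESET failed'; then
        echo "resets-failing"
    else
        echo "clean"
    fi
}

# Usage (reading the kernel ring buffer may require root):
#   dmesg | classify_ata_log
```

Note that once the disk holding the root filesystem has detached, new processes may no longer start, so a check like this is mainly useful on a saved log or from a system booted off another device.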
OK, for me the issue went away on its own. After the third reinstall (from the same install media) it just stopped happening. I'd guess the controller has singled out the defective sectors.

I believe this is the problem also reported to the Debian BTS as http://bugs.debian.org/691427 , and the problem I experience with my Thinkpad X230. In that bug report, Mathieu Desnoyers reports that he wrote a tool to trigger the bug and made it available at https://git.efficios.com/?p=test-ssd.git;a=tree . In http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430s-Intel-SSD-520-180GB-issue/td-p/888083/page/2 a user claims his problem went away after replacing the motherboard. Perhaps it really is a controller issue, or perhaps his problem just went away on its own after a few reboots?

Just wanted to let you know that the problem still affects users. :)

Also still seeing this error when I do multi-threaded writes to that disk.

Does disabling NCQ on the drive help? If it does then we can probably blacklist that exact device identifier, but if not then I can't see anything we can do but close as WONTFIX.

I don't think this is related to Linux after all. I can reproduce this problem under Linux AND Windows (8) using sqlite. *It's rather an RMA case, not a kernel bug.*

Steps:
1. Generate a large SQL INSERT script using the shell script at the bottom. --> generate.sh > out.sql
2. Execute it against a new sqlite database. --> sqlite3 test.sqlite < out.sql
3. Wait a few minutes.

On Linux I was able to observe the behavior described in this bug. On Windows I saw that I could no longer open new applications, but could only use those that were already open. Eventually the system would crash with a bluescreen.
--> generate.sh

    #!/bin/sh
    # Emit one CREATE TABLE statement followed by 100000 INSERT statements.
    echo "CREATE TABLE TEST_1 ( ID INTEGER, VALUE VARCHAR, VALUE2 VARCHAR, VALUE3 VARCHAR, VALUE4 VARCHAR, VALUE5 VARCHAR, VALUE6 VARCHAR, VALUE7 VARCHAR, VALUE8 VARCHAR, VALUE9 VARCHAR, VALUE10 VARCHAR );"
    for x in $(seq 100000); do
        echo "INSERT INTO TEST_1 (ID, VALUE, VALUE2, VALUE3, VALUE4, VALUE5, VALUE6, VALUE7, VALUE8, VALUE9, VALUE10) VALUES ($x, 'FOO_${x}_1', 'FOO_${x}_2', 'FOO_${x}_3', 'FOO_${x}_4', 'FOO_${x}_5', 'FOO_${x}_6', 'FOO_${x}_7', 'FOO_${x}_8', 'FOO_${x}_9', 'FOO_${x}_10');"
    done

Check out: http://forums.lenovo.com/t5/X-Series-ThinkPad-Laptops/x230-SATA-errors-with-180GB-Intel-520-SSD-under-heavy-write-load/td-p/1066041/page/5
A Lenovo employee has posted an SSD firmware update "LB3i" that may fix this; search for "Re: x230: SATA errors with 180GB Intel 520 SSD under heavy write load". I've attached his file here also, password: Lenovo
Testing it myself now...

Created attachment 138951
unofficial Lenovo ssd fw LB3i which /may/ fix this
unofficial Lenovo fw LB3i from: http://forums.lenovo.com/t5/X-Series-ThinkPad-Laptops/x230-SATA-errors-with-180GB-Intel-520-SSD-under-heavy-write-load/td-p/1066041/page/5 , unzip with password: Lenovo

The LB3i firmware for 520 SSDs alleviates (but does not solve) the problem in my case. The SSD still becomes unresponsive under continuous write, but after a significantly longer time than with the previous firmware: up from 10 minutes to 30 minutes in my case. This only happens in the Lenovo laptop, not when hooking the drive up to another computer.

I had no luck with the LB3i firmware. Still got a bluescreen on Windows. Sadly Lenovo refuses to admit the bug (despite my efforts to prove and reproduce it).

I'm having the same problem. The newest experimental Lenovo firmware did not resolve the issue. So, this drive is useless. Wish I could flash it with the Intel firmware 400i.

Created attachment 216331
SSDSC2BW180A3L with LE1i firmware in a T430, UEFI G1ETB0WW (2.70 ) 01/21/2016
Created attachment 216341
SSDSC2BW180A3L with LB3i firmware in a T430, UEFI G1ETB0WW (2.70 ) 01/21/2016
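Since the reports in this thread hinge on which firmware revision (LE1i, LB3i, 400i) a drive is running, it can help to state it explicitly when commenting. One way to read it, assuming smartmontools is installed, is from the `Firmware Version:` line printed by `smartctl -i`. The sketch below only extracts that field; the function name `firmware_version` and the device path `/dev/sda` in the usage comment are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -i` output on stdin and prints the
# value of the "Firmware Version:" line, if present.
firmware_version() {
    sed -n 's/^Firmware Version:[[:space:]]*//p'
}

# Usage (smartctl normally needs root; /dev/sda is an assumed device name):
#   smartctl -i /dev/sda | firmware_version
```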
Interestingly, Hewlett Packard has also released the LB3i firmware version in sp61213. They also mention: "For the Intel SSD 520 Series models: - Provides improved PHY margin setting to prevented losing the link to the SATA drive." However, that is for the "HP variant" of SSDSC2BW180A3: INTEL SSDSC2BW180A3H="180GB Intel SSD 520 Series Drive".
ftp://ftp.hp.com/pub/softpaq/sp61001-61500/sp61213.cva
ftp://ftp.hp.com/pub/softpaq/sp61001-61500/sp61213.html
ftp://ftp.hp.com/pub/softpaq/sp61001-61500/sp61213.exe
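The earlier question of whether disabling NCQ helps was never answered in this thread. For anyone who wants to test it, NCQ can be turned off at runtime by setting the device's queue depth to 1 through sysfs, or for a whole boot with the `libata.force=noncq` kernel parameter. The following is a minimal sketch of the sysfs route; the function name `set_queue_depth` and its optional third argument (an alternate sysfs root, useful only for dry runs) are assumptions made for illustration, and `sda` is a placeholder device name.

```shell
#!/bin/sh
# Hypothetical helper: set_queue_depth DEV DEPTH [SYSFS_ROOT]
# Writes DEPTH to <SYSFS_ROOT>/block/<DEV>/device/queue_depth and prints the
# value read back. SYSFS_ROOT defaults to /sys. A depth of 1 disables NCQ.
set_queue_depth() {
    dev=$1
    depth=$2
    sysfs_root=${3:-/sys}
    path="$sysfs_root/block/$dev/device/queue_depth"
    if [ ! -w "$path" ]; then
        echo "cannot write $path (wrong device name, or not root?)" >&2
        return 1
    fi
    echo "$depth" > "$path"
    cat "$path"
}

# Usage, as root, before starting the write-heavy workload:
#   set_queue_depth sda 1
```

The setting does not persist across reboots, so it has to be reapplied before each test run.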