Bug 202633 - "nvme timeout aborting" error causes system hang
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: io_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-20 22:59 UTC by lists
Modified: 2019-09-04 06:55 UTC
CC List: 2 users

See Also:
Kernel Version: 4.20.10 5.0.2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
screen with journal logs (861.17 KB, image/jpeg)
2019-02-20 22:59 UTC, lists
dmesg output after the event (98.17 KB, text/plain)
2019-03-10 23:06 UTC, root
same crash using kernel 5.0.1 (537.57 KB, image/jpeg)
2019-03-14 10:19 UTC, lists

Description lists 2019-02-20 22:59:11 UTC
Created attachment 281249
screen with journal logs

Kernel 4.20.10 (Arch Linux) on a ThinkPad P1: the system hangs at random and I have to hard-reset it.

The log shows:

I/O 50 QID 4 timeout, aborting
....
I/O 50 QID 4 timeout, reset controller
I/O 20 QID 0 timeout, reset controller

A screenshot of the last log lines is attached.
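
For anyone collecting similar evidence, here is a minimal Python sketch that extracts these timeout events from a saved log dump and counts them per queue. The regular expression is derived only from the message shape quoted in this report, and the helper names `parse_timeouts` and `summarize` are hypothetical; other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern based on the lines quoted in this report:
# "I/O <tag> QID <queue id> timeout, <action>"
# (assumption: other kernels may format the message differently)
TIMEOUT_RE = re.compile(r"I/O (\d+) QID (\d+) timeout, (.+?)\s*$")


def parse_timeouts(lines):
    """Return a list of (tag, qid, action) tuples, one per matching line."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            events.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return events


def summarize(events):
    """Count occurrences of each (qid, action) pair."""
    return Counter((qid, action) for _tag, qid, action in events)


# The lines from the report above, used as sample input.
sample = [
    "I/O 50 QID 4 timeout, aborting",
    "....",
    "I/O 50 QID 4 timeout, reset controller",
    "I/O 20 QID 0 timeout, reset controller",
]

events = parse_timeouts(sample)
for (qid, action), count in sorted(summarize(events).items()):
    print(f"QID {qid}: {action} x{count}")
```

To run it on a real dump, pass an open file object (for example a saved dmesg text file) to `parse_timeouts` instead of `sample`.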

Here are the disk details:

LENSE30512GMSP34MEAT3TA
  DeviceId:             c6a0cfba7c7d81e253fce571e1d1e9f6003ae1c7
  Guid:                 70cc283d-bc6c-5b87-9977-17c5f95c3168
  Guid:                 0696debf-90a2-5c62-8c96-10857c206b91
  Serial:               FBFB18111JB0000637
  Summary:              NVM Express Solid State Drive
  Plugin:               nvme
  Flags:                internal|updatable|require-ac|registered|needs-reboot
  Vendor:               Lenovo
  VendorId:             NVME:0x17AA
  Version:              1.4.0412
  Icon:                 drive-harddisk
  Created:              2019-02-20

Please let me know if you need additional info. I hope this bug can be resolved soon, thanks!
Comment 1 lists 2019-02-27 18:19:00 UTC
The issue happens once every 2-3 days.

Has anyone run into the same problem?
Comment 2 root 2019-03-10 23:04:19 UTC
I run into this issue too, but only while playing games. Presumably the higher I/O load from loading resources from disk triggers it. To get my system back into a usable state I have to power off completely, not just reboot, and then power it back on. This is happening for me on mainline kernel 5.0.0.
Comment 3 root 2019-03-10 23:06:00 UTC
Created attachment 281689
dmesg output after the event
Comment 4 lists 2019-03-14 10:19:05 UTC
Created attachment 281819
same crash using kernel 5.0.1

This happens with kernel 5.0.1 too. What information is needed to get a kernel developer interested in this issue? Thanks.
Comment 5 lists 2019-03-14 11:56:04 UTC
I updated my UEFI BIOS to the latest available version; no NVMe firmware update is available for my drive at this time.

I don't have Windows installed, so I cannot verify whether this is a hardware issue or a software issue on Linux.

root@scoopta.ninja, do you also have Windows installed on your system? If so, does the issue happen on Windows too?
Comment 6 root 2019-03-14 16:42:41 UTC
(In reply to lists from comment #5)
> I updated my UEFI BIOS to the latest available version; no NVMe firmware
> update is available for my drive at this time.
> 
> I don't have Windows installed, so I cannot verify whether this is a
> hardware issue or a software issue on Linux.
> 
> root@scoopta.ninja, do you also have Windows installed on your system?
> If so, does the issue happen on Windows too?

Sorry, I haven't touched Windows in the last 4 years. That said, I'm fairly certain it's a software issue, because certain kernel versions don't seem to exhibit the problem, or if they do, it's so rare that I've never seen it. Take the 4.19 series, for example: 4.19.25 through 4.19.27 show the issue but 4.19.24 doesn't. I haven't tried any of the newer 4.19 versions. In fact, running 4.19.24 is my current temporary fix. I can't guarantee it doesn't suffer from this, but if it does, it's so rare that I doubt it's a coincidence.
Comment 7 xonatius 2019-09-04 06:11:46 UTC
https://support.lenovo.com/us/en/solutions/ht508405 - Lenovo released a firmware update for the issue. If your NVMe drive becomes completely unresponsive, contact Lenovo support and they'll replace it (in my case I got a Samsung MZVLB512HAJQ-000L7).
Comment 8 lists 2019-09-04 06:55:19 UTC
The firmware update made no difference for me.

I contacted Lenovo and they replaced my SSD (I got a Samsung MZVLB512HAJQ-000L7 too). No problems since then (27 Jun 2019).
