Bug 196325

Summary: Resume issue: "nvme nvme0: I/O 18 QID 0 timeout, disable controller"
Product: IO/Storage Reporter: Martin Peres (martin.peres)
Component: OtherAssignee: io_other
Status: NEW ---    
Severity: normal CC: jan.public, kbusch, marc, systeminterrupt, tomi.p.sarvela
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: Linux-next-20170710 Subsystem:
Regression: No Bisected commit-id:

Description Martin Peres 2017-07-11 13:12:45 UTC
On Intel GFX CI's kbl-7700k, we have been hitting this issue once every other suspend-resume cycle:

<4>[  175.090115] nvme nvme0: I/O 18 QID 0 timeout, disable controller
<3>[  175.213259] nvme nvme0: failed to set APST feature (-4)
<3>[  185.329984] INFO: task rs:main Q:Reg:519 blocked for more than 60 seconds.
<3>[  185.337698]       Tainted: G     U          4.12.0-next-20170711-CI-next-20170711 #1
<3>[  185.345655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  185.353722] rs:main Q:Reg   D    0   519      1 0x00000000
<4>[  185.353724] Call Trace:
<4>[  185.353743]  __schedule+0x405/0xb40
<4>[  185.353746]  ? bit_wait+0x50/0x50
<4>[  185.353747]  schedule+0x3b/0x90
<4>[  185.353750]  io_schedule+0x11/0x40
<4>[  185.353753]  bit_wait_io+0xc/0x50
<4>[  185.353755]  __wait_on_bit+0x5c/0x90
<4>[  185.353758]  out_of_line_wait_on_bit+0x7c/0x80
<4>[  185.353760]  ? bit_waitqueue+0x30/0x30
<4>[  185.353762]  do_get_write_access+0x1e8/0x660
<4>[  185.353765]  jbd2_journal_get_write_access+0x2e/0x60
<4>[  185.353767]  __ext4_journal_get_write_access+0x62/0xb0
<4>[  185.353769]  ? ext4_dirty_inode+0x43/0x60
<4>[  185.353771]  ext4_reserve_inode_write+0x90/0xb0
<4>[  185.353773]  ext4_mark_inode_dirty+0x69/0x2e0
<4>[  185.353775]  ext4_dirty_inode+0x43/0x60
<4>[  185.353778]  __mark_inode_dirty+0x78/0x710
<4>[  185.353781]  generic_update_time+0x73/0xc0
<4>[  185.353783]  ? mnt_clone_write+0x39/0x60
<4>[  185.353785]  file_update_time+0x90/0xe0
<4>[  185.353788]  __generic_file_write_iter+0x97/0x1e0
<4>[  185.353791]  ext4_file_write_iter+0xf4/0x390
<4>[  185.353795]  __vfs_write+0xc1/0x130
<4>[  185.353798]  vfs_write+0xc5/0x1d0
<4>[  185.353800]  ? mutex_lock_nested+0x16/0x20
<4>[  185.353802]  SyS_write+0x44/0xb0
<4>[  185.353804]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  185.353806] RIP: 0033:0x7f02b461068d
<4>[  185.353807] RSP: 002b:00007f02b19c25c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
<4>[  185.353809] RAX: ffffffffffffffda RBX: ffffffff8146e023 RCX: 00007f02b461068d
<4>[  185.353810] RDX: 00000000000000d1 RSI: 00007f02a8002fb0 RDI: 0000000000000005
<4>[  185.353811] RBP: ffffc900009cbf88 R08: 00000081b0e51e30 R09: 0000000000000003
<4>[  185.353812] R10: 00000081b0e50db0 R11: 0000000000000293 R12: 00000081b0e50ce0
<4>[  185.353813] R13: 00007f02b19c25e0 R14: 00007f02a8002cf0 R15: 00007f02a8002fb0
<4>[  185.353823]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  185.353827]
<4>[  185.353827] Showing all locks held in the system:
<4>[  185.353834] 2 locks held by khungtaskd/66:
<4>[  185.353845]  #0:  (rcu_read_lock){......}, at: [<ffffffff81138c2c>] watchdog+0x9c/0x620
<4>[  185.353851]  #1:  (tasklist_lock){.+.+..}, at: [<ffffffff810d76ad>] debug_show_all_locks+0x3d/0x1a0
<4>[  185.353857] 1 lock held by khugepaged/71:
<4>[  185.353857]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811f2fea>] khugepaged+0x49a/0x2360
<4>[  185.353865] 4 locks held by rs:main Q:Reg/519:
<4>[  185.353866]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff8122a59e>] __fdget_pos+0x3e/0x50
<4>[  185.353871]  #1:  (sb_writers#3){.+.+.+}, at: [<ffffffff81206cfb>] vfs_write+0x17b/0x1d0
<4>[  185.353880]  #2:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [<ffffffff8129e9f4>] ext4_file_write_iter+0x54/0x390
<4>[  185.353889]  #3:  (jbd2_handle){++++..}, at: [<ffffffff812ea0ff>] start_this_handle+0xff/0x440
<4>[  185.353897] 1 lock held by java/1073:
<4>[  185.353898]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8104b8d1>] __do_page_fault+0x491/0x560
<4>[  185.353905] 1 lock held by java/1170:
<4>[  185.353906]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811a3db1>] vm_mmap_pgoff+0x51/0xd0
<4>[  185.353911] 3 locks held by java/1171:
<4>[  185.353912]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8104b545>] __do_page_fault+0x105/0x560
<4>[  185.353917]  #1:  (sb_pagefaults){.+.+..}, at: [<ffffffff812b5d8d>] ext4_page_mkwrite+0x4d/0x4b0
<4>[  185.353922]  #2:  (jbd2_handle){++++..}, at: [<ffffffff812ea0ff>] start_this_handle+0xff/0x440
<4>[  185.353927] 1 lock held by java/2422:
<4>[  185.353928]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811a3db1>] vm_mmap_pgoff+0x51/0xd0
<4>[  185.353937] 6 locks held by kworker/u16:59/1553:
<4>[  185.353937]  #0:  ("writeback"){.+.+.+}, at: [<ffffffff8109c208>] process_one_work+0x178/0x670
<4>[  185.353943]  #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff8109c208>] process_one_work+0x178/0x670
<4>[  185.353948]  #2:  (&type->s_umount_key#33){++++++}, at: [<ffffffff8120ace6>] trylock_super+0x16/0x50
<4>[  185.353953]  #3:  (&sbi->s_journal_flag_rwsem){.+.+.+}, at: [<ffffffff8118e137>] do_writepages+0x17/0x70
<4>[  185.353959]  #4:  (jbd2_handle){++++..}, at: [<ffffffff812ea0ff>] start_this_handle+0xff/0x440
<4>[  185.353963]  #5:  (&ei->i_data_sem){++++..}, at: [<ffffffff812ad6a6>] ext4_map_blocks+0x126/0x5b0
<4>[  185.353970] 2 locks held by kworker/u16:73/1567:
<4>[  185.353971]  #0:  ("nvme-wq"){.+.+.+}, at: [<ffffffff8109c208>] process_one_work+0x178/0x670
<4>[  185.353976]  #1:  ((&dev->ctrl.reset_work)){+.+.+.}, at: [<ffffffff8109c208>] process_one_work+0x178/0x670
<4>[  185.353982] 1 lock held by python3/2697:
<4>[  185.353983]  #0:  (&sbi->s_journal_flag_rwsem){.+.+.+}, at: [<ffffffff8118e137>] do_writepages+0x17/0x70
<4>[  185.353989] 3 locks held by dhclient/2736:
<4>[  185.353990]  #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff81206cfb>] vfs_write+0x17b/0x1d0
<4>[  185.353996]  #1:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [<ffffffff8129e9f4>] ext4_file_write_iter+0x54/0x390
<4>[  185.354002]  #2:  (jbd2_handle){++++..}, at: [<ffffffff812ea0ff>] start_this_handle+0xff/0x440
<4>[  185.354007]
<4>[  185.354008] =============================================
<4>[  185.354008]
<4>[  185.354009] NMI backtrace for cpu 0
<4>[  185.354011] CPU: 0 PID: 66 Comm: khungtaskd Tainted: G     U          4.12.0-next-20170711-CI-next-20170711 #1
<4>[  185.354012] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F22 03/06/2017
<4>[  185.354013] Call Trace:
<4>[  185.354014]  dump_stack+0x67/0x97
<4>[  185.354016]  nmi_cpu_backtrace+0xbf/0xd0
<4>[  185.354018]  ? irq_force_complete_move+0xf0/0xf0
<4>[  185.354020]  nmi_trigger_cpumask_backtrace+0xc0/0x100
<4>[  185.354022]  arch_trigger_cpumask_backtrace+0x14/0x20
<4>[  185.354023]  watchdog+0x45d/0x620
<4>[  185.354025]  kthread+0x10f/0x150
<4>[  185.354026]  ? reset_hung_task_detector+0x10/0x10
<4>[  185.354028]  ? kthread_create_on_node+0x40/0x40
<4>[  185.354029]  ret_from_fork+0x27/0x40
<6>[  185.354033] Sending NMI from CPU 0 to CPUs 1-7:
<4>[  185.354037] NMI backtrace for cpu 4 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354038] NMI backtrace for cpu 2
<4>[  185.354039] CPU: 2 PID: 231 Comm: systemd-journal Tainted: G     U          4.12.0-next-20170711-CI-next-20170711 #1
<4>[  185.354039] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F22 03/06/2017
<4>[  185.354039] task: ffff880230e45040 task.stack: ffffc900005d8000
<4>[  185.354040] RIP: 0033:0x7f8f384beffb
<4>[  185.354041] RSP: 002b:00007ffe3146d050 EFLAGS: 00000206
<4>[  185.354042] RAX: 000000723eac9b1c RBX: 00007ffe3146d5b0 RCX: 00007f8f3882f4c0
<4>[  185.354042] RDX: 00007f8f3882f4c0 RSI: 00007f8f3882e8c0 RDI: 000000723eac9b1c
<4>[  185.354043] RBP: 00007ffe3146d5a0 R08: 0000000000000000 R09: 0000000000000011
<4>[  185.354043] R10: 0000000000000069 R11: 00000000ffffffff R12: 000000723eac9b0a
<4>[  185.354044] R13: 00007ffe3146d6e8 R14: 0000000000000000 R15: 000000723eac9b1c
<4>[  185.354045] FS:  00007f8f392938c0(0000) GS:ffff88023ec80000(0000) knlGS:0000000000000000
<4>[  185.354045] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  185.354046] CR2: 00007f8f367c4018 CR3: 00000002307c2000 CR4: 00000000003406e0
<4>[  185.354046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[  185.354047] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[  185.354048] NMI backtrace for cpu 6 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354065] NMI backtrace for cpu 5 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354066] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354068] NMI backtrace for cpu 7 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354069] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffff8188dba0
<0>[  185.355038] Kernel panic - not syncing: hung_task: blocked tasks
<4>[  185.361177] CPU: 0 PID: 66 Comm: khungtaskd Tainted: G     U          4.12.0-next-20170711-CI-next-20170711 #1
<4>[  185.371474] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F22 03/06/2017
<4>[  185.380816] Call Trace:
<4>[  185.383321]  dump_stack+0x67/0x97
<4>[  185.386710]  panic+0xcf/0x212
<4>[  185.389777]  ? irq_force_complete_move+0xf0/0xf0
<4>[  185.394511]  watchdog+0x469/0x620
<4>[  185.397883]  kthread+0x10f/0x150
<4>[  185.401199]  ? reset_hung_task_detector+0x10/0x10
<4>[  185.406046]  ? kthread_create_on_node+0x40/0x40
<4>[  185.410682]  ret_from_fork+0x27/0x40
<0>[  185.414378] Kernel Offset: disabled

Any idea what the bug is? Suggested next steps?
Comment 1 Marc Thomas 2017-07-11 13:59:04 UTC
It might be worth checking the Intel 600p M.2 NVMe firmware is up to date.

# smartctl -a /dev/nvme0 | grep "Firmware Version"
Firmware Version:                   PSF121C


I have one in my workstation which I recently updated, so the above version is likely still current. Mine doesn't get disabled on suspend/resume @ kernel 4.11.9.
(However, it's an AMD Ryzen CPU, so different chipset etc. There are other issues on suspend/resume - but it doesn't panic).

PS: Assuming this report is for the Intel 600p M.2 NVMe mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=196223
Comment 2 Martin Peres 2017-07-11 14:22:48 UTC
Yes, it is this controller. The firmware was upgraded a couple of weeks ago:

# smartctl -a /dev/nvme0 | grep "Firmware Version"
Firmware Version:                   PSF121C
Comment 3 Keith Busch 2017-07-27 16:34:36 UTC
It looks like the APST feature is causing more trouble for certain devices, and it also looks like the driver is failing to handle this condition correctly.

I had requested a parameter to disable APST when others were enabling this power savings feature, but that didn't happen. I will submit a patch to provide that option.

I also see the nvme initialization is broken if a timeout occurs for both APST and directives enabling... I will submit a patch to fix that as well.

As to why the controller is failing the APST feature enabling in the first place, though, I am not sure.
Comment 4 Keith Busch 2017-07-27 18:07:32 UTC
I posted a patch to fix the stuck work-queue.

It turns out there is already a way to disable APST. Could you give that a try? The kernel parameter to do so should be 'nvme_core.default_ps_max_latency=0'.
Comment 5 Tomi Sarvela 2017-07-31 14:08:16 UTC
I picked up next-20170731 and tested with it. Booted 7700K + NVMe with parameter nvme_core.default_ps_max_latency=0 and started looping gem_exec_suspend@basic-S3.

The behaviour was different than before: I didn't get hard hang right away.

First loops went ok, suspend/resume worked. Then, when (I think) resuming on around tenth loop, I got the 
failed to set APST feature (-4)"

I could Ctrl-C to the command line and look around, but any disk-based command just waited.

Within minute I got the following trace:

kbl-7700k login: [  156.914098] nvme nvme0: failed to set APST feature (-4)
[  196.097915] Buffer I/O error on device nvme0n1p2, logical block 55157764
[  246.774622] INFO: task rs:main Q:Reg:462 blocked for more than 60 seconds.
[  246.785499]       Tainted: G     U          4.13.0-rc3-next-20170731-CI-next-20170731 #1
[  246.794531] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.804594] Kernel panic - not syncing: hung_task: blocked tasks
[  246.811421] CPU: 0 PID: 66 Comm: khungtaskd Tainted: G     U          4.13.0-rc3-next-20170731-C1
[  246.822714] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F22 03/06/7
[  246.832732] Call Trace:
[  246.835984]  dump_stack+0x68/0x9f
[  246.840075]  panic+0xcf/0x212
[  246.843835]  watchdog+0x460/0x610
[  246.847933]  kthread+0x10f/0x150
[  246.851955]  ? reset_hung_task_detector+0x10/0x10
[  246.857442]  ? kthread_create_on_node+0x40/0x40
[  246.862747]  ret_from_fork+0x27/0x40
[  246.867136] Kernel Offset: disabled
[  246.972943] Rebooting in 1 seconds..
[  247.977326] ACPI MEMORY or I/O RESET_REG.
ERROR: Boot option loading failed
Comment 6 Keith Busch 2017-08-01 07:58:43 UTC
I think you just got random behavior rather than changed anything: the parameter as written was misspelled (my fault), and should be "nvme_core.default_ps_max_latency_us=0".

In any case, the parameter should not be necessary if the drive functions correctly, so I've raised the issue directly with the device's product group. 

I'll also ping the nvme mailing list to see about applying the patch to get the driver to gracefully unbind in this scenario, though the storage device would still be unusable in this scenario.
Comment 7 systeminterrupt 2020-12-13 21:10:05 UTC
This keeps happening on my device and whenever it happens my device stops taking in keyboard and mouse input and does not refresh the screen either. This makes the system unusable for about a minute or so. Funnily enough the mouse pointer moves. Please let me know if I can contribute with diagnostic information. This is what I see in /var/log/syslog:

Dec 13 20:56:31 hostnamemasked kernel: [   83.018757] nvme nvme0: I/O 7 QID 0 timeout, reset controller
Dec 13 20:57:33 hostnamemasked kernel: [  144.507062] nvme nvme0: 4/0/0 default/read/poll queues
....
Dec 13 21:04:04 hostnamemasked kernel: [  535.632765] nvme nvme0: I/O 20 QID 0 timeout, reset controller
Dec 13 21:05:05 hostnamemasked kernel: [  597.113570] nvme nvme0: 4/0/0 default/read/poll queues
Comment 8 Tomi Sarvela 2020-12-14 09:42:12 UTC
(In reply to systeminterrupt from comment #7)
> This keeps happening on my device and whenever it happens my device stops
> taking in keyboard and mouse input and does not refresh the screen either.
> This makes the system unusable for about a minute or so. Funnily enough the
> mouse pointer moves. Please let me know if I can contribute with diagnostic
> information. This is what I see in /var/log/syslog:
> 
> Dec 13 20:56:31 hostnamemasked kernel: [   83.018757] nvme nvme0: I/O 7 QID
> 0 timeout, reset controller
> Dec 13 20:57:33 hostnamemasked kernel: [  144.507062] nvme nvme0: 4/0/0
> default/read/poll queues
> ....
> Dec 13 21:04:04 hostnamemasked kernel: [  535.632765] nvme nvme0: I/O 20 QID
> 0 timeout, reset controller
> Dec 13 21:05:05 hostnamemasked kernel: [  597.113570] nvme nvme0: 4/0/0
> default/read/poll queues

Please create a new bug about this issue. To my eyes it seems that the nvme drive is timeouting IO requests, which might be caused eg. by failing drive (the usual case), slightly bad NVMe connection, or motherboard issues.

My advice would be checking the SMART values from the drive (with 'nvme' tool) and if there is anything suspicious, replace the drive. Suspect values are high error counts (should be close to 0) or available spare going towards 0%.
Comment 9 Keith Busch 2020-12-14 17:17:53 UTC
This doesn't look related to the bugzilla.

Strangely, it's an admin command timing out, and it's not happening during controller init. I'd be curious to know what the command is and where it's coming from. Do you happen to know what's submitting this admin command? If you're not sure and want to find out, you could do something like the following (as root, and assuming your kernel has ftrace enabled):

  # echo “qid == 0” > /sys/kernel/debug/tracing/events/nvme/filter
  # echo 1 > /sys/kernel/debug/tracing/events/nvme/enable
 
Then after observing the timeout message, you can get the trace like:
 
  # cat /sys/kernel/debug/tracing/trace