Bug 196325 - Resume issue: "nvme nvme0: I/O 18 QID 0 timeout, disable controller"
Summary: Resume issue: "nvme nvme0: I/O 18 QID 0 timeout, disable controller"
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: io_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-11 13:12 UTC by Martin Peres
Modified: 2018-08-12 10:20 UTC (History)
4 users (show)

See Also:
Kernel Version: Linux-next-20170710
Tree: Mainline
Regression: No


Attachments

Description Martin Peres 2017-07-11 13:12:45 UTC
On Intel GFX CI's kbl-7700k, we have been hitting this issue once every other suspend-resume cycle:

<4>[  175.090115] nvme nvme0: I/O 18 QID 0 timeout, disable controller
<3>[  175.213259] nvme nvme0: failed to set APST feature (-4)
<3>[  185.329984] INFO: task rs:main Q:Reg:519 blocked for more than 60 seconds.
<3>[  185.337698]       Tainted: G     U          4.12.0-next-20170711-CI-next-20170711 #1
<3>[  185.345655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  185.353722] rs:main Q:Reg   D    0   519      1 0x00000000
<4>[  185.353724] Call Trace:
<4>[  185.353743]  __schedule+0x405/0xb40
<4>[  185.353746]  ? bit_wait+0x50/0x50
<4>[  185.353747]  schedule+0x3b/0x90
<4>[  185.353750]  io_schedule+0x11/0x40
<4>[  185.353753]  bit_wait_io+0xc/0x50
<4>[  185.353755]  __wait_on_bit+0x5c/0x90
<4>[  185.353758]  out_of_line_wait_on_bit+0x7c/0x80
<4>[  185.353760]  ? bit_waitqueue+0x30/0x30
<4>[  185.353762]  do_get_write_access+0x1e8/0x660
<4>[  185.353765]  jbd2_journal_get_write_access+0x2e/0x60
<4>[  185.353767]  __ext4_journal_get_write_access+0x62/0xb0
<4>[  185.353769]  ? ext4_dirty_inode+0x43/0x60
<4>[  185.353771]  ext4_reserve_inode_write+0x90/0xb0
<4>[  185.353773]  ext4_mark_inode_dirty+0x69/0x2e0
<4>[  185.353775]  ext4_dirty_inode+0x43/0x60
<4>[  185.353778]  __mark_inode_dirty+0x78/0x710
<4>[  185.353781]  generic_update_time+0x73/0xc0
<4>[  185.353783]  ? mnt_clone_write+0x39/0x60
<4>[  185.353785]  file_update_time+0x90/0xe0
<4>[  185.353788]  __generic_file_write_iter+0x97/0x1e0
<4>[  185.353791]  ext4_file_write_iter+0xf4/0x390
<4>[  185.353795]  __vfs_write+0xc1/0x130
<4>[  185.353798]  vfs_write+0xc5/0x1d0
<4>[  185.353800]  ? mutex_lock_nested+0x16/0x20
<4>[  185.353802]  SyS_write+0x44/0xb0
<4>[  185.353804]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  185.353806] RIP: 0033:0x7f02b461068d
<4>[  185.353807] RSP: 002b:00007f02b19c25c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
<4>[  185.353809] RAX: ffffffffffffffda RBX: ffffffff8146e023 RCX: 00007f02b461068d
<4>[  185.353810] RDX: 00000000000000d1 RSI: 00007f02a8002fb0 RDI: 0000000000000005
<4>[  185.353811] RBP: ffffc900009cbf88 R08: 00000081b0e51e30 R09: 0000000000000003
<4>[  185.353812] R10: 00000081b0e50db0 R11: 0000000000000293 R12: 00000081b0e50ce0
<4>[  185.353813] R13: 00007f02b19c25e0 R14: 00007f02a8002cf0 R15: 00007f02a8002fb0
<4>[  185.353823]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  185.353827]
<4>[  185.353827] Showing all locks held in the system:
<4>[  185.353834] 2 locks held by khungtaskd/66:
<4>[  185.353845]  #0:  (rcu_read_lock){......}, at: [<ffffffff81138c2c>] watchdog+0x9c/0x620
<4>[  185.353851]  #1:  (tasklist_lock){.+.+..}, at: [<ffffffff810d76ad>] debug_show_all_locks+0x3d/0x1a0
<4>[  185.353857] 1 lock held by khugepaged/71:
<4>[  185.353857]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811f2fea>] khugepaged+0x49a/0x2360
<4>[  185.353865] 4 locks held by rs:main Q:Reg/519:
<4>[  185.353866]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff8122a59e>] __fdget_pos+0x3e/0x50
<4>[  185.353871]  #1:  (sb_writers#3){.+.+.+}, at: [<ffffffff81206cfb>] vfs_write+0x17b/0x1d0
<4>[  185.353880]  #2:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [<ffffffff8129e9f4>] ext4_file_write_iter+0x54/0x390
<4>[  185.353889]  #3:  (jbd2_handle){++++..}, at: [<ffffffff812ea0ff>] start_this_handle+0xff/0x440
<4>[  185.353897] 1 lock held by java/1073:
<4>[  185.353898]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8104b8d1>] __do_page_fault+0x491/0x560
<4>[  185.353905] 1 lock held by java/1170:
<4>[  185.353906]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811a3db1>] vm_mmap_pgoff+0x51/0xd0
<4>[  185.353911] 3 locks held by java/1171:
<4>[  185.353912]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8104b545>] __do_page_fault+0x105/0x560
<4>[  185.353917]  #1:  (sb_pagefaults){.+.+..}, at: [<ffffffff812b5d8d>] ext4_page_mkwrite+0x4d/0x4b0
<4>[  185.353922]  #2:  (jbd2_handle){++++..}, at: [<ffffffff812ea0ff>] start_this_handle+0xff/0x440
<4>[  185.353927] 1 lock held by java/2422:
<4>[  185.353928]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811a3db1>] vm_mmap_pgoff+0x51/0xd0
<4>[  185.353937] 6 locks held by kworker/u16:59/1553:
<4>[  185.353937]  #0:  ("writeback"){.+.+.+}, at: [<ffffffff8109c208>] process_one_work+0x178/0x670
<4>[  185.353943]  #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff8109c208>] process_one_work+0x178/0x670
<4>[  185.353948]  #2:  (&type->s_umount_key#33){++++++}, at: [<ffffffff8120ace6>] trylock_super+0x16/0x50
<4>[  185.353953]  #3:  (&sbi->s_journal_flag_rwsem){.+.+.+}, at: [<ffffffff8118e137>] do_writepages+0x17/0x70
<4>[  185.353959]  #4:  (jbd2_handle){++++..}, at: [<ffffffff812ea0ff>] start_this_handle+0xff/0x440
<4>[  185.353963]  #5:  (&ei->i_data_sem){++++..}, at: [<ffffffff812ad6a6>] ext4_map_blocks+0x126/0x5b0
<4>[  185.353970] 2 locks held by kworker/u16:73/1567:
<4>[  185.353971]  #0:  ("nvme-wq"){.+.+.+}, at: [<ffffffff8109c208>] process_one_work+0x178/0x670
<4>[  185.353976]  #1:  ((&dev->ctrl.reset_work)){+.+.+.}, at: [<ffffffff8109c208>] process_one_work+0x178/0x670
<4>[  185.353982] 1 lock held by python3/2697:
<4>[  185.353983]  #0:  (&sbi->s_journal_flag_rwsem){.+.+.+}, at: [<ffffffff8118e137>] do_writepages+0x17/0x70
<4>[  185.353989] 3 locks held by dhclient/2736:
<4>[  185.353990]  #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff81206cfb>] vfs_write+0x17b/0x1d0
<4>[  185.353996]  #1:  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [<ffffffff8129e9f4>] ext4_file_write_iter+0x54/0x390
<4>[  185.354002]  #2:  (jbd2_handle){++++..}, at: [<ffffffff812ea0ff>] start_this_handle+0xff/0x440
<4>[  185.354007]
<4>[  185.354008] =============================================
<4>[  185.354008]
<4>[  185.354009] NMI backtrace for cpu 0
<4>[  185.354011] CPU: 0 PID: 66 Comm: khungtaskd Tainted: G     U          4.12.0-next-20170711-CI-next-20170711 #1
<4>[  185.354012] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F22 03/06/2017
<4>[  185.354013] Call Trace:
<4>[  185.354014]  dump_stack+0x67/0x97
<4>[  185.354016]  nmi_cpu_backtrace+0xbf/0xd0
<4>[  185.354018]  ? irq_force_complete_move+0xf0/0xf0
<4>[  185.354020]  nmi_trigger_cpumask_backtrace+0xc0/0x100
<4>[  185.354022]  arch_trigger_cpumask_backtrace+0x14/0x20
<4>[  185.354023]  watchdog+0x45d/0x620
<4>[  185.354025]  kthread+0x10f/0x150
<4>[  185.354026]  ? reset_hung_task_detector+0x10/0x10
<4>[  185.354028]  ? kthread_create_on_node+0x40/0x40
<4>[  185.354029]  ret_from_fork+0x27/0x40
<6>[  185.354033] Sending NMI from CPU 0 to CPUs 1-7:
<4>[  185.354037] NMI backtrace for cpu 4 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354038] NMI backtrace for cpu 2
<4>[  185.354039] CPU: 2 PID: 231 Comm: systemd-journal Tainted: G     U          4.12.0-next-20170711-CI-next-20170711 #1
<4>[  185.354039] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F22 03/06/2017
<4>[  185.354039] task: ffff880230e45040 task.stack: ffffc900005d8000
<4>[  185.354040] RIP: 0033:0x7f8f384beffb
<4>[  185.354041] RSP: 002b:00007ffe3146d050 EFLAGS: 00000206
<4>[  185.354042] RAX: 000000723eac9b1c RBX: 00007ffe3146d5b0 RCX: 00007f8f3882f4c0
<4>[  185.354042] RDX: 00007f8f3882f4c0 RSI: 00007f8f3882e8c0 RDI: 000000723eac9b1c
<4>[  185.354043] RBP: 00007ffe3146d5a0 R08: 0000000000000000 R09: 0000000000000011
<4>[  185.354043] R10: 0000000000000069 R11: 00000000ffffffff R12: 000000723eac9b0a
<4>[  185.354044] R13: 00007ffe3146d6e8 R14: 0000000000000000 R15: 000000723eac9b1c
<4>[  185.354045] FS:  00007f8f392938c0(0000) GS:ffff88023ec80000(0000) knlGS:0000000000000000
<4>[  185.354045] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  185.354046] CR2: 00007f8f367c4018 CR3: 00000002307c2000 CR4: 00000000003406e0
<4>[  185.354046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[  185.354047] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[  185.354048] NMI backtrace for cpu 6 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354065] NMI backtrace for cpu 5 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354066] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354068] NMI backtrace for cpu 7 skipped: idling at pc 0xffffffff8188dba0
<4>[  185.354069] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffff8188dba0
<0>[  185.355038] Kernel panic - not syncing: hung_task: blocked tasks
<4>[  185.361177] CPU: 0 PID: 66 Comm: khungtaskd Tainted: G     U          4.12.0-next-20170711-CI-next-20170711 #1
<4>[  185.371474] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F22 03/06/2017
<4>[  185.380816] Call Trace:
<4>[  185.383321]  dump_stack+0x67/0x97
<4>[  185.386710]  panic+0xcf/0x212
<4>[  185.389777]  ? irq_force_complete_move+0xf0/0xf0
<4>[  185.394511]  watchdog+0x469/0x620
<4>[  185.397883]  kthread+0x10f/0x150
<4>[  185.401199]  ? reset_hung_task_detector+0x10/0x10
<4>[  185.406046]  ? kthread_create_on_node+0x40/0x40
<4>[  185.410682]  ret_from_fork+0x27/0x40
<0>[  185.414378] Kernel Offset: disabled

Any idea what the bug is? Suggested next steps?
Comment 1 Marc Thomas 2017-07-11 13:59:04 UTC
It might be worth checking the Intel 600p M.2 NVMe firmware is up to date.

# smartctl -a /dev/nvme0 | grep "Firmware Version"
Firmware Version:                   PSF121C


I have one in my workstation which I recently updated, so the above version is likely still current. Mine doesn't get disabled on suspend/resume @ kernel 4.11.9.
(However, it's an AMD Ryzen CPU, so different chipset etc. There are other issues on suspend/resume - but it doesn't panic).

PS: Assuming this report is for the Intel 600p M.2 NVMe mentioned here:
https://bugzilla.kernel.org/show_bug.cgi?id=196223
Comment 2 Martin Peres 2017-07-11 14:22:48 UTC
Yes, it is this controller. The firmware was upgraded a couple of weeks ago:

# smartctl -a /dev/nvme0 | grep "Firmware Version"
Firmware Version:                   PSF121C
Comment 3 Keith Busch 2017-07-27 16:34:36 UTC
It looks like the APST feature is causing more trouble for certain devices, and it also looks like the driver is failing to handle this condition correctly.

I had requested a parameter to disable APST when others were enabling this power savings feature, but that didn't happen. I will submit a patch to provide that option.

I also see the nvme initialization is broken if a timeout occurs for both APST and directives enabling... I will submit a patch to fix that as well.

As to why the controller is failing the APST feature enabling in the first place, though, I am not sure.
Comment 4 Keith Busch 2017-07-27 18:07:32 UTC
I posted a patch to fix the stuck work-queue.

It turns out there is already a way to disable APST. Could you give that a try? The kernel parameter to do so should be 'nvme_core.default_ps_max_latency=0'.
Comment 5 Tomi Sarvela 2017-07-31 14:08:16 UTC
I picked up next-20170731 and tested with it. Booted 7700K + NVMe with parameter nvme_core.default_ps_max_latency=0 and started looping gem_exec_suspend@basic-S3.

The behaviour was different than before: I didn't get hard hang right away.

First loops went ok, suspend/resume worked. Then, when (I think) resuming on around tenth loop, I got the 
failed to set APST feature (-4)"

I could Ctrl-C to the command line and look around, but any disk-based command just waited.

Within minute I got the following trace:

kbl-7700k login: [  156.914098] nvme nvme0: failed to set APST feature (-4)
[  196.097915] Buffer I/O error on device nvme0n1p2, logical block 55157764
[  246.774622] INFO: task rs:main Q:Reg:462 blocked for more than 60 seconds.
[  246.785499]       Tainted: G     U          4.13.0-rc3-next-20170731-CI-next-20170731 #1
[  246.794531] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.804594] Kernel panic - not syncing: hung_task: blocked tasks
[  246.811421] CPU: 0 PID: 66 Comm: khungtaskd Tainted: G     U          4.13.0-rc3-next-20170731-C1
[  246.822714] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F22 03/06/7
[  246.832732] Call Trace:
[  246.835984]  dump_stack+0x68/0x9f
[  246.840075]  panic+0xcf/0x212
[  246.843835]  watchdog+0x460/0x610
[  246.847933]  kthread+0x10f/0x150
[  246.851955]  ? reset_hung_task_detector+0x10/0x10
[  246.857442]  ? kthread_create_on_node+0x40/0x40
[  246.862747]  ret_from_fork+0x27/0x40
[  246.867136] Kernel Offset: disabled
[  246.972943] Rebooting in 1 seconds..
[  247.977326] ACPI MEMORY or I/O RESET_REG.
ERROR: Boot option loading failed
Comment 6 Keith Busch 2017-08-01 07:58:43 UTC
I think you just got random behavior rather than changed anything: the parameter as written was misspelled (my fault), and should be "nvme_core.default_ps_max_latency_us=0".

In any case, the parameter should not be necessary if the drive functions correctly, so I've raised the issue directly with the device's product group. 

I'll also ping the nvme mailing list to see about applying the patch to get the driver to gracefully unbind in this scenario, though the storage device would still be unusable in this scenario.

Note You need to log in before you can comment on or make changes to this bug.