Bug 73501 (mq_vblk_s3) - blk-mq broke PM suspend in virtio-blk -- virtual machine hangs mid-suspend
Status: RESOLVED CODE_FIX
Alias: mq_vblk_s3
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-04 00:44 UTC by Laszlo Ersek
Modified: 2014-04-09 13:52 UTC

See Also:
Kernel Version: 3.13 and onward
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Reset percpu counters for CPU_DEAD_FROZEN (430 bytes, patch)
2014-04-04 20:42 UTC, Jens Axboe

Description Laszlo Ersek 2014-04-04 00:44:24 UTC
** Disclaimer:

I'm not sure whether I should report this against blk-mq or virtio-blk, given that the function in question (blk_mq_stop_hw_queues()) is not called anywhere in the entire tree except from virtio-blk:

virtblk_freeze() [drivers/block/virtio_blk.c]
  blk_mq_stop_hw_queues() [block/blk-mq.c]

** With this disclaimer out of the way, here's the problem:

When Fedora 19 upgraded its kernel to 3.13, my qemu-kvm virtual machine using the 3.13 guest kernel ceased to be suspendable (as in, S3).

I filed a detailed problem report in this public RHBZ, with symptoms and reproducer steps:

  https://bugzilla.redhat.com/show_bug.cgi?id=1074235

The problem only manifests if the virtual machine:
- uses more than 1 VCPU
- uses at least one virtio-blk device.

Bisection of the *upstream* stable kernel fingered commit

  commit 1cf7e9c68fe84248174e998922b39e508375e7c1
  Author: Jens Axboe <axboe@kernel.dk>
  Date:   Fri Nov 1 10:52:52 2013 -0600

      virtio_blk: blk-mq support

The issue persists until at least 3.15.0-0.rc0.git8.1.fc21 <http://koji.fedoraproject.org/koji/buildinfo?buildID=508873>, which seems to correspond to Linux v3.14-7247-gcd6362befe4c:

  commit cd6362befe4cc7bf589a5236d2a780af2d47bcc9
  Merge: 0f1b1e6 b1586f0
  Author: Linus Torvalds <torvalds@linux-foundation.org>
  Date:   Wed Apr 2 20:53:45 2014 -0700

      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

I decided to report the bug here as well (i.e. in the kernel bugzilla) *only* after I realized that nothing but virtio-blk combines S3 (i.e. virtblk_freeze()) with blk-mq (i.e. blk_mq_stop_hw_queues()).

This could mean that the conversion of virtio-blk to blk-mq (in commit 1cf7e9c6), esp. the conversion of virtblk_freeze(), was incorrect; *or* it could mean that the blk_mq_stop_hw_queues() function (added in commit 280d45f6) was.

Thank you.
Comment 1 Jens Axboe 2014-04-04 02:12:27 UTC
Thanks for the detailed bug report, and taking the time to bisect this. I'll try and reproduce this and come up with a fix for it, as I don't immediately see what could be causing this. I do have a few theories on the cpu hot unplug being problematic. I'll come back with more info as soon as I have it.
Comment 2 Jens Axboe 2014-04-04 20:42:59 UTC
Created attachment 131451
Reset percpu counters for CPU_DEAD_FROZEN
Comment 3 Jens Axboe 2014-04-04 20:44:55 UTC
Please try this patch. You won't believe how much of this day it took me to track this down... Basically, it's a bug in the percpu_counter library. If the CPU isn't online anymore, then we need to ensure that the per-cpu part is cleared out and added to the general counter. Without responding to CPU_DEAD_FROZEN, we leave it in a state where the CPU isn't in the online mask anymore, but we haven't yet moved its private state to the percpu summed count.

I hope this fixes it for you.
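
To illustrate the failure mode described above, here is a minimal userspace sketch, not the kernel code. The names model_counter, model_add(), model_sum() and model_cpu_dead() and the constant MODEL_NR_CPUS are hypothetical and exist only for this illustration. The model assumes that a sum walks only the CPUs still in the online mask, so a CPU that goes offline without its private delta being folded into the shared count simply drops out of the total.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for this model */

/* Hypothetical userspace stand-in; NOT the kernel's struct percpu_counter. */
struct model_counter {
    long count;                    /* shared ("general") part */
    long percpu[MODEL_NR_CPUS];    /* per-CPU private deltas */
    bool online[MODEL_NR_CPUS];    /* stand-in for the online mask */
};

/* Start with a zero counter and all CPUs online. */
static void model_init(struct model_counter *c)
{
    c->count = 0;
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        c->percpu[cpu] = 0;
        c->online[cpu] = true;
    }
}

/* An update on behalf of a CPU touches only that CPU's private delta. */
static void model_add(struct model_counter *c, int cpu, long amount)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    c->percpu[cpu] += amount;
}

/* Sum the shared part plus the deltas of the CPUs that are still online. */
static long model_sum(const struct model_counter *c)
{
    long sum = c->count;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (c->online[cpu])
            sum += c->percpu[cpu];
    return sum;
}

/*
 * Take a CPU out of the online mask. With fold == true the private delta
 * is moved into the shared count (what a handled "dead" event would do);
 * with fold == false the delta is left stranded (the unhandled event).
 */
static void model_cpu_dead(struct model_counter *c, int cpu, bool fold)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    c->online[cpu] = false;
    if (fold) {
        c->count += c->percpu[cpu];
        c->percpu[cpu] = 0;
    }
}
```

In this model, adding 1 on CPU 0 and 2 on CPU 1 gives a sum of 3. Taking CPU 1 offline without folding makes the sum read 1, while folding keeps it at 3. Any code that waits for such a sum to reach a particular value could then wait indefinitely, which would be consistent with a hang during suspend.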
Comment 4 Laszlo Ersek 2014-04-04 22:21:18 UTC
Ah, this is beautiful. Quite the archeological find!

The _FROZEN variants of the CPU hotplug events (== "occurring while tasks are
frozen due to a suspend operation in progress") were introduced in

  commit 8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
  Author: Rafael J. Wysocki <rjw@sisk.pl>
  Date:   Wed May 9 02:35:10 2007 -0700

      Add suspend-related notifications for CPU hotplug

Later, function percpu_counter_hotcpu_callback() was introduced in

  commit c67ad917cbf21b2862e2cf8e8b28339872ef7927
  Author: Andrew Morton <akpm@linux-foundation.org>
  Date:   Sun Jul 15 23:39:51 2007 -0700

      percpu_counters(): use cpu notifiers

The first commit (included in v2.6.22-rc1) did locate all references to
CPU_DEAD, and extended them to CPU_DEAD_FROZEN as well. It updated
"Documentation/cpu-hotplug.txt" too.

Alas, the second commit (included in v2.6.23-rc1) introduced
percpu_counter_hotcpu_callback() heeding only CPU_DEAD.

Similarly to the first commit cited, your patch replaces

  !(action == CPU_DEAD)

with

  !(action == CPU_DEAD || action == CPU_DEAD_FROZEN)
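
The difference between the two conditions can be sketched as a pair of standalone predicates. This is only an illustration: the helper names handles_event_old() and handles_event_new() are hypothetical, and the numeric values below are assumptions meant to mirror how include/linux/cpu.h of that era defined the events (the _FROZEN variant being the base event OR-ed with a "tasks frozen" flag). Check them against the tree before relying on them.

```c
#include <stdbool.h>

/* Assumed values, modelled on include/linux/cpu.h of that era. */
#define CPU_DEAD          0x0007
#define CPU_TASKS_FROZEN  0x0010
#define CPU_DEAD_FROZEN   (CPU_DEAD | CPU_TASKS_FROZEN)

/* Hypothetical helper: the check before the patch, CPU_DEAD only. */
static bool handles_event_old(unsigned long action)
{
    return action == CPU_DEAD;
}

/* Hypothetical helper: the check after the patch, both variants. */
static bool handles_event_new(unsigned long action)
{
    return action == CPU_DEAD || action == CPU_DEAD_FROZEN;
}
```

Because the frozen variant is a distinct value, a plain equality test against CPU_DEAD never matches it. During suspend the callback therefore returned early and skipped the fold.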

(In reply to Jens Axboe from comment #2)
> Created attachment 131451
> Reset percpu counters for CPU_DEAD_FROZEN

Reproduced problem with upstream tree at
3c83e61e67256e0bb08c46cc2db43b58fd617251.

Applied proposed fix on top, and retested. The patch works:

[   88.182727] PM: Syncing filesystems ... done.
[   88.243650] PM: Preparing system for mem sleep
[   88.643711] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   88.647078] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   88.649939] PM: Entering mem sleep
[   88.897396] PM: suspend of devices complete after 245.633 msecs
[   88.898618] PM: suspend devices took 0.247 seconds
[   88.903673] PM: late suspend of devices complete after 4.050 msecs
[   88.908523] PM: noirq suspend of devices complete after 3.714 msecs
[   88.909743] ACPI: Preparing to enter system sleep state S3
[   88.911351] PM: Saving platform NVS memory
[   88.916503] Disabling non-boot CPUs ...
[   88.918188] Unregister pv shared memory for cpu 1
[   88.926024] smpboot: CPU 1 is now offline
[   88.932892] Unregister pv shared memory for cpu 2
[   88.940975] smpboot: CPU 2 is now offline
[   88.944127] Unregister pv shared memory for cpu 3
[   88.946962] Broke affinity for irq 1
[   88.947554] Broke affinity for irq 10
[   88.947554] Broke affinity for irq 12
[   88.947554] Broke affinity for irq 15
[   88.950919] smpboot: CPU 3 is now offline

Virt-manager displayed "Suspended". I was able to resume the VM as well.

Suspending and resuming the VM several times in sequence works too.

Tested-by: Laszlo Ersek <lersek@redhat.com>

Greatly appreciated!
