** Disclaimer: I'm not sure if I should report this against blk-mq or virtio-blk, given that the function in question (blk_mq_stop_hw_queues()) is not called anywhere in the entire tree except from virtio-blk:

  virtblk_freeze()          [drivers/block/virtio_blk.c]
    blk_mq_stop_hw_queues() [block/blk-mq.c]
**

With this disclaimer out of the way, here's the problem:

When Fedora 19 upgraded its kernel to 3.13, my qemu-kvm virtual machine using the 3.13 guest kernel ceased to be suspendable (as in, S3). I filed a detailed problem report, with symptoms and reproducer steps, in this public RHBZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1074235

The problem only manifests if the virtual machine:
- uses more than 1 VCPU
- uses at least one virtio-blk device.

Bisection of the *upstream* stable kernel fingered commit

commit 1cf7e9c68fe84248174e998922b39e508375e7c1
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Nov 1 10:52:52 2013 -0600

    virtio_blk: blk-mq support

The issue persists at least up to 3.15.0-0.rc0.git8.1.fc21 <http://koji.fedoraproject.org/koji/buildinfo?buildID=508873>, which seems to correspond to Linux v3.14-7247-gcd6362befe4c:

commit cd6362befe4cc7bf589a5236d2a780af2d47bcc9
Merge: 0f1b1e6 b1586f0
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Apr 2 20:53:45 2014 -0700

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

I decided to report the bug also here (i.e., in the kernel bugzilla) *only* after I realized that nothing but virtio-blk combines S3 (i.e., virtblk_freeze()) with blk-mq (i.e., blk_mq_stop_hw_queues()). This could mean that the conversion of virtio-blk to blk-mq (in commit 1cf7e9c6), esp. the conversion of virtblk_freeze(), was incorrect; *or* it could mean that the blk_mq_stop_hw_queues() function (added in commit 280d45f6) was.

Thank you.
Thanks for the detailed bug report, and taking the time to bisect this. I'll try and reproduce this and come up with a fix for it, as I don't immediately see what could be causing this. I do have a few theories on the cpu hot unplug being problematic. I'll come back with more info as soon as I have it.
Created attachment 131451
Reset percpu counters for CPU_DEAD_FROZEN
Please try this patch. You won't believe how much of this day it took me to track this down... Basically it's a bug in the percpu_counter library. If a CPU isn't online anymore, then we need to ensure that its per-cpu part is cleared out and added to the general counter. Without responding to CPU_DEAD_FROZEN, we'll leave the counter in a state where the CPU isn't in the online mask anymore, but we haven't moved its private state into the summed count yet. I hope this fixes it for you.
Ah, this is beautiful. Quite the archeological find!

The _FROZEN variants of the CPU hotplug events (== "occurring while tasks are frozen due to a suspend operation in progress") were introduced in

commit 8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Wed May 9 02:35:10 2007 -0700

    Add suspend-related notifications for CPU hotplug

Later, the function percpu_counter_hotcpu_callback() was introduced in

commit c67ad917cbf21b2862e2cf8e8b28339872ef7927
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Sun Jul 15 23:39:51 2007 -0700

    percpu_counters(): use cpu notifiers

The first commit (included in v2.6.22-rc1) did locate all references to CPU_DEAD, and extended them to CPU_DEAD_FROZEN as well. It updated "Documentation/cpu-hotplug.txt" too. Alas, the second commit (included in v2.6.23-rc1) introduced percpu_counter_hotcpu_callback() heeding only CPU_DEAD.

Similarly to the first commit cited, your patch replaces

  !(action == CPU_DEAD)

with

  !(action == CPU_DEAD || action == CPU_DEAD_FROZEN)

(In reply to Jens Axboe from comment #2)
> Created attachment 131451
> Reset percpu counters for CPU_DEAD_FROZEN

I reproduced the problem with the upstream tree at 3c83e61e67256e0bb08c46cc2db43b58fd617251, applied the proposed fix on top, and retested. The patch works:

[ 88.182727] PM: Syncing filesystems ... done.
[ 88.243650] PM: Preparing system for mem sleep
[ 88.643711] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 88.647078] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 88.649939] PM: Entering mem sleep
[ 88.897396] PM: suspend of devices complete after 245.633 msecs
[ 88.898618] PM: suspend devices took 0.247 seconds
[ 88.903673] PM: late suspend of devices complete after 4.050 msecs
[ 88.908523] PM: noirq suspend of devices complete after 3.714 msecs
[ 88.909743] ACPI: Preparing to enter system sleep state S3
[ 88.911351] PM: Saving platform NVS memory
[ 88.916503] Disabling non-boot CPUs ...
[ 88.918188] Unregister pv shared memory for cpu 1
[ 88.926024] smpboot: CPU 1 is now offline
[ 88.932892] Unregister pv shared memory for cpu 2
[ 88.940975] smpboot: CPU 2 is now offline
[ 88.944127] Unregister pv shared memory for cpu 3
[ 88.946962] Broke affinity for irq 1
[ 88.947554] Broke affinity for irq 10
[ 88.947554] Broke affinity for irq 12
[ 88.947554] Broke affinity for irq 15
[ 88.950919] smpboot: CPU 3 is now offline

Virt-manager displayed "Suspended". I was able to resume the VM as well. Suspending and resuming the VM several times in sequence works too.

Tested-by: Laszlo Ersek <lersek@redhat.com>

Greatly appreciated!
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e39435ce68bb4685288f78b1a7e24311f7ef939f