Bug 73501 (mq_vblk_s3)
Summary: | blk-mq broke PM suspend in virtio-blk -- virtual machine hangs mid-suspend | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Laszlo Ersek (laszlo.ersek) |
Component: | Block Layer | Assignee: | Jens Axboe (axboe) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | hch |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 3.13 and onward | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Reset percpu counters for CPU_DEAD_FROZEN |
Description
Laszlo Ersek
2014-04-04 00:44:24 UTC
Thanks for the detailed bug report, and for taking the time to bisect this. I'll try to reproduce this and come up with a fix, as I don't immediately see what could be causing it. I do have a few theories about the CPU hot unplug being problematic. I'll come back with more info as soon as I have it.

Created attachment 131451 [details]
Reset percpu counters for CPU_DEAD_FROZEN
Please try this patch. You won't believe how much of this day it took me to track this down... Basically it's a bug in the percpu_counter library. If the CPU isn't online anymore, then we need to ensure that the per-cpu part is cleared out and added to the general counter. Without responding to CPU_DEAD_FROZEN, we'll leave it in a state where the CPU isn't in the online mask anymore, but we haven't moved its private state to the percpu summed count yet. I hope this fixes it for you.

Ah, this is beautiful. Quite the archeological find!

The _FROZEN variants of the CPU hotplug events (== "occurring while tasks are frozen due to a suspend operation in progress") were introduced in

commit 8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date: Wed May 9 02:35:10 2007 -0700

    Add suspend-related notifications for CPU hotplug

Later, function percpu_counter_hotcpu_callback() was introduced in

commit c67ad917cbf21b2862e2cf8e8b28339872ef7927
Author: Andrew Morton <akpm@linux-foundation.org>
Date: Sun Jul 15 23:39:51 2007 -0700

    percpu_counters(): use cpu notifiers

The first commit (included in v2.6.22-rc1) did locate all references to CPU_DEAD, and extended them to CPU_DEAD_FROZEN as well. It updated "Documentation/cpu-hotplug.txt" too.

Alas, the second commit (included in v2.6.23-rc1) introduced percpu_counter_hotcpu_callback() heeding only CPU_DEAD.

Similarly to the first commit cited, your patch replaces

    !(action == CPU_DEAD)

with

    !(action == CPU_DEAD || action == CPU_DEAD_FROZEN)

(In reply to Jens Axboe from comment #2)
> Created attachment 131451 [details]
> Reset percpu counters for CPU_DEAD_FROZEN

Reproduced problem with upstream tree at 3c83e61e67256e0bb08c46cc2db43b58fd617251. Applied proposed fix on top, and retested. The patch works:

[ 88.182727] PM: Syncing filesystems ... done.
[ 88.243650] PM: Preparing system for mem sleep
[ 88.643711] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 88.647078] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 88.649939] PM: Entering mem sleep
[ 88.897396] PM: suspend of devices complete after 245.633 msecs
[ 88.898618] PM: suspend devices took 0.247 seconds
[ 88.903673] PM: late suspend of devices complete after 4.050 msecs
[ 88.908523] PM: noirq suspend of devices complete after 3.714 msecs
[ 88.909743] ACPI: Preparing to enter system sleep state S3
[ 88.911351] PM: Saving platform NVS memory
[ 88.916503] Disabling non-boot CPUs ...
[ 88.918188] Unregister pv shared memory for cpu 1
[ 88.926024] smpboot: CPU 1 is now offline
[ 88.932892] Unregister pv shared memory for cpu 2
[ 88.940975] smpboot: CPU 2 is now offline
[ 88.944127] Unregister pv shared memory for cpu 3
[ 88.946962] Broke affinity for irq 1
[ 88.947554] Broke affinity for irq 10
[ 88.947554] Broke affinity for irq 12
[ 88.947554] Broke affinity for irq 15
[ 88.950919] smpboot: CPU 3 is now offline

Virt-manager displayed "Suspended". I was able to resume the VM as well. Suspending and resuming the VM several times in sequence works too.

Tested-by: Laszlo Ersek <lersek@redhat.com>

Greatly appreciated!
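
For readers who want to see the failure mode described in the patch comment in isolation, here is a minimal user-space sketch. It is not the kernel's lib/percpu_counter.c and not the attached patch: the `model_*` names, the `handle_frozen` switch, and the fixed `NR_CPUS` of 4 are all hypothetical, and the numeric action values are only illustrative. It assumes, as the comment explains, that the summed value is computed over online CPUs only, so a per-CPU delta that is never folded into the global count disappears from the sum once its CPU leaves the online mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4 /* hypothetical fixed CPU count for this model */

/* Illustrative action codes: the _FROZEN variant is the plain action
 * with a "tasks frozen" flag OR-ed in. Values are assumptions. */
enum {
    CPU_DEAD         = 0x0007,
    CPU_TASKS_FROZEN = 0x0010,
    CPU_DEAD_FROZEN  = CPU_DEAD | CPU_TASKS_FROZEN,
};

/* Toy stand-in for a per-CPU counter: one global part plus per-CPU deltas. */
struct model_counter {
    int64_t count;           /* global ("general") counter */
    int32_t percpu[NR_CPUS]; /* per-CPU private deltas */
    bool    online[NR_CPUS]; /* stand-in for the online CPU mask */
};

/* Sum the global part and the deltas of online CPUs only. */
int64_t model_sum(const struct model_counter *c)
{
    int64_t ret = c->count;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (c->online[cpu])
            ret += c->percpu[cpu];
    return ret;
}

/* Hotplug callback: fold the dead CPU's delta into the global count.
 * handle_frozen == false models the pre-fix check (CPU_DEAD only). */
void model_hotcpu_callback(struct model_counter *c, int action, int cpu,
                           bool handle_frozen)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    bool dead = action == CPU_DEAD ||
                (handle_frozen && action == CPU_DEAD_FROZEN);
    if (!dead)
        return; /* event ignored: the delta stays stranded in percpu[cpu] */
    c->count += c->percpu[cpu];
    c->percpu[cpu] = 0;
}

/* Take a CPU out of the online mask, then deliver the notification. */
void model_cpu_offline(struct model_counter *c, int cpu, int action,
                       bool handle_frozen)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    c->online[cpu] = false;
    model_hotcpu_callback(c, action, cpu, handle_frozen);
}
```

With a global count of 10 and per-CPU deltas of 1, 2, 3 and 4, the sum starts at 20. Offlining CPU 3 with CPU_DEAD_FROZEN while `handle_frozen` is false leaves the sum at 16, because the delta of 4 is stranded; with `handle_frozen` set to true the delta is folded into the global count and the sum stays at 20.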