[ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more than 120 seconds. [ 7496.552987] Tainted: G I 4.14.0-0.rc1.git3.1.fc28.x86_64 #1 [ 7496.552996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7496.553006] qemu-system-x86 D12240 5978 1 0x00000004 [ 7496.553024] Call Trace: [ 7496.553044] __schedule+0x2dc/0xbb0 [ 7496.553055] ? trace_hardirqs_on+0xd/0x10 [ 7496.553074] schedule+0x3d/0x90 [ 7496.553087] vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net] [ 7496.553100] ? finish_wait+0x90/0x90 [ 7496.553115] vhost_net_ioctl+0x542/0x910 [vhost_net] [ 7496.553144] do_vfs_ioctl+0xa6/0x6c0 [ 7496.553166] SyS_ioctl+0x79/0x90 [ 7496.553182] entry_SYSCALL_64_fastpath+0x1f/0xbe [ 7496.553190] RIP: 0033:0x7fa1ea0e1817 [ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007fa1ea0e1817 [ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI: 0000000000000021 [ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09: 000055e330245d92 [ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12: 000055e33351a000 [ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15: 0000000000000000 [ 7496.553284] Showing all locks held in the system: [ 7496.553313] 1 lock held by khungtaskd/161: [ 7496.553319] #0: (tasklist_lock){.+.+}, at: [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0 [ 7496.553373] 1 lock held by in:imklog/1194: [ 7496.553379] #0: (&f->f_pos_lock){+.+.}, at: [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60 [ 7496.553541] 1 lock held by qemu-system-x86/5978: [ 7496.553547] #0: (&dev->mutex#3){+.+.}, at: [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net] I'm currently bisecting to figure out which commit breaks this but for some reasons, when hitting this commit: # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63 the host will not allow SSHd to establish a new session and when starting a KVM guest, the host will hard lock. I'm still bisecting but I marked that commit as bad even though perhaps it would be good.
The same happens with v4.15.0-rc*
I'm hitting the exact same issue on multiple hosts - shouldn't this be a regression?
Take a look here: https://www.mail-archive.com/netdev@vger.kernel.org/msg204727.html
Great that's a nice thing I didn't find prior today ... With 4.15.0-0.rc2.git2.1 it's even worse: [jenkins@zappa linux-stable-new]$ sudo virsh list Id Name State ---------------------------------------------------- 2 undercloud-0-rhosp9 in shutdown 17 ceph-0-rhosp9 in shutdown 18 control-2-rhosp9 in shutdown 19 control-1-rhosp9 in shutdown 20 control-0-rhosp9 in shutdown
I posted that to the KVM mailing list yesterday: It looks like the first bad commit would be the following: [jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g 3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit commit 3ece782693c4b64d588dd217868558ab9a19bfe7 Author: Willem de Bruijn <willemb@google.com> Date: Thu Aug 3 16:29:38 2017 -0400 sock: skb_copy_ubufs support for compound pages Refine skb_copy_ubufs to support compound pages. With upcoming TCP zerocopy sendmsg, such fragments may appear. The existing code replaces each page one for one. Splitting each compound page into an independent number of regular pages can result in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned. Instead, fill all destination pages but the last to PAGE_SIZE. Split the existing alloc + copy loop into separate stages: 1. compute bytelength and minimum number of pages to store this. 2. allocate 3. copy, filling each page except the last to PAGE_SIZE bytes 4. update skb frag array Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> :040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M include :040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb 4fc8384362693e4619fab39b0a945f6f2349226b M net Here is the bisect log: [root@zappa linux-stable-new]# git bisect log git bisect start # bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1 git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e # good: [e87c13993f16549e77abce9744af844c55154349] Linux 4.13.16 git bisect good e87c13993f16549e77abce9744af844c55154349 # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13 git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261 # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13 git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261 # bad: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next git bisect bad aae3dbb4776e7916b6cd442d00159bea27a695c1 # good: [bf1d6b2c76eda86159519bf5c427b1fa8f51f733] Merge tag 'staging-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging git bisect good bf1d6b2c76eda86159519bf5c427b1fa8f51f733 # bad: [e833251ad813168253fef9915aaf6a8c883337b0] rxrpc: Add notification of end-of-Tx phase git bisect bad e833251ad813168253fef9915aaf6a8c883337b0 # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63 # good: [cf6c6ea352faadb15d1373d890bf857080b218a4] iwlwifi: mvm: fix the FIFO numbers in A000 devices git bisect good cf6c6ea352faadb15d1373d890bf857080b218a4 # good: [65205cc465e9b37abbdbb3d595c46081b97e35bc] sctp: remove the typedef sctp_addiphdr_t git bisect good 65205cc465e9b37abbdbb3d595c46081b97e35bc # bad: [ecbd87b8430419199cc9dd91598d5552a180f558] phylink: add support for MII ioctl access to Clause 45 PHYs git bisect bad ecbd87b8430419199cc9dd91598d5552a180f558 # bad: [52267790ef52d7513879238ca9fac22c1733e0e3] sock: add MSG_ZEROCOPY git bisect bad 52267790ef52d7513879238ca9fac22c1733e0e3 # good: [04b1d4e50e82536c12da00ee04a77510c459c844] net: core: Make the FIB notification chain generic git bisect good 04b1d4e50e82536c12da00ee04a77510c459c844 # good: [9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c] ipv6: Regenerate host route according to node pointer upon loopback up git bisect good 9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c # good: [0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549] mlxsw: spectrum_router: Add support for route replace git bisect good 0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549 # good: [84b7187ca2338832e3af58eb5123c02bb6921e4e] Merge branch 'mlxsw-Support-for-IPv6-UC-router' git bisect good 84b7187ca2338832e3af58eb5123c02bb6921e4e # bad: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs support for compound pages git bisect bad 3ece782693c4b64d588dd217868558ab9a19bfe7 # good: [98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f] sock: allocate skbs from optmem git bisect good 98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f # first bad commit: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs support for compound pages
Does reverting only that patch resolve the issue? The logic in it is quite complex, but it is only needed for zerocopy with MSG_ZEROCOPY. And then only in edge cases. This is likely not used here. The code is blocking on vhost_net use of zerocopy. Which does not build skbuffs with zerocopy data in compound pages. The patch adds checks against shared and cloned skbs that were not present before. It is not safe to modify skb frags[] on on either, but perhaps this changed return path causes buffers to not be released, causing vhost_net_ubuf_put_and_wait to wait seemingly indefinitely. + if (skb_shared(skb) || skb_unclone(skb, gfp_mask)) + return -EINVAL;
If that fixes it, the output with this patch would be informative. diff --git a/net/core/skbuff.c b/net/core/skbuff.c index a95877a8ac8b..7c4ba11cb429 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -941,8 +941,14 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) if (!num_frags) return 0; - if (skb_shared(skb) || skb_unclone(skb, gfp_mask)) + if (skb_shared(skb)) { + WARN_ONCE(1, "shared skb"); return -EINVAL; + } + if (skb_unclone(skb, gfp_mask)) { + WARN_ONCE(1, "cloned skb"); + return -EINVAL; + }
If I revert the patch 3ece782693c4b64d588dd217868558ab9a19bfe7, I can't apply the patch you mentionned unless you want me to apply it on the unreverted branch ?
Yes, I mean that if that revert fixes it, then this patch is the cause. Then it would be helpful to see whether it is due to those more stringent precondition checks and, if so, which codepath it is that leads to this function with an skb that does not match those checks. Existing code should pass them too, even if the checks where not there in practice before. Sorry for the confusion.
So please apply it on the unreverted branch.
I've applied the suggested patch on the current branch but the patch didn't apply somehow ... I had to change it manually so I guess it's the tabs that are used for indentation and the spaces I copy/pasted from this BZ. I couldn't revert the commit using "git revert" so my guess here is that I'd have to go back to before this commit, compile and retry to reproduce but we already know it works according to the bisection so I was wondering if we still had to try that ? I tried checking out skbuff.c from the working 4.13.0 branch but the kernel didn't compile because some functions were probably changed and the validation failed at the end.
> we already know it works according to the bisection > so I was wondering if we still had to try that ? It is not strictly necessary, but it would increase confidence that we found the culprit. I don't know how easy or hard it is to reproduce for you, so skip if this it too much work. But in case you can reproduce it, I pushed v4.14 with the (indeed non-trivial) revert to https://github.com/wdebruij/linux/commits/v4.14-revert-copyubufs If only time for either this or the WARN_ONCE instrumentation, then that variant is more helpful. Thanks.
It's really easy to reproduce but I hit issues at various stages of the CI I'm running when I hit this issue. Sometimes the VMs is stuck in shutting down, sometimes one of the VM won't get an IP address from the DHCP server it has to contact, sometimes the server will simply kernel panic. I've applied that patch to my current v4.14.5 branch and am compiling it now and as soon as it's done, I'll reboot and retry reproducing this again.
So I can still reproduce this issue with the patch reverted... Now I'm puzzled.
It is quite likely that this patch --or patchset-- is involved, as the hang happens in cleaning up zerocopy skb state and this patchset extends zerocopy. Because of conflicts I had to resolve this patch is not a pure revert, but the changes should be limited to replacing uarg->callback(uarg, false); skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY; with an skb_zcopy_clear(skb, false) that boils down to the same. One option would be to test while reverting the entire patchset. I may be able to do that, no idea how many conflicts I'd have to resolve. Perhaps first, could you try the other suggestion with the warnings. If those fire, then we know for sure that this is implicated.
I tried the WARN_ONCE on the v4.14.5 branch and I didn't find those output in the logs. Does this make sense?
Thanks for trying. You did encounter the hangs? Then it's not due to having added those checks and failing skb_copy_ubufs as of this patch. It's increasingly likely that it is another patch in the patchset, then.
(In reply to David Hill from comment #16) > I tried the WARN_ONCE on the v4.14.5 branch and I didn't find those output > in the logs. Does this make sense? Could you please try to revert patchset "Remove UDP Fragmentation Offload support" as described by Andreas here: https://www.mail-archive.com/netdev@vger.kernel.org/msg204727.html ? He added the complete revert patch for 4.14.4
Andreas just mentioned in that thread that the full revert resolves all issues for him, including this hang in vhost_net_ubuf_put_and_wait
So will it get fixed or will we have to rollback those commits ?
I meant to give a +1 to Klaus's request to test that that solves your regression, too. We will have to dig deeper and fix the actual remaining issue, instead of reverting the full patchset. That revert solved a lot of bugs in the UFO stack, so an wholesale revert is not a solution.
The following fix has been merged skbuff: orphan frags before zerocopy clone 268b790679422a89e9ab0685d9f291edae780c98 v4.15-rc5~10^2^2~1 http://patchwork.ozlabs.org/patch/851715/ That resolved another report of what is most likely the same issue. Tested together with this follow-up that is currently awaiting review to net-next skbuff: in skb_segment, call zerocopy functions once per nskb http://patchwork.ozlabs.org/patch/852598/
I'll try reproducing the problem with 4.15-rc5 and will keep you updated .
I can confirm that with 4.14.9 with the fix applied, I can no longer reproduce the problem. My VMs are shutting down nicely now. I can finally get off the EOL 4.13 series. Thank you Willem for your efforts to resolve this!
Glad to hear it! Thanks for updating the bug. Time to close it?
Maybe we should wait and see if David Hill can also confirm the problem goes away on his setup before closing it?
(In reply to Willem de Bruijn from comment #22) > The following fix has been merged > > skbuff: orphan frags before zerocopy clone > 268b790679422a89e9ab0685d9f291edae780c98 > v4.15-rc5~10^2^2~1 > http://patchwork.ozlabs.org/patch/851715/ > > That resolved another report of what is most likely > the same issue. Tested together with this follow-up > that is currently awaiting review to net-next > > skbuff: in skb_segment, call zerocopy functions once per nskb > http://patchwork.ozlabs.org/patch/852598/ The second patch is partially un-doing something in the first patch. Is this correct, and all that is necessary for the 4.14.9 kernel?
> The second patch is partially un-doing something in the first patch. It changes how often the calls are performed. They only depend on the state of the origin and destination skb. Calling them more often (once for each frag) serves no purpose. > Is this correct, I would not have sent it if I thought otherwise. > and all that is necessary for the 4.14.9 kernel? No, the second one is an optimization sent onto to net-next, which will be 4.16. I split them to make the actual bugfix that needs to be backported to stable as small and obvious as possible.
I just retried with 4.15-rc5 and it looks like I'm no longer able to reproduce this issue.
Patches for this issue were deployed to kernel 4.14.9 or they'll only land on 4.16? I have been affected by this issue ever since kernel 4.14 landed on archlinux. Sometimes it happens when rebooting a guest also. Another thing that's worth nothing is that this doesn't seem to happen when the guest was just started and there was not much network traffic yet. But, after you start using the guest, the qemu process remains in D state after shutting down the guest. And the only way to not (possibly) loose data on the host (due to a forced shutdown) is to use the Magic SysRq keys and either REISUB or REISUO combinations.
The patch is in 4.15-rc5 and the backport to 4.14 is queued: https://patchwork.kernel.org/patch/10138995/
Thanks for the info Willem. There is a workaround for this issue, which is to pass the following option to the vhost_net module: options vhost_net experimental_zcopytx=0 I have tested this and it works.
(In reply to Willem de Bruijn from comment #31) > The patch is in 4.15-rc5 and the backport to 4.14 is queued: > > https://patchwork.kernel.org/patch/10138995/ With that patch, we still sometimes see Ubuntu VM terminate but qemu process is left in "uninterruptible sleep". Related?
Hard to say without knowing what the process is sleeping on. If it still happens with Giancarlo's suggestion from #32 then we could exclude this bug.
LimeTech, the kernel should be telling you exactly why the task is in blocked state, unless you have disabled the hung_task_timeout_secs with: echo 0 > /proc/sys/kernel/hung_task_timeout_secs If you have done so, revert that, and you should be able to know why. If, on the call trace you see vhost_net, then I think it is the same issue. That workaround works just fine, I have been able to shutdown/reboot several guests, with different OS'es, across 2 hosts with it.
We have confirmed this issue seems be solved in 4.14.11.
Host: ubuntu 17.10, vanilla kernel 4.14.12, nested virtualization and vhost_net workaround aplied options kvm_intel nested=1 options vhost_net experimental_zcopytx=0 Problem: can always be reproduced on redhat/centos7.x, after about 8 hour of guest uptime, guest machine hangs How to reproduce: boot a centos/redhat7.x guest vm (a minimal installation should be ok), and wait about 8hours, the period may vary. You can give a tail command on syslog to see some detailed message (for example tail -f /var/log/messages) Guest kernel: 3.10.0-693.11.6.el7.x86_64 Syslog output: /var/log/messages Jan 10 12:56:03 kvm178 dbus[756]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' Jan 10 12:56:03 kvm178 dhclient[911]: bound to 192.168.122.178 -- renewal in 1257 seconds. Jan 10 12:56:28 kvm178 dbus[756]: [system] Failed to activate service 'org.freedesktop.nm_dispatcher': timed out Jan 10 12:56:28 kvm178 dbus-daemon: dbus[756]: [system] Failed to activate service 'org.freedesktop.nm_dispatcher': timed out Jan 10 12:58:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 12:58:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 12:58:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 12:58:40 kvm178 kernel: Call Trace: Jan 10 12:58:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 12:58:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 12:58:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 12:58:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 12:58:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 12:58:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 12:58:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:00:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:00:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:00:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:00:40 kvm178 kernel: Call Trace: Jan 10 13:00:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:00:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:00:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:00:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:00:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:00:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:00:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:01:26 kvm178 systemd-logind: Failed to start session scope session-23.scope: Connection timed out Jan 10 13:02:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:02:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:02:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:02:40 kvm178 kernel: Call Trace: Jan 10 13:02:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:02:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:02:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:02:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:02:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:02:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:02:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:04:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:04:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:04:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:04:40 kvm178 kernel: Call Trace: Jan 10 13:04:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:04:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:04:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:04:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:04:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:04:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:04:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:06:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:06:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:06:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:06:40 kvm178 kernel: Call Trace: Jan 10 13:06:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:06:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:06:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:06:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:06:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:06:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:06:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:08:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:08:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:08:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:08:40 kvm178 kernel: Call Trace: Jan 10 13:08:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:08:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:08:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:08:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:08:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:08:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:08:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:10:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:10:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:10:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:10:40 kvm178 kernel: Call Trace: Jan 10 13:10:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:10:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:10:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:10:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:10:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:10:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:10:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:12:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:12:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:12:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:12:40 kvm178 kernel: Call Trace: Jan 10 13:12:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:12:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:12:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:12:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:12:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:12:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:12:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:14:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:14:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:14:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:14:40 kvm178 kernel: Call Trace: Jan 10 13:14:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:14:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:14:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:14:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:14:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:14:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:14:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 Jan 10 13:16:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds. Jan 10 13:16:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 10 13:16:40 kvm178 kernel: systemd D ffff88004d1d8000 0 1 0 0x00000000 Jan 10 13:16:40 kvm178 kernel: Call Trace: Jan 10 13:16:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 10 13:16:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230 Jan 10 13:16:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30 Jan 10 13:16:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d Jan 10 13:16:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d Jan 10 13:16:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0 Jan 10 13:16:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30 .... guest died, guest cpu 100%, hard reset on guest needed. Guests with redhat/centos6.x (kernel 2.6.32-696.18.7.el6.x86_64) and windows10 doesn't have problems. Hope this could help.
@bubez 4.14.12 does not require the workaround anymore. Can you remove it and test again? Also, I think this issue has another patchset to improve it further on 4.16.
@Razzolini host kernel: vanilla 4.14.13 Tried again without the options vhost_net experimental_zcopytx=0 and without the dhcp server of libvirtd/dnsmasq and after 5 days, all redhat7.x machines Guest kernel: 3.10.0-693.11.6.el7.x86_64 were ko again. 5 days of guests uptime is better than 8 hours, but is not enough.
@bubez As far as I know this bug doesn't kill or freeze the guests nor the host. It just prevents the guests qemu processes from being ended, they remain on D state. Last week I had a guest running for more than 4 days. But since this is not a server, I did suspend the host a few times in between. I'll have more data into servers this week. Also, the log you sent is not the same for this bug, which is on the vhost_net module. I think you might be hitting another issue.
@bubez: I suspect that your issue is related to: https://marc.info/?l=linux-kernel&m=151602104019357&w=2 And is not related to this issue.