Bug 197861

Summary: Shutting down a VM with kernel 4.14 will sometimes hang, and a reboot is the only way to recover.
Product: Virtualization    Reporter: David Hill (hilld)
Component: kvm    Assignee: virtualization_kvm
Status: NEW
Severity: normal    CC: ao, dmoulding, fan4326, grazzolini, hilld, Ian.kumlien, kmueller, kyle, mark.mielke, michele.mase, tomm, willemb
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.14-rc8
Regression: No

Description David Hill 2017-11-13 15:35:57 UTC
[ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more than 120 seconds.
[ 7496.552987]       Tainted: G          I     4.14.0-0.rc1.git3.1.fc28.x86_64 #1
[ 7496.552996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
[ 7496.553024] Call Trace:
[ 7496.553044]  __schedule+0x2dc/0xbb0
[ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
[ 7496.553074]  schedule+0x3d/0x90
[ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
[ 7496.553100]  ? finish_wait+0x90/0x90
[ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
[ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
[ 7496.553166]  SyS_ioctl+0x79/0x90
[ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 7496.553190] RIP: 0033:0x7fa1ea0e1817
[ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007fa1ea0e1817
[ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI: 0000000000000021
[ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09: 000055e330245d92
[ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12: 000055e33351a000
[ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15: 0000000000000000
[ 7496.553284]
               Showing all locks held in the system:
[ 7496.553313] 1 lock held by khungtaskd/161:
[ 7496.553319]  #0:  (tasklist_lock){.+.+}, at: [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
[ 7496.553373] 1 lock held by in:imklog/1194:
[ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at: [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
[ 7496.553541] 1 lock held by qemu-system-x86/5978:
[ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at: [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]



I'm currently bisecting to figure out which commit breaks this, but for some reason, when hitting this commit:

# bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63

the host will not allow sshd to establish a new session, and when starting a KVM guest, the host will hard lock. I'm still bisecting, but I marked that commit as bad even though it may turn out to be good.
Comment 1 David Hill 2017-12-04 20:33:31 UTC
The same happens with v4.15.0-rc*
Comment 2 Ian Kumlien 2017-12-11 11:34:02 UTC
I'm hitting the exact same issue on multiple hosts. Shouldn't this be marked as a regression?
Comment 3 Klaus Mueller 2017-12-12 18:55:05 UTC
Take a look here: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg204727.html
Comment 4 David Hill 2017-12-12 19:15:15 UTC
Great, that's a useful thread I hadn't come across before today. With 4.15.0-0.rc2.git2.1 it's even worse:

[jenkins@zappa linux-stable-new]$ sudo virsh list
 Id    Name                           State
----------------------------------------------------
 2     undercloud-0-rhosp9            in shutdown
 17    ceph-0-rhosp9                  in shutdown
 18    control-2-rhosp9               in shutdown
 19    control-1-rhosp9               in shutdown
 20    control-0-rhosp9               in shutdown
Comment 5 David Hill 2017-12-12 19:18:12 UTC
I posted that to the KVM mailing list yesterday:

It looks like the first bad commit would be the following:

[jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g
3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
commit 3ece782693c4b64d588dd217868558ab9a19bfe7
Author: Willem de Bruijn <willemb@google.com>
Date:   Thu Aug 3 16:29:38 2017 -0400

    sock: skb_copy_ubufs support for compound pages

    Refine skb_copy_ubufs to support compound pages. With upcoming TCP
    zerocopy sendmsg, such fragments may appear.

    The existing code replaces each page one for one. Splitting each
    compound page into an independent number of regular pages can result
    in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.

    Instead, fill all destination pages but the last to PAGE_SIZE.
    Split the existing alloc + copy loop into separate stages:
    1. compute bytelength and minimum number of pages to store this.
    2. allocate
    3. copy, filling each page except the last to PAGE_SIZE bytes
    4. update skb frag array

    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M    include
:040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb 4fc8384362693e4619fab39b0a945f6f2349226b M    net

Here is the bisect log:

[root@zappa linux-stable-new]# git bisect log
git bisect start
# bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
# good: [e87c13993f16549e77abce9744af844c55154349] Linux 4.13.16
git bisect good e87c13993f16549e77abce9744af844c55154349
# good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
# good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
# bad: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad aae3dbb4776e7916b6cd442d00159bea27a695c1
# good: [bf1d6b2c76eda86159519bf5c427b1fa8f51f733] Merge tag 'staging-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good bf1d6b2c76eda86159519bf5c427b1fa8f51f733
# bad: [e833251ad813168253fef9915aaf6a8c883337b0] rxrpc: Add notification of end-of-Tx phase
git bisect bad e833251ad813168253fef9915aaf6a8c883337b0
# bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
# good: [cf6c6ea352faadb15d1373d890bf857080b218a4] iwlwifi: mvm: fix the FIFO numbers in A000 devices
git bisect good cf6c6ea352faadb15d1373d890bf857080b218a4
# good: [65205cc465e9b37abbdbb3d595c46081b97e35bc] sctp: remove the typedef sctp_addiphdr_t
git bisect good 65205cc465e9b37abbdbb3d595c46081b97e35bc
# bad: [ecbd87b8430419199cc9dd91598d5552a180f558] phylink: add support for MII ioctl access to Clause 45 PHYs
git bisect bad ecbd87b8430419199cc9dd91598d5552a180f558
# bad: [52267790ef52d7513879238ca9fac22c1733e0e3] sock: add MSG_ZEROCOPY
git bisect bad 52267790ef52d7513879238ca9fac22c1733e0e3
# good: [04b1d4e50e82536c12da00ee04a77510c459c844] net: core: Make the FIB notification chain generic
git bisect good 04b1d4e50e82536c12da00ee04a77510c459c844
# good: [9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c] ipv6: Regenerate host route according to node pointer upon loopback up
git bisect good 9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c
# good: [0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549] mlxsw: spectrum_router: Add support for route replace
git bisect good 0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549
# good: [84b7187ca2338832e3af58eb5123c02bb6921e4e] Merge branch 'mlxsw-Support-for-IPv6-UC-router'
git bisect good 84b7187ca2338832e3af58eb5123c02bb6921e4e
# bad: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs support for compound pages
git bisect bad 3ece782693c4b64d588dd217868558ab9a19bfe7
# good: [98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f] sock: allocate skbs from optmem
git bisect good 98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f
# first bad commit: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs support for compound pages
Comment 6 Willem de Bruijn 2017-12-12 22:13:30 UTC
Does reverting only that patch resolve the issue?

The logic in it is quite complex, but it is only needed for zerocopy with MSG_ZEROCOPY, and then only in edge cases.

That is likely not in use here. The code is blocking on vhost_net's use of zerocopy, which does not build skbuffs with zerocopy data in compound pages.

The patch adds checks against shared and cloned skbs that were not present before. It is not safe to modify skb frags[] on either, but perhaps this changed return path causes buffers to not be released, causing vhost_net_ubuf_put_and_wait to wait indefinitely.

+       if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
+               return -EINVAL;
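The hang mechanism described above can be illustrated with a toy userspace model (all names here are illustrative, not actual kernel code or API): vhost_net_ubuf_put_and_wait blocks until every in-flight zerocopy buffer has dropped its reference, so an error path that skips the release leaves the waiter blocked forever.

```c
#include <stdbool.h>

/* Toy model of the vhost_net zerocopy buffer refcount. All names are
 * illustrative; this is not kernel code. */
struct ubuf_ref { int refcount; };           /* starts at 1 (init reference) */

static void ubuf_get(struct ubuf_ref *u) { u->refcount++; }  /* skb submitted */
static void ubuf_put(struct ubuf_ref *u) { u->refcount--; }  /* completion ran */

/* Models vhost_net_ubuf_put_and_wait(): drop the init reference, then
 * wait for the count to reach zero. Returns whether the wait would ever
 * complete; false corresponds to the observed D-state hang. */
static bool put_and_wait_would_complete(struct ubuf_ref *u)
{
    ubuf_put(u);
    return u->refcount == 0;
}
```

In this model, an early `return -EINVAL` in the copy path that never runs the completion callback is exactly the "leaked reference" case: the count stays above zero and the waiter never wakes.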
Comment 7 Willem de Bruijn 2017-12-12 22:25:48 UTC
If that fixes it, the output with this patch would be informative.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a95877a8ac8b..7c4ba11cb429 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -941,8 +941,14 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
        if (!num_frags)
                return 0;
 
-       if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
+       if (skb_shared(skb)) {
+               WARN_ONCE(1, "shared skb");
                return -EINVAL;
+       }
+       if (skb_unclone(skb, gfp_mask)) {
+               WARN_ONCE(1, "cloned skb");
+               return -EINVAL;
+       }
Comment 8 David Hill 2017-12-13 03:48:02 UTC
If I revert commit 3ece782693c4b64d588dd217868558ab9a19bfe7, I can't apply the patch you mentioned. Unless you want me to apply it on the unreverted branch?
Comment 9 Willem de Bruijn 2017-12-13 04:21:29 UTC
Yes, I mean that if that revert fixes it, then this patch is the cause. Then it would be helpful to see whether it is due to those more stringent precondition checks and, if so, which codepath leads to this function with an skb that fails those checks. Existing code should pass them too, even though the checks were not there before. Sorry for the confusion.
Comment 10 Willem de Bruijn 2017-12-13 04:22:23 UTC
So please apply it on the unreverted branch.
Comment 11 David Hill 2017-12-13 18:05:22 UTC
I've applied the suggested patch on the current branch but the patch didn't apply somehow ... I had to change it manually so I guess it's the tabs that are used for indentation and the spaces I copy/pasted from this BZ.   I couldn't revert the commit using "git revert" so my guess here is that I'd have to go back to before this commit, compile and retry to reproduce but we already know it works according to the bisection so I was wondering if we still had to try that ?  I tried checking out skbuff.c from the working 4.13.0 branch but the kernel didn't compile because some functions were probably changed and the validation failed at the end.
Comment 12 Willem de Bruijn 2017-12-13 19:17:12 UTC
> we already know it works according to the bisection
> so I was wondering if we still had to try that ?

It is not strictly necessary, but it would increase confidence
that we found the culprit. I don't know how easy or hard it is
to reproduce for you, so skip it if it is too much work.

But in case you can reproduce it, I pushed v4.14 with the
(indeed non-trivial) revert to

  https://github.com/wdebruij/linux/commits/v4.14-revert-copyubufs

If there is only time for either this or the WARN_ONCE instrumentation,
then the latter is more helpful. Thanks.
Comment 13 David Hill 2017-12-13 20:27:51 UTC
It's really easy to reproduce, but I hit issues at various stages of the CI I'm running when this occurs.

Sometimes the VMs are stuck shutting down, sometimes one of the VMs won't get an IP address from the DHCP server it has to contact, and sometimes the server will simply kernel panic.

I've applied that patch to my current v4.14.5 branch and am compiling it now; as soon as it's done, I'll reboot and try to reproduce this again.
Comment 14 David Hill 2017-12-13 22:48:48 UTC
So I can still reproduce this issue with the patch reverted... Now I'm puzzled.
Comment 15 Willem de Bruijn 2017-12-13 23:09:28 UTC
It is quite likely that this patch, or the patchset, is involved, as the hang happens while cleaning up zerocopy skb state and this patchset extends zerocopy.

Because of conflicts I had to resolve, this patch is not a pure revert, but the changes should be limited to replacing

        uarg->callback(uarg, false);
        skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
 
with an skb_zcopy_clear(skb, false) that boils down to the same.
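As a rough userspace model of that equivalence (the flag value and all names here are illustrative stand-ins, not the kernel's definitions), the open-coded pair and the helper leave the skb in the same state:

```c
#include <stdbool.h>

#define SKBTX_DEV_ZEROCOPY 0x08   /* illustrative bit value, not the kernel's */

struct uarg { bool called; bool success; };

static void callback(struct uarg *u, bool success)
{
    u->called = true;
    u->success = success;
}

struct skb_model { struct uarg *uarg; unsigned int tx_flags; };

/* The two open-coded lines quoted above. */
static void open_coded(struct skb_model *skb)
{
    callback(skb->uarg, false);
    skb->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
}

/* What skb_zcopy_clear(skb, false) boils down to, per the comment above:
 * run the callback (if any) and clear the zerocopy flag. */
static void zcopy_clear(struct skb_model *skb, bool success)
{
    if (skb->uarg) {
        callback(skb->uarg, success);
        skb->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
    }
}
```

For an skb that actually carries zerocopy state, both paths fire the callback with the same argument and clear the same flag bit.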

One option would be to test with the entire patchset reverted. I may be able to prepare that; I have no idea how many conflicts I'd have to resolve.

Perhaps first, could you try the other suggestion with the warnings? If those fire, then we know for sure that this patch is implicated.
Comment 16 David Hill 2017-12-14 01:54:25 UTC
I tried the WARN_ONCE patch on the v4.14.5 branch and didn't find that output in the logs. Does this make sense?
Comment 17 Willem de Bruijn 2017-12-14 02:36:43 UTC
Thanks for trying. You did encounter the hangs? Then it's not due to the checks this patch added making skb_copy_ubufs fail.

It's increasingly likely that it is another patch in the patchset, then.
Comment 18 Klaus Mueller 2017-12-14 03:23:32 UTC
(In reply to David Hill from comment #16)
> I tried the WARN_ONCE patch on the v4.14.5 branch and didn't find that
> output in the logs. Does this make sense?

Could you please try reverting the patchset "Remove UDP
Fragmentation Offload support" as described by Andreas here:
https://www.mail-archive.com/netdev@vger.kernel.org/msg204727.html ?

He posted the complete revert patch for 4.14.4 there.
Comment 19 Willem de Bruijn 2017-12-14 17:39:38 UTC
Andreas just mentioned in that thread that the full revert resolves all issues for him, including this hang in

  vhost_net_ubuf_put_and_wait
Comment 20 David Hill 2017-12-14 17:58:26 UTC
So will it get fixed, or will we have to roll back those commits?
Comment 21 Willem de Bruijn 2017-12-14 18:06:49 UTC
I meant to give a +1 to Klaus's request: please test whether that solves your regression, too.

We will have to dig deeper and fix the actual remaining issue instead of reverting the full patchset. That patchset fixed a lot of bugs in the UFO stack, so a wholesale revert is not a solution.
Comment 22 Willem de Bruijn 2017-12-27 04:00:12 UTC
The following fix has been merged

  skbuff: orphan frags before zerocopy clone
  268b790679422a89e9ab0685d9f291edae780c98
  v4.15-rc5~10^2^2~1
  http://patchwork.ozlabs.org/patch/851715/

That resolved another report of what is most likely
the same issue. Tested together with this follow-up
that is currently awaiting review on net-next:

  skbuff: in skb_segment, call zerocopy functions once per nskb
  http://patchwork.ozlabs.org/patch/852598/
Comment 23 David Hill 2017-12-27 16:58:22 UTC
I'll try reproducing the problem with 4.15-rc5 and will keep you updated .
Comment 24 Dan Moulding 2017-12-27 17:40:03 UTC
I can confirm that with 4.14.9 with the fix applied, I can no longer reproduce the problem. My VMs are shutting down nicely now.

I can finally get off the EOL 4.13 series. Thank you Willem for your efforts to resolve this!
Comment 25 Willem de Bruijn 2017-12-27 17:50:49 UTC
Glad to hear it! Thanks for updating the bug. Time to close it?
Comment 26 Dan Moulding 2017-12-27 19:22:59 UTC
Maybe we should wait and see if David Hill can also confirm the problem goes away on his setup before closing it?
Comment 27 LimeTech 2017-12-27 22:04:23 UTC
(In reply to Willem de Bruijn from comment #22)
> The following fix has been merged
> 
>   skbuff: orphan frags before zerocopy clone
>   268b790679422a89e9ab0685d9f291edae780c98
>   v4.15-rc5~10^2^2~1
>   http://patchwork.ozlabs.org/patch/851715/
> 
> That resolved another report of what is most likely
> the same issue. Tested together with this follow-up
> that is currently awaiting review on net-next:
> 
>   skbuff: in skb_segment, call zerocopy functions once per nskb
>   http://patchwork.ozlabs.org/patch/852598/

The second patch is partially un-doing something in the first patch.  Is this correct, and all that is necessary for the 4.14.9 kernel?
Comment 28 Willem de Bruijn 2017-12-27 23:26:34 UTC
> The second patch is partially un-doing something in the first patch.

It changes how often the calls are performed. They only depend on the state of the origin and destination skb. Calling them more often (once for each frag) serves no purpose.

> Is this correct,

I would not have sent it if I thought otherwise.

> and all that is necessary for the 4.14.9 kernel?

No, the second one is an optimization sent only to net-next, which will become 4.16.

I split them to make the actual bugfix that needs to be backported to stable as small and obvious as possible.
Comment 29 David Hill 2017-12-28 17:20:38 UTC
I just retried with 4.15-rc5 and it looks like I'm no longer able to reproduce this issue.
Comment 30 Giancarlo Razzolini 2018-01-01 16:01:34 UTC
Were the patches for this issue deployed to kernel 4.14.9, or will they only land in 4.16? I have been affected by this issue ever since kernel 4.14 landed on Arch Linux. Sometimes it also happens when rebooting a guest. Another thing worth noting is that this doesn't seem to happen when the guest was just started and there hasn't been much network traffic yet. But after you start using the guest, the qemu process remains in D state after shutting it down, and the only way to avoid (possibly) losing data on the host (due to a forced shutdown) is to use the Magic SysRq keys and either the REISUB or REISUO combination.
Comment 31 Willem de Bruijn 2018-01-02 10:01:40 UTC
The patch is in 4.15-rc5 and the backport to 4.14 is queued:

https://patchwork.kernel.org/patch/10138995/
Comment 32 Giancarlo Razzolini 2018-01-02 13:33:07 UTC
Thanks for the info, Willem. There is a workaround for this issue, which is to pass the following option to the vhost_net module:

options vhost_net experimental_zcopytx=0

I have tested this and it works.
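A sketch of making that module option persistent (the modprobe.d path and file name are the usual convention, not anything this bug mandates; the snippet defaults to a scratch directory so it can be tried safely):

```shell
# Persist the vhost_net workaround across reboots.
# On a real host, run as root with CONF_DIR=/etc/modprobe.d;
# the default here is a scratch directory for safe experimentation.
CONF_DIR="${CONF_DIR:-./demo-modprobe.d}"
mkdir -p "$CONF_DIR"
printf 'options vhost_net experimental_zcopytx=0\n' > "$CONF_DIR/vhost_net.conf"
echo "wrote $CONF_DIR/vhost_net.conf"

# The option takes effect on the next module load. With all guests stopped:
#   modprobe -r vhost_net && modprobe vhost_net
# Verify the live value (0 means zerocopy tx disabled):
#   cat /sys/module/vhost_net/parameters/experimental_zcopytx
```

Note that kernels carrying the fix (4.14.11+ per later comments) should no longer need this workaround.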
Comment 33 LimeTech 2018-01-02 16:43:48 UTC
(In reply to Willem de Bruijn from comment #31)
> The patch is in 4.15-rc5 and the backport to 4.14 is queued:
> 
> https://patchwork.kernel.org/patch/10138995/

With that patch, we still sometimes see an Ubuntu VM terminate with the qemu process left in "uninterruptible sleep". Related?
Comment 34 Willem de Bruijn 2018-01-03 09:35:28 UTC
Hard to say without knowing what the process is sleeping on.

If it still happens with Giancarlo's suggestion from #32 then we could exclude this bug.
Comment 35 Giancarlo Razzolini 2018-01-03 11:41:44 UTC
LimeTech, the kernel should be telling you exactly why the task is in the blocked state, unless you have disabled hung task detection with:

echo 0 > /proc/sys/kernel/hung_task_timeout_secs

If you have done so, revert that and you should be able to see why. If you see vhost_net in the call trace, then I think it is the same issue. The workaround works just fine; I have been able to shut down and reboot several guests with different OSes across 2 hosts with it.
Comment 36 LimeTech 2018-01-04 22:55:50 UTC
We have confirmed this issue seems to be solved in 4.14.11.
Comment 37 bubez 2018-01-10 13:21:14 UTC
Host: Ubuntu 17.10, vanilla kernel 4.14.12, nested virtualization and the vhost_net workaround applied:
options kvm_intel nested=1
options vhost_net experimental_zcopytx=0

Problem: can always be reproduced on RHEL/CentOS 7.x; after about 8 hours of guest uptime, the guest machine hangs.

How to reproduce: boot a CentOS/RHEL 7.x guest VM (a minimal installation should be OK) and wait about 8 hours; the period may vary. You can tail syslog to see the detailed messages (for example, tail -f /var/log/messages).

Guest kernel: 3.10.0-693.11.6.el7.x86_64

Syslog output: /var/log/messages
Jan 10 12:56:03 kvm178 dbus[756]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jan 10 12:56:03 kvm178 dhclient[911]: bound to 192.168.122.178 -- renewal in 1257 seconds.
Jan 10 12:56:28 kvm178 dbus[756]: [system] Failed to activate service 'org.freedesktop.nm_dispatcher': timed out
Jan 10 12:56:28 kvm178 dbus-daemon: dbus[756]: [system] Failed to activate service 'org.freedesktop.nm_dispatcher': timed out
Jan 10 12:58:40 kvm178 kernel: INFO: task systemd:1 blocked for more than 120 seconds.
Jan 10 12:58:40 kvm178 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 10 12:58:40 kvm178 kernel: systemd         D ffff88004d1d8000     0     1      0 0x00000000
Jan 10 12:58:40 kvm178 kernel: Call Trace:
Jan 10 12:58:40 kvm178 kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Jan 10 12:58:40 kvm178 kernel: [<ffffffff810625bf>] kvm_async_pf_task_wait+0x1df/0x230
Jan 10 12:58:40 kvm178 kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30
Jan 10 12:58:40 kvm178 kernel: [<ffffffff816afc00>] ? error_swapgs+0x61/0x18d
Jan 10 12:58:40 kvm178 kernel: [<ffffffff816afcef>] ? error_swapgs+0x150/0x18d
Jan 10 12:58:40 kvm178 kernel: [<ffffffff816b32d6>] do_async_page_fault+0x96/0xd0
Jan 10 12:58:40 kvm178 kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30
[the "INFO: task systemd:1 blocked for more than 120 seconds" trace above repeats identically every 120 seconds from 13:00:40 through 13:16:40; omitted for brevity]
Jan 10 13:01:26 kvm178 systemd-logind: Failed to start session scope session-23.scope: Connection timed out
....
The guest died with its CPU at 100%; a hard reset of the guest was needed.

Guests running RHEL/CentOS 6.x (kernel 2.6.32-696.18.7.el6.x86_64) and Windows 10 don't have problems.
Hope this helps.
Comment 38 Giancarlo Razzolini 2018-01-10 14:25:34 UTC
@bubez

4.14.12 does not require the workaround anymore. Can you remove it and test again? Also, I think another patchset improving this further is coming in 4.16.
Comment 39 bubez 2018-01-15 13:26:22 UTC
@Razzolini
Host kernel: vanilla 4.14.13.
Tried again without
options vhost_net experimental_zcopytx=0
and without the DHCP server of libvirtd/dnsmasq, and after 5 days all RHEL/CentOS 7.x machines (guest kernel 3.10.0-693.11.6.el7.x86_64) were KO again.
5 days of guest uptime is better than 8 hours, but it is not enough.
Comment 40 Giancarlo Razzolini 2018-01-15 13:33:33 UTC
@bubez

As far as I know, this bug doesn't kill or freeze the guests or the host. It just prevents the guests' qemu processes from ending; they remain in D state. Last week I had a guest running for more than 4 days, but since that machine is not a server, I did suspend the host a few times in between. I'll have more data from servers this week.

Also, the log you sent is not the same as this bug's, which points at the vhost_net module. I think you might be hitting another issue.
Comment 41 Ian Kumlien 2018-01-15 13:34:00 UTC
@bubez: I suspect that your issue is related to:
https://marc.info/?l=linux-kernel&m=151602104019357&w=2

And is not related to this issue.