NFSv4 mounts over OpenVPN over Ethernet sometimes lock up.

Kernel: kernel-3.9.9-201.fc18.x86_64

Stack trace of the hung task:
[<ffffffffa000a559>] rpc_wait_bit_killable+0x39/0x90 [sunrpc]
[<ffffffffa000a51d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
[<ffffffffa077bdfa>] nfs_initiate_commit+0xea/0x110 [nfs]
[<ffffffffa077dbcb>] nfs_generic_commit_list+0x8b/0xf0 [nfs]
[<ffffffffa077dd25>] nfs_commit_inode+0xf5/0x160 [nfs]
[<ffffffffa076f0ee>] nfs_release_page+0x8e/0xb0 [nfs]
[<ffffffff81133a12>] try_to_release_page+0x32/0x50
[<ffffffff8114768f>] shrink_page_list+0x5ff/0x990
[<ffffffff81148027>] shrink_inactive_list+0x157/0x410
[<ffffffff81148869>] shrink_lruvec+0x229/0x4e0
[<ffffffff81148b86>] shrink_zone+0x66/0x180
[<ffffffff81149010>] do_try_to_free_pages+0x100/0x620
[<ffffffff81149838>] try_to_free_pages+0xf8/0x180
[<ffffffff8113d805>] __alloc_pages_nodemask+0x6a5/0xae0
[<ffffffff8117c308>] alloc_pages_current+0xb8/0x190
[<ffffffff811862e0>] new_slab+0x2d0/0x3a0
[<ffffffff8165aeaf>] __slab_alloc+0x393/0x560
[<ffffffff8118a62f>] __kmalloc_node_track_caller+0xbf/0x2a0
[<ffffffff815454dc>] __kmalloc_reserve.isra.51+0x3c/0xa0
[<ffffffff81547a6c>] __alloc_skb+0x7c/0x290
[<ffffffff81540221>] sock_alloc_send_pskb+0x1d1/0x350
[<ffffffff815403b5>] sock_alloc_send_skb+0x15/0x20
[<ffffffff8158eb38>] __ip_append_data.isra.38+0x5c8/0xa00
[<ffffffff8159115d>] ip_make_skb+0xdd/0x120
[<ffffffff815b8d3b>] udp_sendmsg+0x2db/0xa10
[<ffffffff815c3e63>] inet_sendmsg+0x63/0xb0
[<ffffffff8153aa30>] sock_sendmsg+0xb0/0xe0
[<ffffffff8153df5d>] sys_sendto+0x12d/0x180
[<ffffffff8166afd9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Fedora bug analysis by Jeff Layton: https://bugzilla.redhat.com/show_bug.cgi?id=834808#c16
-------------------------------------------------------------------------------
I simply don't see a way to reasonably fix this. The core of the problem is a disconnect between the daemon handling your VPN and the NFS subsystem.
If the kernel somehow knew that it shouldn't try to flush NFS pages in order to handle memory allocations for this particular daemon, then that would work, but there isn't a way to do that as far as I can tell. Another idea might be to never allow the nfs_releasepage to block, but that might cause other problems.
-------------------------------------------------------------------------------
...and even if we don't allow releasepage to block here we could still end up in direct reclaim here AFAICT. Note that this sort of deadlock is not specific to VPN daemons either. Any time the kernel has to rely on a userland program to talk to the NFS server, and that userland program can trigger a memory allocation you can end up in a similar situation.
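To make the dependency loop in the trace concrete, here is a toy wait-for graph in Python. It is a hypothetical model, not kernel code: the node labels are borrowed from frames in the stack trace, the names START, WAITS_FOR, NO_NFS_FLUSH and find_cycle are made up for illustration, and the final edge assumes the same single OpenVPN thread that is stuck allocating memory is also the one that must relay the NFS RPC traffic.

```python
# Toy wait-for graph of the NFS-over-OpenVPN deadlock (hypothetical model,
# not kernel code). An edge A -> B means "A cannot make progress until B does".
START = "openvpn: udp_sendmsg -> __alloc_skb"

WAITS_FOR = {
    # The daemon's socket buffer allocation falls into direct reclaim.
    START: "direct reclaim: shrink_page_list",
    # Reclaim tries to release an NFS page, which forces a COMMIT.
    "direct reclaim: shrink_page_list": "nfs_release_page -> nfs_commit_inode",
    # The COMMIT waits in rpc_wait_bit_killable for the server's reply.
    "nfs_release_page -> nfs_commit_inode": "RPC COMMIT round trip via tunnel",
    # Assumption: the same OpenVPN thread must relay that RPC, but it is
    # the thread stuck in the allocation at the top of this chain.
    "RPC COMMIT round trip via tunnel": START,
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    node = start
    while node in graph:
        if node in path:
            # Slice from the first visit so the result is just the loop.
            return path[path.index(node):] + [node]
        path.append(node)
        node = graph[node]
    return None  # reached a node that waits on nothing: no deadlock


cycle = find_cycle(WAITS_FOR, START)
for waiter, holder in zip(cycle, cycle[1:]):
    print(f"{waiter}  waits for  {holder}")

# Hypothetical: reclaim done on the daemon's behalf never flushes NFS pages.
NO_NFS_FLUSH = {
    k: v for k, v in WAITS_FOR.items() if not k.startswith("direct reclaim")
}
print(find_cycle(NO_NFS_FLUSH, START))  # prints None
```

In the model, dropping the reclaim-to-NFS edge (the per-daemon behaviour Jeff describes in his first comment) breaks the cycle; his point is that, as far as he could tell, the kernel had no way to request that for one particular daemon.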
As Jeff says, there is no reasonable fix for this kind of bug. As long as NFS traffic has to pass through a userspace daemon such as OpenVPN that can itself trigger memory reclaim, there will always be a danger of deadlocks. Closing as "will_not_fix".