Bug 60591 - nfs over openvpn locks up
Summary: nfs over openvpn locks up
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: x86-64 Linux
Importance: P1 low
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-20 12:01 UTC by Jan Kratochvil
Modified: 2014-01-13 19:12 UTC

See Also:
Kernel Version: 3.9.9-201.fc18
Subsystem:
Regression: No
Bisected commit-id:


Description Jan Kratochvil 2013-07-20 12:01:21 UTC
NFSv4 mounts over openvpn over ethernet sometimes lock up.

kernel-3.9.9-201.fc18.x86_64

[<ffffffffa000a559>] rpc_wait_bit_killable+0x39/0x90 [sunrpc]
[<ffffffffa000a51d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
[<ffffffffa077bdfa>] nfs_initiate_commit+0xea/0x110 [nfs]
[<ffffffffa077dbcb>] nfs_generic_commit_list+0x8b/0xf0 [nfs]
[<ffffffffa077dd25>] nfs_commit_inode+0xf5/0x160 [nfs]
[<ffffffffa076f0ee>] nfs_release_page+0x8e/0xb0 [nfs]
[<ffffffff81133a12>] try_to_release_page+0x32/0x50
[<ffffffff8114768f>] shrink_page_list+0x5ff/0x990
[<ffffffff81148027>] shrink_inactive_list+0x157/0x410
[<ffffffff81148869>] shrink_lruvec+0x229/0x4e0
[<ffffffff81148b86>] shrink_zone+0x66/0x180
[<ffffffff81149010>] do_try_to_free_pages+0x100/0x620
[<ffffffff81149838>] try_to_free_pages+0xf8/0x180
[<ffffffff8113d805>] __alloc_pages_nodemask+0x6a5/0xae0
[<ffffffff8117c308>] alloc_pages_current+0xb8/0x190
[<ffffffff811862e0>] new_slab+0x2d0/0x3a0
[<ffffffff8165aeaf>] __slab_alloc+0x393/0x560
[<ffffffff8118a62f>] __kmalloc_node_track_caller+0xbf/0x2a0
[<ffffffff815454dc>] __kmalloc_reserve.isra.51+0x3c/0xa0
[<ffffffff81547a6c>] __alloc_skb+0x7c/0x290
[<ffffffff81540221>] sock_alloc_send_pskb+0x1d1/0x350
[<ffffffff815403b5>] sock_alloc_send_skb+0x15/0x20
[<ffffffff8158eb38>] __ip_append_data.isra.38+0x5c8/0xa00
[<ffffffff8159115d>] ip_make_skb+0xdd/0x120
[<ffffffff815b8d3b>] udp_sendmsg+0x2db/0xa10
[<ffffffff815c3e63>] inet_sendmsg+0x63/0xb0
[<ffffffff8153aa30>] sock_sendmsg+0xb0/0xe0
[<ffffffff8153df5d>] sys_sendto+0x12d/0x180
[<ffffffff8166afd9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Fedora Bug analysis by Jeff Layton:
https://bugzilla.redhat.com/show_bug.cgi?id=834808#c16
-------------------------------------------------------------------------------
I simply don't see a way to reasonably fix this. The core of the problem is a disconnect between the daemon handling your VPN and the NFS subsystem.

If the kernel somehow knew that it shouldn't try to flush NFS pages in order to handle memory allocations for this particular daemon, then that would work, but there isn't a way to do that as far as I can tell. Another idea might be to never allow nfs_release_page to block, but that might cause other problems.
-------------------------------------------------------------------------------
Comment 1 Jeff Layton 2013-07-20 12:22:50 UTC
...and even if we don't allow releasepage to block here, we could still end up in direct reclaim, AFAICT.

Note that this sort of deadlock is not specific to VPN daemons either. Any time the kernel has to rely on a userland program to talk to the NFS server, and that userland program can trigger a memory allocation, you can end up in a similar situation.
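
The circular wait described here can be sketched as a toy wait-for graph. This is only an illustration of the dependency chain suggested by the stack trace in the description, not kernel code: the node labels and the find_cycle helper are hypothetical names made up for this sketch, and the chain is a simplified reading of the trace.

```python
# Toy model of the wait-for chain; labels are illustrative, not kernel APIs.
# Each key is blocked until the thing it maps to makes progress.
WAITS_FOR = {
    "VPN daemon in sendto()": "skb memory allocation",
    "skb memory allocation": "direct reclaim",
    "direct reclaim": "NFS commit of dirty pages",
    "NFS commit of dirty pages": "RPC traffic through the tunnel",
    "RPC traffic through the tunnel": "VPN daemon in sendto()",
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    seen = []
    node = start
    while node is not None and node not in seen:
        seen.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended, so nothing waits on itself
    # node was seen before: slice out the loop and close it
    return seen[seen.index(node):] + [node]


cycle = find_cycle(WAITS_FOR, "VPN daemon in sendto()")
print(" -> ".join(cycle) if cycle else "no cycle")
```

The same loop appears for any userland program in the path to the NFS server: once its own allocation waits on NFS writeback, the writeback waits on that program.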
Comment 2 Trond Myklebust 2014-01-13 19:12:47 UTC
As Jeff says: there is no reasonable solution for this kind of bug. There
will always be a danger of deadlocks when using openvpn.

Closing as "will_not_fix".
