Trying to suspend fails to a hanging nfsv4.1-svc: [ 242.963957] PM: Syncing filesystems ... done. [ 243.535088] PM: Preparing system for mem sleep [ 243.730776] Freezing user space processes ... (elapsed 0.002 seconds) done. [ 243.733034] Freezing remaining freezable tasks ... [ 263.736906] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0): [ 263.737154] nfsv4.1-svc D ffff8800a0fd7d78 0 951 2 0x00000000 [ 263.737165] ffff8800a0fd7d78 ffff8800a0f66d50 0000000000013f00 ffff8800a0fd7fd8 [ 263.737174] 0000000000013f00 ffff88012af86d50 ffff8800a0f66d50 ffffffffa0298aa0 [ 263.737181] ffff8800a0d70000 ffffffffa0298aa0 0000000000000000 ffff8800a0fd7cd8 [ 263.737188] Call Trace: [ 263.737239] [<ffffffffa0298aa0>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc] [ 263.737262] [<ffffffffa0298aa0>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc] [ 263.737282] [<ffffffffa0298654>] ? rpc_release_resources_task+0x34/0x40 [sunrpc] [ 263.737301] [<ffffffffa02997b8>] ? __rpc_execute+0x2c8/0x4a0 [sunrpc] [ 263.737313] [<ffffffff810db9bb>] ? lock_timer_base.isra.37+0x2b/0x50 [ 263.737323] [<ffffffff8155e999>] schedule+0x29/0x70 [ 263.737332] [<ffffffff8156144b>] schedule_timeout+0x11b/0x250 [ 263.737340] [<ffffffff810db360>] ? migrate_timer_list+0xd0/0xd0 [ 263.737362] [<ffffffffa083e1ba>] nfs41_callback_svc+0x1aa/0x1e0 [nfsv4] [ 263.737369] [<ffffffff810b4380>] ? wait_woken+0x90/0x90 [ 263.737387] [<ffffffffa083e010>] ? nfs4_callback_svc+0x60/0x60 [nfsv4] [ 263.737396] [<ffffffff81091748>] kthread+0xd8/0xf0 [ 263.737404] [<ffffffff81091670>] ? kthread_create_on_node+0x1c0/0x1c0 [ 263.737412] [<ffffffff81562758>] ret_from_fork+0x58/0x90 [ 263.737419] [<ffffffff81091670>] ? kthread_create_on_node+0x1c0/0x1c0 The issue also occurs with 4.0rc5 and didn't occur with 3.19.1. Maybe it's related to 0c0f2544c2355899dcf62821ed0b230b60b79c22
Also affected: 3.19.3, 3.14.36 / 3.14.37.
According to https://forum.manjaro.org/index.php?topic=21508.0 downgrading nfs-utils to 1.3.1-1 fixes the problem.
I too have encountered this issue, but noticed it not when attempting a suspend, but merely by the fact that my load averages at idle are now 1.00 1.00 1.00. I also noticed nfsv4.1-svc stuck in D state: PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND 2267 ? D 0:00 0 0 0 0 0.0 [nfsv4.1-svc] It does indeed seem to be related to the aforementioned commit. I reverted that patch against 3.19.3 and it resolved the issue. It's still present in 4.0-rc6. I can also confirm that downgrading nfs-utils to 1.3.1 (from 1.3.2) also resolves the issue.
As this issue might be causing my issues suspending/hibernating while NFS mounts exist I'm trying to understand the logic of the suspected commit. Is it valid to call schedule_timeout (with any value) before calling finish_wait; which is what I presume is closing the critical section? Previously bjo@nord-west.org 2015-03-25 11:30:54 UTC referenced: http://kernel.suse.com/cgit/kernel/commit/?h=stable&id=0c0f2544c2355899dcf62821ed0b230b60b79c22 This patch is trying to convert the logic from incorrectly flagged interruptable operations to an un-interruptable logic, therefore I cannot assume that the ordering of operations it modifies is also valid. } else { spin_unlock_bh(&serv->sv_cb_lock); - schedule(); + /* schedule_timeout to game the hung task watchdog */ + schedule_timeout(60 * HZ); + finish_wait(&serv->sv_cb_waitq, &wq); } - finish_wait(&serv->sv_cb_waitq, &wq);
Setting the value Defaultvers=4.0 in /etc/nfsmount.conf as recommended here: https://bugs.archlinux.org/task/44323#comment134552 also solves this problem for me (with nfs-utils 1.3.2).
(In reply to Mikkel Oscar Lyderik from comment #5) > Setting the value Defaultvers=4.0 in /etc/nfsmount.conf as recommended here: > https://bugs.archlinux.org/task/44323#comment134552 also solves this problem > for me (with nfs-utils 1.3.2). That is a *workaround* not a solution.
Was this fixed by 5d05e54af3cd "nfs: fix high load average due to callback thread sleeping" (went into 4.1-rc1)?