Bug 201655 - nfsd hangs forever sporadically on rwsem_down_read_failed
Summary: nfsd hangs forever sporadically on rwsem_down_read_failed
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: Intel Linux
Importance: P1 high
Assignee: bfields
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-10 20:59 UTC by Kevin Liu
Modified: 2022-01-21 20:14 UTC
CC List: 5 users

See Also:
Kernel Version: 4.19, 4.19.1, 4.18.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Kevin Liu 2018-11-10 20:59:23 UTC
I'm on NixOS unstable 19.03, with an Intel i7-4820K, 20 GB of RAM, and 10 GB of zram (though it reproduces without zram as well). I run several dozen Docker containers through Kubernetes, each with an NFS mount inside. Randomly, however (it can take as little as 10 minutes or as long as a day to occur), all reads hang as nfsd locks up:

```
Nov 10 15:29:55 rem kernel: INFO: task htop:6608 blocked for more than 120 seconds.
Nov 10 15:29:55 rem kernel:       Tainted: P           O      4.19.1 #1-NixOS
Nov 10 15:29:55 rem kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 15:29:55 rem kernel: htop            D    0  6608   6591 0x00000000
Nov 10 15:29:55 rem kernel: Call Trace:
Nov 10 15:29:55 rem kernel:  ? __schedule+0x1d3/0x6f0
Nov 10 15:29:55 rem kernel:  schedule+0x28/0x80
Nov 10 15:29:55 rem kernel:  rwsem_down_read_failed+0xee/0x150
Nov 10 15:29:55 rem kernel:  ? call_rwsem_down_read_failed+0x14/0x30
Nov 10 15:29:55 rem kernel:  call_rwsem_down_read_failed+0x14/0x30
Nov 10 15:29:55 rem kernel:  down_read+0x1c/0x30
Nov 10 15:29:55 rem kernel:  __access_remote_vm+0x55/0x2e0
Nov 10 15:29:55 rem kernel:  proc_pid_cmdline_read+0x1c0/0x310
Nov 10 15:29:55 rem kernel:  __vfs_read+0x36/0x170
Nov 10 15:29:55 rem kernel:  vfs_read+0x89/0x130
Nov 10 15:29:55 rem kernel:  ksys_read+0x52/0xc0
Nov 10 15:29:55 rem kernel:  do_syscall_64+0x4e/0x100
Nov 10 15:29:55 rem kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 10 15:29:55 rem kernel: RIP: 0033:0x7fee3e824281
Nov 10 15:29:55 rem kernel: Code: Bad RIP value.
Nov 10 15:29:55 rem kernel: RSP: 002b:00007fff11de02a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Nov 10 15:29:55 rem kernel: RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fee3e824281
Nov 10 15:29:55 rem kernel: RDX: 0000000000001000 RSI: 00007fff11de0370 RDI: 0000000000000005
Nov 10 15:29:55 rem kernel: RBP: 0000000000001000 R08: 00007fee3eaeae00 R09: 000000000000001e
Nov 10 15:29:55 rem kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff11de0370
Nov 10 15:29:55 rem kernel: R13: 0000000000000000 R14: 0000000000000005 R15: 00000000008c5760
Nov 10 15:29:55 rem kernel: INFO: task nfsd:7464 blocked for more than 120 seconds.
Nov 10 15:29:55 rem kernel:       Tainted: P           O      4.19.1 #1-NixOS
Nov 10 15:29:55 rem kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 15:29:55 rem kernel: nfsd            D    0  7464      2 0x80000000
Nov 10 15:29:55 rem kernel: Call Trace:
Nov 10 15:29:55 rem kernel:  ? __schedule+0x1d3/0x6f0
Nov 10 15:29:55 rem kernel:  schedule+0x28/0x80
Nov 10 15:29:55 rem kernel:  rwsem_down_write_failed+0x15e/0x350
Nov 10 15:29:55 rem kernel:  ? call_rwsem_down_write_failed+0x13/0x20
Nov 10 15:29:55 rem kernel:  call_rwsem_down_write_failed+0x13/0x20
Nov 10 15:29:55 rem kernel:  down_write+0x29/0x40
Nov 10 15:29:55 rem kernel:  ext4_file_write_iter+0x91/0x3d0 [ext4]
Nov 10 15:29:55 rem kernel:  ? nfsd_proc_write+0x160/0x160 [nfsd]
Nov 10 15:29:55 rem kernel:  ? exportfs_decode_fh+0xf2/0x2b0
Nov 10 15:29:55 rem kernel:  do_iter_readv_writev+0x12a/0x1a0
Nov 10 15:29:55 rem kernel:  do_iter_write+0x80/0x190
Nov 10 15:29:55 rem kernel:  nfsd_vfs_write+0xed/0x470 [nfsd]
Nov 10 15:29:55 rem kernel:  nfsd4_write+0x18a/0x220 [nfsd]
Nov 10 15:29:55 rem kernel:  nfsd4_proc_compound+0x4f3/0x6e0 [nfsd]
Nov 10 15:29:55 rem kernel:  nfsd_dispatch+0xf5/0x230 [nfsd]
Nov 10 15:29:55 rem kernel:  svc_process_common+0x4c4/0x720 [sunrpc]
Nov 10 15:29:55 rem kernel:  ? svc_recv+0x2c2/0xa10 [sunrpc]
Nov 10 15:29:55 rem kernel:  ? nfsd_destroy+0x60/0x60 [nfsd]
Nov 10 15:29:55 rem kernel:  svc_process+0xd7/0xf0 [sunrpc]
Nov 10 15:29:55 rem kernel:  nfsd+0xe3/0x150 [nfsd]
Nov 10 15:29:55 rem kernel:  kthread+0x113/0x130
Nov 10 15:29:55 rem kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Nov 10 15:29:55 rem kernel:  ret_from_fork+0x35/0x40
```

For the time being, I have reverted to kernel 4.14.78, which does not experience the issue.
Comment 1 bfields 2018-11-11 18:31:45 UTC
Those are the only processes it reports blocked?

I'm guessing it's the inode_lock at the start of ext4_file_write_iter that's blocking.  On a quick look I don't see any of the callers taking any locks.  So I'd expect that elsewhere there'd be a process holding that inode lock and blocking on something else.

Based just on this, my first guess would be a VFS or maybe ext4 bug rather than an nfsd bug, but I'm not seeing how to reassign.  May be worth reporting to the relevant mailing lists.
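
For reference, here is roughly what that entry point looks like (a simplified paraphrase of fs/ext4/file.c as of 4.19, not a verbatim quote); inode_lock() is down_write(&inode->i_rwsem), which is the down_write() visible in the nfsd trace above:

```
/*
 * Simplified paraphrase of the top of ext4_file_write_iter() in 4.19.
 * nfsd is sleeping in the down_write() hidden behind inode_lock(),
 * waiting for exclusive use of the inode's i_rwsem.
 */
static ssize_t
ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
	struct inode *inode = file_inode(iocb->ki_filp);

	if (!inode_trylock(inode)) {
		if (iocb->ki_flags & IOCB_NOWAIT)
			return -EAGAIN;
		inode_lock(inode);	/* <-- nfsd is stuck here */
	}

	/* ... write checks and the actual write follow, then unlock ... */
}
```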
Comment 2 Kevin Liu 2018-11-12 04:28:39 UTC
No, there are more blocked processes; looking at htop when this happens (assuming htop itself doesn't freeze), every process reading from NFS blocks.

Thanks for the analysis! I'll try posting on the mailing lists about it.
Comment 3 Theodore Tso 2018-11-12 17:47:29 UTC
From my reply on linux-ext4:

Sorry, this isn't something I've seen before.  And it's not at all
obvious from the information in Bugzilla what's causing the deadlock.

The down_read() appears to be in mm/memory.c, in
__access_remote_vm(), getting called from proc_pid_cmdline_read().
__access_remote_vm() is apparently trying to get a shared lock on
&mm->mmap_sem.  How this would get involved with the inode_lock() is
not immediately obvious.
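
For reference, a minimal paraphrase of that path (mm/memory.c as of 4.19; simplified, not a verbatim quote) — the down_read() in htop's trace is the one marked below:

```
/*
 * Sketch of __access_remote_vm() from mm/memory.c in 4.19 (simplified).
 * Reading /proc/<pid>/cmdline ends up here; htop sleeps in down_read()
 * because another task holds that process's mm->mmap_sem for write, or
 * a writer is queued ahead of it.
 */
int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
		       unsigned long addr, void *buf, int len,
		       unsigned int gup_flags)
{
	void *old_buf = buf;

	down_read(&mm->mmap_sem);	/* <-- htop is stuck here */
	/* ... copy the requested range via get_user_pages_remote() ... */
	up_read(&mm->mmap_sem);

	return buf - old_buf;
}
```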

Things I would suggest:

1) Try running your kernel console log through
./scripts/decode_stacktrace.sh so we can be sure we've correctly
assessed where the kernel is grabbing which lock.  Enabling
CONFIG_DEBUG_INFO and CONFIG_DEBUG_INFO_REDUCED will be helpful.

2) Try turning on CONFIG_LOCKDEP and see if this reports some
potential deadlock.

3) Try using sysrq-d to find all held locks (running the resulting
kernel console output through decode_stacktrace.sh will also be
helpful).
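
In practice those steps look something like the following (a hedged sketch: the file names are placeholders, and decode_stacktrace.sh's arguments vary a little between kernel versions):

```
# Symbolize the hung-task output (kernel built with CONFIG_DEBUG_INFO;
# run from the kernel build tree, "hung-tasks.txt" is a placeholder):
dmesg > hung-tasks.txt
./scripts/decode_stacktrace.sh vmlinux "$PWD" < hung-tasks.txt > decoded.txt

# sysrq-d dumps all held locks (only meaningful with CONFIG_LOCKDEP).
# Enable sysrq first if your distro restricts it:
echo 1 > /proc/sys/kernel/sysrq
echo d > /proc/sysrq-trigger
dmesg | tail -n 200    # the lock dump lands on the kernel console
```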
Comment 4 DocMAX 2019-09-30 20:03:28 UTC
I have a (similar?) problem. Where do I start debugging?

```
[Mo Sep 30 21:37:47 2019] INFO: task nfsd:1557 blocked for more than 122 seconds.
[Mo Sep 30 21:37:47 2019]       Not tainted 5.2.14-artix1-1-ARTIX #1
[Mo Sep 30 21:37:47 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mo Sep 30 21:37:47 2019] nfsd            D    0  1557      2 0x80004000
[Mo Sep 30 21:37:47 2019] Call Trace:
[Mo Sep 30 21:37:47 2019]  ? __schedule+0x27f/0x6d0
[Mo Sep 30 21:37:47 2019]  schedule+0x43/0xd0
[Mo Sep 30 21:37:47 2019]  rwsem_down_write_failed+0x192/0x280
[Mo Sep 30 21:37:47 2019]  ? btrfs_lookup_fs_root+0x32/0x40 [btrfs]
[Mo Sep 30 21:37:47 2019]  ? __d_obtain_alias+0x31/0x80
[Mo Sep 30 21:37:47 2019]  down_write+0x38/0x40
[Mo Sep 30 21:37:47 2019]  btrfs_file_write_iter+0xc1/0x630 [btrfs]
[Mo Sep 30 21:37:47 2019]  do_iter_readv_writev+0x166/0x1e0
[Mo Sep 30 21:37:47 2019]  do_iter_write+0x7d/0x190
[Mo Sep 30 21:37:47 2019]  nfsd_vfs_write+0x127/0x4b0 [nfsd]
[Mo Sep 30 21:37:47 2019]  nfsd4_write+0xf1/0x230 [nfsd]
[Mo Sep 30 21:37:47 2019]  nfsd4_proc_compound+0x334/0x740 [nfsd]
[Mo Sep 30 21:37:47 2019]  nfsd_dispatch+0xc1/0x200 [nfsd]
[Mo Sep 30 21:37:47 2019]  svc_process_common.isra.0+0x3cf/0x750 [sunrpc]
[Mo Sep 30 21:37:47 2019]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[Mo Sep 30 21:37:47 2019]  ? nfsd_svc+0x2d0/0x2d0 [nfsd]
[Mo Sep 30 21:37:47 2019]  svc_process+0xe2/0x120 [sunrpc]
[Mo Sep 30 21:37:47 2019]  nfsd+0xed/0x150 [nfsd]
[Mo Sep 30 21:37:47 2019]  kthread+0xfb/0x130
[Mo Sep 30 21:37:47 2019]  ? nfsd_destroy+0x60/0x60 [nfsd]
[Mo Sep 30 21:37:47 2019]  ? kthread_park+0x80/0x80
[Mo Sep 30 21:37:47 2019]  ret_from_fork+0x1f/0x40
```
Comment 5 Qball Cow 2021-01-25 09:18:43 UTC
For about a month now I have been hitting this bug too, under Ubuntu 20.04.

```
Jan 25 09:11:32 o9 kernel: [  604.872233] INFO: task nfsd:894 blocked for more than 120 seconds.
Jan 25 09:11:32 o9 kernel: [  604.872273]       Not tainted 5.8.0-40-generic #45~20.04.1-Ubuntu
Jan 25 09:11:32 o9 kernel: [  604.872310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 25 09:11:32 o9 kernel: [  604.872358] nfsd            D    0   894      2 0x00004000
Jan 25 09:11:32 o9 kernel: [  604.872360] Call Trace:
Jan 25 09:11:32 o9 kernel: [  604.872365]  __schedule+0x394/0xa60
Jan 25 09:11:32 o9 kernel: [  604.872369]  schedule+0x55/0xc0
Jan 25 09:11:32 o9 kernel: [  604.872372]  rwsem_down_write_slowpath+0x239/0x4b0
Jan 25 09:11:32 o9 kernel: [  604.872377]  down_write+0x41/0x50
Jan 25 09:11:32 o9 kernel: [  604.872385]  ovl_dir_release+0x2f/0x70 [overlay]
Jan 25 09:11:32 o9 kernel: [  604.872388]  __fput+0xe9/0x250
Jan 25 09:11:32 o9 kernel: [  604.872390]  flush_delayed_fput+0x23/0x30
Jan 25 09:11:32 o9 kernel: [  604.872407]  nfsd_file_close_inode_sync+0x15a/0x170 [nfsd]
Jan 25 09:11:32 o9 kernel: [  604.872421]  nfsd_unlink+0x256/0x2b0 [nfsd]
Jan 25 09:11:32 o9 kernel: [  604.872437]  nfsd3_proc_remove+0x74/0x110 [nfsd]
Jan 25 09:11:32 o9 kernel: [  604.872450]  nfsd_dispatch+0xe2/0x220 [nfsd]
Jan 25 09:11:32 o9 kernel: [  604.872476]  svc_process_common+0x47b/0x6f0 [sunrpc]
Jan 25 09:11:32 o9 kernel: [  604.872501]  ? svc_sock_secure_port+0x16/0x30 [sunrpc]
Jan 25 09:11:32 o9 kernel: [  604.872513]  ? nfsd_svc+0x330/0x330 [nfsd]
Jan 25 09:11:32 o9 kernel: [  604.872538]  svc_process+0xbc/0xf0 [sunrpc]
Jan 25 09:11:32 o9 kernel: [  604.872551]  nfsd+0xe8/0x150 [nfsd]
Jan 25 09:11:32 o9 kernel: [  604.872555]  kthread+0x114/0x150
Jan 25 09:11:32 o9 kernel: [  604.872568]  ? nfsd_destroy+0x60/0x60 [nfsd]
Jan 25 09:11:32 o9 kernel: [  604.872571]  ? kthread_park+0x90/0x90
Jan 25 09:11:32 o9 kernel: [  604.872575]  ret_from_fork+0x22/0x30
```


This box serves root filesystems via NFS for several small boards.
Once all the NFS processes hang, I have to restart the machine to get everything running again.
Comment 6 bfields 2021-01-25 15:52:22 UTC
Comments 4 and 5 are for different filesystem operations on different filesystems, with different NFS versions.  It's possible they could all come down to the same nfsd bug somehow, but my first guess would be that they're unrelated to each other and to the originally reported bug, so I'd prefer separate bugs.

That said, the debugging steps Ted describes in comment 3 look like good starting points for debugging all 3 reports.
Comment 7 bfields 2022-01-21 20:14:47 UTC
Assuming this is no longer an issue.  Reopen if it actually is.  (And please consider a new bug if you're in any doubt as to whether it's the same as the bug in the original description.)
