I have two Synology DiskStation NAS devices with NFS mounts present on Gentoo servers with the following fstab mount configuration:

10.200.1.247:/volume1/filer02-sata /mnt/filer02-sata nfs vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,retry=6,retrans=6,nconnect=4 0 0
10.200.1.247:/volume1/filer03-sata /mnt/filer03-sata nfs vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,retry=6,retrans=6,nconnect=4 0 0
10.200.1.246:/volume1/filer04-sata /mnt/filer04-sata nfs vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,retry=6,retrans=6,nconnect=4 0 0

On Linux kernel 6.3.6 these work perfectly fine.

As soon as I upgrade to 6.4 (tested 6.4.7 through 6.4.11) or 6.5-rc7, NFS mounts randomly hang and block system operation with high load, eventually resulting in a system freeze.

dmesg/syslog:

Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:13:49 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:14:35 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:15:23 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:05 sjc-www2 kernel: nfs: server 10.200.1.247 not responding, still trying
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK
Aug 22 18:16:54 sjc-www2 kernel: nfs: server 10.200.1.247 OK

The box I have been testing the kernel upgrades on has 1 x 10G NIC set with MTU 9000 for NFS volumes, and I can successfully ping the NFS host with 9000-byte packets:

sjc-www2 ~ # ping -4 -s 9000 10.200.1.247
PING 10.200.1.247 (10.200.1.247) 9000(9028) bytes of data.
9008 bytes from 10.200.1.247: icmp_seq=1 ttl=64 time=0.205 ms
9008 bytes from 10.200.1.247: icmp_seq=2 ttl=64 time=0.279 ms
9008 bytes from 10.200.1.247: icmp_seq=3 ttl=64 time=0.402 ms
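
A note on the ping check above: `-s 9000` sets the ICMP payload size, so each packet is 9028 bytes (as the `9000(9028)` in the output shows) and is fragmented on a 9000-byte MTU link. The replies confirm reachability, but not that a full-size frame crosses the path unfragmented. A stricter check could look like the sketch below, assuming iputils `ping`, whose `-M do` option prohibits fragmentation. The function names are illustrative and not part of the report.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in a single IPv4 packet of the given
# MTU: MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP
# header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Ping with fragmentation prohibited (-M do, iputils ping). The ping fails
# if the local interface or any hop cannot carry a packet of the given MTU.
check_jumbo_path() {
    host=$1
    mtu=${2:-9000}
    ping -4 -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# Example, using the NFS host from the report:
#   check_jumbo_path 10.200.1.247 9000
```

If this succeeds while NFS still stalls, the MTU configuration is unlikely to be the cause.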
(In reply to greg from comment #0)
> I have two Synology Disk station NAS devices with NFS mounts present on
> Gentoo servers with the following fstab mount configuration:
>
> 10.200.1.247:/volume1/filer02-sata /mnt/filer02-sata nfs
> vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,
> retry=6,retrans=6,nconnect=4 0 0
> 10.200.1.247:/volume1/filer03-sata /mnt/filer03-sata nfs
> vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,
> retry=6,retrans=6,nconnect=4 0 0
> 10.200.1.246:/volume1/filer04-sata /mnt/filer04-sata nfs
> vers=4.1,tcp,rsize=32768,wsize=32768,nolock,noatime,nodiratime,hard,timeo=60,
> retry=6,retrans=6,nconnect=4 0 0
>
>
> On Linux Kernel 6.3.6 these work perfectly fine.
>
> As soon as I upgrade to 6.4 (tested 6.4.7 through 6.4.11) or 6.5-rc7 NFS
> mounts randomly hang and block system operation with high load times
> eventually resulting in a system freeze.
>

Then can you perform bisection (see Documentation/admin-guide/bug-bisect.rst in kernel sources) to find the culprit?
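
For readers unfamiliar with the procedure, a bisection between the two releases might be started as in the sketch below. It assumes a mainline linux.git checkout and that the mainline tags `v6.3` and `v6.4` behave like the 6.3.6 and 6.4.x stable kernels in the report, which has not been verified here. The helper names are illustrative.

```shell
#!/bin/sh
# Start a bisection between a known-good and a known-bad kernel tag.
# Run from the top of a linux.git checkout.
start_kernel_bisect() {
    good=$1
    bad=$2
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# Rough estimate of the number of build-and-test rounds for n candidate
# commits: each round roughly halves the remaining range.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Example (tags are an assumption, see above):
#   start_kernel_bisect v6.3 v6.4
#   bisect_steps "$(git rev-list --count v6.3..v6.4)"
# After building and booting each candidate, mark it with
#   git bisect good     or     git bisect bad
# and run `git bisect reset` when finished.
```

The step estimate gives a sense of how many kernels need to be built and booted before the offending commit is isolated.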
(In reply to greg from comment #0)
> I have two Synology Disk station NAS devices with NFS mounts present on
> Gentoo servers with the following fstab mount configuration:
>
> On Linux Kernel 6.3.6 these work perfectly fine.
>
> As soon as I upgrade to 6.4 (tested 6.4.7 through 6.4.11) or 6.5-rc7 NFS
> mounts randomly hang and block system operation with high load times
> eventually resulting in a system freeze.

Not clear from the initial description: are you changing the kernel on your Synology devices (NFS servers), or on your Gentoo servers (NFS clients)?
This occurs with the Gentoo servers (NFS clients).

Gentoo server running kernel 6.3: stable.
As soon as I upgrade to kernel 6.4.x, 6.5-rc, or the Linux kernel from git, I get a reproducible high-load event and an unstable NFS mount.

There are no changes made to the Synology NAS units (NFS servers).

I will attempt to build the Linux kernel from git and use the bisect instructions, but this will take time I do not currently have.
On 25/08/2023 13:06, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217815
>
> --- Comment #3 from greg@greg.net.au ---
> This occurs with the Gentoo Servers (NFS clients).
>
> Gentoo Server running kernel 6.3 - stable
> As soon as I upgrade to kernel 6.4.x => 6.5-rc / linux-kernel from git I get a
> reproduceable high-load event and unstable NFS mout.
>
> There are no changes made to the Synology NAS units (NFS servers).
>
> I will attempt to build the linux kernel from git and use the bisect
> instructions but this will take time I do not currently have.
>

FYI, you can see Documentation/admin-guide/quickly-build-trimmed-linux.rst in the kernel sources for how to build a kernel with localmodconfig.

Thanks.
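
As a rough illustration of what a localmodconfig build involves, the sketch below trims the configuration to the modules currently loaded and then compiles. It assumes it is run from the top of the kernel source tree on the affected machine, with the NFS mounts active so that the relevant modules are loaded. The `run` wrapper and the `DRY_RUN` variable are illustrative additions, and installation steps are left out; the referenced document remains the authoritative guide.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build a kernel whose configuration is trimmed to the modules that are
# currently loaded on this machine.
build_trimmed_kernel() {
    # Accept the default answer for any new configuration prompts.
    yes "" | run make localmodconfig || return 1
    # Compile using all available CPU cores.
    run make -j"$(nproc)"
}

# Example: preview the commands without building anything.
#   DRY_RUN=1 build_trimmed_kernel
```

A trimmed configuration shortens each build considerably, which matters when a bisection needs many rounds.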
So an update from my end.

This appears to be an issue with the Intel i40e NIC driver and nothing to do with NFS: further testing revealed incredibly poor general network performance (tested using iperf).

I am now able to run 6.4.13 stable after pulling down the latest i40e driver from SourceForge and compiling it as a module manually.

This can probably be closed here, and I will open a ticket with the i40e driver team.
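
When comparing an in-tree driver against an out-of-tree build like this, it can help to record which driver and version an interface is actually using before and after the swap. A minimal sketch, assuming `ethtool` is installed; `eth0` is a placeholder interface name and the helper names are illustrative.

```shell
#!/bin/sh
# Extract one field ("driver", "version", ...) from `ethtool -i` output
# supplied on stdin. The output consists of "key: value" lines.
ethtool_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Print the driver name and version bound to a network interface.
nic_driver_info() {
    iface=$1
    info=$(ethtool -i "$iface") || return 1
    printf '%s %s\n' \
        "$(printf '%s\n' "$info" | ethtool_field driver)" \
        "$(printf '%s\n' "$info" | ethtool_field version)"
}

# Example (placeholder interface name):
#   nic_driver_info eth0
```

Including this output alongside the iperf results in the driver bug report makes it clear which build each measurement belongs to.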