Created attachment 257275 [details] NFS debug output too long to put in line This is causing NFS to hang to the point we had to reboot. We are running NIS on Fedora 25. NFS & NSF-server just won't start. At one point there was even a "systemd: Looping too fast. Throttling execution a little" message every second or so. I tried some of the debugging options here but I'm not sure what to look for. nfs-utils x86_64 1:2.1.1-5.rc4.fc25 kernel at the time was 4.11.5-200.fc25.x86_64 systemctl status nfs-idmapd * nfs-idmapd.service - NFSv4 ID-name mapping service Loaded: loaded (/usr/lib/systemd/system/nfs-idmapd.service; static; vendor preset: disabled) Active: active (running) since Fri 2017-06-30 21:49:52 EDT; 13min ago Process: 16991 ExecStart=/usr/sbin/rpc.idmapd $RPCIDMAPDARGS (code=exited, status=0/SUCCESS) Main PID: 16992 (rpc.idmapd) Tasks: 1 (limit: 4915) CGroup: /system.slice/nfs-idmapd.service `-16992 /usr/sbin/rpc.idmapd systemctl status nfs * nfs-server.service - NFS server and services Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled) Drop-In: /run/systemd/generator/nfs-server.service.d `-order-with-mounts.conf Active: activating (start) since Fri 2017-06-30 21:26:58 EDT; 37min ago Main PID: 15798 (rpc.nfsd) Tasks: 4 (limit: 4915) CGroup: /system.slice/nfs-server.service |- 5570 /usr/sbin/rpc.nfsd 8 |- 6053 /usr/sbin/rpc.nfsd 8 |- 8032 /usr/sbin/rpc.nfsd 8 `-15798 /usr/sbin/rpc.nfsd 8 Jun 30 21:26:58 ourserver systemd[1]: Starting NFS server and services... systemctl status nfs-server * nfs-server.service - NFS server and services Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled) Drop-In: /run/systemd/generator/nfs-server.service.d `-order-with-mounts.conf Active: activating (start) since Fri 2017-06-30 21:26:58 EDT; 38min ago Main PID: 15798 (rpc.nfsd) Tasks: 4 (limit: 4915) CGroup: /system.slice/nfs-server.service |- 5570 /usr/sbin/rpc.nfsd 8 |- 6053 /usr/sbin/rpc.nfsd 8 |- 8032 /usr/sbin/rpc.nfsd 8 `-15798 /usr/sbin/rpc.nfsd 8 Jun 30 21:26:58 ourserver systemd[1]: Starting NFS server and services... From messages logs: Jun 30 21:49:52 ourserver systemd[1]: Starting NFSv4 ID-name mapping service... Jun 30 21:49:52 ourserver rpc.idmapd[16991]: rpc.idmapd: conf_reinit: open ("(null)", O_RDONLY) failed Jun 30 21:49:52 ourserver rpc.idmapd[16991]: rpc.idmapd: conf_reinit: open ("(null)", O_RDONLY) failed Jun 30 21:49:52 ourserver systemd[1]: Started NFSv4 ID-name mapping service. Logs: Jun 29 02:23:27 dsm rpc.idmapd: rpc.idmapd: conf_reinit: open ("(null)", O_RDONLY) failed Jun 29 02:23:27 dsm rpc.idmapd: rpc.idmapd: conf_reinit: open ("(null)", O_RDONLY) failed Jun 29 02:23:27 dsm kernel: nfsd: last server has exited, flushing export cache Jun 29 02:23:27 dsm systemd: Starting NFS server and services... Jun 29 02:23:27 dsm kernel: divide error: 0000 [#1] SMP Jun 29 02:23:27 dsm kernel: Modules linked in: fuse arc4 md4 nls_utf8 cifs ccm rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_multiport ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_REJECT nf_reject_ipv6 nf_con ntrack_ipv6 iptable_nat nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_conntrack nf_nat nf_conntrack libcrc32c ip6table_filter ip6_tables iptable_mangle joydev coretemp kvm iTCO_wdt gpio_ich iTCO_vendor_support irqbypass ses ipmi_ssif enclosure dcdbas scsi_transport_sas i5000_edac ipmi_si edac_core lpc_ich ipmi_devintf i5k_amb ipmi_msghandler shpchp tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc uas usb_storage amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper ttm drm ata_gene ric pata_acpi serio_raw Jun 29 02:23:27 dsm kernel: megaraid_sas bnx2 Jun 29 02:23:27 dsm kernel: CPU: 4 PID: 7781 Comm: rpc.nfsd Not tainted 4.11.5-200.fc25.x86_64 #1 Jun 29 02:23:27 dsm kernel: Hardware name: Dell Inc. PowerEdge 1950/0TT740, BIOS 2.2.6 02/05/2008 Jun 29 02:23:27 dsm kernel: task: ffff9290e5034880 task.stack: ffffa770ce0c4000 Jun 29 02:23:27 dsm kernel: RIP: 0010:svc_pool_for_cpu+0x2b/0x80 [sunrpc] Jun 29 02:23:27 dsm kernel: RSP: 0018:ffffa770ce0c7c18 EFLAGS: 00010246 Jun 29 02:23:27 dsm kernel: RAX: 0000000000000000 RBX: ffff9290286a6000 RCX: 0000000000000002 Jun 29 02:23:27 dsm kernel: RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff928ec3d45500 Jun 29 02:23:27 dsm kernel: RBP: ffffa770ce0c7c18 R08: ffff928ec3d45528 R09: 0000000000018783 Jun 29 02:23:27 dsm kernel: R10: ffffffffc06fb100 R11: 0000000000000000 R12: ffff9290286a6010 Jun 29 02:23:27 dsm kernel: R13: ffff9290286a6018 R14: ffff928ec3d45528 R15: ffff928ec3d45500 Jun 29 02:23:27 dsm kernel: FS: 00007f0f69cacc40(0000) GS:ffff9290efd00000(0000) knlGS:0000000000000000 Jun 29 02:23:27 dsm kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 29 02:23:27 dsm kernel: CR2: 00007ffede0fe0a8 CR3: 00000001fb7c4000 CR4: 00000000000006e0 Jun 29 02:23:27 dsm kernel: Call Trace: Jun 29 02:23:27 dsm kernel: svc_xprt_do_enqueue+0xef/0x260 [sunrpc] Jun 29 02:23:27 dsm kernel: svc_xprt_received+0x47/0x90 [sunrpc] Jun 29 02:23:27 dsm kernel: svc_add_new_perm_xprt+0x76/0x90 [sunrpc] Jun 29 02:23:27 dsm kernel: svc_addsock+0x14b/0x200 [sunrpc] Jun 29 02:23:27 dsm kernel: ? recalc_sigpending+0x1b/0x50 Jun 29 02:23:27 dsm kernel: ? __getnstimeofday64+0x41/0xd0 Jun 29 02:23:27 dsm kernel: ? do_gettimeofday+0x29/0x90 Jun 29 02:23:27 dsm kernel: write_ports+0x255/0x2c0 [nfsd] Jun 29 02:23:27 dsm kernel: ? _copy_from_user+0x4e/0x80 Jun 29 02:23:27 dsm kernel: ? write_recoverydir+0x100/0x100 [nfsd] Jun 29 02:23:27 dsm kernel: nfsctl_transaction_write+0x48/0x80 [nfsd] Jun 29 02:23:27 dsm kernel: __vfs_write+0x37/0x160 Jun 29 02:23:27 dsm kernel: ? __inet_hash+0xd2/0x260 Jun 29 02:23:27 dsm kernel: vfs_write+0xb5/0x1a0 Jun 29 02:23:27 dsm kernel: SyS_write+0x55/0xc0 Jun 29 02:23:27 dsm kernel: entry_SYSCALL_64_fastpath+0x1a/0xa9 Jun 29 02:23:27 dsm kernel: RIP: 0033:0x7f0f695c8ae0 Jun 29 02:23:27 dsm kernel: RSP: 002b:00007ffede056ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 Jun 29 02:23:27 dsm kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f0f695c8ae0 Jun 29 02:23:27 dsm kernel: RDX: 0000000000000002 RSI: 00005637ab1dd600 RDI: 0000000000000003 Jun 29 02:23:27 dsm kernel: RBP: 00007ffede056ba0 R08: 0000000000000001 R09: 0000000000000002 Jun 29 02:23:27 dsm kernel: R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000004 Jun 29 02:23:27 dsm kernel: R13: 00005637abfa37a0 R14: 00005637abfa3720 R15: 00007ffede056658 Jun 29 02:23:27 dsm kernel: Code: 66 66 66 66 90 48 8b 87 98 00 00 00 55 48 89 e5 48 83 78 08 00 74 10 8b 05 97 51 02 00 83 f8 01 74 40 83 f8 02 74 19 31 c0 31 d2 <f7> b7 88 00 00 00 5d 89 d0 48 c1 e0 07 48 03 87 90 00 00 00 c3 Jun 29 02:23:27 dsm kernel: RIP: svc_pool_for_cpu+0x2b/0x80 [sunrpc] RSP: ffffa770ce0c7c18 Jun 29 02:23:27 dsm kernel: ---[ end trace 7a2bfc4aacf7a21e ]---
Looks like upstream df807fffaabd "NFSv4.x/callback: Create the callback service through svc_create_pooled" should have gotten a stable cc. Done; may take time to make it into stable tree and fedora. Open a fedora bug referencing this and it may be possible to get it into Fedora sooner. Reopen if this is still reproduceable on a kernel with that patch.
Thanks for the quick fix I did create a Bugzilla for Fedora just before this one https://bugzilla.redhat.com/show_bug.cgi?id=1466850 I would like to downgrade to a kernel that worked and I believe we'd have to go back to 4.10.15, would the instructions at https://community.arubanetworks.com/t5/Monitoring-Management-Location/Downgrading-the-linux-Kernel/ta-p/173612 work? "If the 2.6.9-11 kernel is not already installed you will need to put a copy of the rpm on the box. Once the RPM is on the box run "rpm -Uvh --force <kernel>". Then check /etc/grub.conf to make sure the 11 kernel is at the top of the list and reboot. When the server comes up it will load the 11 kernel and the memory issues should be resolved."