Bug 196235 - Fedora 25 NFS & NFS-server w/ NIS won't start after upgrade to nfs-utils x86_64 1:2.1.1-5.rc4; kernel: divide error: 0000 [#1] SMP
Summary: Fedora 25 NFS & NFS-server w/ NIS won't start after upgrade to nfs-utils x86_...
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS (show other bugs)
Hardware: x86-64 Linux
: P1 blocking
Assignee: bfields
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-01 03:34 UTC by RobbieTheK
Modified: 2017-07-01 17:22 UTC (History)
1 user (show)

See Also:
Kernel Version: 4.11.7-200.fc25.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
NFS debug output too long to put in line (97.17 KB, text/rtf)
2017-07-01 03:34 UTC, RobbieTheK
Details

Description RobbieTheK 2017-07-01 03:34:52 UTC
Created attachment 257275 [details]
NFS debug output too long to put in line

This is causing NFS to hang to the point we had to reboot. We are running NIS on Fedora 25. NFS & NSF-server just won't start. At one point there was even a "systemd: Looping too fast. Throttling execution a little" message every second or so. I tried some of the debugging options here but I'm not sure what to look for. 

nfs-utils x86_64 1:2.1.1-5.rc4.fc25
kernel at the time was 4.11.5-200.fc25.x86_64

systemctl status nfs-idmapd
* nfs-idmapd.service - NFSv4 ID-name mapping service
   Loaded: loaded (/usr/lib/systemd/system/nfs-idmapd.service; static; vendor
preset: disabled)
   Active: active (running) since Fri 2017-06-30 21:49:52 EDT; 13min ago
  Process: 16991 ExecStart=/usr/sbin/rpc.idmapd $RPCIDMAPDARGS (code=exited,
status=0/SUCCESS)
 Main PID: 16992 (rpc.idmapd)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/nfs-idmapd.service
           `-16992 /usr/sbin/rpc.idmapd

 systemctl status nfs       
* nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor
preset: disabled)
  Drop-In: /run/systemd/generator/nfs-server.service.d
           `-order-with-mounts.conf
   Active: activating (start) since Fri 2017-06-30 21:26:58 EDT; 37min ago
 Main PID: 15798 (rpc.nfsd)
    Tasks: 4 (limit: 4915)
   CGroup: /system.slice/nfs-server.service
           |- 5570 /usr/sbin/rpc.nfsd 8
           |- 6053 /usr/sbin/rpc.nfsd 8
           |- 8032 /usr/sbin/rpc.nfsd 8
           `-15798 /usr/sbin/rpc.nfsd 8

Jun 30 21:26:58 ourserver systemd[1]: Starting NFS server and services...

 systemctl status nfs-server
* nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor
preset: disabled)
  Drop-In: /run/systemd/generator/nfs-server.service.d
           `-order-with-mounts.conf
   Active: activating (start) since Fri 2017-06-30 21:26:58 EDT; 38min ago
 Main PID: 15798 (rpc.nfsd)
    Tasks: 4 (limit: 4915)
   CGroup: /system.slice/nfs-server.service
           |- 5570 /usr/sbin/rpc.nfsd 8
           |- 6053 /usr/sbin/rpc.nfsd 8
           |- 8032 /usr/sbin/rpc.nfsd 8
           `-15798 /usr/sbin/rpc.nfsd 8

Jun 30 21:26:58 ourserver systemd[1]: Starting NFS server and services...

From messages logs:
Jun 30 21:49:52 ourserver systemd[1]: Starting NFSv4 ID-name mapping service...
Jun 30 21:49:52 ourserver rpc.idmapd[16991]: rpc.idmapd: conf_reinit: open
("(null)", O_RDONLY) failed
Jun 30 21:49:52 ourserver rpc.idmapd[16991]: rpc.idmapd: conf_reinit: open
("(null)", O_RDONLY) failed
Jun 30 21:49:52 ourserver systemd[1]: Started NFSv4 ID-name mapping service.

Logs:
Jun 29 02:23:27 dsm rpc.idmapd: rpc.idmapd: conf_reinit: open ("(null)",
O_RDONLY) failed
Jun 29 02:23:27 dsm rpc.idmapd: rpc.idmapd: conf_reinit: open ("(null)",
O_RDONLY) failed
Jun 29 02:23:27 dsm kernel: nfsd: last server has exited, flushing export cache
Jun 29 02:23:27 dsm systemd: Starting NFS server and services...
Jun 29 02:23:27 dsm kernel: divide error: 0000 [#1] SMP
Jun 29 02:23:27 dsm kernel: Modules linked in: fuse arc4 md4 nls_utf8 cifs ccm
rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill nf_log_ipv4
nf_log_common xt_LOG xt_limit xt_multiport ipt_MASQUERADE
nf_nat_masquerade_ipv4 ip6t_REJECT nf_reject_ipv6 nf_con
ntrack_ipv6 iptable_nat nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 xt_conntrack nf_nat nf_conntrack libcrc32c ip6table_filter
ip6_tables iptable_mangle joydev coretemp kvm iTCO_wdt gpio_ich
iTCO_vendor_support irqbypass ses ipmi_ssif enclosure dcdbas 
scsi_transport_sas i5000_edac ipmi_si edac_core lpc_ich ipmi_devintf i5k_amb
ipmi_msghandler shpchp tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd
grace sunrpc binfmt_misc uas usb_storage amdkfd amd_iommu_v2 radeon
i2c_algo_bit drm_kms_helper ttm drm ata_gene
ric pata_acpi serio_raw
Jun 29 02:23:27 dsm kernel: megaraid_sas bnx2
Jun 29 02:23:27 dsm kernel: CPU: 4 PID: 7781 Comm: rpc.nfsd Not tainted
4.11.5-200.fc25.x86_64 #1
Jun 29 02:23:27 dsm kernel: Hardware name: Dell Inc. PowerEdge 1950/0TT740,
BIOS 2.2.6 02/05/2008
Jun 29 02:23:27 dsm kernel: task: ffff9290e5034880 task.stack: ffffa770ce0c4000
Jun 29 02:23:27 dsm kernel: RIP: 0010:svc_pool_for_cpu+0x2b/0x80 [sunrpc]
Jun 29 02:23:27 dsm kernel: RSP: 0018:ffffa770ce0c7c18 EFLAGS: 00010246
Jun 29 02:23:27 dsm kernel: RAX: 0000000000000000 RBX: ffff9290286a6000 RCX:
0000000000000002
Jun 29 02:23:27 dsm kernel: RDX: 0000000000000000 RSI: 0000000000000004 RDI:
ffff928ec3d45500
Jun 29 02:23:27 dsm kernel: RBP: ffffa770ce0c7c18 R08: ffff928ec3d45528 R09:
0000000000018783
Jun 29 02:23:27 dsm kernel: R10: ffffffffc06fb100 R11: 0000000000000000 R12:
ffff9290286a6010
Jun 29 02:23:27 dsm kernel: R13: ffff9290286a6018 R14: ffff928ec3d45528 R15:
ffff928ec3d45500
Jun 29 02:23:27 dsm kernel: FS:  00007f0f69cacc40(0000)
GS:ffff9290efd00000(0000) knlGS:0000000000000000
Jun 29 02:23:27 dsm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 29 02:23:27 dsm kernel: CR2: 00007ffede0fe0a8 CR3: 00000001fb7c4000 CR4:
00000000000006e0
Jun 29 02:23:27 dsm kernel: Call Trace:
Jun 29 02:23:27 dsm kernel: svc_xprt_do_enqueue+0xef/0x260 [sunrpc]
Jun 29 02:23:27 dsm kernel: svc_xprt_received+0x47/0x90 [sunrpc]
Jun 29 02:23:27 dsm kernel: svc_add_new_perm_xprt+0x76/0x90 [sunrpc]
Jun 29 02:23:27 dsm kernel: svc_addsock+0x14b/0x200 [sunrpc]
Jun 29 02:23:27 dsm kernel: ? recalc_sigpending+0x1b/0x50
Jun 29 02:23:27 dsm kernel: ? __getnstimeofday64+0x41/0xd0
Jun 29 02:23:27 dsm kernel: ? do_gettimeofday+0x29/0x90
Jun 29 02:23:27 dsm kernel: write_ports+0x255/0x2c0 [nfsd]
Jun 29 02:23:27 dsm kernel: ? _copy_from_user+0x4e/0x80
Jun 29 02:23:27 dsm kernel: ? write_recoverydir+0x100/0x100 [nfsd]
Jun 29 02:23:27 dsm kernel: nfsctl_transaction_write+0x48/0x80 [nfsd]
Jun 29 02:23:27 dsm kernel: __vfs_write+0x37/0x160
Jun 29 02:23:27 dsm kernel: ? __inet_hash+0xd2/0x260
Jun 29 02:23:27 dsm kernel: vfs_write+0xb5/0x1a0
Jun 29 02:23:27 dsm kernel: SyS_write+0x55/0xc0
Jun 29 02:23:27 dsm kernel: entry_SYSCALL_64_fastpath+0x1a/0xa9
Jun 29 02:23:27 dsm kernel: RIP: 0033:0x7f0f695c8ae0
Jun 29 02:23:27 dsm kernel: RSP: 002b:00007ffede056ba8 EFLAGS: 00000246
ORIG_RAX: 0000000000000001
Jun 29 02:23:27 dsm kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
00007f0f695c8ae0
Jun 29 02:23:27 dsm kernel: RDX: 0000000000000002 RSI: 00005637ab1dd600 RDI:
0000000000000003
Jun 29 02:23:27 dsm kernel: RBP: 00007ffede056ba0 R08: 0000000000000001 R09:
0000000000000002
Jun 29 02:23:27 dsm kernel: R10: 0000000000000064 R11: 0000000000000246 R12:
0000000000000004
Jun 29 02:23:27 dsm kernel: R13: 00005637abfa37a0 R14: 00005637abfa3720 R15:
00007ffede056658
Jun 29 02:23:27 dsm kernel: Code: 66 66 66 66 90 48 8b 87 98 00 00 00 55 48 89
e5 48 83 78 08 00 74 10 8b 05 97 51 02 00 83 f8 01 74 40 83 f8 02 74 19 31 c0
31 d2 <f7> b7 88 00 00 00 5d 89 d0 48 c1 e0 07 48 03 87 90 00 00 00 c3 
Jun 29 02:23:27 dsm kernel: RIP: svc_pool_for_cpu+0x2b/0x80 [sunrpc] RSP:
ffffa770ce0c7c18
Jun 29 02:23:27 dsm kernel: ---[ end trace 7a2bfc4aacf7a21e ]---
Comment 1 bfields 2017-07-01 16:08:11 UTC
Looks like upstream df807fffaabd "NFSv4.x/callback: Create the callback service through svc_create_pooled" should have gotten a stable cc.  Done; may take time to make it into stable tree and fedora.  Open a fedora bug referencing this and it may be possible to get it into Fedora sooner.

Reopen if this is still reproduceable on a kernel with that patch.
Comment 2 RobbieTheK 2017-07-01 17:22:11 UTC
Thanks for the quick fix I did create a Bugzilla for Fedora just before this one https://bugzilla.redhat.com/show_bug.cgi?id=1466850

I would like to downgrade to a kernel that worked and I believe we'd have to go back to 4.10.15, would the instructions at https://community.arubanetworks.com/t5/Monitoring-Management-Location/Downgrading-the-linux-Kernel/ta-p/173612 work?

"If the 2.6.9-11 kernel is not already installed you will need to put a copy of the rpm on the box. Once the RPM is on the box run "rpm -Uvh --force <kernel>". Then check /etc/grub.conf to make sure the 11 kernel is at the top of the list and reboot. When the server comes up it will load the 11 kernel and the memory issues should be resolved."

Note You need to log in before you can comment on or make changes to this bug.