Bug 200379 - kernel panic in NFSv4 server on high load (1000+/sec accesses from 3 clients)
Summary: kernel panic in NFSv4 server on high load (1000+/sec accesses from 3 clients)
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: x86-64 Linux
Importance: P1 high
Assignee: bfields
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-30 15:31 UTC by Robert Hoffmann
Modified: 2018-08-22 15:08 UTC (History)
6 users

See Also:
Kernel Version: 4.17.2-200.fc28.x86_64 #1 SMP
Tree: Mainline
Regression: No


Attachments
here is the output from abrt-cli list -d (4.22 KB, application/gzip)
2018-06-30 15:31 UTC, Robert Hoffmann

Description Robert Hoffmann 2018-06-30 15:31:39 UTC
Created attachment 277083 [details]
here is the output from abrt-cli list -d

Hello guys,

Linux boa 4.17.2-200.fc28.x86_64 #1 SMP Mon Jun 18 20:09:31 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Running on: 2x 16-core AMD EPYC 7351s (-MCP-SMP-) arch: Zen rev.2
Mobo: Supermicro model: H11DSU-iN v: 1.02A

Basically, I am trying to cache many small image files (avg. 12 KB each) from 3 client machines (running FC25 32-bit).  I have an NVMe drive mounted in the server as /NVME0, formatted ext4.

The clients mount the partition as follows:
mount -t nfs -o vers=4,noatime,nodiratime,norelatime,rsize=32768,wsize=32768 boa.internal:/NVME0 /boa/NVME0
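The server-side export was not included in the report; a hypothetical matching /etc/exports entry, plus the standard commands to apply and verify it, might look like this (the export path is from the report, the options and client pattern are assumptions):

```shell
# Hypothetical /etc/exports entry on the server -- options are assumed,
# the actual export configuration was not posted:
#   /NVME0  *.internal(rw,sync,no_subtree_check)

# After editing /etc/exports, re-export all entries:
exportfs -ra

# Verify what is currently exported and with which options:
exportfs -v
```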

As the cache fills up, it's doing approximately 1000 (writes+reads) per second.

The server runs "ok", but as soon as I try to do something else IN the server, i.e. copy 2 files between 2 different disks (not the NVME0)
I get this:
(from /var/log/messages)
----------------------------------------------------------------
Jun 30 13:55:08 boa kernel: general protection fault: 0000 [#13] SMP NOPTI
Jun 30 13:55:08 boa kernel: Modules linked in: xfs btrfs xor zstd_compress raid6_pq zstd_decompress xxhash xt_CHECKSUM ip6t_MASQUERADE nf_nat_masquerade_ipv6 ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables macvtap macvlan vhost_net vhost tap nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache devlink tun cfg80211 rfkill ip_set nfnetlink bridge stp llc libcrc32c amd64_edac_mod edac_mce_amd kvm_amd kvm joydev irqbypass
Jun 30 13:55:08 boa kernel: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif ccp sp5100_tco k10temp i2c_piix4 ipmi_si shpchp ipmi_devintf ipmi_msghandler pinctrl_amd acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper mpt3sas ttm igb raid_class crc32c_intel drm scsi_transport_sas dca nvme i2c_algo_bit nvme_core [last unloaded: ip6_tables]
Jun 30 13:55:08 boa kernel: CPU: 21 PID: 2895 Comm: nfsd Tainted: G      D           4.17.2-200.fc28.x86_64 #1
Jun 30 13:55:08 boa kernel: Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.1 02/07/2018
Jun 30 13:55:08 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:55:08 boa kernel: RSP: 0018:ffffa90491ec7c60 EFLAGS: 00010206
Jun 30 13:55:08 boa kernel: RAX: 0000000000000000 RBX: 6ede0f286c1041d1 RCX: ffff937141ef3290
Jun 30 13:55:08 boa kernel: RDX: 000000000002fba2 RSI: 00000000014080c0 RDI: 000000000002c2e0
Jun 30 13:55:08 boa kernel: RBP: ffff9365debb56c0 R08: ffff9371dfa6c2e0 R09: 0000000000000004
Jun 30 13:55:08 boa kernel: R10: 0000000000000000 R11: 0000000000000032 R12: 00000000014080c0
Jun 30 13:55:08 boa kernel: R13: ffffffffc0372ac1 R14: 6ede0f286c1041d1 R15: ffff9365debb56c0
Jun 30 13:55:08 boa kernel: FS:  0000000000000000(0000) GS:ffff9371dfa40000(0000) knlGS:0000000000000000
Jun 30 13:55:08 boa kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 30 13:55:08 boa kernel: CR2: 000056433f0c3140 CR3: 0000001ad020a000 CR4: 00000000003406e0
Jun 30 13:55:08 boa kernel: Call Trace:
Jun 30 13:55:08 boa kernel: ? nfsd4_free_file_rcu+0x20/0x20 [nfsd]
Jun 30 13:55:08 boa kernel: nfs4_alloc_stid+0x21/0xa0 [nfsd]
Jun 30 13:55:08 boa kernel: nfsd4_process_open2+0x1048/0x1360 [nfsd]
Jun 30 13:55:08 boa kernel: ? nfsd_permission+0x63/0xe0 [nfsd]
Jun 30 13:55:08 boa kernel: ? fh_verify+0x17a/0x5b0 [nfsd]
Jun 30 13:55:08 boa kernel: ? nfsd4_process_open1+0x139/0x420 [nfsd]
Jun 30 13:55:08 boa kernel: nfsd4_open+0x2b1/0x6b0 [nfsd]
Jun 30 13:55:08 boa kernel: nfsd4_proc_compound+0x33e/0x640 [nfsd]
Jun 30 13:55:08 boa kernel: nfsd_dispatch+0x9e/0x210 [nfsd]
Jun 30 13:55:08 boa kernel: svc_process_common+0x46e/0x6c0 [sunrpc]
Jun 30 13:55:08 boa kernel: ? nfsd_destroy+0x50/0x50 [nfsd]
Jun 30 13:55:08 boa kernel: svc_process+0xb7/0xf0 [sunrpc]
Jun 30 13:55:08 boa kernel: nfsd+0xe3/0x140 [nfsd]
Jun 30 13:55:08 boa kernel: kthread+0x112/0x130
Jun 30 13:55:08 boa kernel: ? kthread_create_worker_on_cpu+0x70/0x70
Jun 30 13:55:08 boa kernel: ret_from_fork+0x22/0x40
Jun 30 13:55:08 boa kernel: Code: 50 08 65 4c 03 05 37 62 da 5d 49 83 78 10 00 4d 8b 30 0f 84 1b 01 00 00 4d 85 f6 0f 84 12 01 00 00 41 8b 5f 20 49 8b 3f 4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 40 f6 c7 0f 0f 85 33 01 00 00 48 
Jun 30 13:55:08 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90491ec7c60
Jun 30 13:55:08 boa kernel: ---[ end trace 1c7f01c931917119 ]---
Jun 30 13:55:09 boa abrt-dump-journal-oops[1604]: abrt-dump-journal-oops: Found oopses: 1
Jun 30 13:55:09 boa abrt-dump-journal-oops[1604]: abrt-dump-journal-oops: Creating problem directories
----------------------------------------------------------------
After this, NFS becomes unresponsive on the clients, and everything just slows down to a grinding halt.
Comment 1 Robert Hoffmann 2018-06-30 15:35:25 UTC
Oh, I forgot: the NFSv4 server is running 32 threads.
I set this in /etc/sysconfig/nfs:
RPCNFSDCOUNT=32
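For reference, a sketch of how the nfsd thread count can be inspected and changed at runtime (assuming standard nfs-utils tooling; the RPCNFSDCOUNT setting itself only takes effect when the NFS server service is restarted):

```shell
# Show the current number of running nfsd threads (nfsd must be running):
cat /proc/fs/nfsd/threads

# Change the thread count on the fly, without restarting the service:
rpc.nfsd 32

# To make the RPCNFSDCOUNT setting take effect persistently:
systemctl restart nfs-server
```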
Comment 2 Robert Hoffmann 2018-06-30 15:37:59 UTC
Also, the clients are writing files into the NFSv4 server, using this:

#include <fcntl.h>
#include <sys/sendfile.h>
#include <unistd.h>

/* Copies `bytes` bytes from `from` to `to` via sendfile().
   Note: open()/sendfile() return values are not checked here, and
   sendfile() may transfer fewer than `bytes` bytes in one call. */
int copy_file(const char *from, const char *to, size_t bytes)
{
	int in_fd, out_fd;

	in_fd = open(from, O_RDONLY);
	out_fd = open(to, O_WRONLY | O_CREAT, 0777);
	sendfile(out_fd, in_fd, NULL, bytes);
	close(in_fd);
	close(out_fd);
	return 0;
}
Comment 3 Robert Hoffmann 2018-06-30 18:08:00 UTC
I was reading around the net, and found something regarding SMB mounts, which gets fixed when using O_RDWR  instead of O_WRONLY
(https://github.com/syncthing/syncthing/pull/2994 )

could it have something to do with it ?
Comment 4 bfields 2018-06-30 19:49:36 UTC
(In reply to Robert Hoffmann from comment #3)
> I was reading around the net, and found something regarding SMB mounts,
> which gets fixed when using O_RDWR  instead of O_WRONLY
> (https://github.com/syncthing/syncthing/pull/2994 )
> 
> could it have something to do with it ?

I don't see any connection.
Comment 5 bfields 2018-06-30 19:55:12 UTC
"as soon as I try to do something else IN the server, i.e. copy 2 files between 2 different disks (not the NVME0)"

I'm not sure I understand what you mean by "something else IN the server".

Is this completely reproducible?

Also, is this the *first* bug in the logs, or if you look back through them do you find anything else?

That's a strange place to crash: it suggests the "slab" argument to nfs4_alloc_stid is bad, but all that's ever passed to those is caches that are created on server startup and destroyed on server shutdown.
Comment 6 Robert Hoffmann 2018-06-30 20:38:07 UTC
(In reply to bfields from comment #5)
> "as soon as I try to do something else IN the server, i.e. copy 2 files
> between 2 different disks (not the NVME0)"
> 
> I'm not sure I understand what you mean by "something else IN the server".
> 
> Is this completely reproducible?
> 
> Also, is this the *first* bug in the logs, or if you look back through them
> do you find anything else?
> 
> That's a strange place to crash: it suggests the "slab" argument to
> nfs4_alloc_stid is bad, but all that's ever passed to those is caches that
> are created on server startup and destroyed on server shutdown.

This is the first non-PEBKAC error since reinstalling the box from CentOS 7 (kernel 3.10) to FC28 (kernel 4.17).

The only thing I did that was "new" to the box is utilize it as an NFSv4 server with a gazillion requests, writing into an NVMe. Which I thought would be a good idea.
I will test tonight with the clients mounting it as NFSv3, and see if the error happens again.

After 10 hours of running (if it lasts that long), I will try to copy those 2 files again on the same server, from some other disk to some third disk, to see if I can trigger the event, if it hasn't already happened.
Comment 7 Robert Hoffmann 2018-06-30 21:11:48 UTC
> 
> Is this completely reproducible?
> 

Oh, now I get it: yes, this is the second time in 24 hours, same error.
Since it's a production system, I can't just cause 20 reboots because I feel like it. So I will see if it happens again now that it's mounted NFSv3, and I will let you know if it survives a few days.
Comment 8 Robert Hoffmann 2018-06-30 21:14:11 UTC
To be more precise:

grep "kernel: RIP:" /var/log/messages
Jun 30 01:02:15 boa kernel: RIP: 0010:prefetch_freepointer+0x10/0x20
Jun 30 01:02:15 boa kernel: RIP: prefetch_freepointer+0x10/0x20 RSP: ffffa23ed1657c58
Jun 30 01:02:52 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 01:02:52 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa23ed067bc60
Jun 30 01:02:52 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 01:02:52 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa23ed061bc60
Jun 30 01:02:57 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 01:02:57 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa23ed058fc60
Jun 30 01:03:08 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 01:03:08 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa23ed0587c60
Jun 30 01:03:42 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 01:03:42 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa23ed0547c60
Jun 30 01:03:43 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 01:03:43 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa23ed05bfc60
Jun 30 01:03:51 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 01:03:51 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa23ed0507c60
Jun 30 13:42:09 boa kernel: RIP: 0010:prefetch_freepointer+0x10/0x20
Jun 30 13:42:09 boa kernel: RIP: prefetch_freepointer+0x10/0x20 RSP: ffffa90492027c58
Jun 30 13:42:10 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:42:10 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa9049201fc60
Jun 30 13:42:53 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:42:53 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa9049202fc60
Jun 30 13:42:55 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:42:55 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90492017c60
Jun 30 13:42:59 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:42:59 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90491eefc60
Jun 30 13:43:01 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:43:01 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90491ef7c60
Jun 30 13:43:28 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:43:28 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90492007c60
Jun 30 13:43:30 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:43:30 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90491ee7c60
Jun 30 13:44:10 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:44:10 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa9049200fc60
Jun 30 13:44:19 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:44:19 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90491ecfc60
Jun 30 13:44:39 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:44:39 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90491edfc60
Jun 30 13:45:04 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:45:04 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90491ed7c60
Jun 30 13:55:08 boa kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1d0
Jun 30 13:55:08 boa kernel: RIP: kmem_cache_alloc+0x82/0x1d0 RSP: ffffa90491ec7c60
Jun 30 13:56:10 boa kernel: RIP: 0010:_raw_spin_lock+0xc/0x20
Jun 30 13:56:10 boa kernel: RIP: 0033:0x7f01a0dfa0b4
Jun 30 13:56:10 boa kernel: RIP: _raw_spin_lock+0xc/0x20 RSP: ffffa904909cfcf0
Comment 9 Robert Hoffmann 2018-07-01 23:55:02 UTC
The server survived the day with the same NVME0 partition mounted as NFSv3 on the clients.

The next task will be to empty the cache so that the concurrent writes can occur, which is where I think the problem will manifest itself.

And if it survives that, I will remount with NFSv4, until the error reappears.
Comment 10 Adam Williamson 2018-07-04 19:25:48 UTC
Saw what looks like the same thing on some official Fedora boxes (our openQA instance), filed an RHBZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1598229

I didn't try using NFSv3 instead (we'd probably run into firewall issues), I just downgraded the boxes all back to kernel 4.16, which worked fine.
Comment 11 Adam Williamson 2018-07-04 19:28:50 UTC
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: general protection fault: 0000 [#1] SMP PTI
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: Modules linked in: rpcsec_gss_krb5 xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat bridge stp llc tun ebtable_filter ebtables nfsv3 nfs fscache binfmt_misc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_generic sb_edac kvm_intel snd_hda_intel snd_hda_codec kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass xt_conntrack snd_hda_core nf_conntrack crct10dif_pclmul crc32_pclmul snd_hwdep libcrc32c crc32c_intel snd_seq ghash_clmulni_intel snd_seq_device snd_pcm joydev snd_timer virtio_balloon snd soundcore virtio_net i2c_piix4 i6300esb nfsd nfs_acl lockd auth_rpcgss grace sunrpc qxl drm_kms_helper ttm drm serio_raw virtio_console virtio_blk ata_generic pata_acpi qemu_fw_cfg
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: CPU: 0 PID: 821 Comm: nfsd Not tainted 4.17.4-100.fc27.x86_64 #1
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: RIP: 0010:prefetch_freepointer+0x10/0x20
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: RSP: 0018:ffffbedfc3ca7c48 EFLAGS: 00010286
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: RAX: 0000000000000000 RBX: b687250af1fc0395 RCX: 0000000000000006
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: RDX: 0000000000000005 RSI: b687250af1fc0395 RDI: ffff98b125dd6280
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: RBP: ffff98b125dd6280 R08: ffff98b12b02b240 R09: ffff98ac36fec000
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: R10: ffffbedfc3ca7cb0 R11: 0000000000000000 R12: 00000000014080c0
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: R13: ffffffffc0322a21 R14: ffff98ac36db2cb2 R15: ffff98b125dd6280
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: FS:  0000000000000000(0000) GS:ffff98b12b000000(0000) knlGS:0000000000000000
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: CR2: 0000562bb9b59000 CR3: 000000029920a006 CR4: 00000000000606f0
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: Call Trace:
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  kmem_cache_alloc+0xb4/0x1c0
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  ? nfsd4_free_file_rcu+0x20/0x20 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  nfs4_alloc_stid+0x21/0xa0 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  nfsd4_process_open2+0xb84/0x14c0 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  ? nfsd_permission+0x5a/0xf0 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  ? fh_verify+0x44b/0x600 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  ? nfsd4_open+0x2dd/0x700 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  nfsd4_open+0x2dd/0x700 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  nfsd4_proc_compound+0x4f9/0x6e0 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  nfsd_dispatch+0xf5/0x230 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  svc_process_common+0x4c3/0x720 [sunrpc]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  ? nfsd_destroy+0x60/0x60 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  svc_process+0xd7/0xf0 [sunrpc]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  nfsd+0xe3/0x150 [nfsd]
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  kthread+0x113/0x130
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel:  ret_from_fork+0x35/0x40
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: Code: 75 58 48 c7 c7 a8 21 0d a4 e8 9b 7d ea ff eb 90 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 85 f6 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e f3 c3 90 66 66 66 66 90 
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: RIP: prefetch_freepointer+0x10/0x20 RSP: ffffbedfc3ca7c48
Jul 04 03:54:55 openqa01.qa.fedoraproject.org kernel: ---[ end trace ddcc593e8aac0749 ]---
Comment 12 Robert Hoffmann 2018-07-16 18:06:34 UTC
I have tested the exact same workload on an Intel-CPU-based server, and there are no errors.

On the AMD EPYC server, I have encountered this error 3 more times in the last 2 weeks, all with NFSv4.

I have now switched all the clients to mount NFSv3, and I will post again after a few weeks to report whether the error has gone away on the EPYC.
Comment 13 Adam Williamson 2018-07-16 18:21:19 UTC
Note, someone mentioned to me that several NFS-related fixes landed in 4.17.4 and they might help. I haven't yet tried switching one of our affected servers to that kernel version to see if it's fixed, but I will do so soon.
Comment 14 Robert Hoffmann 2018-07-17 07:42:46 UTC
I just upgraded to 4.17.5-200; however, all my clients are now remounted using NFSv3.

Also, I forgot to add: 2 weeks before the last error (Friday the 13th), which was the same as the top error I posted, I also got this error on kernel 4.17.2-200:

[root@boa ~]# more /var/spool/abrt/oops-2018-06-30-13:59:07-1629-0/backtrace
BUG: unable to handle kernel paging request at 0000000000001808
PGD 0 P4D 0 
Oops: 0002 [#14] SMP NOPTI
Modules linked in: xfs btrfs xor zstd_compress raid6_pq zstd_decompress xxhash xt_CHECKSUM ip6t_MASQUERADE nf_nat_masquerade_ipv6 ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables macvtap macvlan vhost_net vhost tap nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache devlink tun cfg80211 rfkill ip_set nfnetlink bridge stp llc libcrc32c amd64_edac_mod edac_mce_amd kvm_amd kvm joydev irqbypass
 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif ccp sp5100_tco k10temp i2c_piix4 ipmi_si shpchp ipmi_devintf ipmi_msghandler pinctrl_amd acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper mpt3sas ttm igb raid_class crc32c_intel drm scsi_transport_sas dca nvme i2c_algo_bit nvme_core [last unloaded: ip6_tables]
CPU: 58 PID: 54391 Comm: rpc.nfsd Tainted: G      D           4.17.2-200.fc28.x86_64 #1
Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.1 02/07/2018
RIP: 0010:_raw_spin_lock+0xc/0x20
RSP: 0018:ffffa904909cfcf0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffa904909cfd18 RDI: 0000000000001808
RBP: ffff9365d8d7be00 R08: ffff9365d8d7be00 R09: 0000000000000001
R10: 0000000000000008 R11: f000000000000000 R12: ffffa904909cfd18
R13: 0000000000000001 R14: ffff9375d904bf00 R15: ffff9375d904bf04
FS:  00007f01a171c5c0(0000) GS:ffff9375dfb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001808 CR3: 0000001c0f9b4000 CR4: 00000000003406e0
Call Trace:
 __lock_task_sighand+0x3c/0x70
 do_send_sig_info+0x35/0x90
 svc_set_num_threads+0x107/0x1a0 [sunrpc]
 nfsd_svc+0x142/0x270 [nfsd]
 ? write_leasetime+0x70/0x70 [nfsd]
 write_threads+0x7f/0xe0 [nfsd]
 ? _copy_from_user+0x37/0x60
 ? simple_transaction_get+0x8d/0xd0
 nfsctl_transaction_write+0x45/0x80 [nfsd]
 __vfs_write+0x36/0x170
 vfs_write+0xa5/0x1a0
 ksys_write+0x4f/0xb0
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f01a0dfa0b4
RSP: 002b:00007fff698112f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00005624fc7a8420 RCX: 00007f01a0dfa0b4
--More--(82%)

Also, NFS related, but different.
Comment 15 Robert Hoffmann 2018-07-17 07:47:30 UTC
Funny thing is that in the 4.17.2-200 error, which is the spinlock one, we see the line:
CPU: 58 PID: 54391 Comm: rpc.nfsd Tainted: G      D           4.17.2-200.fc28.x86_64 #1

Yet in my Friday the 13th error, which was the kmem_cache_alloc error, we see:
CPU: 57 PID: 2940 Comm: nfsd Not tainted 4.17.3-200.fc28.x86_64 #1

I wonder why sometimes it is Tainted and sometimes not.  I haven't added any kernel packages or extra VM software (this machine runs standard KVM/QEMU with 2 instances).
Comment 16 Robert Hoffmann 2018-07-17 08:00:52 UTC
Another interesting observation: when running the same NFSv4 workload on an Intel-based server, the nfsd processes are evenly spread out in the "top" process list, each using maybe 2-3% CPU. On the AMD EPYC box, however, there was one nfsd process using around 15% CPU, with the rest all at 1% or less and much lower down the list.
Comment 17 Adam Williamson 2018-07-17 16:34:11 UTC
Taint flags are documented. 'G' is just informational: it means all loaded modules are GPL (which is good). 'D' means "the kernel has died recently, i.e. there was an OOPS or BUG" - basically, any trace is tainted if there was *any previous trace*. Often it just means the *same* error/trace is happening over and over, so to find the untainted one you just have to scroll back to the first occurrence.

https://www.kernel.org/doc/html/v4.15/admin-guide/tainted-kernels.html
Comment 18 Kevin Liu 2018-07-20 17:28:01 UTC
I'm also getting this issue on NixOS, running Linux 4.17.5:

Linux rem 4.17.5-NixOS #1-NixOS SMP PREEMPT Sun Jul 8 13:32:21 UTC 2018 x86_64 GNU/Linux

Jul 20 13:13:17 rem kernel: general protection fault: 0000 [#2] PREEMPT SMP PTI
Jul 20 13:13:17 rem kernel: Modules linked in: bfq tcp_diag inet_diag sctp rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache veth xt_statistic xt_nat xt_recen>
Jul 20 13:13:17 rem kernel:  xt_conntrack nf_conntrack libcrc32c xt_tcpudp ip6t_rpfilter ipt_rpfilter chash gpu_sched ip6table_raw ttm iptable_raw nls_iso8>
Jul 20 13:13:17 rem kernel:  ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt hid_logitech_hidpp hid_logitech_dj input_leds led_class hid_generic u>
Jul 20 13:13:17 rem kernel: CPU: 3 PID: 1681 Comm: nfsd Tainted: G      D W         4.17.5-ck1 #1-NixOS
Jul 20 13:13:17 rem kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X79-UP4, BIOS F7 03/19/2014
Jul 20 13:13:17 rem kernel: RIP: 0010:kmem_cache_alloc+0x82/0x1c0
Jul 20 13:13:17 rem kernel: RSP: 0018:ffffa80244aa3c60 EFLAGS: 00010286
Jul 20 13:13:17 rem kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000001cf803
Jul 20 13:13:17 rem kernel: RDX: 00000000001cf603 RSI: 00000000001cf603 RDI: 0000000000028660
Jul 20 13:13:17 rem kernel: RBP: 00000000014080c0 R08: ffff914a5ca28000 R09: ffff9148175a78c0
Jul 20 13:13:17 rem kernel: R10: 0000000000000044 R11: 000000000000003f R12: ffffffffc06ed7b5
Jul 20 13:13:17 rem kernel: R13: ff9148c3cb387800 R14: ffff914ad7862a80 R15: ffff914ad7862a80
Jul 20 13:13:17 rem kernel: FS:  0000000000000000(0000) GS:ffff914aefcc0000(0000) knlGS:0000000000000000
Jul 20 13:13:17 rem kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 13:13:17 rem kernel: CR2: 00007fe658a8f570 CR3: 000000002200a003 CR4: 00000000001606e0
Jul 20 13:13:17 rem kernel: Call Trace:
Jul 20 13:13:17 rem kernel:  ? nfsd4_free_file_rcu+0x20/0x20 [nfsd]
Jul 20 13:13:17 rem kernel:  nfs4_alloc_stid+0x25/0xd0 [nfsd]
Jul 20 13:13:17 rem kernel:  nfsd4_process_open2+0xab8/0x13f0 [nfsd]
Jul 20 13:13:17 rem kernel:  ? nfsd_permission+0x5a/0xf0 [nfsd]
Jul 20 13:13:17 rem kernel:  ? fh_verify+0x427/0x5e0 [nfsd]
Jul 20 13:13:17 rem kernel:  ? nfsd4_process_open1+0x12f/0x3d0 [nfsd]
Jul 20 13:13:17 rem kernel:  ? nfsd4_open+0x2e8/0x720 [nfsd]
Jul 20 13:13:17 rem kernel:  nfsd4_open+0x2e8/0x720 [nfsd]
Jul 20 13:13:18 rem kernel:  nfsd4_proc_compound+0x525/0x710 [nfsd]
Jul 20 13:13:18 rem kernel:  nfsd_dispatch+0xf5/0x230 [nfsd]
Jul 20 13:13:18 rem kernel:  svc_process_common+0x4ac/0x710 [sunrpc]
Jul 20 13:13:18 rem kernel:  ? nfsd_destroy+0x60/0x60 [nfsd]
Jul 20 13:13:18 rem kernel:  svc_process+0xd7/0xf0 [sunrpc]
Jul 20 13:13:18 rem kernel:  nfsd+0xed/0x160 [nfsd]
Jul 20 13:13:18 rem kernel:  kthread+0x113/0x130
Jul 20 13:13:18 rem kernel:  ? kthread_create_worker_on_cpu+0x50/0x50
Jul 20 13:13:18 rem kernel:  ret_from_fork+0x35/0x40
Jul 20 13:13:18 rem kernel: Code: 08 48 39 f2 75 e7 48 83 78 10 00 4c 8b 28 0f 84 09 01 00 00 4d 85 ed 0f 84 00 01 00 00 41 8b 46 20 49 8b 3e 48 8d 8a 00 0>
Jul 20 13:13:18 rem kernel: RIP: kmem_cache_alloc+0x82/0x1c0 RSP: ffffa80244aa3c60

I have around ~20 NFSv4.2 mounts with heavy disk IO on several of them.

Running an Intel i7-4820k.
Comment 19 Robert Hoffmann 2018-07-21 08:09:11 UTC
Thanks Adam for your explanation.

Well, I can safely say that a temporary mitigation technique is to remount the file systems on the clients using -o vers=3; this seems to work fine, as I am already a few days up without any problems.  It may be slower than NFSv4, due to the discrete RPC reconnections, but it does the job.
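For anyone wanting to apply the same mitigation, the client-side steps can be sketched as follows (hostnames, paths, and options taken from the mount line in the original report; NFS does not let you change the protocol version via `-o remount`, so a full unmount/mount cycle is assumed):

```shell
# On each client: unmount the NFSv4 mount...
umount /boa/NVME0

# ...and remount it with vers=3, keeping the other options unchanged:
mount -t nfs -o vers=3,noatime,nodiratime,norelatime,rsize=32768,wsize=32768 \
    boa.internal:/NVME0 /boa/NVME0

# Confirm the negotiated version (look for "vers=3" in the output):
nfsstat -m
```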

On another note, I have done a lot of TCP programming, and you always need to assume the network is broken; if you don't assume that, you've got yourself a problem (persistent database connections, for example, are a bad idea for this reason). So maybe NFSv3 will be my preferred choice from now on.
Comment 20 Kevin Liu 2018-07-21 11:37:34 UTC
I do think the issue is somewhere in 4.17 - I've been running 4.18-rc4 for a while now, and interestingly enough, I haven't seen the bug occur.
Comment 21 H.J. Lu 2018-07-31 12:49:23 UTC
I have a light workload which can trigger a kernel NFS oops in less than 10 minutes:

4.17.11-200.fc28.x86_64 still failed:

[   81.222513] general protection fault: 0000 [#1] SMP PTI
[   81.222570] Modules linked in: devlink ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec mei_wdt irqbypass crct10dif_pclmul gpio_ich iTCO_wdt crc32_pclmul iTCO_vendor_support ppdev snd_hda_core snd_hwdep snd_seq ghash_clmulni_intel intel_cstate snd_seq_device intel_uncore snd_pcm intel_rapl_perf joydev snd_timer snd i2c_i801 mei_me lpc_ich soundcore shpchp parport_pc mei parport pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 i2c_algo_bit drm_kms_helper r8169 drm crc32c_intel mii video
[   81.223069] CPU: 2 PID: 1047 Comm: nfsd Not tainted 4.17.11-200.fc28.x86_64 #1
[   81.223130] Hardware name: Gigabyte Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015
[   81.223207] RIP: 0010:prefetch_freepointer+0x10/0x20
[   81.223250] RSP: 0018:ffffa73043d33c58 EFLAGS: 00010286
[   81.223296] RAX: 0000000000000000 RBX: e093175f59a73d71 RCX: 0000000000000037
[   81.223354] RDX: 0000000000000036 RSI: e093175f59a73d71 RDI: ffff9689f3bb3380
[   81.223413] RBP: ffff9689f3bb3380 R08: ffffc7303fc91960 R09: 0000000000000004
[   81.223471] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000014080c0
[   81.223529] R13: ffffffffc05e2ac1 R14: ffff9689e41815c1 R15: ffff9689f3bb3380
[   81.223589] FS:  0000000000000000(0000) GS:ffff968a1e280000(0000) knlGS:0000000000000000
[   81.223655] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.223703] CR2: 00007f4cec584160 CR3: 000000058820a005 CR4: 00000000001606e0
[   81.223762] Call Trace:
[   81.223791]  kmem_cache_alloc+0xb4/0x1d0
[   81.223855]  ? nfsd4_free_file_rcu+0x20/0x20 [nfsd]
[   81.223916]  nfs4_alloc_stid+0x21/0xa0 [nfsd]
[   81.223973]  nfsd4_process_open2+0x1048/0x1360 [nfsd]
[   81.224031]  ? nfsd_permission+0x63/0xe0 [nfsd]
[   81.224083]  ? fh_verify+0x17a/0x5b0 [nfsd]
[   81.224137]  ? nfsd4_process_open1+0x139/0x420 [nfsd]
[   81.224196]  nfsd4_open+0x2b1/0x6b0 [nfsd]
[   81.224249]  nfsd4_proc_compound+0x33e/0x640 [nfsd]
[   81.224304]  nfsd_dispatch+0x9e/0x210 [nfsd]
[   81.224371]  svc_process_common+0x46e/0x6c0 [sunrpc]
[   81.224438]  ? nfsd_destroy+0x50/0x50 [nfsd]
[   81.224508]  svc_process+0xb7/0xf0 [sunrpc]
[   81.224566]  nfsd+0xe3/0x140 [nfsd]
[   81.224609]  kthread+0x112/0x130
[   81.224649]  ? kthread_create_worker_on_cpu+0x70/0x70
[   81.224704]  ret_from_fork+0x35/0x40
[   81.224745] Code: b7 89 d3 e8 13 fa 67 00 85 c0 0f 85 c1 77 00 00 48 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 48 85 f6 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e c3 66 90 0f 1f 44 00 00 
[   81.224984] RIP: prefetch_freepointer+0x10/0x20 RSP: ffffa73043d33c58
[   81.225076] ---[ end trace a1b3921de0d50156 ]---
Comment 22 H.J. Lu 2018-08-01 16:36:34 UTC
I tested 2 backports from 4.18.0-rc on 4.17.11:

commit 692ad280bff3e81721ab138b9455948ab5289acf
Author: Andrew Elble <aweits@rit.edu>
Date:   Wed Apr 18 17:04:37 2018 -0400

    nfsd: fix error handling in nfs4_set_delegation()

and

commit 3171822fdcdd6e6d536047c425af6dc7a92dc585
Author: Scott Mayhew <smayhew@redhat.com>
Date:   Fri Jun 8 16:31:46 2018 -0400

    nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo

They survived my workload.
Comment 23 H.J. Lu 2018-08-22 15:08:52 UTC
This has been fixed on the 4.17 branch.
