Bug 208807 - Problem with NFS kernel server code
Summary: Problem with NFS kernel server code
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Stephen Hemminger
URL:
Keywords:
: 208999 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-08-04 11:33 UTC by Robert Dinse
Modified: 2020-09-07 04:17 UTC (History)
2 users (show)

See Also:
Kernel Version: 5.8.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Output of dmesg while machine was dying (250.91 KB, text/plain)
2020-08-04 11:33 UTC, Robert Dinse
Details
Complete Kernel Log from Boot until Reboot (3.37 MB, text/plain)
2020-08-04 22:18 UTC, Robert Dinse
Details

Description Robert Dinse 2020-08-04 11:33:23 UTC
Created attachment 290759 [details]
Output of dmesg while machine was dying

After about an hour and a half of operating on 5.8.0, one of our NFS servers began to slow and basically ground to a halt.  Looks like a spinlock issue in
the NFS kernel server code.  I'll attach the output of dmesg while the machine was dying in a file called "spinlock".  A shame because while it was running, the performance of 5.8 was astounding.  Only the NFS servers seem to get ill, the
client machines seem to run fine on 5.8 so far at least.
Comment 1 Robert Dinse 2020-08-04 11:34:33 UTC
I marked importance as high because the machine is not usable with this flaw.
Comment 2 bfields 2020-08-04 21:21:03 UTC
So there's an nfsd thread stuck on the flc_lock in lease_get_mtime.

That doesn't tell us a lot on its own, probably it's a consequence of an earlier crash.  I think the "D" taint also means there was an earlier oops.  What's the very first backtrace in your logs?

Was this working OK before?  If so, what's the most recent kernel that worked?

What filesystem are you exporting?
Comment 3 Robert Dinse 2020-08-04 22:18:37 UTC
Created attachment 290779 [details]
Complete Kernel Log from Boot until Reboot

Because I'm really not familiar enough to know what all might be relevant, this
is the complete kern.log from boot up to shut down.  As you can see we got about
two hours out of it before it became so unresponsive that I had to drive to the
co-location facility and hard boot it into the old kernel.
Comment 4 Robert Dinse 2020-08-04 22:26:08 UTC
ps showed 4 of the 8 knfsd's stuck in a loop consuming 100% CPU.

I've attached the entire kernel kern.log from boot to shutdown (actually a
forcible reboot since it got too unresponsive for other options).

The most recent kernel that worked as a Ubuntu Canonical kernel 5.4.0-42, and
previous to that all of Canonical's 5.4.0 builds did not work on this machine
so I was running 5.3.0.

Starting in 5.5 and through 5.6 there were stability issues on my hardware
where the machine would spontaneously boot without logging anything out of
the ordinary.  I began to suspect a hardware issue but 5.3.0 was stable as
was the current 5.4.0-42 build by Canonical.

With 5.7 I experienced a different NFS problem, sometimes file systems were not
exported even though they were in the exports file, sometimes the
kernel-nfs-server would fail to start upon bootup but I could start manually
with systemctl start nfs-kernel-server.  Later I discovered if I used
exportfs -r rather than -a, it would work but still did not start reliably.

With 5.8rc releases there was problems with IP tables so they were unusable
but the official release of 5.8.0 had that problem resolved, and
the nfs-kernel-server started, and I setup a pair of virtual machines to
beat it for a few hours before we went live and they worked fine.

At first things seemed to go well, every single client mounted every single
exported file system the first time without problems.  But a couple of hours
into it it got sick and died.
Comment 5 Robert Dinse 2020-08-04 22:29:39 UTC
The file system I am exporting is an ext4 file system which is on a RAID10 and some on RAID1 partition (software raid, mdraid).  Sorry I prematurely ended the previous comment.
Comment 6 bfields 2020-08-05 00:09:19 UTC
The log has just what I needed, thanks!

The below is the interesting part.  nfsd_breaker_owns_lease is a new function added to 5.8 by my patch 28df3d1539de50 "nfsd: clients don't need to break their own delegations".  So, my fault, and should be easy to figure out....

Aug  4 01:30:35 iglulik kernel: [ 8204.364755] BUG: kernel NULL pointer dereference, address: 0000000000000000
Aug  4 01:30:35 iglulik kernel: [ 8204.364758] #PF: supervisor read access in kernel mode
Aug  4 01:30:35 iglulik kernel: [ 8204.364759] #PF: error_code(0x0000) - not-present page
Aug  4 01:30:35 iglulik kernel: [ 8204.364759] PGD 0 P4D 0
Aug  4 01:30:35 iglulik kernel: [ 8204.364761] Oops: 0000 [#1] SMP PTI
Aug  4 01:30:35 iglulik kernel: [ 8204.364762] CPU: 6 PID: 2032 Comm: nfsd Not tainted 5.8.0 #1
Aug  4 01:30:35 iglulik kernel: [ 8204.364763] Hardware name: ASUS All Series/X99-E, BIOS 2101 07/10/2019
Aug  4 01:30:35 iglulik kernel: [ 8204.364776] RIP: 0010:nfsd_breaker_owns_lease+0x30/0x40 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364777] Code: 48 89 e5 53 48 8b 5f 48 e8 bd e0 fd ff 84 c0 74 1f 65 48 8b 3c 25 c0 7b 01 00 e8 5b b2 5f f2 48 8b 53 30 48 8b 80 60 2b 00 00 <48> 39 10 0f 94 c0 5b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 40
Aug  4 01:30:35 iglulik kernel: [ 8204.364778] RSP: 0018:ffffb6e00233fb28 EFLAGS: 00010202
Aug  4 01:30:35 iglulik kernel: [ 8204.364779] RAX: 0000000000000000 RBX: ffff9d73d6d486c8 RCX: ffff9d746f360bd8
Aug  4 01:30:35 iglulik kernel: [ 8204.364779] RDX: ffff9d7b2b1c9440 RSI: ffff9d756238c510 RDI: ffff9d7b655a5f00
Aug  4 01:30:35 iglulik kernel: [ 8204.364780] RBP: ffffb6e00233fb30 R08: ffff9d7b7fbb3560 R09: ffff9d756238c510
Aug  4 01:30:35 iglulik kernel: [ 8204.364780] R10: ffff9d747cab7da8 R11: 00000000000007d0 R12: ffff9d756238c510
Aug  4 01:30:35 iglulik kernel: [ 8204.364781] R13: ffff9d756238c510 R14: ffff9d721f698818 R15: ffff9d6fff087a20
Aug  4 01:30:35 iglulik kernel: [ 8204.364782] FS:  0000000000000000(0000) GS:ffff9d7b7fb80000(0000) knlGS:0000000000000000
Aug  4 01:30:35 iglulik kernel: [ 8204.364782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  4 01:30:35 iglulik kernel: [ 8204.364783] CR2: 0000000000000000 CR3: 0000001f2c6c2002 CR4: 00000000003626e0
Aug  4 01:30:35 iglulik kernel: [ 8204.364784] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug  4 01:30:35 iglulik kernel: [ 8204.364784] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug  4 01:30:35 iglulik kernel: [ 8204.364785] Call Trace:
Aug  4 01:30:35 iglulik kernel: [ 8204.364789]  leases_conflict+0x2d/0xf0
Aug  4 01:30:35 iglulik kernel: [ 8204.364790]  any_leases_conflict.isra.0+0x3c/0x50
Aug  4 01:30:35 iglulik kernel: [ 8204.364791]  __break_lease+0xad/0x550
Aug  4 01:30:35 iglulik kernel: [ 8204.364793]  ? list_lru_walk_one+0x4f/0x80
Aug  4 01:30:35 iglulik kernel: [ 8204.364798]  ? nfsd_file_schedule_laundrette.part.0+0x30/0x30 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364800]  ? list_lru_walk_node+0x3c/0xd0
Aug  4 01:30:35 iglulik kernel: [ 8204.364804]  nfsd_open_break_lease.part.0+0x3f/0x50 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364808]  __nfsd_open.constprop.0+0x11c/0x190 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364812]  nfsd_open_verified+0x16/0x20 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364816]  nfsd_file_acquire+0x50f/0x780 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364820]  nfsd_read+0x5a/0x1c0 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364824]  nfsd_proc_read+0xce/0x1b0 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364826]  ? kvfree+0x28/0x30
Aug  4 01:30:35 iglulik kernel: [ 8204.364828]  ? groups_free+0xe/0x10
Aug  4 01:30:35 iglulik kernel: [ 8204.364847]  ? svcauth_unix_set_client+0x49a/0x4d0 [sunrpc]
Aug  4 01:30:35 iglulik kernel: [ 8204.364855]  nfsd_dispatch+0xe2/0x220 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364866]  svc_process_common+0x47b/0x6f0 [sunrpc]
Aug  4 01:30:35 iglulik kernel: [ 8204.364876]  ? svc_sock_secure_port+0x16/0x30 [sunrpc]
Aug  4 01:30:35 iglulik kernel: [ 8204.364883]  ? nfsd_svc+0x330/0x330 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364893]  svc_process+0xbc/0xf0 [sunrpc]
Aug  4 01:30:35 iglulik kernel: [ 8204.364897]  nfsd+0xe8/0x150 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364899]  kthread+0x112/0x150
Aug  4 01:30:35 iglulik kernel: [ 8204.364903]  ? nfsd_destroy+0x60/0x60 [nfsd]
Aug  4 01:30:35 iglulik kernel: [ 8204.364904]  ? kthread_park+0x90/0x90
Aug  4 01:30:35 iglulik kernel: [ 8204.364907]  ret_from_fork+0x22/0x30
Aug  4 01:30:35 iglulik kernel: [ 8204.364908] Modules linked in: tcp_diag inet_diag unix_diag vhost_net vhost tap vhost_iotlb xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter rpcsec_gss_krb5 nfsv4 nfs fscache bridge stp llc binfmt_misc quota_v2 quota_tree nls_iso8859_1 intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi x86_pkg_temp_thermal snd_seq_midi_event intel_powerclamp snd_rawmidi snd_seq snd_seq_device snd_timer kvm_intel kvm rapl intel_cstate snd eeepc_wmi wmi_bmof soundcore asus_wmi sparse_keymap intel_wmi_thunderbolt mei_me mei mac_hid sch_fq_codel nfsd nct6775 hwmon_vid coretemp auth_rpcgss nfs_acl lockd parport_pc ppdev grace sunrpc lp parport ip_tables x_tables
Aug  4 01:30:35 iglulik kernel: [ 8204.364927]  autofs4 btrfs blake2b_generic zstd_compress raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear dm_mirror dm_region_hash dm_log raid10 raid1 nouveau video crct10dif_pclmul i2c_algo_bit crc32_pclmul ttm ghash_clmulni_intel drm_kms_helper aesni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops cec crypto_simd i2c_i801 cryptd mxm_wmi glue_helper i2c_smbus lpc_ich drm r8169 e1000e ahci realtek libahci wmi
Aug  4 01:30:35 iglulik kernel: [ 8204.364940] CR2: 0000000000000000
Aug  4 01:30:35 iglulik kernel: [ 8204.364941] ---[ end trace f879f54317d2c662 ]---
Comment 7 bfields 2020-08-05 00:42:09 UTC
D'oh!  It's something you need a mixture of NFSv4 and non-NFSv3 (or v2) clients to hit.  I'll do some testing and send it upstream in the next couple days.

Pretty sure this is it, but any confirmation welcome if it's easy for you to test a patch.

commit 36316aaea91f (HEAD -> TMP)
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Tue Aug 4 20:24:42 2020 -0400

    nfsd: fix oops on mixed NFSv4/NFSv3 client access
    
    If an NFSv2/v3 client breaks an NFSv4 client's delegation, it will hit a
    NULL dereference in nfsd_breaker_owns_lease().
    
    Reported-by: Robert Dinse <nanook@eskimo.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 30818b006530..f0a607c795ef 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4597,6 +4597,8 @@ static bool nfsd_breaker_owns_lease(struct file_lock *fl)
        if (!i_am_nfsd())
                return NULL;
        rqst = kthread_data(current);
+       if (!rqst->rq_lease_breaker)
+               return NULL;
        clp = *(rqst->rq_lease_breaker);
        return dl->dl_stid.sc_client == clp;
 }
Comment 8 Robert Dinse 2020-08-05 00:45:59 UTC
That's exactly the mixture I have, one old SunOS 4.1.4 box, NFSv2, two old
Redhat 6.2 boxes, NFSv3, the rest modern Linux systems, NFSv4.2.
Comment 9 Robert Dinse 2020-08-05 05:51:55 UTC
      Only real way to test is put it up in actual traffic.  I had a couple of 
virtual machines mounting file systems generated with files containing random 
data from /dev/random and it ran for hours constant back and forth mounting and 
unmounting and it didn't explode, but I put 5.8.0 on a live machine which 
provides home directories for about 500 people and it died within just over two 
hours.  So when a kernel is ready to test I'll test on a Friday night and give 
my customers advanced warning that there may be some interruption but it is 
necessary to debug and get this stable.  I'm really super impressed with the 
work that went into the kernel, performance aside from this glitch, is awesome, 
way more than made up for the performance loss to meltdown and spectre patches.

-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
  Eskimo North Linux Friendly Internet Access, Shell Accounts, and Hosting.
    Knowledgeable human assistance, not telephone trees or script readers.
  See our web site: http://www.eskimo.com/ (206) 812-0051 or (800) 246-6874.

On Wed, 5 Aug 2020, bugzilla-daemon@bugzilla.kernel.org wrote:

> Date: Wed, 05 Aug 2020 00:42:09 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: nanook@eskimo.com
> Subject: [Bug 208807] Problem with NFS kernel server code
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=208807
>
> --- Comment #7 from bfields@fieldses.org ---
> D'oh!  It's something you need a mixture of NFSv4 and non-NFSv3 (or v2)
> clients
> to hit.  I'll do some testing and send it upstream in the next couple days.
>
> Pretty sure this is it, but any confirmation welcome if it's easy for you to
> test a patch.
>
> commit 36316aaea91f (HEAD -> TMP)
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Tue Aug 4 20:24:42 2020 -0400
>
>    nfsd: fix oops on mixed NFSv4/NFSv3 client access
>
>    If an NFSv2/v3 client breaks an NFSv4 client's delegation, it will hit a
>    NULL dereference in nfsd_breaker_owns_lease().
>
>    Reported-by: Robert Dinse <nanook@eskimo.com>
>    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 30818b006530..f0a607c795ef 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4597,6 +4597,8 @@ static bool nfsd_breaker_owns_lease(struct file_lock
> *fl)
>        if (!i_am_nfsd())
>                return NULL;
>        rqst = kthread_data(current);
> +       if (!rqst->rq_lease_breaker)
> +               return NULL;
>        clp = *(rqst->rq_lease_breaker);
>        return dl->dl_stid.sc_client == clp;
> }
>
> -- 
> You are receiving this mail because:
> You reported the bug.
Comment 10 Robert Dinse 2020-08-07 01:11:28 UTC
If there is test code available by tomorrow afternoon then I can build a kernel and test tomorrow evening, else it will have to wait another week, can't really test during the business week.
Comment 11 bfields 2020-08-07 01:51:54 UTC
I've tested the patch above (just by reading the file from a v4 mount while touching it from a v3 mount) and can reproduce the bug and confirm the fix.  If you're comfortable applying that patch and testing it, go ahead and let us know what you find.  If you need it in a released kernel, that will take longer.
Comment 12 Robert Dinse 2020-08-07 04:51:00 UTC
     I am building a new kernel with patch applied and will reboot NFS servers onto it, assuming a successful build around 11pm Pacific time.
Comment 13 Robert Dinse 2020-08-07 08:46:34 UTC
That patch seems to have done the trick, I've been running on patched 5.8 kernel now for about three hours on three serves with a mix of v2, v3, and v4.2 nfs clients (and maybe a 4.1 in there) and so far 100% quiet, not a single oops, error, or other complaint out of the kernel,

The 5.8 kernel is definitely a major win in terms of performance.
Comment 14 bfields 2020-08-07 13:23:13 UTC
Thanks for the report!

I think this bug can be closed.  (I don't seem to be able to close it myself.)
Comment 15 Robert Dinse 2020-08-07 13:52:21 UTC
I'll give it a shot.  Kernel still running totally clean.
Comment 16 bfields 2020-08-25 19:37:53 UTC
*** Bug 208999 has been marked as a duplicate of this bug. ***
Comment 17 Robert Dinse 2020-08-25 21:11:42 UTC
Ok, said patch did the trick and I'm still running patched 5.8.0 with not a single crash or hiccough since, however, the patch still has not made it into
kernel 5.8.3, what do I/we need to do to make this happen so I don't have to
manually patch every subsequent release?
Comment 18 Salvatore Bonaccorso 2020-08-31 09:59:28 UTC
Robert, this is pending now for the next v5.8.y update:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=3f5186e8350b84d4e794839f011a91eae2d17229
Comment 19 Robert Dinse 2020-09-07 04:17:17 UTC
Since this is now in 5.8.7 and appears to be working correctly, I am now marking
this as resolved.  Thank you.

Note You need to log in before you can comment on or make changes to this bug.