Bug 208451 - NFS server occasionally spontaneously reboots when client mounts exported directory
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
Reported: 2020-07-06 02:11 UTC by Robert Dinse
Modified: 2020-08-09 01:28 UTC
Kernel Version: 5.7.7
Regression: No


Description Robert Dinse 2020-07-06 02:11:42 UTC
This may be related to bug #208157.  From 5.7.0 through 5.7.4, nfs-server
would not start upon boot on one of my servers.

     With 5.7.7 this was resolved BUT now when I reboot one of the NFS clients
or unmount and remount an NFS partition on the client, the NFS server will
sometimes spontaneously reboot.

     I get these messages in /var/log/dmesg.0:

     (I assume the old dmesg log is the relevant one, since it was the active
log before the last boot.)

[   40.302192] systemd[1]: Mounting NFSD configuration filesystem...
[   40.688313] kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
[   69.899630] kernel: NFSD: Using UMH upcall client tracking operations.
[   69.899635] kernel: NFSD: starting 90-second grace period (net f00000a8)

     After the NFS server reboots I see these NFS related messages in dmesg:

[   53.810062] systemd[1]: Mounting NFSD configuration filesystem...
[   54.254326] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  106.468779] NFSD: Using UMH upcall client tracking operations.
[  106.468781] NFSD: starting 90-second grace period (net f00000a8)
[  107.631713] NFS: Registering the id_resolver key type
[  110.815312] NFS4: Couldn't follow remote path
[  113.935404] NFS4: Couldn't follow remote path
[  117.055421] NFS4: Couldn't follow remote path
[  120.175488] NFS4: Couldn't follow remote path
[  123.295611] NFS4: Couldn't follow remote path
[  126.415625] NFS4: Couldn't follow remote path
[  129.545752] NFS4: Couldn't follow remote path
[  132.655844] NFS4: Couldn't follow remote path

     So pretty much the same thing, except for the "NFS4: Couldn't follow
remote path" messages, which I've read are caused by an old nfs-utils not using
the new system calls, so they are probably not relevant.
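
A comparison like the one above can be done mechanically rather than by eye. The following is a minimal sketch, assuming the two excerpts are available as lists of lines; the helper names `normalize` and `only_in_second` are hypothetical and not part of any tool mentioned in this report. It strips the timestamp and the optional "kernel: " tag, then lists the messages that appear only in the second excerpt.

```python
import re

# Matches a leading "[   40.302192] " timestamp and an optional "kernel: " tag.
_PREFIX = re.compile(r"^\[\s*\d+\.\d+\]\s+(?:kernel:\s+)?")

def normalize(lines):
    """Strip timestamps and the 'kernel: ' tag; drop blanks and repeats, keep order."""
    seen = []
    for line in lines:
        msg = _PREFIX.sub("", line.strip())
        if msg and msg not in seen:
            seen.append(msg)
    return seen

def only_in_second(before, after):
    """Return the messages in `after` that never appear in `before`."""
    known = set(normalize(before))
    return [msg for msg in normalize(after) if msg not in known]

# The two excerpts quoted in this report (repeated lines are collapsed).
before = [
    "[   40.302192] systemd[1]: Mounting NFSD configuration filesystem...",
    "[   40.688313] kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.",
    "[   69.899630] kernel: NFSD: Using UMH upcall client tracking operations.",
    "[   69.899635] kernel: NFSD: starting 90-second grace period (net f00000a8)",
]
after = [
    "[   53.810062] systemd[1]: Mounting NFSD configuration filesystem...",
    "[   54.254326] RPC: Registered tcp NFSv4.1 backchannel transport module.",
    "[  106.468779] NFSD: Using UMH upcall client tracking operations.",
    "[  106.468781] NFSD: starting 90-second grace period (net f00000a8)",
    "[  107.631713] NFS: Registering the id_resolver key type",
    "[  110.815312] NFS4: Couldn't follow remote path",
    "[  113.935404] NFS4: Couldn't follow remote path",
]

if __name__ == "__main__":
    for msg in only_in_second(before, after):
        print(msg)
```

On these excerpts the difference also includes the "NFS: Registering the id_resolver key type" line, which comes from the NFS client side of the machine rather than from nfsd.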
Comment 1 bfields 2020-07-07 14:31:05 UTC
(In reply to Robert Dinse from comment #0)
> This may be related to bug #208157.  From 5.7.0 through 5.7.4, nfs-server
> would not start upon boot on one of my servers.
> 
>      With 5.7.7 this was resolved BUT now when I reboot one of the NFS
> clients
> or unmount and remount an NFS partition on the client, the NFS server will
> sometimes spontaneously reboot.

Well, that's not good.  Too bad the dmesg has nothing interesting in it.  Can you capture console output to see if there are messages that aren't making it to disk before the reboot?

Do you have CONFIG_PANIC_ON_OOPS set?

Is it possible for you to build kernels between 5.7.4 and 5.7.7 to figure out where exactly the server started crashing?
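
One way to answer the CONFIG_PANIC_ON_OOPS question is sketched below. It is only an illustration under stated assumptions: the config file locations (/boot/config-<release> and /proc/config.gz) depend on the distribution and on how the kernel was built, and the helper names are hypothetical. The script also reads the kernel.panic_on_oops and kernel.panic sysctls, since a nonzero kernel.panic is what turns a panic into an automatic reboot.

```python
import gzip
import os
import platform
import re

def config_value(config_text, option):
    """Return the value assigned to `option` (e.g. 'y'), or None if it is not set."""
    match = re.search(r"^%s=(.*)$" % re.escape(option), config_text, re.MULTILINE)
    return match.group(1).strip() if match else None

def read_running_config():
    """Best-effort read of the running kernel's build config; None if not found."""
    # Both locations are assumptions; not every system provides them.
    for path in ("/boot/config-" + platform.release(), "/proc/config.gz"):
        try:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as fh:
                return fh.read()
        except OSError:
            continue
    return None

def read_sysctl(name):
    """Read /proc/sys/kernel/<name> as an int, or None if it cannot be read."""
    try:
        with open(os.path.join("/proc/sys/kernel", name)) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    text = read_running_config()
    built_in = config_value(text, "CONFIG_PANIC_ON_OOPS") if text else None
    print("CONFIG_PANIC_ON_OOPS:", built_in or "not set, or config not found")
    print("kernel.panic_on_oops:", read_sysctl("panic_on_oops"))
    print("kernel.panic (seconds before reboot, 0 = no reboot):", read_sysctl("panic"))
```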
Comment 2 Robert Dinse 2020-07-07 18:40:50 UTC
On Tue, 7 Jul 2020, bugzilla-daemon@bugzilla.kernel.org wrote:
>
> Well, that's not good.  Too bad the dmesg has nothing interesting in it.  Can
> you capture console output to see if there are messages that aren't making it
> to disk before the reboot?
>
> Do you have CONFIG_PANIC_ON_OOPS set?
>
> Is it possible for you to build kernels between 5.7.4 and 5.7.7 to figure out
> where exactly the server started crashing?

      It started crashing at 5.7.6, and I'm not sure how to get 5.7.5 now.  I
don't have access to the console because the machine is 21 miles from me.  When
it reboots, the console is overwritten with the login screen as it comes back up.
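
For a machine with no reachable console, one option not discussed in this thread is the kernel's netconsole module, which sends kernel messages as UDP datagrams to another host. The sketch below is only the receiving side, under these assumptions: netconsole has already been configured on the server to target the listening host, the target port is 6666 (netconsole's usual default), and the function names and log path are hypothetical.

```python
import socket

def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; 6666 is assumed to match the netconsole target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def receive_once(sock, log, bufsize=65535):
    """Receive one datagram, append it to `log` prefixed by the sender address."""
    data, (addr, _port) = sock.recvfrom(bufsize)
    text = data.decode("utf-8", errors="replace")
    log.write("%s %s" % (addr, text if text.endswith("\n") else text + "\n"))
    log.flush()  # flush each message so nothing is lost if the listener is killed
    return text

def capture(path="netconsole.log", host="0.0.0.0", port=6666):
    """Append every received kernel message to `path` until interrupted."""
    sock = open_listener(host, port)
    with sock, open(path, "a") as log:
        while True:
            receive_once(sock, log)

if __name__ == "__main__":
    capture()
```

Run it on a second machine that the server can reach; anything the kernel prints just before a reboot is then kept in the log file there instead of being lost with the console.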

      However, I discovered another problem that may be related: nouveau is
allowing the nvidia 210 video card to DMA without having allocated the memory,
so who knows what it is randomly overwriting.  I can't rule out that it is
related, but the crashes always occur when I try to mount a file system on a
client.

      Presently I've reverted to a stock Ubuntu 5.4.0 kernel just to make
sure I haven't got a hardware issue.
Comment 3 Robert Dinse 2020-08-09 01:28:16 UTC
With a recent patch applied, this appears to be totally fixed in 5.8, so I am closing this ticket.
