Bug 213607 - unable to mount a nfs v3 file system exported from a machine running kernel 5.13
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All Linux
Importance: P1 normal
Assignee: bfields
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-28 11:53 UTC by az0123456
Modified: 2022-01-21 22:21 UTC

See Also:
Kernel Version: 5.13.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel configuration (147.76 KB, text/plain)
2021-06-28 15:11 UTC, az0123456

Description az0123456 2021-06-28 11:53:55 UTC
Hello,  

I am not able to mount an NFSv3 file system exported from a machine running kernel 5.13.
The client kernel versions tested were 5.12.13 and 5.13.

Regards,
Axel
Comment 1 Trond Myklebust 2021-06-28 13:30:32 UTC
5.13 works just fine for me, but assigning to bfields in case there is a regression somewhere...
Comment 2 bfields 2021-06-28 14:17:35 UTC
Nothing I can think of.

Could you give us any details about how the mount actually fails?

If this happened to me, the first thing I might do is watch the traffic under wireshark, and work out what happens differently in the 5.13 case as compared to the 5.12.13 case.
Comment 3 az0123456 2021-06-28 15:11:34 UTC
Created attachment 297645
kernel configuration
Comment 4 az0123456 2021-06-28 15:34:06 UTC
The machine in question is now running the previous kernel again, so I have no network traces, sorry.
The mount takes a long time on the client side until it is put into the background.
On the server side I see only one of 4 mount requests:
Jun 28 13:24:42 srv rpc.mountd[3828]: authenticated mount request from fd5d:5ce:f267:d8c4::10:831 for /usr/local/dvd (/usr/local/dvd)
The logs on the client side show timeouts:
Jun 28 13:24:09 ac2 mount[3663]: mount to NFS server 'srv' failed: timed out, retrying
Comment 5 David Arendt 2021-06-28 20:30:39 UTC
I have exactly the same problem here using nfsv4.

Kernel 5.12.13 on the server side works fine.

With 5.13 on the server side, NFS mount requests hang.

As this is a production system, I unfortunately cannot run further tests.

On the client side I have tested using kernel 5.13 and MacOSX.
Comment 6 David Arendt 2021-06-29 17:14:06 UTC
Just for information: I have seen a post on LKML suggesting that the problem was introduced between 5.13-rc7 and 5.13.

Quote: "It's likely this regression is due to a last minute change to
alloc_pages_bulk_array() done just before v5.13."

The full post can be found here: http://lkml.iu.edu/hypermail/linux/kernel/2106.3/04707.html
Comment 7 David Arendt 2021-06-29 17:43:56 UTC
Just for information, I tried bisecting the differences and can confirm that applying the following patch, which restores this part of mm/page_alloc.c to its 5.13-rc7 state, makes NFS work again for me:

--- linux-5.13/mm/page_alloc.c  2021-06-28 00:21:11.000000000 +0200
+++ linux-5.13-rc7/mm/page_alloc.c      2021-06-21 00:03:15.000000000 +0200
@@ -5053,13 +5053,9 @@
         * Skip populated array elements to determine if any pages need
         * to be allocated before disabling IRQs.
         */
-       while (page_array && nr_populated < nr_pages && page_array[nr_populated])
+       while (page_array && page_array[nr_populated] && nr_populated < nr_pages)
                nr_populated++;
 
-       /* Already populated array? */
-       if (unlikely(page_array && nr_pages - nr_populated == 0))
-               return 0;
-
        /* Use the single page allocator for one page. */
        if (nr_pages - nr_populated == 1)
                goto failed;
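
The two changes in this hunk can be illustrated outside the kernel. The following is only a minimal user-space sketch: plain void pointers stand in for page pointers, and the names count_populated and bulk_result_5_13 are hypothetical, not kernel symbols. It shows why 5.13 tests the bound before reading the array element, and what the 5.13 early return reports when every slot is already filled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space sketch only: void pointers stand in for struct page pointers,
 * and both function names are hypothetical, not kernel symbols.
 */

/* Count the leading non-NULL slots, checking the bound before each read. */
static size_t count_populated(void *const *page_array, size_t nr_pages)
{
    size_t nr_populated = 0;

    /*
     * Same operand order as the 5.13 line in the diff. With the rc7 order,
     * page_array[nr_pages] would be read once for a completely full array.
     */
    while (page_array && nr_populated < nr_pages && page_array[nr_populated])
        nr_populated++;

    assert(nr_populated <= nr_pages);
    return nr_populated;
}

/* Model of the 5.13 early return: a completely full array reports 0. */
static size_t bulk_result_5_13(void *const *page_array, size_t nr_pages)
{
    size_t nr_populated = count_populated(page_array, nr_pages);

    if (page_array && nr_pages - nr_populated == 0)
        return 0;

    /* The real allocation of the missing pages is omitted here. */
    return nr_populated;
}
```

For a three-slot array that is already full, count_populated yields 3 while bulk_result_5_13 yields 0. A caller that compares the return value against the number of pages it asked for would then see no progress. Whether that is the exact mechanism behind the hanging mounts is not established in this thread.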
Comment 8 Orion 2021-06-29 22:16:30 UTC
Same problem here with kernel 5.13.0.
With kernel 5.12.13 everything is OK.
Comment 9 Orion 2021-06-29 22:39:55 UTC
And I can confirm that David Arendt's patch works for me too.
Comment 10 David Arendt 2021-07-08 04:14:53 UTC
I can confirm that this problem is fixed in kernel 5.13.1
