Bug 195725 - WARNING: CPU: 0 PID: 6756 at fs/nfsd/nfs4proc.c:450 nfsd4_open+0x500/0x7b0 [nfsd]
Status: RESOLVED DOCUMENTED
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Chuck Lever
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-12 01:42 UTC by JianhongYin
Modified: 2022-03-22 21:10 UTC
CC List: 4 users

See Also:
Kernel Version: 4.11.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
full test console log (161.89 KB, application/gzip)
2017-05-15 03:34 UTC, JianhongYin
Details
full console log of 4.9.0 test (81.22 KB, application/gzip)
2017-05-16 07:57 UTC, JianhongYin
Details

Description JianhongYin 2017-05-12 01:42:10 UTC
Got the below dmesg warning when running a racer test on NFS:
'''
Checking dmesg for specific failures!
[ 8273.773664] ------------[ cut here ]------------
[ 8273.774197] WARNING: CPU: 0 PID: 6756 at fs/nfsd/nfs4proc.c:450 nfsd4_open+0x500/0x7b0 [nfsd]
[ 8273.775113] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ext4 jbd2 mbcache loop crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel ppdev crypto_simd i2c_piix4 glue_helper cryptd i2c_core nfsd parport_pc pcspkr joydev virtio_balloon parport auth_rpcgss nfs_acl lockd acpi_cpufreq grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi ata_piix libata virtio_blk virtio_net crc32c_intel serio_raw virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
[ 8273.779987] CPU: 0 PID: 6756 Comm: nfsd Not tainted 4.11.0 #1
[ 8273.780629] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 8273.781244] task: ffff99e56c5e8000 task.stack: ffffb2e54358c000
[ 8273.781899] RIP: 0010:nfsd4_open+0x500/0x7b0 [nfsd]
[ 8273.782424] RSP: 0000:ffffb2e54358fd68 EFLAGS: 00010246
[ 8273.782999] RAX: 000000000000003f RBX: ffff99e4f58743e0 RCX: 0000000000000000
[ 8273.783759] RDX: 0000000000000000 RSI: ffff99e5ffc0e0a8 RDI: ffff99e5ffc0e0a8
[ 8273.784516] RBP: ffffb2e54358fdb8 R08: 00000000fffffffe R09: 0000000000000227
[ 8273.785290] R10: 0000000000000005 R11: 0000000000000226 R12: ffff99e4f5873068
[ 8273.786064] R13: 0000000005000000 R14: ffff99e5dcbfa000 R15: ffff99e5f1620400
[ 8273.786840] FS:  0000000000000000(0000) GS:ffff99e5ffc00000(0000) knlGS:0000000000000000
[ 8273.787701] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8273.788322] CR2: 0000000000441582 CR3: 00000001353fa000 CR4: 00000000003406f0
[ 8273.789090] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8273.789858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8273.790629] Call Trace:
[ 8273.790904]  nfsd4_proc_compound+0x465/0x730 [nfsd]
[ 8273.791442]  nfsd_dispatch+0xce/0x270 [nfsd]
[ 8273.791930]  svc_process_common+0x387/0x740 [sunrpc]
[ 8273.792484]  svc_process+0xfc/0x1c0 [sunrpc]
[ 8273.792953]  nfsd+0xe9/0x160 [nfsd]
[ 8273.793337]  kthread+0x109/0x140
[ 8273.793693]  ? nfsd_destroy+0x60/0x60 [nfsd]
[ 8273.794164]  ? kthread_park+0x60/0x60
[ 8273.794568]  ret_from_fork+0x2c/0x40
[ 8273.794955] Code: 00 85 c0 41 89 c5 0f 84 a8 fd ff ff 80 bb 15 01 00 00 00 0f 84 9b fd ff ff 89 c6 48 c7 c7 a0 7f 41 c0 31 c0 0f ce e8 cf 19 9b da <0f> ff e9 82 fd ff ff 41 bd 00 00 27 31 e9 d1 fd ff ff 41 bd 00 
[ 8273.796962] ---[ end trace 47aa3ba8800d4657 ]---

End of log.
'''
Comment 2 Trond Myklebust 2017-05-12 01:45:32 UTC
Assigning to Bruce.
Comment 3 JianhongYin 2017-05-12 01:47:47 UTC
My test script:

'''
BZ=lustre-racer
loopf=/tmp/loop-$BZ
nfsexpdir=/mnt/lustre
nfsmp=/mnt/nfsmp
nfsmp2=/mnt/nfsmp2

        mkdir -p $nfsexpdir $nfsmp $nfsmp2
        echo "$nfsexpdir *(rw,no_root_squash,fsid=0)" >/etc/exports

        dd if=/dev/zero of=$loopf bs=1024k count=1024
        mkfs.ext4 -F $loopf

        service rpcbind start
        mount none /var/lib/nfs -t tmpfs
        touch /var/lib/nfs/etab
        service nfs start

        # https://git.hpdd.intel.com/fs/lustre-release.git/tree/refs/heads/master:/lustre/tests/racer
        pushd lustre-racer.orig
        while :; do
                date

                mount $loopf $nfsexpdir -o loop || exit 1
                mkdir $nfsexpdir/racer
                # start nfs-server.service
                service_nfs restart || exit 2
                mount localhost:$nfsexpdir $nfsmp -t nfs -o nolock || exit 3
                mount localhost:/ $nfsmp2 -t nfs4 || exit 4
                DURATION=3600 sh racer.sh $nfsmp/racer &
                DURATION=3600 sh racer.sh $nfsmp2/racer &
                DURATION=3600 sh racer.sh $nfsexpdir/racer &
                wait %1 %2 %3
                umount $nfsmp || exit 5
                umount $nfsmp2 || exit 6
                # stop nfs-server.service
                service_nfs stop || exit 7
                umount $nfsexpdir || exit 8
                e2fsck -n -f $loopf || exit 9
        done
        popd
'''
Comment 4 bfields 2017-05-13 18:43:10 UTC
Thanks!  I think it should also be printing this message:

WARN(status && open->op_created,
             "nfsd4_process_open2 failed to open newly-created file! status=%u\n",
             be32_to_cpu(status));

Are you seeing that in the logs (probably right before the oops)?

It might be interesting to see what the exact error was there.

I think there *might* be a race there if the file's chmod'd to deny the open just after it's created but before we open it.  In which case, maybe we need some way to override permissions on that open, I don't know.
Comment 5 JianhongYin 2017-05-15 03:34:31 UTC
Created attachment 256523 [details]
full test console log
Comment 6 JianhongYin 2017-05-15 03:39:26 UTC
(In reply to bfields from comment #4)
> Thanks!  I think it should also be printing this message:
> 
> WARN(status && open->op_created,
>              "nfsd4_process_open2 failed to open newly-created file!
> status=%u\n",
>              be32_to_cpu(status));
> 
> Are you seeing that in the logs (probably right before the oops)?

Yes, you are right! (See the full console log in comment 5.)

> 
> It might be interesting to see what the exact error was there.
> 
> I think there *might* be a race there if the file's chmod'd to deny the open
> just after it's created but before we open it.  In which case, maybe we need
> some way to override permissions on that open, I don't know.
Comment 7 bfields 2017-05-15 19:20:30 UTC
"nfsd4_process_open2 failed to open newly-created file! status=5"

That's EIO, which wasn't what I was expecting!  So, I don't know what's going on.

Do you happen to know whether this is a regression?  So, are there older kernels that passed this test reliably where this kernel fails?
Comment 8 JianhongYin 2017-05-16 07:55:48 UTC
(In reply to bfields from comment #7)
> "nfsd4_process_open2 failed to open newly-created file! status=5"
> 
> That's EIO, which wasn't what I was expecting!  So, I don't know what's
> going on.
> 
> Do you happen to know whether this is a regression?  So, are there older
> kernels that passed this test reliably where this kernel fails?

Got it.

This is not a regression; I've reproduced it on kernel 4.9.0.
Comment 9 JianhongYin 2017-05-16 07:57:43 UTC
Created attachment 256563 [details]
full console log of 4.9.0 test
Comment 10 JianhongYin 2018-04-26 01:24:11 UTC
Still reproducible on v4.17-rc2:

'''
[  101.737135] nfsd: last server has exited, flushing export cache 
[  101.791554] NFSD: starting 90-second grace period (net f00000a0) 
[  334.500304] ------------[ cut here ]------------ 
[  334.505514] nfsd4_process_open2 failed to open newly-created file! status=5 
[  334.513380] WARNING: CPU: 11 PID: 2712 at fs/nfsd/nfs4proc.c:454 nfsd4_open+0x4f8/0x790 [nfsd] 
[  334.522995] Modules linked in: nfsv3 ext4 mbcache jbd2 loop nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif pcbc aesni_intel ipmi_si crypto_simd cdc_ether cryptd glue_helper ipmi_devintf usbnet ipmi_msghandler mii pcspkr iTCO_wdt iTCO_vendor_support sg i2c_i801 lpc_ich shpchp ioatdma ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ahci libahci libata dca i2c_algo_bit i2c_core crc32c_intel dm_mirror dm_region_hash dm_log dm_mod 
[  334.590173] CPU: 11 PID: 2712 Comm: nfsd Not tainted 4.17.0-rc2 #1 
[  334.597083] Hardware name: IBM  -[791822X]-/00W2432, BIOS -[TDE112DUS-1.00]- 03/01/2012 
[  334.606059] RIP: 0010:nfsd4_open+0x4f8/0x790 [nfsd] 
[  334.611507] RSP: 0000:ffffae46c0b63d90 EFLAGS: 00010286 
[  334.617345] RAX: 0000000000000000 RBX: ffff912c732683e0 RCX: 0000000000000000 
[  334.625314] RDX: 0000000000000001 RSI: ffff912cbe8d68b8 RDI: ffff912cbe8d68b8 
[  334.633286] RBP: ffff912c73269068 R08: 0000000000000000 R09: 000000000000044a 
[  334.641255] R10: 0000000000000000 R11: ffffae46c0b63af0 R12: 0000000005000000 
[  334.649227] R13: ffff912cbb90c000 R14: ffff912cb28a4e00 R15: ffff912cae23ae00 
[  334.657197] FS:  0000000000000000(0000) GS:ffff912cbe8c0000(0000) knlGS:0000000000000000 
[  334.666235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  334.672657] CR2: 00000000015c81c8 CR3: 000000002780a002 CR4: 00000000000606e0 
[  334.680627] Call Trace: 
[  334.683391]  nfsd4_proc_compound+0x5aa/0x7b0 [nfsd] 
[  334.688866]  nfsd_dispatch+0xb7/0x250 [nfsd] 
[  334.693673]  svc_process_common+0x399/0x750 [sunrpc] 
[  334.699259]  svc_process+0xeb/0x100 [sunrpc] 
[  334.704078]  nfsd+0xe3/0x150 [nfsd] 
[  334.707987]  kthread+0xf8/0x130 
[  334.711530]  ? nfsd_destroy+0x60/0x60 [nfsd] 
[  334.716305]  ? kthread_bind+0x10/0x10 
[  334.720412]  ret_from_fork+0x35/0x40 
[  334.724407] Code: 00 85 c0 41 89 c4 0f 84 be fd ff ff 80 bb 15 01 00 00 00 0f 84 b1 fd ff ff 89 c6 48 c7 c7 78 f2 7d c0 31 c0 0f ce e8 48 80 6d c4 <0f> 0b e9 98 fd ff ff 41 bc 00 00 27 31 e9 e8 fd ff ff 41 bc 00  
[  334.745492] ---[ end trace ac9143cbd6895047 ]--- 
[-- MARK -- Wed Apr 25 16:05:00 2018] 
[  404.873897] perf: interrupt took too long (2544 > 2500), lowering kernel.perf_event_max_sample_rate to 78000 
'''
Comment 11 Zarmin 2019-03-25 01:59:12 UTC
I'm having the same issue.

Did you find any successful workarounds, like special NFS server / client options, to prevent this?
Comment 12 bfields 2019-03-25 21:48:11 UTC
(In reply to Zarmin from comment #11)
> I'm having the same issue.

The same "failed to open newly-created file!" warning?

> Did you find any successful workarounds, like special NFS server /
> client options, to prevent this?

I don't think so.  What kernel are you seeing this on?

What sort of workload?  Is there simultaneous use of the same directories by NFS clients and processes running on the server?
Comment 13 Zarmin 2019-03-26 14:27:58 UTC
(In reply to bfields from comment #12)
> (In reply to Zarmin from comment #11)
> > I'm having the same issue.
> 
> The same "failed to open newly-created file!" warning?
> 
> > Did you find any successful workarounds, like special NFS server /
> > client options, to prevent this?
> 
> I don't think so.  What kernel are you seeing this on?
> 
> What sort of workload?  Is there simultaneous use of the same directories by
> NFS clients and processes running on the server?

Yes, I'm seeing this.

[  985.894641] WARNING: CPU: 0 PID: 3714 at fs/nfsd/nfs4proc.c:464 nfsd4_open+0x4d8/0x7a0 [nfsd]
[  985.897639] nfsd4_process_open2 failed to open newly-created file! status=5
[  985.900268] Modules linked in: fuse nfsd(OF) loop isofs zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) edac_core kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel qxl ttm drm_kms_helper joydev evdev serio_raw virtio_balloon drm button auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache ata_generic virtio_net virtio_scsi virtio_blk crc32c_intel uhci_hcd aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ehci_hcd psmouse ata_piix floppy libata virtio_pci virtio_ring virtio i2c_piix4 usbcore usb_common scsi_mod
[  985.927004] CPU: 0 PID: 3714 Comm: nfsd Tainted: PF       W  O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[  985.931207] Hardware name: DigitalOcean Droplet, BIOS 20171212 12/12/2017
[  985.933657]  0000000000000000 ffffffffb9934524 ffff9dc58280fce0 0000000000000000
[  985.936451]  ffffffffb967a5cb ffff8b23076333e0 ffff9dc58280fd38 0000000005000000
[  985.939078]  ffff8b23308ea000 ffff8b2300c75400 ffff8b2332ad4400 ffffffffb967a64f
[  985.942272] Call Trace:
[  985.943807]  [<ffffffffb9934524>] ? dump_stack+0x5c/0x78
[  985.946621]  [<ffffffffb967a5cb>] ? __warn+0xcb/0xf0
[  985.949419]  [<ffffffffb967a64f>] ? warn_slowpath_fmt+0x5f/0x80
[  985.951630]  [<ffffffffc0908888>] ? nfsd4_open+0x4d8/0x7a0 [nfsd]
[  985.954035]  [<ffffffffc0908ee7>] ? nfsd4_proc_compound+0x397/0x6d0 [nfsd]
[  985.956595]  [<ffffffffc08f62a6>] ? nfsd_dispatch+0xc6/0x260 [nfsd]
[  985.959372]  [<ffffffffc038438a>] ? svc_process_common+0x47a/0x6c0 [sunrpc]
[  985.962980]  [<ffffffffc08f5c40>] ? nfsd_destroy+0x60/0x60 [nfsd]
[  985.966404]  [<ffffffffc03854b6>] ? svc_process+0xf6/0x1a0 [sunrpc]
[  985.969440]  [<ffffffffc08f5d29>] ? nfsd+0xe9/0x150 [nfsd]
[  985.972577]  [<ffffffffb969a5d9>] ? kthread+0xd9/0xf0
[  985.976266]  [<ffffffffb9c19364>] ? __switch_to_asm+0x34/0x70
[  985.979346]  [<ffffffffb969a500>] ? kthread_park+0x60/0x60
[  985.982574]  [<ffffffffb9c193f7>] ? ret_from_fork+0x57/0x70
[  985.985159] ---[ end trace aebc44694e167e63 ]---

Kernel version: SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux


Scenario to reproduce:

Using a lot of concurrent npm installs, I'm stress testing a nodejs deployment service that uses NFS for storage. It takes a while to reproduce, and it only happens with some types of underlying filesystems.

I can't reproduce it with ext4 or XFS.
I can reproduce it with tmpfs or any FUSE-based filesystem (even with a simple mirrorfs).

I've tracked it down to this call: https://elixir.bootlin.com/linux/v4.9.144/source/fs/open.c#L869

The EIO (status=5) is coming from the underlying implementation, so for further analysis I'm checking which of the reproducing environments return EIO.
Comment 14 Zarmin 2019-03-27 14:11:22 UTC
The kernel is swallowing the original error code (24) and replacing it with EIO.

EMFILE      24  /* Too many open files */

For me, increasing my limits solved the issue.

/etc/security/limits.conf
*         hard    nofile      2097152
*         soft    nofile      2097152


/etc/sysctl.conf
fs.file-max = 2097152 


But I haven't tested against the reproducer script above, only against my own workload.
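
To double-check that the raised limits are actually in effect (standard shell commands, nothing specific to this bug):

'''
# per-process limit, in a fresh login shell:
ulimit -Sn; ulimit -Hn
# system-wide maximum:
cat /proc/sys/fs/file-max
# currently allocated file handles vs. the maximum:
cat /proc/sys/fs/file-nr
'''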
Comment 15 Zarmin 2019-03-27 14:21:44 UTC
Do you have any 'nfsd: non-standard errno' lines in your logs?

Unfortunately, I initially missed that line in the error output while debugging; it actually reports the original error.
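
For anyone else hitting this: as far as I can tell, nfsd translates local errnos to NFS status codes through a fixed table, and any errno not in that table (EMFILE included) gets logged as 'nfsd: non-standard errno' and returned to the client as EIO. So if you see this warning, grep for the original errno first:

'''
dmesg | grep 'nfsd: non-standard errno'
'''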
Comment 16 Robert Dinse 2020-10-16 15:21:08 UTC
This looks like the same error I am seeing: it was running clean on 5.8.7, but now 5.9 is fuxored.  Hardware is an i7-6850k generic box; the Ubuntu 20.04 kernel was built from kernel.org source:

[ 5924.109784] ------------[ cut here ]------------
[ 5924.109798] WARNING: CPU: 11 PID: 1993 at fs/nfsd/nfs4state.c:4957 nfsd4_check_conflicting_opens+0x96/0xb0 [nfsd]
[ 5924.109799] Modules linked in: tcp_diag inet_diag unix_diag vhost_net vhost tap vhost_iotlb xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter rpcsec_gss_krb5 nfsv4 nfs fscache bridge stp llc binfmt_misc quota_v2 quota_tree nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi x86_pkg_temp_thermal snd_seq_midi_event intel_powerclamp snd_rawmidi snd_seq snd_seq_device snd_timer kvm_intel kvm rapl mei_me snd mei intel_cstate iTCO_wdt soundcore iTCO_vendor_support eeepc_wmi asus_wmi sparse_keymap intel_wmi_thunderbolt wmi_bmof mac_hid sch_fq_codel nct6775 nfsd hwmon_vid coretemp auth_rpcgss nfs_acl lockd parport_pc ppdev grace sunrpc lp
[ 5924.109814]  parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear dm_mirror dm_region_hash dm_log raid10 raid1 nouveau video i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect ghash_clmulni_intel sysimgblt aesni_intel mxm_wmi fb_sys_fops cec i2c_i801 r8169 glue_helper drm lpc_ich realtek i2c_smbus ahci libahci wmi
[ 5924.109827] CPU: 11 PID: 1993 Comm: nfsd Not tainted 5.9.0 #1
[ 5924.109828] Hardware name: ASUS All Series/X99-E, BIOS 2101 07/10/2019
[ 5924.109834] RIP: 0010:nfsd4_check_conflicting_opens+0x96/0xb0 [nfsd]
[ 5924.109836] Code: ee 4c 89 ef c6 07 00 0f 1f 40 00 b8 f5 ff ff ff 5b 41 5c 41 5d 5d c3 4c 89 ef c6 07 00 0f 1f 40 00 31 c0 5b 41 5c 41 5d 5d c3 <0f> 0b eb a2 b8 f5 ff ff ff eb db 66 66 2e 0f 1f 84 00 00 00 00 00
[ 5924.109836] RSP: 0018:ffffb567c20ebc58 EFLAGS: 00010286
[ 5924.109837] RAX: 00000000ffffffff RBX: ffff95203244c000 RCX: 0000000000000001
[ 5924.109837] RDX: ffff9522753cf900 RSI: ffff95203244c000 RDI: ffff95224549db20
[ 5924.109838] RBP: ffffb567c20ebc70 R08: ffffb567c20ebc18 R09: ffff95252b89c5e8
[ 5924.109838] R10: ffff95252b89c5e8 R11: 0000000000000000 R12: ffff95224549db20
[ 5924.109839] R13: ffff95203244c000 R14: ffff9522752c1c18 R15: ffff9526882f8508
[ 5924.109839] FS:  0000000000000000(0000) GS:ffff9526ffcc0000(0000) knlGS:0000000000000000
[ 5924.109840] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5924.109840] CR2: 000055d51ecce488 CR3: 0000001f8577c006 CR4: 00000000003726e0
[ 5924.109841] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5924.109841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5924.109841] Call Trace:
[ 5924.109850]  nfsd4_process_open2+0xede/0x1550 [nfsd]
[ 5924.109854]  ? fh_verify+0x1e5/0x6b0 [nfsd]
[ 5924.109861]  ? nfsd4_process_open1+0x182/0x430 [nfsd]
[ 5924.109866]  nfsd4_open+0x41c/0x840 [nfsd]
[ 5924.109872]  nfsd4_proc_compound+0x3c2/0x760 [nfsd]
[ 5924.109885]  ? svc_reserve+0x4e/0x60 [sunrpc]
[ 5924.109893]  nfsd_dispatch+0xe2/0x220 [nfsd]
[ 5924.109903]  svc_process_common+0x480/0x6f0 [sunrpc]
[ 5924.109914]  ? svc_sock_secure_port+0x16/0x40 [sunrpc]
[ 5924.109922]  ? nfsd_svc+0x330/0x330 [nfsd]
[ 5924.109933]  svc_process+0xbc/0xf0 [sunrpc]
[ 5924.109939]  nfsd+0xe8/0x150 [nfsd]
[ 5924.109942]  kthread+0x130/0x150
[ 5924.109948]  ? nfsd_destroy+0x60/0x60 [nfsd]
[ 5924.109949]  ? __kthread_bind_mask+0x70/0x70
[ 5924.109951]  ret_from_fork+0x22/0x30
[ 5924.109953] ---[ end trace 19eccb263b275715 ]---
Comment 17 bfields 2020-10-18 21:49:58 UTC
(In reply to Robert Dinse from comment #16)
> This looks like the same error I am seeing: it was running clean on 5.8.7,
> but now 5.9 is fuxored.  Hardware is an i7-6850k generic box; the Ubuntu
> 20.04 kernel was built from kernel.org source:
> 
> [ 5924.109784] ------------[ cut here ]------------
> [ 5924.109798] WARNING: CPU: 11 PID: 1993 at fs/nfsd/nfs4state.c:4957
> nfsd4_check_conflicting_opens+0x96/0xb0 [nfsd]

That's unrelated.  That function has a couple of problems, one of which is that the warning is just wrong--I have a patch queued up to remove it.  You can safely ignore the warning, as long as you're not seeing other symptoms.  (In which case, feel free to open a separate bug describing them.)
Comment 18 Robert Dinse 2020-10-18 22:27:04 UTC
So far I haven't seen anything else bad; NFS seems to be functioning mostly okay.
There is still a bug where, occasionally, a freshly started server will not
export everything in /etc/exports; exportfs -a doesn't fix it, but exportfs -r does.
But it is an infrequent error and very difficult to reproduce.
Comment 19 bfields 2022-01-21 22:19:47 UTC
This "nfsd4_process_open2 failed to open newly-created file! status=%u\n" is a real bug that still needs to be addressed.

It seems to me a fundamental design flaw that we're creating the file and then opening it as two separate operations.  If the former succeeds but the latter fails, we're stuck without any reasonable reply to the client.

We should instead be calling into the vfs once to ask it to open the file and create it as one step.  That will require some reorganization of the NFSv4 open code.  And I'm not sure what the right vfs call is.
Comment 20 Chuck Lever 2022-02-18 16:32:53 UTC
There are two separate reports of EIO during an OPEN, under an intensive workload. One of them was resolved by increasing the maximum number of open files allowed. After a couple of years, the other reporter has not confirmed whether increasing the file limit resolves the issue.

Since the failure seems to be related to workload level, I expect that a pynfs-style unit test will not be adequate as confirmation that the issue remains or has been addressed.

Re-engineering the OPEN(CREATE) path is still an intriguing idea, but it's not clear to me that it is related to the original issue.

One reporter states:

> The EIO (status=5) is coming from the underlying implementation

but also that

> The kernel is swallowing the original error code (24) and replacing it with EIO.

If I were to choose a code change today that might move things forward, it would be to find a way of exposing the underlying error to NFSD. Though, we may find that EMFILE is the root problem after all.
Comment 21 bfields 2022-02-18 17:30:08 UTC
Even if it's possible to work around by bumping the maximum number of open files, the lack of atomicity between create and open is still a bug.

Agreed that it may be hard to trigger, that the particular situation the reporter was hitting may have been addressed, and that that may make this a lower priority to fix.
Comment 22 Chuck Lever 2022-02-18 18:46:34 UTC
(In reply to bfields from comment #21)
> Even if it's possible to work around by bumping the maximum number of open
> files, the lack of atomicity between create and open is still a bug.

I agree that the lack of atomicity is an issue. I'm just not convinced it's the root cause of /this/ issue. I propose to mark this RESOLVED and then open a "feature-request" on bugzilla.linux-nfs.org documenting the need to improve the atomicity of OPEN(CREATE).
Comment 23 Chuck Lever 2022-02-18 19:32:33 UTC
Marked DOCUMENTED: Note the workaround suggested above to increase the maximum number of open files.

The need for OPEN(CREATE) atomicity has been recorded and will be pursued separately:

https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
Comment 24 Chuck Lever 2022-03-22 16:07:34 UTC
[ removed my Cc: since I am now the assignee for this issue. ]

Since v5.4, nfsd4_process_open2() relies on the new NFSD file cache. I don't see any way for nfsd4_process_open2() to fail during OPEN(CREATE) unless it's nfsd_file_acquire() that is somehow failing. In fact the nfsd4_open() code path looks to be pretty careful to avoid failure in the CREATE case, although there could be gaps that are not obvious to code inspection.

I've also tried reducing fs.file-max and have not been able to reproduce a failure with workloads that are available to me.

There are trace points in the filecache code now that might be enabled if a reproducer can be found for today's kernel (as of this writing, v5.17).
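
For example (a sketch only; the exact event names vary by kernel version, so list what's available first):

'''
# see which nfsd trace points this kernel provides:
ls /sys/kernel/tracing/events/nfsd/
# enable the filecache acquire event and watch while the reproducer runs:
echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_file_acquire/enable
cat /sys/kernel/tracing/trace_pipe
'''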
Comment 25 bfields 2022-03-22 18:28:21 UTC
(In reply to Chuck Lever from comment #24)
> Since v5.4, nfsd4_process_open2() relies on the new NFSD file cache. I don't
> see any way for nfsd4_process_open2() to fail during OPEN(CREATE) unless
> it's nfsd_file_acquire() that is somehow failing. In fact the nfsd4_open()
> code path looks to be pretty careful to avoid failure in the CREATE case,
> although there could be gaps that are not obvious to code inspection.

Right, but it's exactly nfsd_file_acquire() that I'm concerned about, as that's where the vfs open call happens (nfsd_file_acquire->nfsd_open_verified->__nfsd_open->dentry_open...).

> I've also tried reducing fs.file_max and have not been able to reproduce a
> failure with workloads that are available to me.
> 
> There are trace points in the filecache code now that might be enabled if a
> reproducer can be found for today's kernel (as of this writing, v5.17).

Red Hat has been reproducing it with the lustre racer test--see comment 3, and lustre/tests/racer.sh from git://git.whamcloud.com/fs/lustre-release.git, and https://bugzilla.redhat.com/show_bug.cgi?id=1905809.

A simpler way to reproduce might be to chmod(0) the file from another process between the time it's created and the time it's opened.  You could make that race more likely either by attempting it repeatedly, or by adding an artificial delay in the code somehow.

The race may be too difficult to hit to make the effort worthwhile, I don't know--but I'm quite sure there's a bug there (and not only in the EMFILE case).
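
Something like this, say (an untested sketch; paths assume the setup from comment 3, where /mnt/lustre is the export and /mnt/nfsmp2 is the NFSv4 mount of it):

'''
# chmod(0) loop on the export races against a create/open loop on the
# NFSv4 mount of the same file
F_SRV=/mnt/lustre/racer/racefile      # file as seen on the export
F_CLT=/mnt/nfsmp2/racer/racefile      # same file via the NFSv4 mount
while :; do chmod 0 "$F_SRV" 2>/dev/null; done &
while :; do
        rm -f "$F_CLT"
        : > "$F_CLT" || echo "OPEN(CREATE) failed: $?"
done
'''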
Comment 26 Chuck Lever 2022-03-22 21:10:02 UTC
(In reply to bfields from comment #25)
> (In reply to Chuck Lever from comment #24)
> > Since v5.4, nfsd4_process_open2() relies on the new NFSD file cache. I don't
> > see any way for nfsd4_process_open2() to fail during OPEN(CREATE) unless
> > it's nfsd_file_acquire() that is somehow failing. In fact the nfsd4_open()
> > code path looks to be pretty careful to avoid failure in the CREATE case,
> > although there could be gaps that are not obvious to code inspection.
> 
> Right, but it's exactly nfsd_file_acquire() that I'm concerned about, as
> that's where the vfs open call happens
> (nfsd_file_acquire->nfsd_open_verified->__nfsd_open->dentry_open...).

To repair this problem, we might need to replace the use of nfsd_file_acquire() in the OPEN(CREATE) case with a new filecache API that manages to do the acquire at the same time as the underlying create. I've started exploring that idea.


> > I've also tried reducing fs.file-max and have not been able to reproduce a
> > failure with workloads that are available to me.
> > 
> > There are trace points in the filecache code now that might be enabled if a
> > reproducer can be found for today's kernel (as of this writing, v5.17).
> 
> Red Hat has been reproducing it with the lustre racer test--see comment 3,
> and lustre/tests/racer.sh from
> git://git.whamcloud.com/fs/lustre-release.git, and
> https://bugzilla.redhat.com/show_bug.cgi?id=1905809.

1905809 shows that the problem indeed occurs in kernels after the NFSD filecache was introduced. That's helpful to know.

But I would like some solid confirmation that the dentry_open() is what is failing, and not something else that is less expected.


> A simpler way to reproduce might be to chmod(0) the file from another
> process between the time it's created and the time it's opened.  You could
> make that race more likely either by attempting it repeatedly, or by adding
> an artificial delay in the code somehow.
> 
> The race may be too difficult to hit to make the effort worthwhile, I don't
> know--but I'm quite sure there's a bug there (and not only in the EMFILE
> case).

I'm running a couple of copies of racer.sh right now and haven't seen an issue (xfs, not ext4). I also see that JianHong runs his reproducer on a small virtual system that acts as both client and server. That seems like a very contrived configuration, not one that is commonly deployed outside of stress-testing labs ;-) But thrashing a small system like that might introduce a large enough window to enable multiple clients to interfere with each other's create operations.
