Bug 215673 - fsx dio testing gets BAD DATA on NFS
Summary: fsx dio testing gets BAD DATA on NFS
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-11 05:15 UTC by Zorro Lang
Modified: 2022-07-21 14:44 UTC
CC List: 3 users

See Also:
Kernel Version: Linux v5.17-rc6+
Subsystem:
Regression: No
Bisected commit-id:


Attachments
fsx bad file dump (116.41 KB, application/x-bzip)
2022-03-11 05:17 UTC, Zorro Lang
fsx good file dump (116.41 KB, application/x-bzip)
2022-03-11 05:17 UTC, Zorro Lang
fsx output log (574.43 KB, text/plain)
2022-03-11 05:18 UTC, Zorro Lang
fsx operations running log (375.03 KB, text/plain)
2022-03-11 05:19 UTC, Zorro Lang
.config file (219.10 KB, text/plain)
2022-03-11 05:21 UTC, Zorro Lang

Description Zorro Lang 2022-03-11 05:15:02 UTC
Description of problem:
Our recent regression testing on upstream Linux hit the failure below on NFS:
/usr/bin/fsx -Z -r 4096 -w 4096 -t 4096 -l 10485760 -S 0 /mnt/fsx-longterm/nfs-client/fsxtest_dio


skipping zero size read
skipping zero length punch hole
skipping zero length punch hole
truncating to largest ever: 0x25a000
copying to largest ever: 0x7d0000
truncating to largest ever: 0x99b000
copying to largest ever: 0x9ad000
copying to largest ever: 0x9b8000
truncating to largest ever: 0x9c4000
copying to largest ever: 0xa00000
READ BAD DATA: offset = 0x7000, size = 0xaff2, fname = /mnt/fsx-longterm/nfs-client/fsxtest_dio
OFFSET	GOOD	BAD	RANGE
0x0b000	0xda9e	0x0000	0x00000
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b001	0x9eda	0x0000	0x00001
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b002	0xda63	0x0000	0x00002
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b003	0x63da	0x0000	0x00003
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b004	0xdac6	0x0000	0x00004
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b005	0xc6da	0x0000	0x00005
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b006	0xdae0	0x0000	0x00006
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b007	0xe0da	0x0000	0x00007
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b008	0xda6e	0x0000	0x00008
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b009	0x6eda	0x0000	0x00009
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b00a	0xda34	0x0000	0x0000a
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b00b	0x34da	0x0000	0x0000b
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b00c	0xdaec	0x0000	0x0000c
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b00d	0xecda	0x0000	0x0000d
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b00e	0xdacd	0x0000	0x0000e
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x0b00f	0xcdda	0x0000	0x0000f
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (209358 total operations):
209359(207 mod 256): READ     0x4a1000 thru 0x4a5fff	(0x5000 bytes)
209360(208 mod 256): WRITE    0x96d000 thru 0x972fff	(0x6000 bytes) HOLE
209361(209 mod 256): SKIPPED (no operation)
209362(210 mod 256): COPY 0x66d000 thru 0x679fff	(0xd000 bytes) to 0x8fe000 thru 0x90afff
209363(211 mod 256): SKIPPED (no operation)
209364(212 mod 256): MAPWRITE 0x2f6000 thru 0x2f8c51	(0x2c52 bytes)
209365(213 mod 256): SKIPPED (no operation)
209366(214 mod 256): SKIPPED (no operation)
209367(215 mod 256): MAPREAD  0x14000 thru 0x23ec6	(0xfec7 bytes)
209368(216 mod 256): SKIPPED (no operation)
...
...
209354(202 mod 256): PUNCH    0xb73c7 thru 0xbc3a1	(0x4fdb bytes)
209355(203 mod 256): SKIPPED (no operation)
209356(204 mod 256): SKIPPED (no operation)
209357(205 mod 256): TRUNCATE DOWN	from 0x6102a7 to 0x3a6000
209358(206 mod 256): MAPREAD  0x7000 thru 0x11ff1	(0xaff2 bytes)	***RRRR***
Log of operations saved to "/mnt/fsx-longterm/nfs-client/fsxtest_dio.fsxops"; replay with --replay-ops
Correct content saved for comparison
(maybe hexdump "/mnt/fsx-longterm/nfs-client/fsxtest_dio" vs "/mnt/fsx-longterm/nfs-client/fsxtest_dio.fsxgood")
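The comparison that fsx suggests can be done with cmp or hexdump. Below is a tiny stand-in demo: good.bin and bad.bin are made-up files mimicking the GOOD/BAD columns in the table above; substitute the real fsxtest_dio and fsxtest_dio.fsxgood paths to inspect the actual corruption.

```shell
# Stand-in demo of the byte comparison fsx suggests: good.bin holds pattern
# bytes from the GOOD column, bad.bin the zeroes from the BAD column.
printf '\332\236\332\143' > good.bin   # 0xda 0x9e 0xda 0x63
printf '\000\000\000\000' > bad.bin    # 0x00 0x00 0x00 0x00

# cmp -l prints one line per differing byte:
# <1-based offset> <byte from file1> <byte from file2>, byte values in octal.
cmp -l good.bin bad.bin

# To inspect the real corrupted region (offset 0xb000 per the table above):
#   hexdump -C -s 0xb000 -n 64 fsxtest_dio
#   hexdump -C -s 0xb000 -n 64 fsxtest_dio.fsxgood
```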

Version number of selected component:
Mainline Linux v5.17-rc6+, HEAD=ac84e82f78cb55ce8e760e0b5887d56efd78d6bc

How reproducible:
I tested on aarch64, x86_64 and ppc64le; aarch64 and x86_64 hit this failure. Only pure dio testing failed; other modes, including buffered I/O and AIO, did not fail. I will run more tests if needed.

Additional info:
The fsx binary comes from upstream xfstests (xfstests/ltp/fsx.c).
I'll upload the full fsx test results later.
Comment 1 Zorro Lang 2022-03-11 05:17:10 UTC
Created attachment 300551 [details]
fsx bad file dump
Comment 2 Zorro Lang 2022-03-11 05:17:31 UTC
Created attachment 300552 [details]
fsx good file dump
Comment 3 Zorro Lang 2022-03-11 05:18:00 UTC
Created attachment 300553 [details]
fsx output log
Comment 4 Zorro Lang 2022-03-11 05:19:09 UTC
Created attachment 300554 [details]
fsx operations running log
Comment 5 Zorro Lang 2022-03-11 05:21:57 UTC
Created attachment 300555 [details]
.config file
Comment 6 Zorro Lang 2022-03-11 05:27:02 UTC
The underlying filesystem is xfs:
meta-data=/dev/vda2              isize=512    agcount=4, agsize=327680 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1
data     =                       bsize=4096   blocks=1310720, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The NFS mount uses no specified mount options, all defaults; nfs-utils 2.5.4.
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-12 07:04:16 UTC
This sounds a lot like a regression, but it's not totally clear from the description. Am I right in assuming that 5.16 worked fine? Or did even some of the 5.17-rc kernels work fine?
Comment 8 Anna Schumaker 2022-03-14 13:26:22 UTC
I'm assuming this is against a Linux server? Do you know if the client is using NFS v4.2 and if it has CONFIG_NFS_V4_2_READ_PLUS enabled?

The fsx output looks a lot like the output we sometimes get when using v4.2 READ_PLUS, which is why we have it locked behind the config option with the message "This is intended for developers only" for now (I'm working on patches to address this, but they're not ready yet)
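A quick way to check whether the client kernel has this option enabled (a sketch; the config file location is an assumption, and may instead be /proc/config.gz on distributions built with CONFIG_IKCONFIG):

```shell
# Check whether the client kernel was built with v4.2 READ_PLUS support.
# NOTE: the path below is an assumption; on some distributions the config
# lives in /proc/config.gz instead (use zgrep there).
grep NFS_V4_2_READ_PLUS "/boot/config-$(uname -r)"
# "CONFIG_NFS_V4_2_READ_PLUS=y"            -> the client can use READ_PLUS
# "# CONFIG_NFS_V4_2_READ_PLUS is not set" -> it cannot
```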
Comment 9 Zorro Lang 2022-03-14 13:47:36 UTC
(In reply to Anna Schumaker from comment #8)
> I'm assuming this is against a Linux server? Do you know if the client is
> using NFS v4.2 and if it has CONFIG_NFS_v4_2_READ_PLUS enabled?
> 
> The fsx output looks a lot like the output we sometimes get when using v4.2
> READ_PLUS, which is why we have it locked behind the config option with the
> message "This is intended for developers only" for now (I'm working on
> patches to address this, but they're not ready yet)

Hi Anna,

The .config file is attached; you can check more details in it. I just found "CONFIG_NFS_V4_2_READ_PLUS=y" in it, so it might be that issue. I'll give it a test by disabling this configuration.

Thanks,
Zorro
Comment 10 Zorro Lang 2022-03-19 15:50:34 UTC
(In reply to Zorro Lang from comment #9)
> (In reply to Anna Schumaker from comment #8)
> > I'm assuming this is against a Linux server? Do you know if the client is
> > using NFS v4.2 and if it has CONFIG_NFS_v4_2_READ_PLUS enabled?
> > 
> > The fsx output looks a lot like the output we sometimes get when using v4.2
> > READ_PLUS, which is why we have it locked behind the config option with the
> > message "This is intended for developers only" for now (I'm working on
> > patches to address this, but they're not ready yet)
> 
> Hi Anna,
> 
> The .config file is in attachment. You check more details in it. I just
> found "CONFIG_NFS_V4_2_READ_PLUS=y" in it, so it might be that issue. I'll
> give it a test by disable this configuration.

Hi Anna,

I think you're right: after I disabled CONFIG_NFS_V4_2_READ_PLUS, I can't reproduce this bug anymore. Once you have a fix, I'd like to help verify it.

Thanks,
Zorro

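For reference, the option can be toggled in a kernel source tree with the stock scripts/config helper before rebuilding. A sketch, assuming you run from the kernel source root with an existing .config:

```shell
# Clear the experimental READ_PLUS option, resolve any dependent options,
# then rebuild and boot the client kernel to retest.
scripts/config --disable NFS_V4_2_READ_PLUS
make olddefconfig
grep NFS_V4_2_READ_PLUS .config   # expect: "# CONFIG_NFS_V4_2_READ_PLUS is not set"
```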
Comment 11 Chuck Lever 2022-07-21 14:44:07 UTC
I ran the fsx command above overnight.

 ....

skipping zero length punch hole
skipping zero length punch hole
skipping zero size read
^Csignal 2
testcalls = 124721256
[cel@morisot ~]$ mountstats --rpc
Stats for klimt:/export/fast mounted on /mnt:

RPC statistics:
  267015341 RPC requests sent, 267015341 RPC replies received (0 XIDs not found)
  average backlog queue length: 0

GETATTR:
	161259261 ops (60%) 
	avg bytes sent per op: 199	avg bytes received per op: 240
	backlog wait: 0.003842 	RTT: 0.152617 	total execute time: 0.168519 (milliseconds)
READ_PLUS:
	46936965 ops (17%) 
	avg bytes sent per op: 216	avg bytes received per op: 4465
	backlog wait: 0.012698 	RTT: 0.200121 	total execute time: 0.237943 (milliseconds)
WRITE:
	27575688 ops (10%) 
	avg bytes sent per op: 33432	avg bytes received per op: 141
	backlog wait: 0.011856 	RTT: 0.720103 	total execute time: 0.739979 (milliseconds)
SETATTR:
	12061698 ops (4%) 
	avg bytes sent per op: 247	avg bytes received per op: 264
	backlog wait: 0.004311 	RTT: 0.223075 	total execute time: 0.239609 (milliseconds)
ALLOCATE:
	9592654 ops (3%) 
	avg bytes sent per op: 236	avg bytes received per op: 168
	backlog wait: 0.003911 	RTT: 0.205160 	total execute time: 0.220310 (milliseconds)
DEALLOCATE:
	9588990 ops (3%) 
	avg bytes sent per op: 236	avg bytes received per op: 168
	backlog wait: 0.003670 	RTT: 0.200074 	total execute time: 0.214885 (milliseconds)

 ....

So... the mount is NFSv4.2, the backing filesystem is XFS, and the client is clearly generating READ_PLUS and hole punches. I haven't been able to reproduce a problem so far with a Linux 5.19.0-rc7-00042-g94d75868c355 NFS server (no changes to the current READ_PLUS code -- this is just what is ready for 5.20).
