Bug 201259
Summary: | [xfstests shared/010]: maybe pagecache contents is mutated after cycle mount | |
---|---|---|---
Product: | File System | Reporter: | Zorro Lang (zlang)
Component: | XFS | Assignee: | FileSystem/XFS Default Virtual Assignee (filesystem_xfs)
Status: | NEW | |
Severity: | normal | CC: | sandeen
Priority: | P1 | |
Hardware: | All | |
OS: | Linux | |
Kernel Version: | v4.19-rc5 | Subsystem: |
Regression: | No | Bisected commit-id: |
Description
Zorro Lang
2018-09-28 06:06:53 UTC
In trying to get to a more targeted reproducer, this script clearly exposes stale data, though I'm not seeing an md5sum change across the remount. The operations below are a subset of an fsstress sequence that caused a failure in shared/010 for me.

---
#!/bin/bash

DEV=/dev/pmem0p2
MNT=/mnt/scratch

umount $MNT
xfs_io -d -c "pwrite -S 0x58 0 16g" $DEV
mkfs.xfs -m reflink=1 $DEV
mount $DEV $MNT

DONOR1=$MNT/donor1
DONOR2=$MNT/donor2
TARGET=$MNT/file

rm -f $DONOR1 $DONOR2 $TARGET

xfs_io -f -c "pwrite -S 0x72 0 1772544" $DONOR1
xfs_io -f -c "pwrite -S 0x52 0 840263" $DONOR2

touch $TARGET
xfs_io -f \
    -c "falloc -k 624393 840278" \
    -c "reflink $DONOR1 925696 212992 57344" \
    -c "pwrite -S 0x57 1662464 74240" \
    -c "fdatasync" \
    -c "reflink $DONOR2 585728 5357568 122880" \
    $TARGET

md5sum $TARGET
umount $MNT
mount $DEV $MNT
md5sum $TARGET

xfs_bmap -vvp $TARGET
hexdump -C $TARGET
---

95580da3cda8eb09c23c8deefb347d77  /mnt/scratch/file
95580da3cda8eb09c23c8deefb347d77  /mnt/scratch/file

/mnt/scratch/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..415]:        hole                                   416
   1: [416..527]:      3648..3759        0 (3648..3759)       112 100000   (first reflink)
   2: [528..1215]:     hole                                   688
   3: [1216..2863]:    192..1839         0 (192..1839)       1648 010000   (fallocated)
   4: [2864..3239]:    hole                                   376
   5: [3240..3455]:    5304..5519        0 (5304..5519)       216 000000   (pwrite)
   6: [3456..10463]:   hole                                  7008
   7: [10464..10703]:  6664..6903        0 (6664..6903)       240 100000   (2nd reflink)

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|   (hole)
*
00034000  72 72 72 72 72 72 72 72  72 72 72 72 72 72 72 72  |rrrrrrrrrrrrrrrr|   (first reflink)
*
00042000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|   (hole + fallocated)
*
00195e00  57 57 57 57 57 57 57 57  57 57 57 57 57 57 57 57  |WWWWWWWWWWWWWWWW|   (pwrite)
*
001a8000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58  |XXXXXXXXXXXXXXXX|   (stale 32k)
*
001b0000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|   (hole)
*
0051c000  52 52 52 52 52 52 52 52  52 52 52 52 52 52 52 52  |RRRRRRRRRRRRRRRR|   (2nd reflink)
*
0053a000

The "XXXXXXXX" is exposed stale data: no pwrite in the script wrote 'X'; that is the disk fill pattern. The pattern starts right after the end of the pwrite call in the script. There also seems to be a reflinked extent prior to the pwrite.

Hrm, sorry about the wrap. Also:
> There also seems to be a reflinked extent prior to the pwrite.
That should say "there is an fallocated extent prior to the pwrite"
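For scripted runs, a check along these lines can flag the stale data without inspecting the hexdump by hand. This is a minimal sketch, not part of the original reproducer; it assumes the 0x58 disk fill pattern and the /mnt/scratch/file target path from the script above:

---
#!/bin/bash
# Hypothetical post-remount check: nothing in the reproducer writes
# 0x58 ('X') into the file, so any run of 0x58 bytes in the target
# must be stale disk contents leaking through.
TARGET=/mnt/scratch/file

if hexdump -C "$TARGET" | grep -q '58 58 58 58'; then
    echo "FAIL: stale 0x58 disk pattern exposed in $TARGET"
    exit 1
fi
echo "PASS: no stale data visible in $TARGET"
---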
Here's a slightly simpler test on smaller, block-aligned boundaries:

---
xfs_io -f -c "pwrite -S 0x72 0 4096" $DONOR1
xfs_io -f -c "pwrite -S 0x52 0 4096" $DONOR2

xfs_io -f \
    -c "falloc -k 8192 4096" \
    -c "reflink $DONOR1 0 4096 4096" \
    -c "pwrite -S 0x57 65536 8192" \
    -c "fdatasync" \
    -c "reflink $DONOR2 0 131072 4096" \
    $TARGET
---

By block, it does these operations, all within 32 blocks:

          1111111111222222222233
0123456789012345678901234567890123456789012
  P
 R
                WW
                                R

P = preallocate
R = reflink
W = write

What we should get out is:

HRPHHHHHHHHHHHHHWWHHHHHHHHHHHHHHR

H = Hole

What we do get out is:

HRPHHHHHHHHHHHHHWWSSSSSSSSSSSSSSR

S = Stale

The extents:

 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..7]:          hole                                     8
   1: [8..15]:         112..119          0 (112..119)           8 100000
   2: [16..23]:        104..111          0 (104..111)           8 010000
   3: [24..127]:       hole                                   104
   4: [128..255]:      216..343          0 (216..343)         128 000000
   5: [256..263]:      120..127          0 (120..127)           8 100000

are lined up as (chars represent hole/reflink/prealloc/written/stale):

 H  R  P  HHHHHHHHHHHHH WWSSSSSSSSSSSSSS  R
[E][E][E][     E      ][       E        ][E]

So the 5th extent is what has exposed the stale data, with written data in only the first 2 blocks and another 14 blocks of stale data exposed up to the next reflinked block.

If I change the test to mount with -o allocsize=4096, the problem goes away, so it would appear that this has to do with speculative preallocation past EOF.

Even simpler:

---
xfs_io -f -c "pwrite -S 0x72 0 4096" $DONOR1

xfs_io -f \
    -c "falloc -k 8192 4096" \
    -c "pwrite -S 0x57 65536 8192" \
    -c "fdatasync" \
    -c "reflink $DONOR1 0 131072 4096" \
    $TARGET
---

So this seems to be a reflink extending past/into speculative preallocation, which exposes the stale blocks.

After upstream merged the patch for this bug, I can still reproduce a shared/010 failure:

FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 xxxxxxxx 4.19.0+
MKFS_OPTIONS  -- -f -m reflink=1 -b size=1024 /dev/mapper/xxxxxxxx-xfscratch
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/xxxxxxxxx-xfscratch /mnt/scratch

shared/010 160s ... - output mismatch (see /home/xfstests-dev/results//shared/010.out.bad)
    --- tests/shared/010.out	2018-10-16 23:31:53.924269141 -0400
    +++ /home/xfstests-dev/results//shared/010.out.bad	2018-11-02 12:20:39.858510419 -0400
    @@ -1,2 +1,4 @@
     QA output created by 010
    +/mnt/scratch/dir/p0/da/d51XX/f6dXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: FAILED
    +md5sum: WARNING: 1 computed checksum did NOT match
     Silence is golden
    ...
    (Run 'diff -u tests/shared/010.out /home/xfstests-dev/results//shared/010.out.bad' to see the entire diff)
Ran: shared/010
Failures: shared/010
Failed 1 of 1 tests

But maybe it's a new issue, because I can't reproduce it with the reproducer from comment #1:

# bash reproducer.sh
umount: /mnt/scratch: not mounted.
wrote 17179869184/17179869184 bytes at offset 0
16.000 GiB, 4194304 ops; 0:16:47.88 (16.256 MiB/sec and 4161.4915 ops/sec)
meta-data=/dev/mapper/xxxxxxxx-xfscratch isize=512    agcount=16, agsize=8192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=131072000, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=64000, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
wrote 1772544/1772544 bytes at offset 0
2 MiB, 433 ops; 0.0170 sec (99.111 MiB/sec and 25386.9606 ops/sec)
wrote 840263/840263 bytes at offset 0
821 KiB, 206 ops; 0.0083 sec (96.419 MiB/sec and 24786.4276 ops/sec)
linked 57344/57344 bytes at offset 212992
56 KiB, 1 ops; 0.0030 sec (17.698 MiB/sec and 323.6246 ops/sec)
wrote 74240/74240 bytes at offset 1662464
72 KiB, 19 ops; 0.0010 sec (65.984 MiB/sec and 17707.3625 ops/sec)
linked 122880/122880 bytes at offset 5357568
120 KiB, 1 ops; 0.0037 sec (31.477 MiB/sec and 268.6006 ops/sec)
a42583dd3f7edb0c00e7356c89d3e58c  /mnt/scratch/file
a42583dd3f7edb0c00e7356c89d3e58c  /mnt/scratch/file

/mnt/scratch/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..415]:        hole                                   416
   1: [416..527]:      4880..4991        0 (4880..4991)       112 101111
   2: [528..1215]:     hole                                   688
   3: [1216..2863]:    1024..2671        0 (1024..2671)      1648 010101
   4: [2864..3239]:    hole                                   376
   5: [3240..3583]:    2672..3015        0 (2672..3015)       344 001111
   6: [3584..10463]:   hole                                  6880
   7: [10464..10703]:  7800..8039        0 (7800..8039)       240 101111
 FLAG Values:
    0100000 Shared extent
    0010000 Unwritten preallocated extent
    0001000 Doesn't begin on stripe unit
    0000100 Doesn't end on stripe unit
    0000010 Doesn't begin on stripe width
    0000001 Doesn't end on stripe width

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00034000  72 72 72 72 72 72 72 72  72 72 72 72 72 72 72 72  |rrrrrrrrrrrrrrrr|
*
00042000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00195e00  57 57 57 57 57 57 57 57  57 57 57 57 57 57 57 57  |WWWWWWWWWWWWWWWW|
*
001a8000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0051c000  52 52 52 52 52 52 52 52  52 52 52 52 52 52 52 52  |RRRRRRRRRRRRRRRR|
*
0053a000

(In reply to Zorro Lang from comment #6)
> After upstream merged the patch for this bug, I can still reproduce a
> shared/010 failure:

It's not duperemove related: I've removed all the duperemove-related parts from shared/010 and can still reproduce this bug. In other words, fsstress plus md5sum alone can reproduce it. But with 'fsstress -f deduperange=0' set, I can't reproduce it (at least not so far, after loop-running shared/010 200 times).

Hmm... maybe I should report a new bug, if this is a different bug from this one.

Thanks,
Zorro
Zorro, I see that your new report is using a smaller blocksize (-b 1024) - was that used in the initial report?

(In reply to Eric Sandeen from comment #8)
> Zorro, I see that your new report is using a smaller blocksize (-b 1024) -
> was that used in the initial report?

No, the smaller blocksize is not necessary (I just forgot to remove that line after another test :). Both 1024 and 4096 blocksizes can reproduce this bug.

Thanks,
Zorro
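For convenience, here is the "even simpler" sequence from the comments above assembled into a self-contained script. This is a sketch, not the exact script the commenters ran: $DEV, $MNT, and the 0x58 prefill are taken from the comment #1 reproducer, and the final grep check is added glue:

---
#!/bin/bash
# Minimal end-to-end reproducer assembled from the fragments above.
# DEV/MNT and the 0x58 disk prefill follow the original reproducer;
# adjust for your environment.
DEV=/dev/pmem0p2
MNT=/mnt/scratch

umount $MNT
xfs_io -d -c "pwrite -S 0x58 0 16g" $DEV    # prefill disk so stale data is recognizable
mkfs.xfs -m reflink=1 $DEV
mount $DEV $MNT

DONOR1=$MNT/donor1
TARGET=$MNT/file

xfs_io -f -c "pwrite -S 0x72 0 4096" $DONOR1
xfs_io -f \
    -c "falloc -k 8192 4096" \
    -c "pwrite -S 0x57 65536 8192" \
    -c "fdatasync" \
    -c "reflink $DONOR1 0 131072 4096" \
    $TARGET

umount $MNT
mount $DEV $MNT

# Nothing above writes 0x58 into the file, so any 0x58 bytes in the
# target are stale disk contents.
hexdump -C $TARGET | grep -q '58 58 58 58' && echo "stale data exposed"
---

Per the analysis above, mounting with -o allocsize=4096 should make this script stop reporting stale data, consistent with the speculative-preallocation-past-EOF theory.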