Bug 201259 - [xfstests shared/010]: maybe pagecache contents is mutated after cycle mount
Summary: [xfstests shared/010]: maybe pagecache contents is mutated after cycle mount
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: XFS (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: FileSystem/XFS Default Virtual Assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-28 06:06 UTC by Zorro Lang
Modified: 2018-11-06 03:14 UTC (History)
1 user (show)

See Also:
Kernel Version: v4.19-rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Zorro Lang 2018-09-28 06:06:53 UTC
Recently, shared/010 sometimes hit below issue:

QA output created by 010
/mnt/scratch/dir/p0/deXXXXXXXXXXX/d218X/d3a0XXX/d3da/d90XXXXXXXXXXXXX/d1c8XXXXXXXXX/d71/d75XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/da6XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/db9XXXX/de0XXXXXXXXXXXXX/d7e0XXXXXXXXXXXX/d1d9X/f476XXXX: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
Silence is golden

Looks like something changed after cycle mount, cause this integrity issue.

Version-Release number of selected component (if applicable):
kernel v4.19-rc5

How reproducible:
10~30%

Steps to Reproduce:
1) git clone https://github.com/markfasheh/duperemove.git
2) cd duperemove; make; make install
3) Run shared/010 on XFS several times, with reflink=1:

Actual results:
checksum mismatch

Expected results:
test always pass

Additional info:
I can't be sure this's a real bug, maybe it's case issue. But I don't know what's wrong with the case, so report this bug to get more reviewing/suggestions from XFS developers. Thanks.
Comment 1 Eric Sandeen 2018-10-02 13:53:37 UTC
In trying to get to a more targeted reproducer, this script clearly exposes stale data, though I'm not seeing an md5sum change across the remount.  The operations below are a subset of an fsstress sequence that caused a failure in shared/010 for me.

---
#!/bin/bash

DEV=/dev/pmem0p2
MNT=/mnt/scratch

umount $MNT
xfs_io -d -c "pwrite -S 0x58 0 16g" $DEV
mkfs.xfs -m reflink=1 $DEV
mount $DEV $MNT

DONOR1=$MNT/donor1
DONOR2=$MNT/donor2
TARGET=$MNT/file

rm -f $DONOR1 $DONOR2 $TARGET

xfs_io -f -c "pwrite -S 0x72 0 1772544" $DONOR1
xfs_io -f -c "pwrite -S 0x52 0 840263" $DONOR2

touch $TARGET

xfs_io -f \
        -c "falloc -k 624393 840278"    \
        -c "reflink $DONOR1 925696 212992 57344" \
        -c "pwrite -S 0x57 1662464 74240" \
        -c "fdatasync" \
        -c "reflink $DONOR2 585728 5357568 122880" \
                $TARGET

md5sum $TARGET
umount $MNT
mount $DEV $MNT
md5sum $TARGET
xfs_bmap -vvp $TARGET
hexdump -C $TARGET
---

95580da3cda8eb09c23c8deefb347d77  /mnt/scratch/file
95580da3cda8eb09c23c8deefb347d77  /mnt/scratch/file
/mnt/scratch/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..415]:        hole                                   416
   1: [416..527]:      3648..3759        0 (3648..3759)       112 100000 (first reflink)
   2: [528..1215]:     hole                                   688
   3: [1216..2863]:    192..1839         0 (192..1839)       1648 010000 (fallocated)
   4: [2864..3239]:    hole                                   376
   5: [3240..3455]:    5304..5519        0 (5304..5519)       216 000000 (pwrite)
   6: [3456..10463]:   hole                                  7008
   7: [10464..10703]:  6664..6903        0 (6664..6903)       240 100000 (2nd reflink)


00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................| (hole)
*
00034000  72 72 72 72 72 72 72 72  72 72 72 72 72 72 72 72  |rrrrrrrrrrrrrrrr| (first reflink)
*
00042000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................| (hole + fallocated)
*
00195e00  57 57 57 57 57 57 57 57  57 57 57 57 57 57 57 57  |WWWWWWWWWWWWWWWW| (pwrite)
*
001a8000  58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58  |XXXXXXXXXXXXXXXX| (stale 32k)
*
001b0000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................| (hole)
*
0051c000  52 52 52 52 52 52 52 52  52 52 52 52 52 52 52 52  |RRRRRRRRRRRRRRRR| (2nd reflink)
*
0053a000


the "XXXXXXX" is exposed stale data (no pwrite in the script wrote 'X', that is the disk pattern)
The pattern starts right after the end of the pwrite call of the script.  There also seems to be a reflinked extent prior to the pwrite.
Comment 2 Eric Sandeen 2018-10-02 13:54:35 UTC
Hrm sorry about the wrap.  Also:

> There also seems to be a reflinked extent prior to the pwrite.

That should say "there is an fallocated extent prior to the pwrite"
Comment 3 Eric Sandeen 2018-10-02 16:24:09 UTC
Here's a slightly simpler test on smaller, block boundaries:

xfs_io -f -c "pwrite -S 0x72 0 4096" $DONOR1
xfs_io -f -c "pwrite -S 0x52 0 4096" $DONOR2

xfs_io -f \
        -c "falloc -k 8192 4096"        \
        -c "reflink $DONOR1 0 4096 4096" \
        -c "pwrite -S 0x57 65536 8192" \
        -c "fdatasync" \
        -c "reflink $DONOR2 0 131072 4096" \
                $TARGET

By block, it does these operations, all within 32 blocks:

          11111111112222222222333
012345678901234567890123456789012
   P
 R
                WW
                                R
P = preallocate
R = reflink
W = write

What we should get out is:

HRPHHHHHHHHHHHHHWWHHHHHHHHHHHHHHR

H = Hole

What we do get out is:

HRPHHHHHHHHHHHHHWWSSSSSSSSSSSSSSR

S = Stale

The extents:

 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..7]:          hole                                     8
   1: [8..15]:         112..119          0 (112..119)           8 100000
   2: [16..23]:        104..111          0 (104..111)           8 010000
   3: [24..127]:       hole                                   104
   4: [128..255]:      216..343          0 (216..343)         128 000000
   5: [256..263]:      120..127          0 (120..127)           8 100000

are lined up as (chars represent hole/reflink/prealloc/written/stale)

 H  R  P  HHHHHHHHHHHHH  WWSSSSSSSSSSSSSS  R
[E][E][E][      E      ][        E       ][E]

So the 5th extent is what has exposed the stale data, with written data only in the first 2 blocks and another 14 blocks exposed up to the next reflinked block.
Comment 4 Eric Sandeen 2018-10-02 16:28:51 UTC
If I change the test to mount with -o allocsize=4096, the problem goes away, so it would appear that this has to do with speculative preallocation past EOF.
Comment 5 Eric Sandeen 2018-10-02 16:37:14 UTC
Even simpler, 

xfs_io -f -c "pwrite -S 0x72 0 4096" $DONOR1

xfs_io -f \
        -c "falloc -k 8192 4096"        \
        -c "pwrite -S 0x57 65536 8192" \
        -c "fdatasync" \
        -c "reflink $DONOR1 0 131072 4096" \
                $TARGET

so this seems to be an extending reflink extending past/into speculative preallocation which exposes the stale blocks.
Comment 6 Zorro Lang 2018-11-03 03:05:06 UTC
After upstream merged the patch of this bug, I still can reproduce a shared/010 failure:

FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 xxxxxxxx 4.19.0+
MKFS_OPTIONS  -- -f -m reflink=1 -b size=1024 /dev/mapper/xxxxxxxx-xfscratch
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/xxxxxxxxx-xfscratch /mnt/scratch
 
shared/010 160s ... - output mismatch (see /home/xfstests-dev/results//shared/010.out.bad)
    --- tests/shared/010.out    2018-10-16 23:31:53.924269141 -0400
    +++ /home/xfstests-dev/results//shared/010.out.bad  2018-11-02 12:20:39.858510419 -0400
    @@ -1,2 +1,4 @@
     QA output created by 010
    +/mnt/scratch/dir/p0/da/d51XX/f6dXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: FAILED
    +md5sum: WARNING: 1 computed checksum did NOT match
     Silence is golden
    ...
    (Run 'diff -u tests/shared/010.out /home/xfstests-dev/results//shared/010.out.bad'  to see the entire diff)
Ran: shared/010
Failures: shared/010
Failed 1 of 1 tests

But maybe it's a new issue, due to I can't reproduce this bug by the reproducer on comment#1:

# bash reproducer.sh
umount: /mnt/scratch: not mounted.
wrote 17179869184/17179869184 bytes at offset 0
16.000 GiB, 4194304 ops; 0:16:47.88 (16.256 MiB/sec and 4161.4915 ops/sec)
meta-data=/dev/mapper/xxxxxxxx-xfscratch isize=512    agcount=16, agsize=8192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=131072000, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=64000, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
wrote 1772544/1772544 bytes at offset 0
2 MiB, 433 ops; 0.0170 sec (99.111 MiB/sec and 25386.9606 ops/sec)
wrote 840263/840263 bytes at offset 0
821 KiB, 206 ops; 0.0083 sec (96.419 MiB/sec and 24786.4276 ops/sec)
linked 57344/57344 bytes at offset 212992
56 KiB, 1 ops; 0.0030 sec (17.698 MiB/sec and 323.6246 ops/sec)
wrote 74240/74240 bytes at offset 1662464
72 KiB, 19 ops; 0.0010 sec (65.984 MiB/sec and 17707.3625 ops/sec)
linked 122880/122880 bytes at offset 5357568
120 KiB, 1 ops; 0.0037 sec (31.477 MiB/sec and 268.6006 ops/sec)
a42583dd3f7edb0c00e7356c89d3e58c  /mnt/scratch/file
a42583dd3f7edb0c00e7356c89d3e58c  /mnt/scratch/file
/mnt/scratch/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..415]:        hole                                   416
   1: [416..527]:      4880..4991        0 (4880..4991)       112 101111
   2: [528..1215]:     hole                                   688
   3: [1216..2863]:    1024..2671        0 (1024..2671)      1648 010101
   4: [2864..3239]:    hole                                   376
   5: [3240..3583]:    2672..3015        0 (2672..3015)       344 001111
   6: [3584..10463]:   hole                                  6880
   7: [10464..10703]:  7800..8039        0 (7800..8039)       240 101111
 FLAG Values:
    0100000 Shared extent
    0010000 Unwritten preallocated extent
    0001000 Doesn't begin on stripe unit
    0000100 Doesn't end   on stripe unit
    0000010 Doesn't begin on stripe width
    0000001 Doesn't end   on stripe width
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00034000  72 72 72 72 72 72 72 72  72 72 72 72 72 72 72 72  |rrrrrrrrrrrrrrrr|
*
00042000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00195e00  57 57 57 57 57 57 57 57  57 57 57 57 57 57 57 57  |WWWWWWWWWWWWWWWW|
*
001a8000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0051c000  52 52 52 52 52 52 52 52  52 52 52 52 52 52 52 52  |RRRRRRRRRRRRRRRR|
*
0053a000
Comment 7 Zorro Lang 2018-11-03 03:11:15 UTC
(In reply to Zorro Lang from comment #6)
> After upstream merged the patch of this bug, I still can reproduce a
> shared/010 failure:

It's not duperemove related, I've removed all duperemove related part from shared/010, still can reproduce this bug. On the other word, use fsstress and md5sum can reproduce this bug.

But if set 'fsstress -f deduperange=0', can't reproduce this bug (at least I can't reproduce it until now by loop running shared/010 200 times).

Hmm... maybe I should report a new bug, if this's a different bug with this one.

Thanks,
Zorro

> 
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/x86_64 xxxxxxxx 4.19.0+
> MKFS_OPTIONS  -- -f -m reflink=1 -b size=1024 /dev/mapper/xxxxxxxx-xfscratch
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0
> /dev/mapper/xxxxxxxxx-xfscratch /mnt/scratch
>  
> shared/010 160s ... - output mismatch (see
> /home/xfstests-dev/results//shared/010.out.bad)
>     --- tests/shared/010.out    2018-10-16 23:31:53.924269141 -0400
>     +++ /home/xfstests-dev/results//shared/010.out.bad  2018-11-02
> 12:20:39.858510419 -0400
>     @@ -1,2 +1,4 @@
>      QA output created by 010
>     +/mnt/scratch/dir/p0/da/d51XX/f6dXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:
> FAILED
>     +md5sum: WARNING: 1 computed checksum did NOT match
>      Silence is golden
>     ...
>     (Run 'diff -u tests/shared/010.out
> /home/xfstests-dev/results//shared/010.out.bad'  to see the entire diff)
> Ran: shared/010
> Failures: shared/010
> Failed 1 of 1 tests
> 
> But maybe it's a new issue, due to I can't reproduce this bug by the
> reproducer on comment#1:
> 
> # bash reproducer.sh
> umount: /mnt/scratch: not mounted.
> wrote 17179869184/17179869184 bytes at offset 0
> 16.000 GiB, 4194304 ops; 0:16:47.88 (16.256 MiB/sec and 4161.4915 ops/sec)
> meta-data=/dev/mapper/xxxxxxxx-xfscratch isize=512    agcount=16,
> agsize=8192000 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=0
>          =                       reflink=1
> data     =                       bsize=4096   blocks=131072000, imaxpct=25
>          =                       sunit=64     swidth=64 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=64000, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> wrote 1772544/1772544 bytes at offset 0
> 2 MiB, 433 ops; 0.0170 sec (99.111 MiB/sec and 25386.9606 ops/sec)
> wrote 840263/840263 bytes at offset 0
> 821 KiB, 206 ops; 0.0083 sec (96.419 MiB/sec and 24786.4276 ops/sec)
> linked 57344/57344 bytes at offset 212992
> 56 KiB, 1 ops; 0.0030 sec (17.698 MiB/sec and 323.6246 ops/sec)
> wrote 74240/74240 bytes at offset 1662464
> 72 KiB, 19 ops; 0.0010 sec (65.984 MiB/sec and 17707.3625 ops/sec)
> linked 122880/122880 bytes at offset 5357568
> 120 KiB, 1 ops; 0.0037 sec (31.477 MiB/sec and 268.6006 ops/sec)
> a42583dd3f7edb0c00e7356c89d3e58c  /mnt/scratch/file
> a42583dd3f7edb0c00e7356c89d3e58c  /mnt/scratch/file
> /mnt/scratch/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..415]:        hole                                   416
>    1: [416..527]:      4880..4991        0 (4880..4991)       112 101111
>    2: [528..1215]:     hole                                   688
>    3: [1216..2863]:    1024..2671        0 (1024..2671)      1648 010101
>    4: [2864..3239]:    hole                                   376
>    5: [3240..3583]:    2672..3015        0 (2672..3015)       344 001111
>    6: [3584..10463]:   hole                                  6880
>    7: [10464..10703]:  7800..8039        0 (7800..8039)       240 101111
>  FLAG Values:
>     0100000 Shared extent
>     0010000 Unwritten preallocated extent
>     0001000 Doesn't begin on stripe unit
>     0000100 Doesn't end   on stripe unit
>     0000010 Doesn't begin on stripe width
>     0000001 Doesn't end   on stripe width
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
> |................|
> *
> 00034000  72 72 72 72 72 72 72 72  72 72 72 72 72 72 72 72 
> |rrrrrrrrrrrrrrrr|
> *
> 00042000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
> |................|
> *
> 00195e00  57 57 57 57 57 57 57 57  57 57 57 57 57 57 57 57 
> |WWWWWWWWWWWWWWWW|
> *
> 001a8000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
> |................|
> *
> 0051c000  52 52 52 52 52 52 52 52  52 52 52 52 52 52 52 52 
> |RRRRRRRRRRRRRRRR|
> *
> 0053a000
Comment 8 Eric Sandeen 2018-11-05 22:44:56 UTC
Zorro, I see that your new report is using a smaller blocksize (-b 1024) - was that used in the initial report?
Comment 9 Zorro Lang 2018-11-06 03:14:01 UTC
(In reply to Eric Sandeen from comment #8)
> Zorro, I see that your new report is using a smaller blocksize (-b 1024) -
> was that used in the initial report?

No, smaller blocksize is not necessary (I just forgot remove that line after another testing:). Both 1024 or 4096 blocksize can reproduce this bug.

Thanks,
Zorro

Note You need to log in before you can comment on or make changes to this bug.