Bug 201897

Summary: [xfstests generic/482]: filesystem corruption after log replay
Product: File System
Reporter: Zorro Lang (zlang)
Component: XFS
Assignee: FileSystem/XFS Default Virtual Assignee (filesystem_xfs)
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: v4.20-rc4 and xfs-4.20-fixes-3
Subsystem:
Regression: No
Bisected commit-id:
Attachments: corrupted xfs metadump on v4.20-rc4
             generic 482.full

Description Zorro Lang 2018-12-05 08:40:35 UTC
generic/482 fails on both v4.20-rc4 and xfs-4.20-fixes-3, but the two report different corruption (xfs_repair -n output):

# ./check generic/482
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 hp-dl380pg8-01 4.20.0-rc4+
MKFS_OPTIONS  -- -f -m reflink=1,rmapbt=1 /dev/mapper/rhel_hp--dl380pg8--01-xfscratch
MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/mapper/rhel_hp--dl380pg8--01-xfscratch /mnt/xfstests/mnt2                                                                         

generic/482     [failed, exit status 1]- output mismatch (see /var/lib/xfstests/results//generic/482.out.bad)                                                                                 
    --- tests/generic/482.out   2018-12-05 02:29:59.736699716 -0500
    +++ /var/lib/xfstests/results//generic/482.out.bad  2018-12-05 03:10:34.985268665 -0500
    @@ -1,2 +1,3 @@
     QA output created by 482
    -Silence is golden
    +_check_xfs_filesystem: filesystem on /dev/mapper/rhel_hp--dl380pg8--01-xfscratch is inconsistent (r)                                                                                     
    +(see /var/lib/xfstests/results//generic/482.full for details)
    ...
    (Run 'diff -u /var/lib/xfstests/tests/generic/482.out /var/lib/xfstests/results//generic/482.out.bad'  to see the entire diff)                                                            
Ran: generic/482
Failures: generic/482
Failed 1 of 1 tests

# cat results/generic/482.full
_check_xfs_filesystem: filesystem on /dev/mapper/rhel_hp--dl380pg8--01-xfscratch is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
unknown block state, ag 1, block 2066
unknown block state, ag 1, block 2126
unknown block state, ag 1, block 2459
...
unknown block state, ag 1, block 7780
unknown block state, ag 1, block 8189
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
Missing reverse-mapping record for (1/1007) len 16 owner 805306519 off 348
Missing reverse-mapping record for (1/1018) len 5 owner 134 off 118
Missing reverse-mapping record for (1/1023) len 14 owner 163 off 810
...
Missing reverse-mapping record for (1/3172) len 25 owner 268435634 off 54
Missing reverse-mapping record for (1/3461) len 31 owner 138 off 783
Incorrect reverse-mapping: saw (1/4066) len 23 owner 170 off 692; should be (1/4066) len 28 owner 170 off 692
Incorrect reverse-mapping: saw (1/4094) unwritten len 16 owner 268435610 off 112; should be (1/4094) unwritten len 212 owner 268435610 off 112
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1:6336) is ahead of log (1:1963).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.
*** end xfs_repair output
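In the v4.20-rc4 output above, every visible "unknown block state" and reverse-mapping complaint points at AG 1. When comparing runs, a quick tally makes that pattern easy to check. The following is a minimal Python sketch that assumes only the message formats visible in this log: `(agno/agbno)` tuples, read here as (AG number/AG block), optionally followed by flags such as `unwritten` before `len`. The `summarize` helper is hypothetical and is not part of xfsprogs or xfstests.

```python
import re
from collections import Counter

# "unknown block state, ag 1, block 2066"
UNKNOWN_RE = re.compile(r"unknown block state, ag (\d+), block (\d+)")
# "(agno/agbno) [flags ...] len N owner O", as it appears in the
# "Missing reverse-mapping record" and "Incorrect reverse-mapping" lines.
RMAP_RE = re.compile(r"\((\d+)/(\d+)\) (?:[a-z]+ )*len (\d+) owner (-?\d+)")


def summarize(lines):
    """Hypothetical helper: tally xfs_repair -n complaints per AG."""
    unknown = Counter()  # AG -> count of "unknown block state" lines
    missing = Counter()  # AG -> count of missing rmap records
    mismatches = []      # (agno, agbno, seen_len, expected_len)
    for line in lines:
        m = UNKNOWN_RE.search(line)
        if m:
            unknown[int(m.group(1))] += 1
        elif line.startswith("Missing reverse-mapping record"):
            m = RMAP_RE.search(line)
            if m:
                missing[int(m.group(1))] += 1
        elif line.startswith("Incorrect reverse-mapping"):
            # The first tuple is what repair saw; the second is what it expected.
            recs = RMAP_RE.findall(line)
            if len(recs) == 2:
                (ag, bno, seen, _), (_, _, want, _) = recs
                mismatches.append((int(ag), int(bno), int(seen), int(want)))
    return unknown, missing, mismatches


sample = [
    "unknown block state, ag 1, block 2066",
    "unknown block state, ag 1, block 8189",
    "Missing reverse-mapping record for (1/1007) len 16 owner 805306519 off 348",
    "Incorrect reverse-mapping: saw (1/4066) len 23 owner 170 off 692; "
    "should be (1/4066) len 28 owner 170 off 692",
]
print(summarize(sample))
# (Counter({1: 2}), Counter({1: 1}), [(1, 4066, 23, 28)])
```

Feeding it the lines of a full 482.full file (any iterable of strings works) gives a per-AG count that can be compared directly between kernels.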

However, on 4.20-rc1 with xfs-4.20-fixes-3, I got a different corruption report:
# cat results//generic/482.full
....
....
_check_xfs_filesystem: filesystem on /dev/mapper/rhel_ibm--x3650m4--10-xfscratch is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad symlink header ino 201326746, file block 0, disk block 25165839
problem with symbolic link in inode 201326746
would have cleared inode 201326746
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
entry "l21" in shortform directory 201326726 references free inode 201326746
would have junked entry "l21" in directory inode 201326726
bad symlink header ino 201326746, file block 0, disk block 25165839
problem with symbolic link in inode 201326746
would have cleared inode 201326746
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
entry "l21" in shortform directory inode 201326726 points to free inode 201326746
would junk entry
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1:2865) is ahead of log (1:681).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.
*** end xfs_repair output
....
....
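Each xfs_repair run in this report also prints a line such as "Maximum metadata LSN (1:2865) is ahead of log (1:681)", where each LSN is shown as cycle:block. The following is a minimal Python sketch of that comparison. It assumes the XFS convention of packing the cycle into the upper 32 bits and the block into the lower 32 bits, as the kernel's CYCLE_LSN/BLOCK_LSN macros do. Under that assumption, ordering by the packed value is the same as ordering by cycle first, then block. The helper names are hypothetical.

```python
import re

# Matches e.g. "Maximum metadata LSN (1:2865) is ahead of log (1:681)."
LSN_LINE_RE = re.compile(
    r"Maximum metadata LSN \((\d+):(\d+)\) is ahead of log \((\d+):(\d+)\)"
)


def pack_lsn(cycle, block):
    """Assumed XFS layout: cycle in the high 32 bits, block in the low 32."""
    return (cycle << 32) | (block & 0xFFFFFFFF)


def unpack_lsn(lsn):
    """Split a packed LSN back into (cycle, block)."""
    return lsn >> 32, lsn & 0xFFFFFFFF


def parse_lsn_line(line):
    """Return (metadata_lsn, log_lsn) as packed ints, or None if no match."""
    m = LSN_LINE_RE.search(line)
    if m is None:
        return None
    c1, b1, c2, b2 = map(int, m.groups())
    return pack_lsn(c1, b1), pack_lsn(c2, b2)


meta, log = parse_lsn_line("Maximum metadata LSN (1:2865) is ahead of log (1:681).")
print(meta > log, unpack_lsn(meta), unpack_lsn(log))
# True (1, 2865) (1, 681)
```

Because the cycle occupies the high bits, an LSN from a later cycle compares as newer than any block number from an earlier cycle.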
Comment 1 Zorro Lang 2018-12-05 08:43:38 UTC
Created attachment 279863
corrupted xfs metadump on v4.20-rc4
Comment 2 Zorro Lang 2018-12-05 08:44:29 UTC
Created attachment 279865
generic 482.full
Comment 3 Zorro Lang 2018-12-05 09:07:23 UTC
The test passes on ext4 (the 4.20.0-rc1 kernel below includes xfs-4.20-fixes-3):

# ./check generic/482
FSTYP         -- ext4
PLATFORM      -- Linux/x86_64 ibm-x3650m4-10 4.20.0-rc1
MKFS_OPTIONS  -- /dev/mapper/rhel_ibm--x3650m4--10-xfscratch
MOUNT_OPTIONS -- -o acl,user_xattr -o context=system_u:object_r:root_t:s0 /dev/mapper/rhel_ibm--x3650m4--10-xfscratch /mnt/scratch                                                            

generic/482      1567s
Ran: generic/482
Passed all 1 tests
Comment 4 Zorro Lang 2018-12-05 10:02:43 UTC
The test passes on V5 XFS without the reflink feature:
# ./check generic/482
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 ibm-x3650m4-10 4.20.0-rc1
MKFS_OPTIONS  -- -f -m reflink=0,rmapbt=0,finobt=0,crc=1 -i sparse=0 /dev/mapper/rhel_ibm--x3650m4--10-xfscratch
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/rhel_ibm--x3650m4--10-xfscratch /mnt/scratch

generic/482 1567s ...  774s
Ran: generic/482
Passed all 1 tests

# ./check generic/482
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 ibm-x3650m4-10 4.20.0-rc1
MKFS_OPTIONS  -- -f -m reflink=0,rmapbt=0,finobt=1,crc=1 -i sparse=1 /dev/mapper/rhel_ibm--x3650m4--10-xfscratch                                                                              
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/rhel_ibm--x3650m4--10-xfscratch /mnt/scratch                                                                              

generic/482 774s ...  841s
Ran: generic/482
Passed all 1 tests
Comment 5 Zorro Lang 2018-12-05 10:10:34 UTC
(In reply to Zorro Lang from comment #4)
> V5 XFS without reflink feature test pass:

Hmm... maybe I said that too early :/ Re-running the same finobt=1, sparse=1 configuration failed (the 4.20-rc1 kernel below includes xfs-4.20-fixes-3):

FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 ibm-x3650m4-10 4.20.0-rc1
MKFS_OPTIONS  -- -f -m reflink=0,rmapbt=0,finobt=1,crc=1 -i sparse=1 /dev/mapper/rhel_ibm--x3650m4--10-xfscratch
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/rhel_ibm--x3650m4--10-xfscratch /mnt/scratch

generic/482 841s ... [failed, exit status 1]- output mismatch (see /root/git/xfstests-dev/results//generic/482.out.bad)
    --- tests/generic/482.out   2018-11-25 23:54:50.488528619 -0500
    +++ /root/git/xfstests-dev/results//generic/482.out.bad     2018-12-05 05:07:01.060133234 -0500
    @@ -1,2 +1,3 @@
     QA output created by 482
    -Silence is golden
    +_check_xfs_filesystem: filesystem on /dev/mapper/rhel_ibm--x3650m4--10-xfscratch is inconsistent (r)
    +(see /root/git/xfstests-dev/results//generic/482.full for details)
    ...
    (Run 'diff -u /root/git/xfstests-dev/tests/generic/482.out /root/git/xfstests-dev/results//generic/482.out.bad'  to see the entire diff)
Ran: generic/482
Failures: generic/482
Failed 1 of 1 tests

# cat results/generic/482.full
# ./ltp/fsstress -w -d /mnt/scratch -n 512 -p 8
seed = 1543732939
mark mkfs has entry number 235
next fua is entry number 239
=== replay to 239 ===
next fua is entry number 244
=== replay to 244 ===
next fua is entry number 247
=== replay to 247 ===
next fua is entry number 259
=== replay to 259 ===
next fua is entry number 283
=== replay to 283 ===
next fua is entry number 284
=== replay to 284 ===
next fua is entry number 289
=== replay to 289 ===
next fua is entry number 309
=== replay to 309 ===
next fua is entry number 319
=== replay to 319 ===
_check_xfs_filesystem: filesystem on /dev/mapper/rhel_ibm--x3650m4--10-xfscratch is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
bad symlink header ino 134304910, file block 0, disk block 16788111
problem with symbolic link in inode 134304910
would have cleared inode 134304910
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
entry "l6" in shortform directory 134304896 references free inode 134304910
        - agno = 3
would have junked entry "l6" in directory inode 134304896
bad symlink header ino 134304910, file block 0, disk block 16788111
problem with symbolic link in inode 134304910
would have cleared inode 134304910
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
entry "l6" in shortform directory inode 134304896 points to free inode 134304910
would junk entry
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (1:962) is ahead of log (1:141).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.
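The 482.full log in this comment records the replay sequence: the entry number of the mkfs mark, then each FUA entry that the log was replayed up to. The following is a minimal Python sketch that extracts those replay points. It assumes only the message formats shown above, and the `replay_points` helper is hypothetical (not part of xfstests). It makes it easy to see which replay point immediately preceded the first inconsistency report; in this run, that was entry 319.

```python
import re

MARK_RE = re.compile(r"mark (\S+) has entry number (\d+)")
REPLAY_RE = re.compile(r"=== replay to (\d+) ===")


def replay_points(lines):
    """Hypothetical helper: parse replay progress from a generic/482 .full log.

    Returns (mkfs_mark_entry, replayed_entries, last_entry_before_failure).
    The last value is None if no "is inconsistent" line was seen.
    """
    mark = None
    replayed = []
    failed_after = None
    for line in lines:
        m = MARK_RE.search(line)
        if m:
            mark = int(m.group(2))
            continue
        m = REPLAY_RE.search(line)
        if m:
            replayed.append(int(m.group(1)))
            continue
        if "is inconsistent" in line and failed_after is None:
            failed_after = replayed[-1] if replayed else mark
    return mark, replayed, failed_after


sample = ["mark mkfs has entry number 235"]
sample += [f"=== replay to {n} ===" for n in (239, 244, 247, 259, 283, 284, 289, 309, 319)]
sample.append("_check_xfs_filesystem: filesystem on "
              "/dev/mapper/rhel_ibm--x3650m4--10-xfscratch is inconsistent (r)")
print(replay_points(sample))
# (235, [239, 244, 247, 259, 283, 284, 289, 309, 319], 319)
```

The function accepts any iterable of lines, so an opened results/generic/482.full file can be passed to it directly.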