Bug 200063

Summary: XFS corruption: mismatch between format (2) and size (0) in symlink
Product: File System Reporter: Zorro Lang (zlang)
Component: XFSAssignee: FileSystem/XFS Default Virtual Assignee (filesystem_xfs)
Status: NEW ---    
Severity: normal CC: sandeen
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.17 Subsystem:
Regression: No Bisected commit-id:

Description Zorro Lang 2018-06-13 08:20:05 UTC
I hit a corruption when I use "XFS_IOC_GOINGDOWN" ioctl to simulate powerfailure:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
mismatch between format (2) and size (0) in symlink ino 50358026
bad data fork in symlink 50358026
would have cleared inode 50358026
        - process newly discovered inodes...  
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list... 
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
mismatch between format (2) and size (0) in symlink ino 50358026
bad data fork in symlink 50358026
would have cleared inode 50358026
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Version-Release number of selected component (if applicable):
xfsprogs 4.16.1
kernel 4.17

How reproducible:
about 1%

Steps to Reproduce:
1. prepare a 30G local device
2. mkfs.xfs -f $dev
3. mount $dev $mnt
4. fs_mark  -d  $mnt  -n  10000  -N  10  -D  1000  -s  409600  -r  8  -t  10  -k &
5. -d $SCRATCH_MNT/ -n 1000 -p 100 -c -l 0 &
6. wait 1~15min randomly
7. xfs_io -xc "shutdown" $mnt
8. kill fs_mark and fssstress processes
9. umount $dev
10. mount $dev $mnt
11. umount $dev
12. xfs_repair -n $dev
13. loop run above 12 steps until reproduce this bug.

Additional info:
XFS info:
meta-data=/dev/mapper/xxxx-xfscratch isize=512    agcount=4, agsize=1966080 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=7864320, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=3840, version=2
         =                       sectsz=512   sunit=0 blks, lazy-co
Comment 1 Zorro Lang 2018-06-13 08:28:03 UTC
dmesg log:

[586904.373660] XFS (dm-3): Starting recovery (logdev: internal)
[586914.907824] XFS (dm-3): Metadata corruption detected at xfs_dinode_verify.part.5+0x27c/0x450 [xfs], inode 0x300670a dinode
[586914.920278] XFS (dm-3): Unmount and run xfs_repair
[586914.925734] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
[586914.933034] 00000000bfd2c617: 49 4e a1 ff 03 02 00 00 00 00 00 00 00 00 00 00  IN..............
[586914.942844] 000000008bc40e32: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[586914.952644] 0000000047723d40: 5b 20 a7 8a 1e be a2 95 5b 20 a7 8a 1e be a2 95  [ ......[ ......
[586914.962452] 00000000a7866534: 5b 20 a7 cc 19 76 5e 4a 00 00 00 00 00 00 00 00  [ ...v^J........
[586914.972261] 00000000d84c3ac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[586914.982058] 000000009e465a94: 00 00 00 02 00 00 00 00 00 00 00 00 07 43 52 07  .............CR.
[586914.991866] 000000006db3fac1: ff ff ff ff d2 03 50 9f 00 00 00 00 00 00 00 09  ......P.........
[586915.001675] 00000000c42b6120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[586915.011498] XFS (dm-3): Ending recovery (logdev: internal)
Comment 2 Eric Sandeen 2018-06-15 01:42:45 UTC
dchinner's looking at this one
Comment 3 Eric Sandeen 2018-06-15 02:02:35 UTC
[PATCH] xfs: symlinks can be zero length during log recovery
Comment 4 Zorro Lang 2018-11-12 13:31:26 UTC
I still can reproduce this bug on linux v4.20-rc1. So we haven't merged this patch (in comment 3), right?

Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - 14:10:29: scanning filesystem freespace - 16 of 16 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - 14:10:29: scanning agi unlinked lists - 16 of 16 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 15
        - agno = 0
        - agno = 1
mismatch between format (2) and size (0) in symlink ino 125888517
bad data fork in symlink 125888517
would have cleared inode 125888517
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - 14:10:32: process known inodes and inode discovery - 50048 of 50048 inodes done
        - process newly discovered inodes...
        - 14:10:32: process newly discovered inodes - 16 of 16 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 14:10:32: setting up duplicate extent list - 16 of 16 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 13
        - agno = 2
        - agno = 3
        - agno = 5
        - agno = 4
        - agno = 6
        - agno = 7
        - agno = 9
        - agno = 11
        - agno = 10
        - agno = 8
        - agno = 14
        - agno = 12
        - agno = 15
        - agno = 1
mismatch between format (2) and size (0) in symlink ino 125888517
bad data fork in symlink 125888517
would have cleared inode 125888517
        - 14:10:32: check for inodes claiming duplicate blocks - 50048 of 50048 inodes done
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
        - 14:10:33: verify and correct link counts - 16 of 16 allocation groups done
Comment 5 Dave Chinner 2018-11-12 22:08:22 UTC
On Mon, Nov 12, 2018 at 01:31:26PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=200063
> 
> --- Comment #4 from Zorro Lang (zlang@redhat.com) ---
> I still can reproduce this bug on linux v4.20-rc1. So we haven't merged this
> patch (in comment 3), right?

Patch was superceded, and I never got around to posting the new
version. It's still sitting in my stack, I'll try to get it out in
the next coupel of days.

Cheers,

Dave.