Bug 200063 - XFS corruption: mismatch between format (2) and size (0) in symlink
Summary: XFS corruption: mismatch between format (2) and size (0) in symlink
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: XFS (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: FileSystem/XFS Default Virtual Assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-13 08:20 UTC by Zorro Lang
Modified: 2018-11-12 22:08 UTC (History)
1 user (show)

See Also:
Kernel Version: 4.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Zorro Lang 2018-06-13 08:20:05 UTC
I hit a corruption when I use "XFS_IOC_GOINGDOWN" ioctl to simulate powerfailure:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
mismatch between format (2) and size (0) in symlink ino 50358026
bad data fork in symlink 50358026
would have cleared inode 50358026
        - process newly discovered inodes...  
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list... 
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
mismatch between format (2) and size (0) in symlink ino 50358026
bad data fork in symlink 50358026
would have cleared inode 50358026
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Version-Release number of selected component (if applicable):
xfsprogs 4.16.1
kernel 4.17

How reproducible:
about 1%

Steps to Reproduce:
1. prepare a 30G local device
2. mkfs.xfs -f $dev
3. mount $dev $mnt
4. fs_mark  -d  $mnt  -n  10000  -N  10  -D  1000  -s  409600  -r  8  -t  10  -k &
5. -d $SCRATCH_MNT/ -n 1000 -p 100 -c -l 0 &
6. wait 1~15min randomly
7. xfs_io -xc "shutdown" $mnt
8. kill fs_mark and fssstress processes
9. umount $dev
10. mount $dev $mnt
11. umount $dev
12. xfs_repair -n $dev
13. loop run above 12 steps until reproduce this bug.

Additional info:
XFS info:
meta-data=/dev/mapper/xxxx-xfscratch isize=512    agcount=4, agsize=1966080 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=7864320, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=3840, version=2
         =                       sectsz=512   sunit=0 blks, lazy-co
Comment 1 Zorro Lang 2018-06-13 08:28:03 UTC
dmesg log:

[586904.373660] XFS (dm-3): Starting recovery (logdev: internal)
[586914.907824] XFS (dm-3): Metadata corruption detected at xfs_dinode_verify.part.5+0x27c/0x450 [xfs], inode 0x300670a dinode
[586914.920278] XFS (dm-3): Unmount and run xfs_repair
[586914.925734] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
[586914.933034] 00000000bfd2c617: 49 4e a1 ff 03 02 00 00 00 00 00 00 00 00 00 00  IN..............
[586914.942844] 000000008bc40e32: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[586914.952644] 0000000047723d40: 5b 20 a7 8a 1e be a2 95 5b 20 a7 8a 1e be a2 95  [ ......[ ......
[586914.962452] 00000000a7866534: 5b 20 a7 cc 19 76 5e 4a 00 00 00 00 00 00 00 00  [ ...v^J........
[586914.972261] 00000000d84c3ac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[586914.982058] 000000009e465a94: 00 00 00 02 00 00 00 00 00 00 00 00 07 43 52 07  .............CR.
[586914.991866] 000000006db3fac1: ff ff ff ff d2 03 50 9f 00 00 00 00 00 00 00 09  ......P.........
[586915.001675] 00000000c42b6120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[586915.011498] XFS (dm-3): Ending recovery (logdev: internal)
Comment 2 Eric Sandeen 2018-06-15 01:42:45 UTC
dchinner's looking at this one
Comment 3 Eric Sandeen 2018-06-15 02:02:35 UTC
[PATCH] xfs: symlinks can be zero length during log recovery
Comment 4 Zorro Lang 2018-11-12 13:31:26 UTC
I still can reproduce this bug on linux v4.20-rc1. So we haven't merged this patch (in comment 3), right?

Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - 14:10:29: scanning filesystem freespace - 16 of 16 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - 14:10:29: scanning agi unlinked lists - 16 of 16 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 15
        - agno = 0
        - agno = 1
mismatch between format (2) and size (0) in symlink ino 125888517
bad data fork in symlink 125888517
would have cleared inode 125888517
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - 14:10:32: process known inodes and inode discovery - 50048 of 50048 inodes done
        - process newly discovered inodes...
        - 14:10:32: process newly discovered inodes - 16 of 16 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 14:10:32: setting up duplicate extent list - 16 of 16 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 13
        - agno = 2
        - agno = 3
        - agno = 5
        - agno = 4
        - agno = 6
        - agno = 7
        - agno = 9
        - agno = 11
        - agno = 10
        - agno = 8
        - agno = 14
        - agno = 12
        - agno = 15
        - agno = 1
mismatch between format (2) and size (0) in symlink ino 125888517
bad data fork in symlink 125888517
would have cleared inode 125888517
        - 14:10:32: check for inodes claiming duplicate blocks - 50048 of 50048 inodes done
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
        - 14:10:33: verify and correct link counts - 16 of 16 allocation groups done
Comment 5 Dave Chinner 2018-11-12 22:08:22 UTC
On Mon, Nov 12, 2018 at 01:31:26PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=200063
> 
> --- Comment #4 from Zorro Lang (zlang@redhat.com) ---
> I still can reproduce this bug on linux v4.20-rc1. So we haven't merged this
> patch (in comment 3), right?

Patch was superceded, and I never got around to posting the new
version. It's still sitting in my stack, I'll try to get it out in
the next coupel of days.

Cheers,

Dave.

Note You need to log in before you can comment on or make changes to this bug.