Bug 208325

Summary: f2fs inconsistent node block
Product: File System
Reporter: zKqri0
Component: f2fs
Assignee: Default virtual assignee for f2fs (filesystem_f2fs)
Status: NEEDINFO
Severity: normal
CC: chao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.7.2-arch1-1
Subsystem:
Regression: No
Bisected commit-id:

Description zKqri0 2020-06-26 10:22:46 UTC
I was working with an Android kernel tree, and at some point builds started failing with errors like "make[1]: stat: drivers/input/touchscreen/Kconfig: Invalid argument". I checked my laptop's dmesg and found a bunch of errors like "F2FS-fs (sda2): inconsistent node block, nid:1761978, node_footer[nid:540703074,ino:1937074503,ofs:212733452,cpver:7020105085017473824,blkaddr:1008758642]".

I rebooted, and the problem still occurs whenever I try to build that kernel tree. I could probably just delete the tree and clone it again, but I kept it in case it helps with finding what caused this bug. It's probably not caused by SSD failure: I didn't get any SATA errors, and I have never had files corrupted, either before or after this happened.
Comment 1 Chao Yu 2020-06-28 12:44:25 UTC
Hi, thanks for the report.

What are your mkfs/mount options?

I have no idea whether this is an f2fs bug or not. Since you said the device can be trusted, it is most likely a software bug.

One case I can imagine is an app bypassing the filesystem and writing data directly via LBA, which could corrupt data.

If possible, could you please apply the three patches below and recompile the kernel?

https://lore.kernel.org/linux-f2fs-devel/20200628122940.29665-1-yuchao0@huawei.com/T/#t

[f2fs-dev] [PATCH 1/3] f2fs: fix wrong return value of f2fs_bmap_compress()
[f2fs-dev] [PATCH 2/3] f2fs: support to trace f2fs_bmap()
[f2fs-dev] [PATCH 3/3] f2fs: support to trace f2fs_fiemap()

Then use the commands below to see whether any apps are looking up LBAs:

echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_bmap/enable
echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_fiemap/enable
cat /sys/kernel/debug/tracing/trace_pipe |grep f2fs
Comment 2 zKqri0 2020-06-29 06:17:37 UTC
(In reply to Chao Yu from comment #1)
> Hi, thanks for the report.
> 
> What are your mkfs/mount options?
> 
> I have no idea whether this is an f2fs bug or not. Since you said the
> device can be trusted, it is most likely a software bug.
> 
> One case I can imagine is an app bypassing the filesystem and writing data
> directly via LBA, which could corrupt data.
> 
> If possible, could you please apply the three patches below and recompile
> the kernel?
> 
> https://lore.kernel.org/linux-f2fs-devel/20200628122940.29665-1-yuchao0@huawei.com/T/#t
> 
> [f2fs-dev] [PATCH 1/3] f2fs: fix wrong return value of f2fs_bmap_compress()
> [f2fs-dev] [PATCH 2/3] f2fs: support to trace f2fs_bmap()
> [f2fs-dev] [PATCH 3/3] f2fs: support to trace f2fs_fiemap()
> 
> Then use the commands below to see whether any apps are looking up LBAs:
> 
> echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_bmap/enable
> echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_fiemap/enable
> cat /sys/kernel/debug/tracing/trace_pipe |grep f2fs

Mount options are "/dev/sda2 on / type f2fs (rw,relatime,lazytime,background_gc=on,discard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,fsync_mode=posix)".

I patched my laptop's kernel with those patches, but I don't see anything in "trace_pipe" while I'm getting the invalid argument errors. Also, I noticed that the "nid" and "node_footer" are always the same in the error, so it's only one node block that's messed up.

Maybe a raw copy of that node block will help find what caused it?
Comment 3 Chao Yu 2020-06-30 14:08:37 UTC
(In reply to zKqri0 from comment #2)
> Mount options are "/dev/sda2 on / type f2fs
> (rw,relatime,lazytime,background_gc=on,discard,no_heap,user_xattr,
> inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,
> mode=adaptive,active_logs=6,alloc_mode=default,fsync_mode=posix)".

It looks like these are the default mount options.

Did you use any special mkfs options, like -O [feature_name]?

> 
> I patched my laptop's kernel with those patches, but I don't see anything in
> "trace_pipe" while I'm getting the invalid argument errors. Also, I noticed
> that the "nid" and "node_footer" are always the same in the error, so it's
> only one node block that's messed up.
> 
> Maybe a raw copy of that node block will help find what caused it?

Yes, please, I can parse it with dentry_block, inode or dnode structure to see what it looks like, and what kind of fields are corrupted.
Comment 4 zKqri0 2020-07-01 05:01:03 UTC
(In reply to Chao Yu from comment #3)
> (In reply to zKqri0 from comment #2)
> > Mount options are "/dev/sda2 on / type f2fs
> > (rw,relatime,lazytime,background_gc=on,discard,no_heap,user_xattr,
> > inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,
> > mode=adaptive,active_logs=6,alloc_mode=default,fsync_mode=posix)".
> 
> It looks like these are the default mount options.
> 
> Did you use any special mkfs options, like -O [feature_name]?
> 
> > 
> > I patched my laptop's kernel with those patches, but I don't see anything in
> > "trace_pipe" while I'm getting the invalid argument errors. Also, I noticed
> > that the "nid" and "node_footer" are always the same in the error, so it's
> > only one node block that's messed up.
> > 
> > Maybe a raw copy of that node block will help find what caused it?
> 
> Yes, please, I can parse it with dentry_block, inode or dnode structure to
> see what it looks like, and what kind of fields are corrupted.


I used the default mkfs options. Here is the output of running dump.f2fs on that nid:


>sudo dump.f2fs -i 1761978 /dev/sda2


>Info: [/dev/sda2] Disk Model: Samsung SSD 850 
>Info: Segments per section = 1
>Info: Sections per zone = 1
>Info: sector size = 512
>Info: total sectors = 102539264 (50068 MB)
>Info: MKFS version
>  "Linux version 4.20.0-arch1-1-ARCH (builduser@heftig-29859) (gcc version
>  8.2.1 20181127 (GCC)) #1 SMP PREEMPT Mon Dec 24 03:00:40 UTC 2018"
>Info: FSCK version
>  from "Linux version 5.7.2-arch1-1 (linux@archlinux) (gcc version 10.1.0
>  (GCC), GNU ld (GNU Binutils) 2.34.0) #1 SMP PREEMPT Wed, 10 Jun 2020
>  20:36:24 +0000"
>    to "Linux version 5.7.2-arch1-1 (linux@archlinux) (gcc version 10.1.0
>    (GCC), GNU ld (GNU Binutils) 2.34.0) #1 SMP PREEMPT Wed, 10 Jun 2020
>    20:36:24 +0000"
>Info: superblock features = 0 : 
>Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
>Info: total FS sectors = 102539264 (50068 MB)
>Info: CKPT version = 64d05005
>[print_node_info: 271] Node ID [0x203a7962:540703074] is direct node or indirect node.
>[0]                    [0x5452202c : 1414668332]
>[1]                    [0x45445f4d : 1162108749]
>[2]                    [0x4444414c : 1145323852]
>[3]                    [0x202c2952 : 539765074]
>[4]                    [0x6e657665 : 1852143205]
>[5]                    [0x6f687420 : 1869116448]
>[6]                    [0x20686775 : 543713141]
>[7]                    [0x69207469 : 1763734633]
>[8]                    [0x6f6e2073 : 1869488243]
>[9]                    [0x46202e74 : 1176514164]
>[10]                   [0x69207869 : 1763735657]
>Invalid (i)node block

>Info: checkpoint state = 51 :  crc fsck unmount

>Done: 0.063481 secs
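
As a side check on the dump above, the printed words can be read as raw bytes. The sketch below is a minimal illustration in Python, not part of dump.f2fs or the kernel: it treats each 32-bit word printed by dump.f2fs, and each node_footer field from the kernel log in the description, as little-endian bytes and renders them as ASCII. The helper names and the field widths (32-bit nid, ino and blkaddr, 64-bit cpver) are assumptions made for illustration.

```python
# Words [0]..[10] exactly as printed by dump.f2fs in this comment.
DUMP_WORDS = [
    0x5452202C, 0x45445F4D, 0x4444414C, 0x202C2952,
    0x6E657665, 0x6F687420, 0x20686775, 0x69207469,
    0x6F6E2073, 0x46202E74, 0x69207869,
]

# node_footer fields from the kernel log in the description: (value, width in bytes).
# Assumption: nid, ino and blkaddr are 32-bit on disk and cpver is 64-bit.
# "ofs" is left out because the logged value may not map 1:1 to an on-disk field.
FOOTER_FIELDS = {
    "nid": (540703074, 4),
    "ino": (1937074503, 4),
    "cpver": (7020105085017473824, 8),
    "blkaddr": (1008758642, 4),
}


def le_ascii(value: int, width: int = 4) -> str:
    """Render an integer's little-endian bytes as ASCII, with '.' for non-printables."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


def decode_words(words) -> str:
    """Concatenate the ASCII rendering of a sequence of 32-bit words."""
    return "".join(le_ascii(w) for w in words)


if __name__ == "__main__":
    # prints ', RTM_DELADDR), even though it is not. Fix i'
    print(repr(decode_words(DUMP_WORDS)))
    for name, (value, width) in FOOTER_FIELDS.items():
        print(name, repr(le_ascii(value, width)))
```

Under those assumptions every byte falls in the printable ASCII range, which suggests the block holds ordinary text rather than node metadata.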
Comment 5 Chao Yu 2020-07-01 06:55:42 UTC
(In reply to zKqri0 from comment #4)
> >[print_node_info: 271] Node ID [0x203a7962:540703074] is direct node or
> >>indirect node.
> >[0]                    [0x5452202c : 1414668332]
> >[1]                    [0x45445f4d : 1162108749]
> >[2]                    [0x4444414c : 1145323852]
> >[3]                    [0x202c2952 : 539765074]
> >[4]                    [0x6e657665 : 1852143205]
> >[5]                    [0x6f687420 : 1869116448]
> >[6]                    [0x20686775 : 543713141]
> >[7]                    [0x69207469 : 1763734633]
> >[8]                    [0x6f6e2073 : 1869488243]
> >[9]                    [0x46202e74 : 1176514164]
> >[10]                   [0x69207869 : 1763735657]
> >Invalid (i)node block

I don't see any valid information in this. Could you please upload the raw block if possible?
Comment 6 zKqri0 2020-07-05 07:40:37 UTC
(In reply to Chao Yu from comment #5)
> (In reply to zKqri0 from comment #4)
> > >[print_node_info: 271] Node ID [0x203a7962:540703074] is direct node or
> > >>indirect node.
> > >[0]                    [0x5452202c : 1414668332]
> > >[1]                    [0x45445f4d : 1162108749]
> > >[2]                    [0x4444414c : 1145323852]
> > >[3]                    [0x202c2952 : 539765074]
> > >[4]                    [0x6e657665 : 1852143205]
> > >[5]                    [0x6f687420 : 1869116448]
> > >[6]                    [0x20686775 : 543713141]
> > >[7]                    [0x69207469 : 1763734633]
> > >[8]                    [0x6f6e2073 : 1869488243]
> > >[9]                    [0x46202e74 : 1176514164]
> > >[10]                   [0x69207869 : 1763735657]
> > >Invalid (i)node block
> 
> I don't see any valid information in this. Could you please upload the raw
> block if possible?

Yeah, there probably isn't, because it seems like blk_addr is pointing to an invalid address. I took a dump of the node with "dd if=/dev/sda2 of=./node.bin bs=4096 skip=2900324352 count=4096 iflag=skip_bytes,count_bytes", with 2900324352 being "blk_addr << 12" (the byte offset of the 4096-byte block), and it was part of a random git commit message and not a node block. Is there anything else that would be useful to dump?
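
The offset arithmetic and the footer check discussed in this thread can be sketched in Python. This is a rough illustration under stated assumptions, not the kernel's or f2fs-tools' code: it assumes 4096-byte blocks and a 24-byte little-endian footer at the end of a node block laid out as nid, ino, flag, cp_ver, next_blkaddr. The function and class names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

BLOCK_SIZE = 4096                          # assumed F2FS block size (1 << 12 bytes)
FOOTER_FMT = "<IIIQI"                      # assumed layout: nid, ino, flag, cp_ver, next_blkaddr
FOOTER_SIZE = struct.calcsize(FOOTER_FMT)  # 24 bytes with this layout


class NodeFooter(NamedTuple):
    nid: int
    ino: int
    flag: int
    cp_ver: int
    next_blkaddr: int


def block_offset(blk_addr: int) -> int:
    """Byte offset of a block on the partition: block address times 4096."""
    return blk_addr << 12


def read_block(dev: BinaryIO, blk_addr: int) -> bytes:
    """Read one whole block from a seekable binary file or device."""
    dev.seek(block_offset(blk_addr))
    data = dev.read(BLOCK_SIZE)
    if len(data) != BLOCK_SIZE:
        raise ValueError(f"short read at block {blk_addr}: got {len(data)} bytes")
    return data


def parse_footer(block: bytes) -> NodeFooter:
    """Interpret the last FOOTER_SIZE bytes of a block as a node footer."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected exactly one block")
    return NodeFooter(*struct.unpack(FOOTER_FMT, block[-FOOTER_SIZE:]))


def footer_matches(block: bytes, expected_nid: int) -> bool:
    """True if the nid stored in the footer equals the nid that was looked up."""
    return parse_footer(block).nid == expected_nid


if __name__ == "__main__":
    # 2900324352 is the skip value passed to dd above; it equals 708087 << 12.
    print(block_offset(708087))

    # In-memory stand-in for a device: two zero blocks, then a block whose footer has nid 7.
    footer = struct.pack(FOOTER_FMT, 7, 7, 0, 1, 0)
    image = bytes(2 * BLOCK_SIZE) + bytes(BLOCK_SIZE - FOOTER_SIZE) + footer
    block = read_block(io.BytesIO(image), 2)
    print(parse_footer(block), footer_matches(block, 7))
```

Against a real device, `read_block(open("/dev/sda2", "rb"), blk_addr)` would need root privileges. A footer nid that differs from the nid being looked up corresponds to the mismatch shown in the "inconsistent node block" message.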