Bug 199769 - Kernel BUG() on corrupted F2FS partition mount
Summary: Kernel BUG() on corrupted F2FS partition mount
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-19 11:27 UTC by Anatoly Trosinenko
Modified: 2018-08-31 08:19 UTC (History)
3 users (show)

See Also:
Kernel Version: commit 73fcb1a370c76 (Makefile: 4.17.0-rc5)
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Corrupted filesystem image (1.05 KB, application/x-bzip)
2018-05-19 11:27 UTC, Anatoly Trosinenko
Details
Kernel .config file (102.52 KB, text/plain)
2018-05-19 11:28 UTC, Anatoly Trosinenko
Details
Kernel log (21.60 KB, text/plain)
2018-05-19 11:28 UTC, Anatoly Trosinenko
Details
New kernel config (112.40 KB, text/x-mpsub)
2018-06-16 09:24 UTC, Anatoly Trosinenko
Details
`fsck.f2fs -a f2fs.img` log (Patch, mount, then run fsck in host system) (3.60 KB, text/plain)
2018-06-16 11:21 UTC, Anatoly Trosinenko
Details

Description Anatoly Trosinenko 2018-05-19 11:27:32 UTC
Created attachment 276061 [details]
Corrupted filesystem image

How to reproduce:
1. Compile the 73fcb1a370c76 version of the kernel using the config attached
2. Unpack and mount the attached filesystem image as F2FS
3. The kernel will BUG() on mount (BUGs are explicitly enabled in config)

[    2.233612] F2FS-fs (sda): Found nat_bits in checkpoint
[    2.248422] ------------[ cut here ]------------
[    2.248857] kernel BUG at fs/f2fs/node.c:1967!
[    2.249760] invalid opcode: 0000 [#1] SMP NOPTI
[    2.250219] Modules linked in:
[    2.251848] CPU: 0 PID: 944 Comm: mount Not tainted 4.17.0-rc5+ #1
[    2.252331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[    2.253305] RIP: 0010:build_free_nids+0x337/0x3f0
[    2.253672] RSP: 0018:ffffae7fc0857c50 EFLAGS: 00000246
[    2.254080] RAX: 00000000ffffffff RBX: 0000000000000123 RCX: 0000000000000001
[    2.254638] RDX: ffff9aa7063d5c00 RSI: 0000000000000122 RDI: ffff9aa705852e00
[    2.255190] RBP: ffff9aa705852e00 R08: 0000000000000001 R09: ffff9aa7059090c0
[    2.255719] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9aa705852e00
[    2.256242] R13: ffff9aa7063ad000 R14: ffff9aa705919000 R15: 0000000000000123
[    2.256809] FS:  00000000023078c0(0000) GS:ffff9aa707800000(0000) knlGS:0000000000000000
[    2.258654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.259153] CR2: 00000000005511ae CR3: 0000000005872000 CR4: 00000000000006f0
[    2.259801] Call Trace:
[    2.260583]  build_node_manager+0x5cd/0x600
[    2.260963]  f2fs_fill_super+0x66a/0x17c0
[    2.261300]  ? f2fs_commit_super+0xe0/0xe0
[    2.261622]  mount_bdev+0x16e/0x1a0
[    2.261899]  mount_fs+0x30/0x150
[    2.262398]  vfs_kern_mount.part.28+0x4f/0xf0
[    2.262743]  do_mount+0x5d0/0xc60
[    2.263010]  ? _copy_from_user+0x37/0x60
[    2.263313]  ? memdup_user+0x39/0x60
[    2.263692]  ksys_mount+0x7b/0xd0
[    2.263960]  __x64_sys_mount+0x1c/0x20
[    2.264268]  do_syscall_64+0x43/0xf0
[    2.264560]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    2.265095] RIP: 0033:0x48d31a
[    2.265502] RSP: 002b:00007ffc6fe60a08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[    2.266089] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 000000000048d31a
[    2.266607] RDX: 00007ffc6fe62fa5 RSI: 00007ffc6fe62f9d RDI: 00007ffc6fe62f94
[    2.267130] RBP: 00000000023078a0 R08: 0000000000000000 R09: 0000000000000000
[    2.267670] R10: 0000000000008000 R11: 0000000000000246 R12: 0000000000000000
[    2.268192] R13: 0000000000000000 R14: 00007ffc6fe60c78 R15: 0000000000000000
[    2.268767] Code: e8 5f c3 ff ff 83 c3 01 41 83 c7 01 81 fb c7 01 00 00 74 48 44 39 7d 04 76 42 48 63 c3 48 8d 04 c0 41 8b 44 06 05 83 f8 ff 75 c1 <0f> 0b 49 8b 45 50 48 8d b8 b0 00 00 00 e8 37 59 69 00 b9 01 00 
[    2.270434] RIP: build_free_nids+0x337/0x3f0 RSP: ffffae7fc0857c50
[    2.271426] ---[ end trace ab20c06cd3c8fde4 ]---
Comment 1 Anatoly Trosinenko 2018-05-19 11:28:05 UTC
Created attachment 276063 [details]
Kernel .config file
Comment 2 Anatoly Trosinenko 2018-05-19 11:28:26 UTC
Created attachment 276065 [details]
Kernel log
Comment 3 Chao Yu 2018-06-15 08:33:49 UTC
Hi, Anatoly

Thanks for your report. :)

https://sourceforge.net/p/linux-f2fs/mailman/linux-f2fs-devel/thread/20180615064557.5096-1-yuchao0%40huawei.com/#msg36343428

I just sent one patch to fix this issue, could you have a try with it?
Comment 4 Anatoly Trosinenko 2018-06-16 09:21:57 UTC
Hi, Chao

Tested this patch against current master branch (commit 35773c938) with the attached config. Now mount returns Invalid argument:

[    1.565297] F2FS-fs (sda): Found nat_bits in checkpoint
[    1.577434] F2FS-fs (sda): NAT is corrupt, run fsck to fix it
[    1.577539] F2FS-fs (sda): Failed to initialize F2FS node manager
[    1.581935] F2FS-fs (sda): Found nat_bits in checkpoint
[    1.593903] F2FS-fs (sda): NAT is corrupt, run fsck to fix it
[    1.593987] F2FS-fs (sda): Failed to initialize F2FS node manager
mount: mounting /dev/sda on /mnt failed: Invalid argument


Meanwhile, looks like before this patch mount produced kernel warning but mount was successfull, and I was able to operate on this partition (maybe on some previous snapshot, I don't know...).
Comment 5 Anatoly Trosinenko 2018-06-16 09:24:23 UTC
Created attachment 276593 [details]
New kernel config

Update kernel .config for recent master branch and possibly more defconfig-like settings.
Comment 6 Chao Yu 2018-06-16 10:33:59 UTC
(In reply to Anatoly Trosinenko from comment #4)
> Hi, Chao
> 
> Tested this patch against current master branch (commit 35773c938) with the
> attached config. Now mount returns Invalid argument:
> 
> [    1.565297] F2FS-fs (sda): Found nat_bits in checkpoint
> [    1.577434] F2FS-fs (sda): NAT is corrupt, run fsck to fix it
> [    1.577539] F2FS-fs (sda): Failed to initialize F2FS node manager
> [    1.581935] F2FS-fs (sda): Found nat_bits in checkpoint
> [    1.593903] F2FS-fs (sda): NAT is corrupt, run fsck to fix it
> [    1.593987] F2FS-fs (sda): Failed to initialize F2FS node manager
> mount: mounting /dev/sda on /mnt failed: Invalid argument
> 
> 
> Meanwhile, looks like before this patch mount produced kernel warning but
> mount was successfull, and I was able to operate on this partition (maybe on
> some previous snapshot, I don't know...).

I just fix the issue of mounting corrupted image makes kernel panic, after
applying the patch, it fails mounting as we expect, and in kmsg we give hint
as below to run fsck to repair the corrupted image, could you please try
fsck and mount again?

"F2FS-fs (sda): NAT is corrupt, run fsck to fix it"

Could you reproduce to generate corrupted image like the one you attached?
Comment 7 Anatoly Trosinenko 2018-06-16 11:21:32 UTC
Created attachment 276597 [details]
`fsck.f2fs -a f2fs.img` log (Patch, mount, then run fsck in host system)
Comment 8 Anatoly Trosinenko 2018-06-16 11:27:00 UTC
(In reply to Chao Yu from comment #6)
> I just fix the issue of mounting corrupted image makes kernel panic, after
> applying the patch, it fails mounting as we expect, and in kmsg we give hint
> as below to run fsck to repair the corrupted image,

Oops, I mean not "it still doesn't work", but "now it correctly returns an error for broken FS and not page faulting, looks OK". :)

> could you please try fsck and mount again?

Booted patched kernel, mounted broken FS image, got

[    4.759660] F2FS-fs (sda): Found nat_bits in checkpoint
[    4.771137] F2FS-fs (sda): NAT is corrupt, run fsck to fix it
[    4.771555] F2FS-fs (sda): Failed to initialize F2FS node manager
[    4.776247] F2FS-fs (sda): Found nat_bits in checkpoint
[    4.786226] F2FS-fs (sda): NAT is corrupt, run fsck to fix it
[    4.786614] F2FS-fs (sda): Failed to initialize F2FS node manager
mount: mounting /dev/sda on /mnt failed: Invalid argument
[    4.792579] exe (1000) used greatest stack depth: 13872 bytes left

Then closed QEMU, run `fsck -a f2fs.img` (see attached log, please note that it is not the most recent fsck version). Then booted and mounted again:

[    4.776816] F2FS-fs (sda): Found nat_bits in checkpoint
[    4.790852] F2FS-fs (sda): Mounted with checkpoint version = 1eda0cf8
[    4.792207] exe (1000) used greatest stack depth: 13872 bytes left

And now FS is mounted and I can operate on it.

> "F2FS-fs (sda): NAT is corrupt, run fsck to fix it"
> 
> Could you reproduce to generate corrupted image like the one you attached?

It already produces such message. What do you mean?
Comment 9 Chao Yu 2018-06-16 11:51:51 UTC
(In reply to Anatoly Trosinenko from comment #8)
> (In reply to Chao Yu from comment #6)
> > I just fix the issue of mounting corrupted image makes kernel panic, after
> > applying the patch, it fails mounting as we expect, and in kmsg we give
> hint
> > as below to run fsck to repair the corrupted image,
> 
> Oops, I mean not "it still doesn't work", but "now it correctly returns an
> error for broken FS and not page faulting, looks OK". :)

Got you, :)

> 
> > could you please try fsck and mount again?
> 
> Booted patched kernel, mounted broken FS image, got
> 
> [    4.759660] F2FS-fs (sda): Found nat_bits in checkpoint
> [    4.771137] F2FS-fs (sda): NAT is corrupt, run fsck to fix it
> [    4.771555] F2FS-fs (sda): Failed to initialize F2FS node manager
> [    4.776247] F2FS-fs (sda): Found nat_bits in checkpoint
> [    4.786226] F2FS-fs (sda): NAT is corrupt, run fsck to fix it
> [    4.786614] F2FS-fs (sda): Failed to initialize F2FS node manager
> mount: mounting /dev/sda on /mnt failed: Invalid argument
> [    4.792579] exe (1000) used greatest stack depth: 13872 bytes left
> 
> Then closed QEMU, run `fsck -a f2fs.img` (see attached log, please note that
> it is not the most recent fsck version). Then booted and mounted again:
> 
> [    4.776816] F2FS-fs (sda): Found nat_bits in checkpoint
> [    4.790852] F2FS-fs (sda): Mounted with checkpoint version = 1eda0cf8
> [    4.792207] exe (1000) used greatest stack depth: 13872 bytes left
> 
> And now FS is mounted and I can operate on it.
> 
> > "F2FS-fs (sda): NAT is corrupt, run fsck to fix it"
> > 
> > Could you reproduce to generate corrupted image like the one you attached?
> 
> It already produces such message. What do you mean?

I mean I still be interest in that how your image becoming corrupted, because before
loading/persisting the nat entry, we always do the sanity check on the entry (if 
nat_entry.blk_addr is 0xffffffff, it means the entry is invalidate), so theoretically,
the entry shouldn't be such status.

If there are any special operations you did before the image become corrupted, please
let me know.

Thanks,
Comment 10 Anatoly Trosinenko 2018-06-16 12:00:00 UTC
> If there are any special operations you did before the image become
> corrupted, please let me know.

It was artificially corrupted by fuzzer, not your driver. Please excuse me, if I mis-informed you. (Are you interested in such reports?)
Comment 11 Chao Yu 2018-06-16 13:57:36 UTC
(In reply to Anatoly Trosinenko from comment #10)
> It was artificially corrupted by fuzzer, not your driver. Please excuse me,
> if I mis-informed you. (Are you interested in such reports?)

Alright, if there is further bug that found by fuzzer, I'm interested in
looking into it. BTW, if you want to report other bugs, please Cc f2fs mailing
list as below in where I think the response will be more quick. :)

linux-f2fs-devel@lists.sourceforge.net

Thanks,

Note You need to log in before you can comment on or make changes to this bug.