Bug 101801

Summary: [f2fs] f2fs_gc NULL pointer dereference
Product: File System Reporter: Andreas Reis (andreas.reis)
Component: OtherAssignee: fs_other
Status: RESOLVED CODE_FIX    
Severity: normal CC: chao2.yu, jaegeuk
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.2-rc3 + f2fs dev-test Subsystem:
Regression: No Bisected commit-id:

Description Andreas Reis 2015-07-22 13:02:57 UTC
On my f2fs USB stick used as ccache cache I get the following crash – so far only when compiling Chromium and apparently always at the same file.

fsck.f2fs (from f2fs-tools @ 2015-07-02) runs through fine but does not prevent the error:

"fsck.f2fs -af /dev/sdg
Info: Fix the reported corruption.
Info: Force to fix corruption
Info: sector size = 512
Info: total sectors = 62533296 (30533 MB)
Info: MKFS version
  ""
Info: FSCK version
  from "Linux version 4.2.0-1-cu (meme@mememe) (gcc version 5.2.0 (GCC) ) #1 SMP PREEMPT Tue Jul 21 00:13:15 CEST 2015"
    to "Linux version 4.2.0-1-cu (meme@mememe) (gcc version 5.2.0 (GCC) ) #1 SMP PREEMPT Tue Jul 21 16:35:16 CEST 2015"
Info: superblock features = 0 : 
Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
Info: total FS sectors = 62533296 (30533 MB)
Info: CKPT version = 8057
Info: checkpoint state = 5 :  compacted_summary unmount

[FSCK] Unreachable nat entries                        [Ok..] [0x0]
[FSCK] SIT valid block bitmap checking                [Ok..]
[FSCK] Hard link checking for regular file            [Ok..] [0x0]
[FSCK] valid_block_count matching with CP             [Ok..] [0x3fbb3c]
[FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0xa4e54]
[FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0xa4e54]
[FSCK] valid_inode_count matched with CP              [Ok..] [0xa4855]
[FSCK] free segment_count matched with CP             [Ok..] [0x2e6]
[FSCK] next block offset is free                      [Ok..]
[FSCK] fixing SIT types
[FSCK] other corrupted bugs                           [Ok..]

Done."

Kernel is at Linux 4.2-rc3 patched with the commits from dev-test.

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffff8149d82a>] __percpu_counter_add+0x1a/0x90
PGD 127014067 PUD 226f26067 PMD 0 
Oops: 0000 [#1] PREEMPT SMP 
task: ffff8803a422cf80 ti: ffff880139bf8000 task.ti: ffff880139bf8000
RIP: 0010:[<ffffffff8149d82a>]  [<ffffffff8149d82a>] __percpu_counter_add+0x1a/0x90
RSP: 0018:ffff880139bfb8b8  EFLAGS: 00010082
RAX: 0000000000000003 RBX: ffffea00062bc9c0 RCX: 0000000000000001
RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
RBP: ffff880139bfb8d8 R08: 0000000000000000 R09: ffff88014156afc8
R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880099ce5740
FS:  00007f0e69cc5840(0000) GS:ffff88042f280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a8 CR3: 0000000114c31000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 ffffea00062bc9c0 ffff880099ce58a8 ffff880099ce58c0
 ffff880139bfb938 ffffffff811cc97e 00000000002c4df6 0000000000000292
 0000000000000000 ffff8800b4850198 0000000000000000 ffffea00062bc9c0
Call Trace:
 [<ffffffff811cc97e>] __test_set_page_writeback+0xde/0x1d0
 [<ffffffff813ffcf7>] do_write_data_page+0xe7/0x3a0
 [<ffffffff813fae0a>] gc_data_segment+0x5aa/0x640
 [<ffffffff813fafd8>] do_garbage_collect+0x138/0x150
 [<ffffffff813fb31e>] f2fs_gc+0x1be/0x3e0
 [<ffffffff81406451>] f2fs_balance_fs+0x81/0x90
 [<ffffffff813fdf93>] f2fs_write_begin+0x73/0x5e0
 [<ffffffff81426963>] ? security_file_open+0x93/0xa0
 [<ffffffff811c2fee>] generic_perform_write+0xbe/0x1d0
 [<ffffffff812485a9>] ? generic_update_time+0x79/0xc0
 [<ffffffff811c4560>] __generic_file_write_iter+0x190/0x1f0
 [<ffffffff811c469d>] generic_file_write_iter+0xdd/0x1f0
 [<ffffffff813ea83e>] f2fs_file_write_iter+0xe/0x10
 [<ffffffff8122e61c>] __vfs_write+0xcc/0x100
 [<ffffffff8122ee94>] vfs_write+0xa4/0x190
 [<ffffffff8122fb85>] SyS_write+0x55/0xc0
 [<ffffffff819416ee>] entry_SYSCALL_64_fastpath+0x12/0x71
Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 a6 e0 b6 7e <48> 8b 47 20 48 63 ca 65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a 
RIP  [<ffffffff8149d82a>] __percpu_counter_add+0x1a/0x90
Comment 1 Jaegeuk Kim 2015-07-22 16:20:07 UTC
Thank you for the report.
It seems that this is caused by recent changes on other components.
I'll do bisect.
Comment 2 Andreas Reis 2015-07-23 13:45:38 UTC
Another one, perhaps of use as it's a bit shorter:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
PGD 2951ff067 PUD 2df43f067 PMD 0 
Oops: 0000 [#1] PREEMPT SMP 
Modules linked in:
CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu #1
Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS T01 02/03/2015
task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
Call Trace:
 [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca 65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a 
RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
 RSP <ffff880295143ac8>
CR2: 00000000000000a8
---[ end trace 5132449a58ed93a3 ]---
note: gcc[10356] exited with preempt_count 2

------------
System ran for another 60 sec, then dmesg ends with

INFO: rcu_preempt detected stalls on CPUs/tasks: { P10356} (detected by 6, t=18002 jiffies, g=49234, c=49233, q=32914)
gcc             x ffff8802951437d8     0 10356      0 0x00000002
 ffff8802951437d8 ffff8800b5359a80 ffff880295044f80 ffff88041c570618
 ffff880295144000 ffff880295143830 ffff880295143378 ffff8800830e49e8
 0000000000000046 ffff8802951437f8 ffffffff8193f3ce ffff880295143830
Call Trace:
 [<ffffffff8193f3ce>] schedule+0x3e/0x90
 [<ffffffff810e7f1b>] do_exit+0x81b/0xb10
 [<ffffffff81066f3e>] oops_end+0x9e/0xd0
 [<ffffffff810ac1b3>] no_context+0x163/0x370
 [<ffffffff8147aefd>] ? radix_tree_lookup+0xd/0x10
 [<ffffffff810ac4e5>] __bad_area_nosemaphore+0x125/0x230
 [<ffffffff814005db>] ? get_node_info+0x8b/0x230
 [<ffffffff810ac603>] bad_area_nosemaphore+0x13/0x20
 [<ffffffff810aca82>] __do_page_fault+0x222/0x450
 [<ffffffff814007e0>] ? read_node_page+0x60/0xb0
 [<ffffffff810accd2>] do_page_fault+0x22/0x30
 [<ffffffff81944c08>] page_fault+0x28/0x30
 [<ffffffff8149deea>] ? __percpu_counter_add+0x1a/0x90
 [<ffffffff8122117a>] ? mem_cgroup_begin_page_stat+0x1a/0xa0
 [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Comment 3 Andreas Reis 2015-08-24 13:57:40 UTC
Everything ok for the past month, so I'll close.