Sometimes a deadlock occurs when the SYS_getdents64 system call runs while another process calls fsync(). It was seen while running monkey on Android 9.0.

1. task 9785 holds sbi->cp_rwsem and is waiting for lock_page()
2. task 10349 holds mm_sem (mm->mmap_sem) and is waiting for sbi->cp_rwsem
3. task 9709 holds lock_page() and is waiting for mm_sem

Each task is waiting for a lock held by one of the others, so this is a deadlock scenario.

The task stacks, as shown by the crash tool, are as follows:

crash_arm64> bt ffffffc03c354080
PID: 9785  TASK: ffffffc03c354080  CPU: 1  COMMAND: "RxIoScheduler-3"
    #0 [ffffffc01b50f8a0] __switch_to at ffffff8008086d88
    #1 [ffffffc01b50f8c0] __schedule at ffffff8008a90840
    #2 [ffffffc01b50f920] schedule at ffffff8008a90f2c
    #3 [ffffffc01b50f940] schedule_timeout at ffffff8008a940ec
    #4 [ffffffc01b50f9f0] io_schedule_timeout at ffffff8008a90544
    #5 [ffffffc01b50fa20] bit_wait_io at ffffff8008a913b4
    #6 [ffffffc01b50fa40] __wait_on_bit_lock at ffffff8008a91574
 >> #7 [ffffffc01b50fac0] __lock_page at ffffff80081b11e8
    #8 [ffffffc01b50fb30] f2fs_sync_node_pages at ffffff8008387d08
    #9 [ffffffc01b50fc50] f2fs_write_checkpoint at ffffff8008376620
   #10 [ffffffc01b50fd30] f2fs_sync_fs at ffffff800836b030
   #11 [ffffffc01b50fda0] f2fs_do_sync_file at ffffff8008358a30
   #12 [ffffffc01b50fe50] f2fs_sync_file at ffffff80083591c4
   #13 [ffffffc01b50fe90] sys_fsync at ffffff800824cffc

crash-arm64> bt 10349
PID: 10349  TASK: ffffffc018b83080  CPU: 1  COMMAND: "BUGLY_ASYNC_UPL"
    #0 [ffffffc01f8cf9a0] __switch_to at ffffff8008086d88
    #1 [ffffffc01f8cf9c0] __schedule at ffffff8008a90840
    #2 [ffffffc01f8cfa20] schedule at ffffff8008a90f2c
 >> #3 [ffffffc01f8cfa40] rwsem_down_read_failed at ffffff8008a93afc
    #4 [ffffffc01f8cfab0] down_read at ffffff8008a93360
    #5 [ffffffc01f8cfad0] __do_map_lock at ffffff800837e758
    #6 [ffffffc01f8cfb00] f2fs_vm_page_mkwrite at ffffff8008359c14
    #7 [ffffffc01f8cfb90] do_page_mkwrite at ffffff80081e09cc
    #8 [ffffffc01f8cfc10] do_wp_page at ffffff80081e2de8
    #9 [ffffffc01f8cfcb0] handle_mm_fault at ffffff80081e5228
   #10 [ffffffc01f8cfd80] do_page_fault at ffffff800809d05c
   #11 [ffffffc01f8cfdf0] do_mem_abort at ffffff8008081570
   #12 [ffffffc01f8cfed0] el0_da at ffffff8008085650
        PC: 00000033  LR: 00000000  SP: 00000000  PSTATE: ffffffffffffffff

crash-arm64> bt 9709
PID: 9709  TASK: ffffffc03e7f3080  CPU: 1  COMMAND: "IntentService[A"
    #0 [ffffffc001e677b0] __switch_to at ffffff8008086d88
    #1 [ffffffc001e677d0] __schedule at ffffff8008a90840
    #2 [ffffffc001e67830] schedule at ffffff8008a90f2c
 >> #3 [ffffffc001e67850] rwsem_down_read_failed at ffffff8008a93afc
    #4 [ffffffc001e678c0] down_read at ffffff8008a93360
    #5 [ffffffc001e678e0] do_page_fault at ffffff800809ceb8
    #6 [ffffffc001e67950] do_translation_fault at ffffff800809d250
    #7 [ffffffc001e67980] do_mem_abort at ffffff8008081570
 >> #8 [ffffffc001e67b80] el1_ia at ffffff8008084fc4
        PC: ffffff8008274114  [compat_filldir64+120]
        LR: ffffff80083584d4  [f2fs_fill_dentries+448]
        SP: ffffffc001e67b80  PSTATE: 80400145
       X29: ffffffc001e67b80  X28: 0000000000000000  X27: 000000000000001a
       X26: 00000000000093d7  X25: ffffffc070d52480  X24: 0000000000000008
       X23: 0000000000000028  X22: 00000000d43dfd60  X21: ffffffc001e67e90
       X20: 0000000000000011  X19: ffffff80093a4000  X18: 0000000000000000
       X17: 0000000000000000  X16: 0000000000000000  X15: 0000000000000000
       X14: ffffffffffffffff  X13: 0000000000000008  X12: 0101010101010101
       X11: 7f7f7f7f7f7f7f7f  X10: 6a6a6a6a6a6a6a6a   X9: 7f7f7f7f7f7f7f7f
        X8: 0000000080808000   X7: ffffff800827409c   X6: 0000000080808000
        X5: 0000000000000008   X4: 00000000000093d7   X3: 000000000000001a
        X2: 0000000000000011   X1: ffffffc070d52480   X0: 0000000000800238
 >> #9 [ffffffc001e67be0] f2fs_fill_dentries at ffffff80083584d0
   #10 [ffffffc001e67ca0] f2fs_read_inline_dir at ffffff8008372ca4
   #11 [ffffffc001e67d20] f2fs_readdir at ffffff80083588a0
   #12 [ffffffc001e67de0] iterate_dir at ffffff8008228e90
   #13 [ffffffc001e67e30] compat_sys_getdents64 at ffffff8008278894
   #14 [ffffffc001e67ed0] __sys_trace at ffffff8008085b48
        PC: 0000003c  LR: 00000000  SP: 00000000  PSTATE: 000000d9
       X12: f48a02ff  X11: d4678960  X10: d43dfc00   X9: d4678ae4
        X8: 00000058   X7: d4678994   X6: d43de800   X5: 000000d9
        X4: d43dfc0c   X3: d43dfc10   X2: d46799c8   X1: 00000000
        X0: 00001068
For task 9709, do_page_fault() is triggered by __put_user_unaligned() in compat_filldir64():

    static int compat_filldir64(struct dir_context *ctx, const char *name,
                                int namlen, loff_t offset, u64 ino,
                                unsigned int d_type)
    {
            .....
            if (dirent) {
                    if (__put_user_unaligned(offset, &dirent->d_off))
                            goto efault;
            }

dirent is a local array in user space. When the kernel accesses dirent->d_off, do_page_fault() is triggered because physical memory has to be allocated for that page. dirent is a valid user space address.
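The three-way wait described in this report can be checked mechanically as a cycle in a wait-for graph. The following is a minimal userspace sketch, not kernel code: the names blocked_task, holder_of and has_wait_cycle are hypothetical, and it assumes each lock has exactly one holder, which simplifies the reader/writer semantics of the two rwsems.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Locks named in the report; NO_LOCK means "holds nothing relevant". */
enum lock_id { NO_LOCK, CP_RWSEM, NODE_PAGE_LOCK, MMAP_SEM };

/* One blocked task: the lock it already holds and the lock it sleeps on. */
struct blocked_task {
    int pid;
    enum lock_id holds;
    enum lock_id waits_for;
};

/* The three tasks, transcribed from the numbered list in the report. */
static const struct blocked_task reported_tasks[] = {
    { 9785,  CP_RWSEM,       NODE_PAGE_LOCK },
    { 10349, MMAP_SEM,       CP_RWSEM       },
    { 9709,  NODE_PAGE_LOCK, MMAP_SEM       },
};

/* Return the index of the task holding `lock`, or -1 if nobody holds it. */
static int holder_of(const struct blocked_task *t, size_t n, enum lock_id lock)
{
    if (lock == NO_LOCK)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (t[i].holds == lock)
            return (int)i;
    return -1;
}

/*
 * Follow "waits for the holder of" edges starting at task `start`.
 * Returns true if the chain leads back to `start`, i.e. a deadlock cycle.
 */
bool has_wait_cycle(const struct blocked_task *t, size_t n, size_t start)
{
    assert(start < n);
    size_t cur = start;
    for (size_t steps = 0; steps < n; steps++) {
        int next = holder_of(t, n, t[cur].waits_for);
        if (next < 0)
            return false;          /* the wanted lock is free: no cycle */
        if ((size_t)next == start)
            return true;           /* chain closed on the starting task */
        cur = (size_t)next;
    }
    return false;
}
```

Calling has_wait_cycle(reported_tasks, 3, 0) follows 9785 -> 9709 -> 10349 -> 9785 and reports a cycle. If task 9709 did not hold the page lock while writing to user space, the chain starting at 9785 would stop at a free lock instead.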
Hi Jiqun,

I just sent a patch for this, could you try it?

[PATCH] f2fs: fix to avoid deadlock in f2fs_read_inline_dir()

The solution is very similar to what we did in the patch below: we simply drop the page lock, since it is not needed.

f2fs: no need to take page lock in readdir
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev-test&id=f3d42f6e5087a4b3a52ee4008734af93519dec06
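To illustrate the ordering change this reply describes, here is a small userspace model. It is only a sketch under stated assumptions and is not the patch itself: model_page, copy_trace, model_copy_to_user, readdir_locked and readdir_unlocked are hypothetical names, and a plain bool stands in for the page lock. It records whether the copy to the user buffer (the step that can fault) runs while the page is still locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for a locked inline-directory page (hypothetical). */
struct model_page {
    bool locked;
    char dentries[16];
};

/* Records whether the "user copy" ran while the page lock was held. */
struct copy_trace {
    bool copied_under_page_lock;
};

/* Stand-in for the store to user memory in filldir: the step that may fault. */
static void model_copy_to_user(char *ubuf, const struct model_page *page,
                               struct copy_trace *trace)
{
    assert(ubuf != NULL);
    if (page->locked)
        trace->copied_under_page_lock = true;
    memcpy(ubuf, page->dentries, sizeof(page->dentries));
}

/* Ordering seen in the report: the page stays locked across the user copy. */
void readdir_locked(struct model_page *page, char *ubuf,
                    struct copy_trace *trace)
{
    page->locked = true;
    model_copy_to_user(ubuf, page, trace);
    page->locked = false;
}

/* Ordering described in the reply: drop the page lock first, then copy. */
void readdir_unlocked(struct model_page *page, char *ubuf,
                      struct copy_trace *trace)
{
    page->locked = true;   /* the page is obtained locked */
    page->locked = false;  /* release it before touching user memory */
    model_copy_to_user(ubuf, page, trace);
}
```

In this model only readdir_locked sets copied_under_page_lock, which corresponds to task 9709 faulting on dirent while still holding the page lock.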
thanks, I got it
I will try it
The fix has been merged, so I am closing this issue.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aadcef64b22f668c1a107b86d3521d9cac915c24