Bug 206153 - ubifs: Deadlock in concurrent bulk-read and writepage
Summary: ubifs: Deadlock in concurrent bulk-read and writepage
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-01-10 11:16 UTC by Zhihao Cheng
Modified: 2020-01-25 09:18 UTC (History)
1 user (show)

See Also:
Kernel Version: 5.5-rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
reproduce source code in C (1.10 KB, text/x-csrc)
2020-01-10 11:54 UTC, Zhihao Cheng
Details
setup ubifs from a dumped flash file (524 bytes, application/x-shellscript)
2020-01-10 12:00 UTC, Zhihao Cheng
Details
flash (628.32 KB, application/gzip)
2020-01-10 12:00 UTC, Zhihao Cheng
Details
reproduce delay and print (7.31 KB, patch)
2020-01-11 09:40 UTC, Zhihao Cheng
Details | Diff

Description Zhihao Cheng 2020-01-10 11:16:41 UTC
In ubifs, concurrent execution of writepage during bulk read may result in deadlock, you may see following callstack(Here, we set hungtask timeout as 10s to accelerate reproducing):

[  338.275252] INFO: task aa:7447 blocked for more than 10 seconds.
[  338.276806]       Not tainted 5.5.0-rc5-dirty #104
[  338.277582] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  338.278806] Call Trace:
[  338.279225]  ? __schedule+0x5b8/0xb50
[  338.280001]  schedule+0x9a/0x1b0
[  338.280756]  schedule_preempt_disabled+0x1a/0x30
[  338.281810]  __mutex_lock.isra.8+0x4e8/0x900
[  338.282510]  ? __mutex_lock_slowpath+0x1f/0x30
[  338.283220]  __mutex_lock_slowpath+0x1f/0x30
[  338.284120]  mutex_lock+0x57/0x70
[  338.284925]  ubifs_write_inode+0xb5/0x220 [ubifs]
[  338.285927]  ubifs_writepage+0x299/0x330 [ubifs]
[  338.286663]  __writepage+0x1b/0xa0
[  338.287215]  write_cache_pages+0x353/0x8a0
[  338.288073]  ? __wb_calc_thresh+0x160/0x160
[  338.289035]  ? kvm_sched_clock_read+0x25/0x40
[  338.289941]  ? sched_clock+0xd/0x20
[  338.290550]  ? wake_up_process+0x19/0x30
[  338.291207]  generic_writepages+0x74/0xc0
[  338.292043]  ? remove_wait_queue+0x7a/0x90
[  338.292763]  ? do_writepages+0x8d/0xb0
[  338.293432]  do_writepages+0x8d/0xb0
[  338.294088]  __filemap_fdatawrite_range+0x123/0x190
[  338.294987]  file_write_and_wait_range+0xa5/0x110
[  338.295890]  ubifs_fsync+0xc4/0x170 [ubifs]
[  338.296636]  vfs_fsync_range+0x59/0xb0
[  338.297323]  ? fput+0x17/0x20
[  338.297872]  ? ksys_write+0xf8/0x150
[  338.298485]  do_fsync+0x50/0x90
[  338.299083]  __x64_sys_fsync+0x18/0x30
[  338.299773]  do_syscall_64+0xbf/0x440
[  338.300402]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  338.301268] RIP: 0033:0x7f21ef32889d
[  338.301926] Code: Bad RIP value.
[  338.302515] RSP: 002b:00007f21eef3cf00 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[  338.303948] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f21ef32889d
[  338.305244] RDX: 00007f21ef313740 RSI: 00007f21e80008c0 RDI: 0000000000000003
[  338.306511] RBP: 00007f21eef40f40 R08: 00007f21eef41700 R09: 0000000000000077
[  338.307747] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f21eef41000
[  338.308987] R13: 0000000000000000 R14: 00007ffcf71b1ba0 R15: 00007f21eef419c0
[  338.310296] INFO: task aa:7448 blocked for more than 10 seconds.
[  338.311347]       Not tainted 5.5.0-rc5-dirty #104
[  338.312213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  338.313575] Call Trace:
[  338.313981]  ? __schedule+0x5b8/0xb50
[  338.314583]  schedule+0x9a/0x1b0
[  338.315108]  io_schedule+0x22/0x50
[  338.315680]  __lock_page+0x1be/0x390
[  338.316279]  ? add_to_page_cache_lru+0x170/0x170
[  338.317023]  pagecache_get_page+0x393/0x4b0
[  338.317714]  ubifs_readpage+0x4dc/0x770 [ubifs]
[  338.318464]  generic_file_read_iter+0x663/0xc10
[  338.319213]  new_sync_read+0x195/0x240
[  338.320012]  __vfs_read+0x41/0x60
[  338.320801]  vfs_read+0xce/0x1f0
[  338.321573]  ksys_read+0x88/0x150
[  338.322285]  __x64_sys_read+0x1e/0x30
[  338.322892]  do_syscall_64+0xbf/0x440
[  338.323550]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  338.324740] RIP: 0033:0x7f21ef3282fd
[  338.325594] Code: Bad RIP value.
[  338.326189] RSP: 002b:00007f21ee73bf00 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[  338.327411] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f21ef3282fd
[  338.329064] RDX: 0000000000004006 RSI: 00007f21ee73bf20 RDI: 0000000000000004
[  338.330395] RBP: 00007f21ee73ff40 R08: 00007f21ee740700 R09: 00007f21ee740700
[  338.331633] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f21ee740000
[  338.333281] R13: 0000000000000000 R14: 00007ffcf71b1ba0 R15: 00007f21ee7409c0
[  348.515255] INFO: task aa:7447 blocked for more than 20 seconds.
[  348.516699]       Not tainted 5.5.0-rc5-dirty #104
[  348.517797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  348.519042] Call Trace:
[  348.519477]  ? __schedule+0x5b8/0xb50
[  348.520335]  schedule+0x9a/0x1b0
[  348.521081]  schedule_preempt_disabled+0x1a/0x30
[  348.522005]  __mutex_lock.isra.8+0x4e8/0x900
[  348.522685]  ? __mutex_lock_slowpath+0x1f/0x30
[  348.523393]  __mutex_lock_slowpath+0x1f/0x30
[  348.524381]  mutex_lock+0x57/0x70
[  348.525185]  ubifs_write_inode+0xb5/0x220 [ubifs]
[  348.526173]  ubifs_writepage+0x299/0x330 [ubifs]
[  348.526916]  __writepage+0x1b/0xa0
[  348.527496]  write_cache_pages+0x353/0x8a0
[  348.528452]  ? __wb_calc_thresh+0x160/0x160
[  348.529429]  ? kvm_sched_clock_read+0x25/0x40
[  348.530206]  ? sched_clock+0xd/0x20
[  348.530762]  ? wake_up_process+0x19/0x30
[  348.531388]  generic_writepages+0x74/0xc0
[  348.532321]  ? remove_wait_queue+0x7a/0x90
[  348.533277]  ? do_writepages+0x8d/0xb0
[  348.533981]  do_writepages+0x8d/0xb0
[  348.534559]  __filemap_fdatawrite_range+0x123/0x190
[  348.535342]  file_write_and_wait_range+0xa5/0x110
[  348.536426]  ubifs_fsync+0xc4/0x170 [ubifs]
[  348.537402]  vfs_fsync_range+0x59/0xb0
[  348.538126]  ? fput+0x17/0x20
[  348.538613]  ? ksys_write+0xf8/0x150
[  348.539191]  do_fsync+0x50/0x90
[  348.539827]  __x64_sys_fsync+0x18/0x30
[  348.540705]  do_syscall_64+0xbf/0x440
[  348.541561]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  348.542385] RIP: 0033:0x7f21ef32889d
[  348.542953] Code: Bad RIP value.
[  348.543507] RSP: 002b:00007f21eef3cf00 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[  348.544914] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f21ef32889d
[  348.546027] RDX: 00007f21ef313740 RSI: 00007f21e80008c0 RDI: 0000000000000003
[  348.547143] RBP: 00007f21eef40f40 R08: 00007f21eef41700 R09: 0000000000000077
[  348.548631] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f21eef41000
[  348.550133] R13: 0000000000000000 R14: 00007ffcf71b1ba0 R15: 00007f21eef419c0
[  348.551245] INFO: task aa:7448 blocked for more than 20 seconds.
[  348.552539]       Not tainted 5.5.0-rc5-dirty #104
[  348.553639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  348.554872] Call Trace:
[  348.555280]  ? __schedule+0x5b8/0xb50
[  348.556077]  schedule+0x9a/0x1b0
[  348.556831]  io_schedule+0x22/0x50
[  348.557635]  __lock_page+0x1be/0x390
[  348.558223]  ? add_to_page_cache_lru+0x170/0x170
[  348.558945]  pagecache_get_page+0x393/0x4b0
[  348.559727]  ubifs_readpage+0x4dc/0x770 [ubifs]
[  348.560780]  generic_file_read_iter+0x663/0xc10
[  348.561796]  new_sync_read+0x195/0x240
[  348.562402]  __vfs_read+0x41/0x60
[  348.562924]  vfs_read+0xce/0x1f0
[  348.563464]  ksys_read+0x88/0x150
[  348.564255]  __x64_sys_read+0x1e/0x30
[  348.565091]  do_syscall_64+0xbf/0x440
[  348.565861]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  348.566661] RIP: 0033:0x7f21ef3282fd
[  348.567237] Code: Bad RIP value.
[  348.567911] RSP: 002b:00007f21ee73bf00 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[  348.569611] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f21ef3282fd
[  348.570742] RDX: 0000000000004006 RSI: 00007f21ee73bf20 RDI: 0000000000000004
[  348.572064] RBP: 00007f21ee73ff40 R08: 00007f21ee740700 R09: 00007f21ee740700
[  348.573677] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f21ee740000
[  348.574794] R13: 0000000000000000 R14: 00007ffcf71b1ba0 R15: 00007f21ee7409c0
Comment 1 Zhihao Cheng 2020-01-10 11:37:40 UTC
A. Process of reading|writing pages in ubifs
1. vfs_write -> __vfs_write -> new_sync_write -> ubifs_write_iter -> generic_file_write_iter -> __generic_file_write_iter -> generic_perform_write:
    ubifs_write_begin
        page_alloc -> LOCK(page)
        do_readpage
        allocate_budget
            LOCK(ui->ui_mutex)   [For append write only]
    ubifs_write_end
            UNLOCK(ui->ui_mutex) [For append write only]
            UNLOCK(page)

2. Write page:wb_workfn -> writeback_inodes_wb -> __writeback_inodes_wb -> writeback_sb_inodes -> __writeback_single_inode -> do_writepages -> generic_writepages -> write_cache_pages:
    LOCK(page)
    ubifs_writepage
        ubifs_write_inode
            LOCK(ui->ui_mutex)
            UNLOCK(ui->ui_mutex)
        do_writepage
            UNLOCK(page)

3. Read page:vfs_read -> __vfs_read -> new_sync_read -> generic_file_read_iter -> generic_file_buffered_read:
  Situation ①:Read single page
    find_get_page
      Possibility I page not in memory:
        page_cache_alloc
        LOCK(page)
        ubifs_readpage
            ubifs_bulk_read
                LOCK(ui->ui_mutex)
                UNLOCK(ui->ui_mutex)
            do_readpage
            UNLOCK(page)
      Possibility II page in memory,but not update:
        LOCK(page)
        IF NOT update
            ubifs_readpage
                ubifs_bulk_read
                    LOCK(ui->ui_mutex)
                    UNLOCK(ui->ui_mutex)
                do_readpage
                UNLOCK(page)
        IF update
        UNLOCK(page)
  Situation ②:Bulk read
    find_get_page
      (First two pages):
        page_alloc
        LOCK(page)
        ubifs_readpage
            ubifs_bulk_read
                LOCK(ui->ui_mutex)
                UNLOCK(ui->ui_mutex)
            do_readpage
            UNLOCK(page)
      (Page 3~N):
        page_alloc
        LOCK(page3)
        ubifs_readpage
            ubifs_bulk_read
                LOCK(ui->ui_mutex)
                    ubifs_do_bulk_read
                        populate_page(page3)
                        UNLOCK(page3)
                        find_or_create_page
                            pagecache_get_page
                                LOCK(page4)
                        UNLOCK(page4)
                UNLOCK(ui->ui_mutex)

B. Reproduce problem:
1. Apply reproduce_delay_and_print.patch and compile kernel,build nandsim/mtd/ubi/ubifs in modules
2. ./setup.sh
3. Compile reproduce.c and run ./reproduce, after a period of time, hung task occurs

C. 打印信息
[  310.515130] flash size: 512 MiB
[  310.515660] page size: 4096 bytes
[  310.516198] OOB area size: 128 bytes
[  310.516764] sector size: 128 KiB
[  310.517287] pages number: 131072
[  310.517802] pages per sector: 32
[  310.518327] bus width: 8
[  310.518735] bits in sector size: 17
[  310.519312] bits in page size: 12
[  310.519855] bits in OOB size: 7
[  310.520366] flash size with OOB: 540672 KiB
[  310.521024] page address bytes: 5
[  310.521558] sector address bytes: 3
[  310.522108] options: 0x80
[  310.541678] [nandsim] warning: CONFIG_MTD_PARTITIONED_MASTER must be enabled to expose debugfs stuff
[  317.947702] {WRITE_ITER} A generic_file_write_iter, ino=65 LOCK INODE ==
[  317.947724] [READ] Step 1. ubifs_bulk_read ino=65 page=0xffffea0004e130c0 LOCK UI ==
[  317.950557] {WRITE_ITER} B generic_file_write_iter, ino=65 LOCK INODE --
[  317.950581] {WRITE_ITER} Step 1. Get page by ubifs_write_begin, ino=65 LOCK PAGE ==
[  317.951876] [READ] Step 1. ubifs_bulk_read ino=65 page=0xffffea0004e130c0 LOCK UI --
[  317.951878] [READ] Step 3. ubifs_bulk_read ino=65 page=0xffffea0004e130c0 bulked=0 UNLOCK UI ++
[  317.951948] [READ] Step 1. ubifs_bulk_read ino=65 page=0xffffea0004ff8d80 LOCK UI ==
[  317.953542] {WRITE_ITER} Step 1. Get page 0xffffea0004eb0140 by ubifs_write_begin, ino=65 LOCK PAGE --
[  317.954245] [READ] Step 1. ubifs_bulk_read ino=65 page=0xffffea0004ff8d80 LOCK UI --
[  317.955617] {WRITE_ITER} Step 2. Appending page 0xffffea0004eb0140 by allocate_budget, ino=65 LOCK UI ==
[  317.956815] [READ] Step 3. ubifs_bulk_read ino=65 page=0xffffea0004ff8d80 bulked=0 UNLOCK UI ++
[  317.956846] [READ] Step 1. ubifs_bulk_read ino=65 page=0xffffea0004e640c0 LOCK UI ==
[  317.958029] {WRITE_ITER} Step 2. Appending page 0xffffea0004eb0140 by allocate_budget, ino=65 LOCK UI --
[  317.966116] {WRITE_ITER} Step 3. ubifs_write_end appending write page=0xffffea0004eb0140, ino=65 UNLOCK UI ++
[  317.967644] {WRITE_ITER} Step 3. Fill page 0xffffea0004eb0140 by ubifs_write_end, ino=65 UNLOCK PAGE ++
[  317.969117] {WRITE_ITER} C generic_file_write_iter, ino=65 UNLOCK INODE ++
[  317.970200] <WRITE> Step 1. write_cache_pages page=0xffffea0004eb0140 data=345 LOCK PAGE ==.
[  317.971593] <WRITE> Step 1. write_cache_pages page=0xffffea0004eb0140 data=345 LOCK PAGE --.
[  317.972906] <WRITE> Step 2. ubifs_write_inode ino=65 LOCK UI ==.
[  318.067190] [READ] Step 1. ubifs_bulk_read ino=65 page=0xffffea0004e640c0 LOCK UI --
[  318.475195] [READ] Step 2. pagecache_get_page page=0xffffea0004eb0140 LOCK PAGE ==.
[  338.275252] INFO: task aa:7447 blocked for more than 10 seconds.
[  338.276806]       Not tainted 5.5.0-rc5-dirty #104
[  338.277582] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  338.278806] Call Trace:
[  338.279225]  ? __schedule+0x5b8/0xb50
[  338.280001]  schedule+0x9a/0x1b0
[  338.280756]  schedule_preempt_disabled+0x1a/0x30
[  338.281810]  __mutex_lock.isra.8+0x4e8/0x900
[  338.282510]  ? __mutex_lock_slowpath+0x1f/0x30
[  338.283220]  __mutex_lock_slowpath+0x1f/0x30
[  338.284120]  mutex_lock+0x57/0x70
[  338.284925]  ubifs_write_inode+0xb5/0x220 [ubifs]
[  338.285927]  ubifs_writepage+0x299/0x330 [ubifs]
[  338.286663]  __writepage+0x1b/0xa0
[  338.287215]  write_cache_pages+0x353/0x8a0
[  338.288073]  ? __wb_calc_thresh+0x160/0x160
[  338.289035]  ? kvm_sched_clock_read+0x25/0x40
[  338.289941]  ? sched_clock+0xd/0x20
[  338.290550]  ? wake_up_process+0x19/0x30
[  338.291207]  generic_writepages+0x74/0xc0
[  338.292043]  ? remove_wait_queue+0x7a/0x90
[  338.292763]  ? do_writepages+0x8d/0xb0
[  338.293432]  do_writepages+0x8d/0xb0
[  338.294088]  __filemap_fdatawrite_range+0x123/0x190
[  338.294987]  file_write_and_wait_range+0xa5/0x110
[  338.295890]  ubifs_fsync+0xc4/0x170 [ubifs]
[  338.296636]  vfs_fsync_range+0x59/0xb0
[  338.297323]  ? fput+0x17/0x20
[  338.297872]  ? ksys_write+0xf8/0x150
[  338.298485]  do_fsync+0x50/0x90
[  338.299083]  __x64_sys_fsync+0x18/0x30
[  338.299773]  do_syscall_64+0xbf/0x440
[  338.300402]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  338.301268] RIP: 0033:0x7f21ef32889d
[  338.301926] Code: Bad RIP value.
[  338.302515] RSP: 002b:00007f21eef3cf00 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[  338.303948] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f21ef32889d
[  338.305244] RDX: 00007f21ef313740 RSI: 00007f21e80008c0 RDI: 0000000000000003
[  338.306511] RBP: 00007f21eef40f40 R08: 00007f21eef41700 R09: 0000000000000077
[  338.307747] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f21eef41000
[  338.308987] R13: 0000000000000000 R14: 00007ffcf71b1ba0 R15: 00007f21eef419c0
[  338.310296] INFO: task aa:7448 blocked for more than 10 seconds.
[  338.311347]       Not tainted 5.5.0-rc5-dirty #104
[  338.312213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  338.313575] Call Trace:
[  338.313981]  ? __schedule+0x5b8/0xb50
[  338.314583]  schedule+0x9a/0x1b0
[  338.315108]  io_schedule+0x22/0x50
[  338.315680]  __lock_page+0x1be/0x390
[  338.316279]  ? add_to_page_cache_lru+0x170/0x170
[  338.317023]  pagecache_get_page+0x393/0x4b0
[  338.317714]  ubifs_readpage+0x4dc/0x770 [ubifs]
[  338.318464]  generic_file_read_iter+0x663/0xc10
[  338.319213]  new_sync_read+0x195/0x240
[  338.320012]  __vfs_read+0x41/0x60
[  338.320801]  vfs_read+0xce/0x1f0
[  338.321573]  ksys_read+0x88/0x150
[  338.322285]  __x64_sys_read+0x1e/0x30
[  338.322892]  do_syscall_64+0xbf/0x440
[  338.323550]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  338.324740] RIP: 0033:0x7f21ef3282fd
[  338.325594] Code: Bad RIP value.
[  338.326189] RSP: 002b:00007f21ee73bf00 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[  338.327411] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f21ef3282fd
[  338.329064] RDX: 0000000000004006 RSI: 00007f21ee73bf20 RDI: 0000000000000004
[  338.330395] RBP: 00007f21ee73ff40 R08: 00007f21ee740700 R09: 00007f21ee740700
[  338.331633] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f21ee740000
[  338.333281] R13: 0000000000000000 R14: 00007ffcf71b1ba0 R15: 00007f21ee7409c0
[  348.515255] INFO: task aa:7447 blocked for more than 20 seconds.
[  348.516699]       Not tainted 5.5.0-rc5-dirty #104
[  348.517797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  348.519042] Call Trace:
[  348.519477]  ? __schedule+0x5b8/0xb50
[  348.520335]  schedule+0x9a/0x1b0
[  348.521081]  schedule_preempt_disabled+0x1a/0x30
[  348.522005]  __mutex_lock.isra.8+0x4e8/0x900
[  348.522685]  ? __mutex_lock_slowpath+0x1f/0x30
[  348.523393]  __mutex_lock_slowpath+0x1f/0x30
[  348.524381]  mutex_lock+0x57/0x70
[  348.525185]  ubifs_write_inode+0xb5/0x220 [ubifs]
[  348.526173]  ubifs_writepage+0x299/0x330 [ubifs]
[  348.526916]  __writepage+0x1b/0xa0
[  348.527496]  write_cache_pages+0x353/0x8a0
[  348.528452]  ? __wb_calc_thresh+0x160/0x160
[  348.529429]  ? kvm_sched_clock_read+0x25/0x40
[  348.530206]  ? sched_clock+0xd/0x20
[  348.530762]  ? wake_up_process+0x19/0x30
[  348.531388]  generic_writepages+0x74/0xc0
[  348.532321]  ? remove_wait_queue+0x7a/0x90
[  348.533277]  ? do_writepages+0x8d/0xb0
[  348.533981]  do_writepages+0x8d/0xb0
[  348.534559]  __filemap_fdatawrite_range+0x123/0x190
[  348.535342]  file_write_and_wait_range+0xa5/0x110
[  348.536426]  ubifs_fsync+0xc4/0x170 [ubifs]
[  348.537402]  vfs_fsync_range+0x59/0xb0
[  348.538126]  ? fput+0x17/0x20
[  348.538613]  ? ksys_write+0xf8/0x150
[  348.539191]  do_fsync+0x50/0x90
[  348.539827]  __x64_sys_fsync+0x18/0x30
[  348.540705]  do_syscall_64+0xbf/0x440
[  348.541561]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  348.542385] RIP: 0033:0x7f21ef32889d
[  348.542953] Code: Bad RIP value.
[  348.543507] RSP: 002b:00007f21eef3cf00 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[  348.544914] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f21ef32889d
[  348.546027] RDX: 00007f21ef313740 RSI: 00007f21e80008c0 RDI: 0000000000000003
[  348.547143] RBP: 00007f21eef40f40 R08: 00007f21eef41700 R09: 0000000000000077
[  348.548631] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f21eef41000
[  348.550133] R13: 0000000000000000 R14: 00007ffcf71b1ba0 R15: 00007f21eef419c0
[  348.551245] INFO: task aa:7448 blocked for more than 20 seconds.
[  348.552539]       Not tainted 5.5.0-rc5-dirty #104
[  348.553639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  348.554872] Call Trace:
[  348.555280]  ? __schedule+0x5b8/0xb50
[  348.556077]  schedule+0x9a/0x1b0
[  348.556831]  io_schedule+0x22/0x50
[  348.557635]  __lock_page+0x1be/0x390
[  348.558223]  ? add_to_page_cache_lru+0x170/0x170
[  348.558945]  pagecache_get_page+0x393/0x4b0
[  348.559727]  ubifs_readpage+0x4dc/0x770 [ubifs]
[  348.560780]  generic_file_read_iter+0x663/0xc10
[  348.561796]  new_sync_read+0x195/0x240
[  348.562402]  __vfs_read+0x41/0x60
[  348.562924]  vfs_read+0xce/0x1f0
[  348.563464]  ksys_read+0x88/0x150
[  348.564255]  __x64_sys_read+0x1e/0x30
[  348.565091]  do_syscall_64+0xbf/0x440
[  348.565861]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  348.566661] RIP: 0033:0x7f21ef3282fd
[  348.567237] Code: Bad RIP value.
[  348.567911] RSP: 002b:00007f21ee73bf00 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[  348.569611] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f21ef3282fd
[  348.570742] RDX: 0000000000004006 RSI: 00007f21ee73bf20 RDI: 0000000000000004
[  348.572064] RBP: 00007f21ee73ff40 R08: 00007f21ee740700 R09: 00007f21ee740700
[  348.573677] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f21ee740000
[  348.574794] R13: 0000000000000000 R14: 00007ffcf71b1ba0 R15: 00007f21ee7409c0

D. Some explanations
The lock sequence of several reading and writing page processes of UBIFS mentioned in A is LOCK(page) first, then ui->mutex. Except for the content after page3 of case A-3-②(Locks ui->mutex first), it is assumed to be case T. If case T and case A-2 are executed simultaneously, ABBA deadlock will occur, and then the hung task call stack in C will appear. In addition, ABBA deadlock occurs when T and A-1 are executed simultaneously.
Comment 2 Zhihao Cheng 2020-01-10 11:54:20 UTC
Created attachment 286733 [details]
reproduce source code in C
Comment 3 Zhihao Cheng 2020-01-10 12:00:11 UTC
Created attachment 286735 [details]
setup ubifs from a dumped flash file
Comment 4 Zhihao Cheng 2020-01-10 12:00:35 UTC
Created attachment 286737 [details]
flash
Comment 5 Zhihao Cheng 2020-01-11 09:40:25 UTC
Created attachment 286747 [details]
reproduce delay and print

Note You need to log in before you can comment on or make changes to this bug.