Bug 216562

Summary: ubifs: Bad budget calculation for journal head
Product: File System Reporter: Zhihao Cheng (chengzhihao1)
Component: OtherAssignee: fs_other
Status: NEW ---    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.10 Subsystem:
Regression: No Bisected commit-id:
Attachments: setup.sh
diff
error_log

Description Zhihao Cheng 2022-10-09 11:14:46 UTC
CONFIG_FS_ENCRYPTION=y
CONFIG_UBIFS_FS=m
CONFIG_MTD_UBI=m
CONFIG_MTD_NAND_NANDSIM=m

1. Apply diff and compile kernel
2. ./setup.sh 1
3. fsstress -l0 -n1000 -p4 -d /root/temp

[ 1661.371626] UBIFS error (ubi0:0 pid 23042): make_reservation.cold [ubifs]: cannot reserve 160 bytes in jhead 1, error -28 nospc_retries 3 cmt_retries 2
[ 1661.372942] CPU: 0 PID: 23042 Comm: kworker/u2:1 Not tainted 5.10.0 #1
[ 1661.373535] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[ 1661.374564] Workqueue: writeback wb_workfn (flush-ubifs_0_0)
[ 1661.375076] Call Trace:
[ 1661.375305]  dump_stack+0x9c/0xd3
[ 1661.375613]  make_reservation+0x855/0xb40 [ubifs]
[ 1661.376047]  ? ubifs_shrink_scan+0x6e0/0x6e0 [ubifs]
[ 1661.376493]  ? kasan_unpoison_shadow+0x33/0x50
[ 1661.376897]  ? __kasan_kmalloc.constprop.0+0xc9/0xd0
[ 1661.377348]  ubifs_jnl_write_inode+0x228/0xc80 [ubifs]
[ 1661.377813]  ? release_pages+0x8a3/0x1050
[ 1661.378181]  ? ubifs_jnl_write_data+0x970/0x970 [ubifs]
[ 1661.378651]  ? mem_cgroup_uncharge+0x150/0x150
[ 1661.379056]  ? pagevec_lookup_range_nr_tag+0xa0/0xa0
[ 1661.379503]  ? xas_find_marked+0x367/0x820
[ 1661.379881]  ? mutex_lock+0x8e/0xe0
[ 1661.380200]  ? __mutex_lock_slowpath+0x10/0x10
[ 1661.380605]  ubifs_write_inode+0x283/0x410 [ubifs]
[ 1661.381041]  ? cgroup_rstat_updated+0x2d/0x2e0
[ 1661.381446]  ubifs_writepage+0x28a/0x6f0 [ubifs]
[ 1661.381868]  __writepage+0x66/0x160
[ 1661.382186]  write_cache_pages+0x406/0xec0
[ 1661.382557]  ? tag_pages_for_writeback+0x320/0x320
[ 1661.382992]  ? wait_on_page_writeback+0x260/0x260
[ 1661.383416]  ? trace_print_context+0x181/0x490
[ 1661.383821]  ? unwind_next_frame+0xb45/0x1a70
[ 1661.384215]  ? ftrace_graph_ret_addr+0xe2/0x160
[ 1661.384624]  ? fprop_reflect_period_percpu.isra.0+0x1c/0x2b0
[ 1661.385136]  ? cpuacct_charge+0xd3/0x1a0
[ 1661.385491]  do_writepages+0x1af/0x260
[ 1661.385837]  ? writeback_set_ratelimit+0x140/0x140
[ 1661.386268]  ? __wb_calc_thresh+0x286/0x3e0
[ 1661.386646]  ? writeout_period+0x150/0x150
[ 1661.387022]  ? finish_writeback_work.constprop.0+0xb3/0xf0
[ 1661.387514]  __writeback_single_inode+0xb3/0x870
[ 1661.387932]  ? wbc_attach_and_unlock_inode+0x10c/0x6c0
[ 1661.388394]  writeback_sb_inodes+0x4b2/0xc00
[ 1661.388783]  ? _raw_spin_unlock+0x39/0x70
[ 1661.389146]  ? move_expired_inodes+0x429/0x6b0
[ 1661.389547]  ? inode_sleep_on_writeback+0x180/0x180
[ 1661.389991]  ? wb_io_lists_populated+0x4d/0x110
[ 1661.390400]  ? queue_io+0x207/0x390
[ 1661.390720]  ? pointer+0xa30/0xa30
[ 1661.391029]  wb_writeback+0x22a/0x760
[ 1661.391362]  ? __writeback_inodes_wb+0x230/0x230
[ 1661.391783]  ? _raw_spin_lock_bh+0x87/0xe0
[ 1661.392154]  ? _raw_spin_lock+0xe0/0xe0
[ 1661.392501]  ? bdi_dev_name+0x54/0x80
[ 1661.392836]  wb_workfn+0x232/0xdb0
[ 1661.393148]  ? set_next_entity+0x3a5/0x1ac0
[ 1661.393527]  ? inode_wait_for_writeback+0x40/0x40
[ 1661.393954]  ? _raw_spin_unlock_irq+0x3a/0x70
[ 1661.394348]  ? finish_task_switch+0x137/0x650
[ 1661.394746]  ? __switch_to+0x5b1/0x1160
[ 1661.395095]  ? __switch_to_asm+0x42/0x80
[ 1661.395450]  ? __switch_to_asm+0x5a/0x80
[ 1661.395810]  ? __schedule+0x872/0x1730
[ 1661.396151]  ? read_word_at_a_time+0xe/0x20
[ 1661.396530]  ? strscpy+0xa0/0x2b0
[ 1661.396838]  process_one_work+0x761/0x1010
[ 1661.397209]  worker_thread+0x574/0x11e0
[ 1661.397558]  ? __kthread_parkme+0xe0/0x150
[ 1661.397933]  ? rescuer_thread+0xd60/0xd60
[ 1661.398295]  kthread+0x31e/0x430
[ 1661.398590]  ? kthread_park+0x190/0x190
[ 1661.398941]  ret_from_fork+0x22/0x30
Comment 1 Zhihao Cheng 2022-10-09 11:16:22 UTC
Created attachment 302962 [details]
setup.sh
Comment 2 Zhihao Cheng 2022-10-09 11:16:49 UTC
Created attachment 302963 [details]
diff
Comment 3 Zhihao Cheng 2022-10-09 11:18:02 UTC
Created attachment 302964 [details]
error_log
Comment 4 Zhihao Cheng 2022-10-09 11:23:09 UTC
UBIFS calculates available space by c->main_bytes - c->lst.total_used (which means free and dirty non-index space are accounted into total available), then index lebs and four lebs (one for gc_lnum, one for deletions, two for journal heads) are deducted.

All lebs's space information are displayed in error_log:

     LEB 84: DATAHD   free 122880 used 1920  dirty 2176  dark 6144
     LEB 110:DELETION free 126976 used 0     dirty 0     dark 6144 (empty)
     LEB 201:gc_lnum  free 126976 used 0     dirty 0     dark 6144
     LEB 272:GCHD     free 77824  used 47672 dirty 1480  dark 6144
     LEB 356:BASEHD   free 0      used 39776 dirty 87200 dark 6144
     OTHERS: index lebs, zero-available non-index lebs

total_available: (5 - 4) * 126976 - 1920 - 47672 - 39776 - 6144 * 5 = 6888

However UBIFS cannot use BASEHD's dirty space(87200), because UBIFS can no longer get any lebs to gc (c->bi.min_idx_lebs equals to c->lst.idx_lebs, dirty index lebs won't be picked as gced lebs. All non-index lebs has dirty space less then c->dead_wm, non-index lebs won't be picked as gced lebs either).

So, ubifs should reserve one leb for each journal head while doing budget.