Bug 217787

Summary: ubi: fastmap: Fix a series of wear leveling problems
Product: Drivers Reporter: Zhihao Cheng (chengzhihao1)
Component: Other    Assignee: drivers_other
Status: NEW ---    
Severity: normal CC: bagasdotme
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:
Attachments: mtd-utils ioctl patches
Single writing test
Single writing test flamegraph

Description Zhihao Cheng 2023-08-12 03:12:15 UTC
Problem 1: large erase counter for single fastmap data PEB

Config:
x86_64 qemu
flash: nandsim
CONFIG_MTD_UBI_WL_THRESHOLD=128
CONFIG_MTD_UBI_FASTMAP=y
ubi->beb_rsvd_pebs=0

Running fsstress on ubifs for 3h (the fastmap data PEB has a much larger erase counter than the others):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:      532       84       92       99
100      ..      999:    15787      100      147      229
1000     ..     9999:       64     4699     4765     4826
10000    ..    99999:        0        0        0        0
100000   ..      inf:        1   272935   272935   272935
---------------------------------------------------------
Total               :    16384       84      180   272935
PEB 8031 (ec=272935) is always taken for fastmap data.

After the fix, running fsstress on ubifs for 12h (no pool reservation), no individual PEB has an abnormally large erase counter:
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:    16320      609      642      705
1000     ..     9999:        0        0        0        0
10000    ..    99999:       64    18176    18234    18303
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :    16384      609      710    18303
Comment 1 Zhihao Cheng 2023-08-12 03:15:56 UTC
Problem 2: large erase counter for first 64 PEBs

Config:
x86_64 qemu
flash: nandsim
CONFIG_MTD_UBI_BEB_LIMIT=20
CONFIG_MTD_UBI_WL_THRESHOLD=128
CONFIG_MTD_UBI_FASTMAP=y

Before applying patches, running fsstress for 24h:
Device A(1024 PEBs, pool=50, wl_pool=25):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:        0        0        0        0
10000    ..    99999:      960    29224    29282    29362
100000   ..      inf:       64   117897   117934   117940
---------------------------------------------------------
Total               :     1024    29224    34822   117940
first 64 PEBs: 117897~117940
others: 29224~29362

Device B(8192 PEBs, pool=256, wl_pool=128):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:     8128     2253     2321     2387
10000    ..    99999:       64    35387    35387    35388
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :     8192     2253     2579    35388
first 64 PEBs: 35387~35388
others: 2253~2387

Device C(16384 PEBs, pool=256, wl_pool=128):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:    16320     1000     1071     1131
10000    ..    99999:       64    30140    30140    30141
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :    16384     1000     1184    30141
first 64 PEBs: 30140~30141
others: 1000~1131

The erase counter of the first 64 PEBs is about 4~28 times larger than that of the other PEBs. If we treat the appearance of the first bad block as the end of service life for the UBI device, the lifetime of UBIFS will be 4~28 times shorter than expected, because the bad block may contain important data, and any I/O on that PEB may fail before the block is marked bad.
 Device A: 117934 / 29282 = 4.02
 Device B: 35387 / 2321 = 15.24
 Device C: 30140 / 1071 = 28.14
Given a UBI device with N PEBs, when free PEBs are nearly running out, the pool is refilled with only 1 PEB every time ubi_update_fastmap() is invoked. So t=N/POOL_SIZE[1]/64 means that, in the worst case, the erase counter of the first 64 PEBs is in theory t times greater than that of the other PEBs.
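
As a worked illustration of that bound, here is a minimal sketch (not part of the kernel code; the helper name is made up) that evaluates t = N / POOL_SIZE / 64 for the three test devices, with POOL_SIZE = 1 as the worst case:

#include <stdio.h>

/*
 * Worst-case ratio between the erase counter of the first 64 PEBs and
 * the other PEBs: every fastmap update erases one of the first 64 PEBs,
 * while each refill spreads erasures over POOL_SIZE PEBs of the device.
 */
static double worst_case_ratio(unsigned int n_pebs, unsigned int pool_size)
{
        return (double)n_pebs / pool_size / 64;
}

int main(void)
{
        /* Worst case: pool refilled with a single PEB per fastmap update. */
        printf("Device A: t = %.1f\n", worst_case_ratio(1024, 1));   /* 16  */
        printf("Device B: t = %.1f\n", worst_case_ratio(8192, 1));   /* 128 */
        printf("Device C: t = %.1f\n", worst_case_ratio(16384, 1));  /* 256 */
        /* With the configured wl_pool size, e.g. device C in comment 1: */
        printf("Device C (pool=128): t = %.1f\n",
               worst_case_ratio(16384, 128));                        /* 2   */
        return 0;
}

The measured ratios above (4.02, 15.24, 28.14) stay below these theoretical worst-case values.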

After applying patches without pool reservation, running fsstress for 24h:
Device A(1024 PEBs, pool=50, wl_pool=25):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:        0        0        0        0
10000    ..    99999:     1024    36958    37846    50806
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :     1024    36958    37846    50806
first 64 PEBs: 50699~50806
others: 36958~37070

Device B(8192 PEBs, pool=256, wl_pool=128):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:     8192     2559     2634     4614
10000    ..    99999:        0        0        0        0
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :     8192     2559     2634     4614
first 64 PEBs: 4577~4614
others: 2559~2690

Device C(16384 PEBs, pool=256, wl_pool=128):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:     3216      976      988      999
1000     ..     9999:    13168     1000     1051     1592
10000    ..    99999:        0        0        0        0
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :    16384      976     1039     1592
first 64 PEBs: 1549~1592
others: 976~1106

Benefits from the following patches:
ubi: fastmap: Wait until there are enough free PEBs before filling pools
ubi: fastmap: Use free pebs reserved for bad block handling

The max erase counter is lower; the service life is prolonged by about 2.32~18.93 times compared with the code before modification.
Based on the results of the performance regression tests (in the follow-up comments), there is little change before and after the patches are applied, which means the amount of written data is similar in both cases, so we can assess the service life according to the max erase counter of the PEBs.
 Device A: 117940 / 50806 = 2.32
 Device B: 35388 / 4614 = 7.66
 Device C: 30141 / 1592 = 18.93

After applying patches with pool reservation, running fsstress for 24h:
Device A(1024 PEBs, pool=50, wl_pool=25):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:        0        0        0        0
10000    ..    99999:     1024    33801    33997    34056
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :     1024    33801    33997    34056
first 64 PEBs: 34020~34056
others: 33801~34056

Device B(8192 PEBs, pool=256, wl_pool=128):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:     8192     2205     2397     2460
10000    ..    99999:        0        0        0        0
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :     8192     2205     2397     2460
first 64 PEBs: 2444~2460
others: 2205~2458

Device C(16384 PEBs, pool=256, wl_pool=128):
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:     3452      968      983      999
1000     ..     9999:    12932     1000     1052     1269
10000    ..    99999:        0        0        0        0
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :    16384      968     1038     1269
first 64 PEBs: 1268~1269
others: 968~1100

Benefit from "ubi: fastmap: Fix lapsed wear leveling for first 64 PEBs":
1. The difference of erase counter between first 64 PEBs and others is under WL_FREE_MAX_DIFF(2*UBI_WL_THRESHOLD=2*128=256) for device A and B. For device C, t=N/POOL_SIZE[128]/64=16384/128/64=2, which means that in worst case the erase counter of first 64 PEBs is 2 times greater than other PEBs in theory. Now this test result meets the expectation.
 Device A: 34056 - 33801 = 255
 Device B: 2460 - 2205 = 255
 Device C: 1269 / 968 = 1.3 < 2
2. The erase counter of the first 64 PEBs is no longer many times larger than that of the other PEBs.
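
A quick, self-contained check of these two expectations, assuming WL_FREE_MAX_DIFF = 2 * UBI_WL_THRESHOLD as defined in drivers/mtd/ubi/wl.c and plugging in the numbers from the tables above:

#include <assert.h>
#include <stdio.h>

#define UBI_WL_THRESHOLD  128
#define WL_FREE_MAX_DIFF  (2 * UBI_WL_THRESHOLD)   /* 256 */

int main(void)
{
        /* Devices A and B: absolute EC difference stays below WL_FREE_MAX_DIFF. */
        assert(34056 - 33801 < WL_FREE_MAX_DIFF);  /* device A: 255 < 256 */
        assert(2460 - 2205 < WL_FREE_MAX_DIFF);    /* device B: 255 < 256 */

        /* Device C: ratio stays below the theoretical bound t = N/POOL_SIZE/64. */
        assert(1269.0 / 968 < 16384.0 / 128 / 64); /* 1.3 < 2 */

        printf("wear-leveling expectations hold\n");
        return 0;
}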
Comment 2 Zhihao Cheng 2023-08-12 03:16:28 UTC
Performance regression:
x86_64 qemu
nandsim: 2GB 128KB PEB, 2KB page
CONFIG_MTD_UBI_FASTMAP=y
CONFIG_MTD_UBI_WL_THRESHOLD=4096

mount -o compr=none -t ubifs /dev/ubi0_0 $MNT
fio -directory=$MNT -name=mytest -direct=0 -ramp_time=10 -runtime=40 -size=$sz -rw=$mod -numjobs=$t -bs=$b -fsync=1 -ioengine=libaio -iodepth=8 -time_based
(fio fills UBIFS completely with files. The total space of UBIFS with pool reservation is a little smaller because of the reserved pool PEBs, so the same amount of data is written more slowly when space is nearly running out.)
--------------------------------------------------------------------------
             |before applying |   after applying   | after applying patch
             |                | no pool reservation| with pool reservation
--------------------------------------------------------------------------
4K seq write |   41.3MiB/s    |     42.6MiB/s      |     42.1MiB/s
1 thread     |                |                    |
--------------------------------------------------------------------------
4K rand write|   9268KiB/s    |     11.5MiB/s      |     7291KiB/s
1 thread     |                |                    |
--------------------------------------------------------------------------
1M seq write |   175MiB/s     |     174MiB/s       |     170MiB/s
1 thread     |                |                    |
--------------------------------------------------------------------------
1M rand write|   155MiB/s     |     155MiB/s       |     130MiB/s
1 thread     |                |                    |
--------------------------------------------------------------------------
4K seq write |   39.4MiB/s    |     41.3MiB/s      |     40.2MiB/s
4 threads    |                |                    |
--------------------------------------------------------------------------
4K rand write|   9343KiB/s    |     11.3MiB/s      |     6857KiB/s
4 threads    |                |                    |
--------------------------------------------------------------------------
1M seq write |   137MiB/s     |     136MiB/s       |     111MiB/s
4 threads    |                |                    |
--------------------------------------------------------------------------
1M rand write|   70.1MiB/s    |     75.1MiB/s      |     64.5MiB/s
4 threads    |                |                    |
--------------------------------------------------------------------------
Comment 3 Zhihao Cheng 2023-08-12 03:17:16 UTC
Regression test passed:

d09e9a2bddba ("ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty")
b68bf9a69d22dd ("ubi: Select fastmap anchor PEBs considering wear level rules")
https://bugzilla.kernel.org/show_bug.cgi?id=215407

mtd-utils ubi test passed (pool reservation/no pool reservation).
Comment 4 Zhihao Cheng 2023-08-12 03:27:41 UTC
Created attachment 304825 [details]
mtd-utils ioctl patches

mtd-utils ioctl patches based on "ubi: fastmap: Add control in 'UBI_IOCATT' ioctl to reserve PEBs for filling pools"
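
For context, a minimal user-space sketch of issuing an attach request through the existing UBI_IOCATT ioctl on /dev/ubi_ctrl, as the mtd-utils patches would do. The standard fields come from <mtd/ubi-user.h>; the pool-reservation control added by the patch series is only shown as a commented-out placeholder, since its exact field name is an assumption here:

/* Sketch only: attach an MTD device to UBI via UBI_IOCATT. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <mtd/ubi-user.h>

int main(void)
{
        struct ubi_attach_req req;
        int fd = open("/dev/ubi_ctrl", O_RDONLY);

        if (fd < 0) {
                perror("open /dev/ubi_ctrl");
                return 1;
        }

        memset(&req, 0, sizeof(req));
        req.ubi_num = UBI_DEV_NUM_AUTO;  /* let the kernel pick ubiX */
        req.mtd_num = 0;                 /* MTD device to attach */
        req.max_beb_per1024 = 20;        /* matches CONFIG_MTD_UBI_BEB_LIMIT=20 */
        /* req.need_resv_pool = 1; */    /* assumed name of the new pool-reservation control */

        if (ioctl(fd, UBI_IOCATT, &req) < 0) {
                perror("UBI_IOCATT");
                close(fd);
                return 1;
        }

        printf("attached as ubi%d\n", req.ubi_num);
        close(fd);
        return 0;
}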
Comment 5 Bagas Sanjaya 2023-08-12 11:31:04 UTC
(In reply to Zhihao Cheng from comment #4)
> Created attachment 304825 [details]
> mtd-utils ioctl patches
> 
> mtd-utils ioctl patches based on "ubi: fastmap: Add control in 'UBI_IOCATT'
> ioctl to reserve PEBs for filling pools"

Can you post to the appropriate mailing lists instead? Please run scripts/get_maintainer.pl to get a list of maintainers and lists who should
receive your patch.
Comment 6 Zhihao Cheng 2023-08-16 11:24:11 UTC
Tests for single writing latency:

x86_64 qemu
nandsim: 1GB, 128KB PEB, 2KB page
CONFIG_MTD_UBI_FASTMAP=y
CONFIG_MTD_UBI_WL_THRESHOLD=4096
ubi->beb_rsvd_pebs=0

When the pool is empty, ubi_update_fastmap() is invoked by ubi_wl_get_peb(), which increases the time of a single write. As a result, the latency of single writes fluctuates.

Before applying the patches, the latency of a single write is 20~4151us, avg 52.3us.
After applying the patches, the latency of a single write is 21~5461us, avg 53.92us.
Combined with the flamegraph results, the cost of wait_free_pebs_for_pool() is negligible.

The detailed testing data and the test program are in the attachment named "Single writing test".
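
The attached program is not reproduced here; the sketch below only illustrates the shape of such a measurement, timing synchronous 2KiB writes to a file on the mounted UBIFS with clock_gettime(). The mount point /mnt/ubifs, the file name, and the write count are assumptions:

/* Sketch of a single-write latency measurement, not the attached test program. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define WRITES     4096
#define WRITE_SIZE 2048   /* one NAND page in the nandsim setup above */

static long long elapsed_us(const struct timespec *a, const struct timespec *b)
{
        return (b->tv_sec - a->tv_sec) * 1000000LL +
               (b->tv_nsec - a->tv_nsec) / 1000;
}

int main(void)
{
        static char buf[WRITE_SIZE];
        struct timespec t0, t1;
        long long us, min = -1, max = 0, total = 0;
        int i;
        int fd = open("/mnt/ubifs/latency_test", O_WRONLY | O_CREAT | O_SYNC, 0644);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        memset(buf, 0xa5, sizeof(buf));

        for (i = 0; i < WRITES; i++) {
                clock_gettime(CLOCK_MONOTONIC, &t0);
                if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                        perror("write");
                        break;
                }
                clock_gettime(CLOCK_MONOTONIC, &t1);
                us = elapsed_us(&t0, &t1);
                total += us;
                if (min < 0 || us < min)
                        min = us;
                if (us > max)
                        max = us;
        }

        printf("writes=%d min=%lldus avg=%lldus max=%lldus\n",
               i, min, i ? total / i : 0, max);
        close(fd);
        return 0;
}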
Comment 7 Zhihao Cheng 2023-08-16 11:28:08 UTC
Created attachment 304872 [details]
Single writing test
Comment 8 Zhihao Cheng 2023-08-16 11:30:55 UTC
Created attachment 304873 [details]
Single writing test flamegraph
Comment 9 Zhihao Cheng 2023-09-28 01:26:33 UTC
Supplementary verification for problem 2 in the CONFIG_MTD_UBI_WL_THRESHOLD=4096 case (running fsstress for 30 days):
Config:
x86_64 qemu
flash: nandsim (16384 PEBs, pool=256, wl_pool=128)
CONFIG_MTD_UBI_BEB_LIMIT=20
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_FASTMAP=y

Before applying patches:
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:    16320     5231     9247     9327
10000    ..    99999:        0        0        0        0
100000   ..      inf:       64   261234   261234   261235
---------------------------------------------------------
Total               :    16384     5231    10231   261235

The erase counter of the first 64 PEBs is about 28 times larger than that of the other PEBs.

After applying patches without pool reservation:
=========================================================
from              to     count      min      avg      max
---------------------------------------------------------
0        ..        9:        0        0        0        0
10       ..       99:        0        0        0        0
100      ..      999:        0        0        0        0
1000     ..     9999:    16320     8600     8872     9252
10000    ..    99999:       64    11609    12357    12374
100000   ..      inf:        0        0        0        0
---------------------------------------------------------
Total               :    16384     8600     8886    12374

12374 - 8600 = 3774 < 8192 (WL_FREE_MAX_DIFF = 2 * CONFIG_MTD_UBI_WL_THRESHOLD = 2 * 4096)