Bug 217787
Summary: | ubi: fastmap: Fix a series of wear leveling problems | |
---|---|---|---
Product: | Drivers | Reporter: | Zhihao Cheng (chengzhihao1)
Component: | Other | Assignee: | drivers_other
Status: | NEW | |
Severity: | normal | CC: | bagasdotme
Priority: | P3 | |
Hardware: | All | |
OS: | Linux | |
Kernel Version: | | Subsystem: |
Regression: | No | Bisected commit-id: |
Attachments: | mtd-utils ioctl patches | |
 | Single writing test | |
 | Single writing test flamegraph | |
Description
Zhihao Cheng
2023-08-12 03:12:15 UTC
Problem 2: large erase counter for first 64 PEBs

Config: x86_64 qemu
flash: nandsim
CONFIG_MTD_UBI_BEB_LIMIT=20
CONFIG_MTD_UBI_WL_THRESHOLD=128
CONFIG_MTD_UBI_FASTMAP=y

Before applying patches, running fsstress for 24h:

Device A (1024 PEBs, pool=50, wl_pool=25):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:         0        0        0        0
 10000 ..  99999:       960    29224    29282    29362
100000 ..    inf:        64   117897   117934   117940
---------------------------------------------------------
Total           :      1024    29224    34822   117940
first 64 PEBs: 117897~117940, others: 29224~29362

Device B (8192 PEBs, pool=256, wl_pool=128):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:      8128     2253     2321     2387
 10000 ..  99999:        64    35387    35387    35388
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :      8192     2253     2579    35388
first 64 PEBs: 35387~35388, others: 2253~2387

Device C (16384 PEBs, pool=256, wl_pool=128):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:     16320     1000     1071     1131
 10000 ..  99999:        64    30140    30140    30141
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :     16384     1000     1184    30141
first 64 PEBs: 30140~30141, others: 1000~1131

The erase counter of the first 64 PEBs is about 4~28 times larger than that of the other PEBs. If we treat the appearance of the first bad block as the end of service life for the UBI device, the lifetime of UBIFS will be 4~28 times shorter than expected, because the bad block may contain important data, and any I/O to that PEB may fail before the block is marked bad.
Device A: 117934 / 29282 = 4.02
Device B: 35387 / 2321 = 15.24
Device C: 30140 / 1071 = 28.14

Given a UBI device with N PEBs whose free PEBs are nearly running out, the pool will be refilled with only 1 PEB every time ubi_update_fastmap is invoked. So t = N / POOL_SIZE / 64 (POOL_SIZE being 1 here) means that in the worst case the erase counter of the first 64 PEBs is, in theory, t times greater than that of the other PEBs.
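A minimal sketch that just replays this worst-case arithmetic (the helper name is made up for illustration; the divisor 64 corresponds to the fastmap anchor area at the start of the device):

#include <stdio.h>

/*
 * t = N / POOL_SIZE / 64: each pool refill triggers a fastmap rewrite, and
 * the fastmap anchor lives in the first 64 PEBs, so those PEBs take one
 * extra erase per POOL_SIZE ordinary erases. Hypothetical helper that only
 * reproduces the formula above.
 */
static long worst_case_ratio(long n_pebs, long pool_size)
{
	return n_pebs / pool_size / 64;
}

int main(void)
{
	/* POOL_SIZE degrades to 1 when free PEBs are nearly exhausted */
	printf("Device A: t = %ld\n", worst_case_ratio(1024, 1));  /* 16  */
	printf("Device B: t = %ld\n", worst_case_ratio(8192, 1));  /* 128 */
	printf("Device C: t = %ld\n", worst_case_ratio(16384, 1)); /* 256 */
	return 0;
}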
After applying patches without pool reservation, running fsstress for 24h:

Device A (1024 PEBs, pool=50, wl_pool=25):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:         0        0        0        0
 10000 ..  99999:      1024    36958    37846    50806
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :      1024    36958    37846    50806
first 64 PEBs: 50699~50806, others: 36958~37070

Device B (8192 PEBs, pool=256, wl_pool=128):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:      8192     2559     2634     4614
 10000 ..  99999:         0        0        0        0
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :      8192     2559     2634     4614
first 64 PEBs: 4577~4614, others: 2559~2690

Device C (16384 PEBs, pool=256, wl_pool=128):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:      3216      976      988      999
  1000 ..   9999:     13168     1000     1051     1592
 10000 ..  99999:         0        0        0        0
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :     16384      976     1039     1592
first 64 PEBs: 1549~1592, others: 976~1106

Benefit from the following patches:
  ubi: fastmap: Wait until there are enough free PEBs before filling pools
  ubi: fastmap: Use free pebs reserved for bad block handling
The max erase counter is lower; the service life is prolonged by about 2.32~18.93 times compared with before the modification. Based on the results of the performance regression tests (in follow-up sections), there is little change before and after the patches are applied, which means the amount of written data is similar, so we can assess the service life according to the max erase counter of the PEBs.
Device A: 117940 / 50806 = 2.32
Device B: 35388 / 4614 = 7.66
Device C: 30141 / 1592 = 18.93

After applying patches with pool reservation, running fsstress for 24h:

Device A (1024 PEBs, pool=50, wl_pool=25):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:         0        0        0        0
 10000 ..  99999:      1024    33801    33997    34056
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :      1024    33801    33997    34056
first 64 PEBs: 34020~34056, others: 33801~34056

Device B (8192 PEBs, pool=256, wl_pool=128):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:      8192     2205     2397     2460
 10000 ..  99999:         0        0        0        0
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :      8192     2205     2397     2460
first 64 PEBs: 2444~2460, others: 2205~2458

Device C (16384 PEBs, pool=256, wl_pool=128):
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:      3452      968      983      999
  1000 ..   9999:     12932     1000     1052     1269
 10000 ..  99999:         0        0        0        0
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :     16384      968     1038     1269
first 64 PEBs: 1268~1269, others: 968~1100

Benefit from "ubi: fastmap: Fix lapsed wear leveling for first 64 PEBs":
1. The difference in erase counter between the first 64 PEBs and the others is under WL_FREE_MAX_DIFF (2*UBI_WL_THRESHOLD = 2*128 = 256) for devices A and B. For device C, t = N / POOL_SIZE / 64 = 16384 / 128 / 64 = 2, which means that in the worst case the erase counter of the first 64 PEBs is, in theory, 2 times greater than that of the other PEBs. This test result meets the expectation.
   Device A: 34056 - 33801 = 255
   Device B: 2460 - 2205 = 255
   Device C: 1269 / 968 = 1.3 < 2
2. The erase counter of the first 64 PEBs is no longer many times larger than that of the other PEBs.
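For reference, the bound comes from drivers/mtd/ubi/wl.h, where WL_FREE_MAX_DIFF is defined as 2*UBI_WL_THRESHOLD and UBI_WL_THRESHOLD comes from CONFIG_MTD_UBI_WL_THRESHOLD (128 in this test). A minimal sketch replaying the three checks above:

#include <stdio.h>

/* mirrors the kernel definition in drivers/mtd/ubi/wl.h */
#define UBI_WL_THRESHOLD  128
#define WL_FREE_MAX_DIFF  (2 * UBI_WL_THRESHOLD)

int main(void)
{
	/* max - min erase counters from the "with pool reservation" run */
	printf("Device A: %d <= %d\n", 34056 - 33801, WL_FREE_MAX_DIFF); /* 255 <= 256 */
	printf("Device B: %d <= %d\n", 2460 - 2205, WL_FREE_MAX_DIFF);   /* 255 <= 256 */
	/* device C is bounded by the theoretical worst-case ratio t = 2 instead */
	printf("Device C: %.1f < 2\n", 1269.0 / 968.0);                  /* 1.3 < 2 */
	return 0;
}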
Performance regression:

x86_64 qemu
nandsim: 2GB flash, 128KB PEB, 2KB page
CONFIG_MTD_UBI_FASTMAP=y
CONFIG_MTD_UBI_WL_THRESHOLD=4096

mount -o compr=none -t ubifs /dev/ubi0_0 $MNT
fio -directory=$MNT -name=mytest -direct=0 -ramp_time=10 -runtime=40 -size=$sz -rw=$mod -numjobs=$t -bs=$b -fsync=1 -ioengine=libaio -iodepth=8 -time_based

(fio fills UBIFS up with files. With pool reservation the total space of UBIFS is a little smaller because of the reserved pool PEBs, so the same amount of data is written more slowly when space is nearly running out.)

--------------------------------------------------------------------------
             | before applying | after applying,     | after applying,
             |                 | no pool reservation | with pool reservation
--------------------------------------------------------------------------
4K seq write |    41.3MiB/s    |      42.6MiB/s      |      42.1MiB/s
1 thread     |                 |                     |
--------------------------------------------------------------------------
4K rand write|    9268KiB/s    |      11.5MiB/s      |      7291KiB/s
1 thread     |                 |                     |
--------------------------------------------------------------------------
1M seq write |     175MiB/s    |       174MiB/s      |       170MiB/s
1 thread     |                 |                     |
--------------------------------------------------------------------------
1M rand write|     155MiB/s    |       155MiB/s      |       130MiB/s
1 thread     |                 |                     |
--------------------------------------------------------------------------
4K seq write |    39.4MiB/s    |      41.3MiB/s      |      40.2MiB/s
4 threads    |                 |                     |
--------------------------------------------------------------------------
4K rand write|    9343KiB/s    |      11.3MiB/s      |      6857KiB/s
4 threads    |                 |                     |
--------------------------------------------------------------------------
1M seq write |     137MiB/s    |       136MiB/s      |       111MiB/s
4 threads    |                 |                     |
--------------------------------------------------------------------------
1M rand write|    70.1MiB/s    |      75.1MiB/s      |      64.5MiB/s
4 threads    |                 |                     |
--------------------------------------------------------------------------

Regression tests passed:
d09e9a2bddba ("ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty")
b68bf9a69d22dd ("ubi: Select fastmap anchor PEBs considering wear level rules")
https://bugzilla.kernel.org/show_bug.cgi?id=215407

mtd-utils ubi tests passed (pool reservation / no pool reservation).

Created attachment 304825 [details]
mtd-utils ioctl patches
mtd-utils ioctl patches based on "ubi: fastmap: Add control in 'UBI_IOCATT' ioctl to reserve PEBs for filling pools"
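For context, a minimal sketch of what attaching with pool reservation could look like from user space, assuming the series adds a need_resv_pool flag to struct ubi_attach_req (the field name is inferred from the patch subject and is an assumption here, not a settled API; a kernel with the series applied is required):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <mtd/ubi-user.h>

int main(void)
{
	struct ubi_attach_req req;
	int fd = open("/dev/ubi_ctrl", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/ubi_ctrl");
		return 1;
	}

	memset(&req, 0, sizeof(req));
	req.ubi_num = UBI_DEV_NUM_AUTO; /* let UBI pick a device number */
	req.mtd_num = 0;                /* MTD device to attach */
	req.need_resv_pool = 1;         /* assumed flag from the patch series */

	if (ioctl(fd, UBI_IOCATT, &req) < 0) {
		perror("UBI_IOCATT");
		close(fd);
		return 1;
	}
	printf("attached as ubi%d\n", req.ubi_num);
	close(fd);
	return 0;
}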
(In reply to Zhihao Cheng from comment #4)
> Created attachment 304825 [details]
> mtd-utils ioctl patches
>
> mtd-utils ioctl patches based on "ubi: fastmap: Add control in 'UBI_IOCATT'
> ioctl to reserve PEBs for filling pools"

Can you post to the appropriate mailing lists instead? Please run scripts/get_maintainer.pl to get a list of maintainers and lists who should receive your patch.

Tests for single writing latency:

Config: x86_64 qemu
nandsim: 1GB flash, 128KB PEB, 2KB page
CONFIG_MTD_UBI_FASTMAP=y
CONFIG_MTD_UBI_WL_THRESHOLD=4096
ubi->beb_rsvd_pebs=0

When the pool is empty, ubi_update_fastmap is invoked by ubi_wl_get_peb, which increases the time of a single write, so the single-write latency fluctuates. Before the patches are applied, the single-write latency is 20~4151us, avg 52.3us. After applying the patches, the single-write latency is 21~5461us, avg 53.92us. Combined with the flamegraph results, the cost of wait_free_pebs_for_pool() can be ignored. The detailed testing data and testing program are in the attachment named "Single writing test".
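The real test program ships in the attachment; the following is only a rough sketch of the measurement approach (hypothetical file path and iteration count), timing each 2KB write+fsync pair on the UBIFS mount:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	char buf[2048]; /* one NAND page per write */
	struct timespec t0, t1;
	long us, min = -1, max = 0, sum = 0, n = 100000;
	int fd = open("/mnt/ubifs/testfile", O_WRONLY | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0x5a, sizeof(buf));

	for (long i = 0; i < n; i++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
		    fsync(fd) != 0) {
			perror("write/fsync");
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		us = (t1.tv_sec - t0.tv_sec) * 1000000 +
		     (t1.tv_nsec - t0.tv_nsec) / 1000;
		if (min < 0 || us < min)
			min = us;
		if (us > max)
			max = us;
		sum += us;
	}
	printf("min %ldus avg %.2fus max %ldus\n", min, (double)sum / n, max);
	close(fd);
	return 0;
}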
Created attachment 304872 [details]
Single writing test
Created attachment 304873 [details]
Single writing test flamegraph
Supplement verification for problem 2 in the CONFIG_MTD_UBI_WL_THRESHOLD=4096 case (running fsstress for 30 days):

Config: x86_64 qemu
flash: nandsim (16384 PEBs, pool=256, wl_pool=128)
CONFIG_MTD_UBI_BEB_LIMIT=20
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_FASTMAP=y

Before applying patches:
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:     16320     5231     9247     9327
 10000 ..  99999:         0        0        0        0
100000 ..    inf:        64   261234   261234   261235
---------------------------------------------------------
Total           :     16384     5231    10231   261235

The erase counter of the first 64 PEBs is about 28 times larger than that of the other PEBs.

After applying patches without pool reservation:
=========================================================
from          to      count      min      avg      max
---------------------------------------------------------
     0 ..      9:         0        0        0        0
    10 ..     99:         0        0        0        0
   100 ..    999:         0        0        0        0
  1000 ..   9999:     16320     8600     8872     9252
 10000 ..  99999:        64    11609    12357    12374
100000 ..    inf:         0        0        0        0
---------------------------------------------------------
Total           :     16384     8600     8886    12374

12374 - 8600 = 3774 < 8192 (WL_FREE_MAX_DIFF = 2*CONFIG_MTD_UBI_WL_THRESHOLD)