Bug 213877

Summary: Mounting more than a certain number of SMR block devices makes the system unresponsive
Product: File System
Reporter: James Z (leftzheng)
Component: f2fs
Assignee: Default virtual assignee for f2fs (filesystem_f2fs)
Status: ASSIGNED
Severity: normal
CC: chao, jaegeuk
Priority: P1
Keywords: trivial
Hardware: Intel
OS: Linux
Kernel Version: Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Subsystem:
Regression: No
Bisected commit-id:

Description James Z 2021-07-27 10:07:29 UTC
[1.] One-line summary of the problem:
Mounting more than a certain number of SMR block devices makes the system unresponsive

[2.] Full description of the problem/report:
Created F2FS file systems on several SMR devices (mkfs.f2fs -m), then mounted them in sequence. All devices are the same model: HGST HSH721414AL (14 TB).
Empirically, when (number of SMR devices) * 1.5 GB exceeds system RAM, the system runs out of memory and hangs, with no dmesg output. For example, 24 SMR disks need 24 * 1.5 GB = 36 GB. A system with 32 GB of RAM can mount only 21 devices; mounting the 22nd reproducibly hangs the system.
SMR devices on the same system that are mounted with other file systems do not affect this result.
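
The arithmetic above can be sketched as follows. The 1.5 GB per-device figure is only the value observed in this report, not a documented F2FS constant, and the function names are made up for illustration.

```python
import math

# Observed in this report: roughly 1.5 GB of RAM per mounted 14 TB SMR device.
# This is an empirical figure, not a documented F2FS constant.
PER_DEVICE_GB = 1.5


def required_ram_gb(num_devices, per_device_gb=PER_DEVICE_GB):
    """Estimated RAM (GB) consumed by mounting num_devices SMR devices."""
    return num_devices * per_device_gb


def max_mountable(ram_gb, per_device_gb=PER_DEVICE_GB):
    """Largest device count whose estimated usage still fits in ram_gb."""
    return math.floor(ram_gb / per_device_gb)


print(required_ram_gb(24))  # 36.0
print(max_mountable(32))    # 21
```

This ignores memory used by everything else on the system, so it is an upper bound on the device count rather than a safe limit.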

[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory

[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64

[5.] Most recent kernel version which did not have the bug:
None

[6.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/admin-guide/oops-tracing.rst)
None

[7.] A small shell script or example program which triggers the
     problem (if possible)
mount /dev/sdX /mnt/0X
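
A fuller version of the reproduction might look like the sketch below. It mounts the devices one at a time and prints MemAvailable from /proc/meminfo after each mount, so the last line printed before the hang shows how many devices were mounted. The helper names, the device list, and the /mnt/NN mount points are assumptions for illustration; it has to run as root and the mount points have to exist.

```python
import subprocess


def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def mount_commands(devices, mount_root="/mnt"):
    """Build one mount command per device, e.g. mount /dev/sdb /mnt/01."""
    return [["mount", dev, f"{mount_root}/{i:02d}"]
            for i, dev in enumerate(devices, start=1)]


def mount_in_sequence(devices, mount_root="/mnt"):
    """Mount each device in turn and print MemAvailable after each mount."""
    for cmd in mount_commands(devices, mount_root):
        subprocess.run(cmd, check=True)
        with open("/proc/meminfo") as f:
            avail = mem_available_kib(f.read())
        # flush so the line is visible even if the next mount hangs the box
        print(f"{' '.join(cmd)}: MemAvailable = {avail // 1024} MiB", flush=True)


# Example call (hypothetical device names, not executed here):
# mount_in_sequence(["/dev/sdb", "/dev/sdc", "/dev/sdd"])
```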

[8.] Memory consumption 

With 24 * 14 TB SMR block devices formatted with F2FS
free -g
              total        used        free      shared  buff/cache   available
Mem:             46          36           0           0          10          10
Swap:             0           0           0


With 3 * 14 TB SMR block devices formatted with F2FS
free -g
               total        used        free      shared  buff/cache   available
Mem:               7           5           0           0           1           1
Swap:              7           0           7
Comment 1 Chao Yu 2021-07-27 16:22:09 UTC
Could you please apply the patch below and try the nosmall_discard mount option? It is expected to reduce the memory cost.

*Please note* that I haven't tested this patch yet, so please back up your data before testing.

https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=misc&id=5dba79cc25c3c902942c2f68a6e2546586c65f96
Comment 2 James Z 2021-07-30 03:10:23 UTC
[1.] Brief
With this patch, memory usage dropped from 1.5G per SMR disk to about 1G.
The read/write function works fine on disks with plenty of free space. However, on a near-full disk, read operations cause panic.

[2.] A small shell script or example program which triggers the problem
[James@DataT01 /mnt/03]$ touch 123

[3.] Kernel Output
[51671.050646] BUG: kernel NULL pointer dereference, address: 0000000000000000
[51671.050652] #PF: supervisor read access in kernel mode
[51671.050654] #PF: error_code(0x0000) - not-present page
[51671.050656] PGD 0 P4D 0 
[51671.050660] Oops: 0000 [#1] SMP NOPTI
[51671.050663] CPU: 4 PID: 52263 Comm: f2fs_ckpt-8:96 Tainted: G        W         5.13.4-200.fc34.x86_64 #1
[51671.050666] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PRO MAX (MS-7B79), BIOS M.60 06/11/2020
[51671.050668] RIP: 0010:f2fs_issue_discard.isra.0+0x77/0x170 [f2fs]
[51671.050700] Code: 00 00 00 8b 40 48 8b bb 5c 04 00 00 41 29 c0 8d 4f ff 44 21 c1 89 c8 f7 d1 83 e1 07 c1 e8 03 48 03 42 18 ba 01 00 00 00 d3 e2 <0f> be 08 41 89 c8 41 09 d0 44 88 00 85 d1 75 07 83 ab 90 04 00 00
[51671.050702] RSP: 0018:ffffbe70c1893ca0 EFLAGS: 00010212
[51671.050705] RAX: 0000000000000000 RBX: ffff9f734424f000 RCX: 0000000000000007
[51671.050707] RDX: 0000000000000080 RSI: 0000000008270001 RDI: 0000000000000200
[51671.050709] RBP: 00000027ffffffd8 R08: 0000000008260000 R09: 0000000000000000
[51671.050711] R10: 000000000003de00 R11: 0000000000000004 R12: 0000000008270001
[51671.050713] R13: 0000000000000000 R14: 0000000008270000 R15: ffff9f7345933840
[51671.050715] FS:  0000000000000000(0000) GS:ffff9f7456900000(0000) knlGS:0000000000000000
[51671.050717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[51671.050719] CR2: 0000000000000000 CR3: 0000000144932000 CR4: 0000000000350ee0
[51671.050721] Call Trace:
[51671.050726]  f2fs_clear_prefree_segments+0x439/0x6f0 [f2fs]
[51671.050750]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[51671.050756]  f2fs_write_checkpoint+0xccf/0x11d0 [f2fs]
[51671.050782]  __checkpoint_and_complete_reqs+0x78/0x160 [f2fs]
[51671.050803]  issue_checkpoint_thread+0x38/0xb0 [f2fs]
[51671.050823]  ? finish_wait+0x80/0x80
[51671.050827]  ? __checkpoint_and_complete_reqs+0x160/0x160 [f2fs]
[51671.050846]  kthread+0x127/0x150
[51671.050850]  ? set_kthread_struct+0x40/0x40
[51671.050852]  ret_from_fork+0x22/0x30
[51671.050858] Modules linked in: binfmt_misc f2fs nls_utf8 hfsplus hfs crc32_generic lz4hc_compress lz4_compress isofs snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc bonding tls nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter snd_hda_codec_realtek snd_hda_codec_generic sunrpc ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common snd_hda_intel snd_intel_dspcfg vfat edac_mce_amd snd_intel_sdw_acpi fat snd_hda_codec kvm_amd ppdev kvm snd_hda_core snd_hwdep irqbypass rapl snd_seq snd_seq_device pcspkr wmi_bmof k10temp snd_pcm joydev
[51671.050910]  i2c_piix4 snd_timer snd soundcore parport_pc parport gpio_amdpt gpio_generic acpi_cpufreq zram ip_tables radeon i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm sp5100_tco ixgbe ccp mdio dca r8169 wmi fuse
[51671.050932] CR2: 0000000000000000
[51671.050935] ---[ end trace 76f29393379353e4 ]---
[51671.050936] RIP: 0010:f2fs_issue_discard.isra.0+0x77/0x170 [f2fs]
[51671.050958] Code: 00 00 00 8b 40 48 8b bb 5c 04 00 00 41 29 c0 8d 4f ff 44 21 c1 89 c8 f7 d1 83 e1 07 c1 e8 03 48 03 42 18 ba 01 00 00 00 d3 e2 <0f> be 08 41 89 c8 41 09 d0 44 88 00 85 d1 75 07 83 ab 90 04 00 00
[51671.050960] RSP: 0018:ffffbe70c1893ca0 EFLAGS: 00010212
[51671.050962] RAX: 0000000000000000 RBX: ffff9f734424f000 RCX: 0000000000000007
[51671.050964] RDX: 0000000000000080 RSI: 0000000008270001 RDI: 0000000000000200
[51671.050966] RBP: 00000027ffffffd8 R08: 0000000008260000 R09: 0000000000000000
[51671.050967] R10: 000000000003de00 R11: 0000000000000004 R12: 0000000008270001
[51671.050969] R13: 0000000000000000 R14: 0000000008270000 R15: ffff9f7345933840
[51671.050971] FS:  0000000000000000(0000) GS:ffff9f7456900000(0000) knlGS:0000000000000000
[51671.050973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[51671.050974] CR2: 0000000000000000 CR3: 0000000144932000 CR4: 0000000000350ee0
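
As a rough aid for reading the trace above, the sketch below locates the faulting byte (the one in angle brackets) in the Code: line and pulls a few register values out of the dump. The function names are made up for illustration, and this is only a text parser; the kernel tree's scripts/decodecode does the actual disassembly. The marked bytes `0f be 08` appear to decode to `movsbl (%rax),%ecx`, and with RAX = 0 that would match the faulting address in CR2.

```python
import re

CODE_LINE = (
    "Code: 00 00 00 8b 40 48 8b bb 5c 04 00 00 41 29 c0 8d 4f ff 44 21 c1 "
    "89 c8 f7 d1 83 e1 07 c1 e8 03 48 03 42 18 ba 01 00 00 00 d3 e2 <0f> be "
    "08 41 89 c8 41 09 d0 44 88 00 85 d1 75 07 83 ab 90 04 00 00"
)

REG_LINES = """\
[51671.050705] RAX: 0000000000000000 RBX: ffff9f734424f000 RCX: 0000000000000007
[51671.050707] RDX: 0000000000000080 RSI: 0000000008270001 RDI: 0000000000000200
[51671.050719] CR2: 0000000000000000 CR3: 0000000144932000 CR4: 0000000000350ee0
"""


def faulting_byte(code_line):
    """Return (offset, value) of the <xx>-marked byte in an oops Code: line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for offset, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return offset, int(tok[1:-1], 16)
    raise ValueError("no marked byte in Code: line")


def registers(oops_text):
    """Collect 'NAME: <16 hex digits>' pairs such as RAX or CR2."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", oops_text)
    return {name: int(value, 16) for name, value in pairs}


offset, value = faulting_byte(CODE_LINE)
regs = registers(REG_LINES)
print(offset, f"{value:#04x}")      # 42 0x0f
print(regs["RAX"] == regs["CR2"])   # True
```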
Comment 3 James Z 2021-07-30 03:26:26 UTC
However, on a near-full disk, *write* operations cause panic.
Comment 5 James Z 2021-09-04 10:17:47 UTC
Could you please provide a working link to this patch? The one above no longer works.