Bug 213877 - Mount multiple SMR block devices exceed certain number cause system non-response
Summary: Mount multiple SMR block devices exceed certain number cause system non-response
Status: ASSIGNED
Alias: None
Product: File System
Classification: Unclassified
Component: f2fs
Hardware: Intel Linux
Importance: P1 normal
Assignee: Default virtual assignee for f2fs
URL:
Keywords: trivial
Depends on:
Blocks:
 
Reported: 2021-07-27 10:07 UTC by James Z
Modified: 2021-09-04 11:43 UTC

See Also:
Kernel Version: Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Tree: Mainline
Regression: No


Attachments

Description James Z 2021-07-27 10:07:29 UTC
[1.] One-line summary of the problem:
Mounting more than a certain number of SMR block devices makes the system unresponsive

[2.] Full description of the problem/report:
Created F2FS file systems on several SMR devices (mkfs.f2fs -m), then mounted them in sequence. Each device is the same model: HGST HSH721414AL (size 14 TB).
Empirically, when (number of SMR devices) * 1.5 GB exceeds system RAM, the system runs out of memory and hangs, with no dmesg output. For example, 24 SMR disks need 24 * 1.5 GB = 36 GB. A system with 32 GB of RAM can mount only 21 devices; mounting the 22nd reproducibly hangs the system.
SMR devices on the same system that are mounted with other file systems do not affect this result.
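
The arithmetic above can be sketched as follows. This is only an illustration: the 1.5 GB per-device figure is the empirical value from this report, not a kernel constant, the function names are made up for the example, and memory used by the rest of the system is ignored.

```python
import math

# Empirical per-device cost observed in this report (an assumption,
# not a value taken from the f2fs source).
PER_DEVICE_GB = 1.5


def required_memory_gb(devices, per_device_gb=PER_DEVICE_GB):
    """Estimated RAM needed to keep `devices` F2FS SMR volumes mounted."""
    return devices * per_device_gb


def max_mountable_devices(ram_gb, per_device_gb=PER_DEVICE_GB):
    """Largest device count whose estimated cost still fits in `ram_gb`."""
    return math.floor(ram_gb / per_device_gb)


print(required_memory_gb(24))     # 36.0
print(max_mountable_devices(32))  # 21
```

With 32 GB the estimate allows 21 devices, which matches the observation that the 22nd mount triggers the hang.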

[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory

[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64

[5.] Most recent kernel version which did not have the bug:
None

[6.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/admin-guide/oops-tracing.rst)
None

[7.] A small shell script or example program which triggers the
     problem (if possible)
mount /dev/sdX /mnt/0X
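
Because the hang leaves no dmesg output, it may help to log available memory before each mount and stop before the machine is exhausted. The following is a minimal sketch, not part of the original reproduction: the function names, the threshold, and the device/mount-point pairs are hypothetical, and it assumes /proc/meminfo exposes a MemAvailable line.

```python
import subprocess


def mem_available_kib(meminfo_text):
    """Extract the MemAvailable value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def read_meminfo():
    """Read the current contents of /proc/meminfo."""
    with open("/proc/meminfo") as f:
        return f.read()


def mount_in_sequence(pairs, min_available_kib,
                      run=subprocess.run, meminfo=read_meminfo):
    """Mount (device, mountpoint) pairs one at a time.

    Stops before a mount when MemAvailable is below `min_available_kib`
    (a hypothetical safety margin), and returns the pairs that were mounted.
    """
    mounted = []
    for device, mountpoint in pairs:
        available = mem_available_kib(meminfo())
        print(f"{device}: MemAvailable={available} KiB before mount")
        if available < min_available_kib:
            print(f"stopping before {device}: below threshold")
            break
        run(["mount", device, mountpoint], check=True)
        mounted.append((device, mountpoint))
    return mounted
```

Run as root, a call such as mount_in_sequence([("/dev/sdX", "/mnt/0X")], min_available_kib=2 * 1024 * 1024) would mount each listed device only while at least about 2 GiB remains available; the last logged line then shows how far the sequence got.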

[8.] Memory consumption 

With 24 * 14 TB SMR block devices formatted with F2FS:
free -g
              total        used        free      shared  buff/cache   available
Mem:             46          36           0           0          10          10
Swap:             0           0           0


With 3 * 14 TB SMR block devices formatted with F2FS:
free -g
               total        used        free      shared  buff/cache   available
Mem:               7           5           0           0           1           1
Swap:              7           0           7
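
As a rough cross-check of the 1.5 GB estimate, the "used" column in the two measurements above can be divided by the device count. This is a sketch with a made-up helper name; the inputs are the whole-number values printed by free -g, and "used" also includes everything else running on the system, so the result is an upper bound on the per-device cost.

```python
def per_device_gb(used_gb, devices):
    """Upper bound on memory per mounted device, from the 'used' column."""
    return used_gb / devices


print(per_device_gb(36, 24))           # 1.5
print(round(per_device_gb(5, 3), 2))   # 1.67
```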
Comment 1 Chao Yu 2021-07-27 16:22:09 UTC
Could you please apply the patch below and try the nosmall_discard mount option? It is expected to reduce memory consumption.

*Please note*: I haven't tested this patch yet, so please back up your data before testing.

https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=misc&id=5dba79cc25c3c902942c2f68a6e2546586c65f96
Comment 2 James Z 2021-07-30 03:10:23 UTC
[1.] Brief
With this patch, memory usage dropped from 1.5 GB per SMR disk to about 1 GB.
Reads and writes work fine on disks with plenty of free space. However, on a near-full disk, read operations cause a panic.

[2.] A small shell script or example program which triggers the problem
[James@DataT01 /mnt/03]$ touch 123

[3.] Kernel Output
[51671.050646] BUG: kernel NULL pointer dereference, address: 0000000000000000
[51671.050652] #PF: supervisor read access in kernel mode
[51671.050654] #PF: error_code(0x0000) - not-present page
[51671.050656] PGD 0 P4D 0 
[51671.050660] Oops: 0000 [#1] SMP NOPTI
[51671.050663] CPU: 4 PID: 52263 Comm: f2fs_ckpt-8:96 Tainted: G        W         5.13.4-200.fc34.x86_64 #1
[51671.050666] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PRO MAX (MS-7B79), BIOS M.60 06/11/2020
[51671.050668] RIP: 0010:f2fs_issue_discard.isra.0+0x77/0x170 [f2fs]
[51671.050700] Code: 00 00 00 8b 40 48 8b bb 5c 04 00 00 41 29 c0 8d 4f ff 44 21 c1 89 c8 f7 d1 83 e1 07 c1 e8 03 48 03 42 18 ba 01 00 00 00 d3 e2 <0f> be 08 41 89 c8 41 09 d0 44 88 00 85 d1 75 07 83 ab 90 04 00 00
[51671.050702] RSP: 0018:ffffbe70c1893ca0 EFLAGS: 00010212
[51671.050705] RAX: 0000000000000000 RBX: ffff9f734424f000 RCX: 0000000000000007
[51671.050707] RDX: 0000000000000080 RSI: 0000000008270001 RDI: 0000000000000200
[51671.050709] RBP: 00000027ffffffd8 R08: 0000000008260000 R09: 0000000000000000
[51671.050711] R10: 000000000003de00 R11: 0000000000000004 R12: 0000000008270001
[51671.050713] R13: 0000000000000000 R14: 0000000008270000 R15: ffff9f7345933840
[51671.050715] FS:  0000000000000000(0000) GS:ffff9f7456900000(0000) knlGS:0000000000000000
[51671.050717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[51671.050719] CR2: 0000000000000000 CR3: 0000000144932000 CR4: 0000000000350ee0
[51671.050721] Call Trace:
[51671.050726]  f2fs_clear_prefree_segments+0x439/0x6f0 [f2fs]
[51671.050750]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[51671.050756]  f2fs_write_checkpoint+0xccf/0x11d0 [f2fs]
[51671.050782]  __checkpoint_and_complete_reqs+0x78/0x160 [f2fs]
[51671.050803]  issue_checkpoint_thread+0x38/0xb0 [f2fs]
[51671.050823]  ? finish_wait+0x80/0x80
[51671.050827]  ? __checkpoint_and_complete_reqs+0x160/0x160 [f2fs]
[51671.050846]  kthread+0x127/0x150
[51671.050850]  ? set_kthread_struct+0x40/0x40
[51671.050852]  ret_from_fork+0x22/0x30
[51671.050858] Modules linked in: binfmt_misc f2fs nls_utf8 hfsplus hfs crc32_generic lz4hc_compress lz4_compress isofs snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc bonding tls nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter snd_hda_codec_realtek snd_hda_codec_generic sunrpc ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common snd_hda_intel snd_intel_dspcfg vfat edac_mce_amd snd_intel_sdw_acpi fat snd_hda_codec kvm_amd ppdev kvm snd_hda_core snd_hwdep irqbypass rapl snd_seq snd_seq_device pcspkr wmi_bmof k10temp snd_pcm joydev
[51671.050910]  i2c_piix4 snd_timer snd soundcore parport_pc parport gpio_amdpt gpio_generic acpi_cpufreq zram ip_tables radeon i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm sp5100_tco ixgbe ccp mdio dca r8169 wmi fuse
[51671.050932] CR2: 0000000000000000
[51671.050935] ---[ end trace 76f29393379353e4 ]---
[51671.050936] RIP: 0010:f2fs_issue_discard.isra.0+0x77/0x170 [f2fs]
[51671.050958] Code: 00 00 00 8b 40 48 8b bb 5c 04 00 00 41 29 c0 8d 4f ff 44 21 c1 89 c8 f7 d1 83 e1 07 c1 e8 03 48 03 42 18 ba 01 00 00 00 d3 e2 <0f> be 08 41 89 c8 41 09 d0 44 88 00 85 d1 75 07 83 ab 90 04 00 00
[51671.050960] RSP: 0018:ffffbe70c1893ca0 EFLAGS: 00010212
[51671.050962] RAX: 0000000000000000 RBX: ffff9f734424f000 RCX: 0000000000000007
[51671.050964] RDX: 0000000000000080 RSI: 0000000008270001 RDI: 0000000000000200
[51671.050966] RBP: 00000027ffffffd8 R08: 0000000008260000 R09: 0000000000000000
[51671.050967] R10: 000000000003de00 R11: 0000000000000004 R12: 0000000008270001
[51671.050969] R13: 0000000000000000 R14: 0000000008270000 R15: ffff9f7345933840
[51671.050971] FS:  0000000000000000(0000) GS:ffff9f7456900000(0000) knlGS:0000000000000000
[51671.050973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[51671.050974] CR2: 0000000000000000 CR3: 0000000144932000 CR4: 0000000000350ee0
Comment 3 James Z 2021-07-30 03:26:26 UTC
Correction to comment 2: on a near-full disk, *write* operations (not reads) cause the panic.
Comment 5 James Z 2021-09-04 10:17:47 UTC
Could you please provide a working link to this patch? The one above is out of date.
