Bug 219484

Summary:	f2fs discard causes kernel NULL pointer dereferencing
Product:	File System	Reporter:	piergiorgio.sartor
Component:	f2fs	Assignee:	Default virtual assignee for f2fs (filesystem_f2fs)
Status:	RESOLVED CODE_FIX
Severity:	blocking	CC:	chao
Priority:	P3
Hardware:	Intel
OS:	Linux
Kernel Version:		Subsystem:
Regression:	No	Bisected commit-id:

Description piergiorgio.sartor 2024-11-09 12:01:14 UTC

Hi everybody,
this issue was reported to Fedora Bugzilla and to the f2fs-devel mailing list, to no avail.
I'm trying my luck here now.
I have to say this is a real problem, since it blocks any kernel upgrade.

Fedora Bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=2305521

It works up to 6.9.12, but not with later kernels.

Some explanation.
I have a small backup script that operates on LVM logical volumes (LVs).
The FS is f2fs on top of an LV.
The script creates a snapshot, mounts it, performs a backup copy, and removes the snapshot.
LVM is configured to issue discards on LV removal.
Kernels up to 6.9.x work fine; after that I get a NULL pointer dereference in f2fs on snapshot *creation* (new information, previously I reported "removal").
Furthermore, it depends on the order of the snapshots.
There are 3 LVs: "root", "home" and "data". Sometimes, if the first snapshot is taken of "root", the others work. Not always.
"root" is the smallest LV.
If the first snapshot is "home" (the largest LV), there is always a crash.

Unfortunately, I cannot test on this machine, so if this is not already fixed, I'll have some difficulty testing kernel patches.
I'm considering setting up something else, but it is not really straightforward (because the crash does not always happen).

Details also here: https://bugzilla.redhat.com/show_bug.cgi?id=2305521

Kernel trace below:

Aug 17 10:06:41 kernel: F2FS-fs (dm-6): recover fsync data on readonly fs
Aug 17 10:06:41 kernel: F2FS-fs (dm-6): Mounted with checkpoint version = adc5452
Aug 17 10:07:27 kernel: ------------[ cut here ]------------
Aug 17 10:07:27 kernel: WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330 __submit_discard_cmd+0x27d/0x400 [f2fs]
Aug 17 10:07:27 kernel: Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core dimlib nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
+nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
+nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
+qrtr hwmon_vid vfat fat spi_nor mtd x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt mei_hdcp kvm_intel mei_pxp intel_pmc_bxt iTCO_vendor_support ee1004
+intel_rapl_msr kvm eeepc_wmi asus_wmi sparse_keymap platform_profile rfkill r8169 intel_cstate wmi_bmof processor_thermal_device_pci_legacy realtek processor_thermal_device
+spi_intel_pci spi_intel i2c_i801 mei_me i2c_smbus processor_thermal_wt_hint mei processor_thermal_rfim idma64 processor_thermal_rapl intel_rapl_common
+processor_thermal_wt_req processor_thermal_power_floor
Aug 17 10:07:27 kernel:  processor_thermal_mbox intel_soc_dts_iosf intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec pmt_telemetry int3400_thermal acpi_pad
+pmt_class acpi_thermal_rel acpi_tad nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse loop nfnetlink f2fs crc32_generic lz4hc_compress lz4_compress dm_crypt i915
+crct10dif_pclmul crc32_pclmul crc32c_intel polyval_generic ghash_clmulni_intel sha512_ssse3 sdhci_pci cqhci sdhci i2c_algo_bit drm_buddy ttm sha256_ssse3
+spi_pxa2xx_platform uas mmc_core drm_display_helper usb_storage sha1_ssse3 dw_dmac cec video pinctrl_jasperlake wmi
Aug 17 10:07:27 kernel: CPU: 2 PID: 969 Comm: f2fs_discard-25 Not tainted 6.10.3-200.fc40.x86_64 #1
Aug 17 10:07:27 kernel: Hardware name: ASUSTeK COMPUTER INC. MINIPC PN41-S1/PN41-S1, BIOS 0405 07/07/2022
Aug 17 10:07:27 kernel: RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Aug 17 10:07:27 kernel: Code: 8b 00 3b 46 10 0f 83 ee 00 00 00 48 c7 44 24 50 00 00 00 00 44 39 6c 24 2c 0f 83 a1 fe ff ff 8b 6c 24 2c 31 d2 e9 9e fe ff ff <0f> 0b 48 8b 44
+24 48 f0 80 08 04 e9 e9 fe ff ff 65 8b 15 48 3c 53
Aug 17 10:07:27 kernel: RSP: 0018:ffffbfe1c07dfd30 EFLAGS: 00010246
Aug 17 10:07:27 kernel: RAX: 0000000000000000 RBX: ffff9b28055be018 RCX: 000000001d46ffff
Aug 17 10:07:27 kernel: RDX: 000000001d470000 RSI: 000000001d470000 RDI: ffff9b28004c2580
Aug 17 10:07:27 kernel: RBP: 0000000000000000 R08: ffffbfe1c07dfd80 R09: ffffbfe1c07dfe78
Aug 17 10:07:27 kernel: R10: ffff9b2806401000 R11: ffff9b28004c2580 R12: 00000000055be000
Aug 17 10:07:27 kernel: R13: 0000000000000200 R14: ffff9b28055bc000 R15: ffff9b28101c6d90
Aug 17 10:07:27 kernel: FS:  0000000000000000(0000) GS:ffff9b2b70900000(0000) knlGS:0000000000000000
Aug 17 10:07:27 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 10:07:27 kernel: CR2: 00005589f3976000 CR3: 0000000130c4e000 CR4: 0000000000350ef0
Aug 17 10:07:27 kernel: Call Trace:
Aug 17 10:07:27 kernel:  <TASK>
Aug 17 10:07:27 kernel:  ? __submit_discard_cmd+0x27d/0x400 [f2fs]
Aug 17 10:07:27 kernel:  ? __warn.cold+0x8e/0xe8
Aug 17 10:07:27 kernel:  ? __submit_discard_cmd+0x27d/0x400 [f2fs]
Aug 17 10:07:27 kernel:  ? report_bug+0xff/0x140
Aug 17 10:07:27 kernel:  ? handle_bug+0x3c/0x80
Aug 17 10:07:27 kernel:  ? exc_invalid_op+0x17/0x70
Aug 17 10:07:27 kernel:  ? asm_exc_invalid_op+0x1a/0x20
Aug 17 10:07:27 kernel:  ? __submit_discard_cmd+0x27d/0x400 [f2fs]
Aug 17 10:07:27 kernel:  __issue_discard_cmd+0x1ca/0x350 [f2fs]
Aug 17 10:07:27 kernel:  issue_discard_thread+0x191/0x480 [f2fs]
Aug 17 10:07:27 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Aug 17 10:07:27 kernel:  ? __pfx_issue_discard_thread+0x10/0x10 [f2fs]
Aug 17 10:07:27 kernel:  kthread+0xcf/0x100
Aug 17 10:07:27 kernel:  ? __pfx_kthread+0x10/0x10
Aug 17 10:07:27 kernel:  ret_from_fork+0x31/0x50
Aug 17 10:07:27 kernel:  ? __pfx_kthread+0x10/0x10
Aug 17 10:07:27 kernel:  ret_from_fork_asm+0x1a/0x30
Aug 17 10:07:27 kernel:  </TASK>
Aug 17 10:07:27 kernel: ---[ end trace 0000000000000000 ]---
Aug 17 10:07:27 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
Aug 17 10:07:27 kernel: #PF: supervisor write access in kernel mode
Aug 17 10:07:27 kernel: #PF: error_code(0x0002) - not-present page
Aug 17 10:07:27 kernel: PGD 1069f9067 P4D 1069f9067 PUD 0
Aug 17 10:07:27 kernel: Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
Aug 17 10:07:27 kernel: CPU: 2 PID: 969 Comm: f2fs_discard-25 Tainted: G        W          6.10.3-200.fc40.x86_64 #1
Aug 17 10:07:27 kernel: Hardware name: ASUSTeK COMPUTER INC. MINIPC PN41-S1/PN41-S1, BIOS 0405 07/07/2022
Aug 17 10:07:27 kernel: RIP: 0010:__submit_discard_cmd+0x203/0x400 [f2fs]
Aug 17 10:07:27 kernel: Code: 89 4c 24 20 e8 ee 2e db ca 84 c0 74 14 48 8b 4c 24 20 4c 89 63 08 49 89 5f 28 49 89 4f 30 4c 89 21 48 8b 7c 24 50 8b 44 24 44 <09> 47 10 4c 89
+7f 40 48 c7 47 38 a0 f8 af c0 e8 29 8f d0 ca f0 41
Aug 17 10:07:27 kernel: RSP: 0018:ffffbfe1c07dfd30 EFLAGS: 00010202
Aug 17 10:07:27 kernel: RAX: 0000000000000000 RBX: ffff9b28055be018 RCX: ffff9b28055be018
Aug 17 10:07:27 kernel: RDX: ffff9b28055be018 RSI: ffff9b28055be018 RDI: 0000000000000000
Aug 17 10:07:27 kernel: RBP: 0000000000000000 R08: ffff9b28055be018 R09: ffffbfe1c07dfe78
Aug 17 10:07:27 kernel: R10: ffff9b2806401000 R11: ffff9b28004c2580 R12: ffff9b28101c6db8
Aug 17 10:07:27 kernel: R13: 0000000000000200 R14: ffff9b28055bc000 R15: ffff9b28101c6d90
Aug 17 10:07:27 kernel: FS:  0000000000000000(0000) GS:ffff9b2b70900000(0000) knlGS:0000000000000000
Aug 17 10:07:27 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 10:07:27 kernel: CR2: 0000000000000010 CR3: 0000000130c4e000 CR4: 0000000000350ef0
Aug 17 10:07:27 kernel: Call Trace:
Aug 17 10:07:27 kernel:  <TASK>
Aug 17 10:07:27 kernel:  ? __die_body.cold+0x19/0x27
Aug 17 10:07:27 kernel:  ? page_fault_oops+0x15a/0x2f0
Aug 17 10:07:27 kernel:  ? __submit_discard_cmd+0x27d/0x400 [f2fs]
Aug 17 10:07:27 kernel:  ? exc_page_fault+0x7e/0x180
Aug 17 10:07:27 kernel:  ? asm_exc_page_fault+0x26/0x30
Aug 17 10:07:27 kernel:  ? __submit_discard_cmd+0x203/0x400 [f2fs]
Aug 17 10:07:27 kernel:  __issue_discard_cmd+0x1ca/0x350 [f2fs]
Aug 17 10:07:27 kernel:  issue_discard_thread+0x191/0x480 [f2fs]
Aug 17 10:07:27 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Aug 17 10:07:27 kernel:  ? __pfx_issue_discard_thread+0x10/0x10 [f2fs]
Aug 17 10:07:27 kernel:  kthread+0xcf/0x100
Aug 17 10:07:27 kernel:  ? __pfx_kthread+0x10/0x10
Aug 17 10:07:27 kernel:  ret_from_fork+0x31/0x50
Aug 17 10:07:27 kernel:  ? __pfx_kthread+0x10/0x10
Aug 17 10:07:27 kernel:  ret_from_fork_asm+0x1a/0x30
Aug 17 10:07:27 kernel:  </TASK>
Aug 17 10:07:27 kernel: Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core dimlib nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
+nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
+nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
+qrtr hwmon_vid vfat fat spi_nor mtd x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt mei_hdcp kvm_intel mei_pxp intel_pmc_bxt iTCO_vendor_support ee1004
+intel_rapl_msr kvm eeepc_wmi asus_wmi sparse_keymap platform_profile rfkill r8169 intel_cstate wmi_bmof processor_thermal_device_pci_legacy realtek processor_thermal_device
+spi_intel_pci spi_intel i2c_i801 mei_me i2c_smbus processor_thermal_wt_hint mei processor_thermal_rfim idma64 processor_thermal_rapl intel_rapl_common
+processor_thermal_wt_req processor_thermal_power_floor
Aug 17 10:07:27 kernel:  processor_thermal_mbox intel_soc_dts_iosf intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec pmt_telemetry int3400_thermal acpi_pad
+pmt_class acpi_thermal_rel acpi_tad nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse loop nfnetlink f2fs crc32_generic lz4hc_compress lz4_compress dm_crypt i915
+crct10dif_pclmul crc32_pclmul crc32c_intel polyval_generic ghash_clmulni_intel sha512_ssse3 sdhci_pci cqhci sdhci i2c_algo_bit drm_buddy ttm sha256_ssse3
+spi_pxa2xx_platform uas mmc_core drm_display_helper usb_storage sha1_ssse3 dw_dmac cec video pinctrl_jasperlake wmi
Aug 17 10:07:27 kernel: CR2: 0000000000000010
Aug 17 10:07:27 kernel: ---[ end trace 0000000000000000 ]---
Aug 17 10:07:27 kernel: RIP: 0010:__submit_discard_cmd+0x203/0x400 [f2fs]
Aug 17 10:07:27 kernel: Code: 89 4c 24 20 e8 ee 2e db ca 84 c0 74 14 48 8b 4c 24 20 4c 89 63 08 49 89 5f 28 49 89 4f 30 4c 89 21 48 8b 7c 24 50 8b 44 24 44 <09> 47 10 4c 89
+7f 40 48 c7 47 38 a0 f8 af c0 e8 29 8f d0 ca f0 41
Aug 17 10:07:27 kernel: RSP: 0018:ffffbfe1c07dfd30 EFLAGS: 00010202
Aug 17 10:07:27 kernel: RAX: 0000000000000000 RBX: ffff9b28055be018 RCX: ffff9b28055be018
Aug 17 10:07:27 kernel: RDX: ffff9b28055be018 RSI: ffff9b28055be018 RDI: 0000000000000000
Aug 17 10:07:27 kernel: RBP: 0000000000000000 R08: ffff9b28055be018 R09: ffffbfe1c07dfe78
Aug 17 10:07:27 kernel: R10: ffff9b2806401000 R11: ffff9b28004c2580 R12: ffff9b28101c6db8
Aug 17 10:07:27 kernel: R13: 0000000000000200 R14: ffff9b28055bc000 R15: ffff9b28101c6d90
Aug 17 10:07:27 kernel: FS:  0000000000000000(0000) GS:ffff9b2b70900000(0000) knlGS:0000000000000000
Aug 17 10:07:27 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 10:07:27 kernel: CR2: 0000000000000010 CR3: 0000000130c4e000 CR4: 0000000000350ef0
Aug 17 10:07:27 kernel: note: f2fs_discard-25[969] exited with irqs disabled
Aug 17 10:07:27 kernel: ------------[ cut here ]------------


Thanks,

bye,

pg

Comment 1 Chao Yu 2024-11-09 15:11:20 UTC

Hi, thanks for your report.

Could you please check the max_hw_discard_sectors parameter of the dm device
via "cat /sys/block/<device_name>/queue/max_hw_discard_sectors"?

I suspect max_discard_blocks becomes zero in __submit_discard_cmd(), with the
result that __blkdev_issue_discard() fails to allocate a bio.

__submit_discard_cmd()
{
	unsigned int max_discard_blocks =
			SECTOR_TO_BLOCK(bdev_max_discard_sectors(bdev));
...
	while () {
...
		if (len > max_discard_blocks) {
			len = max_discard_blocks;
			last = false;
		}
...
		} else {
			err = __blkdev_issue_discard(bdev,
					SECTOR_FROM_BLOCK(start),
					SECTOR_FROM_BLOCK(len),
					GFP_NOFS, &bio);
		}
...
		f2fs_bug_on(sbi, !bio); // trigger warning here and panic below
}
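
The suspected failure mode can be illustrated with a small user-space model. This is only a sketch of the hypothesis above, not kernel code: model_split_discard() and SECTORS_PER_BLOCK are hypothetical names, and a 4 KiB block with 512-byte sectors is assumed. It applies the same clamp as the excerpt and reports when the clamped length reaches zero, which is the case where, per the hypothesis, no bio would be allocated.

```c
/* Hypothetical user-space model of the clamp in __submit_discard_cmd(). */
#define SECTORS_PER_BLOCK 8u /* assumption: 4 KiB block / 512-byte sector */

/*
 * Split a discard of total_blocks into requests of at most
 * max_discard_blocks each, as the clamped loop does.
 * Returns the number of requests, or -1 if a request would have
 * length zero (the case suspected to leave bio == NULL).
 */
static int model_split_discard(unsigned int total_blocks,
                               unsigned int max_discard_sectors)
{
    unsigned int max_discard_blocks = max_discard_sectors / SECTORS_PER_BLOCK;
    unsigned int remaining = total_blocks;
    int requests = 0;

    while (remaining > 0) {
        unsigned int len = remaining;

        if (len > max_discard_blocks)
            len = max_discard_blocks; /* same clamp as in the excerpt */

        if (len == 0)
            return -1; /* zero-length discard: nothing to submit */

        remaining -= len;
        requests++;
    }
    return requests;
}
```

With a non-zero limit the model simply splits the range into several requests; with a limit of 0 it returns -1 on the first iteration, which corresponds to the point where the warning fires in the trace.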

Comment 2 piergiorgio.sartor 2024-11-09 15:40:09 UTC

Thanks for the prompt reply.

Actually, there is no "max_hw_discard_sectors", but only a "max_discard_segments", which is "1" (for all DM devices).
It is also "1" for the underlying SSD (/dev/sda).
The "discard_max_bytes", as well as the "discard_max_hw_bytes", is "2147450880" everywhere.
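
For reference, the byte-based sysfs attributes can be read and converted to 512-byte sectors with a few lines of C. This is a minimal sketch: parse_queue_value(), read_queue_attr() and bytes_to_sectors() are hypothetical helper names, and the path in the comment uses dm-6 from the log above only as an example.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define SECTOR_SIZE 512ULL

/* Parse the decimal text of a queue attribute, e.g. "2147450880\n".
 * Returns 0 on success, -1 on malformed input. */
static int parse_queue_value(const char *text, unsigned long long *out)
{
    char *end = NULL;
    unsigned long long value;

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;
    while (*end == '\n' || *end == ' ')
        end++;
    if (*end != '\0')
        return -1;
    *out = value;
    return 0;
}

/* Read one attribute file, e.g. /sys/block/dm-6/queue/discard_max_bytes.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int read_queue_attr(const char *path, unsigned long long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_queue_value(buf, out);
}

/* Convert a byte limit to 512-byte sectors. */
static unsigned long long bytes_to_sectors(unsigned long long bytes)
{
    return bytes / SECTOR_SIZE;
}
```

A value of 2147450880 bytes corresponds to 4194240 sectors, while a value of 0 means the queue advertises no discard capacity at all.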

Hope this helps,

bye,

pg

Comment 3 piergiorgio.sartor 2024-11-10 11:36:36 UTC

One more thing, possibly important.

When I create the snapshot, with the working kernel, while "max_discard_segments" is still "1", the other two, "discard_max_bytes" and "discard_max_hw_bytes" are both "0", instead of "2147450880".

Hope this helps,

bye,

pg

Comment 4 Chao Yu 2024-11-10 14:44:59 UTC

Is there any chance you could apply this patch and check whether it fixes the bug?

From: Chao Yu <chao@kernel.org>

---
 fs/f2fs/segment.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 10ec69cbae68..86a22447b89b 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1314,11 +1314,6 @@ static int __submit_discard_cmd(struct f2fs_sb_info *sbi,
 		unsigned long flags;
 		bool last = true;
 
-		if (len > max_discard_blocks) {
-			len = max_discard_blocks;
-			last = false;
-		}
-
 		(*issued)++;
 		if (*issued == dpolicy->max_requests)
 			last = true;
-- 
2.40.1

Comment 5 piergiorgio.sartor 2024-11-10 15:42:23 UTC

Thanks for the support.

It will be difficult to test the patch; I'll have to see what I can do with this PC (I'm not free to use it as I like).
Which kernel would this be for: 6.11.5/6/7?

Is there any other way to test?
For example, using the sysfs interface?
What about the differences between 6.9.12 (working) and the non-working kernels?

I cannot promise anything, but I'll look into patching.

Thanks again,

bye,

pg

Comment 6 Chao Yu 2024-11-21 09:50:24 UTC

Sorry for the long delay; I was out of the office.

Now I can reproduce this bug with the test case below:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs

------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:1363!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 4 UID: 0 PID: 730 Comm: umount Not tainted 6.12.0-rc3+ #1107
RIP: 0010:__submit_discard_cmd+0xa53/0x1410
<TASK>
__issue_discard_cmd+0x3e5/0x1190
f2fs_issue_discard_timeout+0x244/0x360
f2fs_put_super+0x1fc/0xed0
generic_shutdown_super+0x14c/0x4a0
kill_block_super+0x40/0x90
kill_f2fs_super+0x264/0x430

Let me figure out a patch for that soon. :)

Comment 7 Chao Yu 2024-11-21 09:54:31 UTC

(In reply to piergiorgio.sartor from comment #3)
> One more thing, possibly important.
> 
> When I create the snapshot, with the working kernel, while
> "max_discard_segments" is still "1", the other two, "discard_max_bytes" and
> "discard_max_hw_bytes" are both "0", instead of "2147450880".

Thanks for the hint; I think it is the key to the root cause.

Thanks,

> 
> Hope this helps,
> 
> bye,
> 
> pg

Comment 8 piergiorgio.sartor 2024-11-22 22:45:36 UTC

Thanks for taking the time to reproduce the issue.
I tried to compile the kernel with your patch, but it seems that these days it is no longer as easy as it used to be. No success...
Good that you managed to see the issue!

Thanks,

pg

Comment 9 piergiorgio.sartor 2024-12-12 18:57:04 UTC

Hi all,

I tested kernel-6.12.4-100.fc40.x86_64.rpm (Fedora 40, Koji build).
This is supposed to include the patch and, for what I tested, it seems to work fine. No NULL pointer de-referencing, no crash, everything good as before.

I think you can close the bug, in case something else will pop up in the future, I can re-open.

Thanks for the support!

Merry Christmas & Happy New Year!

bye,

pg

Comment 10 Chao Yu 2024-12-17 13:39:03 UTC

(In reply to piergiorgio.sartor from comment #9)
> Hi all,
> 
> I tested kernel-6.12.4-100.fc40.x86_64.rpm (Fedora 40, Koji build).
> This is supposed to include the patch and, for what I tested, it seems to
> work fine. No NULL pointer de-referencing, no crash, everything good as
> before.

Thank you very much for the test and feedback!

> 
> I think you can close the bug, in case something else will pop up in the
> future, I can re-open.

Fine, let us know if you have any other problem.

> 
> Thanks for the support!
> 
> Merry Christmas & Happy New Year!

Merry Christmas & Happy New Year too!

Thanks,

> 
> bye,
> 
> pg