Bug 219504 - iomap/buffered-io/XFS crashes with kernel Version > 6.1.91. Perhaps Changes in kernel 6.1.92 (and up) for XFS/iomap causing the problems?
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: AMD Linux
Importance: P3 normal
Assignee: FileSystem/XFS Default Virtual Assignee
Reported: 2024-11-17 13:42 UTC by Mike-SPC
Modified: 2025-02-03 10:03 UTC

Kernel Version: since 6.1.92 and up
Regression: No

Description Mike-SPC 2024-11-17 13:42:44 UTC
Hi there,

I'm writing because, since the changes in kernel 6.1.92, I have been running into trouble under high I/O load on an XFS filesystem.
I suspect that the kernel backtrace shown below is caused by one of the following commits (in kernel 6.1.92):
e811fec51c66a0056459daa1ac834aea7d8d98f5, ea67e73129fceffd40b9193da93544c34d81b9c2, 54a37e5d07478358dcbf6e73b6c7e40e50a6f375, 580f40b4c956f38e83f66ebed4d81bbe4a7d82fb, 12339ec6fe4d41e69a81a13ca5e1c443fbe5bcba... and so on.

With kernel 6.1.91 or lower, no problems occur. With kernel 6.1.92 or higher (my latest tests were with 6.1.105, then I gave up) and a high I/O workload, the problem occurs.

I'm using XFS as the underlying filesystem with a 32-bit kernel to export LUNs with SCST (https://github.com/SCST-project/scst) through fileio (not blockio).
I'm using the latest SCST git release.
One of the SCST developers also thinks the problem lies more in the direction of the Linux kernel, high I/O and the XFS filesystem.


Here is the kernel-backtrace:

[Tue Aug 20 00:02:00 2024] ------------[ cut here ]------------
[Tue Aug 20 00:02:00 2024] WARNING: CPU: 5 PID: 2048 at fs/iomap/buffered-io.c:980 iomap_file_buffered_write_punch_delalloc+0x3b0/0x440
[Tue Aug 20 00:02:00 2024] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) dlm quota_v2 quota_tree autofs4 tcp_bbr sch_fq udf crc_itu_t input_leds led_class ses enclosure hid_generic wmi_bmof edac_mce_amd crc32_pclmul usbhid aesni_intel crypto_simd uas hid usb_storage rapl r8169 bnx2 i2c_piix4 mpt3sas ccp i2c_core sha1_generic k10temp pcspkr video wmi backlight
[Tue Aug 20 00:02:00 2024] CPU: 5 PID: 2048 Comm: disk042_0 Tainted: G S         O       6.1.106_LFS_FILE01 #1
[Tue Aug 20 00:02:00 2024] Hardware name: To Be Filled By O.E.M. X370 Pro4/X370 Pro4, BIOS P10.08 01/22/2024
[Tue Aug 20 00:02:00 2024] EIP: iomap_file_buffered_write_punch_delalloc+0x3b0/0x440
[Tue Aug 20 00:02:00 2024] Code: 26 00 89 c6 89 d8 e8 4f e3 ed ff f0 ff 4b 1c 75 9c 89 d8 e8 22 0a ef ff 66 90 eb 91 8d b6 00 00 00 00 0f 0b e9 f5 fd ff ff 90 <0f> 0b e9 df fd ff ff 90 0f 0b 8b 45 ec 8b 55 f0 89 f9 39 c6 19 d1
[Tue Aug 20 00:02:00 2024] EAX: a733e000 EBX: 0007f000 ECX: fffffef2 EDX: 0000010e
[Tue Aug 20 00:02:00 2024] ESI: a733f000 EDI: 00000000 EBP: 86df3bd0 ESP: 86df3b80
[Tue Aug 20 00:02:00 2024] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010293
[Tue Aug 20 00:02:00 2024] CR0: 80050033 CR2: 37f87000 CR3: 0185a6e0 CR4: 00350ef0
[Tue Aug 20 00:02:00 2024] Call Trace:
[Tue Aug 20 00:02:00 2024]  ? show_regs.cold+0x16/0x1b
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x3b0/0x440
[Tue Aug 20 00:02:00 2024]  ? __warn+0x87/0xe0
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x3b0/0x440
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x3b0/0x440
[Tue Aug 20 00:02:00 2024]  ? report_bug+0xe5/0x170
[Tue Aug 20 00:02:00 2024]  ? exc_overflow+0x60/0x60
[Tue Aug 20 00:02:00 2024]  ? handle_bug+0x2a/0x50
[Tue Aug 20 00:02:00 2024]  ? exc_invalid_op+0x1e/0x70
[Tue Aug 20 00:02:00 2024]  ? handle_exception+0x101/0x101
[Tue Aug 20 00:02:00 2024]  ? exc_overflow+0x60/0x60
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x3b0/0x440
[Tue Aug 20 00:02:00 2024]  ? exc_overflow+0x60/0x60
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x3b0/0x440
[Tue Aug 20 00:02:00 2024]  ? xfs_dax_write_iomap_end+0xa0/0xa0
[Tue Aug 20 00:02:00 2024]  xfs_buffered_write_iomap_end+0x52/0xc0
[Tue Aug 20 00:02:00 2024]  ? xfs_buffered_write_iomap_end+0xc0/0xc0
[Tue Aug 20 00:02:00 2024]  iomap_iter+0xce/0x4b0
[Tue Aug 20 00:02:00 2024]  ? xfs_dax_write_iomap_end+0xa0/0xa0
[Tue Aug 20 00:02:00 2024]  iomap_file_buffered_write+0xa9/0x420
[Tue Aug 20 00:02:00 2024]  xfs_file_buffered_write+0x9d/0x2e0
[Tue Aug 20 00:02:00 2024]  xfs_file_write_iter+0xc9/0x100
[Tue Aug 20 00:02:00 2024]  fileio_exec_async+0x25e/0x3a0 [scst_vdisk]
[Tue Aug 20 00:02:00 2024]  fileio_exec_write+0x2ce/0x400 [scst_vdisk]
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0xdd/0xf0
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0xd7/0xf0
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0xd1/0xf0
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0xcb/0xf0
[Tue Aug 20 00:02:00 2024]  vdev_do_job+0x36/0xe0 [scst_vdisk]
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0x8f/0xf0
[Tue Aug 20 00:02:00 2024]  fileio_exec+0x1f/0x30 [scst_vdisk]
[Tue Aug 20 00:02:00 2024]  scst_do_real_exec+0x51/0x130 [scst]
[Tue Aug 20 00:02:00 2024]  scst_exec_check_blocking+0xa8/0x220 [scst]
[Tue Aug 20 00:02:00 2024]  scst_process_active_cmd+0x200/0x18f0 [scst]
[Tue Aug 20 00:02:00 2024]  scst_cmd_thread+0x15c/0x500 [scst]
[Tue Aug 20 00:02:00 2024]  ? prepare_to_wait_event+0x160/0x160
[Tue Aug 20 00:02:00 2024]  kthread+0xd2/0x100
[Tue Aug 20 00:02:00 2024]  ? scst_cmd_done_local+0x90/0x90 [scst]
[Tue Aug 20 00:02:00 2024]  ? kthread_complete_and_exit+0x20/0x20
[Tue Aug 20 00:02:00 2024]  ret_from_fork+0x1c/0x28
[Tue Aug 20 00:02:00 2024] ---[ end trace 0000000000000000 ]---
[Tue Aug 20 00:02:00 2024] ------------[ cut here ]------------
[Tue Aug 20 00:02:00 2024] WARNING: CPU: 5 PID: 2048 at fs/iomap/buffered-io.c:993 iomap_file_buffered_write_punch_delalloc+0x2f0/0x440
[Tue Aug 20 00:02:00 2024] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) dlm quota_v2 quota_tree autofs4 tcp_bbr sch_fq udf crc_itu_t input_leds led_class ses enclosure hid_generic wmi_bmof edac_mce_amd crc32_pclmul usbhid aesni_intel crypto_simd uas hid usb_storage rapl r8169 bnx2 i2c_piix4 mpt3sas ccp i2c_core sha1_generic k10temp pcspkr video wmi backlight
[Tue Aug 20 00:02:00 2024] CPU: 5 PID: 2048 Comm: disk042_0 Tainted: G S      W  O       6.1.106_LFS_FILE01 #1
[Tue Aug 20 00:02:00 2024] Hardware name: To Be Filled By O.E.M. X370 Pro4/X370 Pro4, BIOS P10.08 01/22/2024
[Tue Aug 20 00:02:00 2024] EIP: iomap_file_buffered_write_punch_delalloc+0x2f0/0x440
[Tue Aug 20 00:02:00 2024] Code: 8b 7d f0 01 c2 c1 e2 0c c7 45 d8 00 00 00 00 89 55 d4 39 d6 89 f9 83 d9 00 0f 8d 1e ff ff ff 89 75 d4 89 7d d8 e9 13 ff ff ff <0f> 0b 39 45 dc 8b 4d e4 19 d1 0f 8c b8 00 00 00 8b 45 ec 8b 7d dc
[Tue Aug 20 00:02:00 2024] EAX: a733f000 EBX: 00000000 ECX: a733f000 EDX: 00000000
[Tue Aug 20 00:02:00 2024] ESI: a733f000 EDI: 00000000 EBP: 86df3bd0 ESP: 86df3b80
[Tue Aug 20 00:02:00 2024] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
[Tue Aug 20 00:02:00 2024] CR0: 80050033 CR2: 37f87000 CR3: 0185a6e0 CR4: 00350ef0
[Tue Aug 20 00:02:00 2024] Call Trace:
[Tue Aug 20 00:02:00 2024]  ? show_regs.cold+0x16/0x1b
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x2f0/0x440
[Tue Aug 20 00:02:00 2024]  ? __warn+0x87/0xe0
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x2f0/0x440
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x2f0/0x440
[Tue Aug 20 00:02:00 2024]  ? report_bug+0xe5/0x170
[Tue Aug 20 00:02:00 2024]  ? exc_overflow+0x60/0x60
[Tue Aug 20 00:02:00 2024]  ? handle_bug+0x2a/0x50
[Tue Aug 20 00:02:00 2024]  ? exc_invalid_op+0x1e/0x70
[Tue Aug 20 00:02:00 2024]  ? handle_exception+0x101/0x101
[Tue Aug 20 00:02:00 2024]  ? exc_overflow+0x60/0x60
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x2f0/0x440
[Tue Aug 20 00:02:00 2024]  ? exc_overflow+0x60/0x60
[Tue Aug 20 00:02:00 2024]  ? iomap_file_buffered_write_punch_delalloc+0x2f0/0x440
[Tue Aug 20 00:02:00 2024]  ? xfs_dax_write_iomap_end+0xa0/0xa0
[Tue Aug 20 00:02:00 2024]  xfs_buffered_write_iomap_end+0x52/0xc0
[Tue Aug 20 00:02:00 2024]  ? xfs_buffered_write_iomap_end+0xc0/0xc0
[Tue Aug 20 00:02:00 2024]  iomap_iter+0xce/0x4b0
[Tue Aug 20 00:02:00 2024]  ? xfs_dax_write_iomap_end+0xa0/0xa0
[Tue Aug 20 00:02:00 2024]  iomap_file_buffered_write+0xa9/0x420
[Tue Aug 20 00:02:00 2024]  xfs_file_buffered_write+0x9d/0x2e0
[Tue Aug 20 00:02:00 2024]  xfs_file_write_iter+0xc9/0x100
[Tue Aug 20 00:02:00 2024]  fileio_exec_async+0x25e/0x3a0 [scst_vdisk]
[Tue Aug 20 00:02:00 2024]  fileio_exec_write+0x2ce/0x400 [scst_vdisk]
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0xdd/0xf0
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0xd7/0xf0
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0xd1/0xf0
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0xcb/0xf0
[Tue Aug 20 00:02:00 2024]  vdev_do_job+0x36/0xe0 [scst_vdisk]
[Tue Aug 20 00:02:00 2024]  ? __switch_to_asm+0x8f/0xf0
[Tue Aug 20 00:02:00 2024]  fileio_exec+0x1f/0x30 [scst_vdisk]
[Tue Aug 20 00:02:00 2024]  scst_do_real_exec+0x51/0x130 [scst]
[Tue Aug 20 00:02:00 2024]  scst_exec_check_blocking+0xa8/0x220 [scst]
[Tue Aug 20 00:02:00 2024]  scst_process_active_cmd+0x200/0x18f0 [scst]
[Tue Aug 20 00:02:00 2024]  scst_cmd_thread+0x15c/0x500 [scst]
[Tue Aug 20 00:02:00 2024]  ? prepare_to_wait_event+0x160/0x160
[Tue Aug 20 00:02:00 2024]  kthread+0xd2/0x100
[Tue Aug 20 00:02:00 2024]  ? scst_cmd_done_local+0x90/0x90 [scst]
[Tue Aug 20 00:02:00 2024]  ? kthread_complete_and_exit+0x20/0x20
[Tue Aug 20 00:02:00 2024]  ret_from_fork+0x1c/0x28
[Tue Aug 20 00:02:00 2024] ---[ end trace 0000000000000000 ]---

Now my question is:
Where should we look for the problem? With the kernel maintainers or with the SCST maintainers?

Thanks in advance for investigating,
Mike
Comment 1 Long Li 2024-11-21 11:49:33 UTC
Hi, Mike:

Look at the code of 6.1.106:

 970                 /*
 971                  * If there is no more data to scan, all that is left is to
 972                  * punch out the remaining range.
 973                  */
 974                 if (start_byte == -ENXIO || start_byte == scan_end_byte)
 975                         break;
 976                 if (start_byte < 0) {
 977                         error = start_byte;
 978                         goto out_unlock; 
 979                 }
 980                 WARN_ON_ONCE(start_byte < punch_start_byte);
 981                 WARN_ON_ONCE(start_byte > scan_end_byte);
 982         
 983                 /*
 984                  * We find the end of this contiguous cached data range by
 985                  * seeking from start_byte to the beginning of the next hole.
 986                  */
 987                 data_end = mapping_seek_hole_data(inode->i_mapping, start_byte,
 988                                 scan_end_byte, SEEK_HOLE);
 989                 if (data_end < 0) {
 990                         error = data_end;
 991                         goto out_unlock;
 992                 }
 993                 WARN_ON_ONCE(data_end <= start_byte);  
 994                 WARN_ON_ONCE(data_end > scan_end_byte);
 995 
 996                 error = iomap_write_delalloc_scan(inode, &punch_start_byte,
 997                                 start_byte, data_end, punch);

Looking at your warning stack, it reminds me of a problem[1] I tried to solve before, but it seems different. In my case, there was only a warning on line 993. Perhaps it's not the same issue. Below is a link to my attempted fix patch, which wasn't accepted, but hopefully it can be helpful to you.

[1] https://patchwork.kernel.org/project/xfs/patch/20231216115559.3823359-1-leo.lilong@huawei.com/

Long Li
Comment 2 Mike-SPC 2024-11-28 21:09:27 UTC
Hello Long Li,

thanks for investigating.
It seems that the patch went into kernel 6.1.113:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e3aa99b13a99405d935910306d1bbf419edfd679

It looks like this:

--- snip ---

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 98617f00101d68..1833608f39318e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -990,7 +990,15 @@ static int iomap_write_delalloc_release(struct inode *inode,
 			error = data_end;
 			goto out_unlock;
 		}
-		WARN_ON_ONCE(data_end <= start_byte);
+
+		/*
+		 * If we race with post-direct I/O invalidation of the page cache,
+		 * there might be no data left at start_byte.
+		 */
+		if (data_end == start_byte)
+			continue;
+
+		WARN_ON_ONCE(data_end < start_byte);
 		WARN_ON_ONCE(data_end > scan_end_byte);
 
 		error = iomap_write_delalloc_scan(inode, &punch_start_byte,

--- snap ---

It looks like yours:

https://patchwork.kernel.org/project/xfs/patch/20231216115559.3823359-1-leo.lilong@huawei.com/

Due to lack of time, I haven't gotten around to testing the latest kernel with this patch in it.

Regards,
Mike
Comment 3 Mike-SPC 2024-12-19 13:58:38 UTC
Hi there,

I rechecked with the newest kernel version, 6.1.120.

I get the same problem.

Here is the kernel-backtrace:

[Thu Dec 19 14:03:21 2024] ------------[ cut here ]------------
[Thu Dec 19 14:03:21 2024] WARNING: CPU: 0 PID: 24318 at fs/iomap/buffered-io.c:980 iomap_file_buffered_write_punch_delalloc+0x398/0x440
[Thu Dec 19 14:03:21 2024] Modules linked in: iscsi_scst(O) scst_vdisk(O) scst(O) dlm quota_v2 quota_tree autofs4 tcp_bbr sch_fq udf crc_itu_t ses enclosure hid_generic wmi_bmof usbhid edac_mce_amd uas crc32_pclmul aesni_intel hid crypto_simd usb_storage rapl bnx2 i2c_piix4 r8169 pcspkr k10temp ccp i2c_core sha1_generic video mpt3sas backlight wmi
[Thu Dec 19 14:03:21 2024] CPU: 0 PID: 24318 Comm: disk014_0 Tainted: G S         O       6.1.120_LFS_FILE01 #1
[Thu Dec 19 14:03:21 2024] Hardware name: To Be Filled By O.E.M. X370 Pro4/X370 Pro4, BIOS P10.08 01/22/2024
[Thu Dec 19 14:03:21 2024] EIP: iomap_file_buffered_write_punch_delalloc+0x398/0x440
[Thu Dec 19 14:03:21 2024] Code: 84 8d 00 00 00 8b 45 e0 8b 40 20 83 c0 10 e8 2f 31 e0 ff 83 c4 44 89 f0 5b 5e 5f 5d e9 f5 31 8e 00 90 0f 0b e9 13 fe ff ff 90 <0f> 0b e9 fd fd ff ff 90 0f 0b e9 6b fe ff ff 90 0f 0b 8d b6 00 00
[Thu Dec 19 14:03:21 2024] EAX: bb68e000 EBX: ffffff85 ECX: bb68f000 EDX: 0000007b
[Thu Dec 19 14:03:21 2024] ESI: bb68f000 EDI: 00000000 EBP: 5137fbd0 ESP: 5137fb80
[Thu Dec 19 14:03:21 2024] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010293
[Thu Dec 19 14:03:21 2024] CR0: 80050033 CR2: 0175a2d0 CR3: 03d1b720 CR4: 00350ef0
[Thu Dec 19 14:03:21 2024] Call Trace:
[Thu Dec 19 14:03:21 2024]  ? show_regs.cold+0x16/0x1b
[Thu Dec 19 14:03:21 2024]  ? iomap_file_buffered_write_punch_delalloc+0x398/0x440
[Thu Dec 19 14:03:21 2024]  ? __warn+0x87/0xe0
[Thu Dec 19 14:03:21 2024]  ? iomap_file_buffered_write_punch_delalloc+0x398/0x440
[Thu Dec 19 14:03:21 2024]  ? iomap_file_buffered_write_punch_delalloc+0x398/0x440
[Thu Dec 19 14:03:21 2024]  ? report_bug+0xe5/0x170
[Thu Dec 19 14:03:21 2024]  ? exc_overflow+0x60/0x60
[Thu Dec 19 14:03:21 2024]  ? handle_bug+0x2a/0x50
[Thu Dec 19 14:03:21 2024]  ? exc_invalid_op+0x1e/0x70
[Thu Dec 19 14:03:21 2024]  ? handle_exception+0x101/0x101
[Thu Dec 19 14:03:21 2024]  ? exc_overflow+0x60/0x60
[Thu Dec 19 14:03:21 2024]  ? iomap_file_buffered_write_punch_delalloc+0x398/0x440
[Thu Dec 19 14:03:21 2024]  ? exc_overflow+0x60/0x60
[Thu Dec 19 14:03:21 2024]  ? iomap_file_buffered_write_punch_delalloc+0x398/0x440
[Thu Dec 19 14:03:21 2024]  ? xfs_dax_write_iomap_end+0xa0/0xa0
[Thu Dec 19 14:03:21 2024]  ? xfs_buffered_write_iomap_end+0x52/0xc0
[Thu Dec 19 14:03:21 2024]  ? xfs_buffered_write_iomap_end+0xc0/0xc0
[Thu Dec 19 14:03:21 2024]  ? iomap_iter+0xce/0x4b0
[Thu Dec 19 14:03:21 2024]  ? xfs_dax_write_iomap_end+0xa0/0xa0
[Thu Dec 19 14:03:21 2024]  ? iomap_file_buffered_write+0xa9/0x420
[Thu Dec 19 14:03:21 2024]  ? xfs_file_buffered_write+0x9d/0x2e0
[Thu Dec 19 14:03:21 2024]  ? xfs_file_write_iter+0xc9/0x100
[Thu Dec 19 14:03:21 2024]  ? fileio_exec_async+0x25e/0x3a0 [scst_vdisk]
[Thu Dec 19 14:03:21 2024]  ? fileio_exec_write+0x2ce/0x400 [scst_vdisk]
[Thu Dec 19 14:03:21 2024]  ? vdev_do_job+0x36/0xe0 [scst_vdisk]
[Thu Dec 19 14:03:21 2024]  ? fileio_exec+0x1f/0x30 [scst_vdisk]
[Thu Dec 19 14:03:21 2024]  ? scst_do_real_exec+0x51/0x130 [scst]
[Thu Dec 19 14:03:21 2024]  ? scst_exec_check_blocking+0xa8/0x220 [scst]
[Thu Dec 19 14:03:21 2024]  ? scst_process_active_cmd+0x200/0x18f0 [scst]
[Thu Dec 19 14:03:21 2024]  ? scst_cmd_thread+0x15c/0x500 [scst]
[Thu Dec 19 14:03:21 2024]  ? prepare_to_wait_event+0x160/0x160
[Thu Dec 19 14:03:21 2024]  ? kthread+0xd2/0x100
[Thu Dec 19 14:03:21 2024]  ? scst_cmd_done_local+0x90/0x90 [scst]
[Thu Dec 19 14:03:21 2024]  ? kthread_complete_and_exit+0x20/0x20
[Thu Dec 19 14:03:21 2024]  ? ret_from_fork+0x1c/0x28
[Thu Dec 19 14:03:21 2024] ---[ end trace 0000000000000000 ]---

Thanks in advance for investigating,
Mike
Comment 4 marco.nelissen 2024-12-30 21:26:30 UTC
I think this might be the same problem I'm running into, which only seems
to happen on 32-bit kernels, starting with commit f43dc4dc3eff0.
The problem is still present in 6.13.

Easy repro steps are as follows:

hash mkfs.xfs || apt install -y xfsprogs
rm -f xfsimg.bin
truncate -s 6G xfsimg.bin  # I can reproduce this with an xfs image, but not with ext4
mkfs.xfs xfsimg.bin
mkdir -p xfs
mount xfsimg.bin xfs
truncate -s 5G xfs/diskimg.bin
mkfs.ext4 xfs/diskimg.bin  # this can probably be any fs type, it happens with fat too
mkdir -p mnt
mount xfs/diskimg.bin mnt
dd if=/dev/zero of=mnt/file.bin bs=1M

The above almost immediately prints a warning to the kernel log, after
which there is a kworker thread hogging the CPU.
Comment 5 marco.nelissen 2024-12-31 22:11:36 UTC
With the big caveat that I'm completely unfamiliar with this code, it seems
to me the problem is that here:
https://github.com/torvalds/linux/blame/ccb98ccef0e543c2bd4ef1a72270461957f3d8d0/mm/filemap.c#L2989
"bsz" is a 32-bit type on 32-bit kernels, and so when it gets used later
in that same function to mask the 64-bit "start" value with "~(bsz - 1)",
it's effectively truncating "start" to 32 bits.
This is more or less confirmed by the actual values of "start_byte" and
"punch_start_byte" when that WARN_ON_ONCE in buffered-io.c triggers, with
one being (close to) a 32-bit truncated version of the other.
Changing bsz to a 64-bit type fixes the problem for me.
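
To illustrate the truncation described above, here is a minimal user-space sketch in C. It is not the kernel code: the helper names next_block_narrow() and next_block_wide() and the sample offset are hypothetical, and "bsz" is modelled as uint32_t to stand in for a type that is only 32 bits wide on a 32-bit kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): round "start" up to the next
 * block boundary using a 32-bit block size, as a 32-bit kernel would.
 */
static int64_t next_block_narrow(int64_t start, uint32_t bsz)
{
	/* The mask trick only works for power-of-two block sizes. */
	assert(bsz != 0 && (bsz & (bsz - 1)) == 0);
	/*
	 * ~(bsz - 1) is evaluated in 32 bits and then zero-extended to
	 * 64 bits, so the mask clears every bit above bit 31 of the sum.
	 */
	return (start + bsz) & ~(bsz - 1);
}

/*
 * Same computation, but the block size is widened to 64 bits before the
 * mask is built, so the upper half of the offset survives.
 */
static int64_t next_block_wide(int64_t start, uint32_t bsz)
{
	assert(bsz != 0 && (bsz & (bsz - 1)) == 0);
	return (start + bsz) & ~((int64_t)bsz - 1);
}
```

With a hypothetical offset of 0x1A733E000 and a 4096-byte block size, the narrow variant yields 0xA733F000 while the wide variant yields 0x1A733F000. For offsets below 4 GiB the two variants agree, which would be consistent with the problem only showing up on large files.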
Comment 6 Mike-SPC 2025-01-18 14:00:37 UTC
I am experiencing the same issue on a 32-bit system when using Kernel version 6.1.92 and above. 

As a non-programmer, I find this problem challenging to address independently.

It would be greatly appreciated if a fix could be provided in the form of a patch. Could the maintainers consider releasing one?

Thank you!
Comment 7 marco.nelissen 2025-01-18 23:51:39 UTC
On Sat, Jan 18, 2025 at 6:00 AM <bugzilla-daemon@kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=219504
>
> --- Comment #6 from Mike-SPC (speedcracker@hotmail.com) ---
> I am experiencing the same issue on a 32-bit system when using Kernel version
> 6.1.92 and above.
>
> As a non-programmer, I find this problem challenging to address
> independently.
>
> It would be greatly appreciated if a fix could be provided in the form of a
> patch. Could the maintainers consider releasing one?

There are patches for 2 separate 32-bit issues, both of which are in linux-next,
though only one of them appears to have been selected for 6.1, 6.6 and 6.12.
These patches are:
https://lore.kernel.org/linux-xfs/20250102190540.1356838-1-marco.nelissen@gmail.com/
https://lore.kernel.org/linux-xfs/20250109041253.2494374-1-marco.nelissen@gmail.com/
Comment 8 Mike-SPC 2025-01-28 20:00:06 UTC
(In reply to marco.nelissen from comment #7)

> There are patches for 2 separate 32-bit issues, both of which are in
> linux-next,
> though only one of them appears to have been selected for 6.1, 6.6 and 6.12.
> These patches are:
> https://lore.kernel.org/linux-xfs/20250102190540.1356838-1-marco.nelissen@gmail.com/
> https://lore.kernel.org/linux-xfs/20250109041253.2494374-1-marco.nelissen@gmail.com/

It looks like both patches are in kernel 6.1.127.
When I have time again, I will test kernel 6.1.127 (or higher) and report back.
Thanks so far.
Comment 9 Mike-SPC 2025-02-03 10:03:45 UTC
(In reply to marco.nelissen from comment #7)
> On Sat, Jan 18, 2025 at 6:00 AM <bugzilla-daemon@kernel.org> wrote:

> There are patches for 2 separate 32-bit issues, both of which are in
> linux-next,
> though only one of them appears to have been selected for 6.1, 6.6 and 6.12.
> These patches are:
> https://lore.kernel.org/linux-xfs/20250102190540.1356838-1-marco.nelissen@gmail.com/
> https://lore.kernel.org/linux-xfs/20250109041253.2494374-1-marco.nelissen@gmail.com/

Hi there,

I've tested the latest kernel (6.1.128) and it works as it should. :)
Thanks again for investigating and fixing the bug.

Regards,
Mike
