Created attachment 301008 [details]
output from dmesg after echo w > /proc/sysrq-trigger

Overview:

When I try to extract an uncompressed tar archive (2.6 million files, 760.3 GiB in size) onto a newly created (empty) XFS file system, the process hangs in iowait indefinitely after the first few tens of gigabytes have been extracted. One CPU core is 100% occupied with iowait, the other core is idle (on a 2-core Intel Celeron G1610T).

The kernel is compiled with my own .config file. When I try this with a more "standard" kernel, the problem is not reproducible.

Steps to Reproduce:

1) compile the kernel with the attached .config
2) reboot into this kernel
3) create a new XFS filesystem on a spare drive (just mkfs.xfs -f <dev>)
4) mount this new file system
5) extract a large amount of data there

Actual results:

After 20-40 GiB written, the process hangs in iowait indefinitely and never finishes extracting the archive.

Expected Results:

Archive extraction continues smoothly until done.

Build Date & Hardware:

2022-05-01 on HP ProLiant MicroServer Gen8, 4 GB ECC RAM

Additional Information:

No other filesystem tested with the same archive on the same hardware before or after this (ext2, ext3, ext4, reiserfs3, jfs, nilfs2, f2fs, btrfs, zfs) has shown this behavior. When I downgraded the kernel to 5.10.109, XFS started working again. Kernel versions higher than 5.15 seem to be affected; I tried 5.17.1, 5.17.6 and 5.18.0-rc7, and they all hang after a few minutes.

No error is reported to the system log or to dmesg when the process hangs, and nothing shows on stdout or stderr of the tar process either.

This is not an SMR problem. None of the disks present in the test setup are SMR. All are CMR, and while they certainly are not brand new, they are all in good working condition.

Attached is the dmesg output after issuing this command: echo w > /proc/sysrq-trigger

More can be found here: https://forums.gentoo.org/viewtopic-p-8709116.html
Created attachment 301009 [details]
kernel config file
Please see https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F and supplement the bug report with the missing information. Thanks.
On Fri, May 20, 2022 at 11:56:06AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216007
>
>             Bug ID: 216007
>            Summary: XFS hangs in iowait when extracting large number of
>                     files
>            Product: File System
>            Version: 2.5
>     Kernel Version: 5.15.32
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@kernel-bugs.kernel.org
>           Reporter: bugzkernelorg8392@araxon.sk
>         Regression: No
>
> Created attachment 301008 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=301008&action=edit
> output from dmesg after echo w > /proc/sysrq-trigger
>
> Overview:
>
> When I try to extract an uncompressed tar archive (2.6 milion files, 760.3 GiB
> in size) on newly created (empty) XFS file system, after first low tens of
> gigabytes extracted the process hangs in iowait indefinitely. One CPU core is
> 100% occupied with iowait, the other CPU core is idle (on 2-core Intel Celeron
> G1610T).
>
> I have kernel compiled with my .config file. When I try this with a more
> "standard" kernel, the problem is not reproducible.
>
> Steps to Reproduce:
>
> 1) compile the kernel with the attached .config
> 2) reboot with this kernel
> 3) create a new XFS filesystem on a spare drive (just mkfs.xfs -f <dev>)
> 4) mount this new file system
> 5) try to extract large amount of data there
>
> Actual results:
>
> After 20-40 GiB written, the process hangs in iowait indefinitely, never
> finishing the archive extraction.

[  805.233836] task:tar             state:D stack:    0 pid: 2492 ppid:  2491 flags:0x00004000
[  805.233840] Call Trace:
[  805.233841]  <TASK>
[  805.233842]  __schedule+0x1c9/0x510
[  805.233846]  ? lock_timer_base+0x5c/0x80
[  805.233850]  schedule+0x3f/0xa0
[  805.233853]  schedule_timeout+0x7c/0xf0
[  805.233858]  ? init_timer_key+0x30/0x30
[  805.233862]  io_schedule_timeout+0x47/0x70
[  805.233866]  congestion_wait+0x79/0xd0
[  805.233872]  ? wait_woken+0x60/0x60
[  805.233876]  xfs_buf_alloc_pages+0xd0/0x1b0
[  805.233881]  xfs_buf_get_map+0x259/0x300
[  805.233886]  ? xfs_buf_item_init+0x150/0x160
[  805.233892]  xfs_trans_get_buf_map+0xa9/0x120
[  805.233897]  xfs_ialloc_inode_init+0x129/0x2d0
[  805.233901]  ? xfs_ialloc_ag_alloc+0x1df/0x630
[  805.233904]  xfs_ialloc_ag_alloc+0x1df/0x630
[  805.233908]  xfs_dialloc+0x1b4/0x720
[  805.233912]  xfs_create+0x1d7/0x450
[  805.233917]  xfs_generic_create+0x114/0x2d0
[  805.233922]  path_openat+0x510/0xe10
[  805.233925]  do_filp_open+0xad/0x150
[  805.233929]  ? xfs_blockgc_clear_iflag+0x93/0xb0
[  805.233932]  ? xfs_iunlock+0x52/0x90
[  805.233937]  do_sys_openat2+0x91/0x150
[  805.233942]  __x64_sys_openat+0x4e/0x90
[  805.233946]  do_syscall_64+0x43/0x90
[  805.233952]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  805.233959] RIP: 0033:0x7f763ccc9572
[  805.233962] RSP: 002b:00007ffef1391530 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  805.233966] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f763ccc9572
[  805.233969] RDX: 00000000000809c1 RSI: 000055b1d5b19270 RDI: 0000000000000004
[  805.233971] RBP: 0000000000000180 R08: 000000000000c0c0 R09: 000055b1d5b145f0
[  805.233973] R10: 0000000000000180 R11: 0000000000000246 R12: 0000000000000000
[  805.233974] R13: 00000000000809c1 R14: 000055b1d5b19270 R15: 000055b1d59d2248
[  805.233977]  </TASK>

It's waiting on memory allocation, which is probably waiting on IO completion
somewhere to clean dirty pages.
This suggests there's a problem with the storage hardware, the storage stack
below XFS, or an issue with memory cleaning/reclaim stalling and not making
progress.

> Expected Results:
>
> Archive extraction continues smoothly until done.
>
> Build Date & Hardware:
>
> 2022-05-01 on HP ProLiant MicroServer Gen8, 4GB ECC RAM
>
> Additional Information:
>
> No other filesystem tested with the same archive on the same hardware before or
> after this (ext2, ext3, ext4, reiserfs3, jfs, nilfs2, f2fs, btrfs, zfs) has
> shown this behavior. When I downgraded the kernel to 5.10.109, the XFS started
> working again. Kernel versions higher than 5.15 seem to be affected, I tried
> 5.17.1, 5.17.6 and 5.18.0-rc7, they all hang up after a few minutes.

Doesn't actually look like an XFS problem from the evidence supplied, though.

What sort of storage subsystem does this machine have? If it's a spinning
disk then you've probably just filled memory

> More could be found here: https://forums.gentoo.org/viewtopic-p-8709116.html

Oh, wait:

"I compiled a more mainstream version of sys-kernel/gentoo-sources-5.15.32-r1
(removed my .config file and let genkernel to fill it with default options)
and lo and behold, in this kernel I could not make it go stuck anymore.
[....]
However, after I altered my old kernel config to contain these values and
rebooting, I'm still triggering the bug. It may not be a XFS issue after all."

From the evidence presented, I'd agree that this doesn't look like an XFS
problem, either.

Cheers,

Dave.
> What sort of storage subsystem does this machine have? If it's a spinning
> disk then you've probably just filled memory

Yes, all the disks are classic spinning CMR disks. But out of all the file
systems tried, only XFS is doing this on the test machine. I can trigger this
behavior every time, and kernels 5.10 and below still work, even with my
non-standard .config.

Here is the memory situation when it is stuck:

ftp-back ~ # free
              total        used        free      shared  buff/cache   available
Mem:        3995528      175872       69240         416     3750416     3763584
Swap:       2097084           0     2097084

I will try to put together another test machine on Monday, without using any
of the components from this one, just to rule out a hardware bug.

This may not be an XFS bug, but so far only XFS seems to suffer from it.
On Sat, May 21, 2022 at 05:14:36AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216007
>
> --- Comment #4 from Peter Pavlisko (bugzkernelorg8392@araxon.sk) ---
> > What sort of storage subsystem does this machine have? If it's a spinning
> > disk then you've probably just filled memory
>
> Yes, all the disks are classic spinning CMR disks. But, out of all file
> systems tried, only XFS is doing this on the test machine. I can trigger
> this behavior every time. And kernels from 5.10 and bellow still work,
> even with my non-standard .config.
>
> Here is the memory situation when it is stuck:
>
> ftp-back ~ # free
>               total        used        free      shared  buff/cache   available
> Mem:        3995528      175872       69240         416     3750416     3763584
> Swap:       2097084           0     2097084

Doesn't tell us a whole lot except for "no free memory to allocate
without reclaim". /proc/meminfo, /proc/vmstat and /proc/slabinfo
would tell us a lot more.

Also, knowing if you've tweaked things like dirty ratios, etc would
also be helpful...

> This may not be a XFS bug, but so far only XFS seems to suffer from it.

Not that uncommon, really. XFS puts a different load on the memory
allocation/reclaim and cache subsystems compared to other filesystems,
so XFS tends to trip over bugs that others don't.

Cheers,

Dave.
(In reply to Chris Murphy from comment #2) > Please see > https://xfs.org/index.php/XFS_FAQ#Q: > _What_information_should_I_include_when_reporting_a_problem.3F and > supplement the bug report with the missing information. Thanks. certainly! > kernel version (uname -a) 5.15.32-gentoo-r1 > xfsprogs version (xfs_repair -V) xfs_repair version 5.14.2 > number of CPUs 1 CPU, 2 cores (Intel Celeron G1610T @ 2.30GHz) > contents of /proc/meminfo (at the time of iowait hangup) MemTotal: 3995528 kB MemFree: 29096 kB MemAvailable: 3749216 kB Buffers: 19984 kB Cached: 3556248 kB SwapCached: 0 kB Active: 62888 kB Inactive: 3560968 kB Active(anon): 272 kB Inactive(anon): 47772 kB Active(file): 62616 kB Inactive(file): 3513196 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 2097084 kB SwapFree: 2097084 kB Dirty: 28 kB Writeback: 0 kB AnonPages: 47628 kB Mapped: 19540 kB Shmem: 416 kB KReclaimable: 199964 kB Slab: 286472 kB SReclaimable: 199964 kB SUnreclaim: 86508 kB KernelStack: 1984 kB PageTables: 1648 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 4094848 kB Committed_AS: 117448 kB VmallocTotal: 34359738367 kB VmallocUsed: 2764 kB VmallocChunk: 0 kB Percpu: 832 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB Hugetlb: 0 kB DirectMap4k: 20364 kB DirectMap2M: 4139008 kB > contents of /proc/mounts /dev/root / ext4 rw,noatime 0 0 devtmpfs /dev devtmpfs rw,nosuid,relatime,size=10240k,nr_inodes=498934,mode=755 0 0 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0 tmpfs /run tmpfs rw,nosuid,nodev,size=799220k,nr_inodes=819200,mode=755 0 0 sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 cgroup_root /sys/fs/cgroup tmpfs rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755 0 0 openrc /sys/fs/cgroup/openrc cgroup rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc 0 0 none /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0 mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime 0 0 binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec,relatime 0 0 /dev/sdc1 /mnt/test xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0 > contents of /proc/partitions major minor #blocks name 1 0 16384 ram0 1 1 16384 ram1 1 2 16384 ram2 1 3 16384 ram3 1 4 16384 ram4 1 5 16384 ram5 1 6 16384 ram6 1 7 16384 ram7 1 8 16384 ram8 1 9 16384 ram9 1 10 16384 ram10 1 11 16384 ram11 1 12 16384 ram12 1 13 16384 ram13 1 14 16384 ram14 1 15 16384 ram15 8 0 1953514584 sda 8 1 131072 sda1 8 2 2097152 sda2 8 3 1951285336 sda3 8 16 1953514584 sdb 8 17 131072 sdb1 8 18 2097152 sdb2 8 19 1951285336 sdb3 8 32 976762584 sdc 8 33 976761560 sdc1 11 0 1048575 sr0 9 3 1951285248 md3 9 2 2097088 md2 9 1 131008 md1 > RAID layout (hardware and/or software) software RAID1 on system disks (/dev/sda + /dev/sdb, ext4) no RAID on xfs disk (/dev/sdc, xfs) > LVM configuration no LVM > type of disks you are using CMR 3.5" SATA disks /dev/sda and /dev/sdb: WD Red 2TB (WDC WD20EFRX-68EUZN0) /dev/sdc: Seagate Constellation ES 1TB (MB1000GCEEK) > write cache status of drives not sure - how do I find out? 
> size of BBWC and mode it is running in

no HW RAID (the drive controller is set to AHCI mode)
no battery backup

> xfs_info output on the filesystem in question

meta-data=/dev/sdc1              isize=512    agcount=4, agsize=61047598 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=244190390, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=119233, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

> dmesg output showing all error messages and stack traces

already an attachment to this ticket
Furthermore, I was able to do a git bisect on this repository:

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

The offending commit is this one:

9ba0889e2272294bfbb5589b1b180ad2e782b2a4 is the first bad commit
commit 9ba0889e2272294bfbb5589b1b180ad2e782b2a4
Author: Dave Chinner <david@fromorbit.com>
Date:   Tue Jun 8 09:19:22 2021 -0700

    xfs: drop the AGI being passed to xfs_check_agi_freecount

    From: Dave Chinner <dchinner@redhat.com>

    Stephen Rothwell reported this compiler warning from linux-next:

    fs/xfs/libxfs/xfs_ialloc.c: In function 'xfs_difree_finobt':
    fs/xfs/libxfs/xfs_ialloc.c:2032:20: warning: unused variable 'agi' [-Wunused-variable]
     2032 |         struct xfs_agi          *agi = agbp->b_addr;

    Which is fallout from agno -> perag conversions that were done in
    this function. xfs_check_agi_freecount() is the only user of "agi"
    in xfs_difree_finobt() now, and it only uses the agi to get the
    current free inode count. We hold that in the perag structure, so
    there's not need to directly reference the raw AGI to get this
    information.

    The btree cursor being passed to xfs_check_agi_freecount() has a
    reference to the perag being operated on, so use that directly in
    xfs_check_agi_freecount() rather than passing an AGI.

    Fixes: 7b13c5155182 ("xfs: use perag for ialloc btree cursors")
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

 fs/xfs/libxfs/xfs_ialloc.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)
(In reply to Peter Pavlisko from comment #7)
> Furthermore, I was able to do a git bisect on this repository:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
>
> The offending commit is this one:
>
> 9ba0889e2272294bfbb5589b1b180ad2e782b2a4 is the first bad commit
> commit 9ba0889e2272294bfbb5589b1b180ad2e782b2a4
> Author: Dave Chinner <david@fromorbit.com>
> Date:   Tue Jun 8 09:19:22 2021 -0700
>
>     xfs: drop the AGI being passed to xfs_check_agi_freecount
>
>     From: Dave Chinner <dchinner@redhat.com>
>
>     Stephen Rothwell reported this compiler warning from linux-next:
>
>     fs/xfs/libxfs/xfs_ialloc.c: In function 'xfs_difree_finobt':
>     fs/xfs/libxfs/xfs_ialloc.c:2032:20: warning: unused variable 'agi'
>     [-Wunused-variable]
>      2032 |         struct xfs_agi          *agi = agbp->b_addr;
>
>     Which is fallout from agno -> perag conversions that were done in
>     this function. xfs_check_agi_freecount() is the only user of "agi"
>     in xfs_difree_finobt() now, and it only uses the agi to get the
>     current free inode count. We hold that in the perag structure, so
>     there's not need to directly reference the raw AGI to get this
>     information.
>
>     The btree cursor being passed to xfs_check_agi_freecount() has a
>     reference to the perag being operated on, so use that directly in
>     xfs_check_agi_freecount() rather than passing an AGI.
>
>     Fixes: 7b13c5155182 ("xfs: use perag for ialloc btree cursors")
>     Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
>     Signed-off-by: Dave Chinner <dchinner@redhat.com>
>     Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
>     Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>     Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>
>  fs/xfs/libxfs/xfs_ialloc.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)

I'm sorry, I made a mistake somewhere while bisecting... This is not the
commit. Disregard the quoted message, please. I will try bisecting again.
Shot in the dark: can you try the latest plain 5.15.x + 9a5280b312e2, aka:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/xfs?id=9a5280b312e2e7898b6397b2ca3cfd03f67d7be1

I suspect your stack traces are not really representative of what's going on...
(In reply to Peter Pavlisko from comment #6)
> (In reply to Chris Murphy from comment #2)
> > number of CPUs
>
> 1 CPU, 2 cores (Intel Celeron G1610T @ 2.30GHz)

OK, not much CPU power :)

> > contents of /proc/meminfo
>
> (at the time of iowait hangup)
>
> MemTotal:        3995528 kB
> MemFree:           29096 kB
> MemAvailable:    3749216 kB
> Buffers:           19984 kB
> Cached:          3556248 kB
> SwapCached:            0 kB
> Active:            62888 kB
> Inactive:        3560968 kB
> Active(anon):        272 kB
> Inactive(anon):    47772 kB
> Active(file):      62616 kB
> Inactive(file):  3513196 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:       2097084 kB
> SwapFree:        2097084 kB
> Dirty:                28 kB
> Writeback:             0 kB

Interestingly, basically no dirty memory or memory under writeback. Basically
everything is in the Inactive(file) list, which should be very easy to
reclaim. But given you say we make no progress in reclaiming memory, and from
the traces we see the process is hung waiting for memory allocation, it may be
that the page cache pages are pinned in memory by extra references or
something like that. The question is what could XFS be doing that clean page
cache cannot be reclaimed? I have no good answer to that...

...

> 8        0 1953514584 sda
> 8        1     131072 sda1
> 8        2    2097152 sda2
> 8        3 1951285336 sda3
> 8       16 1953514584 sdb
> 8       17     131072 sdb1
> 8       18    2097152 sdb2
> 8       19 1951285336 sdb3
> 8       32  976762584 sdc
> 8       33  976761560 sdc1
> 11       0    1048575 sr0
> 9        3 1951285248 md3
> 9        2    2097088 md2
> 9        1     131008 md1

Actually not that much overall memory compared to the disk sizes or the size
of the unpacked archive.

You've mentioned another kernel config does not exhibit the problem. Can you
perhaps post it here for comparison?
I bisected the issue again, double checking every step, and I arrived at
another commit. This looks more promising:

c9fa563072e13337713a441cf30171feb4e96e6d is the first bad commit
commit c9fa563072e13337713a441cf30171feb4e96e6d
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jun 1 13:40:36 2021 +1000

    xfs: use alloc_pages_bulk_array() for buffers

    Because it's more efficient than allocating pages one at a time in a
    loop.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

 fs/xfs/xfs_buf.c | 62 ++++++++++++++++++++++----------------------------------
 1 file changed, 24 insertions(+), 38 deletions(-)
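To connect this commit with the hang in the trace above: after that change,
xfs_buf_alloc_pages() loops around alloc_pages_bulk_array() and, when no
progress is made, sleeps in congestion_wait() and retries. A simplified
sketch of that loop (paraphrased from the 5.13-era code with error handling
and statistics omitted; not the verbatim kernel source):

/* Simplified sketch of the post-c9fa5630 retry loop (not verbatim kernel code). */
static int xfs_buf_alloc_pages_sketch(struct xfs_buf *bp, gfp_t gfp_mask)
{
	long filled = 0;

	for (;;) {
		long last = filled;

		/* Bulk allocator: may return with few or even zero new pages. */
		filled = alloc_pages_bulk_array(gfp_mask, bp->b_page_count,
						bp->b_pages);
		if (filled == bp->b_page_count)
			return 0;		/* array fully populated */

		if (filled != last)
			continue;		/* partial progress, just retry */

		/*
		 * No progress at all: back off and retry forever.  If the
		 * bulk allocator never wakes reclaim, nothing frees memory
		 * and this loop cannot terminate -- which matches the
		 * congestion_wait() frame in the reported trace.
		 */
		congestion_wait(BLK_RW_ASYNC, HZ / 50);
	}
}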
Interesting. Looks like a bug in the bulk allocator that it is not able to make progress when reclaim is needed? From a quick look I don't see kswapd being kicked or direct reclaim happening from the bulk allocator so that could be the reason. Adding Mel to CC, maybe he has some ideas.
(In reply to Dave Chinner from comment #5) > Doesn't tell us a whole lot except for "no free memory to allocate > without reclaim". /proc/meminfo, /proc/vmstat and /proc/slabinfo > would tell us a lot more. Notice: all values in this comment are from the kernel version 5.13.0-rc2+, commit c9fa563072e1, recorded at the time of the endless iowait > /proc/meminfo MemTotal: 3996092 kB MemFree: 42176 kB MemAvailable: 3668964 kB Buffers: 15684 kB Cached: 3400932 kB SwapCached: 0 kB Active: 58840 kB Inactive: 3405468 kB Active(anon): 268 kB Inactive(anon): 47836 kB Active(file): 58572 kB Inactive(file): 3357632 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 2097084 kB SwapFree: 2097084 kB Dirty: 0 kB Writeback: 0 kB AnonPages: 47700 kB Mapped: 19504 kB Shmem: 412 kB KReclaimable: 266460 kB Slab: 404388 kB SReclaimable: 266460 kB SUnreclaim: 137928 kB KernelStack: 1904 kB PageTables: 1672 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 4095128 kB Committed_AS: 117700 kB VmallocTotal: 34359738367 kB VmallocUsed: 2732 kB VmallocChunk: 0 kB Percpu: 896 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB Hugetlb: 0 kB DirectMap4k: 20364 kB DirectMap2M: 4139008 kB > /proc/vmstat nr_free_pages 10607 nr_zone_inactive_anon 11978 nr_zone_active_anon 67 nr_zone_inactive_file 839409 nr_zone_active_file 14643 nr_zone_unevictable 0 nr_zone_write_pending 0 nr_mlock 0 nr_bounce 0 nr_free_cma 0 nr_inactive_anon 11978 nr_active_anon 67 nr_inactive_file 839409 nr_active_file 14643 nr_unevictable 0 nr_slab_reclaimable 66615 nr_slab_unreclaimable 34482 nr_isolated_anon 0 nr_isolated_file 0 workingset_nodes 130210 workingset_refault_anon 0 workingset_refault_file 1345 workingset_activate_anon 0 workingset_activate_file 24 workingset_restore_anon 0 workingset_restore_file 0 workingset_nodereclaim 208053 nr_anon_pages 11931 nr_mapped 4876 nr_file_pages 854155 nr_dirty 0 nr_writeback 0 nr_writeback_temp 0 nr_shmem 103 nr_shmem_hugepages 0 nr_shmem_pmdmapped 0 nr_file_hugepages 0 nr_file_pmdmapped 0 nr_anon_transparent_hugepages 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 8243 nr_dirtied 9075788 nr_written 9075787 nr_kernel_misc_reclaimable 0 nr_foll_pin_acquired 0 nr_foll_pin_released 0 nr_kernel_stack 1904 nr_page_table_pages 418 nr_swapcached 0 nr_dirty_threshold 171292 nr_dirty_background_threshold 85541 pgpgin 36279336 pgpgout 36908703 pswpin 0 pswpout 0 pgalloc_dma 1473 pgalloc_dma32 18299092 pgalloc_normal 325690 pgalloc_movable 0 allocstall_dma 0 allocstall_dma32 0 allocstall_normal 0 allocstall_movable 5 pgskip_dma 0 pgskip_dma32 0 pgskip_normal 35 pgskip_movable 0 pgfree 18637453 pgactivate 20445 pgdeactivate 266 pglazyfree 0 pgfault 397160 pgmajfault 585 pglazyfreed 0 pgrefill 34 pgreuse 87282 pgsteal_kswapd 17283321 pgsteal_direct 1048 pgscan_kswapd 17292764 pgscan_direct 1048 pgscan_direct_throttle 0 pgscan_anon 13 pgscan_file 17293799 pgsteal_anon 0 pgsteal_file 17284369 pginodesteal 0 slabs_scanned 212096 kswapd_inodesteal 0 kswapd_low_wmark_hit_quickly 11222 kswapd_high_wmark_hit_quickly 48 pageoutrun 11290 pgrotated 8478 drop_pagecache 0 drop_slab 0 oom_kill 0 htlb_buddy_alloc_success 0 htlb_buddy_alloc_fail 0 unevictable_pgs_culled 0 unevictable_pgs_scanned 0 unevictable_pgs_rescued 0 unevictable_pgs_mlocked 0 unevictable_pgs_munlocked 0 unevictable_pgs_cleared 0 unevictable_pgs_stranded 0 swap_ra 0 swap_ra_hit 0 direct_map_level2_splits 1 direct_map_level3_splits 0 nr_unstable 0 > /proc/slabinfo too big, I will upload it as an 
attachment

> Also, knowing if you've tweaked things like dirty ratios, etc would
> also be helpful...

I didn't tweak any sysctl value, but I can't say whether the defaults in
Gentoo are the same as in other Linux flavors.
Created attachment 301026 [details]
contents of the /proc/slabinfo
(In reply to Holger Hoffstätte from comment #9)
> Shot in the dark: can you try the latest plain 5.15.x + 9a5280b312e2
> aka:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/xfs?id=9a5280b312e2e7898b6397b2ca3cfd03f67d7be1
>
> I suspect your stack traces are not really representative of what's
> going on..

I just tried commit 9a5280b312e2 (5.18.0-rc2+). The bug is still there.
(In reply to Jan Kara from comment #10)
> You've mentioned another kernel config does not exhibit the problem. Can you
> perhaps post it here for comparison?

Certainly... but I'm afraid the differences between the two .config files are
too big to be of any use. I'll attach it shortly.
Created attachment 301027 [details]
.config file of a kernel unaffected by this bug
Created attachment 301044 [details]
Patch to always allocate at least one page

Hi Peter,

Could you try the attached patch against 5.18 please? I was unable to
reproduce the problem but I think what's happening is that an array for
receiving a bulk allocation is partially populated and the bulk allocator is
returning without allocating at least one page. Allocating even one page
should hit the path where kswapd is woken.
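For readers following along, a conceptual sketch of the idea Mel describes
(an illustration of the approach only, not the attached mm/page_alloc.c
patch; the function name and the way nr_populated is passed in are made up
for the sketch): when the bulk fast path finishes without filling the array,
fall back to a single __alloc_pages() call, which is allowed to enter the
slow path and wake kswapd.

/*
 * Conceptual illustration only -- not the attached patch.
 */
static unsigned long bulk_alloc_fallback(gfp_t gfp, int preferred_nid,
					 unsigned long nr_pages,
					 unsigned long nr_populated,
					 struct page **page_array)
{
	struct page *page;

	if (nr_populated >= nr_pages)
		return nr_populated;	/* array already full, nothing to do */

	/*
	 * Single-page allocation goes through the normal slow path, which
	 * can wake kswapd and perform reclaim -- exactly what never
	 * happened when the bulk fast path bailed out without progress.
	 */
	page = __alloc_pages(gfp, 0, preferred_nid, NULL);
	if (page)
		page_array[nr_populated++] = page;

	return nr_populated;
}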
(In reply to Mel Gorman from comment #18)
> Created attachment 301044 [details]
> Patch to always allocate at least one page
>
> Hi Peter,
>
> Could you try the attached patch against 5.18 please? I was unable to
> reproduce the problem but I think what's happening is that an array for
> receiving a bulk allocation is partially populated and the bulk allocator is
> returning without allocating at least one page. Allocating even one page
> should hit the path where kswapd is woken.

Hi Mel,

I tried this patch and it does indeed work with 5.18.0-rc7. Without the patch
it freezes; after I apply the patch the archive extracts flawlessly.
(In reply to Peter Pavlisko from comment #19)
> (In reply to Mel Gorman from comment #18)
> > Created attachment 301044 [details]
> > Patch to always allocate at least one page
> >
> > Hi Peter,
> >
> > Could you try the attached patch against 5.18 please? I was unable to
> > reproduce the problem but I think what's happening is that an array for
> > receiving a bulk allocation is partially populated and the bulk allocator
> > is returning without allocating at least one page. Allocating even one
> > page should hit the path where kswapd is woken.
>
> Hi Mel,
>
> I tried this patch and it does indeed work with 5.18.0-rc7. Without the
> patch it freezes, after I apply the patch the archive extracts flawlessly.

Thanks Peter, I'll prepare a proper patch and post it today. You won't be
cc'd as I only have the bugzilla email alias for you, but I'll post a
lore.kernel.org link here.
(In reply to Mel Gorman from comment #20)
> (In reply to Peter Pavlisko from comment #19)
> > (In reply to Mel Gorman from comment #18)
> > > Created attachment 301044 [details]
> > > Patch to always allocate at least one page
> > >
> > > Hi Peter,
> > >
> > > Could you try the attached patch against 5.18 please? I was unable to
> > > reproduce the problem but I think what's happening is that an array for
> > > receiving a bulk allocation is partially populated and the bulk
> > > allocator is returning without allocating at least one page. Allocating
> > > even one page should hit the path where kswapd is woken.
> >
> > Hi Mel,
> >
> > I tried this patch and it does indeed work with 5.18.0-rc7. Without the
> > patch it freezes, after I apply the patch the archive extracts flawlessly.
>
> Thanks Peter, I'll prepare a proper patch and post it today. You won't be
> cc'd as I only have the bugzilla email alias for you but I'll post a
> lore.kernel.org link here.

Thank you very much.

I don't know if this is the proper place to discuss this, but I am curious
about the cause. Was it an issue with the way XFS is calling the allocator in
a for(;;) loop when it does not get the expected result? Or was it an issue
with the allocator itself not working in some obscure edge case? Or was it my
.config, stretching the kernel's ability to function in a bad way?
(In reply to Mel Gorman from comment #20)
> (In reply to Peter Pavlisko from comment #19)
> > (In reply to Mel Gorman from comment #18)
> > > Created attachment 301044 [details]
> > > Patch to always allocate at least one page
> > >
> > > Hi Peter,
> > >
> > > Could you try the attached patch against 5.18 please? I was unable to
> > > reproduce the problem but I think what's happening is that an array for
> > > receiving a bulk allocation is partially populated and the bulk
> > > allocator is returning without allocating at least one page. Allocating
> > > even one page should hit the path where kswapd is woken.
> >
> > Hi Mel,
> >
> > I tried this patch and it does indeed work with 5.18.0-rc7. Without the
> > patch it freezes, after I apply the patch the archive extracts flawlessly.
>
> Thanks Peter, I'll prepare a proper patch and post it today. You won't be
> cc'd as I only have the bugzilla email alias for you but I'll post a
> lore.kernel.org link here.

https://lore.kernel.org/all/20220526091210.GC3441@techsingularity.net/
(In reply to Peter Pavlisko from comment #21)
> (In reply to Mel Gorman from comment #20)
> > (In reply to Peter Pavlisko from comment #19)
> > > (In reply to Mel Gorman from comment #18)
> > > > Created attachment 301044 [details]
> > > > Patch to always allocate at least one page
> > > >
> > > > Hi Peter,
> > > >
> > > > Could you try the attached patch against 5.18 please? I was unable to
> > > > reproduce the problem but I think what's happening is that an array
> > > > for receiving a bulk allocation is partially populated and the bulk
> > > > allocator is returning without allocating at least one page.
> > > > Allocating even one page should hit the path where kswapd is woken.
> > >
> > > Hi Mel,
> > >
> > > I tried this patch and it does indeed work with 5.18.0-rc7. Without the
> > > patch it freezes, after I apply the patch the archive extracts
> > > flawlessly.
> >
> > Thanks Peter, I'll prepare a proper patch and post it today. You won't be
> > cc'd as I only have the bugzilla email alias for you but I'll post a
> > lore.kernel.org link here.
>
> Thank you very much.
>
> I don't know if this is the proper place to discuss this, but I am curious
> about the cause. Was it an issue with the way XFS is calling the allocator
> in a for(;;) loop when it does not get the expected result? Or was it an
> issue with the allocator itself not working in some obscure edge case? Or
> was it my .config, stretching the kernel ability to function in a bad way?

I think blaming XFS would be excessive, even though the API says it only
attempts to allocate the requested number of pages with no guarantee that the
exact number will be returned. A glance at the implementation would show it
was trying to return at least one page, and the code flow of XFS hints that
the XFS developers expected that some progress would generally be made or
kswapd would be woken as appropriate. The original intention was that the
caller did not necessarily need all the pages, but that's not true for XFS or
NFS. While I could have altered XFS, it would have encouraged boiler-plate
code to be created for NFS or any other user of the API that requires the
exact number of pages to be returned. The allocator was working as intended,
but not necessarily working as desired for callers.

I don't think your .config is at fault. While I could not reproduce the
problem, I could force the problem to occur by injecting failures. It's
possible that the problem is easier to trigger on your particular machine,
but the corner case has existed since 5.13. It's an interesting coincidence
that a similar problem was found on NFS at roughly the same time, which might
indicate that something else changed to make this corner case easier to
trigger, but the corner case was always there.
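To illustrate the "boiler-plate" Mel mentions, here is a hypothetical
caller-side wrapper (the function name is made up and this code was never
proposed or merged): every caller that needs the exact page count would have
to bolt its own single-page fallback onto the bulk allocator, duplicating
logic that the allocator-side fix now provides in one place.

/*
 * Hypothetical caller-side wrapper -- illustrative only.
 */
static unsigned long fill_page_array(gfp_t gfp, unsigned long nr_pages,
				     struct page **pages)
{
	unsigned long filled;

	/* Fast bulk attempt; may legitimately return fewer pages. */
	filled = alloc_pages_bulk_array(gfp, nr_pages, pages);

	/* Fall back to single-page allocations, which can enter reclaim. */
	while (filled < nr_pages) {
		struct page *page = alloc_page(gfp);

		if (!page)
			break;
		pages[filled++] = page;
	}

	return filled;
}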
Hi,

I can reproduce the issue with the ext4 filesystem. I am checking at the
moment with different kernels from the Gentoo distro:

5.15.41
5.17.1
5.17.5
5.18.1-r2 (the patch should already be applied here)
I have applied the patch on a vanilla 5.18.1 kernel and the issue is not
resolved on ext4. Here's the output of dmesg:

[Sun Jun  5 09:44:51 2022] sysrq: Show Blocked State
[Sun Jun  5 09:44:51 2022] task:kworker/u24:11  state:D stack:    0 pid:  145 ppid:     2 flags:0x00004000
[Sun Jun  5 09:44:51 2022] Workqueue: writeback wb_workfn (flush-8:32)
[Sun Jun  5 09:44:51 2022] Call Trace:
[Sun Jun  5 09:44:51 2022]  <TASK>
[Sun Jun  5 09:44:51 2022]  __schedule+0x26d/0xe80
[Sun Jun  5 09:44:51 2022]  ? __blk_flush_plug+0xe1/0x140
[Sun Jun  5 09:44:51 2022]  schedule+0x46/0xa0
[Sun Jun  5 09:44:51 2022]  io_schedule+0x3d/0x60
[Sun Jun  5 09:44:51 2022]  blk_mq_get_tag+0x115/0x2b0
[Sun Jun  5 09:44:51 2022]  ? destroy_sched_domains_rcu+0x20/0x20
[Sun Jun  5 09:44:51 2022]  __blk_mq_alloc_requests+0x157/0x290
[Sun Jun  5 09:44:51 2022]  blk_mq_submit_bio+0x188/0x520
[Sun Jun  5 09:44:51 2022]  submit_bio_noacct_nocheck+0x252/0x2a0
[Sun Jun  5 09:44:51 2022]  ext4_io_submit+0x1b/0x40
[Sun Jun  5 09:44:51 2022]  ext4_writepages+0x4fc/0xe80
[Sun Jun  5 09:44:51 2022]  ? ata_qc_issue+0x101/0x1b0
[Sun Jun  5 09:44:51 2022]  ? __ata_scsi_queuecmd+0x17a/0x380
[Sun Jun  5 09:44:51 2022]  ? update_load_avg+0x77/0x610
[Sun Jun  5 09:44:51 2022]  do_writepages+0xbc/0x1a0
[Sun Jun  5 09:44:51 2022]  ? __schedule+0x275/0xe80
[Sun Jun  5 09:44:51 2022]  __writeback_single_inode+0x2c/0x180
[Sun Jun  5 09:44:51 2022]  writeback_sb_inodes+0x204/0x490
[Sun Jun  5 09:44:51 2022]  __writeback_inodes_wb+0x47/0xd0
[Sun Jun  5 09:44:51 2022]  wb_writeback.isra.0+0x15d/0x1a0
[Sun Jun  5 09:44:51 2022]  wb_workfn+0x23d/0x3a0
[Sun Jun  5 09:44:51 2022]  process_one_work+0x1b0/0x300
[Sun Jun  5 09:44:51 2022]  worker_thread+0x48/0x3c0
[Sun Jun  5 09:44:51 2022]  ? rescuer_thread+0x380/0x380
[Sun Jun  5 09:44:51 2022]  kthread+0xd4/0x100
[Sun Jun  5 09:44:51 2022]  ? kthread_complete_and_exit+0x20/0x20
[Sun Jun  5 09:44:51 2022]  ret_from_fork+0x22/0x30
[Sun Jun  5 09:44:51 2022]  </TASK>
[Sun Jun  5 09:44:51 2022] task:tar             state:D stack:    0 pid:11576 ppid: 11566 flags:0x00000000
[Sun Jun  5 09:44:51 2022] Call Trace:
[Sun Jun  5 09:44:51 2022]  <TASK>
[Sun Jun  5 09:44:51 2022]  __schedule+0x26d/0xe80
[Sun Jun  5 09:44:51 2022]  ? __mod_timer+0x1c2/0x3b0
[Sun Jun  5 09:44:51 2022]  schedule+0x46/0xa0
[Sun Jun  5 09:44:51 2022]  schedule_timeout+0x7c/0xf0
[Sun Jun  5 09:44:51 2022]  ? fprop_fraction_percpu+0x2b/0x70
[Sun Jun  5 09:44:51 2022]  ? init_timer_key+0x30/0x30
[Sun Jun  5 09:44:51 2022]  io_schedule_timeout+0x47/0x70
[Sun Jun  5 09:44:51 2022]  balance_dirty_pages_ratelimited+0x235/0xa10
[Sun Jun  5 09:44:51 2022]  generic_perform_write+0x142/0x200
[Sun Jun  5 09:44:51 2022]  ext4_buffered_write_iter+0x76/0x100
[Sun Jun  5 09:44:51 2022]  new_sync_write+0x104/0x180
[Sun Jun  5 09:44:51 2022]  vfs_write+0x1e6/0x270
[Sun Jun  5 09:44:51 2022]  ksys_write+0x5a/0xd0
[Sun Jun  5 09:44:51 2022]  do_syscall_64+0x3b/0x90
[Sun Jun  5 09:44:51 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[Sun Jun  5 09:44:51 2022] RIP: 0033:0x7f994fe7c483
[Sun Jun  5 09:44:51 2022] RSP: 002b:00007ffe79956648 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[Sun Jun  5 09:44:51 2022] RAX: ffffffffffffffda RBX: 0000000000001084 RCX: 00007f994fe7c483
[Sun Jun  5 09:44:51 2022] RDX: 0000000000001084 RSI: 000055f5bb5a9a00 RDI: 0000000000000003
[Sun Jun  5 09:44:51 2022] RBP: 000055f5bb5a9a00 R08: 00007f994fe7c260 R09: 0000000000000000
[Sun Jun  5 09:44:51 2022] R10: 00000000000001a4 R11: 0000000000000246 R12: 0000000000000003
[Sun Jun  5 09:44:51 2022] R13: 0000000000000003 R14: 000055f5bb5b5320 R15: 000055f5bb5a9a00
[Sun Jun  5 09:44:51 2022]  </TASK>
Jordi, this is a very different issue (different filesystem, different stack
trace). Please don't hijack bugs like this. From what I can see, your stack
traces look very standard and we are just waiting for the disk to complete
the IO. If the IO does not make progress, please file a separate bug against
ext4. Thanks!

And I'm closing this bug, as it is already fixed.