The problem started when I symlinked a file in my /home/user to a file in my /root (both partitions are f2fs). After removing that symlink, the problem no longer seems to occur. In dmesg I don't see any error related to f2fs. My fstab entries:

UUID=... / f2fs defaults,noatime,nodiscard 0 1
UUID=... /home f2fs defaults,noatime,nodiscard 0 2
I was too optimistic; the problem reoccurred.

217 root 20 0 0.0m 99.3 0.0 12:28.48 R f2fs_gc-259:3

The only related error I see in the logs is:
30/05/22 13:13 kernel nvme nvme0: missing or invalid SUBNQN field.

I forced fsck on all partitions, but with no result.
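(In case it is useful to others: forcing the full check is just something like the line below, with the partition unmounted, e.g. from a live USB; the device name is only an example.)

# fsck.f2fs -f /dev/nvme0n1p3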
my f2fs mounts:

/dev/nvme0n1p3 on / type f2fs (rw,noatime,lazytime,background_gc=on,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,discard_unit=block)
/dev/nvme0n1p4 on /home type f2fs (rw,noatime,lazytime,background_gc=on,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,discard_unit=block)
/dev/nvme1n1 on /run/media/guido/nvme1 type f2fs (rw,noatime,lazytime,background_gc=on,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,compress_algorithm=lz4,compress_log_size=2,compress_mode=fs,compress_cache,discard_unit=block)
Same for me. Every morning my system is nonoperational because f2fs_gc takes 100% CPU... Mine is on LVM (2x NVMe):

/dev/mapper/vg_ssd-lv_root on / type f2fs (rw,noatime,lazytime,background_gc=on,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,nocheckpoint_merge,fsync_mode=posix,discard_unit=block)
/dev/mapper/vg_ssd-lv_home on /home type f2fs (rw,noatime,lazytime,background_gc=on,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,nocheckpoint_merge,fsync_mode=posix,discard_unit=block)
/dev/mapper/vg_ssd2-lv_backup on /mnt/backup type f2fs (rw,noatime,lazytime,background_gc=on,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,discard_unit=block)

-------

I cannot see any error in the journal or dmesg, except tasks timing out:

May 31 21:25:10 mefisto kernel: R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000015
May 31 21:25:10 mefisto kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000015
May 31 21:25:10 mefisto kernel: RBP: 000056041c75b640 R08: 0000000000000000 R09: 00007ffdc8d378c8
May 31 21:25:10 mefisto kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000015
May 31 21:25:10 mefisto kernel: RAX: ffffffffffffffda RBX: 000056041c6ff430 RCX: 00007effb2a12a5a
May 31 21:25:10 mefisto kernel: RSP: 002b:00007ffdc8d37820 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
May 31 21:25:10 mefisto kernel: RIP: 0033:0x7effb2a12a5a
May 31 21:25:10 mefisto kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
May 31 21:25:10 mefisto kernel: ? do_syscall_64+0x6b/0x90
May 31 21:25:10 mefisto kernel: ? syscall_exit_to_user_mode+0x26/0x50
May 31 21:25:10 mefisto kernel: ? exit_to_user_mode_prepare+0x111/0x140
May 31 21:25:10 mefisto kernel: ? do_syscall_64+0x6b/0x90
May 31 21:25:10 mefisto kernel: ? syscall_exit_to_user_mode+0x26/0x50
May 31 21:25:10 mefisto kernel: ? do_syscall_64+0x6b/0x90
May 31 21:25:10 mefisto kernel: ? syscall_exit_to_user_mode+0x26/0x50
May 31 21:25:10 mefisto kernel: do_syscall_64+0x5c/0x90
May 31 21:25:10 mefisto kernel: __x64_sys_fsync+0x3b/0x60
May 31 21:25:10 mefisto kernel: f2fs_do_sync_file+0x18b/0x920 [f2fs b9d199b68c91f6957b9756d57ec16e8bad3561b9]
May 31 21:25:10 mefisto kernel: f2fs_issue_checkpoint+0x143/0x200 [f2fs b9d199b68c91f6957b9756d57ec16e8bad3561b9]
May 31 21:25:10 mefisto kernel: ? terminate_walk+0x61/0xf0
May 31 21:25:10 mefisto kernel: rwsem_down_write_slowpath+0x1eb/0x5c0
May 31 21:25:10 mefisto kernel: schedule+0x4f/0xb0
May 31 21:25:10 mefisto kernel: __schedule+0x37c/0x11f0
May 31 21:25:10 mefisto kernel: <TASK>
May 31 21:25:10 mefisto kernel: Call Trace:
May 31 21:25:10 mefisto kernel: task:NetworkManager state:D stack: 0 pid: 3380 ppid: 1 flags:0x00004002
May 31 21:25:10 mefisto kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 21:25:10 mefisto kernel: Tainted: G U OE 5.18.0-arch1-1 #1
May 31 21:25:10 mefisto kernel: INFO: task NetworkManager:3380 blocked for more than 368 seconds.
May 31 21:23:07 mefisto kernel: </TASK>
May I ask you to share some f2fs debugging information?

# cat /sys/kernel/debug/f2fs/status
Created attachment 301085 [details] cat /sys/kernel/debug/f2fs/status
Sure! Attached.
dm-2: GC calls: 168 (BG: 169)
dm-3: GC calls: 10 (BG: 10)
dm-7: GC calls: 37 (BG: 64)

Mostly it looks healthy. Do you see the f2fs_gc thread running? May I ask for some function traces while that's happening?

# echo 1 > /sys/kernel/tracing/events/f2fs/enable
# echo 1 > /sys/kernel/tracing/tracing_on
# cat /sys/kernel/tracing/trace_pipe
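If the output grows too fast to follow interactively, capturing a few seconds of it to a file works too, for example:

# timeout 10 cat /sys/kernel/tracing/trace_pipe > /tmp/f2fs-trace.txt
# echo 0 > /sys/kernel/tracing/tracing_on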
Yes, it is running... I will collect some tracing for analysis...
Gosh, it is growing like crazy... After a second or two the file is already too big to be attached... Here it is, shared: https://seafile.cryptohell.net/f/236fb102b26648bb97e5/?dl=1
Created attachment 301086 [details] f2fs status I attach my f2fs_status
(In reply to Guido from comment #10)
> Created attachment 301086 [details]
> f2fs status
>
> I attach my f2fs_status

- ATGC data: 4304 4304 4304

It looks like you have enabled atgc via a mount option? Could you please share your mount options?
Yes, I have atgc now, but I added it only after the problem appeared. The original mount options were: https://bugzilla.kernel.org/show_bug.cgi?id=216050#c2
P.S. Do you see any problem? I ran fsck to be safe.
Re #9,

Thank you so much.

f2fs_gc-254:2-1156 [013] ..... 22290.243361: f2fs_lookup_extent_tree_end: dev = (254,2), ino = 1409621, pgofs = 234, ext_info(fofs: 0, blk: 0, len: 0)
f2fs_gc-254:2-1156 [013] ..... 22290.243362: f2fs_lookup_extent_tree_start: dev = (254,2), ino = 1409621, pgofs = 234
f2fs_gc-254:2-1156 [013] ..... 22290.243362: f2fs_lookup_extent_tree_end: dev = (254,2), ino = 1409621, pgofs = 234, ext_info(fofs: 0, blk: 0, len: 0)
f2fs_gc-254:2-1156 [013] ..... 22290.243362: f2fs_lookup_extent_tree_start: dev = (254,2), ino = 1409621, pgofs = 234
f2fs_gc-254:2-1156 [013] ..... 22290.243362: f2fs_lookup_extent_tree_end: dev = (254,2), ino = 1409621, pgofs = 234, ext_info(fofs: 0, blk: 0, len: 0)
f2fs_gc-254:2-1156 [013] ..... 22290.243362: f2fs_lookup_extent_tree_start: dev = (254,2), ino = 1409621, pgofs = 234
f2fs_gc-254:2-1156 [013] ..... 22290.243362: f2fs_lookup_extent_tree_end: dev = (254,2), ino = 1409621, pgofs = 234, ext_info(fofs: 0, blk: 0, len: 0)
f2fs_gc-254:2-1156 [013] ..... 22290.243362: f2fs_lookup_extent_tree_start: dev = (254,2), ino = 1409621, pgofs = 234

It seems this disk is dm-2, since the extent hit count is the one below:
- Hit Ratio: 0% (38193 / 78044163449)

I'm suspecting this flow, but I'm not sure where it goes into a loop:

f2fs_gc
 -> gc_data_segment
  -> f2fs_get_read_data_page
   -> f2fs_lookup_extent_cache
Good, that's helpful. Anyway, is there anything one can do to stop this issue from blocking the system? f2fs_gc-254 is blocking the system...
(In reply to Jaegeuk Kim from comment #14)
> It seems this disk is dm-2, since the extent hit count is something below.
> - Hit Ratio: 0% (38193 / 78044163449)
>
> I'm suspecting this flow, but not sure where it goes into a loop.
>
> f2fs_gc
>  -> gc_data_segment
>   -> f2fs_get_read_data_page
>    -> f2fs_lookup_extent_cache

I suspect this path, though it looks impossible:

- move_data_page
 - f2fs_get_lock_data_page
  - f2fs_get_read_data_page
I think a quick workaround (WA) would be remounting dm-2 with background_gc=off. By the way, may I ask two things:
1) Can we get the dm-2 image?
2) Can you specify the git hash of the latest f2fs patch in your kernel? Or, can you build the kernel with some debugging messages?
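For example, something like this should do as a temporary measure (the mount point of dm-2 has to be adjusted to your setup):

# mount -o remount,background_gc=off /

or add background_gc=off to the options of the corresponding /etc/fstab entry and reboot.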
Thank you for the suggested workaround. I will do it and see how it goes. It looks like dm-2 in my config is lv_root. I am happy to assist and provide more info as needed, but I will need help with that and some hand-holding may be required...
1) As this is my root, can you advise on how to create an image in a secure (and private!) way? Also, I'd rather not make the image publicly accessible ;-)
2) I will need to be told the procedure step by step.
What are the disadvantages of using background_gc=off, considering that I have the weekly trim active anyway?
With atgc and gc_merge it seems that the problem does not recur, but reading the documentation I have the feeling that I will see it again in 7 days...
Just after my last comment on June 3, I set /sys/fs/f2fs/nvme0n1p3/atgc_age_threshold to 86400 (1 day). Despite this, the problem did not occur again. I have never rebooted, only put the machine to sleep overnight. Does this make sense or am I just postponing the problem?
the bug reoccurred today.
I set background_gc=off on all the f2fs partitions. Nonetheless, I now have f2fs_gc at 100%:

/dev/nvme0n1p3 on / type f2fs (rw,noatime,lazytime,background_gc=off,gc_merge,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,atgc,discard_unit=block)
/dev/nvme0n1p4 on /home type f2fs (rw,noatime,lazytime,background_gc=off,gc_merge,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,atgc,discard_unit=block)
/dev/nvme1n1 on /run/media/guido/nvme1 type f2fs (rw,noatime,lazytime,background_gc=off,gc_merge,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,compress_algorithm=zstd:6,compress_log_size=2,compress_chksum,compress_mode=fs,compress_cache,atgc,discard_unit=block)
Linux 5.19-rc4 doesn't solve the problem.
Created attachment 301314 [details]
f2fs blocked problems in dmesg

I have this problem for the 3rd time, about once a week. There are related lines in dmesg, attaching.

$ uname -a
Linux ... 5.18.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 22 Jun 2022 18:10:56 +0000 x86_64 GNU/Linux
Created attachment 301315 [details]
f2fs blocked problems, f2fs/status

Attaching f2fs/status.
Re #23, could you please describe it one more time? Was there an f2fs_gc thread running at 100%?

Re #25, the dmesg shows waiting for f2fs_issue_checkpoint, which is not GC. Do you think it's hung by another f2fs_gc?
(In reply to Jaegeuk Kim from comment #27)
> Re 23, could you please describe one more time? Was there an f2fs_gc thread
> running 100%?

Yes, one thread occupying 100%.
From 8a351359b0834f048dc35b5f53276c14a35e4ff9 Mon Sep 17 00:00:00 2001
From: Chao Yu <chao@kernel.org>
Date: Fri, 1 Jul 2022 08:58:50 +0800
Subject: [PATCH] f2fs: add debug log for gc_thread deadloop

---
 fs/f2fs/data.c | 3 +++
 fs/f2fs/gc.c   | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c448c3ee7ac3..6cb4ebbbae8b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1303,7 +1303,10 @@ struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
+	int i = 0;
 repeat:
+	if (++i % 1000 == 0)
+		printf("f2fs_get_lock_data_page i:%d\n", i);
 	page = f2fs_get_read_data_page(inode, index, 0, for_write);
 	if (IS_ERR(page))
 		return page;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f4aa3c88118b..d252de051810 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1470,6 +1470,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 	int phase = 0;
 	int submitted = 0;
 	unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);
+	int i = 0;

 	start_addr = START_BLOCK(sbi, segno);

@@ -1555,6 +1556,9 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 			continue;
 		}

+		if (++i % 1000 == 0)
+			printf("gc_data_segment i:%d\n", i);
+
 		data_page = f2fs_get_read_data_page(inode, start_bidx,
 							REQ_RAHEAD, true);
 		f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
-- 
2.25.1
(In reply to Chao Yu from comment #29)
Sorry, please ignore this patch; let me update it as an attachment. Could you please apply the debug patch? Once the bug reproduces, let's check the printed log for more hints.
Created attachment 301318 [details] debug patch
(In reply to Jaegeuk Kim from comment #27)
> Re #25, the dmesg is waiting for f2fs_issue_checkpoint, which is not GC. Do
> you think it's hung by other f2fs_gc?

Frankly, I have no idea. What I can tell for sure is that normally there are no f2fs messages in dmesg. But all 3 times that I had this problem (100% CPU by f2fs_gc), there were similar f2fs messages and stacks in dmesg.
This only seems to happen with PCIe NVMe, not SATA SSD
I have the issue on an eMMC drive.
As a normal mouse mover, I fear I cannot add much of value here, but this error occurred on my system too after a system update. I run this Arch Linux system from two simple USB sticks, both formatted with f2fs. One thread of four of the CPU is suddenly, out of the blue, occupied 100% by "f2fs_gc" after a handful of minutes of uptime. For now, I downgraded to kernel 5.17.19 and hopefully got rid of the problem.
Can anyone apply the debug patch below? Once it reproduces, maybe it can provide a clue...
https://bugzilla.kernel.org/attachment.cgi?id=301318
I (hope I) applied the patch to the Arch Linux kernel 5.18.14. Is this supposed to be a fix, or is it just "bad luck" from the debugging point of view that the error hasn't reoccurred so far?

=========
top | grep f2fs
320 root 20 0 0 0 0 S 0.3 0.0 0:00.10 f2fs_gc-8:17
=========
(In reply to bernd_b from comment #37)
Instead of 100%, I now have 0.3; that should be considered normal for a system without any specific workload?
@bernd_b (#38): I'm not familiar with the code, but I think the patch only adds some debug output.
That was what I expected when reading "debug patch". I will observe. Can I expect the additional debug output to be shown by "dmesg"?
Yes
Created attachment 301502 [details]
hopefully_debug_patch_output

So, the ruthless CPU usage is back again:

[root@P4560-dev bernd_b]# top | grep f2fs
322 root 20 0 0 0 0 R 100.0 0.0 1:59.10 f2fs_gc-8:33
322 root 20 0 0 0 0 R 99.7 0.0 2:02.10 f2fs_gc-8:33
322 root 20 0 0 0 0 R 99.7 0.0 2:05.10 f2fs_gc-8:33
322 root 20 0 0 0 0 R 100.0 0.0 2:08.10 f2fs_gc-8:33
322 root 20 0 0 0 0 R 100.0 0.0 2:11.10 f2fs_gc-8:33
322 root 20 0 0 0 0 R 100.0 0.0 2:14.11 f2fs_gc-8:33

I added the dmesg output in the attachment, but I fear there is nothing useful to be found.
(In reply to bernd_b from comment #42)
> I added the dmesg output in the attachment but I fear there is nothing
> useful to be found.

[ 2443.069033] f2fs_get_lock_data_page i:205012000
[ 2443.069688] f2fs_get_lock_data_page i:205014000
[ 2443.070016] f2fs_get_lock_data_page i:205015000
[ 2443.070344] f2fs_get_lock_data_page i:205016000
[ 2443.070672] f2fs_get_lock_data_page i:205017000
[ 2443.071000] f2fs_get_lock_data_page i:205018000

Thanks a lot for helping to debug this bug. I guess the output looks useful, since at least we have found where it loops. I have no idea why it loops here since it's an impossible case; I can make a patch to skip this loop, but I'm afraid that won't fix the root cause... :(
(In reply to Chao Yu from comment #43)
> I have no idea why it loops
> here since it's an impossible case, I can make a patch to skip this loop,
> but I'm afraid that won't fix root cause... :(

Could such a patch at least give me/us the chance to log the event causing the trouble? Otherwise, the only option I see left is to wait for further bug reports and see if they shed new light on the matter?!
Created attachment 301519 [details] debug patch v2
(In reply to bernd_b from comment #44)
> Could such a patch at least gives me/us the chance to log the event causing
> the trouble?
> Otherwise, the only chance left I see is to wait for further bug reports and
> to see if they bring new light into the matter?!

I added more debug info in v2, and also tried to break the loop to see if GC can move on... Please note that it's necessary to back up your important data before using this patch, or you can just apply the debug-info part to see whether it provides more useful info.
I haven't seen this problem for a *month*, since 2022-06-30, see #25. I didn't change my pattern of work - the laptop is always on, same daily job, no load at night, no hibernating. I'm on Arch Linux and upgrade every weekend, more or less regularly.

The first time I had this problem was ~2022-06-15; at that time I was probably on kernel 5.18.3 or 5.18.4. I had it on 5.18.6, see #25. After the next upgrade and up to now, kernel 5.18.13, I have not had this problem.

This suggests a hypothesis: maybe the source of the problem is not (directly) in f2fs but in another part of the kernel, and kernel upgrades affect f2fs in a negative way? A kind of "feature interaction". The next question is: what changed in the kernel around 5.18.3 or 5.18.4 that affects f2fs so dramatically?
(In reply to Yuriy Garin from comment #47)
> First time I had this problem was ~2022-06-15, at that time I was probably on
> kernel 5.18.3 or 5.18.4. I had it on 5.18.6, see #25.
>
> After next upgrade and up to now, kernel 5.18.13, I did not had this problem.

I used 5.18.14 when receiving the debug output I posted here. Remaining questions:
- Is it better to apply the debug_v2 patch to the newest kernel, or to stick with 5.18.14?
- I would like to start by applying only the new debug part. Therefore, I leave out everything from line 43?
This problem doesn't seem to appear anymore after 5.18.15
Created attachment 301543 [details]
dmesg log

After a month of happy living (see #47) it got me again.

$ uname -a
Linux xxx 5.18.15-arch1-2 #1 SMP PREEMPT_DYNAMIC Mon, 01 Aug 2022 18:58:16 +0000 x86_64 GNU/Linux
Today f2fs_gc got me again, just a few hours after upgrading to a fresh Arch:

Linux xxx 5.19.1-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 11 Aug 2022 16:06:13 +0000 x86_64 GNU/Linux

What I have noticed is that I had left a few instances of some custom application with open, modified, but unsaved files. Could that be a way to trigger the problem?
Please share your experience - how do you deal with this problem? I do 'sudo systemctl poweroff', but it takes an hour, so after waiting for most services to go down, I just press the power button.
I'll patch kernel 5.19.1 with debug patch v2 and see if I get new debug info. So far, the safest option for me is to run my system with a kernel < 5.18.x.
I patched kernel 5.19.1 with debug patch v2, but all I can see is that the CPU frequency goes up to 3.5 GHz and immediately down to 0.8 GHz, about every 2 seconds, although there is no workload at all. And there is no output at all in dmesg, just as if nothing were happening. Downgrading to 5.17.9 lets the CPU stay steadily at 0.8 GHz in the same situation. But who knows if this is related to this subject at all? Anyway, I wonder why I'm missing any kernel messages.
This problem still persists in kernel 5.19.3. As mentioned before, f2fs_gc stays at 100% CPU utilization indefinitely:

247 root 20 0 0 0 0 R 99.7 0.0 90:30.67 f2fs_gc-8:4

Here is the call trace causing a hang:

kernel: INFO: task f2fs_ckpt-8:4:244 blocked for more than 122 seconds.
kernel: Not tainted 5.19.3-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:f2fs_ckpt-8:4 state:D stack: 0 pid: 244 ppid: 2 flags:0x00004000
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x356/0x11a0
kernel: ? preempt_schedule+0x43/0x60
kernel: schedule+0x5e/0xd0
kernel: rwsem_down_write_slowpath+0x335/0x720
kernel: ? __schedule+0x35e/0x11a0
kernel: ? __checkpoint_and_complete_reqs+0x1b0/0x1b0 [f2fs d8aa87b5c63297ab56225cbf829d392db2a7d54b]
kernel: __checkpoint_and_complete_reqs+0x7a/0x1b0 [f2fs d8aa87b5c63297ab56225cbf829d392db2a7d54b]
kernel: ? __checkpoint_and_complete_reqs+0x1b0/0x1b0 [f2fs d8aa87b5c63297ab56225cbf829d392db2a7d54b]
kernel: issue_checkpoint_thread+0x4c/0x110 [f2fs d8aa87b5c63297ab56225cbf829d392db2a7d54b]
kernel: ? cpuacct_percpu_seq_show+0x20/0x20
kernel: kthread+0xde/0x110
kernel: ? kthread_complete_and_exit+0x20/0x20
kernel: ret_from_fork+0x22/0x30
kernel: </TASK>
Also, there is another task hanging, with a call trace:

kernel: INFO: task mandb:13173 blocked for more than 122 seconds.
kernel: Not tainted 5.19.3-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:mandb state:D stack: 0 pid:13173 ppid: 1 flags:0x00000002
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x356/0x11a0
kernel: ? ttwu_queue_wakelist+0xef/0x110
kernel: ? try_to_wake_up+0x23e/0x550
kernel: schedule+0x5e/0xd0
kernel: schedule_timeout+0x11c/0x150
kernel: __wait_for_common+0x94/0x1d0
kernel: ? usleep_range_state+0x90/0x90
kernel: f2fs_issue_checkpoint+0x11f/0x200 [f2fs d8aa87b5c63297ab56225cbf829d392db2a7d54b]
kernel: ? f2fs_update_inode_page+0x3d/0x80 [f2fs d8aa87b5c63297ab56225cbf829d392db2a7d54b]
kernel: f2fs_do_sync_file+0x18b/0x960 [f2fs d8aa87b5c63297ab56225cbf829d392db2a7d54b]
kernel: __do_sys_msync+0x1d3/0x2c0
kernel: do_syscall_64+0x5f/0x90
kernel: ? syscall_exit_to_user_mode+0x1b/0x40
kernel: ? do_syscall_64+0x6b/0x90
kernel: ? exc_page_fault+0x74/0x170
kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
kernel: RIP: 0033:0x7f48b6d377c4
kernel: RSP: 002b:00007ffeddb19528 EFLAGS: 00000202 ORIG_RAX: 000000000000001a
kernel: RAX: ffffffffffffffda RBX: 0000000000000f78 RCX: 00007f48b6d377c4
kernel: RDX: 0000000000000006 RSI: 0000000000004000 RDI: 00007f48b6e6a000
kernel: RBP: 000056363710af00 R08: 0000563636f48fc0 R09: 00000000035573fd
kernel: R10: 0000000000001000 R11: 0000000000000202 R12: 00000000000000a6
kernel: R13: 0000000000000000 R14: 0000563636f48fc0 R15: 000056363710af00
kernel: </TASK>
I started to see this on 5.19.6-arch1-1 out of the blue. I noticed it the first time while pacman post-install hooks were hanging during processing, so the FS was actually being worked on. Now the system was essentially idle when f2fs_gc grabbed a core again.

INFO: task f2fs_ckpt-259:2:377 blocked for more than 122 seconds.
Tainted: P OE 5.19.6-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:f2fs_ckpt-259:2 state:D stack: 0 pid: 377 ppid: 2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x356/0x11a0
? preempt_schedule+0x43/0x60
schedule+0x5e/0xd0
rwsem_down_write_slowpath+0x335/0x720
? __schedule+0x35e/0x11a0
? __checkpoint_and_complete_reqs+0x1b0/0x1b0 [f2fs 9fac87d5629570e172d2b77a90a74b98e0678f12]
__checkpoint_and_complete_reqs+0x7a/0x1b0 [f2fs 9fac87d5629570e172d2b77a90a74b98e0678f12]
? __checkpoint_and_complete_reqs+0x1b0/0x1b0 [f2fs 9fac87d5629570e172d2b77a90a74b98e0678f12]
issue_checkpoint_thread+0x4c/0x110 [f2fs 9fac87d5629570e172d2b77a90a74b98e0678f12]
? cpuacct_percpu_seq_show+0x20/0x20
kthread+0xdb/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

INFO: task kworker/u64:0:5076 blocked for more than 122 seconds.
Tainted: P OE 5.19.6-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u64:0 state:D stack: 0 pid: 5076 ppid: 2 flags:0x00004000
Workqueue: writeback wb_workfn (flush-259:0)
Call Trace:
<TASK>
__schedule+0x356/0x11a0
? ttwu_queue_wakelist+0xef/0x110
schedule+0x5e/0xd0
schedule_timeout+0x11c/0x150
__wait_for_common+0x91/0x1d0
? usleep_range_state+0x90/0x90
f2fs_issue_checkpoint+0x11f/0x200 [f2fs 9fac87d5629570e172d2b77a90a74b98e0678f12]
f2fs_balance_fs_bg+0x119/0x370 [f2fs 9fac87d5629570e172d2b77a90a74b98e0678f12]
f2fs_write_node_pages+0x78/0x240 [f2fs 9fac87d5629570e172d2b77a90a74b98e0678f12]
do_writepages+0xc1/0x1d0
? __percpu_counter_sum+0x5b/0x70
? sched_clock_cpu+0xd/0xb0
__writeback_single_inode+0x3d/0x360
writeback_sb_inodes+0x1ed/0x4a0
? pollwake+0x78/0xa0
__writeback_inodes_wb+0x4c/0xf0
wb_writeback+0x204/0x2f0
wb_workfn+0x36f/0x520
process_one_work+0x1c4/0x380
worker_thread+0x51/0x390
? rescuer_thread+0x3b0/0x3b0
kthread+0xdb/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Created attachment 301840 [details]
f2fs debug patch log

Applied debug patch v2 to the Arch Linux kernel at:
3245cb65fd91 2022-09-14 Linus Torvalds Merge tag 'devicetree-fixes-for-6.0-2' ...
I believe it corresponds to f2fs master, right?

$ uname -a
Linux bonobox 6.0.0-rc5myarch-g3245cb65fd91-dirty #2 SMP PREEMPT_DYNAMIC Fri Sep 16 23:30:54 PDT 2022 x86_64 GNU/Linux

It got 800 lines of the same address:
f2fs_get_lock_data_page i:1000, mapping(0000000018871cc7, 000000008a3c4185, 0000000018871cc7), ino:370194, i_mode:33261, flags:2147553792
Please forgive my ignorance, I'm sure I'm getting it wrong. How is this 'goto repeat' supposed to break the cycle? Here is simplified code of f2fs_get_lock_data_page():

struct page *f2fs_get_lock_data_page(struct inode *inode, ...)
{
repeat:
	/* get page by inode - from cache or disk */
	struct page *page = f2fs_get_read_data_page(inode, ...);

	if (page->mapping != inode->i_mapping) {	/* page has bad mapping */
		f2fs_put_page(page, 1);	/* put page with bad mapping back in cache */
		goto repeat;		/* do it again, without changing anything */
	}
	return page;	/* good */
}

We look up a page for the inode - in the cache or on disk. We got a page, but it has a "wrong" mapping. We put the page - with the "wrong" mapping - back into the cache. And, without changing anything - in the inode info or the page info - we try to get a different result by doing the same thing under the same conditions.
Created attachment 301856 [details]
dmesg log

Another debug v2 output. Continuation of #58, without rebooting. This one has a bit more variety.
Created attachment 301868 [details]
dmesg log

Another 900 debug patch traces, all on the same address:

f2fs_get_lock_data_page i:5000, mapping(0000000007c40c1c, 000000008a3c4185, 0000000007c40c1c), ino:1138081, i_mode:33261, flags:2147553792

The good thing about this debug patch is that it breaks the forever loop, so all this mishap takes only a couple of seconds :) What else can we do to help?
Created attachment 302936 [details]
debug patch - show "fixed"

To test the hypothesis from #59, that the 'repeat loop' is not going to "fix" the problem, I've modified data.c, based on the 5.19.12-arch1-1 source code:

Linux bonobox 5.19.12-arch1-1 #5 SMP PREEMPT_DYNAMIC Sat Oct 1 23:25:37 PDT 2022 x86_64 GNU/Linux

This modification prints the 1st time the 'repeat loop' happens, and it prints when the 'repeat loop' fixes the problem. The hypothesis is: the 'repeat loop' never fixes the problem and is doomed to loop forever.
Created attachment 302937 [details]
log of debug patch - show "fixed"

Continuation of #62. The log shows that the 'repeat loop' never fixes the problem: there are no "f2fs_get_lock_data_page: fix: ..." messages. That supports the hypothesis from #59.
Running 6.0.1-arch2-1 for a few days. So far, so good! (Previously, this problem occurred anywhere from twice a day to once every two days.) There were quite a lot of changes in the f2fs sources in 6.0. Let's hope it works! Thanks!
Unfortunately 6.0.1-arch2-1 didn't do the trick for me. I'm still experiencing this bug, especially when my laptop is coming out of sleep.
(In reply to bogdan.nicolae from comment #65)
> Unfortunately 6.0.1-arch2-1 didn't do the trick for me. I'm still
> experiencing this bug, especially when my laptop is coming out of sleep.

That sucks. I really hoped that it was gone. Well, when it became unbearable in previous months, I did this "palliative" workaround:

1. Read and try https://wiki.archlinux.org/title/Kernel/Traditional_compilation and https://wiki.archlinux.org/title/Kernel/Arch_Build_System, especially the latter.
2. Using the "Arch Build System", apply one of the debug patches mentioned earlier, e.g. #62 - the good thing about those debug patches is that they break the "dead loop". (Or just edit the code in data.c - it's a couple of lines.) A rough sketch of this step follows below.
3. Replace the kernel with the patched one. Reboot.
4. On every reboot, before mounting f2fs, run fsck.f2fs (though, so far, it has never reported anything bad).

Do this after running Arch 'pacman -Syu' whenever the kernel was changed, ~once a week. Once you are familiar with the "Arch Build System", it takes ~30 minutes. (Naturally, stating the obvious, save your work in case you lose the computer.)
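For me the patching step looks roughly like this (a sketch from memory, not exact commands; the patch file name is just an example and paths may differ on your machine):

$ asp checkout linux && cd linux/trunk
$ cp ~/f2fs-debug-v2.patch .
  (add the patch file to the source=() array in the PKGBUILD and run updpkgsums)
$ makepkg -s
# pacman -U linux-*.pkg.tar.zst

Then reboot into the new kernel and keep the stock kernel around as a fallback boot entry.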
Yes, sadly confirming, this bug happens on 6.0.2-arch1-1:

[Mon Oct 24 15:57:09 2022] INFO: task f2fs_ckpt-259:3:273 blocked for more than 122 seconds.
[Mon Oct 24 15:57:09 2022] Not tainted 6.0.2-arch1-1 #1
The only reasonable solution that works for me in everyday use is to deactivate the background garbage collector entirely. For reference, this can be done using the background_gc=off mount option (which can be added to /etc/fstab).
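For example, an fstab entry could then look something like this (the UUID and the rest of the options are placeholders; keep whatever you already use):

UUID=... / f2fs defaults,noatime,background_gc=off 0 1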
That's a terrible workaround. Does it impact the SSD wear leveling algorithm? Can a periodic fstrim mitigate this?
Well, if I understand correctly, TRIM will still be issued on demand when segments are cleared. There's also the option background_gc=sync, which I assume will block other I/O operations. Unfortunately, the documentation is vague about both aspects.
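If you additionally want a periodic trim, the usual approach is util-linux's fstrim, either one-shot or via the systemd timer, e.g.:

# fstrim -v /
# systemctl enable --now fstrim.timer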
Since the problem occurred with the 5.18 kernel, I assume there is something wrong after that version. I think the analysis of the problem should start with the changes that occurred with version 5.18. At the moment, I am still stuck on version 5.17.9 for this very reason, which in fact prevents me from using the PC with more recent versions of the kernel.
I tried kernel 6.0.6; after 2 days the problem reoccurred :-(
Just to try it out, I will now give background_gc=sync a chance.
Hi all,

The only way to find where the issue is, is to bisect from the latest working kernel to the first non-working one.
Guido just pointed that out in #71: the issue appeared since 5.18
(In reply to Matteo Croce from comment #74)
> Hi all,
>
> The only way to find where the issue is, is to bisect from the latest
> working kernel to the first non working one

The last working kernel was 5.17.15; the first with the bug is 5.18. I took a look at the diff of the gc.c file in the kernel; the changes are very few, maybe the problem is related to the GC_URGENT_HIGH / MID mechanism...
Great. If you do a bisect, you will find the problem in, let's say, 14 steps. Really worth a try.
I tried background_gc=sync. It doesn't solve the problem...
I just had some time to think about this issue, and I suspect there was no chance to reschedule the CPU in the loop? Can anyone try this change?

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a71e818cd67b..c351c3269874 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1325,6 +1325,7 @@ struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
 	lock_page(page);
 	if (unlikely(page->mapping != mapping)) {
 		f2fs_put_page(page, 1);
+		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 		goto repeat;
 	}
 	if (unlikely(!PageUptodate(page))) {
(In reply to Jaegeuk Kim from comment #79)
I tried to apply it to kernel 6.0.8 but it failed. I found the code at line 1313, so I can try to apply it there. But there is another identical piece of code at line 3568. Do we need to patch there as well?
I think 1313 would be enough to avoid this issue first. The 3568 case is after submitting IO, which has a chance to get into another state.
(In reply to Jaegeuk Kim from comment #81)
> I think 1313 would be enough to avoid this issue first.
> 3568 case is after submit IO which could have a chance to get another states.

Thank you, I'm testing 6.0.8 patched. I will not turn off the PC for several days, so let's see what happens.
I tried this script to trigger the GC: https://github.com/LLJY/f2fs-gc/blob/master/f2fs-gc.sh

It's been running for 10 minutes now, but it's stuck at 2503 dirty segments on the root partition. But no sign of 100% CPU, everything looks normal.
This is the output:

Performing GC on /sys/fs/f2fs/nvme0n1p3/
2589
2589
2503
2503
2503
2503
...

and a lot more 2503...
I modified the script to run on the partition of my choice. No problem with the home and other partitions. It looks like something in the root partition. Maybe it is related to the bug?
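For reference, the core of what that script does is roughly this (my own rough condensation, untested as written; it relies on the standard per-device sysfs knobs gc_urgent and dirty_segments):

#!/bin/bash
# Put one f2fs device into urgent GC mode and watch the dirty segment count.
dev=/sys/fs/f2fs/nvme0n1p3          # example device directory
echo 1 > "$dev/gc_urgent"           # switch GC to urgent mode
stale=0; prev=-1
while [ "$stale" -lt 5 ]; do        # stop once the count stops shrinking
    cur=$(cat "$dev/dirty_segments")
    echo "$cur"
    if [ "$cur" = "$prev" ]; then stale=$((stale+1)); else stale=0; fi
    prev=$cur
    sleep 30
done
echo 0 > "$dev/gc_urgent"           # back to normal GC mode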
(In reply to Jaegeuk Kim from comment #79) Running this patch (and debug printk) on 6.0.8-arch1-1.
Created attachment 303184 [details] kernel log (with patch on data.c applied)
(In reply to Guido from comment #87)
> Created attachment 303184 [details]
> kernel log (with patch on data.c applied)

After several days of use I still did not have the 100%-CPU-busy problem, but it got worse. The system would not go to sleep or shut down (this happened several times and forced me to brutally shut down the computer), so checking the log I noticed several errors related to f2fs. I attach the log. An extract of the log follows, for your convenience:

nov 15 02:17:13 manjaro kernel: INFO: task f2fs_ckpt-259:3:233 blocked for more than 245 seconds.
nov 15 02:17:13 manjaro kernel: Tainted: G U 6.0.8-1-MANJARO #1
nov 15 02:17:13 manjaro kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 15 02:17:13 manjaro kernel: task:f2fs_ckpt-259:3 state:D stack: 0 pid: 233 ppid: 2 flags:0x00004000
nov 15 02:17:13 manjaro kernel: Call Trace:
nov 15 02:17:13 manjaro kernel: <TASK>
nov 15 02:17:13 manjaro kernel: __schedule+0x343/0x11c0
nov 15 02:17:13 manjaro kernel: ? update_load_avg+0x7e/0x730
nov 15 02:17:13 manjaro kernel: schedule+0x5e/0xd0
nov 15 02:17:13 manjaro kernel: rwsem_down_write_slowpath+0x336/0x720
nov 15 02:17:13 manjaro kernel: ? psi_task_switch+0xc3/0x1f0
nov 15 02:17:13 manjaro kernel: ? __schedule+0x34b/0x11c0
nov 15 02:17:13 manjaro kernel: ? __checkpoint_and_complete_reqs+0x1b0/0x1b0 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:17:13 manjaro kernel: __checkpoint_and_complete_reqs+0x7a/0x1b0 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:17:13 manjaro kernel: ? __checkpoint_and_complete_reqs+0x1b0/0x1b0 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:17:13 manjaro kernel: issue_checkpoint_thread+0x4c/0x110 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:17:13 manjaro kernel: ? cpuacct_percpu_seq_show+0x20/0x20
nov 15 02:17:13 manjaro kernel: kthread+0xdb/0x110
nov 15 02:17:13 manjaro kernel: ? kthread_complete_and_exit+0x20/0x20
nov 15 02:17:13 manjaro kernel: ret_from_fork+0x1f/0x30
nov 15 02:17:13 manjaro kernel: </TASK>
nov 15 02:17:13 manjaro kernel: INFO: task kworker/u16:10:26736 blocked for more than 245 seconds.
nov 15 02:17:13 manjaro kernel: Tainted: G U 6.0.8-1-MANJARO #1
nov 15 02:17:13 manjaro kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 15 02:17:13 manjaro kernel: task:kworker/u16:10 state:D stack: 0 pid:26736 ppid: 2 flags:0x00004000
nov 15 02:17:13 manjaro kernel: Workqueue: writeback wb_workfn (flush-259:0)
nov 15 02:17:13 manjaro kernel: Call Trace:
nov 15 02:17:13 manjaro kernel: <TASK>
nov 15 02:17:13 manjaro kernel: __schedule+0x343/0x11c0
nov 15 02:17:13 manjaro kernel: schedule+0x5e/0xd0
nov 15 02:17:13 manjaro kernel: schedule_timeout+0x11c/0x150
nov 15 02:17:13 manjaro kernel: wait_for_completion+0x8a/0x160
nov 15 02:17:13 manjaro kernel: f2fs_issue_checkpoint+0x11f/0x200 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:17:13 manjaro kernel: f2fs_balance_fs_bg+0x119/0x370 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:17:13 manjaro kernel: f2fs_write_node_pages+0x78/0x240 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:17:13 manjaro kernel: do_writepages+0xc1/0x1d0
nov 15 02:17:13 manjaro kernel: ? __wb_calc_thresh+0x4b/0x140
nov 15 02:17:13 manjaro kernel: __writeback_single_inode+0x3d/0x360
nov 15 02:17:13 manjaro kernel: ? inode_io_list_move_locked+0x69/0xc0
nov 15 02:17:13 manjaro kernel: writeback_sb_inodes+0x1ed/0x4a0
nov 15 02:17:13 manjaro kernel: __writeback_inodes_wb+0x4c/0xf0
nov 15 02:17:13 manjaro kernel: wb_writeback+0x204/0x2f0
nov 15 02:17:13 manjaro kernel: wb_workfn+0x31c/0x4f0
nov 15 02:17:13 manjaro kernel: ? __mod_timer+0x289/0x3b0
nov 15 02:17:13 manjaro kernel: process_one_work+0x1c4/0x380
nov 15 02:17:13 manjaro kernel: worker_thread+0x51/0x390
nov 15 02:17:13 manjaro kernel: ? rescuer_thread+0x3b0/0x3b0
nov 15 02:17:13 manjaro kernel: kthread+0xdb/0x110
nov 15 02:17:13 manjaro kernel: ? kthread_complete_and_exit+0x20/0x20
nov 15 02:17:13 manjaro kernel: ret_from_fork+0x1f/0x30
nov 15 02:17:13 manjaro kernel: </TASK>
nov 15 02:19:16 manjaro kernel: INFO: task kworker/7:1:86 blocked for more than 122 seconds.
nov 15 02:19:16 manjaro kernel: Tainted: G U 6.0.8-1-MANJARO #1
nov 15 02:19:16 manjaro kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 15 02:19:16 manjaro kernel: task:kworker/7:1 state:D stack: 0 pid: 86 ppid: 2 flags:0x00004000
nov 15 02:19:16 manjaro kernel: Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
nov 15 02:19:16 manjaro kernel: Call Trace:
nov 15 02:19:16 manjaro kernel: <TASK>
nov 15 02:19:16 manjaro kernel: __schedule+0x343/0x11c0
nov 15 02:19:16 manjaro kernel: ? ttwu_do_wakeup+0x17/0x170
nov 15 02:19:16 manjaro kernel: ? try_to_wake_up+0x25f/0x560
nov 15 02:19:16 manjaro kernel: schedule+0x5e/0xd0
nov 15 02:19:16 manjaro kernel: rwsem_down_read_slowpath+0x2a6/0x520
nov 15 02:19:16 manjaro kernel: ? update_load_avg+0x7e/0x730
nov 15 02:19:16 manjaro kernel: inode_switch_wbs_work_fn+0x5c/0x810
nov 15 02:19:16 manjaro kernel: ? __switch_to_asm+0x3e/0x60
nov 15 02:19:16 manjaro kernel: ? finish_task_switch.isra.0+0x90/0x2d0
nov 15 02:19:16 manjaro kernel: ? __schedule+0x34b/0x11c0
nov 15 02:19:16 manjaro kernel: ? mod_memcg_lruvec_state+0x15/0x30
nov 15 02:19:16 manjaro kernel: process_one_work+0x1c4/0x380
nov 15 02:19:16 manjaro kernel: worker_thread+0x51/0x390
nov 15 02:19:16 manjaro kernel: ? rescuer_thread+0x3b0/0x3b0
nov 15 02:19:16 manjaro kernel: kthread+0xdb/0x110
nov 15 02:19:16 manjaro kernel: ? kthread_complete_and_exit+0x20/0x20
nov 15 02:19:16 manjaro kernel: ret_from_fork+0x1f/0x30
nov 15 02:19:16 manjaro kernel: </TASK>
nov 15 02:19:16 manjaro kernel: INFO: task f2fs_ckpt-259:3:233 blocked for more than 368 seconds.
nov 15 02:19:16 manjaro kernel: Tainted: G U 6.0.8-1-MANJARO #1
nov 15 02:19:16 manjaro kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 15 02:19:16 manjaro kernel: task:f2fs_ckpt-259:3 state:D stack: 0 pid: 233 ppid: 2 flags:0x00004000
nov 15 02:19:16 manjaro kernel: Call Trace:
nov 15 02:19:16 manjaro kernel: <TASK>
nov 15 02:19:16 manjaro kernel: __schedule+0x343/0x11c0
nov 15 02:19:16 manjaro kernel: ? update_load_avg+0x7e/0x730
nov 15 02:19:16 manjaro kernel: schedule+0x5e/0xd0
nov 15 02:19:16 manjaro kernel: rwsem_down_write_slowpath+0x336/0x720
nov 15 02:19:16 manjaro kernel: ? psi_task_switch+0xc3/0x1f0
nov 15 02:19:16 manjaro kernel: ? __schedule+0x34b/0x11c0
nov 15 02:19:16 manjaro kernel: ? __checkpoint_and_complete_reqs+0x1b0/0x1b0 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: __checkpoint_and_complete_reqs+0x7a/0x1b0 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: ? __checkpoint_and_complete_reqs+0x1b0/0x1b0 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: issue_checkpoint_thread+0x4c/0x110 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: ? cpuacct_percpu_seq_show+0x20/0x20
nov 15 02:19:16 manjaro kernel: kthread+0xdb/0x110
nov 15 02:19:16 manjaro kernel: ? kthread_complete_and_exit+0x20/0x20
nov 15 02:19:16 manjaro kernel: ret_from_fork+0x1f/0x30
nov 15 02:19:16 manjaro kernel: </TASK>
nov 15 02:19:16 manjaro kernel: INFO: task NetworkManager:791 blocked for more than 122 seconds.
nov 15 02:19:16 manjaro kernel: Tainted: G U 6.0.8-1-MANJARO #1
nov 15 02:19:16 manjaro kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 15 02:19:16 manjaro kernel: task:NetworkManager state:D stack: 0 pid: 791 ppid: 1 flags:0x00000002
nov 15 02:19:16 manjaro kernel: Call Trace:
nov 15 02:19:16 manjaro kernel: <TASK>
nov 15 02:19:16 manjaro kernel: __schedule+0x343/0x11c0
nov 15 02:19:16 manjaro kernel: schedule+0x5e/0xd0
nov 15 02:19:16 manjaro kernel: schedule_timeout+0x11c/0x150
nov 15 02:19:16 manjaro kernel: wait_for_completion+0x8a/0x160
nov 15 02:19:16 manjaro kernel: f2fs_issue_checkpoint+0x11f/0x200 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: ? do_filp_open+0xb3/0x160
nov 15 02:19:16 manjaro kernel: f2fs_do_sync_file+0x184/0x9d0 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: __x64_sys_fsync+0x3b/0x70
nov 15 02:19:16 manjaro kernel: do_syscall_64+0x5c/0x90
nov 15 02:19:16 manjaro kernel: ? do_syscall_64+0x6b/0x90
nov 15 02:19:16 manjaro kernel: ? do_syscall_64+0x6b/0x90
nov 15 02:19:16 manjaro kernel: ? syscall_exit_to_user_mode+0x1b/0x40
nov 15 02:19:16 manjaro kernel: ? do_syscall_64+0x6b/0x90
nov 15 02:19:16 manjaro kernel: ? do_syscall_64+0x6b/0x90
nov 15 02:19:16 manjaro kernel: ? do_syscall_64+0x6b/0x90
nov 15 02:19:16 manjaro kernel: ? do_syscall_64+0x6b/0x90
nov 15 02:19:16 manjaro kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
nov 15 02:19:16 manjaro kernel: RIP: 0033:0x7fe4694ddb0a
nov 15 02:19:16 manjaro kernel: RSP: 002b:00007fff3d5a29d0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
nov 15 02:19:16 manjaro kernel: RAX: ffffffffffffffda RBX: 000056249d7d17b0 RCX: 00007fe4694ddb0a
nov 15 02:19:16 manjaro kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000016
nov 15 02:19:16 manjaro kernel: RBP: 0000000000000016 R08: 0000000000000000 R09: 00007fff3d5a2aa0
nov 15 02:19:16 manjaro kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 000056249d86ec00
nov 15 02:19:16 manjaro kernel: R13: 0000000000000001 R14: 0000000000000016 R15: 000056249d83ccd0
nov 15 02:19:16 manjaro kernel: </TASK>
nov 15 02:19:16 manjaro kernel: INFO: task kworker/u16:10:26736 blocked for more than 368 seconds.
nov 15 02:19:16 manjaro kernel: Tainted: G U 6.0.8-1-MANJARO #1
nov 15 02:19:16 manjaro kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 15 02:19:16 manjaro kernel: task:kworker/u16:10 state:D stack: 0 pid:26736 ppid: 2 flags:0x00004000
nov 15 02:19:16 manjaro kernel: Workqueue: writeback wb_workfn (flush-259:0)
nov 15 02:19:16 manjaro kernel: Call Trace:
nov 15 02:19:16 manjaro kernel: <TASK>
nov 15 02:19:16 manjaro kernel: __schedule+0x343/0x11c0
nov 15 02:19:16 manjaro kernel: schedule+0x5e/0xd0
nov 15 02:19:16 manjaro kernel: schedule_timeout+0x11c/0x150
nov 15 02:19:16 manjaro kernel: wait_for_completion+0x8a/0x160
nov 15 02:19:16 manjaro kernel: f2fs_issue_checkpoint+0x11f/0x200 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: f2fs_balance_fs_bg+0x119/0x370 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: f2fs_write_node_pages+0x78/0x240 [f2fs 112497ead8e6784e9a6a664ca29672f96820d535]
nov 15 02:19:16 manjaro kernel: do_writepages+0xc1/0x1d0
nov 15 02:19:16 manjaro kernel: ? __wb_calc_thresh+0x4b/0x140
nov 15 02:19:16 manjaro kernel: __writeback_single_inode+0x3d/0x360
nov 15 02:19:16 manjaro kernel: ? inode_io_list_move_locked+0x69/0xc0
nov 15 02:19:16 manjaro kernel: writeback_sb_inodes+0x1ed/0x4a0
nov 15 02:19:16 manjaro kernel: __writeback_inodes_wb+0x4c/0xf0
nov 15 02:19:16 manjaro kernel: wb_writeback+0x204/0x2f0
nov 15 02:19:16 manjaro kernel: wb_workfn+0x31c/0x4f0
nov 15 02:19:16 manjaro kernel: ? __mod_timer+0x289/0x3b0
nov 15 02:19:16 manjaro kernel: process_one_work+0x1c4/0x380
nov 15 02:19:16 manjaro kernel: worker_thread+0x51/0x390
nov 15 02:19:16 manjaro kernel: ? rescuer_thread+0x3b0/0x3b0
nov 15 02:19:16 manjaro kernel: kthread+0xdb/0x110
nov 15 02:19:16 manjaro kernel: ? kthread_complete_and_exit+0x20/0x20
nov 15 02:19:16 manjaro kernel: ret_from_fork+0x1f/0x30
nov 15 02:19:16 manjaro kernel: </TASK>
nov 15 02:19:16 manjaro kernel: INFO: task kworker/6:2:27513 blocked for more than 122 seconds.
nov 15 02:19:16 manjaro kernel: Tainted: G U 6.0.8-1-MANJARO #1
nov 15 02:19:16 manjaro kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 15 02:19:16 manjaro kernel: task:kworker/6:2 state:D stack: 0 pid:27513 ppid: 2 flags:0x00004000
nov 15 02:19:16 manjaro kernel: Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
nov 15 02:19:16 manjaro kernel: Call Trace:
nov 15 02:19:16 manjaro kernel: <TASK>
nov 15 02:19:16 manjaro kernel: __schedule+0x343/0x11c0
nov 15 02:19:16 manjaro kernel: schedule+0x5e/0xd0
nov 15 02:19:16 manjaro kernel: rwsem_down_read_slowpath+0x2a6/0x520
nov 15 02:19:16 manjaro kernel: inode_switch_wbs_work_fn+0x5c/0x810
nov 15 02:19:16 manjaro kernel: ? __switch_to_asm+0x3e/0x60
nov 15 02:19:16 manjaro kernel: ? finish_task_switch.isra.0+0x90/0x2d0
nov 15 02:19:16 manjaro kernel: ? __schedule+0x34b/0x11c0
nov 15 02:19:16 manjaro kernel: process_one_work+0x1c4/0x380
nov 15 02:19:16 manjaro kernel: worker_thread+0x51/0x390
nov 15 02:19:16 manjaro kernel: ? rescuer_thread+0x3b0/0x3b0
nov 15 02:19:16 manjaro kernel: kthread+0xdb/0x110
nov 15 02:19:16 manjaro kernel: ? kthread_complete_and_exit+0x20/0x20
nov 15 02:19:16 manjaro kernel: ret_from_fork+0x1f/0x30
nov 15 02:19:16 manjaro kernel: </TASK>
nov 15 02:19:16 manjaro kernel: INFO: task kworker/7:2:28769 blocked for more than 122 seconds.
nov 15 02:19:16 manjaro kernel: Tainted: G U 6.0.8-1-MANJARO #1
nov 15 02:19:16 manjaro kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 15 02:19:16 manjaro kernel: task:kworker/7:2 state:D stack: 0 pid:28769 ppid: 2 flags:0x00004000
nov 15 02:19:16 manjaro kernel: Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
nov 15 02:19:16 manjaro kernel: Call Trace:
nov 15 02:19:16 manjaro kernel: <TASK>
nov 15 02:19:16 manjaro kernel: __schedule+0x343/0x11c0
nov 15 02:19:16 manjaro kernel: ? try_to_wake_up+0x68/0x560
nov 15 02:19:16 manjaro kernel: schedule+0x5e/0xd0
nov 15 02:19:16 manjaro kernel: rwsem_down_read_slowpath+0x2a6/0x520
nov 15 02:19:16 manjaro kernel: inode_switch_wbs_work_fn+0x5c/0x810
nov 15 02:19:16 manjaro kernel: ? check_preempt_curr+0x5e/0x70
nov 15 02:19:16 manjaro kernel: ? ttwu_do_wakeup+0x17/0x170
nov 15 02:19:16 manjaro kernel: ? try_to_wake_up+0x25f/0x560
nov 15 02:19:16 manjaro kernel: process_one_work+0x1c4/0x380
nov 15 02:19:16 manjaro kernel: worker_thread+0x51/0x390
nov 15 02:19:16 manjaro kernel: ? rescuer_thread+0x3b0/0x3b0
nov 15 02:19:16 manjaro kernel: kthread+0xdb/0x110
nov 15 02:19:16 manjaro kernel: ? kthread_complete_and_exit+0x20/0x20
nov 15 02:19:16 manjaro kernel: ret_from_fork+0x1f/0x30
nov 15 02:19:16 manjaro kernel: </TASK>
I confirm the bug persists both with background_gc=on and background_gc=sync. It's especially prone to manifest when the machine is idle for a long time. It almost feels like the gc hangs because it has nothing to collect and therefore it is entering an infinite loop.
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index a71e818cd67b..c351c3269874 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1325,6 +1325,7 @@ struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
>  	lock_page(page);
>  	if (unlikely(page->mapping != mapping)) {
>  		f2fs_put_page(page, 1);
> +		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
>  		goto repeat;
>  	}
>  	if (unlikely(!PageUptodate(page))) {

This patch seems to avoid the 100% CPU occupation but still doesn't solve the bug. I was wrong in the last comment - it's an improvement!

As a workaround I tried to build the f2fs module from 5.17, but I failed. I'm not an expert, so I don't know how to forward-port the module.
Created attachment 303300 [details]
debug patch with f2fs_io_schedule_timeout

(In reply to #79)
This debug patch adds the f2fs_io_schedule_timeout call, as proposed in #79. It also prints when the problem occurs for the first time (on this call), and it prints if the problem was "fixed". In pseudo-code it looks like this:

...
f2fs_get_lock_data_page(...)
{
	int i = 0;
repeat:
	page = f2fs_get_read_data_page(...);
	...
	if (page->mapping != mapping) {
		if (i++ == 0)	/* first time */
			printk("bad ...");
		f2fs_put_page(page, 1);
		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
		if (i >= 10000)
			return ERR_PTR(-EAGAIN);	/* cannot resolve problem */
		goto repeat;
	}
	if (i > 0)	/* resolved problem successfully */
		printk("fix ...");
	...
	return page;
}

Thus, ideally, good output should contain pairs of lines:

bad ...
fix ...

In short, that does *not* happen. I'm attaching a detailed dmesg log in the next post.
Created attachment 303301 [details]
dmesg log for patch f2fs_io_schedule_timeout #91

As you can see, there are a lot of "bad" lines for the same address, and there are no corresponding "fix" lines. It's all like this:

f2fs_get_lock_data_page: bad: 19327, 00000000a70291ac, 0000000070d90d71
It's running on 6.0.9-arch1-1:

$ uname -a
Linux ... 6.0.9-arch1-1 #2 SMP PREEMPT_DYNAMIC Wed, 23 Nov 2022 05:14:08 +0000 x86_64 GNU/Linux
(In reply to Yuriy Garin from comment #93)
> It's running on 6.0.9-arch1-1:
>
> $ uname -a
> Linux ... 6.0.9-arch1-1 #2 SMP PREEMPT_DYNAMIC Wed, 23 Nov 2022 05:14:08
> +0000 x86_64 GNU/Linux

Got the same result on 6.0.10-arch2-1. See the timing, maybe it helps. Once the problem occurs, it repeats every 4 minutes for about 1 1/2 hours.

[Wed Nov 30 15:54:15 2022] f2fs_get_lock_data_page: bad: 1032147, 00000000be98c3cd, 00000000d0321d1e
[Wed Nov 30 15:58:08 2022] f2fs_get_lock_data_page: bad: 1032147, 00000000be98c3cd, 00000000d0321d1e
[Wed Nov 30 16:02:02 2022] f2fs_get_lock_data_page: bad: 1032147, 00000000be98c3cd, 00000000d0321d1e
[Wed Nov 30 16:05:55 2022] f2fs_get_lock_data_page: bad: 1032147, 00000000be98c3cd, 00000000d0321d1e
[Wed Nov 30 16:09:48 2022] f2fs_get_lock_data_page: bad: 1032147, 00000000be98c3cd, 00000000d0321d1e
...
[Wed Nov 30 17:27:35 2022] f2fs_get_lock_data_page: bad: 1032147, 00000000be98c3cd, 00000000d0321d1e
[Wed Nov 30 17:31:29 2022] f2fs_get_lock_data_page: bad: 1032147, 00000000be98c3cd, 00000000d0321d1e
Maybe it's worth injecting a printk "upstream", to see where this condition page->mapping != mapping happens in the first place? Any ideas?
Well, there's also a possibility that the mapping of the inode changes after it was initialized at the beginning:

struct address_space *mapping = inode->i_mapping;

How about printing all three: page->mapping, mapping and inode->i_mapping?
(In reply to bogdan.nicolae from comment #96)
> Well there's also a possibility that the mapping of the inode changes since
> it was initialized in the beginning:
> struct address_space *mapping = inode->i_mapping;
>
> How about printing all three: page->mapping, mapping and inode->i_mapping.

Good point, thanks!
It would be funny if inode->i_mapping were actually already fixed correctly, and we were spinning for nothing. :)
Well, I tried to force f2fs_gc on my partitions (with an unpatched 6.0.11 kernel). It seems that the problem of 100% CPU occupation arises only on nvme0n1p3 (my root). The dirty segments remain at 1417 and do not go down, and the CPU is 100% occupied (since the start, not only when it is at 1417).
And I cannot stop f2fs_gc with:

[manjaro tmp]# echo 500 > /sys/fs/f2fs/nvme0n1p3/gc_urgent_sleep_time
[manjaro tmp]# echo 0 > /sys/fs/f2fs/nvme0n1p3/gc_urgent
Very interesting: I ran the script with kernel 5.15.81 and it works well on my root partition:

sudo bash ./f2fs-gc.sh
[sudo] password di guido:
Performing GC on /sys/fs/f2fs/nvme0n1p3/
1849
425
330
307
1
GC completed for /sys/fs/f2fs/nvme0n1p3/
Performing GC on /sys/fs/f2fs/nvme0n1p4/
472
118
47
GC completed for /sys/fs/f2fs/nvme0n1p4/
Performing GC on /sys/fs/f2fs/nvme1n1/
GC completed for /sys/fs/f2fs/nvme1n1/
guido~tmp$
Interestingly enough, after the script run on the 5.15 kernel had successfully reduced the dirty segments, I started the system with the 6.0.11 kernel and relaunched the script (after waiting for the dirty segments to go back above 100). The script on 6.0.11 also worked without a problem on my root partition. As a precaution, I will run the script every 8 hours. Let's see if this keeps the partition clean and avoids problems with kernels > 5.17.
Guido, so if I understand correctly, your theory is that something in the GC strategy changed starting with 5.17, and normally this wouldn't be a problem for a fresh partition but old partitions that were upgraded may be affected (and can be fixed by running the GC offline or with an older kernel)?
(In reply to bogdan.nicolae from comment #103)
> Guido, so if I understand correctly, your theory is that something in the GC
> strategy changed starting with 5.17, and normally this wouldn't be a problem
> for a fresh partition but old partitions that were upgraded may be affected
> (and can be fixed by running the GC offline or with an older kernel)?

It seems so to me.
I'm running the next debug patch, but the problem has not happened for 4 days now.

Can anybody suggest a way to increase the chances of hitting this GC problem?

Sometimes it happens twice a day, usually once in 2-3 days, but sometimes it runs well for a month - with the same work pattern: development compilations all day, never turning the computer off, no hibernation.

By "way", I mean not a scary, dangerous, intrusive way like the LLJY script in #83, but something "more natural", less intrusive. Thanks!
(In reply to Yuriy Garin from comment #105)
> Can anybody suggest a way to increase chances of this GC problem?
> [...]
> By "way", I mean not scary, dangerous, intrusive way, like LLJY script in
> #83, something "more natural", less intrusive.

Running the script on a 5.15 (LTS) kernel should be safe (not really intrusive, GC is a supported operation). Anyway, I obviously do not take responsibility :-)
I found that letting the machine go to sleep tends to trigger the bug more often after it wakes up. You could try starting an I/O-intensive task like bonnie++, putting the machine to sleep, then waking it up.
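For example, something along these lines (untested; the directory, size and timings are placeholders):

$ bonnie++ -d /home/user/tmp -s 8192 &   # sustained I/O on the f2fs partition
$ sleep 60 && systemctl suspend          # suspend while the I/O is in flight
  # after wake-up, watch for the stuck thread:
$ top -b -n 1 | grep f2fs_gc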
Thanks! How can you tell on which disk it happens? I have two NVMe drives - one "plain" f2fs root, the other f2fs on dm-crypt - that's home, where a lot of compilation happens. From the logs and stats I cannot tell where the f2fs GC problem occurs. What should I look for? If I knew the problematic disk, I would increase the load on that disk.

Second question: /sys/fs/f2fs/<disk>/atgc_age_threshold has the value 604800. That's 1 week. Changing it to one day or 4 hours - will it really help to trigger the problem? If it does, it would be a "safe" way. Thanks again.
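(For reference, by "changing it" I just mean a sysfs write to the same per-device knob as before, e.g. 14400 for 4 hours:

# echo 14400 > /sys/fs/f2fs/<disk>/atgc_age_threshold )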
For me, it seems that applying the debug patch with f2fs_io_schedule_timeout, running the f2fs-gc.sh script once, and then rebooting fixed the problem. In my case this was on the root partition, which is on an NVMe SSD. I also edited the f2fs-gc.sh script so it runs on that partition only, instead of on all found f2fs partitions, before executing it.
I deactivated the f2fs-gc script for two days and... again 100% CPU in the f2fs_gc process :-(
Even worse, although I reactivated the script to force GC, I had the problem of the CPU at 100 percent again, even though I had done the 'cleaning' with the 5.15 kernel earlier. So at the moment I'm unfortunately forced to use 5.15 all the time.
I feel that this may be a subtle page-cache issue, whose root cause is really hard to find. That being said, we might have two options:
1) bisecting the kernel version (a rough command sketch follows below),
2) trying 5.15 with all the f2fs upstream patches.

1) This requires lots of effort between 5.15 and 5.18 though; is it doable?
2) https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-stable.git/log/?h=linux-5.15.y
Is it doable to test this kernel? If this issue happens with this kernel, we can bisect among the f2fs changes easily.
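For option 1), the workflow would be roughly the following (a sketch; it assumes the mainline tree with v5.17 as the last good and v5.18 as the first bad version, and each step means building, booting and running that kernel long enough for the bug to show up or not):

$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect bad v5.18
$ git bisect good v5.17
  # build, boot and test the suggested commit, then mark it:
$ git bisect good     # or: git bisect bad
  # repeat until git prints the first bad commit, then:
$ git bisect reset

Since the bug can take days to reproduce, each bisect step is expensive, which is what makes this approach hard in practice.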
(In reply to Jaegeuk Kim from comment #112) Now I'm trying another solution: I used fstransform to reformat the partition and upgrade the filesystem to f2fs 1.15. So now I'm testing kernel 6.1. If it works, fine. If not, I'll try your kernel version. Maybe other users can test your kernel earlier.
(In reply to Guido from comment #113) Why not test the "f2fs_io_schedule_timeout" kernel patch in combination with running the manual GC script one time? (It doesn't seem to matter whether you run the script on the unpatched or patched kernel; all that's important is that you boot into the patched kernel without any garbage, so reboot right after executing the script.) I did this because I read between the lines that this combination worked for others too, and I have had no more issues for around 5 days. And yes, I'm doing a lot to try to trigger this bug again. Also, is it safe to assume that this issue only occurs on root partitions which are on NVMe drives? I see a pattern here, but I'm still not 100% sure.
(In reply to Thomas from comment #114) > Why not test the "f2fs_io_schedule_timeout" kernel patch in combination with > running the manual GC script one time? I already tried the patch (but not in combination with the script) and it solved the 100% CPU problem, but f2fs_gc still remains stuck and doesn't finish the garbage collection, so the user can't shut down in a safe manner. Anyway, if I have the 100% CPU problem again in the following days, I'll try it.
(In reply to Jaegeuk Kim from comment #112) > this requires lots of effort between 5.15 vs. 5.18 tho, is it doable? Really good question. I think it is doable, but only with a lot of time and passion. After all, there is no easy way to recreate the issue; you need to run the kernel for days to see if it's stable. (In reply to Guido from comment #115) > I already tried the patch (but not in combination with the script) and it > solved the 100% CPU problem, but f2fs_gc still remains stuck and doesn't > finish the garbage collection, so the user can't shut down in a safe manner. Must have overlooked that statement, sorry. For me both problems seem to be solved with the script and patch combination though, so it might be worth a try (of course, after you've finished your current test).
(In reply to Guido from comment #115) > it solved the 100% CPU problem, but f2fs_gc still remains stuck You're right, this just happened for me too. So no more 100% CPU, but the partition's I/O freezes. [28731.336375] f2fs_get_lock_data_page: bad: 825453, 00000000657faa62, 00000000ba8a2fe3 [28952.126658] f2fs_get_lock_data_page: bad: 825453, 00000000657faa62, 00000000ba8a2fe3
Created attachment 303439 [details] debug patch - print page/folio/ref_count This debug patch prints the page, the folio and the folio reference count. As far as I understand the logic behind 'f2fs_put_page(page, 1); goto repeat;', it's an attempt to "unlock" the page, release it from the page cache and reload it again. (I've found it not easy to distinguish between a page and a folio pointer; it's a C union, sometimes used as a page, sometimes used as a folio, which definitely requires more kernel expertise. Please tell me what I should do better.) After two weeks of running this patch, I've caught this GC problem and have a log. Attaching it in the next message.
Created attachment 303440 [details] debug patch log - page, folio and ref count As you can see, the folio pointer is valid. And the ref count is not 1 before going into f2fs_put_page(); I guess that's why it does not work. Silly thought :) Interestingly, the ref count is 514, which looks suspiciously like the binary value 1000000010. Is it possible that during the 5.17/5.18 implementation of a "pin", a binary flag was somehow written to the ref count, or something like '1 << ...' happened?
What I'm saying is that, as was pointed out in #112: "I feel that this may be a subtle page cache issue".
(In reply to Yuriy Garin from comment #119) Forgot to add note: $ uname -a Linux ... 6.1.0-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 14 Dec 2022 04:55:09 +0000 x86_64 GNU/Linux
Created attachment 303441 [details] debug patch log - page, folio and ref count - #2 Today is a lucky day. After two weeks of waiting I've hit this GC problem a second time. It's on a different inode, but the page mapping is the same.
(In reply to Guido from comment #113) > Now I'm trying another solution: I used fstransform to reformat the partition > and upgrade the filesystem to f2fs 1.15. So now I'm testing kernel 6.1. After 1 month the problem is here again with kernel 6.1.6...
I took the liberty of raising the importance of the bug because it renders the operating system unusable. I have not set 'blocking' only because not all users are affected. In any case, my experiment of reformatting the partition did not eliminate the problem, which shows that it is probably more than a corner case. The 5.15 LTS kernel will go EOL in October; I hope the bug will be fixed by then. Aside from that, I wonder whether an analysis of the differences between the 5.17 and 5.18 kernels would show where the problem lies. I don't have the expertise to do that.
Can I ask the other reporters what distro they use? I use Manjaro, but the problem also occurs with the Arch Linux kernel. Maybe it's related to CONFIG_F2FS_UNFAIR_RWSEM=y ?
The only way to find the issue is by doing a bisect. It's a long operation, but in the time we've spent commenting, we would have found it already.
(In reply to Guido from comment #125) > Can I ask the other reporters what distro they use? Gentoo Linux > Maybe it's related to CONFIG_F2FS_UNFAIR_RWSEM=y ? Don't think so. My config: CONFIG_F2FS_FS=y CONFIG_F2FS_STAT_FS=y CONFIG_F2FS_FS_XATTR=y CONFIG_F2FS_FS_POSIX_ACL=y CONFIG_F2FS_FS_SECURITY=y # CONFIG_F2FS_CHECK_FS is not set # CONFIG_F2FS_FAULT_INJECTION is not set CONFIG_F2FS_FS_COMPRESSION=y CONFIG_F2FS_FS_LZO=y CONFIG_F2FS_FS_LZORLE=y CONFIG_F2FS_FS_LZ4=y CONFIG_F2FS_FS_LZ4HC=y CONFIG_F2FS_FS_ZSTD=y CONFIG_F2FS_IOSTAT=y # CONFIG_F2FS_UNFAIR_RWSEM is not set (In reply to Matteo Croce from comment #126) > The only way to find the issue is by doing a bisect. Bisecting this is impossible: there are 16205 commits between 5.17 and 5.18. To make sure you're bug-free you would need to test each commit for around 2 months. This means one would need 2 years and 4 months to bisect this (worst-case scenario).
(In reply to Thomas from comment #127) > Bisecting this is impossible: There are 16205 commits between 5.17 and 5.18. Well, we need to check only the commits related to F2FS between the last 5.17.x and the first 5.18.
> Bisecting this is impossible: There are 16205 commits between 5.17 and 5.18. This will take roughly 14 steps. Long but not impossible.
(In reply to Matteo Croce from comment #129) > > Bisecting this is impossible: There are 16205 commits between 5.17 and > 5.18. > > This will take roughly 14 steps. Long but not impossible. Exactly: 14 steps * 2 months = 28 months = 2 years and 4 months. This of course assumes you're bisecting 24/7...
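For what it's worth, the 14-step estimate assumes bisecting all of 5.17..5.18; restricting the bisect to commits that touch fs/f2fs/, as suggested above, shrinks the candidate set considerably and therefore needs fewer test kernels. A sketch of how that would be driven from a mainline checkout (the soak time per step is still the real cost):

# Bisect only commits touching fs/f2fs/ between v5.17 (good) and v5.18 (bad)
git bisect start v5.18 v5.17 -- fs/f2fs/
# build and run each candidate kernel long enough to trust the result, then:
git bisect bad        # the GC hang reproduced on this kernel
git bisect good       # this kernel ran clean for the whole soak period
# repeat until git names the first bad commit, then:
git bisect reset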
Re Comment #122, By any chance, could you add a code to print "page->mapping->host->i_ino" if page->mapping->host exists, and the status of PageUptodate(page)? When GC tries to move the valid block, if the block was truncated and somehow MM gives a stale page, we may hit a loop? How about this to report the error to GC? GC will skip this migration and will do it later or skip it, if the block was really truncated. --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1325,18 +1325,14 @@ struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index, { struct address_space *mapping = inode->i_mapping; struct page *page; -repeat: + page = f2fs_get_read_data_page(inode, index, 0, for_write, NULL); if (IS_ERR(page)) return page; /* wait for read completion */ lock_page(page); - if (unlikely(page->mapping != mapping)) { - f2fs_put_page(page, 1); - goto repeat; - } - if (unlikely(!PageUptodate(page))) { + if (unlikely(page->mapping != mapping || !PageUptodate(page))) { f2fs_put_page(page, 1); return ERR_PTR(-EIO); }
(In reply to Jaegeuk Kim from comment #131) > How about this to report the error to GC? GC will skip this migration and > will do it later or skip it, if the block was really truncated. I want to try this patch later. Does the patch try to solve the problem, or does it only serve to produce a log?
I tried to apply the patch on 6.2 but it failed because the repeat label is missing at line 1328.
Well, lines got shifted a bit. It's now #1336 instead of #1325.
(In reply to bogdan.nicolae from comment #134) > Well, lines got shifted a bit. It's now #1336 instead of #1325. Yes, in the meantime I corrected the patch; I'm building the kernel now.
OK, I am testing the new kernel. I tried the script to force GC and noticed that on the root partition it used about 10% of the CPU, while on the home partition the CPU usage was almost negligible (0.7-1%). The process finished without any problems on all partitions. I will keep you updated on any problems in the coming days.
@Guido: any news? Did it work? I didn't see any issues with this patch so far.
(In reply to bogdan.nicolae from comment #137) > @Guido: any news? Did it work? I didn't see any issues with this patch so far. For me too, so far so good, but I think we still have to wait to be sure. Anyway, I am beginning to hope that the bug will be fixed by this patch.
I have been using the kernel with this patch for a month now and so far no problems. Out of superstition (I am Italian!) I'm afraid to say that the bug is fixed, but it seems plausible.
Cool, it seems there's no reason not to merge this patch. Thanks,
Today I forced GC on all partitions. No problem at all.
I've reviewed the refcounting along that path and found one suspicious routine handling page->private. By any chance, can we try this patch instead of the above workaround? https://lore.kernel.org/lkml/20230405204321.2056498-1-jaegeuk@kernel.org/T/#u
(In reply to Jaegeuk Kim from comment #142) > I've reviewed the refcount of the path and found one suspicious routine when > handling page->private. > > By any chance, can we try this patch instead of the above workaround? > > https://lore.kernel.org/lkml/20230405204321.2056498-1-jaegeuk@kernel.org/T/#u What kernel version? 6.3 RC5?
You can apply it to any kernel version that you're able to build. Let me know if there's a merge conflict.
I'm not able to patch 6.2.9; I get an error for hunk #2 in both data.c and f2fs.c. I tried to change the patch entry point, but it fails. Can you help me?
By any chance, does this work? This is the backport to 6.1. https://github.com/jaegeuk/f2fs-stable/commit/a0ba9030bd28c01b3e308499df5daec94414f4fb
Ok, I prepared the patches in v6.2. https://github.com/jaegeuk/f2fs-stable/commits/linux-6.2.y Please apply *two* patches on top of the tree.
Thank you, I'm building 6.2.10 with both patches and I will try it in the next days/weeks.
The build process fails but not on f2fs (it fails on a driver for some reason). Is there a way to build only the patched f2fs module against the stock kernel?
Created attachment 304096 [details] build error Ok, I found how to do it in the documentation, but I receive errors during the build (see the attached build.log).
Thanks. I found one mistake in the previous backport of the first patch. Could you please re-download them? https://github.com/jaegeuk/f2fs-stable/commits/linux-6.2.y
Done. I built it against my current kernel (6.2.7), then rebuilt the initramfs and rebooted the system. Then I forced GC with a script and it worked without problems. I will test this kernel in the next days and weeks. I hope other people can do the same.
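For anyone who wants to reproduce that kind of module-only rebuild: a rough sketch of the usual steps, assuming the distro ships f2fs as a module (CONFIG_F2FS_FS=m), the source tree and .config match the running kernel, and an Arch-style initramfs tool. Exact paths, and whether the stock module is compressed (e.g. f2fs.ko.zst), depend on the distribution:

# Inside a kernel source tree matching the running kernel, with the patches applied:
zcat /proc/config.gz > .config        # or copy the distro's config
make olddefconfig
make modules_prepare
make M=fs/f2fs modules                # builds only fs/f2fs/f2fs.ko

# Replace the stock module and regenerate the initramfs:
cp fs/f2fs/f2fs.ko /lib/modules/$(uname -r)/kernel/fs/f2fs/
depmod -a
mkinitcpio -P                         # or dracut/update-initramfs on other distros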
I applied the patch on the latest archlinux kernel (6.2.10-arch1 https://github.com/pikatenor/linux/tree/archlinux-6.2.10-f2fs) and tried it, but f2fs_gc still hangs around 2 hours after boot. [ 0.000000] Linux version 6.2.10-arch1-1-test-507874-g453da3ddc42a (linux-test@archlinux) (gcc (GCC) 12.2.1 20230201, GNU ld (GNU Binutils) 2.40) #1 SMP PREEMPT_DYNAMIC Tue, 11 Apr 2023 16:26:44 +0000 [ 0.000000] Command line: initrd=\initramfs-linux-test.img cryptdevice=UUID=b5b188ee-8355-4638-b192-111ee6371c79:Homie root=UUID=ca2eb962-9af0-4d5c-869d-9c1916f32a2e rw quiet i915.enable_psr=0 [ 9584.264309] INFO: task f2fs_ckpt-259:4:213 blocked for more than 122 seconds. [ 9584.264313] Tainted: G U 6.2.10-arch1-1-test-507874-g453da3ddc42a #1 [ 9584.264314] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 9584.264315] task:f2fs_ckpt-259:4 state:D stack:0 pid:213 ppid:2 flags:0x00004000 [ 9584.264318] Call Trace: [ 9584.264319] <TASK> [ 9584.264321] __schedule+0x3c8/0x12e0 [ 9584.264326] ? select_task_rq_fair+0x16c/0x1c00 [ 9584.264329] ? update_load_avg+0x7e/0x780 [ 9584.264332] schedule+0x5e/0xd0 [ 9584.264333] rwsem_down_write_slowpath+0x329/0x700 [ 9584.264338] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs 137a18329c9b4a66b7d5836126aee7155321bd82] [ 9584.264366] __checkpoint_and_complete_reqs+0x7a/0x1b0 [f2fs 137a18329c9b4a66b7d5836126aee7155321bd82] [ 9584.264390] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs 137a18329c9b4a66b7d5836126aee7155321bd82] [ 9584.264411] issue_checkpoint_thread+0x4c/0x110 [f2fs 137a18329c9b4a66b7d5836126aee7155321bd82] [ 9584.264433] ? __pfx_autoremove_wake_function+0x10/0x10 [ 9584.264437] kthread+0xdb/0x110 [ 9584.264438] ? __pfx_kthread+0x10/0x10 [ 9584.264440] ret_from_fork+0x29/0x50 [ 9584.264445] </TASK> [ 9584.264508] INFO: task kworker/u16:2:19587 blocked for more than 122 seconds. [ 9584.264509] Tainted: G U 6.2.10-arch1-1-test-507874-g453da3ddc42a #1 [ 9584.264510] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 9584.264510] task:kworker/u16:2 state:D stack:0 pid:19587 ppid:2 flags:0x00004000 [ 9584.264514] Workqueue: writeback wb_workfn (flush-259:0) [ 9584.264517] Call Trace: [ 9584.264518] <TASK> [ 9584.264519] __schedule+0x3c8/0x12e0 [ 9584.264521] ? ttwu_queue_wakelist+0xef/0x110 [ 9584.264524] ? try_to_wake_up+0xd9/0x540 [ 9584.264527] schedule+0x5e/0xd0 [ 9584.264528] schedule_timeout+0x151/0x160 [ 9584.264531] wait_for_completion+0x8a/0x160 [ 9584.264534] f2fs_issue_checkpoint+0x11f/0x200 [f2fs 137a18329c9b4a66b7d5836126aee7155321bd82] [ 9584.264558] f2fs_balance_fs_bg+0x12e/0x390 [f2fs 137a18329c9b4a66b7d5836126aee7155321bd82] [ 9584.264582] f2fs_write_node_pages+0x78/0x240 [f2fs 137a18329c9b4a66b7d5836126aee7155321bd82] [ 9584.264606] do_writepages+0xc1/0x1d0 [ 9584.264610] __writeback_single_inode+0x3d/0x360 [ 9584.264614] writeback_sb_inodes+0x1ed/0x4a0 [ 9584.264618] __writeback_inodes_wb+0x4c/0xf0 [ 9584.264621] wb_writeback+0x204/0x2f0 [ 9584.264625] wb_workfn+0x354/0x4f0 [ 9584.264627] ? ttwu_queue_wakelist+0xef/0x110 [ 9584.264630] process_one_work+0x1c5/0x3c0 [ 9584.264633] worker_thread+0x51/0x390 [ 9584.264636] ? __pfx_worker_thread+0x10/0x10 [ 9584.264638] kthread+0xdb/0x110 [ 9584.264639] ? __pfx_kthread+0x10/0x10 [ 9584.264641] ret_from_fork+0x29/0x50 [ 9584.264645] </TASK>
Could you please reapply and test three patches here again? https://github.com/jaegeuk/f2fs-stable/commits/linux-6.2.y
(In reply to Jaegeuk Kim from comment #154) > Could you please reapply and test three patches here again? > > https://github.com/jaegeuk/f2fs-stable/commits/linux-6.2.y I see only two patches now.
I fetched the archlinux kernel (https://github.com/archlinux/linux/tree/v6.2.10-arch1) and rebased f2fs-stable onto it, so if the pre-existing stable tree did not contain that (third) patch, I applied only two patches. (In reply to Jaegeuk Kim from comment #154) > Could you please reapply and test three patches here again? Are you referring to the patch in comment #131?
Sorry, I found some issues in the original patches. Could you try two patches now on top of the tree? https://github.com/jaegeuk/f2fs-stable/commits/linux-6.2.y
Thanks, I am now trying it out and it seems to be working fine with my root partition mounted using background_gc=on. https://github.com/pikatenor/linux/commits/archlinux-6.2.10-f2fs2 I will continue to use it for a while and let you know how it turns out.
I applied the patches too (this time on kernel 6.2.10). I also ran the script to force GC. I will use this kernel in the coming weeks.
After several weeks, no problem. I also forced GC just now with no problem. Now I would like to switch to kernel 6.3; what patch should I use?
From Linus' tree, could you please try this patch, which was merged in 6.4-rc1? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?id=635a52da8605e5d300ec8c18fdba8d6f8491755d
I'll try ASAP. I tried to patch 6.3.1 with the patches for 6.2.x, but it fails, saying they are already in place. Looking at the code, it seems so.
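A quick way to double-check the "already in place" suspicion, rather than trusting the failed apply, is to dry-run the patches in reverse (success means the change is already there) or look through the f2fs history of that tag. The patch file names below are placeholders for whatever the downloaded patches are called:

# In a checkout of the 6.3.1 tree:
git apply --reverse --check 0001-example.patch && echo "already applied"
git apply --reverse --check 0002-example.patch && echo "already applied"
# or inspect the f2fs commits that went into the tag:
git log --oneline v6.3.1 -- fs/f2fs/ | head -n 30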
To be clear: should I apply the patch merged in 6.4-rc1 to the 6.3.1 kernel? If so, I'd prefer to try kernel 6.4-rc1 instead, with that patch already in place.
Yup, 6.4-rc1 should have all the patches, so it's worth giving it a try.
Thank you, for now I'm trying linux-next-git 20230504.r0.g145e5cddfe8b-1 from the AUR; it should have the patch already applied.
I've been using 6.3.1 with the 6.4-rc1 patch for a few days now. With no extra GC parameters, f2fs_gc-8:1 starts using 17.8% of the CPU and the system basically becomes unusable (can't open anything, like a soft freeze), but after setting background_gc=sync (although it might not be ideal) it did not happen again. I hope this extra information helps, Jaegeuk; love this filesystem. Regards
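For others who want to try the same thing: background_gc is an ordinary f2fs mount option, so it can be switched at runtime with a remount and made persistent via fstab. As far as I understand, sync keeps the background cleaner but makes it run its passes synchronously, which is slower but apparently sidesteps the hang here. The mount point below is an example:

# Switch an already-mounted f2fs filesystem to synchronous background GC
mount -o remount,background_gc=sync /
# Verify the active options
grep ' / f2fs ' /proc/mounts
# To keep it across reboots, add background_gc=sync to the options field in /etc/fstab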
Matias, you saw the issue with the f2fs updates in 6.4-rc1, right? If so, we may need to consider bringing [1] back. [1] https://github.com/jaegeuk/f2fs/commit/400dc2a4d7ec96a1fc4168652a0862e7edab3671
Removed background_gc=sync and it happened again, i hope this message gets sent so you could take a look, this is the journalctl log after it happens. Kernel: 6.3.1 with f2fs updates of 6.4-rc1 May 05 20:13:44 cachyos-x8664 kernel: INFO: task f2fs_ckpt-8:1:204 blocked for more than 122 seconds. May 05 20:13:44 cachyos-x8664 kernel: Not tainted 6.3.1-1-cachyos #1 May 05 20:13:44 cachyos-x8664 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 05 20:13:44 cachyos-x8664 kernel: task:f2fs_ckpt-8:1 state:D stack:0 pid:204 ppid:2 flags:0x00004000 May 05 20:13:44 cachyos-x8664 kernel: Call Trace: May 05 20:13:44 cachyos-x8664 kernel: <TASK> May 05 20:13:44 cachyos-x8664 kernel: __schedule+0x441/0x17b0 May 05 20:13:44 cachyos-x8664 kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 May 05 20:13:44 cachyos-x8664 kernel: schedule_preempt_disabled+0x65/0xe0 May 05 20:13:44 cachyos-x8664 kernel: rwsem_down_write_slowpath+0x22b/0x6c0 May 05 20:13:44 cachyos-x8664 kernel: ? psi_task_switch+0x12f/0x340 May 05 20:13:44 cachyos-x8664 kernel: ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs d2333fc34706e39c1a83271e8b382b177aae887d] May 05 20:13:44 cachyos-x8664 kernel: down_write+0x5b/0x60 May 05 20:13:44 cachyos-x8664 kernel: __checkpoint_and_complete_reqs+0x7c/0x1b0 [f2fs d2333fc34706e39c1a83271e8b382b177aae887d] May 05 20:13:44 cachyos-x8664 kernel: ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs d2333fc34706e39c1a83271e8b382b177aae887d] May 05 20:13:44 cachyos-x8664 kernel: issue_checkpoint_thread+0x4c/0x110 [f2fs d2333fc34706e39c1a83271e8b382b177aae887d] May 05 20:13:44 cachyos-x8664 kernel: ? __pfx_autoremove_wake_function+0x10/0x10 May 05 20:13:44 cachyos-x8664 kernel: kthread+0xdb/0x110 May 05 20:13:44 cachyos-x8664 kernel: ? __pfx_kthread+0x10/0x10 May 05 20:13:44 cachyos-x8664 kernel: ret_from_fork+0x29/0x50 May 05 20:13:44 cachyos-x8664 kernel: </TASK> May 05 20:13:44 cachyos-x8664 kernel: INFO: task kworker/u16:0:5392 blocked for more than 122 seconds. May 05 20:13:44 cachyos-x8664 kernel: Not tainted 6.3.1-1-cachyos #1 May 05 20:13:44 cachyos-x8664 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 05 20:13:44 cachyos-x8664 kernel: task:kworker/u16:0 state:D stack:0 pid:5392 ppid:2 flags:0x00004000 May 05 20:13:44 cachyos-x8664 kernel: Workqueue: writeback wb_workfn (flush-8:0) May 05 20:13:44 cachyos-x8664 kernel: Call Trace: May 05 20:13:44 cachyos-x8664 kernel: <TASK> May 05 20:13:44 cachyos-x8664 kernel: __schedule+0x441/0x17b0 May 05 20:13:44 cachyos-x8664 kernel: ? blk_mq_submit_bio+0x396/0x760 May 05 20:13:44 cachyos-x8664 kernel: ? ttwu_queue_wakelist+0xef/0x110 May 05 20:13:44 cachyos-x8664 kernel: schedule+0x5e/0xd0 May 05 20:13:44 cachyos-x8664 kernel: schedule_timeout+0x329/0x390 May 05 20:13:44 cachyos-x8664 kernel: ? autoremove_wake_function+0x32/0x60 May 05 20:13:44 cachyos-x8664 kernel: wait_for_completion+0x86/0x160 May 05 20:13:44 cachyos-x8664 kernel: f2fs_issue_checkpoint+0x11f/0x200 [f2fs d2333fc34706e39c1a83271e8b382b177aae887d] May 05 20:13:44 cachyos-x8664 kernel: f2fs_balance_fs_bg+0x12e/0x3b0 [f2fs d2333fc34706e39c1a83271e8b382b177aae887d] May 05 20:13:44 cachyos-x8664 kernel: f2fs_write_node_pages+0x85/0xa00 [f2fs d2333fc34706e39c1a83271e8b382b177aae887d] May 05 20:13:44 cachyos-x8664 kernel: ? __pfx_ata_scsi_rw_xlat+0x10/0x10 May 05 20:13:44 cachyos-x8664 kernel: ? ata_qc_issue+0x138/0x270 May 05 20:13:44 cachyos-x8664 kernel: ? ata_scsi_queuecmd+0xe4/0x170 May 05 20:13:44 cachyos-x8664 kernel: ? 
select_task_rq_fair+0x15d/0x2880 May 05 20:13:44 cachyos-x8664 kernel: ? __pfx_f2fs_write_node_pages+0x10/0x10 [f2fs d2333fc34706e39c1a83271e8b382b177aae887d] May 05 20:13:44 cachyos-x8664 kernel: do_writepages+0x8c/0x610 May 05 20:13:44 cachyos-x8664 kernel: ? blk_mq_do_dispatch_sched+0xa7/0x3c0 May 05 20:13:44 cachyos-x8664 kernel: ? _flat_send_IPI_mask+0x1f/0x30 May 05 20:13:44 cachyos-x8664 kernel: ? ttwu_queue_wakelist+0xef/0x110 May 05 20:13:44 cachyos-x8664 kernel: ? try_to_wake_up+0xd9/0xcb0 May 05 20:13:44 cachyos-x8664 kernel: __writeback_single_inode+0x3d/0x360 May 05 20:13:44 cachyos-x8664 kernel: writeback_sb_inodes+0x1ed/0x530 May 05 20:13:44 cachyos-x8664 kernel: ? __wake_up+0x8b/0xc0 May 05 20:13:44 cachyos-x8664 kernel: __writeback_inodes_wb+0x4c/0xf0 May 05 20:13:44 cachyos-x8664 kernel: wb_writeback+0x1fe/0x390 May 05 20:13:44 cachyos-x8664 kernel: wb_workfn+0x412/0x600 May 05 20:13:44 cachyos-x8664 kernel: ? __schedule+0x449/0x17b0 May 05 20:13:44 cachyos-x8664 kernel: process_one_work+0x24b/0x460 May 05 20:13:44 cachyos-x8664 kernel: worker_thread+0x55/0x4f0 May 05 20:13:44 cachyos-x8664 kernel: ? __pfx_worker_thread+0x10/0x10 May 05 20:13:44 cachyos-x8664 kernel: kthread+0xdb/0x110 May 05 20:13:44 cachyos-x8664 kernel: ? __pfx_kthread+0x10/0x10 May 05 20:13:44 cachyos-x8664 kernel: ret_from_fork+0x29/0x50
(In reply to Jaegeuk Kim from comment #167) > Matias, you saw the issue with the f2fs updates in 6.4-rc1, right? If so, we > may need to consider bringing [1] back. > > [1] > https://github.com/jaegeuk/f2fs/commit/400dc2a4d7ec96a1fc4168652a0862e7edab3671 Since rc1 got released today, I'll try again to see if this issue comes back. Maybe it was just a regression with the 6.3.x kernel, but we'll see.
Since posting comment #158, I have been using the patched 6.2.10 kernel for a while. Initially it seemed stable, but in the last few days the problem has recurred: again f2fs_gc occupies 100% of a CPU core and blocks other kernel tasks. I am going to switch to the 6.4-rc1 kernel from now on; however, I suspect that this bug has probably not been fully fixed. If some kind of patch or logs are needed for debugging, please let me know so I can help out.
All ok here with kernel 6.4 since May, 5th. My mount options: /dev/nvme0n1p3 on / type f2fs (rw,noatime,lazytime,background_gc=on,gc_merge,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,barrier,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,atgc,memory=normal) /dev/nvme0n1p4 on /home type f2fs (rw,noatime,lazytime,background_gc=on,gc_merge,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,barrier,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,atgc,memory=normal) /dev/nvme1n1 on /run/media/guido/nvme1 type f2fs (rw,noatime,lazytime,background_gc=on,gc_merge,nodiscard,no_heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,barrier,extent_cache,mode=adaptive,active_logs=6,alloc_mode=default,checkpoint_merge,fsync_mode=posix,compress_algorithm=zstd:6,compress_log_size=2,compress_chksum,compress_mode=fs,compress_cache,atgc,memory=normal)
Hi all, I'm by no means a super power user, but I ran into this issue this week. I updated my kernel to 6.3.6 and the problem appeared. In the past I hit it as well, but I don't remember the kernel versions because it was fixed pretty quickly. My mount options are the same as Guido's. I've tried several mount options, but I had no luck on 6.3.6. Downgrading the kernel to 6.3.1 makes the problem disappear. If you need more information, please let me know.
Hi, This issue often occurs on my system when the disk is around 80% full. When the disk is less full, I get much more uptime before the issue appears, and sometimes it does not occur at all. $ uname -a Linux ... 6.4.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 06 Jul 2023 18:35:54 +0000 x86_64 GNU/Linux
Thank you for the reports. I still suspect something happening in the page cache, though. As a safeguard, let me try to apply [1] to the -next and -stable branches. [1] https://github.com/jaegeuk/f2fs/commit/400dc2a4d7ec96a1fc4168652a0862e7edab3671
Still happens on my system with 6.4.x kernels after about 4 days (today the second time in the last two weeks): # uname -a Linux pavo 6.4.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 24 Jul 2023 20:19:38 +0000 x86_64 GNU/Linux # uptime 12:49:15 up 4 days, 1:38, 14 users, load average: 6.09, 6.11, 6.09 # ps afux | grep f2fs root 421 0.0 0.0 0 0 ? D Jul26 0:06 \_ [f2fs_ckpt-254:0] root 422 0.0 0.0 0 0 ? S Jul26 0:00 \_ [f2fs_flush-254:0] root 423 0.0 0.0 0 0 ? S Jul26 0:00 \_ [f2fs_discard-254:0] root 424 6.8 0.0 0 0 ? R Jul26 399:26 \_ [f2fs_gc-254:0] root 714 0.0 0.0 0 0 ? S Jul26 0:02 \_ [f2fs_ckpt-254:1] root 715 0.0 0.0 0 0 ? S Jul26 0:00 \_ [f2fs_flush-254:1] root 716 0.0 0.0 0 0 ? S Jul26 0:00 \_ [f2fs_discard-254:1] root 725 0.0 0.0 0 0 ? S Jul26 0:09 \_ [f2fs_gc-254:1] # cat /sys/kernel/debug/f2fs/status =====[ partition info(dm-0). #0, RW, CP: Good]===== [SBI: fs_dirty] [SB: 1] [CP: 2] [SIT: 4] [NAT: 116] [SSA: 78] [MAIN: 39727(OverProv:569 Resv:289)] Current Time Sec: 352033 / Mounted Time Sec: 13 Policy: - IPU: [ FSYNC ] Utilization: 31% (6380179 valid blocks, 13105165 discard blocks) - Node: 204879 (Inode: 199852, Other: 5027) - Data: 6175300 - Inline_xattr Inode: 44851 - Inline_data Inode: 29674 - Inline_dentry Inode: 2151 - Compressed Inode: 0, Blocks: 0 - Swapfile Inode: 0 - Orphan/Append/Update Inode: 3, 0, 0 Main area: 39727 segs, 39727 secs 39727 zones TYPE segno secno zoneno dirty_seg full_seg valid_blk - COLD data: 27547 27547 27547 717 8794 4862994 - WARM data: 27570 27570 27570 41 2263 1171156 - HOT data: 326 326 326 22 215 120280 - Dir dnode: 61 61 61 27 2 14475 - File dnode: 310 310 310 350 28 190066 - Indir nodes: 261 261 261 1 0 333 - Pinned file: -1 -1 -1 - ATGC data: -1 -1 -1 - Valid: 11308 - Dirty: 1152 - Prefree: 1 - Free: 27266 (27266) CP calls: 5068 (BG: 4847) - cp blocks : 21743 - sit blocks : 27307 - nat blocks : 284434 - ssa blocks : 4982 CP merge: - Queued : 3 - Issued : 5069 - Total : 5069 - Cur time : 5(ms) - Peak time : 355(ms) GC calls: 4479 (BG: 4480) - data segments : 1682 (0) - node segments : 2797 (0) - Reclaimed segs : - Normal : 4479 - Idle CB : 0 - Idle Greedy : 0 - Idle AT : 0 - Urgent High : 0 - Urgent Mid : 0 - Urgent Low : 0 Try to move 2023304 blocks (BG: 0) - data blocks : 686401 (0) - node blocks : 1336903 (0) BG skip : IO: 843, Other: 8 Extent Cache (Read): - Hit Count: L1-1:104104223097 L1-2:40279 L2:19420 - Hit Ratio: 99% (104104282796 / 104107306835) - Inner Struct Count: tree: 33955(0), node: 160 Extent Cache (Block Age): - Allocated Data Blocks: 1147983 - Hit Count: L1:0 L2:0 - Hit Ratio: 0% (0 / 0) - Inner Struct Count: tree: 0(0), node: 0 Balancing F2FS Async: - DIO (R: 0, W: 0) - IO_R (Data: 0, Node: 0, Meta: 0 - IO_W (CP: 0, Data: 0, Flush: ( 0 3114 1), Discard: ( 0 7852)) cmd: 1954 undiscard:4539 - atomic IO: 0 (Max. 0) - compress: 0, hit: 0 - nodes: 63 in 27008 - dents: 0 in dirs: 0 ( 4) - datas: 30 in files: 0 - quota datas: 0 in quota files: 0 - meta: 26 in 1913 - imeta: 2 - fsync mark: 389 - NATs: 44/ 878 - SITs: 135/ 39727 - free_nids: 3161/ 13306775 - alloc_nids: 0 Distribution of User Blocks: [ valid | invalid | free ] [---------------|-|----------------------------------] IPU: 2095 blocks SSR: 0 blocks in 0 segments LFS: 2563874 blocks in 5008 segments BDF: 99, avg. vblocks: 494 Memory: 130282 KB - static: 11343 KB - cached all: 3254 KB - read extent cache: 2929 KB - block age extent cache: 0 KB - paged : 115684 KB =====[ partition info(dm-1). 
#1, RW, CP: Good]===== [SBI: fs_dirty need_fsck] [SB: 1] [CP: 2] [SIT: 32] [NAT: 88] [SSA: 876] [MAIN: 447237(OverProv:1897 Resv:960)] Current Time Sec: 352033 / Mounted Time Sec: 20 Policy: - IPU: [ FSYNC ] Utilization: 60% (137479434 valid blocks, 83700313 discard blocks) - Node: 796398 (Inode: 673288, Other: 123110) - Data: 136683036 - Inline_xattr Inode: 512050 - Inline_data Inode: 207070 - Inline_dentry Inode: 38713 - Compressed Inode: 0, Blocks: 0 - Swapfile Inode: 0 - Orphan/Append/Update Inode: 0, 0, 0 Main area: 447237 segs, 447237 secs 447237 zones TYPE segno secno zoneno dirty_seg full_seg valid_blk - COLD data: 270602 270602 270602 3627 124609 64920639 - WARM data: 269915 269915 269915 1012 138752 71272348 - HOT data: 2485 2485 2485 84 940 490021 - Dir dnode: 2507 2507 2507 72 71 63227 - File dnode: 2511 2511 2511 642 988 731964 - Indir nodes: 2380 2380 2380 21 1 1204 - Pinned file: -1 -1 -1 - ATGC data: -1 -1 -1 - Valid: 265367 - Dirty: 5452 - Prefree: 0 - Free: 176418 (176418) CP calls: 4575 (BG: 6476) - cp blocks : 18577 - sit blocks : 11759 - nat blocks : 6361 - ssa blocks : 15209 CP merge: - Queued : 0 - Issued : 7154 - Total : 7155 - Cur time : 4(ms) - Peak time : 517(ms) GC calls: 4109 (BG: 4109) - data segments : 4103 (0) - node segments : 6 (0) - Reclaimed segs : - Normal : 4109 - Idle CB : 0 - Idle Greedy : 0 - Idle AT : 0 - Urgent High : 0 - Urgent Mid : 0 - Urgent Low : 0 Try to move 6 blocks (BG: 0) - data blocks : 0 (0) - node blocks : 6 (0) BG skip : IO: 1617, Other: 3 Extent Cache (Read): - Hit Count: L1-1:78346 L1-2:808 L2:62 - Hit Ratio: 6% (79216 / 1314997) - Inner Struct Count: tree: 471059(0), node: 3 Extent Cache (Block Age): - Allocated Data Blocks: 7759701 - Hit Count: L1:0 L2:0 - Hit Ratio: 0% (0 / 0) - Inner Struct Count: tree: 0(0), node: 0 Balancing F2FS Async: - DIO (R: 0, W: 0) - IO_R (Data: 0, Node: 0, Meta: 0 - IO_W (CP: 0, Data: 0, Flush: ( 0 1274 1), Discard: ( 0 0)) cmd: 10381 undiscard:7805628 - atomic IO: 0 (Max. 0) - compress: 0, hit: 0 - nodes: 13 in 107925 - dents: 1 in dirs: 1 ( 6) - datas: 30 in files: 0 - quota datas: 0 in quota files: 0 - meta: 6 in 9440 - imeta: 4 - fsync mark: 0 - NATs: 6/ 63316 - SITs: 8/ 447237 - free_nids: 3843/ 9453836 - alloc_nids: 0 Distribution of User Blocks: [ valid | invalid | free ] [------------------------------|-|-------------------] IPU: 0 blocks SSR: 0 blocks in 0 segments LFS: 7790262 blocks in 15215 segments BDF: 99, avg. vblocks: 295 Memory: 620142 KB - static: 106987 KB - cached all: 43694 KB - read extent cache: 40481 KB - block age extent cache: 0 KB - paged : 469460 KB # dmesg # (only F2FS relevant parts) [ 0.000000] microcode: updated early: 0x25 -> 0x26, date = 2019-11-12 [ 0.000000] Linux version 6.4.6-arch1-1 (linux@archlinux) (gcc (GCC) 13.1.1 20230714, GNU ld (GNU Binutils) 2.40.0) #1 SMP PREEMPT_DYNAMIC Mon, 24 Jul 2023 20:19:38 +0000 ... [ 6.988098] device-mapper: uevent: version 1.0.3 [ 6.988195] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com [ 7.023553] Key type encrypted registered [ 13.454117] F2FS-fs (dm-0): Mounted with checkpoint version = 424a2306 [ 13.639655] systemd[1]: systemd 253.7-1-arch running in system mode (+PAM +AUDIT -SELINUX -APPARMOR -IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified) ... 
[ 20.654961] F2FS-fs (dm-1): Mounted with checkpoint version = 4bb49ae6 ... [327726.678181] INFO: task f2fs_ckpt-254:0:421 blocked for more than 122 seconds. [327726.678191] Not tainted 6.4.6-arch1-1 #1 [327726.678194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [327726.678196] task:f2fs_ckpt-254:0 state:D stack:0 pid:421 ppid:2 flags:0x00004000 [327726.678205] Call Trace: [327726.678207] <TASK> [327726.678212] __schedule+0x3e8/0x1410 [327726.678221] ? sysvec_apic_timer_interrupt+0xe/0x90 [327726.678230] schedule+0x5e/0xd0 [327726.678235] schedule_preempt_disabled+0x15/0x30 [327726.678239] rwsem_down_write_slowpath+0x203/0x690 [327726.678250] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327726.678338] down_write+0x5b/0x60 [327726.678344] __checkpoint_and_complete_reqs+0x7c/0x1b0 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327726.678424] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327726.678499] issue_checkpoint_thread+0x4c/0x110 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327726.678575] ? __pfx_autoremove_wake_function+0x10/0x10 [327726.678584] kthread+0xe5/0x120 [327726.678590] ? __pfx_kthread+0x10/0x10 [327726.678596] ret_from_fork+0x29/0x50 [327726.678606] </TASK> [327726.678627] INFO: task kworker/u8:6:74861 blocked for more than 122 seconds. [327726.678631] Not tainted 6.4.6-arch1-1 #1 [327726.678633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [327726.678635] task:kworker/u8:6 state:D stack:0 pid:74861 ppid:2 flags:0x00004000 [327726.678643] Workqueue: writeback wb_workfn (flush-254:0) [327726.678651] Call Trace: [327726.678653] <TASK> [327726.678656] __schedule+0x3e8/0x1410 [327726.678661] ? ttwu_queue_wakelist+0xef/0x110 [327726.678666] ? try_to_wake_up+0xd9/0x610 [327726.678674] schedule+0x5e/0xd0 [327726.678679] schedule_timeout+0x151/0x160 [327726.678687] wait_for_completion+0x8a/0x160 [327726.678693] f2fs_issue_checkpoint+0x11f/0x200 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327726.678772] f2fs_balance_fs_bg+0x12e/0x390 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327726.678856] f2fs_write_node_pages+0x78/0x240 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327726.678938] do_writepages+0xcf/0x1e0 [327726.678945] __writeback_single_inode+0x3d/0x360 [327726.678951] writeback_sb_inodes+0x1ed/0x4b0 [327726.678958] __writeback_inodes_wb+0x4c/0xf0 [327726.678964] wb_writeback+0x172/0x2f0 [327726.678970] wb_workfn+0x35b/0x510 [327726.678975] ? __schedule+0x3f0/0x1410 [327726.678979] ? __mod_timer+0x11f/0x370 [327726.678987] process_one_work+0x1c4/0x3d0 [327726.678996] worker_thread+0x51/0x390 [327726.679003] ? __pfx_worker_thread+0x10/0x10 [327726.679009] kthread+0xe5/0x120 [327726.679015] ? __pfx_kthread+0x10/0x10 [327726.679020] ret_from_fork+0x29/0x50 [327726.679030] </TASK> [327849.560344] INFO: task f2fs_ckpt-254:0:421 blocked for more than 245 seconds. [327849.560354] Not tainted 6.4.6-arch1-1 #1 [327849.560357] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [327849.560359] task:f2fs_ckpt-254:0 state:D stack:0 pid:421 ppid:2 flags:0x00004000 [327849.560367] Call Trace: [327849.560370] <TASK> [327849.560374] __schedule+0x3e8/0x1410 [327849.560383] ? sysvec_apic_timer_interrupt+0xe/0x90 [327849.560391] schedule+0x5e/0xd0 [327849.560395] schedule_preempt_disabled+0x15/0x30 [327849.560399] rwsem_down_write_slowpath+0x203/0x690 [327849.560409] ? 
__pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327849.560501] down_write+0x5b/0x60 [327849.560507] __checkpoint_and_complete_reqs+0x7c/0x1b0 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327849.560587] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327849.560662] issue_checkpoint_thread+0x4c/0x110 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327849.560739] ? __pfx_autoremove_wake_function+0x10/0x10 [327849.560748] kthread+0xe5/0x120 [327849.560755] ? __pfx_kthread+0x10/0x10 [327849.560761] ret_from_fork+0x29/0x50 [327849.560773] </TASK> [327849.560797] INFO: task kworker/u8:6:74861 blocked for more than 245 seconds. [327849.560800] Not tainted 6.4.6-arch1-1 #1 [327849.560803] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [327849.560805] task:kworker/u8:6 state:D stack:0 pid:74861 ppid:2 flags:0x00004000 [327849.560813] Workqueue: writeback wb_workfn (flush-254:0) [327849.560821] Call Trace: [327849.560824] <TASK> [327849.560827] __schedule+0x3e8/0x1410 [327849.560832] ? ttwu_queue_wakelist+0xef/0x110 [327849.560839] ? try_to_wake_up+0xd9/0x610 [327849.560847] schedule+0x5e/0xd0 [327849.560852] schedule_timeout+0x151/0x160 [327849.560859] wait_for_completion+0x8a/0x160 [327849.560867] f2fs_issue_checkpoint+0x11f/0x200 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327849.560947] f2fs_balance_fs_bg+0x12e/0x390 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327849.561033] f2fs_write_node_pages+0x78/0x240 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327849.561115] do_writepages+0xcf/0x1e0 [327849.561123] __writeback_single_inode+0x3d/0x360 [327849.561129] writeback_sb_inodes+0x1ed/0x4b0 [327849.561138] __writeback_inodes_wb+0x4c/0xf0 [327849.561144] wb_writeback+0x172/0x2f0 [327849.561150] wb_workfn+0x35b/0x510 [327849.561156] ? __schedule+0x3f0/0x1410 [327849.561160] ? __mod_timer+0x11f/0x370 [327849.561169] process_one_work+0x1c4/0x3d0 [327849.561179] worker_thread+0x51/0x390 [327849.561187] ? __pfx_worker_thread+0x10/0x10 [327849.561194] kthread+0xe5/0x120 [327849.561199] ? __pfx_kthread+0x10/0x10 [327849.561205] ret_from_fork+0x29/0x50 [327849.561215] </TASK> [327972.442481] INFO: task f2fs_ckpt-254:0:421 blocked for more than 368 seconds. [327972.442488] Not tainted 6.4.6-arch1-1 #1 [327972.442490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [327972.442491] task:f2fs_ckpt-254:0 state:D stack:0 pid:421 ppid:2 flags:0x00004000 [327972.442497] Call Trace: [327972.442499] <TASK> [327972.442503] __schedule+0x3e8/0x1410 [327972.442510] ? sysvec_apic_timer_interrupt+0xe/0x90 [327972.442516] schedule+0x5e/0xd0 [327972.442518] schedule_preempt_disabled+0x15/0x30 [327972.442521] rwsem_down_write_slowpath+0x203/0x690 [327972.442529] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327972.442588] down_write+0x5b/0x60 [327972.442592] __checkpoint_and_complete_reqs+0x7c/0x1b0 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327972.442639] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327972.442682] issue_checkpoint_thread+0x4c/0x110 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327972.442727] ? __pfx_autoremove_wake_function+0x10/0x10 [327972.442733] kthread+0xe5/0x120 [327972.442738] ? __pfx_kthread+0x10/0x10 [327972.442741] ret_from_fork+0x29/0x50 [327972.442749] </TASK> [327972.442769] INFO: task kworker/u8:6:74861 blocked for more than 368 seconds. 
[327972.442771] Not tainted 6.4.6-arch1-1 #1 [327972.442773] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [327972.442774] task:kworker/u8:6 state:D stack:0 pid:74861 ppid:2 flags:0x00004000 [327972.442778] Workqueue: writeback wb_workfn (flush-254:0) [327972.442785] Call Trace: [327972.442786] <TASK> [327972.442788] __schedule+0x3e8/0x1410 [327972.442791] ? ttwu_queue_wakelist+0xef/0x110 [327972.442795] ? try_to_wake_up+0xd9/0x610 [327972.442800] schedule+0x5e/0xd0 [327972.442803] schedule_timeout+0x151/0x160 [327972.442808] wait_for_completion+0x8a/0x160 [327972.442813] f2fs_issue_checkpoint+0x11f/0x200 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327972.442860] f2fs_balance_fs_bg+0x12e/0x390 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327972.442912] f2fs_write_node_pages+0x78/0x240 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [327972.442963] do_writepages+0xcf/0x1e0 [327972.442968] __writeback_single_inode+0x3d/0x360 [327972.442971] writeback_sb_inodes+0x1ed/0x4b0 [327972.442976] __writeback_inodes_wb+0x4c/0xf0 [327972.442979] wb_writeback+0x172/0x2f0 [327972.442983] wb_workfn+0x35b/0x510 [327972.442986] ? __schedule+0x3f0/0x1410 [327972.442989] ? __mod_timer+0x11f/0x370 [327972.442994] process_one_work+0x1c4/0x3d0 [327972.443001] worker_thread+0x51/0x390 [327972.443006] ? __pfx_worker_thread+0x10/0x10 [327972.443011] kthread+0xe5/0x120 [327972.443014] ? __pfx_kthread+0x10/0x10 [327972.443018] ret_from_fork+0x29/0x50 [327972.443025] </TASK> [328095.324600] INFO: task f2fs_ckpt-254:0:421 blocked for more than 491 seconds. [328095.324611] Not tainted 6.4.6-arch1-1 #1 [328095.324614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [328095.324616] task:f2fs_ckpt-254:0 state:D stack:0 pid:421 ppid:2 flags:0x00004000 [328095.324625] Call Trace: [328095.324628] <TASK> [328095.324634] __schedule+0x3e8/0x1410 [328095.324643] ? sysvec_apic_timer_interrupt+0xe/0x90 [328095.324652] schedule+0x5e/0xd0 [328095.324657] schedule_preempt_disabled+0x15/0x30 [328095.324662] rwsem_down_write_slowpath+0x203/0x690 [328095.324673] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328095.324759] down_write+0x5b/0x60 [328095.324765] __checkpoint_and_complete_reqs+0x7c/0x1b0 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328095.324839] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328095.324910] issue_checkpoint_thread+0x4c/0x110 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328095.324981] ? __pfx_autoremove_wake_function+0x10/0x10 [328095.324990] kthread+0xe5/0x120 [328095.324997] ? __pfx_kthread+0x10/0x10 [328095.325003] ret_from_fork+0x29/0x50 [328095.325013] </TASK> [328095.325035] INFO: task kworker/u8:6:74861 blocked for more than 491 seconds. [328095.325038] Not tainted 6.4.6-arch1-1 #1 [328095.325041] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [328095.325043] task:kworker/u8:6 state:D stack:0 pid:74861 ppid:2 flags:0x00004000 [328095.325051] Workqueue: writeback wb_workfn (flush-254:0) [328095.325059] Call Trace: [328095.325061] <TASK> [328095.325064] __schedule+0x3e8/0x1410 [328095.325069] ? ttwu_queue_wakelist+0xef/0x110 [328095.325076] ? 
try_to_wake_up+0xd9/0x610 [328095.325084] schedule+0x5e/0xd0 [328095.325088] schedule_timeout+0x151/0x160 [328095.325096] wait_for_completion+0x8a/0x160 [328095.325103] f2fs_issue_checkpoint+0x11f/0x200 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328095.325178] f2fs_balance_fs_bg+0x12e/0x390 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328095.325258] f2fs_write_node_pages+0x78/0x240 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328095.325335] do_writepages+0xcf/0x1e0 [328095.325342] __writeback_single_inode+0x3d/0x360 [328095.325349] writeback_sb_inodes+0x1ed/0x4b0 [328095.325356] __writeback_inodes_wb+0x4c/0xf0 [328095.325362] wb_writeback+0x172/0x2f0 [328095.325368] wb_workfn+0x35b/0x510 [328095.325374] ? __schedule+0x3f0/0x1410 [328095.325378] ? __mod_timer+0x11f/0x370 [328095.325387] process_one_work+0x1c4/0x3d0 [328095.325395] worker_thread+0x51/0x390 [328095.325403] ? __pfx_worker_thread+0x10/0x10 [328095.325410] kthread+0xe5/0x120 [328095.325415] ? __pfx_kthread+0x10/0x10 [328095.325421] ret_from_fork+0x29/0x50 [328095.325430] </TASK> [328218.206759] INFO: task f2fs_ckpt-254:0:421 blocked for more than 614 seconds. [328218.206768] Not tainted 6.4.6-arch1-1 #1 [328218.206771] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [328218.206773] task:f2fs_ckpt-254:0 state:D stack:0 pid:421 ppid:2 flags:0x00004000 [328218.206779] Call Trace: [328218.206781] <TASK> [328218.206785] __schedule+0x3e8/0x1410 [328218.206792] ? sysvec_apic_timer_interrupt+0xe/0x90 [328218.206798] schedule+0x5e/0xd0 [328218.206801] schedule_preempt_disabled+0x15/0x30 [328218.206804] rwsem_down_write_slowpath+0x203/0x690 [328218.206812] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328218.206872] down_write+0x5b/0x60 [328218.206877] __checkpoint_and_complete_reqs+0x7c/0x1b0 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328218.206928] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328218.206975] issue_checkpoint_thread+0x4c/0x110 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328218.207020] ? __pfx_autoremove_wake_function+0x10/0x10 [328218.207027] kthread+0xe5/0x120 [328218.207031] ? __pfx_kthread+0x10/0x10 [328218.207035] ret_from_fork+0x29/0x50 [328218.207042] </TASK> [328218.207064] INFO: task kworker/u8:6:74861 blocked for more than 614 seconds. [328218.207067] Not tainted 6.4.6-arch1-1 #1 [328218.207068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [328218.207070] task:kworker/u8:6 state:D stack:0 pid:74861 ppid:2 flags:0x00004000 [328218.207075] Workqueue: writeback wb_workfn (flush-254:0) [328218.207081] Call Trace: [328218.207082] <TASK> [328218.207084] __schedule+0x3e8/0x1410 [328218.207087] ? ttwu_queue_wakelist+0xef/0x110 [328218.207091] ? 
try_to_wake_up+0xd9/0x610 [328218.207097] schedule+0x5e/0xd0 [328218.207099] schedule_timeout+0x151/0x160 [328218.207105] wait_for_completion+0x8a/0x160 [328218.207109] f2fs_issue_checkpoint+0x11f/0x200 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328218.207158] f2fs_balance_fs_bg+0x12e/0x390 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328218.207211] f2fs_write_node_pages+0x78/0x240 [f2fs daaabf438da293fce6726f2a1db3018f0802cf41] [328218.207262] do_writepages+0xcf/0x1e0 [328218.207267] __writeback_single_inode+0x3d/0x360 [328218.207271] writeback_sb_inodes+0x1ed/0x4b0 [328218.207275] __writeback_inodes_wb+0x4c/0xf0 [328218.207279] wb_writeback+0x172/0x2f0 [328218.207282] wb_workfn+0x35b/0x510 [328218.207286] ? __schedule+0x3f0/0x1410 [328218.207288] ? __mod_timer+0x11f/0x370 [328218.207294] process_one_work+0x1c4/0x3d0 [328218.207300] worker_thread+0x51/0x390 [328218.207305] ? __pfx_worker_thread+0x10/0x10 [328218.207310] kthread+0xe5/0x120 [328218.207314] ? __pfx_kthread+0x10/0x10 [328218.207318] ret_from_fork+0x29/0x50 [328218.207324] </TASK> [328218.207325] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
After several months the bug occurred again with kernel 6.5 rc4. After rebooting with REISUB, I tried forcing GC with the usual script and it gave me no problems.
(In reply to kelak from comment #175) > Still happens on my system with 6.4.x kernels after about 4 days (today the > second time in the last two weeks): Jaegeuk has proposed a workaround, shown below; it is aimed to be enabled from 6.6-rc1. Could you please give it a try? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=5a47ad28e6061c8fddf5c82d49fdc3280a80b2c4
>Jaegeuk has proposed a workaround, shown below; it is aimed to be enabled from 6.6-rc1. >Could you please give it a try? > >https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=5a47ad28e6061c8fddf5c82d49fdc3280a80b2c4 I'll try to build and test a 6.4.x kernel next week with that workaround patch included.
(In reply to kelak from comment #178) > I'll try to build and test a 6.4.x kernel next week with that workaround patch included. Did it work?
(In reply to Guido from comment #179) > > I'll try to build and test a 6.4.x kernel next week with that workaround patch included. > Did it work? I rebuilt the Arch Linux kernel 6.4.6 with the mentioned patch last weekend. I have an uptime of one week now, and the issue has not occurred yet, so it's too early to say anything definite. However, before I started the patched kernel I played around with the garbage-collect script. I observed that after some iterations the f2fs_gc thread took ~25-30% CPU and the number of dirty segments stayed constant at around 10000. I stopped the garbage-collect script, but the f2fs_gc thread's CPU usage stayed at ~20-30%. I rebooted after ~15 minutes. I observed this on both kernels, the patched one and the unpatched one. I'll leave the patched kernel running for some more weeks.
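In case it helps others reproduce that observation, the dirty-segment count and GC counters can be watched from the debugfs status file while the script runs; a trivial loop like the one below (standard debugfs path, interval arbitrary) is enough to see whether the "Dirty" figure keeps falling or plateaus around 10000 as described:

#!/bin/sh
# Print per-partition GC call counts and dirty-segment counts every 30 seconds.
# Requires root and debugfs mounted at /sys/kernel/debug.
while :; do
    date
    grep -E 'partition info|GC calls|- Dirty' /sys/kernel/debug/f2fs/status
    echo
    sleep 30
done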
The fact that the dirty segments don't go below 10000 suggests that the bug is not completely fixed even with this patch. This is unfortunate because the 5.15 kernel, which is not affected by the bug, is close to EOL (October, if I remember correctly). I am not an expert, but I assume it is safer not to interrupt the script (because GC continues on its own anyway) or shut down the PC while GC is running, but rather to undo in reverse what the script does. TBH I don't remember whether doing this really interrupts GC, or whether once it enters the loop there is nothing to be done. For clarity, I found the script on the Internet; I am not responsible for any damage or loss of data.