Bug 219180 - Freezing user space processes failed - FSTRIM
Summary: Freezing user space processes failed - FSTRIM
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P3 low
Assignee: BTRFS virtual assignee
URL:
Keywords:
Depends on:
Blocks:
Reported: 2024-08-19 10:26 UTC by Luca Stefani
Modified: 2024-10-13 14:43 UTC
CC List: 1 user

See Also:
Kernel Version: 6.10.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Luca Stefani 2024-08-19 10:26:23 UTC
Hey all,

I've been seeing occasional suspend failures on a Fedora system using btrfs.

The call trace from a failed suspend attempt is:
[558775.800991] PM: suspend entry (s2idle)
[558775.857721] Filesystems sync: 0.056 seconds
[558776.023796] Freezing user space processes
[558796.027961] Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
[558796.028202] task:fstrim          state:D stack:0     pid:61717 tgid:61717 ppid:1      flags:0x00004006
[558796.028214] Call Trace:
[558796.028218]  <TASK>
[558796.028228]  __schedule+0x400/0x1720
[558796.028240]  ? mod_delayed_work_on+0xa4/0xb0
[558796.028250]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028257]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028263]  ? blk_mq_flush_plug_list.part.0+0x1e3/0x610
[558796.028274]  schedule+0x27/0xf0
[558796.028279]  schedule_timeout+0x12f/0x160
[558796.028290]  io_schedule_timeout+0x51/0x70
[558796.028296]  wait_for_completion_io+0x8a/0x160
[558796.028305]  submit_bio_wait+0x60/0x90
[558796.028314]  blkdev_issue_discard+0x91/0x100
[558796.028325]  btrfs_issue_discard+0xc4/0x140
[558796.028337]  btrfs_discard_extent+0x241/0x2a0
[558796.028345]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028355]  do_trimming+0xd2/0x240
[558796.028369]  trim_bitmaps+0x350/0x4c0
[558796.028383]  btrfs_trim_block_group+0xb8/0x110
[558796.028391]  btrfs_trim_fs+0x118/0x440
[558796.028398]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028404]  ? security_capable+0x41/0x70
[558796.028414]  btrfs_ioctl_fitrim+0x113/0x180
[558796.028423]  btrfs_ioctl+0xdaf/0x2670
[558796.028433]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028438]  ? ioctl_has_perm.constprop.0.isra.0+0xd8/0x130
[558796.028452]  __x64_sys_ioctl+0x94/0xd0
[558796.028461]  do_syscall_64+0x82/0x160
[558796.028496]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028502]  ? do_sys_openat2+0x9c/0xe0
[558796.028514]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028520]  ? syscall_exit_to_user_mode+0x72/0x220
[558796.028526]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028532]  ? do_syscall_64+0x8e/0x160
[558796.028536]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028542]  ? syscall_exit_to_user_mode+0x72/0x220
[558796.028547]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028553]  ? do_syscall_64+0x8e/0x160
[558796.028558]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028564]  ? do_sys_openat2+0x9c/0xe0
[558796.028573]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028579]  ? syscall_exit_to_user_mode+0x72/0x220
[558796.028584]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028587]  ? do_syscall_64+0x8e/0x160
[558796.028590]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028593]  ? syscall_exit_to_user_mode+0x72/0x220
[558796.028596]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028599]  ? do_syscall_64+0x8e/0x160
[558796.028602]  ? do_syscall_64+0x8e/0x160
[558796.028604]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028607]  ? do_syscall_64+0x8e/0x160
[558796.028609]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028614]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[558796.028618] RIP: 0033:0x7f514a08ef2d
[558796.028655] RSP: 002b:00007ffdafb69b10 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[558796.028659] RAX: ffffffffffffffda RBX: 00007ffdafb69c80 RCX: 00007f514a08ef2d
[558796.028662] RDX: 00007ffdafb69b80 RSI: 00000000c0185879 RDI: 0000000000000003
[558796.028664] RBP: 00007ffdafb69b60 R08: 000055858efd6010 R09: 00007ffdafb68f02
[558796.028666] R10: 0000000000000000 R11: 0000000000000246 R12: 000055858efd8f30
[558796.028668] R13: 0000000000000003 R14: 000055858efd88e0 R15: 000055858efd8cb0

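The failure seems to be caused by the fstrim process (a user space task, not a kernel thread): it is in uninterruptible sleep (state:D) inside the FITRIM ioctl, waiting for a discard request to complete, so the freezer gives up after 20 seconds.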

It seems ext4 was affected by the same issue (see bug 216322), which was later fixed by commit 5229a658f645 ("ext4: do not let fstrim block system suspend").

Looking at btrfs, `btrfs_trim_free_extents` (and probably the block group trim loops as well, e.g. `trim_bitmaps`, which appears in the trace above) seems to have the same problem. The naive fix would be to also check `freezing(current)` alongside `fatal_signal_pending(current)`, but I have no idea whether that's enough.

Thanks for looking!
