Bug 219180

Summary: Freezing user space processes failed - FSTRIM
Product: File System
Reporter: Luca Stefani (luca.stefani.ge1)
Component: btrfs
Assignee: BTRFS virtual assignee (fs_btrfs)
Status: RESOLVED CODE_FIX
Severity: low
CC: luca.stefani.ge1
Priority: P3
Hardware: All
OS: Linux
Kernel Version: 6.10.3
Subsystem:
Regression: No
Bisected commit-id:

Description Luca Stefani 2024-08-19 10:26:23 UTC
Hey all,

I've been seeing occasional failures to suspend using btrfs on Fedora.

The kernel log and call trace from a failed suspend are as follows:
[558775.800991] PM: suspend entry (s2idle)
[558775.857721] Filesystems sync: 0.056 seconds
[558776.023796] Freezing user space processes
[558796.027961] Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
[558796.028202] task:fstrim          state:D stack:0     pid:61717 tgid:61717 ppid:1      flags:0x00004006
[558796.028214] Call Trace:
[558796.028218]  <TASK>
[558796.028228]  __schedule+0x400/0x1720
[558796.028240]  ? mod_delayed_work_on+0xa4/0xb0
[558796.028250]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028257]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028263]  ? blk_mq_flush_plug_list.part.0+0x1e3/0x610
[558796.028274]  schedule+0x27/0xf0
[558796.028279]  schedule_timeout+0x12f/0x160
[558796.028290]  io_schedule_timeout+0x51/0x70
[558796.028296]  wait_for_completion_io+0x8a/0x160
[558796.028305]  submit_bio_wait+0x60/0x90
[558796.028314]  blkdev_issue_discard+0x91/0x100
[558796.028325]  btrfs_issue_discard+0xc4/0x140
[558796.028337]  btrfs_discard_extent+0x241/0x2a0
[558796.028345]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028355]  do_trimming+0xd2/0x240
[558796.028369]  trim_bitmaps+0x350/0x4c0
[558796.028383]  btrfs_trim_block_group+0xb8/0x110
[558796.028391]  btrfs_trim_fs+0x118/0x440
[558796.028398]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028404]  ? security_capable+0x41/0x70
[558796.028414]  btrfs_ioctl_fitrim+0x113/0x180
[558796.028423]  btrfs_ioctl+0xdaf/0x2670
[558796.028433]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028438]  ? ioctl_has_perm.constprop.0.isra.0+0xd8/0x130
[558796.028452]  __x64_sys_ioctl+0x94/0xd0
[558796.028461]  do_syscall_64+0x82/0x160
[558796.028496]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028502]  ? do_sys_openat2+0x9c/0xe0
[558796.028514]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028520]  ? syscall_exit_to_user_mode+0x72/0x220
[558796.028526]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028532]  ? do_syscall_64+0x8e/0x160
[558796.028536]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028542]  ? syscall_exit_to_user_mode+0x72/0x220
[558796.028547]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028553]  ? do_syscall_64+0x8e/0x160
[558796.028558]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028564]  ? do_sys_openat2+0x9c/0xe0
[558796.028573]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028579]  ? syscall_exit_to_user_mode+0x72/0x220
[558796.028584]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028587]  ? do_syscall_64+0x8e/0x160
[558796.028590]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028593]  ? syscall_exit_to_user_mode+0x72/0x220
[558796.028596]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028599]  ? do_syscall_64+0x8e/0x160
[558796.028602]  ? do_syscall_64+0x8e/0x160
[558796.028604]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028607]  ? do_syscall_64+0x8e/0x160
[558796.028609]  ? srso_alias_return_thunk+0x5/0xfbef5
[558796.028614]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[558796.028618] RIP: 0033:0x7f514a08ef2d
[558796.028655] RSP: 002b:00007ffdafb69b10 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[558796.028659] RAX: ffffffffffffffda RBX: 00007ffdafb69c80 RCX: 00007f514a08ef2d
[558796.028662] RDX: 00007ffdafb69b80 RSI: 00000000c0185879 RDI: 0000000000000003
[558796.028664] RBP: 00007ffdafb69b60 R08: 000055858efd6010 R09: 00007ffdafb68f02
[558796.028666] R10: 0000000000000000 R11: 0000000000000246 R12: 000055858efd8f30
[558796.028668] R13: 0000000000000003 R14: 000055858efd88e0 R15: 000055858efd8cb0

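The failure seems to be caused by the fstrim process, which sits in uninterruptible sleep (state D) inside the FITRIM ioctl while waiting for a discard to complete, so it cannot be frozen.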

It seems ext4 was affected by the same issue, as can be seen in bug 216322, and it was later fixed by commit 5229a658f645 ("ext4: do not let fstrim block system suspend").

Looking at btrfs's `btrfs_trim_free_extents`, it seems to me that it is affected by the same problem. The naive solution would be to also check `freezing(current)` alongside `fatal_signal_pending(current)`, but I have no idea whether that is enough.
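
To make the idea concrete, here is a minimal userspace sketch of that check. It is not btrfs code: `struct task`, `trim_interrupted()` and `trim_chunks()` are hypothetical stand-ins that only model a trim loop bailing out between chunks once a freeze (or a fatal signal) is pending. The kernel code in question returns `-ERESTARTSYS`, which is not exposed to userspace, so the sketch uses `-EINTR` instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task state the kernel consults. */
struct task {
    bool fatal_signal;   /* models fatal_signal_pending(current) */
    bool freeze_pending; /* models freezing(current) */
};

/* The proposed combined check: stop on a fatal signal OR a pending freeze. */
static bool trim_interrupted(const struct task *t)
{
    return t->fatal_signal || t->freeze_pending;
}

/*
 * Model of a trim loop over nr_chunks chunks. A freeze request is
 * simulated as arriving just before chunk index freeze_at is processed.
 * Returns 0 when every chunk was trimmed, or -EINTR when the loop stopped
 * early; *trimmed receives the number of chunks actually processed.
 */
static int trim_chunks(struct task *t, size_t nr_chunks, size_t freeze_at,
                       size_t *trimmed)
{
    assert(t != NULL && trimmed != NULL);

    *trimmed = 0;
    for (size_t i = 0; i < nr_chunks; i++) {
        if (i == freeze_at)
            t->freeze_pending = true; /* suspend starts mid-trim */

        /* Check between chunks so the task can reach the freezer. */
        if (trim_interrupted(t))
            return -EINTR;

        (*trimmed)++; /* stands in for issuing one discard */
    }
    return 0;
}
```

In this model the check only runs between chunks, so a single long discard would still delay the freeze; whether that granularity is fine-grained enough in the real code is exactly the part I am unsure about.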

Thanks for looking!