Bug 216322 - Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310
Summary: Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... ...
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 178231
 
Reported: 2022-08-03 20:18 UTC by Len Brown
Modified: 2023-09-25 12:04 UTC
CC List: 7 users

See Also:
Kernel Version: 5.19.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
html.gz page showing the failure (377.08 KB, text/html)
2022-08-03 20:26 UTC, Len Brown
issue.def (390 bytes, text/plain)
2023-04-21 23:46 UTC, Todd Brandt
[PATCH 1/2] ext4: Move setting of trimmed bit into ext4_try_to_trim_range() (5.30 KB, patch)
2023-09-13 15:02 UTC, Jan Kara
[PATCH 2/2] ext4: Do no let fstrim block system suspend (2.20 KB, patch)
2023-09-13 15:03 UTC, Jan Kara

Description Len Brown 2022-08-03 20:18:26 UTC
The system suspend to idle path occasionally times out after 60 seconds
and throws the stack trace below.

The result is that the suspend path is aborted and the system continues running.

Unfortunately, when a user invokes suspend, they expect it to work,
and they may not be around (lid closed) to try again when it aborts...

[10483.047079] PM: suspend entry (s2idle)
[10483.052777] Filesystems sync: 0.005 seconds
[10483.052782] PM: Preparing system for sleep (s2idle)
[10483.060824] Freezing user space processes ... 
[10543.024088] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze, wq_busy=0):
[10543.024175] task:fstrim          state:D stack:    0 pid:225775 ppid:     1 flags:0x00004006
[10543.024183] Call Trace:
[10543.024186]  <TASK>
[10543.024192]  __schedule+0x306/0x9f0
[10543.024202]  schedule+0x5c/0xd0
[10543.024206]  schedule_timeout+0x87/0x160
[10543.024211]  ? timer_migration_handler+0xa0/0xa0
[10543.024217]  trace_clock_x86_tsc+0x20/0x20
[10543.024224]  __wait_for_common+0x8f/0x190
[10543.024228]  ? firmware_map_remove+0x9c/0x9c
[10543.024233]  wait_for_completion_io_timeout+0x1d/0x30
[10543.024237]  submit_bio_wait+0x7f/0xc0
[10543.024244]  blkdev_issue_discard+0x6e/0xc0
[10543.024250]  ext4_try_to_trim_range+0x1f0/0x440
[10543.024259]  ext4_trim_fs+0x327/0x4d0
[10543.024266]  __ext4_ioctl+0x2d3/0x1590
[10543.024270]  ? putname+0x59/0x70
[10543.024275]  ? __seccomp_filter+0x3a6/0x5c0
[10543.024283]  ext4_ioctl+0xe/0x20
[10543.024287]  __x64_sys_ioctl+0x92/0xd0
[10543.024293]  do_syscall_64+0x59/0x90
[10543.024297]  ? do_syscall_64+0x69/0x90
[10543.024300]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[10543.024306] RIP: 0033:0x7f2ae6b1aaff
[10543.024312] RSP: 002b:00007ffd41e6de60 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[10543.024317] RAX: ffffffffffffffda RBX: 00007ffd41e6dfb0 RCX: 00007f2ae6b1aaff
[10543.024319] RDX: 00007ffd41e6ded0 RSI: 00000000c0185879 RDI: 0000000000000003
[10543.024322] RBP: 00005621c3642e90 R08: 00005621c3642e90 R09: 0000000000000000
[10543.024324] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[10543.024326] R13: 00005621c3642ce0 R14: 00005621c3642980 R15: 00005621c3642980
[10543.024330]  </TASK>
[10543.024343] OOM killer enabled.
[10543.024344] Restarting tasks ... done.
[10543.085776] PM: suspend exit
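The register dump ties the stuck task to FITRIM. On x86-64, ORIG_RAX 0x10 is syscall 16 (ioctl), RDI 3 is the file descriptor, and RSI 0xc0185879 is the ioctl request number. The sketch below decodes that number using the generic _IOC bit layout and rebuilds FITRIM, which <linux/fs.h> defines as _IOWR('X', 121, struct fstrim_range). The helper names and the mirrored struct are illustrative only, not kernel or libc API.

```c
#include <stdint.h>

/* Illustrative decoder for the generic ioctl number layout used on x86-64:
 * bits 0-7 = nr, 8-15 = type, 16-29 = argument size, 30-31 = direction. */
struct ioctl_fields {
    unsigned dir;   /* 1 = write, 2 = read, 3 = read|write (_IOWR) */
    unsigned size;  /* sizeof the argument struct */
    unsigned type;  /* "magic" character */
    unsigned nr;    /* command number within that type */
};

static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;
    f.nr   = cmd & 0xffu;
    f.type = (cmd >> 8) & 0xffu;
    f.size = (cmd >> 16) & 0x3fffu;
    f.dir  = (cmd >> 30) & 0x3u;
    return f;
}

/* Same layout as struct fstrim_range: three 64-bit fields (24 bytes). */
struct fstrim_range_mirror {
    uint64_t start;
    uint64_t len;
    uint64_t minlen;
};

/* Re-encode _IOWR('X', 121, struct fstrim_range) by hand. */
static uint32_t encode_fitrim(void)
{
    return (3u << 30)
         | ((uint32_t)sizeof(struct fstrim_range_mirror) << 16)
         | ((uint32_t)'X' << 8)
         | 121u;
}
```

decode_ioctl(0xc0185879) yields direction 3 (read/write), size 24, type 'X' and number 121, and encode_fitrim() produces 0xc0185879 again, matching the RSI value in the trace.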
Comment 1 Len Brown 2022-08-03 20:26:31 UTC
Created attachment 301522 [details]
html.gz page showing the failure

The sleepgraph output shows the 'fstrim' task
continuously calling schedule_timeout(15000).

Interestingly, re-trying the suspend after this failure
is successful.  So the failure is not permanent.
(perhaps the kernel is waiting for a frozen user process that is allowed to proceed?)
Comment 2 Theodore Tso 2022-08-04 00:44:45 UTC
So the problem is that the FITRIM ioctl does not check whether a signal is pending. If the fstrim program requests a trim of the entire SSD (len=ULLONG_MAX), then, like the broomstick set off by Mickey Mouse in Fantasia's "Sorcerer's Apprentice", it will mindlessly send discard requests for every block not in use by the file system until it is done.   Or to put it another way, "Neither rain, nor snow, nor a request to freeze the OS, shall stop the FITRIM ioctl from its appointed task."  :-)

The question is how to fix things.   The problem is that the FITRIM ioctl interface is pretty horrible.   The fstrim_range.len field is an IN/OUT field: on input it is the number of bytes that should be trimmed (from start to start+len), and when the ioctl returns, fstrim_range.len is the number of bytes that were actually trimmed.   So this is not really amenable to -ERESTARTSYS.
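To make the IN/OUT problem concrete, here is a toy model (all names hypothetical, not ext4 code, and counting in blocks rather than bytes) that walks a range one block at a time, counts only free blocks as trimmed, and stops early to imitate an interruption. Because the returned len counts discarded free blocks rather than how far the walk got, neither the original request length nor the resume position can be recovered from the struct afterwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy mirror of struct fstrim_range; minlen is ignored in this model. */
struct fstrim_range_model {
    uint64_t start;   /* IN: first block of the range */
    uint64_t len;     /* IN: blocks to consider; OUT: blocks discarded */
    uint64_t minlen;  /* IN: unused here */
};

/* Hypothetical FITRIM stand-in. used[b] marks allocated blocks; the walk
 * stops after `budget` blocks to imitate an interruption. */
static void model_fitrim(struct fstrim_range_model *r, const bool *used,
                         size_t nblocks, uint64_t budget)
{
    /* Saturate start + len so len = UINT64_MAX ("whole device") works. */
    uint64_t end = (r->len > UINT64_MAX - r->start) ? UINT64_MAX
                                                    : r->start + r->len;
    uint64_t trimmed = 0;
    uint64_t visited = 0;

    for (uint64_t b = r->start; b < end && b < nblocks && visited < budget;
         b++, visited++) {
        if (!used[b])
            trimmed++;        /* only free blocks get a discard */
    }
    r->len = trimmed;         /* the requested length is overwritten */
}
```

With used = {1,1,0,0,1,0,1,0}, start 0, len UINT64_MAX and a budget of 5 blocks, the call returns start 0 and len 2 even though the walk reached block 5, so start + len says nothing about where to resume.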

Worse, the fstrim program in util-linux doesn't handle an EAGAIN error return, so if it gets EAGAIN after try_to_freeze_tasks() sends the fake signal to the process, fstrim will print "fstrim: FITRIM ioctl failed" to stderr and the rest of the file system trim operation will be aborted.

It might be that the only way we can fix this is to have FITRIM return EAGAIN, which will stop fstrim in its tracks.  This is... not great, but fstrim is typically run from crontab or a systemd timer once a month, so if the user tries to suspend right while fstrim is running, hopefully we'll get lucky next month.    We can then try to teach fstrim to do the right thing, so this lossage mode would only happen with the combination of a new kernel and an older version of util-linux.

I'm not happy with that solution, but the alternative, creating a new FITRIM2 ioctl with a sane interface, means that you need a new kernel and a new util-linux package, and if you don't have both, the user will have to deal with a hot laptop bag and a drained battery.   And not changing FITRIM's behaviour will have the same potential end result if the user gets unlucky and tries to suspend the laptop when there are more than 60 seconds left before FITRIM completes.   :-/

The other thing I'll note is that every file system has its own FITRIM implementation, and I suspect they all have this issue, because the FITRIM interface is fundamentally flawed.
Comment 3 Theodore Tso 2022-08-04 01:01:50 UTC
The other consideration is whether some userspace application other than util-linux uses the FITRIM ioctl; for example, what if systemd decided it needed to reimplement fstrim the way it has reimplemented syslogd, ntpd, etc., etc., etc.?     If we change FITRIM so that, when it gets a signal or the system tries to suspend, it returns EAGAIN with fstrim_range.len set to the number of bytes trimmed so far, this might cause the systemd reimplementation (or any other hypothetical user of FITRIM) to break if a suspend-to-ram happens at an inopportune time.

So which is worse?   

1)   Leaving suspend-to-ram broken if the user is unlucky enough to try to suspend their laptop while fstrim is run automatically by systemd or out of crontab?   

2)   Breaking random userspace programs that use FITRIM so they don't complete the requested file system/SSD maintenance if the user tries to suspend their laptop while that program happens to be running?   (We can fix the userspace programs that use FITRIM to handle the EAGAIN error return as we find them, of course.   At the moment, it's only util-linux as far as I know.)

In the long term, #2 seems like the best approach, IMHO.  OTOH, it could be argued that we've lived with this for years and years and years, and no one has noticed up until now.
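Option #2 only works if userspace cooperates. The sketch below is a toy model under an invented convention that is not existing kernel behaviour: when a hypothetical FITRIM stops early it returns -EAGAIN, moves start to the resume point and reports the bytes it discarded in len. The caller has to remember the end of its range itself, because len is overwritten on every call. All names (mock_fs, mock_fitrim, trim_with_retry) are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

struct fstrim_range_model {
    uint64_t start;
    uint64_t len;
    uint64_t minlen;
};

/* Toy file system in which every byte is free; a freeze request is assumed
 * to arrive after every `chunk` bytes. */
struct mock_fs {
    uint64_t size;
    uint64_t chunk;
};

static int mock_fitrim(const struct mock_fs *fs, struct fstrim_range_model *r)
{
    uint64_t end = r->start + r->len;
    if (end > fs->size)
        end = fs->size;
    if (r->start >= end) {
        r->len = 0;
        return 0;
    }
    uint64_t stop = r->start + fs->chunk;
    if (stop >= end) {
        r->len = end - r->start;   /* finished: len = bytes trimmed */
        return 0;
    }
    r->len = stop - r->start;      /* partial progress */
    r->start = stop;               /* invented convention: resume point */
    return -EAGAIN;
}

/* Userspace side: keep calling until the whole range is done. */
static int trim_with_retry(const struct mock_fs *fs, uint64_t start,
                           uint64_t len, uint64_t *total)
{
    uint64_t end = start + len;    /* saved because the ioctl overwrites len */
    *total = 0;
    for (;;) {
        struct fstrim_range_model r = { start, end - start, 0 };
        int ret = mock_fitrim(fs, &r);
        *total += r.len;
        if (ret != -EAGAIN)
            return ret;
        start = r.start;
    }
}
```

For a 1000-byte mock file system interrupted every 300 bytes, trim_with_retry(&fs, 0, 1000, &total) makes four calls and ends with total == 1000. An old fstrim that treats any error as fatal would instead stop after the first 300 bytes.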
Comment 4 Lukas Czerner 2022-08-04 11:47:47 UTC
(In reply to Theodore Tso from comment #2)
> The other thing I'll note is that every file system has its own FITRIM
> implementation, and I suspect they all have this issue, because the FITRIM
> interface is fundamentally flawed.

I agree that the FITRIM interface is flawed in this way. But
ext4_try_to_trim_range() actually does have fatal_signal_pending() and
will return -ERESTARTSYS if that's true. Or did you have something else in
mind?

Also in that case, I see no reason why we would not be able to adjust
the fstrim_range to make it easier to re-start where we left off if
we're going to return -ERESTARTSYS. Am I missing something?

I have not had time to look deeply into the traces, but are you actually
sure that we're not stuck in blkdev_issue_discard() instead?

-Lukas
Comment 5 Theodore Tso 2022-08-04 14:45:56 UTC
On Thu, Aug 04, 2022 at 11:47:47AM +0000, bugzilla-daemon@kernel.org wrote:
> 
> I agree that the FITRIM interface is flawed in this way. But
> ext4_try_to_trim_range() actually does have fatal_signal_pending() and
> will return -ERESTARTSYS if that's true. Or did you have something else in
> mind?

The fatal_signal_pending() only checks for SIGKILL.  I'm not sure why
it returns ERESTARTSYS, since that's not applicable for a kill -9
signal.  The fake_signal_wake_up() function in kernel/freezer.c
doesn't send a fatal signal, so the fatal_signal_pending() check isn't
going to help here.

> Also in that case, I see no reason why we would not be able to adjust
> the fstrim_range to make it easier to re-start where we left off if
> we're going to return -ERESTARTSYS. I am missing something?

Well, we could adjust fstrim_range.start and fstrim_range.len to make
it easier to restart --- but that's only if we know for sure that
we're going to be restarting the system call.  So we need to break
some abstraction barriers since if the signal is one where based on
the sigaction flags, the system call is *not* restarted, then
fstrim_range.len is supposed to contain the number of bytes trimmed.

And even if the system call is restarted, there's no place to stash
the number of bytes trimmed so far, since fstrim_range.len is
overloaded.   This is why the interface is so horrible...

> I have not had time to look deeply into the traces, but are you actually
> sure that we're not stuck in blkdev_issue_discard() instead?

I'm not 100% certain, but unless the block device has been put to
sleep first (in which case I think we would have noticed much sooner
since lots of other suspend-to-ram use cases would be failing --- in
writeback threads, for example), I'd be really surprised if we're
getting stuck there.

Even when we need to wait for the queue to be drained so there is
space to send the next discard, that shouldn't take 60+ seconds.

- Ted
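Ted's point about fatal_signal_pending() can be shown with a toy model of the two relevant pieces of task state. In the kernel, the freezer's fake_signal_wake_up() calls signal_wake_up(), which sets TIF_SIGPENDING without queueing any signal, while fatal_signal_pending() also requires SIGKILL to be in the pending set. The struct and function names below are hypothetical stand-ins, not kernel code.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the bits of task_struct that matter here. */
struct task_model {
    bool     sigpending_flag;  /* models TIF_SIGPENDING */
    uint64_t queued;           /* models the set of queued signals */
};

#define MODEL_SIGBIT(sig) (1ULL << ((sig) - 1))

/* Like signal_pending(): only looks at the flag. */
static bool model_signal_pending(const struct task_model *t)
{
    return t->sigpending_flag;
}

/* Like fatal_signal_pending(): flag set AND SIGKILL actually queued. */
static bool model_fatal_signal_pending(const struct task_model *t)
{
    return model_signal_pending(t) && (t->queued & MODEL_SIGBIT(SIGKILL)) != 0;
}

/* What the freezer does: set the flag, queue nothing. */
static void model_fake_signal_wake_up(struct task_model *t)
{
    t->sigpending_flag = true;
}

/* Roughly what kill -9 does: queue SIGKILL and set the flag. */
static void model_send_sigkill(struct task_model *t)
{
    t->queued |= MODEL_SIGBIT(SIGKILL);
    t->sigpending_flag = true;
}
```

After model_fake_signal_wake_up(), model_signal_pending() is true but model_fatal_signal_pending() stays false, which is why the existing check in ext4_try_to_trim_range() never fires during a freeze. A plain signal_pending() check would fire, but bailing out on it brings back the question of what to report in fstrim_range.len.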
Comment 6 Len Brown 2022-08-09 13:40:36 UTC
Is there a simple recipe to reproduce this failure?

Note that upon this 60-second timeout, the kernel aborts the suspend
and un-freezes user-space.  Well, what is running in user space
is the sleepgraph program with --multi, which saves the record
of the failure and immediately provokes the next suspend -- which works.

Did the act of unfreezing user-space result in progress,
or were we just lucky that the operation completed before the
subsequent suspend request?

Note that Rafael thought that the kernel timeout was 20 seconds,
not 60 seconds, and he's inclined to make it shorter, not longer.
(are we sure that this process is actually making progress,
 and not in some kind of deadlock that would persist no matter
 how long the timeout?)
Comment 7 Theodore Tso 2022-08-09 17:48:13 UTC
I suspect you got lucky.  Depending on the SSD's performance in processing discard requests, the size of the file system, and how fragmented the free space might be, it could take several minutes for the FITRIM as executed by fstrim(8) to complete.   At the moment, it can be interrupted via a kill -9, but not anything else.

It wasn't a matter of FITRIM failing to make progress; it was making progress, busily sending tons of discard requests to the storage device.  It's just that it currently ignores the "fake signal" the kernel sends when it attempts to freeze userspace processes, and keeps going until its task is complete.

So if FITRIM normally takes 3 minutes on that particular storage device, and the suspend was triggered 90 seconds after fstrim(8) was started by cron/systemd, the 60-second timeout would have caused the suspend to fail, and the next suspend would have worked because FITRIM would have completed before the 60-second timeout expired.
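This timeline can be checked with simple arithmetic. Because FITRIM cannot be frozen, the freezer only succeeds if the remaining trim time fits inside the freeze timeout, which defaults to 20 seconds and can be changed in milliseconds via /sys/power/pm_freeze_timeout (the reporter's log shows a 60-second timeout). A minimal sketch in whole seconds, with hypothetical helper names and assuming the retry starts immediately after the failed attempt, as sleepgraph --multi does:

```c
#include <stdbool.h>

/* All times in seconds, measured from when fstrim started. */
static bool model_freeze_succeeds(long trim_total, long suspend_at, long timeout)
{
    long remaining = trim_total - suspend_at;  /* <= 0: FITRIM already done */
    return remaining <= timeout;
}

/* The freezer gives up `timeout` seconds after it starts, so an immediate
 * retry begins at that point. */
static long model_retry_time(long suspend_at, long timeout)
{
    return suspend_at + timeout;
}
```

For Ted's example, model_freeze_succeeds(180, 90, 60) is false; the retry starts at model_retry_time(90, 60) == 150, and model_freeze_succeeds(180, 150, 60) is true because only 30 seconds of trimming remain.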

To reproduce this failure, you would presumably want to unmount and remount the file system first, since FITRIM sets a flag on a block group after it has been trimmed; the flag is cleared when blocks in that block group are freed, and a subsequent FITRIM skips block groups that still have the flag set.   Then trigger fstrim and immediately try to suspend the laptop.   If your SSD is sufficiently slow, and your file system is sufficiently large and fragmented, you should see it fail.   If not, you could try reducing the kernel timeout to a value smaller than the time reported by the command "time fstrim <mntpnt>":

For example, on my new laptop (a 2021 Samsung Galaxy Book Pro 360):

% sudo time fstrim /
0.00user 1.32system 1:14.57elapsed 1%CPU (0avgtext+0avgdata 2732maxresident)k
176inputs+0outputs (0major+137minor)pagefaults 0swaps
% df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p7  1.1T   28G  996G   3% /

This is a 2TB SSD, so if the root file system spanned the full 2TB, fstrim would have taken roughly 2 minutes to run, and FITRIM is uninterruptible (except via a kill -9 signal).

On my Dell Precision Tower development machine, which has an older SSD, things are even worse:

% sudo time fstrim /
[sudo] password for tytso: 
0.00user 34.56system 13:27.21elapsed 4%CPU (0avgtext+0avgdata 2724maxresident)k
10184inputs+0outputs (2major+131minor)pagefaults 0swaps
% df -h /
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/cwcc--vg-root  824G  283G  499G  37% /

Please note, I'm not requesting that the kernel timeout be extended from 60 seconds to 15 minutes.   We need to find some different solution.  :-)
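The per-block-group "trimmed" flag Ted describes explains why a second fstrim right after the first finishes quickly, and why his recipe starts with a remount. Below is a toy model with hypothetical names that ignores minlen and all on-disk details, and assumes, as the recipe implies, that the flag lives only in memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct group_model {
    uint64_t free_bytes;   /* free space a discard would cover */
    bool     was_trimmed;  /* models the in-memory "trimmed" flag */
};

/* One FITRIM pass: discard free space in groups not already trimmed.
 * Returns the number of bytes "discarded". */
static uint64_t model_fitrim_pass(struct group_model *g, size_t n)
{
    uint64_t discarded = 0;
    for (size_t i = 0; i < n; i++) {
        if (g[i].was_trimmed)
            continue;              /* nothing freed since the last pass */
        discarded += g[i].free_bytes;
        g[i].was_trimmed = true;
    }
    return discarded;
}

/* Freeing blocks in a group clears its flag so the next pass revisits it. */
static void model_free_blocks(struct group_model *g, uint64_t bytes)
{
    g->free_bytes += bytes;
    g->was_trimmed = false;
}

/* A remount starts with fresh in-memory state: no group is marked trimmed. */
static void model_remount(struct group_model *g, size_t n)
{
    for (size_t i = 0; i < n; i++)
        g[i].was_trimmed = false;
}
```

With four groups holding 100, 200, 300 and 400 free bytes, the first pass discards 1000 bytes and an immediate second pass discards 0. Freeing 50 bytes in the second group makes the next pass discard only that group's 250 bytes, and after model_remount() a pass covers all 1050 bytes again, which is the slow case you need to reproduce the bug.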
Comment 8 Dave Chinner 2022-08-10 00:29:50 UTC
(In reply to Lukas Czerner from comment #4)
> I agree that the FITRIM interface is flawed in this way. But
> ext4_try_to_trim_range() actually does have fatal_signal_pending() and
> will return -ERESTARTSYS if that's true. Or did you have something else in
> mind?

Why not just do:

	if (freezing(current))
		break;

After the call to fatal_signal_pending()?

Remember: FITRIM is an -advisory- API. It does not provide any
guarantees that the free space in the filesystem has any specific
operation done on it, nor does the backing store guarantee that it
performs GC on ranges the filesystem discards because discards are
advisory as well!

Hence the FITRIM API isn't a problem here at all - it's purely an
advisory interface and does not guarantee storage level garbage
collection. Hence if filesystems skip the remaining requested range
because the system is being suspended, then it isn't the end of the
world.  Userspace already has to expect that FITRIM will *do
nothing*, and if userspace is doing FITRIM often enough that suspend
is an issue, the next scheduled userspace FITRIM pass will clean up
what this one skipped...

Hence I don't see any problem with just stopping FITRIM and
returning "no error" if it detects a suspend operation in progress.
Simple logic, easy to retrofit to all filesystems, and doesn't
require any userspace awareness of the issue at all...

Cheers,

Dave.
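Dave's proposal can be modelled as a loop over block groups that stops as soon as the task is being frozen, still returns success, and reports in len what was actually discarded. This is a toy sketch of that idea with a hypothetical freezing predicate standing in for freezing(current); it is not the patch that was eventually merged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test hook: freezing "starts" once the loop reaches this group. */
struct trim_ctx {
    size_t freeze_at_group;
};

/* Stand-in for freezing(current). */
static bool model_freezing(const struct trim_ctx *c, size_t group)
{
    return group >= c->freeze_at_group;
}

/* group_free[g] = free bytes a discard would cover in block group g. */
static int model_trim_fs(const uint64_t *group_free, size_t ngroups,
                         const struct trim_ctx *c, uint64_t *len_out)
{
    uint64_t trimmed = 0;

    for (size_t g = 0; g < ngroups; g++) {
        /* The existing fatal_signal_pending() check would also sit here. */
        if (model_freezing(c, g))
            break;            /* leave the rest for the next scheduled run */
        trimmed += group_free[g];
    }
    *len_out = trimmed;       /* OUT value: bytes actually discarded */
    return 0;                 /* no error: FITRIM is advisory */
}
```

With groups holding 10, 20, 30 and 40 free bytes and freezing starting at group 2, the call returns 0 with len 30; an unmodified fstrim sees a successful, shorter trim and the suspend can proceed.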
Comment 9 brian.bascoy 2022-09-09 06:02:59 UTC
I have a similar (same?) issue with my XPS 13 but the failing tasks seem completely random. I think this started after upgrading my kernel from 5.13 to 5.14.

Len, can you confirm this only happens with fstrim?

Here are a few examples from my current boot:

Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
task:python3         state:D stack:    0 pid:120996 ppid:     1 flags:0x00000006
Call Trace:
 <TASK>
 __schedule+0x2ae/0x7c0
 schedule+0x4e/0xb0
 fanotify_handle_event+0x352/0x4d0
 ? wait_woken+0x60/0x60
 fsnotify+0x2ff/0x550
 __fsnotify_parent+0xff/0x310
 security_file_open+0xdd/0x150
 ? security_file_open+0xdd/0x150
 do_dentry_open+0xf2/0x380
 vfs_open+0x2d/0x30
 do_open.isra.0+0x224/0x420
 path_openat+0x18e/0xc80
 do_filp_open+0xb2/0x120
 ? __check_object_size+0x13f/0x150
 do_sys_openat2+0x245/0x310
 do_sys_open+0x46/0x80
 __x64_sys_openat+0x20/0x30
 do_syscall_64+0x38/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2885d3c6eb
RSP: 002b:00007fff68482670 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f2885c27140 RCX: 00007f2885d3c6eb
RDX: 0000000000080000 RSI: 00007f28856a2870 RDI: 00000000ffffff9c
RBP: 00007f28856a2870 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000080000
R13: 00007f2885499990 R14: 0000000000080000 R15: 0000000000000001
 </TASK>

---

Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
task:git             state:D stack:    0 pid:121036 ppid: 16154 flags:0x00000006
Call Trace:
 <TASK>
 __schedule+0x2ae/0x7c0
 schedule+0x4e/0xb0
 fanotify_handle_event+0x352/0x4d0
 ? wait_woken+0x60/0x60
 fsnotify+0x2ff/0x550
 __fsnotify_parent+0xff/0x310
 ? get_acl+0x1d/0x170
 security_file_open+0xdd/0x150
 ? security_file_open+0xdd/0x150
 do_dentry_open+0xf2/0x380
 vfs_open+0x2d/0x30
 do_open.isra.0+0x224/0x420
 path_openat+0x18e/0xc80
 ? filemap_map_pages+0x134/0x630
 do_filp_open+0xb2/0x120
 ? __check_object_size+0x13f/0x150
 do_sys_openat2+0x245/0x310
 do_sys_open+0x46/0x80
 __x64_sys_openat+0x20/0x30
 do_syscall_64+0x38/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f3fead69bcc
RSP: 002b:00007ffca0166ac0 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f3fead69bcc
RDX: 0000000000080000 RSI: 00007f3feae31570 RDI: 00000000ffffff9c
RBP: 00007ffca0166c80 R08: 0000000000080000 R09: 00007f3feae31570
R10: 0000000000000000 R11: 0000000000000287 R12: 00007ffca0167cb8
R13: 00007ffca0166b20 R14: 00005575094ba320 R15: 00007ffca0166b20
 </TASK>

---

Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
task:grub-editenv    state:D stack:    0 pid:125647 ppid:     1 flags:0x00000004
Call Trace:
 <TASK>
 __schedule+0x2ae/0x7c0
 schedule+0x4e/0xb0
 fanotify_handle_event+0x352/0x4d0
 ? wait_woken+0x60/0x60
 fsnotify+0x2ff/0x550
 ? chacha_block_generic+0x6f/0xb0
 __fsnotify_parent+0xff/0x310
 security_file_open+0xdd/0x150
 ? security_file_open+0xdd/0x150
 do_dentry_open+0xf2/0x380
 vfs_open+0x2d/0x30
 do_open.isra.0+0x224/0x420
 path_openat+0x18e/0xc80
 do_filp_open+0xb2/0x120
 ? __check_object_size+0x13f/0x150
 do_sys_openat2+0x245/0x310
 do_sys_open+0x46/0x80
 __x64_sys_openat+0x20/0x30
 do_syscall_64+0x38/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f125633bb38
RSP: 002b:00007ffd4004d148 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f12563501f8 RCX: 00007f125633bb38
RDX: 0000000000080000 RSI: 00007f125634321b RDI: 00000000ffffff9c
RBP: 00007ffd4004d300 R08: 0000000000080000 R09: 00007f125634321b
R10: 0000000000000000 R11: 0000000000000287 R12: ffffffffffffffff
R13: 0000000000000001 R14: 000000000000000d R15: 000056394c0aa990
 </TASK>
Comment 10 Jan Kara 2022-09-26 15:59:17 UTC
(In reply to brian.bascoy from comment #9)
> I have a similar (same?) issue with my XPS 13 but the failing tasks seem
> completely random. I think this started after upgrading my kernel from 5.13
> to 5.14.

This is a very different problem (although the symptoms are similar). The kernel is waiting for a response to an fanotify permission event. Apparently you have some application that uses fanotify permission events (maybe some antivirus or security solution). Sadly, there is no easy way to gracefully handle suspend in such cases. Anyway, this code has not changed for a few years, so if this started happening for you recently, I think a userspace change is more likely.
Comment 11 brian.bascoy 2022-09-27 13:24:21 UTC
Thanks very much Jan, I think you are absolutely right and it is indeed an issue with Fortinet VPN client (which includes some anti-malware functionality). Sorry about the confusion.
Comment 12 Todd Brandt 2023-04-21 23:46:20 UTC
Created attachment 304174 [details]
issue.def
Comment 13 Todd Brandt 2023-09-06 16:28:11 UTC
Here's a method to easily reproduce this issue. On Ubuntu the fstrim timer usually fires at midnight on Monday mornings, so testing that happens during that window hits this issue. Here's a way to trigger fstrim over suspend/resume any time you want:

--------------
#!/bin/sh

/sbin/fstrim --listed-in /etc/fstab:/proc/self/mountinfo --verbose --quiet-unsupported &
sudo sleepgraph -m mem -rtcwake 5
--------------
Comment 14 Jan Kara 2023-09-13 15:02:37 UTC
Created attachment 305102 [details]
[PATCH 1/2] ext4: Move setting of trimmed bit into ext4_try_to_trim_range()
Comment 15 Jan Kara 2023-09-13 15:03:17 UTC
Created attachment 305103 [details]
[PATCH 2/2] ext4: Do no let fstrim block system suspend

These two patches should fix the issue...
Comment 16 Jan Kara 2023-09-25 12:04:23 UTC
Patches were merged upstream as commits 45e4ab320c9 ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()") and 5229a658f645 ("ext4: do not let fstrim block system suspend"). Closing the bug.
