Bug 196691

Summary: fstrim refuses to freeze during system suspend (ext4)
Product: IO/Storage
Component: Block Layer
Status: NEEDINFO
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.15.0
Regression: No
Reporter: Imre Deak (imre.deak)
Assignee: Jens Axboe (axboe)
CC: adrinael, benjamin.loison, david, imre.deak, marcos.souza.org, tomi.p.sarvela
Attachments: bootlog

Description Imre Deak 2017-08-17 13:18:18 UTC
Suspending the system to S3 or S4 with /sys/power/pm_test set to "devices" fails with the following backtrace:

[  680.774124] Freezing of tasks failed after 20.005 seconds (1 tasks refusing to freeze, wq_busy=0):
[  680.783120] fstrim          D    0  1612   1611 0x00000004
[  680.783129] Call Trace:
[  680.783135]  __schedule+0x3c3/0xb00
[  680.783140]  ? queue_unplugged+0x66/0x1b0
[  680.783144]  ? wait_for_common_io.constprop.1+0xea/0x190
[  680.783146]  schedule+0x40/0x90
[  680.783148]  schedule_timeout+0x251/0x490
[  680.783152]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  680.783154]  ? trace_hardirqs_on+0xd/0x10
[  680.783158]  ? wait_for_common_io.constprop.1+0xea/0x190
[  680.783160]  io_schedule_timeout+0x1e/0x50
[  680.783162]  ? io_schedule_timeout+0x1e/0x50
[  680.783165]  wait_for_common_io.constprop.1+0x109/0x190
[  680.783167]  ? wake_up_q+0x80/0x80
[  680.783170]  wait_for_completion_io+0x18/0x20
[  680.783172]  submit_bio_wait+0x59/0x70
[  680.783177]  blkdev_issue_discard+0x71/0xb0
[  680.783180]  ? ext4_trim_fs+0x429/0xc30
[  680.783183]  ext4_trim_fs+0x4b7/0xc30
[  680.783185]  ? ext4_trim_fs+0x4b7/0xc30
[  680.783192]  ext4_ioctl+0xc3b/0x1100
[  680.783197]  do_vfs_ioctl+0x94/0x670
[  680.783199]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  680.783202]  ? __this_cpu_preempt_check+0x13/0x20
[  680.783205]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  680.783207]  SyS_ioctl+0x41/0x70
[  680.783210]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  680.783212] RIP: 0033:0x7fc7008a7587
[  680.783214] RSP: 002b:00007fffdb19f718 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  680.783217] RAX: ffffffffffffffda RBX: ffffffff81489a93 RCX: 00007fc7008a7587
[  680.783218] RDX: 00007fffdb19f720 RSI: 00000000c0185879 RDI: 0000000000000004
[  680.783220] RBP: ffffc9000011ff88 R08: 00000001502de180 R09: 0000000000000000
[  680.783221] R10: 000000000000000a R11: 0000000000000246 R12: 00007fc700fbf250
[  680.783222] R13: 0000000000000000 R14: 00000001502d8800 R15: 0000000000000004
[  680.783225]  ? __this_cpu_preempt_check+0x13/0x20
[  680.783231] OOM killer enabled.
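The test setup from the description can be sketched in Python as below. This is a minimal sketch, assuming root access, a kernel built with CONFIG_PM_DEBUG (which provides /sys/power/pm_test), and that "mem" selects S3 on this machine; the function name suspend_with_pm_test is hypothetical. To hit the failure, fstrim (for example "fstrim -v /" on the ext4 file system) has to be running while the suspend starts, as in the trace above.

```python
from pathlib import Path

# Sleep states from the report mapped to /sys/power/state keywords.
# Assumption: "mem" selects S3 here (see /sys/power/mem_sleep on newer kernels).
STATE_FOR = {"S3": "mem", "S4": "disk"}


def suspend_with_pm_test(sleep_state: str, pm_test: str = "devices",
                         power: Path = Path("/sys/power")) -> None:
    """Run a test suspend that stops at the given pm_test level (needs root)."""
    if sleep_state not in STATE_FOR:
        raise ValueError(f"unsupported sleep state: {sleep_state}")
    # With pm_test set to "devices" the kernel freezes tasks and suspends
    # devices, waits briefly, then resumes instead of really going to sleep.
    (power / "pm_test").write_text(pm_test)
    try:
        # On a real system this write blocks until resume and is expected
        # to raise OSError if, e.g., freezing of tasks times out.
        (power / "state").write_text(STATE_FOR[sleep_state])
    finally:
        # Restore normal suspend behaviour afterwards.
        (power / "pm_test").write_text("none")
```

Calling suspend_with_pm_test("S3") and then suspend_with_pm_test("S4") covers both cases from the report; a failing run leaves a "Freezing of tasks failed" message like the one above in the kernel log.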
Comment 1 Imre Deak 2017-08-17 13:21:57 UTC
The SSD is an OCZ Vertex 3, firmware version 2.25.
Comment 2 Tomi Sarvela 2017-08-17 13:26:48 UTC
Created attachment 257995
bootlog
Comment 3 Imre Deak 2017-08-17 13:39:13 UTC
The bug isn't MD-related, so I'm assigning it to the 'Block Layer' component for now.
Comment 4 Imre Deak 2018-02-01 08:19:52 UTC
+Dave.

Dave, I read an explanation of yours about how fstrim can cause long delays. Do you think the 20 s freeze failure during system suspend is just that, or could it point to some other issue? Could changing the SSD, or using the discard mount option instead of fstrim, prevent this?
Comment 5 Jens Axboe 2018-02-01 15:30:32 UTC
I'm assuming the discard just takes a long time to complete. We can't suspend with in-flight IO, so this doesn't look like a bug to me; the only option we have is to wait for the IO to complete.

You can potentially improve the situation here by making discards happen in smaller units. That can be done by setting:

/sys/block/nvme0n1/queue/discard_max_bytes

(substitute nvme0n1 for your device) to a smaller value. Bigger discards will then be chopped into pieces of discard_max_bytes in size.
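A minimal Python sketch of the tuning suggested above, assuming root access and the /sys/block/<dev>/queue/ layout named in the comment. The helper names are hypothetical. Rounding the value down to a multiple of discard_granularity and capping it at discard_max_hw_bytes are assumptions about what the kernel accepts, so check the sysfs-block ABI documentation for your kernel version. discard_pieces() only gives a rough count, since it ignores alignment.

```python
from pathlib import Path


def discard_pieces(length: int, max_bytes: int) -> int:
    """Rough number of discard requests a range of `length` bytes is split into.

    Simplification: ignores the alignment to discard_granularity that the
    block layer also applies, so treat the result as an estimate.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return -(-length // max_bytes)  # ceiling division


def cap_discard_max_bytes(dev: str, requested: int,
                          sysfs: Path = Path("/sys")) -> int:
    """Write a smaller discard_max_bytes for block device `dev` (needs root).

    Assumption: the value must be a multiple of discard_granularity and
    cannot exceed discard_max_hw_bytes, so it is adjusted before writing.
    """
    queue = sysfs / "block" / dev / "queue"
    granularity = int((queue / "discard_granularity").read_text())
    hw_max = int((queue / "discard_max_hw_bytes").read_text())
    if hw_max == 0:
        raise RuntimeError(f"{dev} does not advertise discard support")
    value = min(requested, hw_max)
    if granularity > 0:
        value -= value % granularity  # round down to the granularity
    if value <= 0:
        raise ValueError("requested size is smaller than discard_granularity")
    (queue / "discard_max_bytes").write_text(f"{value}\n")
    return value
```

For example, cap_discard_max_bytes("nvme0n1", 64 * 1024 * 1024) limits each discard to at most 64 MiB (an arbitrary value chosen for illustration); a 1 GiB range would then be issued as roughly discard_pieces(1 << 30, 64 << 20) == 16 requests. The setting is not persistent and has to be reapplied after each boot.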
Comment 6 Imre Deak 2018-02-01 19:00:50 UTC
(In reply to Jens Axboe from comment #5)
> I'm assuming the discard just takes a long time to complete. We can't
> suspend with in-flight IO. So this doesn't look like a bug to me, the only
> option we have is to wait for the IO to complete.
> 
> You can potentially improve the situation here by making discards happen in
> smaller units. That can be done by setting:
> 
> /sys/block/nvme0n1/queue/discard_max_bytes
> 
> (substitute nvme0n1 for your device) to a smaller value. Bigger discards
> will then be chopped into pieces of discard_max_bytes in size.

Thanks, so I suppose this also depends on how fast the device can handle the trim/discard command. We'll try the above setting, or else replace the device; if that solves it, I'm fine with closing this bug.
Comment 7 Marcos Souza 2019-02-02 22:56:07 UTC
> Thanks, so I suppose this also depends on how fast the device can handle the
> trim/discard command. We'll try the above setting or else replacing the
> device, if that solves it I'm fine with closing this bug.

Did the solution work for you? Can this bug be closed?

Thanks.