System suspend to S3 and S4 with /sys/power/pm_test set to devices fails with the following backtrace:

[ 680.774124] Freezing of tasks failed after 20.005 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 680.783120] fstrim D 0 1612 1611 0x00000004
[ 680.783129] Call Trace:
[ 680.783135] __schedule+0x3c3/0xb00
[ 680.783140] ? queue_unplugged+0x66/0x1b0
[ 680.783144] ? wait_for_common_io.constprop.1+0xea/0x190
[ 680.783146] schedule+0x40/0x90
[ 680.783148] schedule_timeout+0x251/0x490
[ 680.783152] ? trace_hardirqs_on_caller+0xe3/0x1b0
[ 680.783154] ? trace_hardirqs_on+0xd/0x10
[ 680.783158] ? wait_for_common_io.constprop.1+0xea/0x190
[ 680.783160] io_schedule_timeout+0x1e/0x50
[ 680.783162] ? io_schedule_timeout+0x1e/0x50
[ 680.783165] wait_for_common_io.constprop.1+0x109/0x190
[ 680.783167] ? wake_up_q+0x80/0x80
[ 680.783170] wait_for_completion_io+0x18/0x20
[ 680.783172] submit_bio_wait+0x59/0x70
[ 680.783177] blkdev_issue_discard+0x71/0xb0
[ 680.783180] ? ext4_trim_fs+0x429/0xc30
[ 680.783183] ext4_trim_fs+0x4b7/0xc30
[ 680.783185] ? ext4_trim_fs+0x4b7/0xc30
[ 680.783192] ext4_ioctl+0xc3b/0x1100
[ 680.783197] do_vfs_ioctl+0x94/0x670
[ 680.783199] ? entry_SYSCALL_64_fastpath+0x5/0xb1
[ 680.783202] ? __this_cpu_preempt_check+0x13/0x20
[ 680.783205] ? trace_hardirqs_on_caller+0xe3/0x1b0
[ 680.783207] SyS_ioctl+0x41/0x70
[ 680.783210] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 680.783212] RIP: 0033:0x7fc7008a7587
[ 680.783214] RSP: 002b:00007fffdb19f718 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 680.783217] RAX: ffffffffffffffda RBX: ffffffff81489a93 RCX: 00007fc7008a7587
[ 680.783218] RDX: 00007fffdb19f720 RSI: 00000000c0185879 RDI: 0000000000000004
[ 680.783220] RBP: ffffc9000011ff88 R08: 00000001502de180 R09: 0000000000000000
[ 680.783221] R10: 000000000000000a R11: 0000000000000246 R12: 00007fc700fbf250
[ 680.783222] R13: 0000000000000000 R14: 00000001502d8800 R15: 0000000000000004
[ 680.783225] ? __this_cpu_preempt_check+0x13/0x20
[ 680.783231] OOM killer enabled.
The SSD is an OCZ Vertex 3, firmware version 2.25.
Created attachment 257995: bootlog
The bug isn't MD-related, so I'm assuming the 'Block Layer' component for now.
+Dave.

Dave, I read an explanation from you of how fstrim can cause extensive delays. Do you think the 20s freeze during system suspend is just that, or could it point to some other issue? Could changing the SSD, or using discard instead of fstrim, prevent this?
I'm assuming the discard just takes a long time to complete. We can't suspend with in-flight IO, so this doesn't look like a bug to me. The only option we have is to wait for the IO to complete.

You can potentially improve the situation here by making discards happen in smaller units. That can be done by setting:

/sys/block/nvme0n1/queue/discard_max_bytes

(replace nvme0n1 with your device) to a smaller value. Bigger discards will then be chopped into pieces of discard_max_bytes in size.
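For reference, here is a rough Python sketch of the suggestion above. It is only an illustration under stated assumptions, not something taken from the kernel or from this report: the function names, the `sysfs_root` parameter (there so the code can be tried against a scratch directory), and the choice to round down to `discard_granularity` are all mine. It does rely on the `discard_max_hw_bytes` and `discard_granularity` attributes that sit next to `discard_max_bytes` in the queue directory. The second function only mimics, in user space, how one large discard would be cut into pieces of at most `discard_max_bytes`; the real splitting happens inside the block layer.

```python
from pathlib import Path


def set_discard_max_bytes(device, new_bytes, sysfs_root="/sys/block"):
    """Lower discard_max_bytes for `device` (e.g. "sda") and return the value written.

    Hypothetical helper; needs root when pointed at the real /sys/block.
    """
    queue = Path(sysfs_root) / device / "queue"
    hw_limit = int((queue / "discard_max_hw_bytes").read_text().strip())
    if hw_limit == 0:
        raise ValueError(f"{device} does not report discard support")
    granularity = int((queue / "discard_granularity").read_text().strip())

    # Never ask for more than the hardware limit.
    value = min(new_bytes, hw_limit)
    # Assumption: keep the value a multiple of the discard granularity.
    if granularity > 0:
        value -= value % granularity
    if value <= 0:
        raise ValueError("requested size is smaller than the discard granularity")

    (queue / "discard_max_bytes").write_text(f"{value}\n")
    return value


def split_discard(start, length, max_bytes):
    """Illustration only: cut one discard range into (offset, size) pieces."""
    pieces = []
    offset = start
    end = start + length
    while offset < end:
        size = min(max_bytes, end - offset)
        pieces.append((offset, size))
        offset += size
    return pieces
```

A call such as `set_discard_max_bytes("sda", 64 * 1024 * 1024)` would cap each discard at 64 MiB; the 64 MiB figure is an arbitrary example, and the setting does not persist across reboots.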
(In reply to Jens Axboe from comment #5)
> I'm assuming the discard just takes a long time to complete. We can't
> suspend with in-flight IO. So this doesn't look like a bug to me, the only
> option we have is to wait for the IO to complete.
>
> You can potentially improve the situation here by making discards happen in
> smaller units. That can be done by setting:
>
> /sys/block/nvme0n1/queue/discard_max_bytes
>
> (substitute nvme0n1 for your device) to a smaller value. Bigger discards
> will then be chopped into pieces of discard_max_bytes in size.

Thanks. So I suppose this also depends on how fast the device can handle the trim/discard command. We'll try the above setting, or else replace the device. If that solves it, I'm fine with closing this bug.
> Thanks, so I suppose this also depends on how fast the device can handle the
> trim/discard command. We'll try the above setting or else replacing the
> device, if that solves it I'm fine with closing this bug.

Did the solution work for you? Can this bug be closed? Thanks.